
Rethinking the Knowledge Distillation From the Perspective of Model Calibration

Albert Gu, Karan Goel, Christopher Ré, ICLR 2022

Summary

This paper builds on the observation that more accurate teachers are not necessarily better teachers, owing to a capability mismatch between teacher and student. The authors analyze this idea through experiments on toy datasets, confirm the observation, and propose a simple calibration technique to address it. The results support their hypothesis.

Contributions

Method

\[\begin{aligned} \mathcal{L} &= -\sum_{i=1}^{n} \log \left(\hat{\pi}\left(y_{i} \mid \mathbf{x}_{i}\right)\right) \\ \hat{q}_{i} &= \max_{k} \sigma_{\mathrm{SM}}\left(\mathbf{z}_{i} / T\right)^{(k)} \end{aligned}\]

Here \(\mathbf{z}_{i}\) are the teacher's logits, \(T\) is a scalar temperature learned by minimizing the negative log-likelihood \(\mathcal{L}\) on a held-out validation set, and \(\hat{q}_{i}\) is the resulting calibrated confidence (i.e., temperature scaling).
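As a concrete illustration, below is a minimal sketch of fitting such a temperature in PyTorch. The function name `fit_temperature`, the optimizer settings, and the placeholder validation data are assumptions for illustration, not code from the paper.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, lr=0.01, steps=200):
    """Learn a single temperature T minimizing the NLL on held-out data."""
    # Parameterize T = exp(log_t) so the temperature stays positive.
    log_t = torch.zeros(1, requires_grad=True)
    optimizer = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        # Cross-entropy of temperature-scaled logits is exactly the NLL above.
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        optimizer.step()
    return log_t.exp().item()

# Usage with placeholder teacher logits on a validation split.
logits = torch.randn(1000, 10)           # hypothetical validation logits
labels = torch.randint(0, 10, (1000,))   # hypothetical validation labels
T = fit_temperature(logits, labels)
confidence = F.softmax(logits / T, dim=1).max(dim=1).values  # calibrated q-hat
```

Since only one scalar is being fit, any optimizer works; this sketch uses Adam, though L-BFGS is a common choice for temperature scaling.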

Results

Two Cents

Resources