We propose RLKD, a novel knowledge distillation (KD) approach that uses reinforcement learning (RL) to dynamically adjust the distillation temperature for each training instance. To guide the RL agent, we design a state representation that combines performance and uncertainty features. However, because the student is trained in batches, applying RL to KD introduces a delayed reward problem; we mitigate it with an instance reward calibration method based on reward decomposition. In addition, an efficient exploration strategy accelerates the agent's learning in the early stages by focusing on high-quality training instances.
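The core quantity the agent controls is a per-instance temperature applied inside the KD loss. Below is a minimal sketch (not the authors' implementation) of such a loss in PyTorch, assuming each sample in the batch carries its own temperature; the function name `per_instance_kd_loss` and the randomly sampled temperatures are illustrative placeholders, whereas in RLKD the temperatures would come from the RL agent's policy.

```python
import torch
import torch.nn.functional as F

def per_instance_kd_loss(student_logits, teacher_logits, temperatures):
    """KL-divergence KD loss where each instance i uses its own temperature T_i.

    student_logits, teacher_logits: (batch, num_classes)
    temperatures: (batch,) positive per-instance temperatures (e.g. from an RL agent)
    """
    t = temperatures.unsqueeze(1)                        # (batch, 1) for broadcasting
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    # Per-instance KL divergence, with the usual T^2 scaling so gradient
    # magnitudes stay comparable across different temperatures.
    kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)
    return (kl * temperatures.pow(2)).mean()

if __name__ == "__main__":
    # Dummy usage: 8 samples, 100 classes (as in CIFAR-100).
    student_logits = torch.randn(8, 100)
    teacher_logits = torch.randn(8, 100)
    temperatures = torch.empty(8).uniform_(1.0, 6.0)     # placeholder temperatures
    print(per_instance_kd_loss(student_logits, teacher_logits, temperatures).item())
```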
Solid lines represent the processing flow of training instances in our framework; dashed lines indicate the backpropagation used to update the student model and the agent.
Student network top-1 accuracy on CIFAR-100: vanilla KD alone, and vanilla KD with instance temperature adjustment via CTKD and our RLKD, respectively.
Student network top-1 accuracy on the CIFAR-100 dataset.
@misc{zhang2024instancetemperatureknowledgedistillation,
title={Instance Temperature Knowledge Distillation},
author={Zhengbo Zhang and Yuxi Zhou and Jia Gong and Jun Liu and Zhigang Tu},
year={2024},
eprint={2407.00115},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2407.00115},
}