We propose RLKD, a novel knowledge distillation (KD) approach that uses reinforcement learning (RL) to dynamically adjust the distillation temperature for each training instance. To guide the RL agent, we design a state representation that combines performance and uncertainty features. However, because the student is trained in batches, applying RL to KD introduces a delayed reward problem; we mitigate it with an instance reward calibration method based on reward decomposition. In addition, an efficient exploration strategy accelerates the agent's learning in the early stages by focusing on high-quality training instances.
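The core quantity the agent controls is a per-instance temperature applied inside the KD loss. Below is a minimal sketch (not the authors' implementation) of such a loss in PyTorch, assuming each sample in the batch carries its own temperature; the function name `per_instance_kd_loss` and the randomly sampled temperatures are illustrative placeholders, whereas in RLKD the temperatures would come from the RL agent's policy.

```python
import torch
import torch.nn.functional as F

def per_instance_kd_loss(student_logits, teacher_logits, temperatures):
    """KL-divergence KD loss where each instance i uses its own temperature T_i.

    student_logits, teacher_logits: (batch, num_classes)
    temperatures: (batch,) positive per-instance temperatures (e.g. from an RL agent)
    """
    t = temperatures.unsqueeze(1)                        # (batch, 1) for broadcasting
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    # Per-instance KL divergence, with the usual T^2 scaling so gradient
    # magnitudes stay comparable across different temperatures.
    kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)
    return (kl * temperatures.pow(2)).mean()

if __name__ == "__main__":
    # Dummy usage: 8 samples, 100 classes (as in CIFAR-100).
    student_logits = torch.randn(8, 100)
    teacher_logits = torch.randn(8, 100)
    temperatures = torch.empty(8).uniform_(1.0, 6.0)     # placeholder temperatures
    print(per_instance_kd_loss(student_logits, teacher_logits, temperatures).item())
```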
Solid lines represent the processing flow of training instances in our framework; dashed lines indicate the backpropagation used to update the student model and the agent.
Student network top-1 accuracy on CIFAR-100: vanilla KD alone, and vanilla KD with instance temperature adjustment via CTKD and our RLKD, respectively.
Student network top-1 accuracy on the CIFAR-100 dataset.
@misc{zhang2024instancetemperatureknowledgedistillation,
title={Instance Temperature Knowledge Distillation},
author={Zhengbo Zhang and Yuxi Zhou and Jia Gong and Jun Liu and Zhigang Tu},
year={2024},
eprint={2407.00115},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2407.00115},
}