Counterclockwise block-by-block knowledge distillation for neural network compression
Abstract: Model compression is a technique for transforming large neural network models into smaller ones. Knowledge distillation (KD) is a crucial model compression technique that transfers knowledge from a large teacher model to a lightweight student model. Existing knowledge distillation methods typically carry out the knowledge transfer from teacher to student in one or two stages. This paper introduces a novel approach called counterclockwise block-wise knowledge distillation (CBKD) to optimize the knowledge distillation process.
The core idea of CBKD is to mitigate the generation gap between teacher and student models, facilitating the transfer of intermediate-layer knowledge from the teacher model. It divides both the teacher and student models into multiple sub-network blocks, and in each stage of knowledge distillation only the knowledge of one teacher sub-block is transferred to the student sub-block at the corresponding position. Additionally, in the CBKD process, deeper teacher sub-network blocks are assigned higher compression rates. Extensive experiments on Tiny-ImageNet-200 and CIFAR-10 demonstrate that the proposed CBKD method can enhance the distillation performance of various mainstream knowledge distillation approaches.
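To make the block-by-block idea concrete, here is a minimal PyTorch sketch of stage-wise feature distillation: both networks are split into sub-network blocks, and at each stage only one student block is trained to mimic the intermediate features of the teacher block at the same position, with earlier student blocks frozen. The block definitions, the channel widths, the 1x1 adapter layers, the MSE feature loss, and the front-to-back stage order are illustrative assumptions, not the authors' reference implementation; in particular, CBKD's actual "counterclockwise" schedule and per-block compression rates are only hinted at by the increasingly narrow student widths.

```python
# Sketch of stage-wise (block-by-block) feature distillation.
# NOT the authors' CBKD code: architecture, loss, and stage order are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_blocks(widths):
    """Build a list of conv blocks; each block is one distillation unit."""
    return nn.ModuleList(
        nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )
        for c_in, c_out in zip(widths[:-1], widths[1:])
    )

t_widths = [3, 64, 128, 256]   # assumed teacher channel widths
s_widths = [3, 32, 48, 64]     # assumed student widths: deeper blocks compressed more
teacher = make_blocks(t_widths)
student = make_blocks(s_widths)

# 1x1 adapters map student features to teacher width for the feature-matching loss.
adapters = nn.ModuleList(
    nn.Conv2d(s, t, kernel_size=1) for s, t in zip(s_widths[1:], t_widths[1:])
)

def distill_one_stage(stage, loader, epochs=1, lr=1e-3):
    """Distill only block `stage`: earlier student blocks are frozen, later ones untouched."""
    params = list(student[stage].parameters()) + list(adapters[stage].parameters())
    opt = torch.optim.Adam(params, lr=lr)
    teacher.eval()
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():              # teacher features up to and including this block
                t_feat = x
                for block in teacher[: stage + 1]:
                    t_feat = block(t_feat)
            s_feat = x
            with torch.no_grad():              # frozen, already-distilled student blocks
                for block in student[:stage]:
                    s_feat = block(s_feat)
            s_feat = student[stage](s_feat)    # only this block receives gradients
            loss = F.mse_loss(adapters[stage](s_feat), t_feat)
            opt.zero_grad()
            loss.backward()
            opt.step()

if __name__ == "__main__":
    # Tiny synthetic loader so the sketch runs end to end; replace with a real dataset.
    data = [(torch.randn(4, 3, 32, 32), torch.zeros(4, dtype=torch.long)) for _ in range(2)]
    for stage in range(len(student)):          # one distillation stage per sub-block
        distill_one_stage(stage, data)
```

In a real pipeline, the stage loop would follow whatever block order and per-block compression schedule the method prescribes, and the distilled student would typically be fine-tuned on the task loss afterwards.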