This framework aims to develop robust control policies for physical robot platforms by bridging the gap between simulation and real-world deployment. During the learning phase, a teacher policy \( \pi_t \) is trained to track task goals and reference motions within a physics engine, leveraging domain randomization (DR) to achieve robustness against variations in dynamics. The privileged knowledge learned by the teacher is then transferred via distillation to a student policy \( \pi_s \), which operates without access to ground-truth simulation data. The student policy is subsequently deployed on real hardware, generating control actions \( a_t \) based solely on estimated states derived from onboard sensors such as IMUs and encoders.
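A minimal sketch of this teacher-student pipeline is given below, written in PyTorch. All module names, dimensions, and the stand-in for the randomized simulator are illustrative assumptions, not part of the original framework; the point is only to show the asymmetric inputs (privileged state for \( \pi_t \), sensor-derived estimates for \( \pi_s \)) and the distillation loss.

```python
# Hypothetical sketch: teacher-student distillation under domain randomization.
# Names, dimensions, and the toy "rollout" generator are assumptions.
import torch
import torch.nn as nn

PRIV_DIM = 16   # privileged simulator state (e.g., randomized dynamics params) -- assumed size
OBS_DIM  = 32   # onboard-sensor estimate (IMU + encoder features)              -- assumed size
ACT_DIM  = 12   # joint-level control action a_t                                -- assumed size


def mlp(in_dim, out_dim):
    """Small policy network; the architecture is an assumption."""
    return nn.Sequential(nn.Linear(in_dim, 128), nn.Tanh(),
                         nn.Linear(128, 128), nn.Tanh(),
                         nn.Linear(128, out_dim))


# Teacher pi_t sees the sensor observation plus the privileged simulator state;
# student pi_s sees only the sensor observation, matching real-world deployment.
teacher = mlp(OBS_DIM + PRIV_DIM, ACT_DIM)
student = mlp(OBS_DIM, ACT_DIM)


def sample_randomized_rollout(batch=256):
    """Stand-in for rollouts collected in a physics engine under DR.
    Here we draw random tensors; a real setup would step the simulator
    with per-episode randomized dynamics (mass, friction, latency, ...)."""
    obs = torch.randn(batch, OBS_DIM)
    priv = torch.randn(batch, PRIV_DIM)  # DR parameters / ground-truth sim state
    return obs, priv


# Distillation: regress the student's action onto the (frozen, pre-trained)
# teacher's action on states visited under domain randomization.
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(1000):
    obs, priv = sample_randomized_rollout()
    with torch.no_grad():
        a_teacher = teacher(torch.cat([obs, priv], dim=-1))
    a_student = student(obs)
    loss = nn.functional.mse_loss(a_student, a_teacher)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At deployment only the student runs: a_t = student(estimated_state).
```

In this sketch the teacher is assumed to have already been trained (e.g., with RL in simulation); the loop shown implements only the distillation step, and the student never touches the privileged inputs, which is what allows it to run from onboard sensing alone.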