Deep Deterministic Policy Gradient (DDPG)
Author: Tianjiang Says
Contributing institutions: School of Astronautics, Harbin Institute of Technology; National Key Laboratory of Rapid Design and Intelligent Swarm for Micro/Nano Spacecraft
References: Guan Yutong et al. Hyperparameter Auto-Tuning and Homotopy Methods for Spacecraft Long-Range Cooperative Rendezvous, Spacecraft Environment Engineering, 2026.
Definition
Deep Deterministic Policy Gradient (DDPG) is a model-free, off-policy deep reinforcement learning algorithm proposed by Lillicrap et al. in 2015. It combines the Actor-Critic framework with experience replay and slowly updated target networks. It learns a deterministic policy that maps each state directly to a continuous action, which makes it suited to reinforcement learning tasks with continuous action spaces. It has been applied in fields such as robotic control and spacecraft trajectory optimization.
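Experience replay stores past transitions (s, a, r, s', done) and trains on minibatches drawn uniformly at random from them, which breaks the correlation between consecutive samples. The following is a minimal sketch in plain Python, assuming states and actions are lists of floats; the class and method names are illustrative and do not come from any particular library.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# One transition (s, a, r, s_next, done); states and actions are plain float lists here.
Transition = Tuple[List[float], List[float], float, List[float], bool]


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions, sampled uniformly at random."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)  # oldest entries drop out first
        self.rng = random.Random(seed)

    def add(self, s, a, r, s_next, done) -> None:
        self.buffer.append((list(s), list(a), float(r), list(s_next), bool(done)))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement decorrelates the minibatch
        # from the order in which the transitions were collected.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.add([float(t)], [0.1 * t], -1.0, [float(t + 1)], t == 4)
batch = buf.sample(2)
```

With a capacity of 3, only the three most recent of the five transitions remain in the buffer, and each minibatch is drawn from those.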
Algorithm Architecture
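DDPG uses an Actor-Critic structure in which both the actor and the critic are paired with a target copy, giving four networks:
- Actor network \mu(s|\theta^\mu): given a state s, outputs a deterministic action a = \mu(s|\theta^\mu)
- Critic network Q(s,a|\theta^Q): estimates the value of the state-action pair (s, a)
- Target-Actor network \mu'(s|\theta^{\mu'}): a slowly updated copy of the actor that supplies the next-state action in the critic's target, stabilizing training
- Target-Critic network Q'(s,a|\theta^{Q'}): a slowly updated copy of the critic that evaluates the critic's target, stabilizing training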
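In Lillicrap et al., the target networks are not copied outright; they track the online networks through the soft update \theta' \leftarrow \tau\theta + (1-\tau)\theta' with \tau \ll 1 (the paper used \tau = 0.001), so the critic's target changes slowly. A minimal sketch, assuming each network's weights have been flattened into a list of floats (the function name is illustrative):

```python
from typing import List, Sequence


def soft_update(target: Sequence[float], online: Sequence[float], tau: float) -> List[float]:
    """Return the new target parameters tau * online + (1 - tau) * target."""
    if len(target) != len(online):
        raise ValueError("target and online parameter vectors must have the same length")
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


online = [1.0, -2.0]   # stand-in for flattened online-network weights theta
target = [0.0, 0.0]    # stand-in for flattened target-network weights theta'
for _ in range(10):
    target = soft_update(target, online, tau=0.1)
# With fixed online weights, after k updates: target = online * (1 - (1 - tau) ** k).
```

Setting tau = 1 reduces to a hard copy; small values make the target lag the online network, which is the source of the stabilizing effect.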
Core Formulas
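Loss function of the Critic network (mean squared error over a minibatch of N transitions):
L(\theta^Q) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - Q(s_i, a_i|\theta^Q)\right)^2, \quad y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})
where \gamma is the discount factor.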
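The critic target and loss can be computed as below. This is a sketch in plain Python; the target networks are replaced by hypothetical toy functions so the arithmetic can be checked by hand. One assumption goes beyond the formula: the bootstrap term is dropped on terminal transitions, a common practical convention.

```python
from typing import Callable, List, Sequence

State = List[float]
Action = List[float]


def critic_targets(rewards: Sequence[float], next_states: Sequence[State],
                   dones: Sequence[bool], target_actor: Callable[[State], Action],
                   target_critic: Callable[[State, Action], float],
                   gamma: float) -> List[float]:
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})), with the bootstrap term
    dropped on terminal transitions (an added, commonly used convention)."""
    targets = []
    for r, s_next, done in zip(rewards, next_states, dones):
        bootstrap = 0.0 if done else target_critic(s_next, target_actor(s_next))
        targets.append(r + gamma * bootstrap)
    return targets


def critic_loss(targets: Sequence[float], q_values: Sequence[float]) -> float:
    """Mean squared error (1/N) * sum_i (y_i - Q(s_i, a_i))^2 over the minibatch."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


# Hypothetical toy stand-ins for the target networks, for illustration only.
def toy_target_actor(s: State) -> Action:
    return [0.5 * s[0]]


def toy_target_critic(s: State, a: Action) -> float:
    return s[0] + a[0]


ys = critic_targets([1.0, 0.0], [[2.0], [4.0]], [False, True],
                    toy_target_actor, toy_target_critic, gamma=0.9)
loss = critic_loss(ys, [3.2, 0.5])
```

Here y_1 = 1.0 + 0.9 * (2.0 + 1.0) = 3.7 and y_2 = 0 (terminal), so with critic estimates 3.2 and 0.5 the loss is ((0.5)^2 + (0.5)^2) / 2 = 0.25. The critic parameters \theta^Q are then moved down the gradient of this loss, while y_i is treated as a constant.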
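Gradient of the Actor network (deterministic policy gradient, estimated on the same minibatch):
\nabla_{\theta^\mu} J \approx \frac{1}{N}\sum_{i=1}^{N} \nabla_a Q(s, a|\theta^Q)\big|_{s=s_i,\, a=\mu(s_i)} \nabla_{\theta^\mu}\mu(s|\theta^\mu)\big|_{s=s_i}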
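The actor gradient is a chain rule: the critic's slope with respect to the action, evaluated at a = \mu(s_i), times the actor's slope with respect to its parameters. The sketch below assumes a scalar-action linear actor \mu(s|\theta) = \theta \cdot s and a hypothetical fixed toy critic whose action derivative is written by hand; in DDPG both are neural networks and these derivatives come from automatic differentiation.

```python
from typing import Callable, List, Sequence


def actor_gradient(theta: Sequence[float], states: Sequence[Sequence[float]],
                   dq_da: Callable[[Sequence[float], float], float]) -> List[float]:
    """Minibatch estimate (1/N) sum_i dQ/da|_{a=mu(s_i)} * dmu/dtheta|_{s_i}
    for a scalar-action linear actor mu(s|theta) = theta . s."""
    n = len(states)
    grad = [0.0] * len(theta)
    for s in states:
        a = sum(t * x for t, x in zip(theta, s))   # deterministic action mu(s_i|theta)
        g = dq_da(s, a)                            # critic's action-gradient at a = mu(s_i)
        for j, x in enumerate(s):
            grad[j] += g * x / n                   # dmu/dtheta_j = s_j for a linear actor
    return grad


def toy_dq_da(s: Sequence[float], a: float) -> float:
    """dQ/da for a fixed toy critic Q(s, a) = -(a - 2 s_0)^2, maximized by a = 2 s_0."""
    return -2.0 * (a - 2.0 * s[0])


theta = [0.0]
states = [[1.0], [-0.5], [0.25]]
for _ in range(200):
    grad = actor_gradient(theta, states, toy_dq_da)
    theta = [t + 0.1 * g for t, g in zip(theta, grad)]   # gradient ascent on J
```

Because the actor ascends the critic's estimate, theta approaches 2.0, the parameter at which the toy critic is maximized for every state.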
Application in Trajectory Optimization
In spacecraft cooperative rendezvous problems, DDPG is used for hyperparameter auto-tuning:
- State design: stagnation time, duration, iteration progress, and the dispersion and direction of the particle distribution
- Action output: HCPSO hyperparameters, such as the inertia weight and the acceleration factors
- Reward function: designed from the difference between the global best fitness and the current fitness
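The source describes the state, action and reward only at the feature level. The sketch below shows one way the pieces could fit together, and it is hypothetical throughout: the hyperparameter bounds, the dispersion definition and the reward formula are illustrative assumptions, not the values or formulas of the cited paper.

```python
import math
from typing import Dict, Sequence

# Hypothetical bounds: the source does not give the ranges used for HCPSO.
BOUNDS = {"w": (0.4, 0.9), "c1": (0.5, 2.5), "c2": (0.5, 2.5)}


def action_to_hyperparams(action: Sequence[float]) -> Dict[str, float]:
    """Affinely map an actor output in [-1, 1]^3 (e.g. from a tanh layer) onto BOUNDS."""
    params = {}
    for (name, (low, high)), a in zip(BOUNDS.items(), action):
        a = max(-1.0, min(1.0, a))                  # clip out-of-range actions
        params[name] = low + 0.5 * (a + 1.0) * (high - low)
    return params


def swarm_dispersion(positions: Sequence[Sequence[float]]) -> float:
    """One possible dispersion feature: mean Euclidean distance to the swarm centroid."""
    n, dim = len(positions), len(positions[0])
    centroid = [sum(p[d] for p in positions) / n for d in range(dim)]
    return sum(math.dist(p, centroid) for p in positions) / n


def improvement_reward(prev_gbest: float, current_best: float, eps: float = 1e-12) -> float:
    """Hypothetical reward for minimization: relative improvement of the current
    iteration's best fitness over the previous global best, zero if it did not improve."""
    return max(0.0, (prev_gbest - current_best) / (abs(prev_gbest) + eps))


hp = action_to_hyperparams([-1.0, 0.0, 1.0])
spread = swarm_dispersion([[0.0, 0.0], [2.0, 0.0]])
reward = improvement_reward(10.0, 8.0)
```

Mapping a bounded actor output onto fixed ranges keeps every proposed inertia weight and acceleration factor inside a valid interval, whatever the actor outputs early in training.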
Application by Guan Yutong et al. (2026)
Guan Yutong et al. combined DDPG with Hybrid Cluster Particle Swarm Optimization (HCPSO) to form Reinforcement Learning Enhanced Particle Swarm Optimization (RLEPSO), which is used for:
- Optimizing the initial costates in fuel-optimal cooperative rendezvous problems
- Autonomously and dynamically tuning hyperparameters according to the particles' search state
- Improving the search capability and convergence speed of the optimization algorithm
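The RLEPSO idea of re-choosing hyperparameters every iteration can be illustrated with a plain global-best PSO whose inertia weight w and acceleration factors c1, c2 come from a policy. This is a sketch under loud assumptions: it is standard PSO, not HCPSO (whose clustering details are not given here); the policy is a hypothetical linear schedule standing in for a trained DDPG actor; and the sphere function stands in for the costate-shooting fitness.

```python
import random
from typing import Callable, Dict, List, Tuple


def pso_with_policy(fitness: Callable[[List[float]], float], dim: int, n_particles: int,
                    iters: int, policy: Callable[[float], Dict[str, float]],
                    bound: float = 5.0, seed: int = 0) -> Tuple[List[float], float]:
    """Plain global-best PSO (not HCPSO) whose w, c1, c2 are re-chosen each iteration."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for k in range(iters):
        hp = policy(k / iters)      # a trained DDPG actor would map the swarm state here
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (hp["w"] * vel[i][d]
                             + hp["c1"] * r1 * (pbest[i][d] - pos[i][d])
                             + hp["c2"] * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f < pbest_f[i]:      # update personal and global bests
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f


def schedule_policy(progress: float) -> Dict[str, float]:
    """Hypothetical stand-in for the learned actor: w falls linearly from 0.9 toward 0.4."""
    return {"w": 0.9 - 0.5 * progress, "c1": 1.5, "c2": 1.5}


def sphere(x: List[float]) -> float:
    """Toy objective standing in for the costate-shooting fitness."""
    return sum(v * v for v in x)


best_x, best_f = pso_with_policy(sphere, dim=3, n_particles=20, iters=100, policy=schedule_policy)
```

In RLEPSO the schedule would be replaced by the DDPG actor, which reads a swarm-state vector each iteration and outputs the hyperparameters, with the reward fed back to train it.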
Related Concepts
- Particle Swarm Optimization (PSO)
- Hybrid Cluster Particle Swarm Optimization (HCPSO)
- Reinforcement Learning Enhanced Particle Swarm Optimization (RLEPSO)
- Homotopy Method
References
- Lillicrap T P, et al. Continuous control with deep reinforcement learning[J]. arXiv:1509.02971, 2015.
- Guan Yutong, Gao Changsheng, Hu Yudong, Zhao Han. Hyperparameter Auto-Tuning and Homotopy Methods for Spacecraft Long-Range Cooperative Rendezvous[J]. Spacecraft Environment Engineering, 2026. [in Chinese]
