Curriculum Learning
Definition
Curriculum Learning (CL) is a machine learning training strategy whose core idea is to have models learn from simple samples first and progressively transition to complex ones, mimicking the "step-by-step" process of human education[1]. In reinforcement learning (RL), CL helps agents achieve stable convergence on complex high-dimensional continuous control problems by designing a series of progressively more difficult task curricula.
Curriculum learning is particularly important for high-difficulty, long-horizon cislunar low-thrust trajectory optimization: training directly at the final difficulty level often fails to converge because of sparse rewards and chaotic dynamics, whereas curriculum learning lowers the initial task difficulty so the agent can gradually build an understanding of the problem.
Application in A2PPO
Ul Haq et al. (2026) applied curriculum learning to the A2PPO framework for cislunar low-thrust trajectory optimization[2], implementing curriculum design through progressive tightening of success thresholds:
Curriculum Structure
Define the curriculum $\mathcal{C} = \{(T_k, (\epsilon_{r,k}, \epsilon_{v,k}))\}_{k=1}^{K}$, where:
- $T_1 = 0 < T_2 < \dots < T_K$: incremental global training step thresholds at which each stage begins
- $(\epsilon_{r,k}, \epsilon_{v,k})$: corresponding terminal position and velocity tolerances
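This structure can be represented as an ordered list of stages. A minimal Python sketch, with placeholder numbers that are illustrative only and not the thresholds used in the paper:

```python
# Curriculum as a list of (step threshold T_k, (position tol, velocity tol))
# pairs, sorted by threshold. All numeric values below are placeholders,
# not the actual tolerances from the paper.
CURRICULUM = [
    (0,         (1000.0, 0.10)),   # initial stage: relaxed tolerances from step 0
    (500_000,   (100.0,  0.01)),   # transition stage: tighter tolerances
    (2_000_000, (10.0,   0.001)),  # final stage: precise orbit insertion
]

# Sanity checks: thresholds strictly increase, tolerances strictly shrink.
assert all(a[0] < b[0] for a, b in zip(CURRICULUM, CURRICULUM[1:]))
assert all(a[1][0] > b[1][0] and a[1][1] > b[1][1]
           for a, b in zip(CURRICULUM, CURRICULUM[1:]))
```

Keeping the first threshold at 0 guarantees that some stage is always active.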
Threshold Progression
| Phase | Global Steps | Position Tolerance | Velocity Tolerance |
|---|---|---|---|
| Initial | $0 \le t < T_2$ | $\epsilon_{r,1}$ | $\epsilon_{v,1}$ |
| Transition | $T_2 \le t < T_3$ | $\epsilon_{r,2}$ | $\epsilon_{v,2}$ |
| Final | $t \ge T_3$ | $\epsilon_{r,3}$ | $\epsilon_{v,3}$ |
The agent first learns to reach the vicinity of the target orbit under relaxed tolerances, then progressively transitions to precise orbit insertion.
Curriculum Scheduling
At each environment step, the current curriculum stage $k$ is determined by the global training step count $t$:

$$k(t) = \max\{\, k \in \{1, \dots, K\} : t \ge T_k \,\}$$

The environment's success thresholds are then set to the corresponding $(\epsilon_{r,k}, \epsilon_{v,k})$.
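This lookup can be sketched in a few lines of Python, assuming the curriculum is stored as step-threshold/tolerance pairs sorted by threshold (function and variable names are illustrative, not from the paper):

```python
def current_tolerances(global_step, curriculum):
    """Return the (position, velocity) tolerance pair of the latest stage
    whose step threshold has been reached, i.e. the largest k with t >= T_k.

    `curriculum` is a list of (T_k, (pos_tol, vel_tol)) pairs sorted by T_k,
    with the first threshold equal to 0 so a stage is always active."""
    tolerances = curriculum[0][1]
    for threshold, stage_tolerances in curriculum:
        if global_step >= threshold:
            tolerances = stage_tolerances
    return tolerances

# Example with placeholder thresholds (not the paper's values):
stages = [(0, (1000.0, 0.10)), (500_000, (100.0, 0.01)), (2_000_000, (10.0, 0.001))]
```

For instance, `current_tolerances(600_000, stages)` falls in the transition stage and returns `(100.0, 0.01)`.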
Why Curriculum Learning Works
- Avoids sparse reward traps: In chaotic dynamics, the sparse reward for precise terminal arrival is nearly unobtainable in early exploration phases; relaxed thresholds allow the agent to frequently receive positive rewards
- Stabilizes gradient estimation: "Approximately correct" trajectories from early curriculum stages help the value function form accurate estimates, reducing the variance of policy updates
- Avoids local optima: Starting from simple tasks allows the agent to explore a larger state space, providing good initialization when thresholds are later tightened
- Curriculum transfer: Control policies learned on simple tasks typically have positive transfer effects on similar complex tasks
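The sparse-reward point can be made concrete with a hypothetical terminal-reward sketch (an assumption for illustration, not the paper's actual reward function): a fixed bonus is paid only when both terminal errors fall inside the current stage's tolerances, so relaxed early tolerances make the bonus reachable during exploration.

```python
def terminal_reward(pos_error, vel_error, pos_tol, vel_tol, bonus=100.0):
    """Hypothetical sparse terminal bonus: paid only if the final position
    and velocity errors both lie inside the current curriculum tolerances."""
    if pos_error <= pos_tol and vel_error <= vel_tol:
        return bonus
    return 0.0
```

Under relaxed tolerances an imprecise arrival still earns the bonus, while under the final tolerances the same trajectory earns nothing.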
Convergence Curve Characteristics
Curriculum learning training curves exhibit a characteristic "staircase" pattern: each time the thresholds are tightened, performance temporarily degrades (terminal error rises and reward drops) because the task suddenly becomes harder, after which the agent adapts and stabilizes again. This phenomenon is observed across all four A2PPO scenarios (S1-S4).
Related Concepts
- A2PPO (Attention-Augmented PPO): The framework applying curriculum learning
- [Low-Thrust Transfer MDP](/en/glossary/lt-transfer-mdp/): The RL problem formulation that curriculum learning serves
- Generalized Advantage Estimation (GAE): Advantage estimation method used with curriculum learning
References
- [1] Bengio Y, Louradour J, Collobert R, et al. Curriculum learning. Proceedings of the International Conference on Machine Learning (ICML), 2009.
- [2] Ul Haq I U, Dai H, Du C. Autonomous low-thrust trajectory optimization in cislunar space via attention-augmented reinforcement learning. Aerospace Science and Technology, 2026.
