Curriculum Learning

Definition

Curriculum Learning (CL) is a machine learning training strategy whose core idea is to have models learn from simple samples first and progressively move to complex ones, mimicking the step-by-step progression of human education. In reinforcement learning (RL), CL helps agents converge stably on complex, high-dimensional continuous control problems by presenting a series of progressively more difficult task curricula.

Curriculum learning is particularly important for difficult, long-horizon cislunar low-thrust trajectory optimization: training directly at the final difficulty level often fails to converge because of sparse rewards and chaotic dynamics, whereas curriculum learning lowers the initial task difficulty so the agent can gradually build an understanding of the problem.

Application in A2PPO

Ul Haq et al. (2026) applied curriculum learning within the A2PPO framework for cislunar low-thrust trajectory optimization, implementing the curriculum by progressively tightening the success thresholds:

Curriculum Structure

Define the curriculum $C = \{(N_i, \Delta d_i, \Delta v_i)\}$, where:

  • $N_i$: incremental global training-step thresholds
  • $\Delta d_i$, $\Delta v_i$: corresponding terminal position and velocity tolerances

Threshold Progression

| Phase | Global Steps $N_i$ | Position Tolerance $\Delta d$ | Velocity Tolerance $\Delta v$ |
| --- | --- | --- | --- |
| Initial | $0$ | $5 \times 10^{-3}$ | $5 \times 10^{-3}$ |
| Transition | $N_1$ | $2 \times 10^{-3}$ | $2 \times 10^{-3}$ |
| Final | $N_2$ | $1 \times 10^{-3}$ | $1 \times 10^{-3}$ |

The agent first learns to reach the vicinity of the target orbit under relaxed tolerances, then progressively transitions to precise orbit insertion.
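To make the structure concrete, here is a minimal Python sketch of the curriculum as an ordered list of $(N_i, \Delta d_i, \Delta v_i)$ triples. The tolerance values come from the table above; the global-step thresholds standing in for $N_1$ and $N_2$ are illustrative placeholders, not values reported in the paper.

```python
# Hypothetical sketch of the three-stage curriculum C = {(N_i, Δd_i, Δv_i)}.
# Tolerances follow the table above; the step thresholds for N_1 and N_2
# are placeholder numbers chosen only for illustration.
CURRICULUM = [
    # (global-step threshold N_i, position tolerance Δd_i, velocity tolerance Δv_i)
    (0,         5e-3, 5e-3),   # Initial: relaxed terminal tolerances
    (1_000_000, 2e-3, 2e-3),   # Transition (N_1): tighter tolerances
    (3_000_000, 1e-3, 1e-3),   # Final (N_2): precise orbit insertion
]
```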

Curriculum Scheduling

At each environment step, the current curriculum stage is determined by the global training step count $G$:

$$c = \max\bigl(\{\, j : G \geq N_j \,\} \cup \{1\}\bigr)$$

The environment's success thresholds are then set to the corresponding $(\Delta d_c, \Delta v_c)$.
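A minimal sketch of this scheduling rule, assuming the CURRICULUM list from the previous snippet: the active stage is the latest one whose step threshold $N_j$ has been reached (defaulting to the first stage), and the environment's success thresholds are set to that stage's tolerances.

```python
def current_tolerances(global_step: int, curriculum=CURRICULUM):
    """Return (Δd_c, Δv_c) for the curriculum stage active at `global_step`.

    Implements c = max({j : G >= N_j} ∪ {1}): the latest stage whose
    step threshold has been passed, defaulting to the first stage.
    """
    stage = 0  # 0-based index corresponding to j = 1
    for j, (n_j, _, _) in enumerate(curriculum):
        if global_step >= n_j:
            stage = j
    _, delta_d, delta_v = curriculum[stage]
    return delta_d, delta_v


# Example: at 1.5 million global steps the Transition-stage tolerances apply.
print(current_tolerances(1_500_000))  # -> (0.002, 0.002)
```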

Why Curriculum Learning Works

  1. Avoids sparse-reward traps: in chaotic dynamics, the sparse reward for a precise terminal arrival is almost never obtained during early exploration; relaxed thresholds let the agent receive positive rewards frequently
  2. Stabilizes gradient estimation: the "approximately correct" trajectories from early curriculum stages help the value function learn accurate estimates, reducing the high variance of policy updates
  3. Avoids local optima: starting from simple tasks lets the agent explore a larger portion of the state space, providing a good initialization when the thresholds are later tightened
  4. Curriculum transfer: control policies learned on simple tasks typically transfer positively to similar, more complex tasks

Convergence Curve Characteristics

Curriculum learning training curves exhibit a characteristic "staircase" pattern: each time the thresholds are tightened, the reward and terminal accuracy temporarily worsen (because the task suddenly becomes harder), after which the agent adapts and recovers a stable level. This pattern is observed across all four A2PPO scenarios (S1-S4).

Related Concepts

  • A2PPO (Attention-Augmented PPO): The framework applying curriculum learning
  • Low-Thrust Transfer MDP: The RL problem formulation that curriculum learning serves
  • Generalized Advantage Estimation (GAE): Advantage estimation method used with curriculum learning

References

  • Bengio Y, Louradour J, Collobert R, et al. Curriculum learning[C]. International Conference on Machine Learning, 2009.
  • Ul Haq I U, Dai H, Du C. Autonomous low-thrust trajectory optimization in cislunar space via attention-augmented reinforcement learning[J]. Aerospace Science and Technology, 2026.