Direct Collocation
Definition
Direct collocation is a class of methods that transcribe an Optimal Control Problem (OCP) directly into a Nonlinear Programming Problem (NLP) for numerical solution[1]. Unlike indirect methods, which require analytical derivation of the costate equations and first-order optimality conditions, direct collocation discretizes the state and control variables simultaneously at collocation points, turning the infinite-dimensional continuous OCP into a finite-dimensional NLP. It is currently one of the most widely used numerical methods in spacecraft trajectory optimization.
Basic Principles
Discretization Strategy
In direct collocation, the transfer interval is divided into sub-intervals, and the resulting NLP must satisfy the following constraints simultaneously:
- State dynamics constraints: enforced at the collocation points of each sub-interval
- Boundary conditions: initial state and terminal constraints
- Path constraints: control constraints, obstacle avoidance constraints, etc.
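In symbols, the transcription can be summarized as follows. The notation is an assumption introduced here for illustration (state $x$, control $u$, dynamics $f$, running cost $L$, terminal cost $\phi$, path constraint $g$, boundary constraint $\psi$, and $N$ sub-intervals with nodes $t_0 < t_1 < \dots < t_N = t_f$); the source does not fix one.

$$
\begin{aligned}
\text{OCP:}\quad & \min_{x(\cdot),\,u(\cdot)} \; \phi\big(x(t_f)\big) + \int_{t_0}^{t_f} L\big(x(t),u(t),t\big)\,dt \\
& \text{s.t.}\quad \dot{x}(t) = f\big(x(t),u(t),t\big),\quad g\big(x(t),u(t),t\big) \le 0,\quad \psi\big(x(t_0),x(t_f)\big) = 0 \\[4pt]
\text{NLP:}\quad & \min_{z} \; F(z), \qquad z = (x_0,u_0,\dots,x_N,u_N) \\
& \text{s.t.}\quad \zeta_k(x_k,u_k,x_{k+1},u_{k+1}) = 0,\;\; k = 0,\dots,N-1, \\
& \phantom{\text{s.t.}\quad} g(x_k,u_k,t_k) \le 0,\;\; k = 0,\dots,N,\qquad \psi(x_0,x_N) = 0
\end{aligned}
$$

Here $\zeta_k$ denotes the dynamics defect on sub-interval $k$ and $F$ is a quadrature approximation of the continuous cost; their specific forms depend on the collocation scheme chosen.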
Hermite-Simpson Collocation
The direct collocation implementation used in the A2PPO research employs the Hermite-Simpson collocation scheme[2]:
- On each sub-interval, the state is interpolated using cubic Hermite polynomials
- Dynamics defect constraints are enforced at the interval midpoint
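- The state interpolant is a third-degree (cubic) polynomial on each sub-interval, and the defect is evaluated with Simpson quadrature, giving a local error of O(h^5) per sub-interval of length h (fourth-order accuracy overall)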
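The defect constraint described above can be sketched in a few lines. This is a minimal illustration of the compressed Hermite-Simpson form, not the implementation used in the A2PPO study; the function name, the argument layout, and the choice of averaging the endpoint controls at the midpoint are assumptions made for the example.

```python
from typing import Callable, List, Sequence

# Dynamics signature assumed for this sketch: f(t, x, u) -> dx/dt
Dynamics = Callable[[float, Sequence[float], Sequence[float]], List[float]]


def hermite_simpson_defect(
    f: Dynamics,
    t_k: float,
    h: float,
    x_k: Sequence[float],
    x_k1: Sequence[float],
    u_k: Sequence[float],
    u_k1: Sequence[float],
) -> List[float]:
    """Return the Hermite-Simpson defect on one sub-interval [t_k, t_k + h].

    The NLP drives every component of the returned vector to zero.
    """
    f_k = f(t_k, x_k, u_k)
    f_k1 = f(t_k + h, x_k1, u_k1)

    # Midpoint state of the cubic Hermite interpolant through the two nodes
    x_c = [
        0.5 * (a + b) + (h / 8.0) * (fa - fb)
        for a, b, fa, fb in zip(x_k, x_k1, f_k, f_k1)
    ]
    # Assumption: midpoint control is the average of the endpoint controls
    u_c = [0.5 * (a + b) for a, b in zip(u_k, u_k1)]
    f_c = f(t_k + 0.5 * h, x_c, u_c)

    # Simpson quadrature of the dynamics must reproduce the state increment
    return [
        b - a - (h / 6.0) * (fa + 4.0 * fc + fb)
        for a, b, fa, fc, fb in zip(x_k, x_k1, f_k, f_c, f_k1)
    ]


def double_integrator(t: float, x: Sequence[float], u: Sequence[float]) -> List[float]:
    """Toy dynamics for the usage example: position' = velocity, velocity' = u."""
    return [x[1], u[0]]


if __name__ == "__main__":
    # Constant acceleration 2 from rest: position = t^2, velocity = 2 t
    h = 0.5
    defect = hermite_simpson_defect(
        double_integrator, 0.0, h, [0.0, 0.0], [h * h, 2.0 * h], [2.0], [2.0]
    )
    print(defect)  # both components vanish up to rounding
```

For the toy double integrator the exact trajectory is a polynomial of degree two, so the defect vanishes up to rounding; for general nonlinear dynamics it is only zero at the NLP solution.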
Comparison with A2PPO
Ul Haq et al. (2026) used trajectories generated by the A2PPO policy as initial guesses for direct collocation, verifying consistency between the two across four scenarios[2]:
| Scenario | A2PPO ToF (days) | Direct Collocation ToF (days) | A2PPO Fuel (kg) | Direct Collocation Fuel (kg) |
|---|---|---|---|---|
| S1 | 4.95 | 4.99 | 2.08 | 1.28 |
| S2 | 8.38 | 7.26 | 5.00 | 5.29 |
| S3 | 7.60 | 7.63 | 5.10 | 5.11 |
| S4 | 33.6 | 33.12 | 0.97 | 0.97 |
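The gaps in the table can be expressed as relative differences with respect to the direct collocation values. The short script below only restates the tabulated numbers; the dictionary layout and the function name are illustrative choices, not part of the cited study.

```python
from typing import Dict, Tuple

# (A2PPO ToF [days], direct collocation ToF [days],
#  A2PPO fuel [kg],  direct collocation fuel [kg]), copied from the table above
RESULTS: Dict[str, Tuple[float, float, float, float]] = {
    "S1": (4.95, 4.99, 2.08, 1.28),
    "S2": (8.38, 7.26, 5.00, 5.29),
    "S3": (7.60, 7.63, 5.10, 5.11),
    "S4": (33.6, 33.12, 0.97, 0.97),
}


def relative_gaps(
    results: Dict[str, Tuple[float, float, float, float]],
) -> Dict[str, Dict[str, float]]:
    """Percentage difference of A2PPO relative to direct collocation."""
    gaps = {}
    for name, (tof_rl, tof_dc, fuel_rl, fuel_dc) in results.items():
        gaps[name] = {
            "tof_pct": 100.0 * (tof_rl - tof_dc) / tof_dc,
            "fuel_pct": 100.0 * (fuel_rl - fuel_dc) / fuel_dc,
        }
    return gaps


if __name__ == "__main__":
    for name, gap in relative_gaps(RESULTS).items():
        print(f"{name}: ToF {gap['tof_pct']:+.1f} %, fuel {gap['fuel_pct']:+.1f} %")
```

A positive value means A2PPO needs more time or fuel than direct collocation in that scenario, a negative value means it needs less.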
Direct collocation typically achieves better fuel efficiency because the NLP solver converges to a locally optimal solution of the discretized problem, but it has practical drawbacks:
- Requires good initial guesses (A2PPO provides high-quality initial solutions)
- High computational cost: the NLP must be re-solved for every new transfer
- Generally unsuitable for online, real-time computation
A2PPO, after training, can perform real-time inference, providing near-instantaneous trajectory solutions.
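Using a policy rollout as an initial guess usually requires mapping its samples onto the collocation grid. The sketch below does this by linear interpolation onto a uniform grid; the function name, the uniform spacing, and the assumption that the rollout is stored as strictly increasing time stamps with one state (or control) vector per stamp are illustrative and not taken from the cited study.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def resample_rollout(
    times: Sequence[float],
    samples: Sequence[Sequence[float]],
    n_nodes: int,
) -> Tuple[List[float], List[List[float]]]:
    """Linearly interpolate rollout samples onto n_nodes uniform grid points.

    times   : strictly increasing time stamps of the rollout
    samples : one state (or control) vector per time stamp
    n_nodes : number of collocation nodes, at least 2
    """
    t0, tf = times[0], times[-1]
    grid = [t0 + (tf - t0) * i / (n_nodes - 1) for i in range(n_nodes)]

    guess = []
    for t in grid:
        # Index of the rollout segment containing t, clamped to a valid range
        j = min(max(bisect_right(times, t) - 1, 0), len(times) - 2)
        w = (t - times[j]) / (times[j + 1] - times[j])
        guess.append(
            [(1.0 - w) * a + w * b for a, b in zip(samples[j], samples[j + 1])]
        )
    return grid, guess


if __name__ == "__main__":
    # Hypothetical one-dimensional rollout sampled at three instants
    grid, guess = resample_rollout([0.0, 1.0, 2.0], [[0.0], [10.0], [20.0]], 5)
    for t, x in zip(grid, guess):
        print(t, x)
```

The interpolated states and controls would then be stacked into the NLP decision vector before the solver is started; higher-order interpolation may be preferable when the rollout is coarse.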
Direct vs. Indirect Methods
| Property | Direct Collocation | Indirect Methods |
|---|---|---|
| Derivation difficulty | Lower (no analytical costate equations) | Higher (requires PMP derivation) |
| Sensitivity to initial guess | Lower (more robust) | High (sensitive to costate initial values) |
| Solution accuracy | High (limited by the discretization) | Very high (satisfies first-order optimality conditions) |
| Convergence | Larger convergence region | Depends on initial guess quality |
| Computational efficiency | Moderate (NLP solvers like Ipopt) | Higher (but narrow convergence basin) |
NLP Solvers
NLPs resulting from direct collocation discretization are typically solved with Sequential Quadratic Programming (SQP) or interior point solvers, often called through a modeling framework:
- Ipopt: Large-scale nonlinear programming solver based on interior point methods
- SNOPT: Sequential Quadratic Programming solver
- CasADi: Symbolic computation framework for constructing NLPs and calling the above solvers
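To make the structure of such an NLP concrete without depending on any of the solvers above, the following sketch transcribes a toy problem with Hermite-Simpson collocation and solves it in pure Python. The problem is an assumption chosen for illustration (a double integrator moved from rest at position 0 to rest at position 1 in unit time while minimizing the integral of u^2). Because the dynamics are linear and the cost quadratic, the NLP reduces to an equality-constrained quadratic program, the same kind of subproblem an SQP method solves at each iteration, so a single dense KKT solve is enough. A realistic trajectory problem would instead be handed to a solver such as Ipopt or SNOPT.

```python
from typing import List, Tuple


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= m * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def solve_double_integrator(
    N: int = 4, T: float = 1.0
) -> Tuple[List[float], List[float], List[float], float]:
    """Hermite-Simpson transcription of the toy problem, solved via its KKT system.

    Decision vector z = (p_0, v_0, u_0, ..., p_N, v_N, u_N).
    """
    h = T / N
    n = 3 * (N + 1)   # decision variables
    m = 2 * N + 4     # defect constraints plus four boundary conditions
    H = [[0.0] * n for _ in range(n)]   # cost = 0.5 * z^T H z
    A = [[0.0] * n for _ in range(m)]   # constraints A z = b
    b = [0.0] * m
    for k in range(N):
        p0, v0, u0 = 3 * k, 3 * k + 1, 3 * k + 2
        p1, v1, u1 = p0 + 3, v0 + 3, u0 + 3
        # Simpson quadrature of u^2 with midpoint control (u_k + u_{k+1}) / 2
        H[u0][u0] += 2 * h / 3
        H[u1][u1] += 2 * h / 3
        H[u0][u1] += h / 3
        H[u1][u0] += h / 3
        # Position defect: p1 - p0 - h/2 (v0 + v1) - h^2/12 (u0 - u1) = 0
        r = 2 * k
        A[r][p1], A[r][p0] = 1.0, -1.0
        A[r][v0] = A[r][v1] = -h / 2
        A[r][u0], A[r][u1] = -h * h / 12, h * h / 12
        # Velocity defect: v1 - v0 - h/2 (u0 + u1) = 0
        r = 2 * k + 1
        A[r][v1], A[r][v0] = 1.0, -1.0
        A[r][u0] = A[r][u1] = -h / 2
    # Boundary conditions: p_0 = 0, v_0 = 0, p_N = 1, v_N = 0
    A[2 * N][0] = 1.0
    A[2 * N + 1][1] = 1.0
    A[2 * N + 2][3 * N] = 1.0
    b[2 * N + 2] = 1.0
    A[2 * N + 3][3 * N + 1] = 1.0
    # KKT system: [H A^T; A 0] [z; lambda] = [0; b]
    K = [H[i] + [A[j][i] for j in range(m)] for i in range(n)]
    K += [A[j] + [0.0] * m for j in range(m)]
    z = solve_linear(K, [0.0] * n + b)[:n]
    pos, vel, u = z[0::3], z[1::3], z[2::3]
    cost = sum(h / 3 * (u[k] ** 2 + u[k] * u[k + 1] + u[k + 1] ** 2) for k in range(N))
    return pos, vel, u, cost


if __name__ == "__main__":
    pos, vel, u, cost = solve_double_integrator()
    print("controls at nodes:", [round(val, 6) for val in u])
    print("cost:", round(cost, 6))
```

For this toy problem the continuous optimum is known analytically (u(t) = 6 - 12 t with cost 12), which makes the sketch easy to check. In a nonlinear setting the same defect and boundary rows would become nonlinear constraint functions, and their Jacobians would be supplied to the NLP solver, for example through a framework such as CasADi.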
References
- [1] Betts J T. Survey of numerical methods for trajectory optimization[J]. Journal of Guidance, Control, and Dynamics, 1998.
- [2] Ul Haq I U, Dai H, Du C. Autonomous low-thrust trajectory optimization in cislunar space via attention-augmented reinforcement learning[J]. Aerospace Science and Technology, 2026.
- [3] Hargraves C R, Paris S W. Direct trajectory optimization using nonlinear programming and collocation[J]. Journal of Guidance, Control, and Dynamics, 1987.
