Deep Reinforcement Learning
Author: CislunarSpace
Site: https://cislunarspace.cn
Definition
Deep Reinforcement Learning (DRL) combines the perception capabilities of deep learning with the decision-making framework of reinforcement learning: an agent learns a policy by interacting with an environment and maximizing cumulative reward. For stratospheric airship control, DRL can handle high-dimensional state spaces and complex nonlinear dynamics.
Basic Framework
Markov Decision Process (MDP)
RL problems are modeled as a Markov Decision Process (MDP) with the following elements:
| Element | Description |
|---|---|
| $S$ | State space (position, velocity, altitude, etc.) |
| $A$ | Action space (thrust direction, magnitude, etc.) |
| $P(s' \mid s, a)$ | State transition probability |
| $R$ | Reward function |
| $\gamma$ | Discount factor |
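The reward function and discount factor listed above combine into the quantity an RL agent tries to maximize: the discounted return of an episode. The following is a minimal sketch of that computation, not part of the original article. The function name `discounted_return` and the sample rewards are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return G_0 = sum over t of gamma**t * r_t for one finite episode."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Accumulate backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical per-step rewards for a short station-keeping episode.
episode_rewards = [1.0, 0.5, -0.2, 1.0]
print(discounted_return(episode_rewards, gamma=0.9))
```

A discount factor close to 1 makes the agent weigh long-term rewards almost as heavily as immediate ones, while a value near 0 makes it short-sighted.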
Mainstream Algorithms
Policy Gradient Methods
| Algorithm | Characteristic | Application |
|---|---|---|
| REINFORCE | Monte Carlo estimation | Discrete actions |
| PPO | Clipped surrogate objective (approximate trust region) | Discrete or continuous actions |
| SAC | Maximum entropy | Continuous actions |
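To make the PPO entry in the table concrete, the sketch below evaluates the clipped surrogate objective that PPO uses to keep each policy update close to the previous policy. It is an illustrative, standard-library-only version. The function name `ppo_clip_objective`, its argument names, and the default `clip_eps=0.2` are assumptions made here, not taken from the article or from any particular library.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean clipped surrogate objective over a non-empty batch (to be maximized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), from log-probabilities.
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any gain from pushing the ratio
        # outside [1 - clip_eps, 1 + clip_eps].
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Hypothetical batch of three samples.
objective = ppo_clip_objective(
    logp_new=[-0.9, -1.2, -0.3],
    logp_old=[-1.0, -1.0, -1.0],
    advantages=[0.5, -0.4, 1.0],
)
print(objective)
```

In a real training loop the log-probabilities would come from a neural-network policy and the objective would be maximized by gradient ascent. Here the values are plain numbers so the clipping behavior can be inspected directly.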
Value Function Methods
| Algorithm | Characteristic | Application |
|---|---|---|
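| DQN | Experience replay, target network | Discrete, low-dimensional action spaces |
| TD3 | Twin critics (clipped double Q-learning), delayed policy updates | Continuous actions |
| DDPG | Deterministic actor-critic with a learned Q-function | Continuous actions |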
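The algorithms in this table all regress a Q-function toward a bootstrapped temporal-difference target. The sketch below shows how that target is typically formed for DQN (maximum over the target network's action values) and for TD3 (minimum over its two target critics). The function names and arguments are hypothetical. For TD3, the two critic values are assumed to have already been evaluated at the target policy's smoothed action.

```python
from typing import Sequence


def dqn_target(
    reward: float, next_q_values: Sequence[float], gamma: float, done: bool
) -> float:
    """DQN target: r + gamma * max over a' of Q_target(s', a')."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def td3_target(
    reward: float, next_q1: float, next_q2: float, gamma: float, done: bool
) -> float:
    """TD3 target: bootstrap from the smaller of the two target critics."""
    if done:
        return reward
    # Using the minimum counteracts Q-value overestimation.
    return reward + gamma * min(next_q1, next_q2)


# Hypothetical values for a single transition.
print(dqn_target(1.0, [0.5, 2.0, -1.0], gamma=0.9, done=False))
print(td3_target(1.0, next_q1=3.0, next_q2=2.0, gamma=0.5, done=False))
```

Both functions handle one transition at a time for readability. Practical implementations compute the same expressions over mini-batches sampled from the replay buffer.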
Applications in Stratospheric Airships
State Space Design
| State Variable | Dimension | Description |
|---|---|---|
| Position | 3 | Geographic coordinates |
| Velocity | 3 | Ground velocity components |
| Wind estimation | 3 | Perceived wind disturbance |
| Altitude | 1 | Absolute altitude |
| Helium state | 2 | Volume, temperature |
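The state variables in the table above add up to a 12-dimensional observation (3 + 3 + 3 + 1 + 2). One possible way to assemble them into the flat vector a policy network would consume is sketched below. The class name `AirshipState`, the field names, the ordering, and the sample values and units are all assumptions for illustration. The article does not specify them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AirshipState:
    position: Tuple[float, float, float]       # geographic coordinates
    velocity: Tuple[float, float, float]       # ground velocity components
    wind_estimate: Tuple[float, float, float]  # perceived wind disturbance
    altitude: float                            # absolute altitude
    helium_volume: float                       # helium state: volume
    helium_temperature: float                  # helium state: temperature

    def to_vector(self) -> List[float]:
        """Flatten the state into a 12-element list in table order."""
        return [
            *self.position,
            *self.velocity,
            *self.wind_estimate,
            self.altitude,
            self.helium_volume,
            self.helium_temperature,
        ]


# Hypothetical sample values; units are assumed, not given in the article.
state = AirshipState(
    position=(30.0, 120.0, 20000.0),
    velocity=(5.0, -2.0, 0.1),
    wind_estimate=(8.0, 1.5, 0.0),
    altitude=20000.0,
    helium_volume=5000.0,
    helium_temperature=220.0,
)
print(len(state.to_vector()))
```

Because the components span very different numeric ranges, they would normally be rescaled to comparable magnitudes before being fed to a neural network.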
References
- Sutton R S, Barto A G. Reinforcement Learning: An Introduction[M]. MIT Press, 2018.
- Mnih V, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015.
