Reinforcement Learning_Code_Value Function Approximation
The following results and code implement value function approximation, including Monte Carlo, Sarsa, and deep Q-learning, in Gymnasium's CartPole environment.
RESULTS:
Visualizations of (i) changes in scores, losses, and epsilons, and (ii) animation results.
1. Monte Carlo
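Monte Carlo control needs no bootstrapping: after each episode ends, Q(s, a) is regressed toward the full discounted return actually observed. Below is a minimal sketch of such an update, assuming a PyTorch Q-network; mc_update and its arguments are illustrative names rather than the exact API of MCAgent.py.

import torch
import torch.nn.functional as F

def mc_update(q_net, optimizer, episode, gamma=0.99):
    # episode is a list of (state, action, reward) tuples, with states
    # already stored as tensors; walk it backwards to accumulate G_t.
    returns, g = [], 0.0
    for _, _, reward in reversed(episode):
        g = reward + gamma * g
        returns.append(g)
    returns.reverse()

    states = torch.stack([s for s, _, _ in episode])
    actions = torch.tensor([a for _, a, _ in episode]).unsqueeze(1)
    targets = torch.tensor(returns).unsqueeze(1)

    # Regress Q(s_t, a_t) toward the observed return G_t.
    q_values = q_net(states).gather(1, actions)
    loss = F.mse_loss(q_values, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()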


2. Sarsa
Vanilla Sarsa, which is exactly what is used here, may need the same fix as Q-learning: a replay buffer.
In the original formulations of Sarsa and Q-learning, the Q-value is updated every time an action is taken, and this per-step update can make the algorithm extremely unstable when combined with function approximation.
So, to get better results, the Q-value should be updated from a batch of transitions collected over a number of steps, which means introducing experience replay.
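For concreteness, the per-step update criticized above looks roughly like this. It is a minimal sketch assuming a PyTorch Q-network; sarsa_update and its arguments are illustrative names rather than the exact API of SarsaAgent.py.

import torch
import torch.nn.functional as F

def sarsa_update(q_net, optimizer, state, action, reward,
                 next_state, next_action, done, gamma=0.99):
    # One-step semi-gradient Sarsa: move Q(s, a) toward r + gamma * Q(s', a').
    q_value = q_net(state)[action]
    with torch.no_grad():
        # Bootstrap on the action actually taken next (on-policy),
        # not on max_a Q(s', a) as Q-learning would.
        target = reward + gamma * q_net(next_state)[next_action] * (1 - done)
    loss = F.mse_loss(q_value, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Unlike Q-learning, which bootstraps on max_a Q(s', a), Sarsa bootstraps on the action the epsilon-greedy policy actually takes next, which is what makes it on-policy.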


3. Deep Q-learning
Here we use experience replay and fixed Q-targets.
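A single update combining both tricks could look like the following minimal sketch, assuming a PyTorch online network, a periodically synced target network, and batches drawn from the replay buffer; dqn_update and its arguments are illustrative names rather than the exact API of DQNAgent.py.

import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    # batch holds tensors sampled from the replay buffer:
    # states (B, obs_dim), actions (B, 1), rewards (B, 1),
    # next_states (B, obs_dim), dones (B, 1).
    states, actions, rewards, next_states, dones = batch

    # Current estimates Q(s, a) from the online network.
    q_values = q_net(states).gather(1, actions)

    # Fixed Q-targets: bootstrap from a frozen copy of the network
    # that is only synced with q_net every N steps.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1, keepdim=True)[0]
        targets = rewards + gamma * next_q * (1 - dones)

    loss = F.smooth_l1_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Because the target network is frozen between syncs, the regression target no longer moves with every online update, which is exactly the instability that fixed Q-targets are meant to remove.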


CODE:
NetWork.py
MCAgent.py
SarsaAgent.py
ReplayBuffer.py
DQNAgent.py
train_and_test.py
The above code is mainly based on rainbow-is-all-you-need [1], extended with solutions for Monte Carlo and Sarsa.
Reference
[1] https://github.com/Curt-Park/rainbow-is-all-you-need