dc.description.abstract | Despite the advancements of self-driving cars, autonomous on-ramp merging on highways still
poses difficulties. To address this merging problem, a simulation was set up in the Unity game
engine and an agent was trained using two state-of-the-art reinforcement learning algorithms,
Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), via the Unity Machine
Learning Agents Toolkit (ML-Agents). The two algorithms are compared to each other with respect to
training speed, performance, stability, and success rate. The robustness of the algorithms was
tested by having the traffic (1) vary in speed, (2) vary in starting positions, and (3) switch lanes.
The agent achieved similar performance, with a success rate of 95%, when employing either PPO
or SAC. Each algorithm showed its own advantages and disadvantages: PPO delivered more stable
performance and less variability in mean reward, while SAC was more sample-efficient.
The results show that reinforcement learning is an avenue worth pursuing to reach fully
autonomous driving. The results could still be improved through hyperparameter
tuning, a more complex neural network architecture, and a more realistic simulation, further
demonstrating the potential of reinforcement learning. | |