University of Surrey


End-to-end Reinforcement Learning for Autonomous Longitudinal Control Using Advantage Actor Critic with Temporal Context

Kuutti, Sampo, Bowden, Richard, Joshi, Harita, de Temple, Robert and Fallah, Saber (2019) End-to-end Reinforcement Learning for Autonomous Longitudinal Control Using Advantage Actor Critic with Temporal Context. In: IEEE Intelligent Transportation Systems Conference (ITSC 2019), 28-30 October 2019, Auckland, New Zealand.

Full text: Accepted version manuscript (PDF, 409 kB)

Abstract

Reinforcement learning has been widely used for autonomous longitudinal control. However, many existing approaches suffer from sample inefficiency during training and from jerky driving behaviour in the learned policies. In this paper, we propose a reinforcement learning algorithm and a training framework that address both of these shortcomings. The proposed system uses an Advantage Actor Critic (A2C) network with recurrent layers to introduce temporal context. This allows the learned system to evaluate continuous control actions based on previous states and actions as well as the current state. The slow training caused by sample inefficiency is addressed by using a second neural network to approximate the vehicle dynamics. Using this network as a proxy for the simulator benefits training in two ways. First, it reduces how often the reinforcement learning agent must query the simulator, which is a major bottleneck. Second, the reinforcement learning network and the proxy network can be deployed on the same GPU, so learning speed is considerably improved. Simulation results in IPG CarMaker show the effectiveness of the recurrent A2C algorithm compared with an A2C without recurrent layers.
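The abstract does not give the network architecture or the loss functions. As a rough illustration only, here is a minimal pure-Python sketch of the generic A2C ingredients it refers to:

- A recurrent cell that summarises a history of inputs into a hidden state, providing temporal context.
- A Gaussian log-probability for a continuous action, such as a longitudinal acceleration command.
- The n-step advantage A_t = R_t - V(s_t), which weights the actor loss.

The plain Elman cell, the Gaussian policy, the discount factor, and every function name here are assumptions chosen for illustration. This is not the authors' implementation; their recurrent layers may, for example, be LSTMs.

```python
import math
from typing import List, Sequence, Tuple


def rnn_encode(inputs: Sequence[Sequence[float]],
               w_x: List[List[float]],
               w_h: List[List[float]],
               b: List[float]) -> List[float]:
    """Elman recurrent cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b), with h_0 = 0.

    Returns the final hidden state, which summarises the whole input history.
    (Assumed cell type for illustration; the paper's recurrent layers may differ.)
    """
    h = [0.0] * len(b)
    for x in inputs:
        # The comprehension reads the previous h before h is rebound.
        h = [
            math.tanh(
                sum(w_x[i][j] * x[j] for j in range(len(x)))
                + sum(w_h[i][k] * h[k] for k in range(len(h)))
                + b[i]
            )
            for i in range(len(b))
        ]
    return h


def gaussian_log_prob(action: float, mean: float, std: float) -> float:
    """Log-density of a continuous action under an (assumed) Gaussian policy N(mean, std^2)."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def n_step_advantages(rewards: Sequence[float],
                      values: Sequence[float],
                      bootstrap_value: float,
                      gamma: float) -> List[float]:
    """A_t = R_t - V(s_t), where R_t = r_t + gamma * R_{t+1} and R_T = V(s_T)."""
    ret = bootstrap_value
    advantages = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ret
        advantages[t] = ret - values[t]
    return advantages


def a2c_losses(log_probs: Sequence[float],
               advantages: Sequence[float]) -> Tuple[float, float]:
    """Actor loss -mean(log pi(a_t|s_t) * A_t) and critic loss mean(A_t^2)."""
    n = len(advantages)
    actor = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    critic = sum(a * a for a in advantages) / n
    return actor, critic


if __name__ == "__main__":
    # Hypothetical 3-step rollout with made-up rewards and critic values.
    advs = n_step_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0, 0.5)
    print(advs)  # [1.75, 1.5, 1.0]
```

For example, `rnn_encode` could be fed a short window of hypothetical (gap, relative speed, previous action) vectors. The actor and critic heads would then see recent history rather than only the current state. In a real implementation, the advantages would be treated as constants (no gradient) in the actor loss, and gradients would come from an autodiff framework rather than plain floats.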

Item Type: Conference or Workshop Item (Conference Paper)
Divisions: Faculty of Engineering and Physical Sciences > Mechanical Engineering Sciences
Authors:
Name | Email
Kuutti, Sampo | s.j.kuutti@surrey.ac.uk
Bowden, Richard | R.Bowden@surrey.ac.uk
Joshi, Harita | -
de Temple, Robert | -
Fallah, Saber | s.fallah@surrey.ac.uk
Date: 30 June 2019
Funders: EPSRC - Engineering and Physical Sciences Research Council, Jaguar Land Rover
Copyright Disclaimer: Copyright 2019 The Authors
Depositing User: Diane Maxfield
Date Deposited: 07 Aug 2019 09:56
Last Modified: 28 Oct 2019 10:46
URI: http://epubs.surrey.ac.uk/id/eprint/852359


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800