University of Surrey


Enabling Serverless Deployment of Large-Scale AI Workloads

Christidis, Angelos, Moschoyiannis, Sotiris, Hsu, Ching-Hsien and Davies, Roy (2020) Enabling Serverless Deployment of Large-Scale AI Workloads. IEEE Access, 8. pp. 70150-70161.

serverless_access_20200218.pdf - Accepted version (Manuscript). Restricted to Repository staff only. Download (1MB)
servlerlessAI_ieee_access_2020.pdf - Version of Record. Available under License Creative Commons Attribution. Download (9MB)


We propose a set of optimization techniques for transforming a generic AI codebase so that it can be successfully deployed to a restricted serverless environment, without compromising capability or performance. These involve (1) slimming the libraries and frameworks used (e.g., PyTorch) down to the components the solution actually requires; (2) dynamically loading pre-trained AI/ML models into local temporary storage during serverless function invocation; (3) using separate frameworks for training and inference, with ONNX model formatting; and (4) performance-oriented tuning of data storage and lookup. The techniques are illustrated via worked examples that have been deployed live on geospatial data from the transportation domain. This draws upon a real-world case study in intelligent transportation concerning on-demand, real-time predictions of train movement flows across the UK rail network. Evaluation of the proposed techniques shows that the response time, for varying volumes of queries involving prediction, remains almost constant (at 50 ms) even as the database scales up to 250M entries. Query response time is important in this context because the target is predicting train delays. It is even more important in a serverless environment due to the stringent constraints on a serverless function's runtime before timeout. The similarities between a serverless environment and other resource-constrained environments (e.g., IoT, telecoms) mean the techniques can be applied to a range of use cases.
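Technique (2) above, loading a pre-trained model into local temporary storage at invocation time, can be sketched as follows. This is an illustrative Python pattern, not the authors' code: the `ensure_model` helper, file names, and the stand-in downloader are assumptions. In a real AWS Lambda function, `download_fn` would wrap something like `boto3`'s S3 `download_file`, and the cached file in `/tmp` would then be opened with an inference runtime such as ONNX Runtime.

```python
import os
import tempfile

def ensure_model(download_fn, path):
    """Fetch the model file only if it is not already in local temporary
    storage: the first (cold) invocation pays the download cost, while
    subsequent warm invocations of the same function instance reuse the
    cached file on disk."""
    if not os.path.exists(path):
        download_fn(path)  # e.g., an S3 download in a real serverless function
    return path

# --- illustrative usage with a stand-in downloader -------------------------
calls = []

def fake_download(path):
    calls.append(path)              # record each simulated remote fetch
    with open(path, "wb") as f:
        f.write(b"model-bytes")     # placeholder for real model weights

model_path = os.path.join(tempfile.gettempdir(), "demo_model.onnx")
if os.path.exists(model_path):
    os.remove(model_path)           # start from a "cold" state

ensure_model(fake_download, model_path)   # cold start: triggers the download
ensure_model(fake_download, model_path)   # warm start: file already cached
print(len(calls))                         # only the cold start downloaded
```

The key design point is that serverless platforms reuse a function instance's local storage across warm invocations, so the expensive model fetch amortizes to near zero for high-traffic workloads while the deployment package itself stays under the platform's size limits.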

Item Type: Article
Divisions : Faculty of Engineering and Physical Sciences > Computer Science
Authors :
Christidis, Angelos
Moschoyiannis, Sotiris
Hsu, Ching-Hsien
Davies, Roy
Date : 2 April 2020
Funders : EPSRC - Engineering and Physical Sciences Research Council
DOI : 10.1109/ACCESS.2020.2985282
Copyright Disclaimer : Copyright 2020 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License.
Uncontrolled Keywords : Intelligent transportation; Predicting train delays; AWS; Functions-as-a-service; Lambda; NoSQL; Serverless; Libraries; Load modeling; Computer architecture; Real-time systems; Predictive models; Optimization; Resource-constrained; Serverless codebase optimisation; Rail traffic big data
Additional Information : This work was supported in part by the EIT Digital IVZW through the Real-Time Flow Project under Grant 18387–SGA201, in part by the EPSRC IAA Project AGELink under Grant EP/R511791/1, and in part by the National Natural Science Foundation of China under Grant 61872084
Depositing User : James Marshall
Date Deposited : 26 Mar 2020 09:35
Last Modified : 24 Apr 2020 18:07





© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800