University of Surrey


Indoor scene understanding from visual analysis of human activity

Fowler, Sam (2020) Indoor scene understanding from visual analysis of human activity. Doctoral thesis, University of Surrey.

samfowler_phd_thesis_final_201219.pdf - Version of Record
Available under License Creative Commons Attribution Non-commercial Share Alike.

Download (97MB)


Visual scene understanding studies the task of representing a captured scene in a manner that emulates human-like understanding of that space. Since indoor scenes are designed for human use and are used every day, attaining this understanding is crucial for applications such as robotic mapping and navigation, smart home and security systems, and home healthcare and assisted living. However, although we as humans use such spaces in our day-to-day lives, analysis of human activity is not commonly applied to enhance indoor scene-level understanding. The work presented in this thesis therefore investigates the benefits of including human activity information in indoor scene understanding challenges, aiming to demonstrate its potential contributions, applications, and versatility.

The first contribution of this thesis utilises human activity to reveal scene regions occluded behind objects and clutter. Human poses recognised from a static sensor are projected into a top-down scene representation that records belief of human activity over time. This representation is applied to carve a volumetric scene map, initialised from captured depth, to expose the occupancy of hidden scene regions. An object detection approach exploits the revealed occluded scene occupancy to localise self-, partially-, and, significantly, fully-occluded objects.

The second contribution extends the top-down activity representation to predict the functionality of major scene surfaces from human activity recognised in 360-degree video. A convolutional network is trained on simulated human activity to segment walkable, sittable, and interactable surfaces from the top-down perspective. This prediction is applied to construct a complete 3D approximation of the scene, with results showing that scene structure and surface functionality are predicted well from human activity alone.

Finally, this thesis investigates an association between the top-down functionality prediction and the captured visual scene.
A new dataset capturing long-term human activity is introduced to train a model on combined activity and visual scene information. The model is trained to segment functional scene surfaces from the perspective of the capture sensor, with evaluation establishing that the introduction of human activity information can improve functional surface segmentation performance. Overall, the work presented in this thesis demonstrates that analysis of human activity can enhance indoor scene understanding across various challenges, sensors, and representations. Assorted datasets are introduced alongside the major contributions to motivate further investigation into the application of human activity analysis.
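The first contribution's pipeline (projecting recognised poses into a top-down belief grid, then using that belief to carve unknown voxels into free space) can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the thesis implementation: the grid is axis-aligned, carving clears whole vertical columns regardless of human height, and all function names, labels, and parameters are hypothetical.

```python
import numpy as np

UNKNOWN, FREE = 0, 1  # hypothetical voxel labels; a real map would also mark occupied voxels

def update_topdown_belief(belief, joints, extent, decay=0.99):
    """Accumulate human-activity evidence in a top-down belief grid.

    belief : (rows, cols) grid over the ground plane
    joints : (J, 3) recognised pose joints as (x, y, z) in metres, y up
    extent : (xmin, xmax, zmin, zmax) ground-plane bounds of the grid
    """
    rows, cols = belief.shape
    xmin, xmax, zmin, zmax = extent
    belief *= decay  # older observations fade over time
    # Project each joint onto the ground plane by dropping the height axis.
    u = ((joints[:, 0] - xmin) / (xmax - xmin) * (cols - 1)).astype(int)
    v = ((joints[:, 2] - zmin) / (zmax - zmin) * (rows - 1)).astype(int)
    inside = (u >= 0) & (u < cols) & (v >= 0) & (v < rows)
    belief[v[inside], u[inside]] += 1.0
    return belief

def carve_with_activity(voxels, belief, threshold=0.5):
    """Carve a voxel map of shape (rows, heights, cols) using the belief grid.

    Grid cells with enough activity evidence must be traversable, so
    unknown voxels in those vertical columns are relabelled as free space,
    revealing occupancy hidden behind objects from the sensor's viewpoint.
    """
    for r, c in zip(*np.nonzero(belief > threshold)):
        column = voxels[r, :, c]          # a view into the voxel map
        column[column == UNKNOWN] = FREE  # simplification: carve the full column
    return voxels
```

For example, a single observed joint at (1.0, 1.6, 1.0) m in a 2 m x 2 m scene lands in the centre of a 4 x 4 belief grid, and carving then clears the corresponding voxel column while leaving unvisited columns unknown.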

Item Type: Thesis (Doctoral)
Divisions: Theses
Authors: Fowler, Sam
Date: 31 January 2020
Funders: EPSRC, BBC Audio Research Partnership
DOI: 10.15126/thesis.00853281
Grant Title: S3A: Future Spatial Audio for an Immersive Listener Experience at Home
Projects: S3A: Future Spatial Audio for an Immersive Listener Experience at Home
Contributors:
Uncontrolled Keywords: indoor scene understanding, human activity, visual analysis, computer vision
Related URLs:
Depositing User: Sam Fowler
Date Deposited: 07 Feb 2020 13:08
Last Modified: 07 Feb 2020 13:09





© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800