University of Surrey


Appreciation of structured and unstructured content to aid decision making - from web scraping to ontologies and data dictionaries in healthcare.

Michalakidis, Georgios (2016) Appreciation of structured and unstructured content to aid decision making - from web scraping to ontologies and data dictionaries in healthcare. Doctoral thesis, University of Surrey.

Georgios_Michalakidis_PhD_Thesis_UniS_v3.pdf - Version of Record
Available under License Creative Commons Attribution Non-commercial Share Alike.



A systematic approach to the extraction of data from disparate data sources is proposed. The World Wide Web is a highly diverse dataset; identifying how this large database supports data quality verification, through concepts such as data lineage and provenance, allows the same approach to be followed to aid decision-making in sensitive domains such as healthcare. Through lessons learned from research in the UK and internationally, we conclude that emphasis on interoperable, model-based support for data syndication can enhance data quality, an issue that remains current (American Hospital Association, 2015), alongside data barriers in healthcare due to governance concerns. To improve on the above, we start by proposing a system for solution-orientated reporting of errors associated with the extraction of routinely collected clinical data. We then explore key concepts to assess the readiness of data for research and define an ontology-driven approach to creating data dictionaries for quality improvement in healthcare. Finally, we apply this research to enable consistent data recording across a health system, allowing service quality comparisons.
Work deriving from this research, built by the author and commissioned and aided by the UK NHS, the University of Surrey and Green Cross Medical, particularly in creating and testing software systems in real-world scenarios, has facilitated: quality improvement in healthcare data extraction from GP practices in the UK; a state-of-the-art system for Web-enabling Hospital Episode Statistics (HES) data for dermatology; and, finally, an online system designed to enable cancer Multi-Disciplinary Teams (MDTs) to self-assess and receive feedback on how their team performs against the standards set out in ‘The Characteristics of an Effective MDT’ provided by NHS IQ, formerly part of the National Cancer Action Team (NCAT), which in 2016 won the Quality in Care Programme’s “Digital Innovation in the Treatment of Cancer” award. Further experimentation shows that the proposed methods may be applicable in other sectors, such as investment (an initial investigation took place during the early stages of this research), and it is suggested that this potential be explored further.

Item Type: Thesis (Doctoral)
Subjects : data analytics, web scraping, collective intelligence, health data
Divisions : Theses
Authors :
Date : 31 October 2016
Funders : -
Contributors :
Contribution / Name / Email / ORCID
, Paul
Depositing User : Georgios Michalakidis
Date Deposited : 03 Nov 2016 09:09
Last Modified : 31 Oct 2017 18:44


© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800