University of Surrey


Named Entity Recognition: A Local Grammar-Based Approach.

Traboulsi, Hayssam N. (2006) Named Entity Recognition: A Local Grammar-Based Approach. Doctoral thesis, University of Surrey (United Kingdom).

Available under License Creative Commons Attribution Non-commercial Share Alike.



Named Entity Recognition (NER) is an information extraction subtask that seeks to identify proper nouns in a document and classify them as a person, organisation, place, date, time, monetary value, or percentage. This thesis investigates the recognition of the most open types of named entities: person and organisation names. The task has proved significant for information retrieval, machine translation, and document indexing, and is a necessary prerequisite for more complex information extraction and question-answering tasks. Two of the most difficult problems encountered by developers of NER systems are portability and system performance: a practical NER system is expected to recognise named entities correctly in new text domains or new languages at minimal cost. The main contributor to these problems is the manual effort that has always been needed to develop symbolic or statistical recognition rules for these systems from large tagged text corpora. In this research we introduce a prototype called LG-Finder that automatically acquires linguistic recognition rules (local grammars) for person and organisation names from untagged text corpora using techniques from corpus linguistics, including frequency, collocation, and concordance analyses. So far, LG-Finder has been successfully tested on English news texts, but it can be applied straightforwardly to other European languages. In addition, we present a local grammar-based NER prototype (NExtract), which incorporates finite state transducers implementing the local grammars acquired by LG-Finder. The success rates scored by NExtract when evaluated against data sets from Reuters and the Wall Street Journal are promising and comparable with those achieved in the DARPA-sponsored MUC-7 named entity evaluation.
Finally, we present a question-answering prototype that uses the local grammars to answer a small set of questions seeking information on people and organisations in the financial domain. The evaluation results of this prototype are encouraging and motivate further investigation of the local grammar approach in this domain.
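
As a rough illustration of the corpus-linguistics techniques the abstract names (concordance and collocation frequency around a cue word, followed by a pattern in the spirit of a local grammar), the following is a minimal Python sketch. It is not the LG-Finder or NExtract implementation: the toy corpus, the function names, the choice of "said" as the cue word, and the hand-written regular expression standing in for a finite state transducer are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy corpus (assumption); not data from the thesis.
CORPUS = [
    "Mr John Smith said the bank would cut rates.",
    "Analysts expect growth, Ms Jane Doe said on Monday.",
    "Mr Alan Brown said profits rose sharply.",
]


def tokenize(text):
    """Split text into alphabetic tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z]+", text)


def concordance(sentences, keyword, width=3):
    """Return (left context, keyword, right context) for each keyword hit."""
    lines = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok.lower() == keyword:
                left = tokens[max(0, i - width):i]
                right = tokens[i + 1:i + 1 + width]
                lines.append((left, tok, right))
    return lines


def left_collocates(lines):
    """Count the tokens that occur in the left context of the keyword."""
    counts = Counter()
    for left, _, _ in lines:
        counts.update(left)
    return counts


# Illustrative hand-written pattern (assumption): a title, one or more
# capitalised words, then the reporting verb "said".
PERSON_BEFORE_SAID = re.compile(
    r"\b(Mr|Ms|Mrs|Dr)\s+((?:[A-Z][a-z]+\s+)+)said\b"
)


def extract_persons(sentences):
    """Return the person names matched by the illustrative pattern."""
    names = []
    for sentence in sentences:
        for match in PERSON_BEFORE_SAID.finditer(sentence):
            names.append(f"{match.group(1)} {match.group(2).strip()}")
    return names


if __name__ == "__main__":
    kwic = concordance(CORPUS, "said")
    print(left_collocates(kwic).most_common(3))
    print(extract_persons(CORPUS))
```

In this sketch the collocate counts show that titles such as "Mr" recur immediately to the left of the cue word, which is the kind of regularity a local grammar for person names could encode. A real system would derive such patterns from a large corpus instead of writing them by hand.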

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors : Traboulsi, Hayssam N.
Date : 2006
Additional Information : Thesis (Ph.D.)--University of Surrey (United Kingdom), 2006.
Depositing User : EPrints Services
Date Deposited : 14 May 2020 14:56
Last Modified : 14 May 2020 15:01

