University of Surrey


Bilingual word sketches: three flavours VIDEO

Kilgarriff, A, Kovar, V and Frankenberg-Garcia, A (2013) Bilingual word sketches: three flavours VIDEO In: Electronic lexicography in the 21st century: thinking outside the paper, 2013-10-17 - 2013-10-19, Tallinn.

Full text not available from this repository.


Word sketches are one-page, automatic, corpus-based accounts of a word’s grammatical and collocational behaviour (Kilgarriff et al. 2004). Since their introduction in 1998 they have come to be widely used in lexicography, often serving as the first port of call for a lexicographer analysing a word. Until recently, they have been monolingual. While various people have said how good it would be to have bilingual word sketches, it is not clear what they would be. We have explored three interpretations: bics, bips and bims.

Bics are bilingual word sketches based on comparable corpora. They require a bilingual dictionary, as well as two comparable corpora, as input. They pursue the ‘cross product’ method first proposed by Grefenstette: to find translations for compositional collocations like English work group (into, e.g., French) we can look up work and group in an English-French dictionary, where we may find three translations for work and five for group. That gives 3 x 5 = 15 possible combinations. We check to see which is commonest in a French corpus, and that is probably a fair translation. We use this core method, together with grammatical filters and salience statistics, to include likely French translations for the collocations in the English word sketch for, e.g., work.

Bips are bilingual word sketches based on parallel corpora. Here, a dictionary is not needed because the connections between the languages can be inferred from the words found in the two languages in aligned sentences. We first count occurrences in aligned sentences to identify translation candidates for the headword. Then, for each source-language collocate, we find which target-language collocations tend to occur in the aligned chunks and present them as candidate translations.

Bims are bilingual word sketches based on manual selection of headwords.
In this approach the user chooses the two words (from two different languages – typically mutual translations) whose word sketches they want to compare, and all that the software does is put them side by side in the same window. (This can be done by opening two browser windows side by side, with one word sketch in each.)

We shall present these three flavours of bilingual word sketch and discuss first experiences, strengths and weaknesses and initial evaluations of each.

References

Gregory Grefenstette 1999. The World Wide Web as a Resource for Example-Based Machine Translation Tasks. Translating and the Computer 21. London.
Adam Kilgarriff, Pavel Rychlý, Pavel Smrz, David Tugwell 2004. The Sketch Engine. Proc. Euralex. Lorient, France.
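
The cross-product step that the abstract describes for bics can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the dictionary entries, the tiny French corpus and the function names are invented here and are not taken from the authors' system. Corpus frequency is approximated by counting sentences in which all candidate words co-occur, and the grammatical filters and salience statistics mentioned in the abstract are left out.

```python
from itertools import product

# Toy English-French dictionary (hypothetical entries, for illustration only):
# three translations for "work", five for "group", as in the abstract's example.
DICTIONARY = {
    "work": ["travail", "œuvre", "ouvrage"],
    "group": ["groupe", "ensemble", "bande", "groupement", "équipe"],
}

# Toy tokenised French corpus (hypothetical sentences).
FRENCH_CORPUS = [
    ["le", "groupe", "de", "travail", "se", "réunit", "demain"],
    ["un", "groupe", "de", "travail", "a", "été", "créé"],
    ["cette", "équipe", "fait", "un", "travail", "remarquable"],
    ["une", "œuvre", "de", "groupe"],
]


def cross_product_candidates(collocation, dictionary):
    """Return every combination of dictionary translations for the words
    of a source-language collocation (the 'cross product')."""
    words = collocation.split()
    return list(product(*(dictionary.get(word, []) for word in words)))


def rank_candidates(candidates, corpus):
    """Score each candidate by the number of target-corpus sentences that
    contain all of its words, then sort from most to least frequent."""
    scored = []
    for candidate in candidates:
        freq = sum(
            1 for sentence in corpus
            if all(word in sentence for word in candidate)
        )
        scored.append((candidate, freq))
    return sorted(scored, key=lambda item: item[1], reverse=True)


candidates = cross_product_candidates("work group", DICTIONARY)
ranking = rank_candidates(candidates, FRENCH_CORPUS)
best, best_freq = ranking[0]
print(len(candidates), "candidates; most frequent:", best, best_freq)
```

Sentence-level co-occurrence is used here because French word order and linking words (as in groupe de travail) differ from the English pattern; a real system would more plausibly match candidates against grammatical relations in the target corpus.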

Item Type: Conference or Workshop Item (UNSPECIFIED)
Divisions : Surrey research (other units)
Authors :
Kilgarriff, A
Kovar, V
Date : 2013
Depositing User : Symplectic Elements
Date Deposited : 16 May 2017 15:28
Last Modified : 23 Jan 2020 14:56
