

Findings of the 2016 WMT Shared Task on Cross-lingual Pronoun Prediction

Liane Guillou, LMU Munich
Christian Hardmeier, Uppsala University, Dept. of Linguistics & Philology
Preslav Nakov, Qatar Computing Res. Institute
Sara Stymne, Uppsala University, Dept. of Linguistics & Philology
Jörg Tiedemann, University of Helsinki, Dept. of Modern Languages
Yannick Versley
Mauro Cettolo, Fondazione Bruno Kessler, Trento, Italy
Bonnie Webber, ILCC, University of Edinburgh, Scotland, UK
Andrei Popescu-Belis, Idiap Research Institute, Martigny, Switzerland

Abstract

We describe the design, the evaluation setup, and the results of the 2016 WMT shared task on cross-lingual pronoun prediction. This is a classification task in which participants are asked to provide predictions on what pronoun class label should replace a placeholder value in the target-language text, provided in lemmatised and PoS-tagged form. We provided four subtasks, for the English–French and English–German language pairs, in both directions. Eleven teams participated in the shared task; nine for the English–French subtask, five for French–English, nine for English–German, and six for German–English. Most of the submissions outperformed two strong language-model-based baseline systems, with systems using deep recurrent neural networks outperforming those using other architectures for most language pairs.

1 Introduction

Pronoun translation poses a problem for current state-of-the-art Statistical Machine Translation (SMT) systems (Le Nagard and Koehn, 2010; Hardmeier and Federico, 2010; Novák, 2011; Guillou, 2012; Hardmeier, 2014).

Problems arise for a number of reasons. In general, pronoun systems in natural language do not map well across languages, e.g., due to differences in gender, number, case, formality, or animacy/humanness, as well as due to differences in where pronouns may be used.

To this is added the problem of functional ambiguity, whereby pronouns with the same surface form may perform multiple functions (Guillou, 2016). For example, the English pronoun “it” may function as an anaphoric, pleonastic, or event reference pronoun. An anaphoric pronoun corefers with a noun phrase (NP). A pleonastic pronoun does not refer to anything, but it is required by syntax to fill the subject position. An event reference pronoun may refer to a verb phrase (VP), a clause, an entire sentence, or a longer passage of text. Examples of each of these pronoun functions are provided in Figure 1. It is clear that instances of the English pronoun “it” belonging to each of these functions would have different translation requirements in French and German.

I have an umbrella. It is red.
I have an umbrella. It is raining.
He lost his job. It came as a total surprise.

Figure 1: Examples of three different functions fulfilled by the English pronoun “it”.

Proceedings of the First Conference on Machine Translation, Volume 2: Shared Task Papers, pages 525–542, Berlin, Germany, August 11-12, 2016. © 2016 Association for Computational Linguistics

The problem of pronouns in machine translation has long been studied. In particular, for SMT systems, the recent previous studies cited above have focused on the translation of anaphoric pronouns. In this case, a well-known constraint of languages with grammatical gender is that agreement must hold between an anaphoric pronoun and the NP with which it corefers, called its antecedent. The pronoun and its antecedent may occur in the same sentence (intra-sentential anaphora) or in different sentences (inter-sentential anaphora). Most SMT systems translate sentences in isolation, so inter-sentential anaphoric pronouns will be translated without knowledge of their antecedent and as such, pronoun-antecedent agreement cannot be guaranteed. The accurate translation of intra-sentential anaphoric pronouns may also cause problems as the pronoun and its antecedent may fall into different translation units (e.g., n-gram or syntactic tree fragment).

The above constraints start playing a role in pronoun translation in situations where several translation options are possible for a given source-language pronoun, a large number of options being likely to affect negatively the translation accuracy. In other words, pronoun types that exhibit significant translation divergencies are more likely to be erroneously translated by an SMT system that is not aware of the above constraints. For example, when translating the English pronoun “she” into French, there is one main option, “elle” (exceptions occur, though, e.g., in references to ships). However, several options exist for the translation of anaphoric “it”: “il” (for an antecedent that is masculine in French) or “elle” (feminine), but also “cela”, “ça” or sometimes “ce” (non-gendered demonstratives).

The challenges of correct pronoun translation gradually raised the interest in a shared task, which would allow the comparison of various proposals and the quantification of their claims to improve pronoun translation. However, evaluating pronoun translation comes with its own challenges, as reference-based evaluation cannot take into account the legitimate variations of translated pronouns, or their placement in the sentence. Building upon the experience from a 2015 shared task, the WMT 2016 shared task on pronoun prediction has been designed to test capacities for correct pronoun translation in a framework that allows for objective evaluation, as we now explain.

2 Task Description

The WMT 2016 shared task on cross-lingual pronoun prediction is a classification task in which participants are asked to provide predictions on what pronoun class label should replace a placeholder value (represented by the token REPLACE) in the target-language text. It requires no specific Machine Translation (MT) expertise and is interesting as a machine learning task in its own right. Within the context of SMT, one could think of the task of cross-lingual pronoun prediction as a component of an SMT system. This component may take the form of a decoder feature or it may be used to provide “corrected” pronoun translations in a post-editing scenario.

The design of the WMT 2016 shared task has been influenced by the design and the results of a 2015 shared task (Hardmeier et al., 2015) organised at the EMNLP workshop on Discourse in MT (DiscoMT). The first intuition about evaluating pronoun translation is to require participants to submit MT systems (possibly with specific strategies for pronoun translation) and to estimate the correctness of the pronouns they output. This estimation, however, cannot be performed with full reliability only by comparing pronouns across candidate and reference translations because this would miss the legitimate variation of certain pronouns, as well as variations in gender or number of the antecedent itself. Human judges are thus required for reliable evaluation, following the protocol described at the DiscoMT 2015 shared task on pronoun-focused translation. The high cost of this approach, which grows linearly with the number of submissions, prompted us to implement an alternative approach, also proposed in 2015 as pronoun prediction (Hardmeier et al., 2015). While the structure of the WMT 2016 task is similar to the shared task of the same name at DiscoMT 2015, there are two main differences, one conceptual and one regarding the language pairs, as specified hereafter.

In the WMT 2016 task, participants are asked to predict a target-language pronoun given a source-language pronoun in the context of a sentence. In addition to the source-language sentence, we provide a lemmatised and part-of-speech (PoS) tagged target-language human-authored translation of the source sentence, and automatic word alignments between the source-sentence words and the target-language lemmata.
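To make the task input concrete, the following minimal Python sketch represents one instance (a source sentence, a lemmatised and PoS-tagged target sentence with REPLACE placeholders, and word alignments) and looks up the source tokens aligned to each placeholder. The `Instance` class, its field names, the helper `aligned_source_words`, and the reading of each alignment pair as (source index, target index) are assumptions made for illustration; they are not the official data format or tooling of the shared task. The example sentence is the one shown in Figure 2.

```python
from dataclasses import dataclass
from typing import List, Tuple

PLACEHOLDER = "REPLACE"  # placeholder token named in the task description


@dataclass
class Instance:
    """Hypothetical container for one task instance (names are illustrative)."""
    source: List[str]                  # source-language tokens
    target: List[Tuple[str, str]]      # (lemma, PoS tag); placeholders get an empty tag
    alignments: List[Tuple[int, int]]  # assumed to be (source index, target index)


def aligned_source_words(inst: Instance) -> List[Tuple[int, List[str]]]:
    """For each placeholder, return (target index, aligned source tokens)."""
    result = []
    for t_idx, (lemma, _tag) in enumerate(inst.target):
        if lemma == PLACEHOLDER:
            words = [inst.source[s] for s, t in inst.alignments if t == t_idx]
            result.append((t_idx, words))
    return result


# Example sentence from Figure 2 (English-French development set).
example = Instance(
    source="It 's an idiotic debate . It has to stop .".split(),
    target=[("REPLACE", ""), ("être", "VER"), ("un", "DET"), ("débat", "NOM"),
            ("idiot", "ADJ"), ("REPLACE", ""), ("devoir", "VER"),
            ("stopper", "VER"), (".", ".")],
    alignments=[(0, 0), (1, 1), (2, 2), (3, 4), (4, 3), (6, 5),
                (7, 6), (8, 6), (9, 7), (10, 8)],
)

for target_index, source_words in aligned_source_words(example):
    print(target_index, source_words)
```

Under these assumptions, both placeholders are traced back to an occurrence of the source pronoun “It”, which is the information a classifier would combine with the surrounding context to choose a class label.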

In the translation, the words aligned to a subset of the source-language third-person subject pronouns are substituted by placeholders. The aim of the task is to predict, for each placeholder, the word that should replace it from a small, closed set of classes, using any type of information that can be extracted from the documents. In this way, the evaluation can be fully automatic, by comparing whether the class predicted by the system is identical to the reference one, assuming that the constraints of the lemmatised target text allow only one correct class (unlike the pronoun-focused translation task which makes no assumption about the target text).

Figure 2 shows an English–French example sentence from the development set. It contains two pronouns to be predicted, indicated by REPLACE tags in the target sentence. The first “it” corresponds to “ce” while the second “it” corresponds to “qui” (equivalent to English “which”), which belongs to the OTHER class, i.e., does not need to be predicted as is. This example illustrates some of the difficulties of the task: the two source sentences are merged into one target sentence, the second “it” becomes a relative pronoun instead of a subject one, and the second French verb has a rare intransitive usage.

The two main differences between the WMT 2016 and DiscoMT 2015 tasks are as follows. First, the WMT 2016 task introduces more language pairs with respect to the 2015 task. In addition to the English–French subtask (same pair as the DiscoMT 2015 task), we also provide subtasks for French–English, German–English and English–German. Second, the WMT 2016 task provides a lemmatised and PoS-tagged reference translation instead of the fully inflected text provided for the DiscoMT 2015 task. The use of this representation, whilst still artificial, could be considered to provide a more realistic SMT-like setting. SMT systems cannot be relied upon to generate correctly inflected surface form words, and so the lemmatised, PoS-tagged representation encourages greater reliance on other information from the source and target-language sentences.

The following sections describe the set of source-language pronouns and the target-language classes to be predicted, for each of the four subtasks. The subtasks are asymmetric in terms of the source-language pronouns and the prediction classes.

The selection of the source-language pronouns and their target-language prediction classes for each subtask is based on the variation that is to be expected when translating a given source-language pronoun, i.e., the translation divergencies of each pronoun type. For example, when translating the English pronoun “it” into French, a decision must be made as to the gender of the French pronoun, with “il” and “elle” both providing valid options. Alternatively, a non-gendered pronoun such as “cela” may be used instead. The translation of the English pronouns “he” and “she” into French, however, does not require such a decision. These may simply be mapped one-to-one, as “il” and “elle” respectively, in the vast majority of cases. The translation of “he” and “she” from English into French is therefore not considered an interesting problem and as such, these pronouns are excluded from the source-language set for the English–French subtask. In the opposite translation direction, the French pronoun “il” may be translated as “it” or “he”, and “elle” as “it” or “she”. As a decision must be taken as to the appropriate target-language translation of “il” and “elle”, these are included in the set of source-language pronouns for French–English.

2.1 English–French

This subtask concentrates on the translation of subject-position “it” and “they” from English into French. The following prediction classes exist for this subtask (the class name, identical to the main lexical item, is highlighted in bold, but each class may include additional lexical items, indicated in plain font between quotes):

• ce: the French pronoun “ce” (sometimes with elided vowel as “c’ ” when preceding a word starting by a vowel) as in the expression “c’est” (“it is”);
• elle: feminine singular subject pronoun;
• elles: feminine plural subject pronoun;
• il: masculine singular subject pronoun;
• ils: masculine plural subject pronoun;
• cela: demonstrative pronouns, including “cela”, “ça”, the misspelling “ca”, and the rare elided form “ç’ ” when the verb following it starts with a vowel;
• on: indefinite pronoun;
• OTHER: some other word, or nothing at all, should be inserted.
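The English–French class inventory and the exact-match comparison against the reference class can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the names `CLASS_ITEMS`, `to_class` and `count_correct` are hypothetical, the lexical items are taken from the class descriptions above, and counting exact matches is only the comparison described in the text, not necessarily the official scoring metric of the shared task.

```python
# Lexical items per class, taken from the English-French class descriptions.
CLASS_ITEMS = {
    "ce": {"ce", "c'"},
    "elle": {"elle"},
    "elles": {"elles"},
    "il": {"il"},
    "ils": {"ils"},
    "cela": {"cela", "ça", "ca", "ç'"},
    "on": {"on"},
}

# Invert the table so that each lexical item points to its class label.
ITEM_TO_CLASS = {item: label
                 for label, items in CLASS_ITEMS.items()
                 for item in items}


def to_class(word: str) -> str:
    """Map a target-side word to its class; anything else (or nothing) is OTHER."""
    normalised = word.lower().replace("’", "'")  # unify apostrophe variants
    return ITEM_TO_CLASS.get(normalised, "OTHER")


def count_correct(predicted, reference) -> int:
    """Count placeholders whose predicted class equals the reference class."""
    return sum(p == r for p, r in zip(predicted, reference))


# The two placeholders of Figure 2: "ce" and "qui" (the latter falls in OTHER).
reference = [to_class("ce"), to_class("qui")]
predicted = ["ce", "il"]
print(reference, count_correct(predicted, reference))
```

Because every word outside the closed inventory collapses into OTHER, a system is never asked to generate “qui” itself; it only has to recognise that none of the listed pronoun classes applies.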

ce OTHER
ce PRON qui PRON
It ’s an idiotic debate . It has to stop .
REPLACE 0 être VER un DET débat NOM idiot ADJ REPLACE 6 devoir VER stopper VER . .
0-0 1-1 2-2 3-4 4-3 6-5 7-6 8-6 9-7 10-8

Figure 2: English–French example sentence from the development set with two REPLACE tags to be replaced by “ce” and “qui” (OTHER class), respectively. The French reference translation, not shown to participants, merges the two source sentences into one: “C’est un débat idiot qui doit stopper.”

2.2 French–English

This subtask concentrates on the translation of subject-position “elle”, “elles”, “il”, and “ils” from French into English.¹ The following prediction classes exist for this subtask:

• he: masculine singular subject pronoun;
• she: feminine singular subject pronoun;
• it: non-gendered singular subject pronoun;
• they: non-gendered plural subject pronoun;
• you: second person pronoun (with both generic or deictic uses);
• this: demonstrative pronouns (singular), including both “this” and “that”;
• these: demonstrative pronouns (plural), including both “these” and “those”;
• there: existential “there”;
• OTHER: some other word, or nothing at all, should be inserted.

2.3 English–German

This subtask concentrates on the translation of subject-position “it” and “they” from English into German. It uses the following prediction classes:

3.1.1 TED Talks

TED