Pre-editing and post-editing (MT)
Pre- and post-editing technologies for Machine Translation (MT) are among the most recent research focuses of the Department of Translation Technology (referred to by its French acronym, TIM). Over the last decade, MT has progressed significantly and has become highly prevalent in the translation industry as a way to meet increasing demands for shorter turnaround times and lower translation costs. Nevertheless, the output produced by most engines is still far from perfect. To maximize the effectiveness of MT and to obtain a high-quality final translation, two complementary processes are usually included in the MT workflow: pre-editing and post-editing.
- Pre-editing
Pre-editing consists of processing texts before machine translation. It typically involves correcting mistakes in the source text (mainly grammar, punctuation and spelling), removing ambiguities and simplifying structures. For statistical MT, it may also involve adapting the input text so that it more closely matches the texts the engine was trained on, which can help the MT engine perform better.
- Post-editing
Post-editing is the process by which professionally trained translators or linguists review and correct the MT output to remove both semantic and linguistic errors. Post-editing can be roughly divided into two categories: light (or rapid) post-editing and full post-editing. The first focuses mainly on conveying the correct meaning and ignores stylistic issues. In a full post-editing scenario, by contrast, all issues need to be dealt with so that the text attains a level of quality comparable to that of a human translation.
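As a rough illustration of the two processes described above, the following Python sketch applies a few pattern-based pre-editing rules to a source sentence and then estimates post-editing effort as a word-level edit distance between the MT output and its post-edited version. The rules, the function names (`pre_edit`, `word_edit_distance`, `post_edit_rate`) and the example sentences are assumptions made for illustration only; they are not the rules or metrics used in our projects. The effort measure is similar in spirit to edit-rate metrics such as HTER, but it is simplified and does not handle word shifts.

```python
import re

# Hypothetical pre-editing rules (pattern, replacement), for illustration only.
PRE_EDIT_RULES = [
    (re.compile(r"\s+([,.!?])"), r"\1"),           # remove space before punctuation
    (re.compile(r"([!?])\1+"), r"\1"),             # collapse repeated ! or ?
    (re.compile(r"\bteh\b"), "the"),               # fix a common typo
    (re.compile(r"\bu\b", re.IGNORECASE), "you"),  # expand an informal abbreviation
]


def pre_edit(text):
    """Apply each pre-editing rule in order to the source text."""
    for pattern, replacement in PRE_EDIT_RULES:
        text = pattern.sub(replacement, text)
    return text


def word_edit_distance(a, b):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def post_edit_rate(mt_output, post_edited):
    """Word edits needed to turn the MT output into the post-edited text,
    divided by the length of the post-edited text (simplified, no shifts)."""
    mt_words = mt_output.split()
    pe_words = post_edited.split()
    return word_edit_distance(mt_words, pe_words) / max(len(pe_words), 1)


if __name__ == "__main__":
    print(pre_edit("can u fix teh printer ??"))
    print(post_edit_rate("the printer not is working",
                         "the printer is not working"))
```

A lower rate means the post-editor changed fewer words. The rate says nothing about the time or cognitive effort involved, so it should be read only as a coarse indicator.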
Research work
From 2012 to 2015, we focused our efforts on the ACCEPT European Project. The project aimed to improve statistical machine translation (SMT) of user-generated content by investigating minimally intrusive pre-editing techniques, SMT improvement methods and post-editing strategies.
The project ended in early 2015. At that point, we took over the academic leadership and installed the technologies used during the project at our facilities. This will allow us to continue improving the tools and to conduct studies on pre-editing and post-editing with our students. One of the main outcomes of the project was the creation of a fully integrated online platform that combines the typical modules of an MT workflow and is specifically designed for academic purposes: the ACCEPT Academic Portal.
Ties to the industry
The FTI will be testing the implementation of machine translation at Swiss Post for DE>EN/FR/IT translations. The project will compare several statistical machine translation engines (commercial and open source; phrase-based and neural) and will test different MT-integrated translation environments with the translators. It involves several students and collaborators of the TIM Department, including Lise Agnellet, Sabrina Girletti and Jonathan Mutal (FaMAF, Universidad de Córdoba). The project, which began in May, is under the direction of Pierrette Bouillon (FTI) and Paula Estrella (FaMAF, Universidad de Córdoba).
Training students
As MT technology continues to improve, human skills in both pre-editing and post-editing will become vital in the translation industry. For this reason, we train our students to develop these skills. The courses Traduction Automatique 1 and Traduction Automatique 2 give students an overview of the whole MT process. In addition, students may have the opportunity during their studies to broaden their understanding of MT through internships in well-known organisations and enterprises.
Main Publications
- Pre-editing by forum users: a Case Study, Bouillon P., Gaspar L., Gerlach J., Porro V., Roturier J., in: Proceedings of the 9th Edition of the Language Resources and Evaluation Conference (LREC), CNL Workshop, Reykjavik, Iceland, 2014.
- Combining pre-editing and post-editing to improve SMT of user-generated content, Gerlach J., Porro V., Bouillon P., Lehmann S., in: Proceedings of the Machine Translation Summit XIV, Nice, France, 2013.
- La préédition avec des règles peu coûteuses, utile pour la TA statistique des forums ?, Gerlach J., Porro V., Bouillon P., Lehmann S., in: 20ème conférence sur le Traitement Automatique des Langues Naturelles (TALN), Sables d'Olonne, France, 2013.
- Two Approaches to Correcting Homophone Confusion in a Hybrid Machine Translation System, Bouillon P., Gerlach J., Germann U., Haddow B., Rayner M., in: Second ACL Workshop on Hybrid Approaches to Translation (HyTra), Sofia, Bulgaria, 2013.
- Comparing forum data post-editing performance using translation memory and machine translation output: a pilot study, Morado Vázquez L., Rodríguez Vázquez S., Bouillon P., in: Proceedings of the Machine Translation Summit XIV, Nice, France, 2013.
- Using Source-Language Transformations to Address Register Mismatches in SMT, Rayner M., Bouillon P., Haddow B., in: Proceedings of AMTA, San Diego, CA, US, 2012.