Machine Translation Scientist

Translated Published: September 29, 2021
Location
Rome, Italy
Job Type

Description

Translated relies on its own machine translation technology, ModernMT, both for fully automatic content translation and for semi-automatic translation, where it produces suggestions for its professional translators. Since the company's foundation in 1999, Translated's professionals have post-edited millions of sentences to turn suggestions into high-quality translations.

Now, in the traditional approach to MT training, parallel text corpora constitute training data for machine translation models, but only the final and best version of the translated text is retained for this purpose. The original suggestion on which the translator's work was based is usually discarded.

However, we know that post-edits, i.e. corrections provided by linguists on machine translation suggestions ("negative information"), are highly informative. This is true for human learners, in the form of feedback, and to a growing degree for algorithms as well. This research direction has been relatively little explored so far, mainly because large post-editing data sets are difficult to obtain. At Translated, by contrast, we have patiently collected this data since our foundation.

With this project, Translated is going to exploit its vast, high-quality corpus of sentence corrections to improve the quality of its machine translation.

Your role

As a Machine Translation Scientist, you will lead Translated's research effort on this project. You will be embedded in Translated's AI team, which works on several products all centered around translation, such as expressive speech synthesis, Bayesian data analysis of translation quality data, and company-internal ML products.

You will work in a dynamic research and development group composed of young and expert people, based in Rome and Trento, in Italy. Optionally, you could work remotely for a limited time.

We are offering a 2-year renewable contract.

In this role, you will

  • Develop and fine-tune your research roadmap
  • Design, set up and evaluate your experiments using our GPU cluster
  • Access Translated's databases to extract and prepare the training data for your algorithms
  • Run quality evaluations by enlisting the support of professional translators to evaluate translation quality over many language pairs
  • Discuss your research direction and findings within the AI team and report to technical management
  • Advance the industrial state of the art

From a scientific point of view, we will investigate new MT learning paradigms to make use of post-edits, such as

  • contrastive/adversarial learning
  • reinforcement learning
  • ranking loss
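
To make the last item concrete, here is a minimal sketch of a pairwise margin ranking loss applied to a post-edit. It assumes the model assigns a scalar score to each candidate translation and that the translator's correction should outscore the original suggestion. The function name, the margin value and the example scores are illustrative assumptions, not part of ModernMT or of the project's chosen method.

```python
def margin_ranking_loss(score_preferred: float,
                        score_rejected: float,
                        margin: float = 1.0) -> float:
    """Pairwise hinge-style ranking loss (hypothetical helper).

    The loss is zero once the preferred candidate (e.g. the post-edit)
    outscores the rejected one (e.g. the MT suggestion) by at least
    `margin`; otherwise it grows linearly with the shortfall.
    """
    return max(0.0, margin - (score_preferred - score_rejected))


# Illustrative model scores, made up for this example.
post_edit_score = 1.0
suggestion_score = 0.5
loss = margin_ranking_loss(post_edit_score, suggestion_score)
```

In this formulation the discarded suggestion is what supplies the training signal: it is the "negative" that the correction has to be ranked above.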

We like to publicize our achievements. Most of our technology stems from components which we originally open-sourced, such as Matecat and ModernMT. Where possible, you will be encouraged to publish your research in the best conferences of the field. We have research collaborations and contacts with several leading groups in the field.

Desired qualifications

  • PhD in a relevant field of MT, NLP or ML
  • Good programming skills: Python, Java, C/C++, scripting languages
  • Familiarity with Unix, its command-line tools and system architecture
  • Interest in carrying out experimental research
  • Strong expertise in machine learning

Benefits and perks

Our working environment is both relaxed and intense. We are passionate about our mission, and our work is highly regarded in our industry.

  • Competitive and exciting work environment. You will be surrounded by innovators and experts working at Pi Campus, a venture fund and startup ecosystem. Great environment to grow your skills.
  • We host regular tech and entrepreneurship talks and events, in which you can take part as a Pi Citizen.
  • Work hard and stay fit. On campus you'll find a gym, a swimming pool, and a personal trainer for spinning, TRX and pilates classes.

About Translated

Translated is on a mission to make content in all languages accessible to everyone. We are a technology-powered professional translation provider. We partner with over 180 000 professional translators. Our 140 000 clients range from the private person who needs their CV translated to the very big, like Google and Airbnb.

We have invested over 6 000 000 € in R&D in the past years. Our EU FP7 project Matecat produced the translation GUI which allows our translators to edit sentence translations without having to worry about document layout and formatting at all. With ModernMT, an EU H2020 project which won an accolade from the EU as one of the 3 best projects of the call, we developed our neural adaptive machine translation technology.

We focus on serving large businesses, startups and innovative companies that need to speed up and automate their globalization processes. Thanks to our innovative approach to language technology, we have been chosen by Google to create new services for flagship products such as YouTube and for the translation of apps published in the Google Play Store. Moreover, corporations like Microsoft, eBay and Airbnb use our technologies and services in their localization processes.

More technical context

ModernMT is at the technical state of the art and is tuned to its intended interactive usage: human-computer interaction happens between linguists and the machine translation algorithms through our translation GUI. Our technology is regularly reviewed by top industry leaders, was ranked the top custom MT product by Inten.to in 2020, and was named (the only) Cool Vendor by Gartner.

One of Translated's most important assets is the huge collection of post-edits generated every day by its linguists. Each post-edit is essentially a tuple composed of the original source text, the automatic suggestion (provided by ModernMT), the correction produced by a professional translator, and (optionally) a final version of the translation provided by an expert reviewer.

ModernMT relies on such knowledge for customization, which allows it to significantly and continuously improve the quality of the automatic suggestions in terms of preferred genre, lexicon, and so on. As of now, ModernMT exploits only a part of the information available in a customer's post-edits: only the source text and the final corrected text are retained for customization.
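
As an illustration only, the post-edit tuple described above and the reduction to a source/final pair could be sketched as follows. All class, field and function names are hypothetical and do not reflect Translated's actual data schema or the ModernMT API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PostEdit:
    """One post-edit record (hypothetical field names)."""
    source: str                   # original source text
    suggestion: str               # automatic suggestion from the MT system
    correction: str               # translator's corrected version
    review: Optional[str] = None  # optional final version from a reviewer

    def final_translation(self) -> str:
        # Prefer the reviewer's version when one exists.
        return self.review if self.review is not None else self.correction


def to_parallel_pair(post_edit: PostEdit) -> Tuple[str, str]:
    """Keep only source and final text, dropping the suggestion."""
    return (post_edit.source, post_edit.final_translation())


def was_edited(post_edit: PostEdit) -> bool:
    """True when the final text differs from the original suggestion."""
    return post_edit.suggestion != post_edit.final_translation()


# Made-up example record.
example = PostEdit(
    source="Il gatto dorme.",
    suggestion="The cat sleep.",
    correction="The cat sleeps.",
)
pair = to_parallel_pair(example)
```

The `suggestion` field is exactly what `to_parallel_pair` throws away, and it is the part of the record that the project described in this posting aims to exploit.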

ModernMT is usually employed in a symbiotic scenario where translators are assisted by reliable MT technology that, at the same time, continuously evolves by learning from the translators' activity. The interaction between the translator and ModernMT (generation of automatic suggestions and production of the corrections) happens in real time. Hence, a crucial performance indicator of ModernMT is the speed of the computation, which should not disrupt the translator's workflow.
