Could Google Translate maintain a central codex “language”, thereby bypassing the artifacts that come from using English as the central language?

By: | Post date: 2016-09-27 | Comments: No Comments
Posted in categories: General Language, Linguistics

Google Translate, like many machine translation projects, does not maintain a separate system for each of the roughly [math]n^2[/math] language pairs (strictly, [math]n(n-1)[/math] translation directions for [math]n[/math] languages) when adding languages to its bank. It appears to maintain just n:English mappings, so a translation from, say, Greek to Persian pretty clearly goes via English as an interlanguage. Maintaining every pair directly would be a clear scalability issue, if you’re going to support as many languages as Google does.
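To make the scalability point concrete, here is a small Python sketch that counts one-directional translation systems for n languages. It assumes, for illustration only, that the all-pairs approach needs one system per ordered pair of distinct languages, and that the pivot approach needs one system into English and one out of English for each non-English language. It makes no claim about how Google actually organises its models; the function names are hypothetical.

```python
# Hypothetical sketch: compare how many one-directional translation systems
# are needed for n languages (English included) under two setups.


def direct_systems(n: int) -> int:
    """Every ordered pair of distinct languages gets its own system: n(n-1)."""
    return n * (n - 1)


def pivot_systems(n: int) -> int:
    """Each non-English language needs one system into English and one out: 2(n-1)."""
    return 2 * (n - 1)


for n in (10, 50, 100):
    # Prints: n, systems needed with direct pairs, systems needed via English.
    print(n, direct_systems(n), pivot_systems(n))
```

For 100 languages that is 9,900 direct systems against 198 pivot systems: the direct approach grows quadratically with n, the pivot approach only linearly.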

Is there a better interlanguage than English? Maybe, if you’ve got the resources to handcraft one. Esperantists are familiar with the Distributed Language Translation project of the 80s and 90s, which used Esperanto as an interlanguage for European Union translation. (An Esperanto with a fair few tweaks, and with rule-based translation.) Predictably, it ran out of funding in 1997.

And if you’re using statistical methods rather than handcrafting rules (and statistical methods have been the mainstream in machine translation for a very long time now), then any interlanguage is going to have to be a human language, one for which you can get a big enough corpus to do statistics in the first place. That means, unfortunately, that English as an interlanguage for machine translation between a large number of language pairs actually is as good as you’re going to get.
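One way to see why a corpus-rich language wins as the interlanguage: a pivot route is only as good as the weaker of its two legs. The Python sketch below assumes a simple "weakest leg" heuristic (pick the pivot whose smaller parallel corpus, source-to-pivot or pivot-to-target, is largest). The heuristic, the function names and the corpus sizes are all hypothetical toy values for illustration; none of them describe Google’s actual data or method.

```python
# Hypothetical sketch: choose a pivot language by the "weakest leg" of its
# parallel corpora. The sizes below are toy placeholders, not real data.

from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Parallel corpus sizes, keyed by an unordered pair of language codes.
Corpora = Dict[FrozenSet[str], int]


def corpus_size(corpora: Corpora, a: str, b: str) -> int:
    """Size of the parallel corpus between a and b, or 0 if there is none."""
    return corpora.get(frozenset((a, b)), 0)


def best_pivot(
    corpora: Corpora, src: str, tgt: str, candidates: Iterable[str]
) -> Optional[Tuple[str, int]]:
    """Return (pivot, weakest-leg size) maximising min(size(src, p), size(p, tgt))."""
    best: Optional[Tuple[str, int]] = None
    for p in candidates:
        if p in (src, tgt):
            continue  # a pivot must differ from both endpoints
        weakest = min(corpus_size(corpora, src, p), corpus_size(corpora, p, tgt))
        if weakest > 0 and (best is None or weakest > best[1]):
            best = (p, weakest)
    return best  # None if no candidate connects src and tgt


# Toy numbers only: one corpus-rich candidate, one corpus-poor candidate.
corpora: Corpora = {
    frozenset(("el", "en")): 1000,
    frozenset(("en", "fa")): 800,
    frozenset(("el", "eo")): 5,
    frozenset(("eo", "fa")): 2,
}
print(best_pivot(corpora, "el", "fa", ["en", "eo"]))  # ('en', 800)
```

Under this heuristic, a handcrafted interlanguage with tiny corpora loses to any language with large parallel corpora on both sides, however elegant its design.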

What you’d hope is that other language pairs, not involving English, get their own statistical training; for all I know, that is happening. But that will still have to be prioritised by demand: Japanese–Chinese or French–German is more likely to be realised than Greek–Persian.
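A minimal Python sketch of that mixed setup, assuming translation systems can be modelled as plain functions: a hypothetical `Router` uses a dedicated model for any pair that has been trained, and otherwise chains source-to-English and English-to-target. The toy "models" only tag the text with the hop taken, so you can see which route was used; they are stand-ins, not real translation systems.

```python
# Hypothetical sketch: prefer a direct model for a language pair, otherwise
# pivot through English. "Models" are stand-in functions that tag the text.

from typing import Callable, Dict, Tuple

Model = Callable[[str], str]


def toy_model(src: str, tgt: str) -> Model:
    """Stand-in for a trained system: records the hop instead of translating."""
    return lambda text: f"{text}[{src}>{tgt}]"


class Router:
    def __init__(self, pivot: str = "en") -> None:
        self.pivot = pivot
        self.models: Dict[Tuple[str, str], Model] = {}

    def add(self, src: str, tgt: str, model: Model) -> None:
        self.models[(src, tgt)] = model

    def translate(self, text: str, src: str, tgt: str) -> str:
        if src == tgt:
            return text
        direct = self.models.get((src, tgt))
        if direct is not None:
            return direct(text)  # dedicated model for this pair
        # Fall back to two hops through the pivot; KeyError if a leg is missing.
        to_pivot = self.models[(src, self.pivot)]
        from_pivot = self.models[(self.pivot, tgt)]
        return from_pivot(to_pivot(text))


router = Router()
for lang in ("el", "fa", "ja", "zh"):
    router.add(lang, "en", toy_model(lang, "en"))
    router.add("en", lang, toy_model("en", lang))
router.add("ja", "zh", toy_model("ja", "zh"))  # a high-demand direct pair

print(router.translate("x", "el", "fa"))  # x[el>en][en>fa]
print(router.translate("x", "ja", "zh"))  # x[ja>zh]
```

Only the pairs someone has paid to train skip the English hop; note that here zh to ja would still pivot, since only the ja to zh direction was added.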
