Why do we not use morpheme analyzers for the English language?

Post date: 2017-07-05
Posted in categories: English, Linguistics

Do you mean, why is something as ludicrously unlinguistic as Snowball the state of the art for stemming English? And why, in Natural Language Processing of English, do we stem words instead of doing a detailed analysis of their affixes?

Because English lets us get away with it.

  • There’s not a lot of morphology compared to other languages, and its morphophonemics is relatively clean, with a few respelling rules, so stripping off suffixes is doable.
  • Syntax does more work than inflection, so understanding the inflections is not as critical for working out what a sentence means.
  • The amount of inflection in English is limited and predictable, so stemming is not that onerous. (The Snowball stemmer for English is quite a small program.)
  • Our derivational morphology is only somewhat productive, so we can throw that work back on the lexicon; you couldn’t do that as readily for Turkish.
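
To make "stripping off suffixes is doable" concrete, here is a minimal sketch of a suffix stripper with one respelling rule. It is a toy written for this post, not the Snowball algorithm; the function name `toy_stem`, the suffix list, and the length thresholds are all assumptions chosen for illustration.

```python
# Toy suffix stripper. NOT Snowball: just enough rules to show the idea.
VOWELS = set("aeiou")


def toy_stem(word: str) -> str:
    w = word.lower()

    # Plural / third person singular: "ponies" -> "pony", "cats" -> "cat".
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]

    # Participles and past tense: "jumped" -> "jump", "running" -> "run".
    for suffix in ("ing", "ed"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            base = w[: -len(suffix)]
            # Respelling rule: undo consonant doubling ("runn" -> "run"),
            # but leave doubled l, s, z alone ("fall", "pass", "buzz").
            doubled = len(base) >= 2 and base[-1] == base[-2]
            if doubled and base[-1] not in VOWELS and base[-1] not in "lsz":
                return base[:-1]
            return base

    return w


for word in ("cats", "ponies", "running", "jumped", "glass"):
    print(word, "->", toy_stem(word))
```

A dozen lines of rules cover a surprising share of English inflection, which is the point. The sketch also gets plenty wrong: it has no idea that "hoping" comes from "hope" rather than "hop", for instance.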

As a linguist, I find reading Snowball deeply offensive. And if you are building a search engine for English text, and *not* adding a list of exceptions to your Snowball stemmer, you are doing your users a disservice. But English is such that it’s not as big a deal as it would be for other languages.
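
A list of exceptions can be as simple as a lookup that runs before the stemmer does. The sketch below assumes a stand-in stemmer (`naive_stem`, which only strips a final "s") and a hypothetical `EXCEPTIONS` table; neither is part of Snowball, and in practice you would pass in whatever stemmer you actually use and build the table from your own data.

```python
from typing import Callable, Mapping


def naive_stem(word: str) -> str:
    """Stand-in for a real stemmer: strips a final "s" and nothing else."""
    w = word.lower()
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


# Hypothetical exception list: irregular forms, and words the rules mangle.
EXCEPTIONS: Mapping[str, str] = {
    "news": "news",       # not the plural of "new"
    "mice": "mouse",
    "children": "child",
    "went": "go",
}


def stem_with_exceptions(
    word: str,
    stemmer: Callable[[str], str] = naive_stem,
    exceptions: Mapping[str, str] = EXCEPTIONS,
) -> str:
    """Consult the exception list first; fall back to the rule-based stemmer."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return stemmer(key)


for word in ("news", "Mice", "cats"):
    print(word, "->", stem_with_exceptions(word))
```

Without the table, the stand-in stemmer turns "news" into "new", which is exactly the kind of search result your users will not thank you for.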
