Is there any NLP tool that can extract affix and stem of English words?

By: Nick Nicholas | Post date: 2016-03-13
Posted in categories: English, Linguistics

Yes: the Porter Stemmer is by far the most popular approach. See “A survey of stemming algorithms in information retrieval” for a survey, the nltk.stem package for NLTK implementations, and “Porter Stemming Algorithm” for Porter’s own description of it. There are tweaks of it around, but no one has gone for anything different; and English being the way it is, there’s no real interest in the more powerful lemmatisers, which would do actual dictionary work.

As a linguist, I (and I’m sure many another linguist) am aghast at what the Porter Stemmer doesn’t do. For example, stupider goes to stupid, but bigger does not go to big: Porter only strips -er when what remains is long enough, so it does not touch bisyllabic words like bigger. There’s too much risk of error. Similarly, Porter has no knowledge of, or interest in, irregular forms.
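To make the stupider/bigger contrast concrete, here is a minimal sketch of just the one rule involved, not the full algorithm. It assumes Porter's published definition of the "measure" m of a stem (the number of vowel-run-then-consonant-run sequences) and his condition that -er is removed only when m > 1. The function names are mine, and a real Porter implementation runs several other steps around this one.

```python
VOWELS = set("aeiou")

def is_consonant(word, i):
    """Porter's notion of a consonant: not a, e, i, o, u,
    and not a 'y' that follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' is a consonant at the start of a word or after a vowel
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem):
    """Count vowel-to-consonant transitions, i.e. m in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m

def strip_er(word):
    """Simplified sketch of one rule: drop -er only if the stem has m > 1."""
    if word.endswith("er"):
        stem = word[:-2]
        if measure(stem) > 1:
            return stem
    return word

print(strip_er("stupider"))  # stem "stupid" has m = 2, so -er is removed
print(strip_er("bigger"))    # stem "bigg" has m = 1, so the word is left alone
```

The length check is what keeps words like bigger untouched: a stem with m = 1 is judged too short to strip safely.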

It is a decent compromise between doing too much and doing too little (and doing too much is a real problem). What people always forget is that it has to be customised with an exceptions list, to deal with the vocabulary you’re likely to encounter. That applies in particular to its use in Lucene/Solr.
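An exceptions list can be as simple as a lookup that runs before the stemmer. The sketch below is illustrative only: the EXCEPTIONS table, the stem_with_exceptions helper and the toy_stemmer stand-in are all hypothetical names, and the entries are examples rather than a recommended list. In practice you would pass in a real stemmer, such as the stem method of NLTK's PorterStemmer, and fill the table from your own vocabulary.

```python
# Hypothetical exceptions table: words the stemmer gets wrong or ignores,
# mapped to the stem we actually want for our vocabulary.
EXCEPTIONS = {
    "bigger": "big",
    "went": "go",
    "mice": "mouse",
}

def toy_stemmer(word):
    """Stand-in for a real stemmer: strips a plural -s from longer words."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word

def stem_with_exceptions(word, stemmer, exceptions=EXCEPTIONS):
    """Consult the exceptions list first; fall back to the stemmer."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return stemmer(key)

for w in ["Bigger", "went", "cats", "stem"]:
    print(w, "->", stem_with_exceptions(w, toy_stemmer))
```

Checking the list first means the stemmer never sees the words you have decided to handle by hand, so its own rules stay untouched.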

Answered 2016-03-13

[Originally posted on http://quora.com/Is-there-any-NLP-tool-that-can-extract-affix-and-stem-of-English-words/answer/Nick-Nicholas-5]
