Comparison, TLG BC and AD: log-likelihood
Helma Dik left a comment on my post on comparing TLG AD and BC through Wordle, suggesting I use Dunning’s Log-Likelihood measure of differential word frequencies in corpora, as Wordled by Martin Mueller. That lets you work out what the real shifts in frequency are, rather than trying to eyeball them through the aggregate word counts.
Here for instance is his comparison of the Iliad to the Odyssey—which words are more frequent in the one, or the other:
I looked up Ted Dunning’s paper, failed to understand it 🙁 , and instead used the walkthrough of the computation in the user manual of the Wordhoard corpus software package.
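For the curious, here is a minimal sketch of that kind of computation in Python. It assumes the two-term form of the log-likelihood statistic commonly used for corpus comparison (observed against expected counts in each corpus); I have not checked it line by line against the Wordhoard manual, and the function names and the counts in the usage line are made up for illustration.

```python
import math

def log_likelihood(a, b, c, d):
    """G2 score for one word.

    a: occurrences of the word in corpus 1 (say, BC)
    b: occurrences of the word in corpus 2 (say, AD)
    c: total word count of corpus 1
    d: total word count of corpus 2
    """
    # Expected counts if the word were equally frequent in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    # A zero count contributes nothing (the limit of x*ln(x) at 0 is 0).
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

def more_frequent_in_first(a, b, c, d):
    """True if the word's relative frequency is higher in corpus 1."""
    return a / c > b / d

# Hypothetical counts, not taken from the TLG.
score = log_likelihood(20, 10, 100, 100)
```

The score only says how surprising the difference is; the second function gives the direction, which is what decides the colour of the word in the Wordle.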
And this is the more statistically sound Wordle comparison. Words more frequent BC are in red, words more frequent AD are in black. I’m leaving in stop words this time, and not cleaning up the ambiguity, because this says some interesting things about the changes in Greek grammar between Classical and Late Greek. Do click:
Here are my impressionistic notes on points not already covered in the previous post (where I was working through rankings):
- Both corpora talk about θεός God, but the big jump, of course, is Χριστός Christ. The second biggest jump is in ἅγιος holy, displacing ἱερός. (Was ἱερός too pagan-sounding?)
- But the biggest discrepancy between BC and AD Greek is the avoidance of δέ but, on the other hand, followed by avoidance of μέν on the one hand. That tells you that AD Greek used different sentence structures, such as a lot more ἀλλά but. Tucked away, there’s also more καί and (i.e. more coordinating constructions) and a lot less τε and (a very archaic phrase-second construction).
- There are a lot more ἤγουν and τουτέστι that is, and a lot less ἐάν if and ἄρα therefore; I’m tempted to think that says something about changing rhetoric in the genres popular in the respective periods: less logic, more exemplification. That inference is foolhardy, but not impossible.
- There is a lot more τίς who? being reported. That’s an error caused by ambiguity, but it’s an illuminating error: τοῦ in Attic (though not Late Greek) is ambiguous between “whose?” and the genitive definite article. And there are a lot more definite articles in Late Greek, as you can see by the black ὁ. (My friend Io Manolessou actually wrote her PhD on that shift; nice to see it visually confirmed.)
- There’s also more ἵνα in order to, which suggests Late Greek was already moving towards more subjunctive constructions rather than participles and infinitives, even before Early Modern Greek made the switch completely.
- Clearly less ὦ Ο!—A very Classical way of addressing people.
- Some of the odder looking words more prevalent in BC Greek are there because there are a lot more geometric texts in the BC corpus: Ἄβ is actually mistakenly picking up the line ΑΒ, and you can also see in smaller print ΑΒΓ, ΒΔ, ΓΔ, ΕΖ, ΞΖ.
Hm. Yes, that was somewhat more illuminating. Thanks, Helma!