Lerna IIIa: Why we do not count word instances
This blogpost in the ongoing thread on the Lernaean Text and counting words in Greek (see Lerna II, Lerna I) may be misdirected to the readership of this blog. It goes through basic notions in linguistics that some of you will be familiar enough with to be annoyed at. And given how the Lernaean text has been propagated, this post should be in Greek. Then again, Nikos Sarantakos has been posting brilliantly about it in Greek for a decade—not that this has singed off any of the heads of the Hydra, because it really is a Hydra. From Lerna.
Still, I have to ritually cast out the implications of “my language has more words than your language, nyuh”, before I start counting the words of Greek on the record. And obvious statements are worth writing down too, especially since they are not obvious down in the fevered swamps of Lerna. Besides, I have a mission statement for the blog: “making Greek more googleable” (through English).
I arrived at this mission statement in corresponding with my one-time student Matt Treyvaud, who has long been making Japanese more googleable at No-Sword. Read ye his blog, for it is hale: thou hast exceeded thy sensei! 🙂
The previous post already goes into reasons why the size of the corpus you’re using doesn’t mean all that much. The millions of times /malaka/ gets said daily (let alone the hundreds of millions of times /fʌkɪŋ/ gets said) do not outweigh the smaller number of words surviving from Greek antiquity. The fact that five times more people speak French than Dutch does not make French a five times better language. The fact that the Mahabharata is ten times longer than the Iliad does not make it an inherently better poem. Nor an inherently worse poem.
We could go on with this. Let’s. And let’s go to the real point of promoting the size of the TLG corpus in the Lernaean text: the fact that this is not a count of any old words, but of the words of Classical Greek literature.
Size as a metric for literary quality. Doesn’t sound convincing, does it? Of course, that’s not why the figure of 90 million got inserted: it got inserted because the writers really had no idea what the difference is between a word instance count and a lemma count. Among all the other things they had no idea about. But let’s spend some paragraphs on this strawman anyway.
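For the record, the difference between the two counts is easy to show on a toy example. The sketch below is only an illustration: the lemma table is a hypothetical hand-made dictionary of transliterated forms, not the TLG's lemmatiser or anyone's real data. A word instance count tallies every running word in a text; a lemma count tallies the distinct dictionary headwords those words belong to.

```python
# Hypothetical toy lemma table: inflected form -> dictionary headword.
# Forms are transliterated; a real lemmatiser is far more involved.
LEMMAS = {
    "logos": "logos",   # nominative
    "logou": "logos",   # genitive
    "logon": "logos",   # accusative
    "lego": "lego",     # "I say"
    "legei": "lego",    # "he/she says"
}

def count_words(text):
    """Return (word instance count, lemma count) for a whitespace-split text."""
    tokens = text.split()
    # Forms missing from the table are treated as their own lemma.
    lemmas = {LEMMAS.get(token, token) for token in tokens}
    return len(tokens), len(lemmas)

sample = "logos logou logon legei lego logos"
instances, lemma_count = count_words(sample)
print(instances, lemma_count)
```

Six running words, two headwords. A corpus figure like 90 million is the first kind of number; the size of a vocabulary is the second kind, and the first grows with every page copied while the second barely moves.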
Reading the erudite though self-important Esperanto literary journal Literatura Foiro, I came across a quote from Italo Calvino that obviously a widely spoken language would produce greater literature than a small language like Bulgarian. I don’t know anything about Italo Calvino, and after that quote, I didn’t care to. I did recently ask a friend who did know something about Italo Calvino, and it makes sense, given his conscious cosmopolitanism, that Italian would be better at producing the kind of literature he valued than would Bulgarian. The response to that of course is, English would be a lot better still, if contemporary cosmopolitanism is your primary aesthetic criterion. It’s not the only aesthetic criterion in existence, and it’s not like Calvino wrote in English anyway; so I wouldn’t use population counts or readership size or extent of bilingualism to invalidate as inferior the literature of Bulgarian. Or Italian. Or Greek. Or Esperanto.
Besides, what’s “greatness” about? Esperantists were hankering for an Epic of their own, and were overjoyed when William Auld gave them The Infant Race (La Infana Raso) in 1956—although, it being 1956, the cantos it sounded like were Pound’s not Alighieri’s, and it eulogised the perpetuation of the species, not the rage of the son of Thetis. The poem is good; Auld did good short poems too, though they’re not why he kept being nominated for the Nobel Prize. But the jewel of Esperanto Modernism was Victor Sadler’s Self-Criticism (Memkritiko) in 1968; and it was a jewel because it thought Small, not Big. Greek readers whose eyes are glazing over about now might want to compare how much pound per verse you get out of Palamas and Kazantzakis, versus Cavafy and Karyotakis. The fact that they wrote Big is not an argument against Palamas and Kazantzakis, any more than it is against the Mahabharata. But it decidedly isn’t an argument against Cavafy and Karyotakis, either.
Even within the TLG’s ambit, the size of the Byzantine corpus is bigger than the Ancient corpus. A lot bigger. All up, it’d be at least 10 times bigger, depending on how you count. Now, that does not mean the Byzantine corpus is worthless: judgement has been severe on Byzantine literature, and the artificiality of the learnèd language did not help, but a thousand years of writing did not produce nothing. Still, students don’t enrol in Ancient Greek classes to read Theodore Prodromus or John Chrysostom. It would be cool if they did, but they don’t. They enrol to read Plato and Homer, or Mark and Paul. If they do read Prodromus and Chrysostom, it’s after they’ve read Plato and Mark. And there’s a lot less of Plato, or Mark, than of John Chrysostom.
Which brings us back to the actual Classical Greek corpus. As we’ll see, the TLG is not just Classical but also Byzantine Greek, and the actual Classical Greek corpus that has survived is not 90 million words: not even close. What did survive, survived precisely because of how great its impact was, and the impact was out of proportion to its word count. Nor is the actual Classical Greek corpus particularly prolix: at its best, it used words carefully and frugally. It wasn’t in a race to come up with lots of words: Aristophanes has a little fun now and again, but he doesn’t go to town like Constantine of Rhodes did.
The literary corpus from Homer to Aristotle is 5 million words, not 90. Do we really want to say that makes it 18 times less important?
OK, we now put such grocers’ calculations aside. And we move to other grocers’ calculations in the following posts. (Lerna IIIa is the first of four.)
To take us out, some Constantine of Rhodes, which I cited in the Entertaining Tale of Quadrupeds, pp. 91-92, to illustrate… well, the race to come up with lots of words. Leo Choerosphactes, you must have really gotten beneath this guy’s skin: