Tag: TLG
TLG Updates, May 2010
The TLG has just released the latest updates to its text collection. This is what has been added, from the oldest to the most recent texts, with Early Modern Greek texts separate: Philodemus (i BC): On Anger (ed. Indelli, 1988) Philodemus is a Hellenistic philosopher, who we know about mainly thanks to Mt Vesuvius, carbonising […]
What are the longest words of Greek?
Everyone knows (or should know) about the longest word of Greek ever—the word that broke the title bar of Wikipedia, Aristophanes’ fantastical dish of 17 ingredients at the end of the Ecclesiazusae, that lopado-temacho-thing: λοπαδοτεμαχοσελαχογαλεοκρανιολειψανοδριμυποτριμματοσιλφιολιπαρομελιτοκατακεχυμενοκιχλεπικοσσυφοφαττοπεριστεραλεκτρυονοπτοπιφαλλιδοκιγκλοπελειολαγῳοσιραιοβαφητραγανοπτερυγών (172 chars) Ah. It breaks blogspot too. 🙂 Have you ever wondered what the next longest words […]
New TLG words in DGE VII
As I posted last month, the new volume of DGE (Diccionario Griego–Español) has appeared, spanning ἐκπελλεύω–ἔξαυος. As with any lexicographic work of an older language, some philology and textual emendation has been involved; this paper by Eugenio Luján Martínez gives four such instances, in Epicurus, Aretaeus, Nicander, and Galen. I have gone through this volume […]
Comparison, TLG BC and AD: log-likelihood
Helma Dik left a comment on my post on comparing TLG AD and BC through Wordle, suggesting I use Dunning’s Log-Likelihood measure of differential word frequencies in corpora, as Wordled by Martin Mueller. That lets you work out what the real shifts in frequency are, rather than trying to eyeball them through the aggregate word […]
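For readers who have not met the measure, here is a minimal sketch of a log-likelihood comparison for one word across two corpora. It uses the two-term form common in corpus linguistics (observed against expected counts in each corpus), which is a simplification of Dunning's full contingency-table statistic and not necessarily the calculation used in the post. The function name and the counts are hypothetical, chosen only for illustration.

```python
import math


def log_likelihood(count_a, count_b, size_a, size_b):
    """Two-term log-likelihood (G2) for one word in two corpora.

    count_a, count_b: occurrences of the word in corpus A and corpus B.
    size_a, size_b:   total number of word tokens in each corpus.
    All names here are illustrative assumptions, not from the post.
    """
    total = count_a + count_b
    # Expected counts if the word were equally frequent in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)

    g2 = 0.0
    # A zero observed count contributes nothing (the limit of x*ln(x) is 0).
    if count_a > 0:
        g2 += count_a * math.log(count_a / expected_a)
    if count_b > 0:
        g2 += count_b * math.log(count_b / expected_b)
    return 2 * g2


if __name__ == "__main__":
    # Hypothetical counts: a word seen 30 times in one corpus and
    # 10 times in another, each corpus 1,000 tokens long.
    print(log_likelihood(30, 10, 1000, 1000))
```

The score is zero when the word has the same relative frequency in both corpora and grows as the frequencies diverge. It does not say which corpus overuses the word, so in practice one also compares the two relative frequencies to get the direction of the shift.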
TLG updates
The TLG has just released a new update to its corpus. As of tonight, the automatic recognition of lemmata in the TLG which I’ve been working on has just reached 95% of all wordforms. With these two milestones, I’ll be posting a few things about the current corpus; I’ve already put up some Wordles, as […]
Comparison, TLG BC and AD
In the previous post, I used Wordle to illustrate stop words in Greek (and, by the by, the exponential distribution of function words following Zipf’s Law). After getting rid of a whole bunch of stop words, I ended up with a Wordle of the lemmata of the TLG. But I stopped short of making sense of […]
Wordle and Greek stop words
Some of you may be familiar with Wordle, an online tool which displays the words in a text with different sizes, depending on their frequency. Wordle is a convenient tool for seeing what the frequently mentioned concepts are in a text, so it gets a fair amount of use in blogs. It’s the same concept […]
The 23 to 29 Apolloniuses of Classical Literature
I’m parking this posting here for lack of somewhere else to park it. (It’s not strictly language-related, but I’m realising philology posts are probably better pitched here than in The Other Place.) In my day-job capacity, I’m posting on the fluidity of identity in repositories—how, particularly if you’re relying on computer deduplication of identity, there […]
Lerna VIIc: Variants
The various counts of lemmata that I’ve been putting out for the last while have made little mention of the difficulty in deciding whether two forms belong to variants of the same lemma, or distinct lemmata. The judgement call is difficult enough within a homogeneous language, with slight variations in derivational morphology. It’s even worse […]
Lerna VIIb: Lemma counts and proportion of text recognised
We can keep dredging lemmata up to move towards a target of 300,000. But of course for a living language, as Modern Greek now is and as Ancient Greek once was, there is no ceiling in lemmata: people can always make up new words, and do. And because dictionaries will never exhaust what words people […]