Moby Multiple Language Lists of Common Words
This is a monster of a resource: hundreds of thousands of words across five languages, compiled into accessible reference lists. The French section alone contains over 138,000 entries; German tops 160,000. Japanese, Italian, and Spanish bring the total to 560,101 words. What makes this work remarkable is not just its scale but its utility. Originally created as part of the Moby lexical project and released into the public domain, these lists serve anyone who needs comprehensive vocabulary coverage: translators building bilingual databases, linguists studying vocabulary, programmers developing language processing tools, or polyglots constructing their own study materials. The format is deliberately plain text, so the data remains portable and searchable. There is no narrative here and no argument to follow, just lexical raw material. For language learners who have exhausted basic frequency lists and need the next layer of vocabulary, or for anyone building linguistic resources, this compilation offers a word stock that would take an enormous effort to assemble manually. It is less a book to read than a tool to use.
About Moby Multiple Language Lists of Common Words
Chapter Summaries
- Documentation Notes
- The documentation opens with a declaration that the MOBY Language II software and database are public domain material, granted by the author in January 2001. This establishes the legal status and free availability of the word lists.
- Historical Note
- This section provides historical context about the Ward word lists, noting they were among the largest public domain word lists when added to Project Gutenberg in 2007. It explains the technical limitations of the era, including the use of phonetic spelling with backslashes instead of proper Unicode characters. The note acknowledges these lists are no longer state-of-the-art but may still have utility.
- Contents and Quick Start
- The final section details the practical contents of MOBY Language II, listing word counts and file sizes for French, German, Italian, Japanese, and Spanish vocabularies. It provides installation instructions for MSDOS systems and notes the total collection contains 560,101 words across 5.9 megabytes.
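For readers who want to check per-language word counts like those quoted above, here is a minimal Python sketch. It assumes each list is plain text with one entry per line, which the documentation summary suggests but does not spell out. The sample words and the names `load_word_list` and `summarize` are illustrative only and are not taken from the Moby files.

```python
import io


def load_word_list(stream):
    """Return the stripped, non-empty lines of a word-list stream.

    Assumption: one entry per line. Blank lines are skipped.
    """
    return [line.strip() for line in stream if line.strip()]


def summarize(lists):
    """Count the entries per language and add a grand total."""
    counts = {name: len(words) for name, words in lists.items()}
    counts["total"] = sum(counts.values())
    return counts


# Tiny in-memory stand-ins for the real files (hypothetical sample data).
sample = {
    "french": io.StringIO("abaisser\nabandon\n\nabattre\n"),
    "german": io.StringIO("Abend\nAbenteuer\n"),
}

lists = {name: load_word_list(stream) for name, stream in sample.items()}
print(summarize(lists))
```

To run this against the actual files, replace each `io.StringIO` with an opened file. The right encoding is not stated here, so opening with `errors="replace"` is a cautious starting point.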
Key Themes
- Public Domain Knowledge
- The work represents a philosophy of freely shared linguistic resources, with the author explicitly granting public domain status to enable unrestricted use and distribution.
- Technological Evolution
- The documentation captures a moment in computing history where ASCII limitations required creative workarounds for representing non-English characters, highlighting how technology shapes our ability to preserve and share language.
- Multilingual Preservation
- The collection preserves vocabulary from five major world languages, representing an early effort at comprehensive multilingual digital archiving.
Characters
- Grady Ward (mentioned)
- The author and creator of the MOBY Language word lists. He granted the work to the public domain in January 2001.











