Mesoamerican Languages Documentation Project:
the dictionaries

Terrence Kaufman and John Justeson, directors

Each Project dictionary will be accessible online at this website. Hardcopy dictionaries for the Mije-Sokean languages will be published in the monograph series of SUNY Albany's Institute for Mesoamerican Studies, which is distributed by the University of Texas Press. We intend to arrange for hardcopy publication of dictionaries from the other language families at a later date.

Data collection

These dictionaries do not claim to be complete, but it is fair to say that they are quite large. They range in size between 4000 and 8000 lexical items. Some are the product of 5 solid months of elicitation [8 hours a day, 6 days a week]. Others are based on considerably more elicitation.

These dictionaries are innovative in several ways.

Every effort has been made to collect all the morphemes of the language.

Identifying all the grammatical morphemes of a language is not a major problem. It requires careful elicitation and a fairly extensive collection of texts in various genres.

Getting all the root morphemes requires testing for all possible roots; the possible root shapes must be tested in a fairly large number of grammatical contexts in order to determine both their existence and their proper classification. Our procedures test for all possible monosyllabic roots, and those disyllabic roots ending in a vowel: at least 95% of the total native root stock.
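
The exhaustive test for root shapes can be pictured as enumerating every candidate form and checking each one with a speaker. The following is a minimal sketch in Python, not the Project's actual procedure. The phoneme inventory and the two templates (CVC for monosyllabic roots, CVCV for vowel-final disyllabic roots) are assumptions chosen for illustration; real inventories and root shapes differ from language to language.

```python
from itertools import product

# Hypothetical inventory, for illustration only; not that of any PDLMA language.
CONSONANTS = ["p", "t", "k", "m", "n", "s"]
VOWELS = ["a", "i", "u"]

def candidate_roots(template):
    """Yield every string matching a template such as 'CVC' or 'CVCV'."""
    slots = [CONSONANTS if ch == "C" else VOWELS for ch in template]
    for combo in product(*slots):
        yield "".join(combo)

# Candidate shapes to be checked with a speaker in several grammatical contexts.
monosyllabic = list(candidate_roots("CVC"))
disyllabic_vowel_final = list(candidate_roots("CVCV"))
print(len(monosyllabic), len(disyllabic_vowel_final))
```

Even this small inventory yields several hundred candidates, which gives a sense of why elicitation of this kind takes months.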

Systematic elicitation of ethnobiological and ethnomedical terminology and concepts is undertaken. Plant and animal names make up as much as 25% of a neotropical language's lexicon.

Sound symbolism is studied: this constitutes a domain of variable size cross-linguistically, but one that fills out the evidence for, and often tests, models of grammatical structure.

Other semantic domains are also investigated -- given names, surnames, nicknames, place names, kinship, astronomy.

All lexical material and all roots are carefully and repeatedly tested for their grammatical behavior; the results of this testing are used to determine the classes of roots and lexemes, which are encoded for each item. Outside of the Mayan field, this kind of research is novel for Mesoamerican languages, or at least is rarely reported on.

Any existing material that has been collected by linguists and published or otherwise made available to us has been checked with our language consultants.

Older documentation, especially from the colonial period, is gone over for what it might yield.

One feature that these dictionaries do not have is systematic exemplification of all lexemes. The examples that appear in these dictionaries were extracted from texts, offered during elicitation, or elicited with the purpose of establishing grammatical or semantic parameters/features. Different dictionaries coming out of the PDLMA have greater or lesser numbers of examples for the lexical material.

In all cases, the dictionaries are based on a thorough knowledge of the inflexional and derivational patterns of the language.

For many of the languages documented lexically and in texts by the PDLMA, preparations of grammatical descriptions have been undertaken by the linguists primarily responsible for the dictionaries. The preparation of such grammars is not currently part of the scope of the PDLMA, but may become so in the future if circumstances are favorable.

As a component of the Project, Kaufman has undertaken the reconstruction of Mije-Sokean phonology, grammar, and lexicon; this work has been in progress, though not continuously, since 1959. Kaufman has likewise been reconstructing Sapotekan since 1965, also not continuously, and it is hoped and expected that other Project members will join this effort using the documentation produced by the Project. For Nawa, a pan-dialectal dictionary is a logical outcome of the work of this Project, when combined with existing materials from other forms of Nawa.

These dictionaries may appear in more than one edition. All the data that they contain is believed to be accurate. Data that has not been fully checked out is omitted until fully verified; it will appear in later editions. On-line versions of the lexical databases are also being made available.

The research plan for production of the dictionaries was designed by Kaufman, and Kaufman is the final editor of all the dictionaries. Justeson has been responsible for overseeing the development of the databases, and configuring them for printing and on-line access.

Online editions of the dictionaries

As of the time of the start-up of this website,

[1] each online dictionary consists of several thousand lexical entries that are thoroughly edited and vetted. Some lexical material for the language may be on hand that has not been thoroughly checked out and so is not available on this site;

[2] each dictionary is provided with an introduction that explains how it is set up and how to use it.

Eventually, each dictionary will contain a structural sketch of the language it represents.

Each dictionary will also contain an introductory chapter providing a structural outline of the language family the language belongs to.

Several of the dictionaries represent languages for which research is ongoing. Although it is believed that as of the time of their first issue/posting, virtually all native lexical/root morphemes, and essentially all grammatical morphemes are included in the dictionary, it is certain that more lexical material will be uncovered as more texts are collected and analyzed, and as additional semantic fields are looked into.

In light of this, it is expected that at least some of the lexical databases will be updated from time to time. Those interested in keeping abreast of such updates should check this website periodically. It is unlikely that updates will be posted more often than once a year.

The grammatical classifications of lexemes, roots, and affixes are occasionally overdifferentiated and not all redundancies have been eliminated, nor have all specifiable generalizations been worked out. The work is ongoing.

English glosses are not fully reliable in more instances than we would like. We will be working on remedying this for the next edition. The data was gathered through the medium of Spanish, and the Spanish glosses we use were provided by the speakers, and subsequently tweaked as needed. The dictionaries would be adequate if glossed only in Spanish: however, providing English glosses has two advantages -- it makes the material accessible to non-users of Spanish who know English, and it requires the analyst to focus on whether s/he really knows what the word means.

Users of this website may feel that there are certain ways in which the structure and content of our postings could be improved. We will be happy to receive such suggestions, at [email protected]. We will acknowledge any suggestions that we follow and had not already thought of beforehand. We will not necessarily acknowledge suggestions that we choose not to follow; however if any such suggestions are made by several different people, we may offer a brief statement as to why we did not choose to follow such advice. In such cases we will identify the proponents of the positions we do not accept only if they ask to be named.


The phonological representations of the languages documented by PDLMA are practical ASCII-based orthographies with a minimum (ideally an absence) of non-linear diacritics.

Page layout

The alphabetical order (sort order) used in the dictionary is cited at the bottom of each page.
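
Because the orthographies include letters written with more than one character, a plain character-by-character sort will not reproduce the cited order. The sketch below shows one way to sort by a custom alphabet in Python. The alphabet given here is hypothetical; the actual sort order is the one cited on each dictionary page.

```python
# Hypothetical sort order; each dictionary cites its own at the foot of the page.
ALPHABET = ["a", "ch", "e", "i", "k", "m", "n", "o", "p", "s", "t", "tz", "u", "7"]

def sort_key(word, alphabet=ALPHABET):
    """Turn a word into a list of ranks, matching the longest letter first."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    letters = sorted(alphabet, key=len, reverse=True)
    key, pos = [], 0
    while pos < len(word):
        for letter in letters:
            if word.startswith(letter, pos):
                key.append(rank[letter])
                pos += len(letter)
                break
        else:
            # Characters outside the alphabet sort after everything else.
            key.append(len(alphabet))
            pos += 1
    return key

# Invented strings, used only to exercise the sort.
words = ["tzima", "tata", "chan", "kama", "7aka"]
print(sorted(words, key=sort_key))
```

Matching the longest letter first keeps a digraph such as "tz" from being read as "t" followed by an unknown character.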

The dictionary entry (not all fields are represented):

Lemmas (entry keys) are cited in alphabetical order in bold type at the left margin. All known morphemes, desinences, and lexical items appear as entries. Roots that do not function as lexical items without some kind of derivational material being added to them are cited with a preceding <%>. Different entries with the same spellings are distinguished with a following [1], [2], etc. Bound morphemes are marked on either the left or right edge by a code for their status: compounded or incorporated (prepound, postpound, incorporee) <=>, derivational/lexical <.>, inflexional <->, shifter <>>, clitic <=>.
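
For readers who want to process lemmas programmatically, the status codes described above can be peeled off mechanically. The following Python sketch assumes that the codes appear as literal characters at the edges of the lemma string and that the homograph number follows in square brackets; the exact layout of the online entries may differ, and the sample strings are invented. Note that <=> is listed above for both compounded/incorporated elements and clitics, so the marker alone does not distinguish them.

```python
import re

# Edge markers as listed in the text. "=" covers two statuses there,
# so this sketch reports both possibilities rather than guessing.
EDGE_MARKERS = {
    "=": "compounded/incorporated or clitic",
    ".": "derivational/lexical",
    "-": "inflexional",
    ">": "shifter",
}

def parse_lemma(lemma):
    """Split a lemma string into its form and the status codes around it."""
    info = {"bound_root": False, "homograph": None, "left": None, "right": None}
    match = re.search(r"\[(\d+)\]$", lemma)
    if match:
        # Trailing [1], [2], ... distinguishes entries with the same spelling.
        info["homograph"] = int(match.group(1))
        lemma = lemma[: match.start()].rstrip()
    if lemma.startswith("%"):
        # A preceding % marks a root that needs derivational material.
        info["bound_root"] = True
        lemma = lemma[1:]
    if lemma and lemma[0] in EDGE_MARKERS:
        info["left"] = EDGE_MARKERS[lemma[0]]
        lemma = lemma[1:]
    if lemma and lemma[-1] in EDGE_MARKERS:
        info["right"] = EDGE_MARKERS[lemma[-1]]
        lemma = lemma[:-1]
    info["form"] = lemma
    return info

# Invented lemma strings, for illustration only.
print(parse_lemma("%pat [2]"))
print(parse_lemma("-pa"))
```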

The representation of the morpheme or lexical item is underlying, or at least with all phonological processes operating at their margins under affixation or cliticization unpacked. Internal phonological processes may be unpacked or not, according to the decision of the individual linguist, in consultation with the overall editor (Kaufman).

Surface phonology. When the surface phonology is fairly different from the underlying representation, it may be provided between forward slashes /ABC/ when taxonomic phonemic, and between square brackets [ABC] when showing allophonic representation.

Variant forms. Unpredictable variant pronunciations are given, when known. As needed, they are also listed as lexical entries, but all detailed information is cross-referenced to the "main" (or canonical) pronunciation.

Grammatical class. Next is provided a code indicating for a lexeme its inflexional behavior and certain aspects of its derivational history. A separate key to grammatical classes is provided for each dictionary.

Principal parts. Any inflected or "shifted" forms needed to establish the grammatical behavior of the lexeme (and support the analysis given) are cited, with codes as to their grammatical content following them in parentheses.

Gloss(es). Senses (distinguishable meanings) of a lexeme are subdivided 1, 2, 3, etc. Each sense is glossed in Spanish, then in English, with a double forward slash between the Spanish and the English.

Synonyms. Known synonyms are cited, with reference to senses distinguished under Gloss(es), when necessary.

Semantic Field(s). The semantic field(s), especially of ethnobiological terms, may be supplied, keyed to any multiple senses of the lexeme.

Example(s). Examples, usually example sentences, are numbered as needed. They appear in either underlying or surface phonological representation, or both. The Spanish and English glosses/translations of the examples are separated by a double forward slash.

Supplemental forms. Forms of a word that do not necessarily occur according to the class it belongs to, but if they do, do not create new lexical items -- such as participles, gerunds, passives, and antipassives -- are cited under the main lexical form, and not given separate entries unless they have special semantics or syntax.

Grammatical class(es) of supplemental form(s).

Gloss(es) of supplemental form(s).

Example(s) of supplemental form(s).

Historical source. When known we cite the historical source, whether an ancestral stage of the language in question, or the source of a borrowing from a known (or suspected/hypothesized) other language.

Root(s). For each lexeme, all its roots are named, classified grammatically, and glossed (unless the lexeme is a single root). A separate key to root classes/types is provided for each dictionary.

Morpheme-by-morpheme gloss. The morpheme-by-morpheme breakdown normally falls out of the representation of the lemma; the morpheme-by-morpheme gloss is also provided here.

Cross-references. References may be made to places in the dictionary where more information or related information is to be found.

Data source. The source for the data in the dictionary is given, whether collected by the compiler, or found in an earlier source. A separate key to data sources is provided for each dictionary.

Superordinate forms. The immediately antecedent lexical form(s) to the current lexical entry is (are) named. Any further information about them should be sought at their position in the lexicon.

Subordinate forms. Lexical items that are based on the current lexical entry are named. Any further information about them should be sought at their position in the lexicon.
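
The fields listed above can be thought of as a single record per entry. The sketch below models such a record in Python, purely as an aid to readers; it is not the Project's database schema. All class and field names are our own assumptions, only a subset of the fields is shown, and the sample values are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Sense:
    spanish: str
    english: Optional[str] = None

    @classmethod
    def from_gloss(cls, gloss: str) -> "Sense":
        """Build a sense from a 'Spanish // English' gloss string."""
        spanish, sep, english = gloss.partition("//")
        return cls(spanish.strip(), english.strip() if sep else None)

@dataclass
class Entry:
    lemma: str
    grammatical_class: str
    senses: List[Sense] = field(default_factory=list)
    surface_form: Optional[str] = None
    variants: List[str] = field(default_factory=list)
    principal_parts: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    historical_source: Optional[str] = None
    roots: List[str] = field(default_factory=list)
    cross_references: List[str] = field(default_factory=list)
    data_source: Optional[str] = None
    superordinate: List[str] = field(default_factory=list)
    subordinate: List[str] = field(default_factory=list)

# Placeholder values only; not data from any dictionary.
entry = Entry(
    lemma="pata",
    grammatical_class="n",
    senses=[Sense.from_gloss("ejemplo // example")],
)
print(entry.senses[0].spanish, "|", entry.senses[0].english)
```

Optional fields default to empty, which mirrors the statement that not every field is represented in every entry.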

You may search the following online dictionaries:
  • Oluta Popoluca
  • San Miguel Chimalapa Zoque

    This page was last revised on February 20, 1998