Project for the Documentation of the Languages of Mesoamerica (PDLMA)
John Justeson and
Roberto Zavala Maldonado, directors
El Proyecto para la Documentación de las Lenguas de Mesoamérica (PDLMA)
This site presents the aims, history, and results of research by the
Project for the Documentation of the Languages of Mesoamerica,
internally known as the "Snake Jaguar Project". The pages that describe the aims and history of the Project, and instructions
for access to and use of posted materials, are updated at moderately frequent intervals.
The online databases and the NO FRAMES version that we are making available will be updated only at intervals of about a year or more.
Papers by project members are posted
This page was last revised on April 17, 2001.
Aims and history
In 1993 we began a project to document the lexicon, phonology, and morphosyntax of selected Mije-Sokean languages; by 1995 the project
had been extended to all living Mije-Sokean languages. Besides the value of the work for its own sake, this documentation was undertaken in
order to facilitate a reconstruction of proto-Mije-Sokean. This reconstruction and the documentation of the individual
Mije-Sokean languages were to serve as a resource for revising and extending the decipherment of Epi-Olmec writing (Justeson and
Kaufman 1993, 1996, 1997).
In 1995 the Project began research on 5 [JCH, CHI, CHO, LCH, ZEN] of a projected 11 Sapotekan languages. These were to be
documented, the ancestral proto-Sapotekan language was to be reconstructed, and the reconstruction, along with the documentation of
the individual Sapotekan languages, was to help in the decipherment of Sapoteko hieroglyphic writing, which had been under way since
1992. In 1996 research on 4 more Sapotekan languages [ATE, ZAN, COA, YAI] was started. Work is not yet effectively under way on
CUI and YTZ. There are arguably more than 11 Sapotekan languages, and they fall into 6 branches. Since we could not reasonably
document them all, at least one language from each of the branches had to be documented, plus any additional languages that promised
to be straightforwardly useful for reconstructing proto-Sapotekan, proto-Sapoteko, and proto-Chatino. This set of languages contained
In 1997 we began work on Matlatzinka [MTL] and Mecayapan Gulf Nawa [MEC].
In 1998 we began work on Tlawika (Okwilteko) [TLW].
In 1999 we began work on Zongolica Nawa [ZNG], Huehuetla Tepewa [HUE] and Otlaltepec Popoloka [OTL].
In 2000 we will begin work on Zapotitlán Totonaco [ZPT] and Yatzachi-Zoogocho Northern Sapoteko [ZOO].
The preparation of a dictionary for each language was undertaken by a different linguist; some of these linguists were
advanced graduate students, others post-PhD professionals; one is a beginning graduate student.
A major feature of the Project is that a set of specialists in each language family is trained in the context of regular and
long-term interaction, helping to generate a body of lore that is tested through discussion and comparison of the results of
Although not all of these languages are radically underdocumented, it is fair to say that there is not yet a theory or model of
Mije-Sokean, Sapotekan, or Oto-Pamean grammar. We hope that such a model will emerge from the work we have begun.
Languages being investigated and their abbreviations/codes
||Texistepec Gulf Sokean
||Soteapan Gulf Sokean
||Ayapa Gulf Sokean
||Santa María Chimalapa Soke
||San Miguel Chimalapa Soke
||Southern: San Baltasar Loxicha
|Nawa languages and dialects|
||Mecayapan Gulf Nawa
||Pajapan Gulf Nawa
||Chontla Eastern Huasteca Nawa
||Chicontepec Eastern Huasteca Nawa
||Los Ajos Eastern Huasteca Nawa
||Coxcatlán Western Huasteca Nawa
||Tampamolón Western Huasteca Nawa
||Tuzantla Western Huasteca Nawa
||Cuatlamayán Western Huasteca Nawa
Sources of funding
The work of the Project has been supported by major grants from The National Geographic Society [NGS] and The National Science
Foundation [NSF], with smaller amounts of narrowly-targeted and occasional funds from the University of Pittsburgh and
|NSF ||#SBR-9411247 ||(1994-1995)|
| ||#SBR-9511713 ||(1995-1998)|
| ||#SBR-9809985 ||(1998-2001)|
An online version of each Project dictionary will be accessible at this website. Hardcopy dictionaries for the Mije-Sokean languages
will be published in the monograph series of SUNY Albany's Institute for Mesoamerican Studies, which is distributed by the University
of Texas Press. We intend to arrange for hardcopy publication of dictionaries from the other language families at a later date.
These dictionaries do not claim to be complete, but it is fair to say that they are quite large. They range in size between 5,000 and 10,500
lexical items. Some are the product of 5 solid months of elicitation [8 hours a day, 6 days a week]. Others are based on considerably
These dictionaries are novel/innovative in several ways.
Every effort has been made to collect all the morphemes of the language.
Identifying all the grammatical morphemes of a language is not a major problem. It requires careful elicitation and a fairly extensive
collection of texts in various genres.
Getting all the root morphemes requires testing for all possible roots; the possible root shapes must be tested in a fairly large number of
grammatical contexts in order to determine both their existence and their proper classification. Our procedures test for all possible
monosyllabic roots, and those disyllabic roots ending in a vowel: at least 95% of the total native root stock.
Systematic elicitation of ethnobiological and ethnomedical terminology and concepts is undertaken. Plant and animal names make up as
much as 25% of a neotropical language's lexicon.
Sound symbolism is studied: this constitutes a domain of variable size cross-linguistically, but one that fills out the evidence for, and
often tests, models of grammatical structure.
Other semantic domains are also investigated -- given names, surnames, nicknames, place names, kinship, astronomy.
All lexical material and all roots are carefully and repeatedly tested for their grammatical behavior; the results of this testing are used to
determine the classes of roots and lexemes, and these are encoded for each item. Outside of the Mayan field, this kind of research is
novel for Mesoamerican languages, or at least is rarely reported on.
Any existing material that has been collected by linguists and published or otherwise made available to us has been checked with our
Older documentation, especially from the colonial period, is gone over for what it might yield.
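The root-testing procedure described above depends on having a complete checklist of possible root shapes. The following is a minimal sketch of how such a checklist could be enumerated. The consonant and vowel inventories and the three templates (CV, CVC, CVCV) are assumptions chosen for illustration; they are not the inventories or templates of any language documented by the Project.

```python
from itertools import product

# Hypothetical inventories, for illustration only.
CONSONANTS = ["p", "t", "k", "s", "m", "n"]
VOWELS = ["a", "e", "i", "o", "u"]

# Assumed templates: monosyllables (CV, CVC) and vowel-final disyllables (CVCV).
TEMPLATES = ["CV", "CVC", "CVCV"]


def candidate_roots(template):
    """Yield every string that fits a template made of C and V slots."""
    slots = [CONSONANTS if slot == "C" else VOWELS for slot in template]
    for combo in product(*slots):
        yield "".join(combo)


def all_candidates():
    """Return the candidate root shapes, grouped by template."""
    return {template: list(candidate_roots(template)) for template in TEMPLATES}


checklist = all_candidates()
for template, shapes in checklist.items():
    # Each candidate would then be tested with a speaker in several grammatical contexts.
    print(template, len(shapes))
```

Even with these small inventories the checklist runs to more than a thousand candidates, which gives some sense of why exhaustive root elicitation takes months.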
One feature that these dictionaries do not have is systematic exemplification of all lexemes. The examples that appear in these
dictionaries were extracted from texts, offered during elicitation, or elicited with the purpose of establishing grammatical or semantic
parameters/features. Different dictionaries coming out of the PDLMA have greater or lesser numbers of examples for the lexical
In all cases, the dictionaries are based on a thorough knowledge of the inflexional and derivational patterns of the language.
For many of the languages documented lexically and in texts by the PDLMA, preparations of grammatical descriptions have been
undertaken by the linguists primarily responsible for the dictionaries. The preparation of such grammars is not currently part of the
scope of the PDLMA, but may become so in the future if circumstances are favorable.
As a component of the Project, the reconstruction of Mije-Sokean phonology, grammar, and lexicon has been undertaken by Kaufman,
and has in fact been in progress since 1959, but not continuously. The reconstruction of Sapotekan has been under way since 1965 by
Kaufman, also not continuously, and it is hoped and expected that other Project members will be involved in this effort using the
documentation produced by the Project. For Nawa, a pan-dialectal dictionary is a logical outcome of the work of this Project, when
combined with existing materials from other forms of Nawa.
These dictionaries may appear in more than one edition. All the data that they contain is believed to be accurate. Data that has not been
fully checked out is omitted until fully verified; it will appear in later editions. On-line versions of the lexical databases are also being
The research plan for production of the dictionaries was designed by Kaufman, and Kaufman is the final editor of all the dictionaries.
Justeson has been responsible for overseeing the development of the databases, and configuring them for printing and on-line access.
As of the time of the start-up of this website,
 each online dictionary consists of several thousand lexical entries that are thoroughly edited and vetted. Some lexical material for the
language may be on hand that has not been thoroughly checked out and so is not available on this site;
 each dictionary is provided with an introduction that explains how it is set up and how to use it.
Eventually, each dictionary will contain a structural sketch of the language it represents.
Each dictionary will also contain an introductory chapter providing a structural outline of the language family the language belongs to.
Several of the dictionaries represent languages for which research is ongoing. Although it is believed that as of the time of their first
issue/posting, virtually all native lexical/root morphemes, and essentially all grammatical morphemes are included in the dictionary, it is
certain that more lexical material will be uncovered as more texts are collected and analyzed, and as additional semantic fields are
In light of this, it is expected that at least some of the lexical databases will be updated from time to time. Those interested in keeping
abreast of such updates should return to this website from time to time. It is unlikely that updates will be posted more often than once a
The grammatical classifications of lexemes, roots, and affixes are occasionally overdifferentiated and not all redundancies have been
eliminated, nor have all specifiable generalizations been worked out. The work is ongoing.
More of the English glosses than we would like are not totally reliable. We will be working on remedying this
for the next edition. The data was gathered through the medium of Spanish, and the Spanish glosses we use were provided by the
speakers, and subsequently tweaked as needed. The dictionaries would be adequate if glossed only in Spanish: however, providing
English glosses has two advantages -- it makes the material accessible to non-users of Spanish who know English, and it requires the
analyst to focus on whether s/he really knows what the word means.
Users of these online dictionaries may feel that there are certain ways in which the structure and content of our postings could be improved.
We will be happy to receive such suggestions, at
We will acknowledge any suggestions that we follow and had not already
thought of. We will not necessarily acknowledge suggestions that we choose not to follow; however, if any such suggestions
are made by several different people, we may offer a brief statement as to why we did not choose to follow such advice. In such cases
we will identify the proponents of the positions we do not accept only if they ask to be named.
The phonological representations of the languages documented by PDLMA are practical ASCII-based orthographies with a minimum
(ideally an absence) of non-linear diacritics.
The alphabetical order (sort order) used in the dictionary is cited at the bottom of each page.
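Because the orthographies are ASCII-based, a letter of the practical alphabet may be written with more than one character, so plain character-by-character sorting need not match the dictionary's sort order. The sketch below shows one way to sort by a stated alphabet. The alphabet and the words are hypothetical; each dictionary cites its own order.

```python
# Hypothetical sort order, for illustration only.
ALPHABET = ["a", "ch", "e", "i", "k", "m", "n", "o", "p", "s", "t", "ts", "u"]

RANK = {letter: position for position, letter in enumerate(ALPHABET)}
# Try longer letters first so that "ts" is not read as "t" followed by "s".
BY_LENGTH = sorted(ALPHABET, key=len, reverse=True)


def tokenize(word):
    """Split a word into letters of the orthography, longest match first."""
    letters = []
    i = 0
    while i < len(word):
        for letter in BY_LENGTH:
            if word.startswith(letter, i):
                letters.append(letter)
                i += len(letter)
                break
        else:
            raise ValueError(f"no letter of the alphabet matches at {word[i:]!r}")
    return letters


def sort_key(word):
    """Map a word to the list of alphabet positions of its letters."""
    return [RANK[letter] for letter in tokenize(word)]


words = ["tsima", "tuku", "chaka", "ama", "sapi"]  # invented forms
print(sorted(words, key=sort_key))
```

With this assumed order, "tuku" precedes "tsima" because "ts" is treated as a single letter that follows "t", whereas plain ASCII sorting would place "tsima" first.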
The dictionary entry (not all fields are represented):
Lemmas (entry keys) are cited in alphabetical order in bold type at the left margin. All known morphemes, desinences, and lexical items
appear as entries. Roots that do not function as lexical items without some kind of derivational material being added to them are cited
with a preceding <%>. Different entries with the same spellings are distinguished with a following , , etc. Bound morphemes are
marked on either the left or right edge by a code for their status: compounded or incorporated (prepound, postpound, incorporee) <=>,
derivational/lexical <.>, inflexional <->, shifter <>>, clitic <+>.
The representation of the morpheme or lexical item is underlying, or at least with all phonological processes operating at their margins
under affixation or cliticization unpacked. Internal phonological processes may be unpacked or not, according to the decision of the
individual linguist, in consultation with the overall editor (Kaufman).
Surface phonology. When the surface phonology is fairly different from the underlying representation, it may be provided between
forward slashes /ABC/ when taxonomic phonemic, and between square brackets [ABC] when showing allophonic representation.
Variant forms. Unpredictable variant pronunciations are given, when known. As needed, they are also listed as lexical entries, but all
detailed information is cross-referenced to the "main" (or canonical) pronunciation.
Grammatical class. A code follows that indicates the lexeme's inflexional behavior and certain aspects of its derivational
history. A separate key to grammatical classes is provided for each dictionary.
Principal parts. Any inflected or "shifted" forms needed to establish the grammatical behavior of the lexeme (and support the analysis
given) are cited, with codes as to their grammatical content following them in parentheses.
Gloss(es). Senses (distinguishable meanings) of a lexeme are subdivided 1, 2, 3, etc. Each sense is glossed in Spanish, then in English,
with a double forward slash between the Spanish and the English.
Synonyms. Known synonyms are cited, with reference to senses distinguished under Gloss(es), when necessary.
Semantic Field(s). The semantic field(s), especially of ethnobiological terms may be supplied, keyed to any multiple senses of the
Example(s). Examples, usually example sentences, are numbered as needed. They appear in either underlying or surface phonological
representation, or both. The Spanish and English glosses/translations of the examples are separated by a double forward slash.
Supplemental forms. Forms whose occurrence is not guaranteed by the class a word belongs to but which, when they do occur, do not create
new lexical items -- such as participles, gerunds, passives, and antipassives -- are cited under the main lexical form, and not given
separate entries unless they have special semantics or syntax.
Grammatical class(es) of supplemental form(s).
Gloss(es) of supplemental form(s).
Example(s) of supplemental form(s).
Historical source. When known we cite the historical source, whether an ancestral stage of the language in question, or the source of a
borrowing from a known (or suspected/hypothesized) other language.
Root(s). For each lexeme, all its roots are named, classified grammatically, and glossed (unless the lexeme is a single root). A separate
key to root classes/types is provided for each dictionary.
Morpheme-by-morpheme gloss. The morpheme-by-morpheme breakdown normally falls out of the representation of the lemma; the
morpheme-by-morpheme gloss is also provided here.
Cross-references. References may be made to places in the dictionary where more information or related information is to be found.
Data source. The source for the data in the dictionary is given, whether collected by the compiler, or found in an earlier source. A
separate key to data sources is provided for each dictionary.
Superordinate forms. The immediately antecedent lexical form(s) to the current lexical entry is (are) named. Any further information
about them should be sought at their position in the lexicon.
Subordinate forms. Lexical items that are based on the current lexical entry are named. Any further information about them should be
sought at their position in the lexicon.
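Two of the conventions described above lend themselves to mechanical processing: the status codes attached to the edge of a bound morpheme, and the double forward slash between Spanish and English glosses. The following is a minimal sketch of how a reader might decode them. It assumes that the codes appear as single characters at the edge of the lemma (without the angle brackets used to cite them above); the example forms are invented and the function names are hypothetical. It does not describe the Project's actual database format.

```python
from dataclasses import dataclass
from typing import Optional

# Status codes for bound morphemes, as listed in the entry description.
EDGE_CODES = {
    "=": "compounded or incorporated",
    ".": "derivational/lexical",
    "-": "inflexional",
    ">": "shifter",
    "+": "clitic",
}


@dataclass
class Lemma:
    form: str               # the lemma with its markers stripped
    bound_root: bool        # True when cited with a preceding %
    status: Optional[str]   # status of a bound morpheme, if marked
    edge: Optional[str]     # "left" or "right": where the code was found


def parse_lemma(raw):
    """Strip a leading % and one edge code, recording what each indicated."""
    bound_root = raw.startswith("%")
    form = raw[1:] if bound_root else raw
    status = edge = None
    if len(form) > 1:
        if form[0] in EDGE_CODES:
            status, edge, form = EDGE_CODES[form[0]], "left", form[1:]
        elif form[-1] in EDGE_CODES:
            status, edge, form = EDGE_CODES[form[-1]], "right", form[:-1]
    return Lemma(form, bound_root, status, edge)


def split_gloss(gloss):
    """Split a 'Spanish // English' gloss into its two halves."""
    spanish, _, english = gloss.partition("//")
    return spanish.strip(), english.strip()


print(parse_lemma("-pa"))            # invented form with an inflexional code
print(parse_lemma("%tuk"))           # invented bound root
print(split_gloss("casa // house"))
```

The sketch records only which edge carries the code; how that edge relates to the morpheme's position in a word is left to the introduction of each dictionary.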
Click to learn how to access the online databases and the NO FRAMES version.