Lay Summary
This paper studies how the Generic Preposition of Epigraphic Mayan developed during the Classic period (ce 200–900). Reconstructed as *tja to Proto-Mayan, it had shifted to *tə in Proto-Ch’olan, but soon after had diverged into two variants, tə and ti, with the former already attested during the Late Preclassic period (300 bce-ce 200), and the latter appearing first around ce 379 and ce 416. The paper traces the spread of the innovative variant ti across time and space, and also attempts to assess whether social factors influenced its distribution. Because of the paucity of information on the social profiles of the scribes, the paper utilizes proxies —indirect means of assessing social factors. The paper shows that the Generic Preposition variable was unstable, a case of change-in-progress, during the Classic period, and that its popular spread may have started in the Southeast region, at the cities of Copan and Quirigua, and it also shows that social factors were likely important in its adoption, and that it can be defined as a “change from above,” likely promoted by scribes in more official or formal contexts. Last, thanks to the distribution of the conservative variant tə, the paper also reveals a previously unidentified pattern pointing to the influence of the scribes and elites from the West region of the Maya lowlands, where the ancestors of the contemporary Yokot’an language resided, on the inscriptions and probably also politics of the Northern lowlands region, where the ancestors of the contemporary Yucatecan languages resided at the time.
Introduction
This paper investigates the historical sociolinguistics of Epigraphic Mayan (henceforth EMY, after its ISO 639-3 code, emy). EMY is a “logosyllabic” writing system that was innovated and used between ca. 400 bce–ce 1700 by Ch’olan(-Tzeltalan) and Yucatecan speakers, two of the subgroups of the Mayan language family, primarily in the Maya lowlands region (in parts of southeastern Mexico, northern Guatemala, Belize, western Honduras).[1] More specifically, as part of a broader project investigating multiple scriptal and linguistic variables (Mora-Marín, 2011, 2017, 2019, 2020, 2021a, 2021b, 2023a, 2025a, 2025b, n.d.), this paper studies the spread of the innovative variant of a morphological variable, the Generic Preposition (GP), tä (tə) ~ ti, in the Maya lowlands during the Classic period (ca. ce 200–900), and its possible sociohistorical associations and motivations.
Historical sociolinguistics can be traced to the articulation of Weinreich, Labov, and Herzog’s (1968) “structured” or “orderly heterogeneity,” one that is not only linguistically but also socially motivated, and one that considers the problems of transition, embedding, and evaluation of innovations (Romaine, 2005, p. 1696; Roberge, 2006, p. 2310). To investigate such problems, the temporal dimension is crucial. Assuming that “synchronic variation of the type investigated by sociolinguists represents a stage in long term change” (Romaine, 2005, p. 1696), a real-time variationist study would contribute to both sociolinguistics and historical linguistics, and in the process further contribute to our understanding of the sociocultural history of a society. Ironically, Nevalainen and Raumolin-Brunberg (2003, p. 56) have argued, it is such a real-time approach that has been neglected in historical linguistics. Those authors have even defined historical sociolinguistics as “the real-time dimension of sociolinguistics” (Nevalainen; Raumolin-Brunberg, 2012, p. 26), a discipline that could very well fill in the “real-time” gap.
In this regard, EMY texts offer an exciting testing ground for historical sociolinguists: 1) there is a readily accessible and comprehensive online database of EMY texts, the Maya Hieroglyphic Database (MHD) by Looper and Macri (1991–2025); 2) a majority of texts bear absolute dates correlated with the Gregorian calendar, allowing for precise characterizations of the real-time distributions of variables; 3) though texts tend to be brief, they span a wide temporal and geographic range, allowing for comprehensive treatments of regional developments; 4) despite ongoing debates and uncertainties about linguistic affiliations, historical stages, and orthographic conventions, a great deal of scriptal and linguistic variation is attested in the texts, making for a fruitful corpus for historical sociolinguistics research; 5) few language families of the continent are as thoroughly studied from a historical linguistic perspective as the Mayan language family; and 6) both the linguistic typology of Mayan languages and the logosyllabic nature of the writing system offer a refreshing counterweight to the abundance of Indo-European and alphabetic case studies that are the norm in the field. Thus, EMY texts offer opportunities to investigate real-time variation and change by means of comprehensive datasets for both scriptal and linguistic variables, while representing a refreshing comparative case. These characteristics entice us to focus on the linguistic variables that are most amenable to study in the datasets, the patterns that such variables exhibit, and their association with factors relevant to sociocultural and political processes revealed by the content of EMY texts, rather than factors imposed, a priori, on the basis of contemporary Western social categories. This is, at least in part, what a data-driven approach calls for, as proposed by Lauersdorf (2018, p. 209–210).
There already exist highly fruitful examples of data-driven approaches in the Mayanist literature, such as those by Munson et al. (2016) and Munson, Looper, and Scholnick (2024), which employ sophisticated quantitative methods to identify “ritual networks” and the diffusion of ritual terms along such networks based on patterns in the hieroglyphic data, though these examples sometimes conflate graphemic and linguistic variables.
Given the aforementioned objectives of the field and the nature of the EMY corpus, this paper has three objectives: 1) to characterize the real-time diachronic and geographic distribution of the GP morphological variable; 2) to assess to what extent that variable can be correlated with linguistic and social factors, or in the latter case, their proxies; and 3) to reconcile the epigraphic evidence with the results from historical linguistics, as well as the known sociocultural and political processes and events of the Maya lowlands during the Classic. The broader goal of the paper is to illustrate the application of an exploratory historical sociolinguistics framework, and the preparation and quantitative analysis of comprehensive datasets based on the MHD, while testing previous proposals for the temporal and geographic distribution of the GP variable.
The paper is organized as follows. Section (1) provides necessary background to the study of EMY, Classic Mayan society, and the linguistic varieties of relevance. Following this, section (2) introduces the GP variable, the linguistic and orthographic assumptions, the statistical methods used in the analysis, and the definition of proxies for social factors. Section (3) presents the results of the statistical analyses, beginning with the temporal and geographic distribution of the GP variable, followed by statistical results relevant to linguistic and social factors influencing its distribution. This section also contains a detailed discussion of some interesting traits of the Northern region with likely significant historical sociolinguistic implications. Section (4) discusses the implications of the results in light of prior research on this variable, as well as their reconciliation with the historical and comparative data. Finally, section (5) offers conclusions and directions for future research.
1. Background to Epigraphic Mayan
1.1. Chronology, Geography, Linguistic Diversity
Epigraphers divide the history of EMY into three periods: Late Preclassic (400 bce-ce 200), Classic (ce 200-900), and Postclassic (ce 900-1521). The Classic period is further subdivided into Early Classic (ce 200-600) and Late Classic (ce 600-900), with the latter, especially the second half of the eighth century ce, constituting the peak of text production (Looper et al., 2015; Looper; Macri, 2022, p. 3). The Terminal Classic (ce 800–950) is another category used in discussions of the decline and collapse of the southern Maya lowland polities. The Postclassic saw a sharp drop in text production, with only a very few stone inscriptions known from this period, and the primary sources being the four surviving paper books, known as codices (códices).
The present paper applies a regional categorization following that in Munson and Macri (2009, Fig. 5), illustrated in Figure 1. This characterization is not arbitrary: Munson and Macri identified these regions on the basis of frequency of interactions (i.e. relative number of interactions among sites), and so, they serve as a preliminary definition of broad (multi-site) interaction networks. The following labels will be used: Northern, Central, Eastern, West, Usumacinta, Pasion, Southwest, and Southeast. Given the dearth of data from the Southwest region (with only a single text from the site of Chinkultic represented in the datasets analyzed in this paper), and the high frequency of data from the Southeast region (e.g. Copan, Quirigua), the one example from the Southwest has been excluded, and the term Southern has been applied to the Southeast region.
Figure 1. Map of the Maya region. Regional divisions follow those in Munson and Macri (2009, p. 434, Fig. 6b). Used with permission of those authors.
Figure 2, from Josserand (2011, p. 170, Fig. 6.5), presents the distribution ca. ce 1500 of all the Mayan languages except Huastec/Wasteko (located far to the northwest, in the Huasteca region of northern Veracruz). The region corresponding to the archaeological Maya sites that can be characterized as part of “Lowland Mayan” society or civilization overlaps primarily with the region where Ch’olan (Ch’ol, Chontal/Yokot’an, Acalá Chol, Manche Chol, Ch’olti’, and Ch’orti’) and Yucatecan (Yucatec, Lacandon, Itzaj, Mopan) languages are spoken. Nevertheless, as Justeson et al. (1985) and subsequent authors have shown, other Mayan languages participated in the Lowland Mayan interaction, resulting in a Greater Lowland Mayan interaction sphere that included also Tzeltalan (Tzeltal, Tzotzil), some Greater Q’anjob’alan (especially Chujean, including Chuj and Tojol Ab’al), and some Greater K’ichee’an (especially K’ichee’, Poqom, and Q’eqchi’). During the Classic period the Ch’olan languages likely formed a continuous northwest-to-southeast strip across the lowlands, with both Ch’olan and Yucatecan speakers along the northern part of the strip, and Tzeltalan speakers in the highlands of Chiapas, in the southwestern part of the strip.[1]
Figure 2. Map of the Maya region showing distribution of Mayan languages ca. ce 1500, in relation to some of the major archaeological sites of relevance to this paper. Ch’olan languages are shown in black rectangles, Tzeltalan in dark grey, and Yucatecan in light grey. Used with permission from Nicholas Hopkins, after Josserand (2011, p. 170, Fig. 6.5).
1.2. Mayan Historical Linguistics
The classification of the Mayan language family assumed in the present paper is that by Kaufman (1976, 2015, 2017), seen in Figure 3. To understand the lexicon and grammar of EMY texts, the most important subgroups are Ch’olan-Tzeltalan and Yucatecan.[1] Figure 3 also shows the split of Ch’olan-Tzeltalan into Ch’olan and Tzeltalan, as well as the split of Ch’olan into Eastern (Ch’olti’-Ch’orti’) and Western (Ch’ol-Yokot’an) branches proposed by Kaufman and Norman (1984), supported with additional data in Mora-Marín (2009a, 2009b) and Law (2009). A Ch’olan variety known from a Colonial manuscript called Acalan, closely affiliated with Yokot’an, is also of relevance, but not illustrated in Figure 3.
Figure 3. Tree classification of the Mayan languages by Kaufman (2017, pp. 66–67), prepared by John Justeson and available at https://www.academia.edu/37842946/Justeson_Mayan_classification_for_Kaufman_2017_fig_2_pdf.
Kaufman (1976, 2017), Kaufman and Justeson (2007, 2008), and Dahlin, Quizar, and Dahlin (1987) have synthesized a variety of sources of evidence: archaeological, environmental, historical linguistic, lexicostatistic, and epigraphic. They generally agree, proposing a differentiation of Ch’olan into its Eastern and Western branches by ca. ce 500/600. Dahlin et al. (1987, p. 368) correlate this split with the major settlement failures (and associated population movements) that took place during the Terminal Preclassic-to-Early Classic transition (around ce 100–500). Dahlin et al. (1987, p. 367–368) further posit another wave of linguistic differentiation events following the Terminal Classic-to-Early Postclassic transition (ca. ce 900–1300), following the even more dramatic settlement failures associated with the decline and collapse of centralized rulership throughout the southern Maya lowlands (i.e. the Maya lowlands minus the Northern region).
There remains much disagreement among epigraphers regarding the nature of the linguistic varieties that influenced the development of EMY, specifically, whether Classic texts reflect linguistic traits pointing to an undifferentiated Ch’olan language, corresponding to a Proto-Ch’olan stage (Justeson; Fox, 1989; Mora-Marín, 2003, 2009a; Mora-Marín; Hopkins; Josserand, 2005, 2009a), or a post-differentiation variety, whether a Western Ch’olan variety (Hopkins, 1985; Josserand; Hopkins, 2002) or an Eastern Ch’olan (“Classic Ch’olti’an”) variety (Robertson, 1998; Houston; Robertson; Stuart, 2000; Hruby, 2002). Epigraphers generally recognize a high degree of “uniformity” in EMY texts throughout the Maya lowlands, some adopting the concept of a “conservative” or “traditional” basis of EMY writing based on Ch’olan followed by its adoption and adaptation by Yucatecan and possibly also Tzeltalan speakers (Justeson; Fox, 1989), others referring to a “standard” or “prestige” written language based on Eastern Ch’olan referred to as “Classic Ch’olti’an” (Houston; Robertson; Stuart, 2000). Josserand and Hopkins (2002, p. 357) compare the situation in the Maya lowlands during the Classic period to that of “medieval Latin in Europe, where a codified standard was kept from changing while the Latin vernaculars evolved into the Romance family of languages” (2002, p. 358). These authors suggested a diglossic situation was in place, one in which “the older Maya language of Yucatan [Yucatecan] provided a linguistic substratum that was overlaid by a later influx of population that spoke an early form of Cholan Maya” (2002, p. 358); that Yucatecan substratum, Hopkins (1984, 1985) has argued on the basis of morphological traits (i.e. ergative and absolutive pronominal agreement markers), influenced the Ch’olan-Tzeltalan superstratum, resulting in the differentiation between Ch’olan and Tzeltalan speakers.
The present paper will not attempt a resolution of the historical stage, the nature of the uniformity of the written language, or the question of superstratum/substratum acculturation. Instead, this paper will offer observations on how the results of the present analysis would be interpreted under a pre-differentiation model versus a post-differentiation model, and if relevant, how they may reflect evidence of such contact between speakers of different varieties.
1.3. Linguistic Structure of the Writing System
Mayan writing reflects the basic structural characteristics of Mayan languages in general: VOS/VOA order in transitive clauses, VS order in intransitive clauses, predicate-initial order in non-verbal clauses, general typological patterning for VO languages (except for the common “exception” of adjectives before nouns), agglutinating word morphology, morphological ergativity (ergative markers on transitive verbs for A arguments, absolutive markers on transitive verbs for O arguments and intransitive verbs for S arguments), some syntactic ergativity (certain constructions apply only to absolutive S/O arguments, excluding A arguments), and evidence of “status” marking on verbs (i.e. transitives and intransitives are distinguished by means of portmanteau suffixes that code transitivity, aspect, mood, and main/subordination status all at once), among others.[1]
Phonologically, the written language agrees very closely with what is known of early stages of the contemporary Ch’olan(-Tzeltalan) and Yucatecan languages, as systematically laid out by Fox and Justeson (1982) and Justeson and Fox (1989), and supported by many studies since then. Table 1 provides the Proto-Ch’olan sound inventory as reconstructed by Kaufman and Norman (1984), mostly representative of the phonological structure of EMY writing. Nevertheless, the script lacks evidence for a sixth vowel, *ä, which means that it was probably innovated prior to the shifts of pre-Ch’olan *a: > Proto-Ch’olan *a, and pre-Ch’olan *a > Proto-Ch’olan *ä.[1] Also, so far, the script lacks evidence of a distinct set of p’V syllabograms, suggesting that EMY was innovated prior to the development of */p’/ from instances of both /b’/ and /p/, a development that was likely an instance of areal diffusion involving distinct Ch’olan, Tzeltalan, and Yucatecan speech communities (Kaufman; Norman, 1984, p. 127, 130; Campbell, 1996; Wichmann, 2006).
Table 1. Proto-Ch’olan sound inventory (Kaufman; Norman, 1984, p. 85–89). Angled brackets correspond to the practical orthography from the PLFM for the Mayan languages of Guatemala, and the addition of <7> for the glottal stop by Kaufman (2015), while <#> is used to mark areal diffusion.
EMY texts are visually organized on the basis of grids, each grid cell called a “glyph block,” a squarish or rectangular arrangement of signs that generally corresponds to a word or small syntactic constituent (cf. Knudsen, 2023).[1] Mayan graphemes include three basic types: logograms, syllabograms, and diacritics/determinatives. Logograms are graphemes representing lexemes, sets of words based on the inflection and derivation of a specific root or stem. A syllabogram is a grapheme with a <CV> value (e.g. Ca values like ʔa, b’a, cha, ch’a, etc.); though most Proto-Ch’olan roots are /CV(:/h)C/ in shape, when a suffix is added, such as a /-VC/ suffix (the most frequent suffix shape), a stem with the syllabification /CV(:/h).C-VC/ results, so that an open <CV> syllabogram is very well suited for syllabification, especially since the complex codas (i.e. /CV:C, CVhC, CVjC/) were not distinguished from simplex codas (i.e. /CVC/) directly.[2] Diacritics or determinatives are graphemes that cue a deviation or disambiguation of the value of another grapheme: the duplication dots, a grapheme consisting of two dots, generally tells the reader to read a grapheme twice, and is represented in transliterations by means of a superscript <2> (e.g. ʔAJAW-le² for *ʔajaw-(a)l-el); lexical determinatives (Mora-Marín, 2022a, 2023b), more generally known as semantic determinatives (Hopkins, 1994; Hopkins; Josserand, 1999; Mora-Marín, 2008), combine with a polyvalent grapheme to determine its specific lexical value, and may also be superscripted in transliterations (e.g. cartoucheʔAJAW(AL) for *ʔajaw(al) ‘Lord (20th day name)’), though most scholars do not transliterate them.
Figure 4 presents examples of EMY spellings, with logograms rendered in uppercase, bold letters, and syllabograms in lowercase, bold letters. Figures 4a–c show spellings of the same word, k’ay-om (sing-agentivizer) ‘singer’, with the first spelling (Figure 4a) showing a lexographic spelling K’AYOM, the second (Figure 4b) a lexosyllab(ograph)ic spelling K’AYOM(-ma), and the third (Figure 4c) a syllabic spelling k’a-yo-m(a). The logogram stands for the derived stem k’ay-om.
Figure 4. Illustration of logograms and syllabograms, and the use of syllabograms as phonographic determiners (“phonetic complements”). The abbreviations for specific texts correspond to the unique object codes used for the “objabbr” field queries of the MHD (e.g. COLK0519). a) Glyph at D6 on conch shell trumpet (COLK0519). Drawing by the author. b) Glyph J on polychrome pottery vessel from Tikal Burial 196, Structure 5D73 (TIKMT176). Drawing by the author based on photo #8008 by Justin Kerr (http://research.mayavase.com/kerrmaya.html). c) Glyph at A5–B5 on conch shell trumpet (COLK0519). Drawing by the author. Drawings in a) and c) after photograph in Coe (1982, p. 120–123, Fig. 63).
The variety of textual genres in Mayan writing included, in order of increasing grammatical and lexical complexity, the following: object-tags, proprietary statements, dedicatory statements, brief quotative texts, ritual almanacs, political narratives, and cosmological narratives. Two datasets were compiled for this study: the Generic Prepositions Dataset, which is a comprehensive compilation of the GP variable (across genres), and the Accession Statements Dataset, which consists of political narratives commemorating key events in the political career of a ruler. Whatever their degree of distance with respect to the spoken varieties of the time (Schneider, 2004), such texts exhibit patterned variation, and they should be studied in spite of their thematic, discursive, and social biases, to the best of our abilities (cf. Nevalainen; Raumolin-Brunberg, 2003, p. 26).
1.4. Social Structure and Literacy
Maya society, throughout the Maya lowlands and the entirety of the Classic period, displayed a wide variety of social and political organizational structures. At the very least, two distinct groups, elites and commoners, can be differentiated, but at some sites, the distinction in wealth between the low-status elites and the most successful commoners (e.g. some military specialists, artists, merchants) may have been blurred (Martin, 2020, p. 325–326), so that at some sites at least, one can speak of a rising “middle class” (Chase; Chase, 2004).
By the beginning of the Classic period, the region of relevance was organized into a few dozen kingdoms of varying sizes, each governed by a ruling dynasty based on hereditary kingship with a k’uhul ʔajaw ‘holy king’ at the top. A half dozen long-enduring kingdoms exhibited enormous sway over others through conquests or alliances of various types. Diplomatic strategies included royal visits on the occasion of major events (such as the accession to power of a local ruler) and intermarriages between dynasties. The authority of the royal dynasties and holy kings began to decline and collapse by the end of the eighth century, and in the process the inscriptional record of the southern Maya lowlands came to an end (cf. Ebert et al., 2014). By the beginning of the tenth century the system had collapsed in the southern lowlands, marking a major depopulation of the major cities, coinciding with population movements to the Northern region, where a different political system, a more decentralized system, took hold during the Postclassic period.
There were communication routes of two types: inland routes, whether terrestrial or riverine, including in the former case road networks, some of them quite elaborate and extensive closer to major centers; and the circumpeninsula coastal route. The Maya region was never under the hegemonic control of a single polity, and yet, the high degree of cultural uniformity across the lowlands indicates an intense level of dissemination of information, including language, writing, artistic styles, etc., as noted by Martin (2020, p. 304–306).
Only about 1.6% of EMY texts contain scribal signatures; of these, only a few provide explicit evidence of the scribes’ social profiles (i.e. gender, age, rank, place of origin). Generally, scribes were elites, some of them bearing the title ʔajaw ‘lord, ruler’ (but likely with the meaning of ‘high-ranking noble’). Several important works pertain to the identification of scribal hands and signatures (Stuart, 1989; Tate, 1994), and the distribution of intrasite and intersite authorship, including the diffusion of scribal art and writing between polities, typically between primary centers and their satellites (Houston, 1993; Montgomery, 1995; Van Stone, 2000, 2005), a topic that has been given a thorough recent review (Houston, 2016), as well as an extremely detailed case study (Matsumoto, 2021). Some scribes and artisans were almost certainly attached to specific kings or dynasties, who served as their royal patrons. Under such patronage, scribes likely functioned as a means of exchange of information between overlords and vassals at different sites, as the evidence appears to indicate for at least some Maya sites (Houston, 1993, p. 135, 2016, p. 403; Martin; Houston; Zender, 2015; Houston, 2016).
These lines of research open avenues for understanding the nature of scribal practices, their institutionalization, and their sociopolitical significance. Nevertheless, for now, this approach is unlikely to provide statistically significant clues to the relationship between the social profiles of scribes and the spread of scriptal and linguistic innovations, except perhaps for a very few sites (e.g. Piedras Negras) during a very brief period of time (e.g. late eighth century).
It must be assumed that EMY texts were, at the very least, representative of the linguistic practices and ideologies of the uppermost elite groups of Classic Maya society (Justeson, 1985, p. 326–334), in the sense that such groups were the ones commissioning their creation, and also the ones who had a vested interest in their reception among other elites, minimally, and possibly within the larger population, given the likelihood that texts were performed orally and publicly (e.g. Houston; Stuart, 1992, p. 591). Extreme evidence that this was the case is provided by the common and recurring practice of destruction of inscriptions at some sites (cf. Moholy-Nagy, 2003, 2016). The great investment in the production of art and writing by ancient Maya kings and other elites presupposes the existence of a significant audience, but we simply do not know much about literacy rates during the Classic period (cf. Houston; Stuart, 1992, p. 591–592).
2. Assumptions and Methods
2.1. Sociolinguistic Variables
Linguistic variables are cases of variation, two or more ways of saying “the same thing,” predictably constrained by independent factors, linguistic or otherwise. Change, for example the spread of an innovative variant of a linguistic variable, is not abrupt, but instead, a continuum (Chambers, 2013, p. 316), and is characterized by “a period of variation and coexistence between new and old forms in the process of change” (Wolfram and Schilling-Estes, 2004, p. 715), preventing disruptions in communication. The present paper explores both linguistic and non-linguistic factors using a variationist model in an attempt to deal with the transition from one linguistic form to another, its linguistic embedding, and its social evaluation and embedding (Weinreich; Labov; Herzog, 1968, p. 184; Labov, 1982, p. 27–28, 60; Roberge, 2006, p. 2310).
Due to the paucity of explicit information about the ancient scribes’ social profiles, linguistic variation can be initially approached on the basis of regional and stylistic variation (cf. Winter, 1999, p. 75). The first step for any such approach is to assume a version of the Uniformitarian Principle, stated for historical linguistics as “the understanding that basic mechanisms of linguistic change in the past (e.g., phonetic change, reanalysis, extension, etc.) were not substantially different from those observable in the present” (Rankin, 2003, p. 186), and as “the linguistic processes taking place around us are the same as those that have operated to produce the historical record” (Labov, 1972, p. 101). It was reformulated for historical sociolinguistics by Romaine, somewhat vaguely as “the present is the key to the past, the past is the key to the present” (1982, p. 122, 127; 2005, p. 1697), and more concretely as “sociolinguistically speaking, [Uniformitarianism] means that there is no reason for believing that language did not vary in the same patterned ways in the past as it has been observed to do today” (1988, p. 1454). Joseph and Wallace (1992, p. 117) seem to abide by this version of Uniformitarianism in connection with ancient Rome.
This is of course where the historical paradox comes in, as articulated by Labov, “[t]he task of historical linguists is to explain the differences between the past and the present; but to the extent that the past was different from the present, there is no way of knowing how different it was” (1994, p. 21). Given this paradox, a historical sociolinguist who assumes that language in the past exhibited “the same” type of patterning with regard to social factors as in the present must define what they mean by “the same.” Also, Labov’s resignation (“there is no way of knowing how different it was”) seems to negate the validity of any historical enterprise; a historical sociolinguist should instead acknowledge the sources of historical information (social, political, cultural, linguistic) and how they will be used to glean the past. In this regard, a much more constrained discussion of Uniformitarianism as applied to linguistics in general, and historical linguistics in particular, is presented by Walkden (2019, p. 5), who notes that Uniformitarianism is, or should be at best, a methodological assumption, a kind of null hypothesis, one that is open to the possibility of significant differences between the past and the present, or presumably, across different social and cultural contexts. Nevalainen and Raumolin-Brunberg (2003, p. 54) appear to assume such a version of the Uniformitarian Principle, keeping an open mind to major disjunctions; they even highlight a “chief difference between Tudor and Stuart England and the present day: late medieval and early modern Englishwomen did not promote language changes that emanated from the world of learning and professional use, which lay outside their own spheres of ‘being’.”
Once Uniformitarianism is assumed, heuristically, it can be proposed, following Labov (1972), that stylistic variation may reflect social differentiation in the past, much as it does today, and that such a relationship could offer the means for elaborating a more principled framework for “uncovering social context in historical records” (Romaine, 1982, p. 122–124). In other words, it may be possible to utilize stylistic variation to infer the presence of variation defined by social factors (cf. Roberge 2006, p. 2311), even if the details of such factors are unknown or unclear or different from their particular permutations in present-day case studies.
A more detailed framework for analyzing linguistic variation has been proposed and elaborated over the years by Labov (1972, p. 314, 1994, p. 78, 2001, p. 196), who defines three types of sociolinguistic variables according to the parameters of social awareness, stylistic variation, and social stratification: indicators, markers, and stereotypes. These can be characterized as in Table 2, generally following Romaine’s (1982, p. 265–266) schematization, with examples for each type borrowed from the literature. As argued below, the evidence from the GP variable in EMY texts likely points to a sociolinguistic marker at work.
Table 2. Labov’s three major types of sociolinguistic variables (indicators, markers, stereotypes).
These types of patterns, which point to shared communal norms and valuations (Labov, 1972, p. 120–121; Chambers, 2012, p. 300), are not static or fixed, but instead vulnerable to re-evaluation and shift, as evidenced in the social re-evaluation of postvocalic “r” in New York City after World War II described by Labov (1972, p. 64–65), as well as the case of “t-glottaling” in Glasgow (Fabricius, 2002), the latter cited in Chambers (20, p. 300). Given the difficulty of assessing social differentiation directly, this paper will pay attention to deviations from expected patterns as clues to possible instances of behavior resulting from social awareness, whatever social factors may underlie it. Labov of course employed the “crossover” phenomenon that he labeled hypercorrection (cf. “Labov-hypercorrection” in Chambers and Trudgill (2004, p. 82)) to confirm the relationship between stylistic variation and social differentiation. As Kerswill (2004, p. 23) notes, “The symptom of change is the ‘crossover’ pattern, by which, in more ‘monitored’ styles […] the group leading the change exceeds the usage by the next higher group in the social hierarchy.” Such unusual or deviant patterns could be identified as evidence of social awareness and socially motivated linguistic behavior.
Instability in the distribution of a variant, whether identified by means of an apparent-time or a real-time approach, is often characterized as a so-called S-curve pattern, the typical trajectory inferred (apparent time) or documented (real time) for the spread of an innovation. This S-curve pattern has been described as composed of three stages by Chambers (2013, p. 312), including initial stasis, rapid rise, and tailing off; Labov (1994, p. 67, 79–83) posits five stages, including incipient (below 15%), new and vigorous (15–35%), mid-range (36–65%), nearing completion (66–85%), and completed (above 85%). The goal in the present paper will be to describe the relative temporal stability or instability of the GP morphological variable, at different regional scales, along with its pattern of spatial diffusion. The regional categories adopted from Munson and Macri (2009) are thus assumed to correspond to nested speech communities (Kerswill, 2004, p. 30), and the goal will be to trace the spread of the innovative GP variant, ti, assuming that diffusion across space recapitulates diffusion within a social group, with both showing the characteristic stages of change depicted by the S-curve (Bailey et al., 1993, p. 366). Thus, rather than attempting to infer patterns of change by highlighting the first appearances of innovative variants in EMY texts, as attempted by Grube (2004, p. 79–81) for the case of the *h:*j > /j/ merger, or by Lacadena and Wichmann (2000, 2002, 2005) for the cases of the ‘intransitivizer of positionals’ and ‘abstractivizer of nouns’, the present paper will investigate spatial diffusion by means of the overall proportions of the innovative variant of the generic preposition variable in the various subregions of the Maya lowlands, assuming that it takes time for an innovation to spread both within and between communities at various levels.
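As a rough illustration of the five-stage scheme attributed to Labov above, the following Python sketch maps the percentage share of an innovative variant to a stage label. The function name `labov_stage` and the counts are hypothetical and are not drawn from the GP Dataset. Because the cited ranges leave gaps between integers (e.g. between 35% and 36%), the sketch assumes that each upper bound is inclusive.

```python
def labov_stage(share_innovative: float) -> str:
    """Map the innovative variant's share (in percent, 0-100) to a stage label.

    Thresholds follow the five stages cited in the text; treating each
    upper bound as inclusive is an assumption made for this sketch.
    """
    if not 0 <= share_innovative <= 100:
        raise ValueError("share must be a percentage between 0 and 100")
    if share_innovative < 15:
        return "incipient"
    if share_innovative <= 35:
        return "new and vigorous"
    if share_innovative <= 65:
        return "mid-range"
    if share_innovative <= 85:
        return "nearing completion"
    return "completed"


# Hypothetical counts for one region and period (illustration only).
ti_count, ta_count = 12, 28
share = 100 * ti_count / (ti_count + ta_count)
print(f"{share:.1f}% -> {labov_stage(share)}")  # 30.0% -> new and vigorous
```

Applied per region and per time slice, such a labeling would only summarize proportions; it does not by itself establish that a change is in progress.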
Lastly, since there is a great deal of information about the historical events and processes that transpired in the Maya lowlands between ce 300–909, such evidence can be adapted to serve as a proxy for social factors, as described in Section (3.4). Additional evidence of general historical processes will also be considered, especially with regard to the discussion of the Northern region in Section (4.4).
2.2. Epigraphic Variables
2.2.1. Types of Variables and Orthographic Resolution
There is no shortage of evidence of variability in EMY texts. The problem lies, at times, in determining what type of variability is at work. Mora-Marín (2019, 2020, 2021a, 2021b, 2022c) has distinguished four types of variables: graphic (different designs of the same grapheme), graphemic (different graphemes with the same value, i.e. allography), orthographic (different spellings of the same word), and linguistic (different variants of the same phoneme or same morpheme, for example). The variable of interest in this paper is linguistic, and more specifically, morphological: tä (tə) ~ ti ‘generic preposition’. Mora-Marín (2020, 2021a, 2021b, 2022c, 2023a) has also introduced a distinction between high and low orthographic resolution variables. High-resolution variables are those whose orthographic representation is straightforward, allowing for an unambiguous identification of the phonological shape of each variant. Low-resolution variables are those whose orthographic representation is not straightforward due to the common abbreviatory spelling practices of the scribes. The GP variable is thankfully a high-resolution variable: being a grammatical particle of /CV/ shape, <CV> syllabograms can be used to unambiguously distinguish the two variants, ta for tä (tə) and ti for ti (though potentially also tiʔ). It can also be studied as a graphemic variable: five allograms, different graphemes with the same value, could be used to spell it, three allograms with the value ta (Figure 5a), and two with the value ti (Figure 5b). However, this paper does not address this graphemic variable, a task left for a future treatment.
Figure 5. Allograms of ta and ti used to spell the GP variable. a) ta allograms. b) ti allograms. Drawings by Matthew Looper (1991–2025), used with permission. Alphanumeric codes from sign catalog in Looper et al. (2022).
2.2.2. GP Variable
The GP Dataset consists of a total of 1,074 cases of the GP variable, spread across a total of 773 texts; of these, 182 contain two or more cases, and of those, 38 (20.9%) exhibit intratext variation. The Accession Statements Dataset consists of 161 non-null cases spread across a total of 119 texts; of these, 18 contain two or more cases of the GP variable, and only one of those exhibits intratext variation. Null cases (10.44% of GP variable cases in the Accession Statements Dataset) are those where the scribe omitted the spelling of the GP variable despite its being grammatically required in a particular context.
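The counts above imply a simple tally: texts with two or more cases are identified first, and those attesting both variants are then counted as showing intratext variation. A minimal Python sketch of that tally follows, assuming a hypothetical token list of (text identifier, variant) pairs; the identifiers and the function name `intratext_summary` are invented for illustration and do not reflect the structure of the actual dataset.

```python
from collections import defaultdict


def intratext_summary(tokens):
    """Summarize intratext variation from (text_id, variant) pairs.

    Returns (number of texts, texts with >= 2 cases,
    texts with both variants, percentage of the latter among the former).
    """
    by_text = defaultdict(list)
    for text_id, variant in tokens:
        by_text[text_id].append(variant)
    multi = {t: v for t, v in by_text.items() if len(v) >= 2}
    mixed = [t for t, v in multi.items() if len(set(v)) > 1]
    share = 100 * len(mixed) / len(multi) if multi else 0.0
    return len(by_text), len(multi), len(mixed), share


# Hypothetical tokens (illustration only, not from the GP Dataset).
tokens = [
    ("T1", "ta"), ("T1", "ti"),
    ("T2", "ti"),
    ("T3", "ta"), ("T3", "ta"),
    ("T4", "ti"), ("T4", "ti"), ("T4", "ta"),
]
print(intratext_summary(tokens))

# The percentage reported in the text: 38 of 182 multi-case texts.
print(round(100 * 38 / 182, 1))  # 20.9
```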
Based on the data available to them at the time, Kaufman and Norman (1984, p. 81–82) argued that the comparative evidence for this morpheme could not be reconciled with the Eastern Ch’olan/Western Ch’olan differentiation model. The fact is both variants are present in both branches, as the more detailed documentation that followed those authors’ work has shown (Table 3). It is now known that both variants are widely represented across the Ch’olan languages, though in some cases a variant is preserved only in a highly idiomatic or grammaticalized context. The evidence now suggests, as proposed here, that Proto-Ch’olan had *tä ~ *ti.
Table 3. GP variable attestations and reconstructions.
Table 4 presents the GP variable as attested in EMY texts (cf. Figure 5). It is a clear example of a high-resolution variable, though it is possible that in the Northern region ti may have been intended to spell Proto-Yucatecan *tiʔ, in which case the final /ʔ/ would not have been made explicit.
Table 4. GP Variable as high-resolution variable. Prepositions Dataset (no null cases).
The tä (tə) variant is earlier than the ti variant, appearing in Late Preclassic (400 bce-ce 200) texts between ca. 100 bce–ce 120, originally spelled with T51/T53/3M3 ta (Mora-Marín, 2001, p. 167, 248, 267, 282–288). The earliest dated examples of innovative ti are found on Tikal Stela 4, dated to ce 379, and the Tikal Ballcourt Marker, dated to ce 416, both of which are also the earliest cases of intratext variation between tä and ti. The two cases of the GP variable on the Ballcourt Marker are seen in Figure 6, where it appears as tä (Figure 6a), and as ti (Figure 6b). A few decades prior to this, also on a text from Tikal (Stela 39) dated to ce 376, the first confirmed example of the syllabogram ti in a purely syllabic function is found, in the spelling ʔu-ʔUH(T)-ti for the verbal expression ʔu[h]t-i-Ø (finish[mediopassive]-completive:intransitive-third.person.singular.absolutive) ‘it got finished/made; it happened’.
Figure 6. Examples of prepositional temporal phrases headed by the GP variable. a) TIKBCM:E01. Excerpt from drawing #2059 by Linda Schele (http://research.famsi.org/schele.html). b) TIKBCM:F07. Excerpt from drawing #2059 by Linda Schele (http://research.famsi.org/schele.html).
Bricker and Orie (2014, p. 197–198, Fig. 4) have proposed that ancient scribes, like the later Colonial Yucatec and Acalan (Yokot’an) scribes, may have alternated between ta and ti as a means of attempting to indicate the vowel /ə/: thus, those authors would analyze the ta ~ ti variation as spellings of a form tä (tə). If so, such alternations would instead point to the Proto-Ch’olan change of *a > *ə, a fascinating possibility. Nevertheless, most other items where Proto-Ch’olan */ə/ was expected are represented exclusively with Ca syllabograms (e.g. ya-k’a-wa for y-äk’-aw-Ø ‘s/he gives/gave/put it’, never *yi-k’a-wa; b’a-la-ma for b’ahläm ‘jaguar’, never *b’a-li-mV; ka-ka-wa for käkäw ‘cacao’, never *ki-ki-wV; ma-ka for mäk ‘to cover’, never *mi-kV; pa-ta-wa-ni for pät-wän-i-Ø ‘it formed’, never *pi-ti-wa-ni or *pi-ti-wi-ni; ʔu-tz’a-pa-wa for u-tz’äp-aw ‘s/he planted it’, never *ʔu-tz’i-pi-wa; ta-ta for tät ‘thick (liquid)’, never *ti-ti, etc.). This fact supports the notion that the ta ~ ti alternation could simply be representing the expected tä (tə) ~ ti alternation.[1]
The epigraphic scholarship on this variable is significant (Mathews; Justeson, 1984, p. 187–203, 221–223, 226, 229; Justeson, 1985, p. 470; Justeson; Fox, 1989, p. 15–16, 24–25; Macri, 1988, 1991, 2021; Carter, 2009, p. 6–8, 17–21; Kelly, 2022, p. 101–107). Justeson (1985, p. 470) had already argued for the earlier use of tä relative to ti, and following Mathews and Justeson (1984), supported the notion that ti was probably diffused, likely from Yucatecan. Macri (1988) had observed a strong preference for ta spellings at Palenque and Tortuguero, with Carter (2009) agreeing and adding Tonina to the group, and Kelly (2022, p. 101–107) further supporting this distribution. More will be said below, in Section (4.4), regarding the distribution of this variable in the Northern region, particularly in connection with Macri’s (2021, p. 11) and Kelly’s (2022, p. 101–107) observations of the frequency of ta in that region despite the fact that ti would be expected to be canonical, given the exclusive presence of tiʔ among the Yucatecan languages. Lastly, Carter (2009, p. 20–21) has also suggested that innovative ti may have spread due to the influence of the Kan Dynasty (Snake Kingdom), following up on Lacadena and Wichmann’s (2002, p. 309–310) suggestion that this dynasty promoted the spread of Western Ch’olan traits in particular. More recently, Kelly (2022, p. 239–243) has also examined the possibility of a prominent role by the Snake Kingdom in the spread of linguistic and orthographic traits, though not specifically the GP variable. This idea will be reviewed and discussed in Section (3.3).
This paper supports prior suggestions that the ti variant in Ch’olan may have been innovated as a result of influence from Proto-Yucatecan *tiʔ (cf. Mathews; Justeson, 1984, p. 187–203), but not necessarily as a direct loan, since Ch’olan speakers should have easily borrowed such a form as /tiʔ/. I offer two alternatives to account for this discrepancy: 1) perhaps it was borrowed as ti to avoid homophony with Proto-Ch’olan *tiʔ ‘mouth; speech’; and/or 2) perhaps it was borrowed as ti because Yucatecan scribes were spelling it with ti, and thus, Ch’olan scribes may have borrowed it through the filter of spelling pronunciation. In either scenario, this form can be added to the inventory of grammatical morphemes that Hopkins’ (1984, 1985) proposed Ch’olan-Tzeltalan superstratum borrowed from the Yucatecan substratum, facilitating the linguistic differentiation of Ch’olan from Tzeltalan.
2.3. Quantitative Methods
Descriptive and inferential statistics have been employed in this paper, the latter type with the goal of determining whether certain variables exhibit a statistically significant association with each other that could point to influential/predictive factors. The inferential tests include hypothesis and correlation tests (e.g. parametric and non-parametric, including Analysis of Variance, Hierarchical Cluster Analysis, Friedman Test, Kruskal-Wallis Test, Spearman Correlation, Mann-Whitney U-Test, Logistic Regression), and almost all have been carried out with DATAtab (DATAtab Team, 2025), with a few carried out with StatPlus for Mac. Initially, to assess the likelihood of a relationship between a linguistic variable (nominal) and one of the potential independent variables, a Chi-Square Test of Independence (nominal vs. nominal), Kruskal-Wallis Test, Pearson Correlation, or Mann-Whitney U-Test was carried out. If fruitful, the independent variables in question would then be used in a Logistic Regression analysis, to assess to what extent, if any, such independent variables were influential in the distribution of each linguistic variable when considered at the same time with other independent variables. The Logistic Regression summaries presented below are interpreted on the basis of each independent factor: the summarized results indicate which categories (e.g. portable or monumental) of an independent variable (e.g. Text Type) were more influential on the dependent variable (GP variable), and if significant (p-value ≤ .05), whether the influence was positive or negative (Coeff. B), and what the odds (Odds Ratio) are favoring that category over the reference category. This paper reports primarily the results from this last step.
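To make the quantities named above concrete (Chi-Square statistic, Coeff. B, Odds Ratio), the following Python sketch works through a single 2×2 cross-tabulation of Text Type by GP variant. It is not a reproduction of the DATAtab or StatPlus analyses: the counts are hypothetical, the function names are invented for illustration, and only one predictor is considered, whereas the models reported in this paper include several. The sketch relies on the fact that, in a logistic regression with a single binary predictor, the coefficient B equals the natural logarithm of the odds ratio of the 2×2 table.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and p-value
    for a 2x2 table given as ((a, b), (c, d))."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def odds_ratio(table):
    """Odds of the first column in row 1 relative to row 2."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical counts (illustration only): rows = monumental, portable;
# columns = ti, ta. "Portable" is treated as the reference category.
table = ((30, 10), (20, 40))
stat, p = chi_square_2x2(table)
ratio = odds_ratio(table)
coeff_b = math.log(ratio)  # single-predictor logistic coefficient
print(f"chi2 = {stat:.2f}, p = {p:.5f}")
print(f"Odds Ratio = {ratio:.1f}, Coeff. B = {coeff_b:.3f}")
```

With these invented counts, a positive Coeff. B would be read as monumental texts favoring ti over the reference category; a negative value would indicate the reverse.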
In addition, to illustrate the distribution of variables with respect to time, measured in Gregorian years based on correlations between the Mayan calendar and the Gregorian calendar, raw frequencies per arbitrary units of time (50 Gregorian years) were used to produce charts showing combined relative cumulative frequencies over time. (A future study could attempt to calculate more appropriate periodizations according to the amount of data.) This is preferred over raw frequencies to make up for the temporally imbalanced inscriptional record (cf. Munson; Macri, 2009, p. 430, Fig. 3b). The cumulative frequencies are proportional, allowing one to compare across regions more faithfully.
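One possible reading of this procedure can be sketched in Python as follows: tokens are binned into 50-year intervals, counts of each variant are accumulated over time, and the cumulative count of ti is expressed as a percentage of the cumulative total of both variants. Both this reading of “relative cumulative frequencies” and the token list are assumptions made for illustration; the function name `cumulative_ti_share` is likewise hypothetical.

```python
from collections import Counter


def cumulative_ti_share(tokens, bin_width=50):
    """Return [(bin_start, cumulative % of ti)] for (year_ce, variant) tokens.

    Only bins that contain at least one token are reported.
    """
    counts = Counter()
    for year, variant in tokens:
        if variant in ("ta", "ti"):
            bin_start = (year // bin_width) * bin_width
            counts[(bin_start, variant)] += 1
    cumulative = Counter()
    series = []
    for bin_start in sorted({b for b, _ in counts}):
        for variant in ("ta", "ti"):
            cumulative[variant] += counts[(bin_start, variant)]
        total = cumulative["ta"] + cumulative["ti"]
        series.append((bin_start, round(100 * cumulative["ti"] / total, 1)))
    return series


# Hypothetical dated tokens (illustration only, not from the GP Dataset).
tokens = [
    (379, "ti"), (416, "ta"), (416, "ti"), (520, "ta"),
    (650, "ti"), (660, "ti"), (700, "ta"), (710, "ti"),
]
print(cumulative_ti_share(tokens))
# [(350, 100.0), (400, 66.7), (500, 50.0), (650, 66.7), (700, 62.5)]
```

Because each value is a proportion of the tokens accumulated up to that point, series computed for regions with very different numbers of inscriptions can be plotted on the same scale.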
2.4. Proxies
Text Type (portable vs. monumental) will be used, preliminarily, as a proxy for a combination of style (i.e. “formal” vs. “informal”) and register (“official” vs. “unofficial”), with portable texts likely reflecting less formal and less official language, and monumental texts more formal and more official. The difference may have to do with intended audiences, with many or most