Supplementary materials for: ‘Tonogenesis: a diachronic typology’

Steven Moran, Eitan Grossman and Lilja Maria Sæbø

26 April, 2025

Overview
Setup
Basics of the database contents
Tables for the paper
Examples from the database for the paper
A table showing the number of cases/languages for each type in each region
Multiple paths to the same result
Patterns in level vs contour height
New tables for revise and resubmit

Overview

Supplementary materials for “Tonogenesis: a diachronic typology” by Lilja Maria Sæbø, Eitan Grossman and Steven Moran, accepted in Diachronica.

The CLDF data are available here:

https://github.com/cldf-datasets/tonodb

Setup

Load the libraries.

library(tidyverse)
library(knitr)
library(kableExtra)
library(xtable)
library(ggalluvial)

Load the tonodb CLDF data.

values <- 
  read_csv(url('https://raw.githubusercontent.com/cldf-datasets/tonodb/main/cldf/values.csv'))
languages <- 
  read_csv(url('https://raw.githubusercontent.com/cldf-datasets/tonodb/main/cldf/languages.csv'))
contributions <- 
  read_csv(url('https://raw.githubusercontent.com/cldf-datasets/tonodb/main/cldf/contributions.csv'))
parameters <- 
  read_csv(url('https://raw.githubusercontent.com/cldf-datasets/tonodb/main/cldf/parameters.csv'))
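The four tables share one base URL, so the loading step could be factored into a small helper. The sketch below is only illustrative: `cldf_base` and `cldf_url()` are hypothetical names that do not appear in the original script, and the download itself is left commented out so the block runs offline.

```r
# Base URL of the CLDF directory (taken from the read_csv() calls above).
cldf_base <- "https://raw.githubusercontent.com/cldf-datasets/tonodb/main/cldf/"

# Hypothetical helper: build the raw-file URL for a CLDF table name.
cldf_url <- function(name) paste0(cldf_base, name, ".csv")

tables <- c("values", "languages", "contributions", "parameters")
urls <- setNames(cldf_url(tables), tables)
print(urls)

# To actually download, something like this should work (requires network access):
# cldf <- lapply(urls, readr::read_csv)
# values <- cldf$values
```

Keeping the table names in one vector makes it easier to add or drop a table without touching the download code.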

Basics of the database contents

We have this many languages in our sample.

nrow(languages)

## [1] 97

And this many observations.

nrow(values)

## [1] 259

Let’s map our data points. Some rows are dropped from the plot because their latitude and longitude are NA; these entries are listed as dialects or language families.

ggplot(data=languages, aes(x=Longitude, y=Latitude)) + 
  borders("world", colour="gray50", fill="gray50") + 
  geom_point() +
  theme_bw()

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).

These are the missing data points for geographic location.

languages %>% filter(is.na(Latitude)) %>% select(ID, Name, Macroarea, Latitude, Longitude) %>% kable()

ID	Name	Macroarea	Latitude	Longitude
atha1247	Athabaskan	NA	NA	NA
auks1239	Aukshtaitish	Eurasia	NA	NA
cant1236	Cantonese	Eurasia	NA	NA
cent2346	Central Tibetan	NA	NA	NA
coas1300	Coast Tsimshian	North America	NA	NA
east2280	Eastern Baltic	NA	NA	NA
extr1245	Extreme Southern New Caledonian	NA	NA	NA
kere1287	Keresan	NA	NA	NA
mang1393	Mangbetu-Asua	NA	NA	NA
metn1237	Metnyo	Papunesia	NA	NA
midd1319	Middle Franconian	Eurasia	NA	NA
moha1257	Mohawk-Oneida	NA	NA	NA
newc1243	New Caledonian	NA	NA	NA
nort3160	North Germanic	NA	NA	NA
podo1243	Podoko	NA	NA	NA
pwoo1239	Pwo	NA	NA	NA
raja1258	Raja Ampat Maya	NA	NA	NA
sind1278	Sindhi-Lahnda	NA	NA	NA
slav1255	Slavic	NA	NA	NA
taik1256	Tai-Kadai	NA	NA	NA
tere1281	Terena	South America	NA	NA
utsa1239	Lhasa Tibetan	Eurasia	NA	NA
yeni1252	Yeniseian	NA	NA	NA
zhuo1234	Zhuoni	Eurasia	NA	NA

We’ve gone through by hand and added approximate geocoordinates for visualization purposes, e.g., using Glottolog’s Swedish latitude and longitude for North Germanic.

Merge in the hand-assigned geocoordinates.

# There must be a saner way to do this!
hc <- read_csv('hand_coordinates.csv')
tmp <- left_join(languages, hc, by=c("ID"="ID", "Name"="Name"))
tmp <- tmp %>% mutate(Latitude.x = coalesce(Latitude.x, Latitude.y))
tmp <- tmp %>% mutate(Longitude.x = coalesce(Longitude.x, Longitude.y))
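# Note: as written, this drops only Latitude.y; Longitude.y is kept (it shows up in later output).
# Use select(-Latitude.y, -Longitude.y) to drop both helper columns.
tmp <- tmp %>% select(-Latitude.y, Longitude.y)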
tmp <- tmp %>% rename(Latitude = Latitude.x)
tmp <- tmp %>% rename(Longitude = Longitude.x)
languages <- tmp
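One possibly simpler way to fill in the missing coordinates, assuming dplyr 1.0.0 or later, is `rows_patch()`, which overwrites only the NA cells of matching rows. The sketch below uses made-up toy tibbles (`langs`, `hand`) rather than the real `languages` and `hand_coordinates.csv`, and it assumes that every ID in the hand-curated table also occurs in the main table.

```r
library(dplyr)

# Toy stand-ins for `languages` and the hand-curated file (values are made up).
langs <- tibble(
  ID = c("a", "b", "c"),
  Latitude = c(10, NA, NA),
  Longitude = c(20, NA, 5)
)
hand <- tibble(
  ID = c("b", "c"),
  Latitude = c(59.8, -1),
  Longitude = c(17.4, 99)
)

# Only NA cells in `langs` are replaced; existing values (e.g. Longitude of "c") stay.
patched <- rows_patch(langs, hand, by = "ID")
print(patched)
```

Because no `.x`/`.y` suffixed columns are created, there is nothing to drop or rename afterwards.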

Redo the map.

ggplot(data=languages, aes(x=Longitude, y=Latitude)) + 
  borders("world", colour="gray50", fill="gray50") + 
  geom_point() +
  theme_bw()

Here we can add some color by language family.

ggplot(data=languages, aes(x=Longitude, y=Latitude, color=family_id)) + 
  borders("world", colour="gray50", fill="gray50") + 
  geom_point() +
  theme_bw() +
  theme(legend.position="none")

  # ggtitle("Language varieties colored for language family")

How many data points per macroarea? (Note again several NAs.)

table(languages$Macroarea, exclude=FALSE)

## 
##        Africa       Eurasia North America     Papunesia South America 
##            11            40            17             7             6 
##          <NA> 
##            16

Some Glottolog macroareas are missing, e.g., for entries that lack Glottocodes or whose Glottocodes are family-level codes.

languages %>% filter(is.na(Macroarea))

## # A tibble: 16 × 18
##    ID       Name  Macroarea Latitude Longitude Glottocode ISO639P3code family_id
##    <chr>    <chr> <chr>        <dbl>     <dbl> <chr>      <chr>        <chr>    
##  1 atha1247 Atha… <NA>        60.5      -151.  atha1247   <NA>         atha1245 
##  2 cent2346 Cent… <NA>        28.4        90.2 cent2346   <NA>         sino1245 
##  3 east2280 East… <NA>        56.8        24.3 east2280   <NA>         indo1319 
##  4 extr1245 Extr… <NA>       -22.1       167.  extr1245   <NA>         aust1307 
##  5 kere1287 Kere… <NA>        35.5      -106.  kere1287   <NA>         <NA>     
##  6 mang1393 Mang… <NA>         0.268      27.3 mang1393   <NA>         cent2225 
##  7 moha1257 Moha… <NA>        43.7       -74.7 moha1257   <NA>         iroq1247 
##  8 newc1243 New … <NA>       -20.9       167.  newc1243   <NA>         aust1307 
##  9 nort3160 Nort… <NA>        59.8        17.4 nort3160   <NA>         indo1319 
## 10 podo1243 Podo… <NA>        10.9        14.0 podo1243   <NA>         afro1255 
## 11 pwoo1239 Pwo   <NA>        18.0        99.6 pwoo1239   <NA>         sino1245 
## 12 raja1258 Raja… <NA>        -0.173     130.  raja1258   <NA>         aust1307 
## 13 sind1278 Sind… <NA>        30.1        75.3 sind1278   <NA>         indo1319 
## 14 slav1255 Slav… <NA>        49.9        15.1 slav1255   <NA>         indo1319 
## 15 taik1256 Tai-… <NA>        24.1       110.  taik1256   <NA>         <NA>     
## 16 yeni1252 Yeni… <NA>        63.8        87.5 yeni1252   <NA>         <NA>     
## # ℹ 10 more variables: parent_id <chr>, bookkeeping <lgl>, level <chr>,
## #   description <lgl>, markup_description <lgl>, child_family_count <dbl>,
## #   child_language_count <dbl>, child_dialect_count <dbl>, country_ids <chr>,
## #   Longitude.y <dbl>

# tmp <- languages %>% filter(is.na(Macroarea)) %>% select(ID, Name, Macroarea)
# write_csv(tmp, 'get_macroareas.csv')

# There must be a saner way to do this!
hc <- read_csv('hand_macroareas.csv')
tmp <- left_join(languages, hc, by=c("ID"="ID", "Name"="Name"))
tmp <- tmp %>% mutate(Macroarea.x = coalesce(Macroarea.x, Macroarea.y))
tmp <- tmp %>% select(-Macroarea.y)
tmp <- tmp %>% rename(Macroarea = Macroarea.x)
languages <- tmp
table(languages$Macroarea, exclude = FALSE)

## 
##        Africa       Eurasia North America     Papunesia South America 
##            13            48            20            10             6

And a quick look at our areas.

contributions %>% filter(is.na(Area))

## # A tibble: 1 × 9
##      ID Contributor  Citation      Glottocode LanguageVariety Family Area  Notes
##   <dbl> <chr>        <chr>         <chr>      <chr>           <chr>  <chr> <chr>
## 1   105 Lilja Saeboe Lilja Saeboe… <NA>       Montagnais      Algic  <NA>  <NA> 
## # ℹ 1 more variable: BibTex <chr>

table(contributions$Area, exclude=FALSE)

## 
##        Africa          Asia        Europe North America     Papunesia 
##            13            39            14            21            10 
## South America          <NA> 
##             6             1

Tables for the paper

Create tables for the paper. First merge the tonodb tables.

tonodb <- left_join(values, languages, by=c("Language_ID"="ID"))

# Reduce the Contributor table and get the TonoDB Area column
tmp <- contributions %>% select(ID, Family, Area)
tonodb <- left_join(tonodb, tmp, by=c("Inventory_ID"="ID"))

# tonodb %>% filter(is.na(family_id))

# Rename wordtype to syllable (relabelled syllable-count further below) -- TODO replace when database is updated
tonodb <- tonodb %>% mutate(Type = str_replace(Type, "wordtype", "syllable"))

# Fix the mistakes (TODO: rerun the CLDF creation script, which will fix these typos below)
tonodb$Ordering <- str_replace(tonodb$Ordering, "broad", "Broad")
tonodb$Ordering <- str_replace(tonodb$Ordering, "strict", "Strict")
tonodb %>% filter(Ordering=="broad")

## # A tibble: 0 × 54
## # ℹ 54 variables: ID <dbl>, Parameter_ID <chr>, Value <chr>, Language_ID <chr>,
## #   Inventory_ID <dbl>, LanguageVariety <chr>, Ordering <chr>, Ongoing <chr>,
## #   TriggeringContext <chr>, Tone <chr>, Extra <chr>, Height <chr>,
## #   Contour <chr>, Phonation <chr>, ToneDescription <chr>, ChaoNumerals <chr>,
## #   RestrictedEnviroment <chr>, Notes <chr>, EffectOnPitch <chr>,
## #   ResultantSystem <chr>, Type <chr>, Onset <chr>, OnsetManner <chr>,
## #   OnsetVoicing <chr>, OnsetAspiration <chr>, Coda <chr>, …

tonodb %>% filter(is.na(Ordering))

## # A tibble: 1 × 54
##      ID Parameter_ID     Value Language_ID Inventory_ID LanguageVariety Ordering
##   <dbl> <chr>            <chr> <chr>              <dbl> <chr>           <chr>   
## 1   259 8D966B2253A9170… high  <NA>                  NA <NA>            <NA>    
## # ℹ 47 more variables: Ongoing <chr>, TriggeringContext <chr>, Tone <chr>,
## #   Extra <chr>, Height <chr>, Contour <chr>, Phonation <chr>,
## #   ToneDescription <chr>, ChaoNumerals <chr>, RestrictedEnviroment <chr>,
## #   Notes <chr>, EffectOnPitch <chr>, ResultantSystem <chr>, Type <chr>,
## #   Onset <chr>, OnsetManner <chr>, OnsetVoicing <chr>, OnsetAspiration <chr>,
## #   Coda <chr>, CodaPhonation <chr>, CodaGlottal <chr>, CodaManner <chr>,
## #   Stress <chr>, SyllableCount <chr>, NucleusATR <chr>, NucleusLength <chr>, …

Distribution of the languages, families and cases of tonogenesis across different areas

x <- tonodb %>% select(Area, Language_ID) %>% distinct() %>% group_by(Area) %>% summarise(Languages = n())
y <- tonodb %>% select(Area, family_id) %>% distinct() %>% group_by(Area) %>% summarize(Families = n())
z <- tonodb %>% select(Area, TriggeringContext) %>% group_by(Area) %>% summarize(`Cases of tonogenesis` = n())

tmp <- left_join(x, y)
tmp <- left_join(tmp, z)
tmp <- tmp %>% arrange(desc(`Cases of tonogenesis`))
tmp %>% kable()

Area	Languages	Families	Cases of tonogenesis
Asia	37	9	157
North America	20	10	33
Europe	12	2	22
Africa	13	5	21
Papunesia	10	1	16
South America	6	3	8
NA	1	1	2
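The three separate summaries and two joins above could probably be collapsed into a single `summarise()` call. The following is a minimal sketch on made-up toy data (`toy` is not part of the database); it relies on `n_distinct()` counting NA as its own value by default, which should match the `distinct()` followed by `n()` approach used above.

```r
library(dplyr)

# Made-up rows: one row per case of tonogenesis.
toy <- tibble(
  Area = c("Asia", "Asia", "Asia", "Europe", "Europe"),
  Language_ID = c("l1", "l1", "l2", "l3", "l3"),
  family_id = c("f1", "f1", "f1", "f2", "f2")
)

summary_tbl <- toy %>%
  group_by(Area) %>%
  summarise(
    Languages = n_distinct(Language_ID),   # distinct languages per area
    Families = n_distinct(family_id),      # distinct families per area
    `Cases of tonogenesis` = n()           # all rows per area
  ) %>%
  arrange(desc(`Cases of tonogenesis`))

print(summary_tbl)
```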

# Still getting some NAs, let's drop them
table(tonodb$family_id, exclude = FALSE)

## 
## afro1255 algi1248 araw1281 atha1245 atla1278 aust1305 aust1307 cadd1255 
##        3        7        1        5        6       15       27        1 
## cent2225 chim1311 gong1255 hmon1336 indo1319 iroq1247 koma1264 kore1284 
##        4        1        2       10       23        8        6        2 
## maya1287 mong1349 nada1235 sino1245 taik1256 tsim1258 tuca1253 ural1272 
##        4        2        3       58       28        1        4        1 
## utoa1244 waka1280     <NA> 
##        2        2       33

tmp <- tmp %>% filter(!is.na(Area))
tmp %>% kable()

Area	Languages	Families	Cases of tonogenesis
Asia	37	9	157
North America	20	10	33
Europe	12	2	22
Africa	13	5	21
Papunesia	10	1	16
South America	6	3	8

print(xtable(tmp, type = "latex", caption="Distribution of the languages, families and cases of tonogenesis across different areas"), include.rownames=FALSE)

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:40 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrr}
##   \hline
## Area & Languages & Families & Cases of tonogenesis \\ 
##   \hline
## Asia &  37 &   9 & 157 \\ 
##   North America &  20 &  10 &  33 \\ 
##   Europe &  12 &   2 &  22 \\ 
##   Africa &  13 &   5 &  21 \\ 
##   Papunesia &  10 &   1 &  16 \\ 
##   South America &   6 &   3 &   8 \\ 
##    \hline
## \end{tabular}
## \caption{Distribution of the languages, families and cases of tonogenesis across different areas} 
## \end{table}

Number of languages in different families

tmp <- tonodb %>% select(family_id, LanguageVariety) %>% distinct() %>% arrange(family_id, LanguageVariety) %>% group_by(family_id) %>% summarize(`Number of varieties` = n(), Languages = str_c(LanguageVariety, collapse=", "))

# We need the Glottolog family names
glottolog <- read_csv('data/languoid.csv')
families <- glottolog %>% filter(id %in% tmp$family_id) %>% select(id, name)
tmp <- left_join(tmp, families, by=c("family_id"="id"))
tmp <- tmp %>% select(name, `Number of varieties`, Languages)
tmp <- tmp %>% rename(Family = name)

tmp %>% kable()

Family	Number of varieties	Languages
Afro-Asiatic	2	Iraqw, Podoko
Algic	4	Arapaho, Cheyenne, Kickapoo, Maliseet-Passamaquoddy
Arawakan	1	Terena
Athabaskan-Eyak-Tlingit	3	Proto-Athabaskan (tonal dialects) group one, Proto-Athabaskan (tonal dialects) group two, Sanya-Henya Tlingit
Atlantic-Congo	5	Bantu D30, Bila, Kohumono, Moba, Nupe
Austroasiatic	5	Hu, U, Vietnamese, Wester Kammu, Western Kammu
Austronesian	12	Cem, Central North New Caledonian languages, Far South New Caledonian langauges, Magey Matbat, Metnyo Ambel, Moor, Phan Rang Cham, Pre-proto-North Huon Gulf, Proto-Maˈya, Samoan, Utsat, Yerisiam
Caddoan	1	Caddo
Central Sudanic	2	Languages of the Mangbetu-Asua subgroup with three tones, Western Lugbara
Chimakuan	1	Quileute
Ta-Ne-Omotic	2	Gimira, Shinasha
Hmong-Mien	1	White Hmong
Indo-European	14	Auktaitian dialects of Lithuanian, Central Franconian, Central Scandinavian, East Baltic (Latvian and Lithuanian), East Slesvig, Late Proto-Slavic, Latvian, Limburgish, Lithuanian, Proto-Nordic, Punjabi, Scottish gaelic (Bernera), West Baltic (Prussian, Zealand Danish
Iroquoian	3	Cherokee, Mohawk, Proto-Mohawk-Oneida
Koman	2	Proto-Gwama, Proto-Opo
Koreanic	1	Korean
Mayan	4	Mocho’, San Bartolo Tzotzil, Uspanteko, Yucatec
Mongolic-Khitan	1	Mongour
Naduhup	1	Eastern Naduhup
Sino-Tibetan	20	Baima Tibetan, Brokpa, Burmese, Cantonese, Chitabu (bwe), Dzongkha, Geba, Khaling, Kurtöp, Lahu, Lhasa Tibetan, Middle Chinese, Phlong, Pwo Karen, Rikeze Tibetan, Sgaw Karen, Tokpe Gola (Tibetan), T’ientsin, Zhibo Tibetan, Zhuoni Tibetan
Tai-Kadai	4	Nakhon Si Thammarat Thai, Proto-Tai, Shan, Yung Chiang Kam
Tsimshian	1	Coast Tsimshian
Tucanoan	4	Barasana, Kubeo, Máíhɨ̃ki, Tatuyo
Uralic	1	Estonian
Uto-Aztecan	1	Hopi
Wakashan	1	Heiltsuk
NA	9	NA

# print(xtable(tmp, type = "latex", caption="Number of languages in different language families"), include.rownames=FALSE)

Cases of tonogenesis sorted by triggering context

z <- tonodb %>% select(Type, LanguageVariety) %>% separate_rows(Type)
x <- z %>% group_by(Type) %>% summarize(`Cases of tonogenesis` = n()) %>% arrange()
y <- z %>% select(Type, LanguageVariety) %>% distinct() %>% group_by(Type) %>% summarize(`Number of languages` = n()) %>% arrange()

tmp <- left_join(x, y)

## Joining with `by = join_by(Type)`

tmp <- tmp %>% arrange(desc(`Cases of tonogenesis`))

# Remove NAs
# tmp <- tmp %>% filter(!is.na(Type))
# tmp %>% kable()

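# rename to syllable-count
# Note: the separate `count` row in the table below appears to be a by-product of
# separate_rows() splitting "syllable-count" at the hyphen (its default separator).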
tmp <- tmp %>% mutate(Type = str_replace(Type, "syllable", "syllable-count"))

tmp %>% kable()

Type	Cases of tonogenesis	Number of languages
onset	133	41
coda	70	43
count	27	20
nucleus	27	16
syllable-count	27	20
stress	12	8
other	6	5
NA	1	1
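The `count` and `syllable-count` rows carry identical figures, which would be consistent with `separate_rows()` splitting a hyphenated value: its default separator matches any run of non-alphanumeric characters. The sketch below illustrates the difference on made-up data; that multiple types are separated by a comma in the `Type` column is an assumption, not something confirmed by the database.

```r
library(dplyr)
library(tidyr)

# Made-up rows; the comma separator between types is an assumption.
toy <- tibble(
  LanguageVariety = c("A", "B", "C"),
  Type = c("onset", "coda, syllable-count", "syllable-count")
)

# Default separator: also splits "syllable-count" into "syllable" and "count".
default_split <- toy %>% separate_rows(Type)

# Explicit separator: splits on commas only, keeping hyphenated labels whole.
explicit_split <- toy %>% separate_rows(Type, sep = ",\\s*")

print(default_split)
print(explicit_split)
```

With an explicit separator the later renaming step would no longer be needed, provided the underlying values already read `syllable-count`.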

print(xtable(tmp, type = "latex", caption="Cases of tonogenesis by category"), include.rownames=FALSE)

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:41 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrr}
##   \hline
## Type & Cases of tonogenesis & Number of languages \\ 
##   \hline
## onset & 133 &  41 \\ 
##   coda &  70 &  43 \\ 
##   count &  27 &  20 \\ 
##   nucleus &  27 &  16 \\ 
##   syllable-count &  27 &  20 \\ 
##   stress &  12 &   8 \\ 
##   other &   6 &   5 \\ 
##    &   1 &   1 \\ 
##    \hline
## \end{tabular}
## \caption{Cases of tonogenesis by category} 
## \end{table}

Tonogenesis conditioned by voiced and voiceless (unaspirated) obstruents

# tmp <- tonodb %>% select(OnsetVoicing, EffectOnPitch)
# table(tmp)

# tmp <- tonodb %>% select(OnsetVoicing, EffectOnPitch) %>% filter(OnsetVoicing != "") %>% filter(EffectOnPitch != "")
# table(tmp)

# tmp <- tonodb %>% select(OnsetVoicing, EffectOnPitch) %>% 
#  filter(OnsetVoicing != "") %>% 
#  filter(EffectOnPitch != "") %>%
#  filter(OnsetVoicing %in% c("Voiced", "Voiceless"))
# table(tmp)

tmp <- tonodb %>% select(OnsetVoicing, EffectOnPitch) %>% 
  filter(OnsetVoicing != "") %>% 
  filter(EffectOnPitch != "") %>%
  filter(OnsetVoicing %in% c("Voiced", "Voiceless"))

t <- data.frame(unclass(table(tmp$OnsetVoicing, tmp$EffectOnPitch)))
t <- t %>% select(lowering, mid, elevating, rising, falling)

print(xtable(t, type = "latex", caption="Tonogenesis conditioned by voiced and voiceless (unaspirated) obstruents"))

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:41 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{rrrrrr}
##   \hline
##  & lowering & mid & elevating & rising & falling \\ 
##   \hline
## Voiced &  37 &   0 &  10 &   2 &   2 \\ 
##   Voiceless &  11 &   8 &  35 &   0 &   1 \\ 
##    \hline
## \end{tabular}
## \caption{Tonogenesis conditioned by voiced and voiceless (unaspirated) obstruents} 
## \end{table}
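Because the two rows have different totals, row-wise proportions may make the pattern easier to read. The sketch below re-enters the counts from the table above by hand (the object names `counts` and `row_share` are illustrative) and converts them to shares within each voicing category using base R.

```r
# Counts copied from the table above (rows: onset voicing, columns: effect on pitch).
counts <- matrix(
  c(37, 0, 10, 2, 2,
    11, 8, 35, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    OnsetVoicing = c("Voiced", "Voiceless"),
    EffectOnPitch = c("lowering", "mid", "elevating", "rising", "falling")
  )
)

# Share of each effect within a voicing category (each row sums to 1).
row_share <- prop.table(counts, margin = 1)
print(round(row_share, 2))
```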

Tonogenesis triggered by coda consonants

tmp <- tonodb %>% select(CodaGlottal, EffectOnPitch) %>%
  filter(!is.na(CodaGlottal)) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "") %>%
  filter(EffectOnPitch %in% c("level", "rising", "falling"))
table(tmp) %>% kable()

	falling	rising
/h/	2	0
/h/, glottal stop	2	1
glottal stop	4	3
glottalized	1	2
laryngeal	6	0
non-glottalized	1	0

# print(xtable(table(tmp), type = "latex", caption="Tonogenesis triggered by coda consonants"))

Tonogenesis based on vowel length

table(tonodb$Nucleus, tonodb$EffectOnPitch) %>% kable()

	elevating	falling	lowering	rising	rising-falling
-ATR	0	0	2	0	0
-ATR and non-high vowel	0	0	1	0	0
+ATR	2	0	0	0	0
+ATR and high vowel	1	0	0	0	0
high vowel	3	0	1	0	0
long vowel	1	1	1	1	1
low vowel	1	0	2	0	0
other	0	0	1	0	0
short vowel	3	0	1	0	0
short, long	1	0	1	0	0
short, long, glottalic	2	0	1	0	0

tmp <- tonodb %>% select(Nucleus, EffectOnPitch) %>%
  filter(Nucleus != "") %>%
  filter(EffectOnPitch != "") %>%
  filter(Nucleus %in% c("long vowel", "short vowel"))

# print(xtable(table(tmp), type = "latex", caption="Tonogenesis based on vowel length"))

Tonogenesis based on vowel height

High/low is relative.

table(tonodb$Nucleus, tonodb$EffectOnPitch) %>% kable()

	elevating	falling	lowering	rising	rising-falling
-ATR	0	0	2	0	0
-ATR and non-high vowel	0	0	1	0	0
+ATR	2	0	0	0	0
+ATR and high vowel	1	0	0	0	0
high vowel	3	0	1	0	0
long vowel	1	1	1	1	1
low vowel	1	0	2	0	0
other	0	0	1	0	0
short vowel	3	0	1	0	0
short, long	1	0	1	0	0
short, long, glottalic	2	0	1	0	0

tmp <- tonodb %>% select(Nucleus, EffectOnPitch) %>%
  filter(Nucleus != "") %>%
  filter(EffectOnPitch != "") %>%
  filter(Nucleus %in% c("high vowel", "low vowel"))

# print(xtable(table(tmp), type = "latex", caption="Tonogenesis based on vowel height – high/low is relative"))

Tonogenesis based on ATR

High/low is relative.

table(tonodb$Nucleus, tonodb$EffectOnPitch) %>% kable()

	elevating	falling	lowering	rising	rising-falling
-ATR	0	0	2	0	0
-ATR and non-high vowel	0	0	1	0	0
+ATR	2	0	0	0	0
+ATR and high vowel	1	0	0	0	0
high vowel	3	0	1	0	0
long vowel	1	1	1	1	1
low vowel	1	0	2	0	0
other	0	0	1	0	0
short vowel	3	0	1	0	0
short, long	1	0	1	0	0
short, long, glottalic	2	0	1	0	0

tmp <- tonodb %>% select(Nucleus, EffectOnPitch) %>%
  filter(Nucleus != "") %>%
  filter(EffectOnPitch != "") %>%
  filter(Nucleus %in% c("+ATR", "-ATR"))

# print(xtable(table(tmp), type = "latex", caption="Tonogenesis based on ATR – high/low is relative"))

Effect of voicing on tone

In the DoTE (number of languages).

tmp <- tonodb %>% filter(Onset %in% c('voiceless', 'voiced'))
table(tmp$Onset, tmp$EffectOnPitch)

##            
##             elevating falling lowering rising
##   voiced            3       1       19      1
##   voiceless        15       0        2      0

t <- data.frame(unclass(table(tmp$Onset, tmp$EffectOnPitch)))
t <- t %>% select(lowering, elevating, rising, falling)
# print(xtable(t, type = "latex", caption="The effect of voicing on tone"))

Tonogenesis triggered by codas

In the DoTE (number of cases of tonogenesis).

# table(tonodb$Coda, tonodb$EffectOnPitch) %>% kable()

# tmp <- tonodb %>% select(Coda, EffectOnPitch) %>% filter_at(vars(Coda, EffectOnPitch),any_vars(!is.na(.)))
# table(tmp$Coda, tmp$EffectOnPitch) %>% kable()

# tmp <- tonodb %>% select(Coda, EffectOnPitch) %>% filter_at(vars(Coda, EffectOnPitch),all_vars(!is.na(.)))
# table(tmp$Coda, tmp$EffectOnPitch) %>% kable()

Onset aspiration by effect on pitch

tmp <- tonodb %>% select(OnsetAspiration, EffectOnPitch) %>%
  filter(OnsetAspiration != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	falling	lowering	mid	rising
Aspirated	5	0	7	4	0
Aspirated, unaspirated	5	1	1	0	0
Breathy	0	0	0	0	1
Unaspirated	7	0	3	6	0

t <- data.frame(unclass(table(tmp)))
t <- t %>% select(lowering, mid, elevating, falling, rising)

# print(xtable(t, type = "latex", caption="The effect of onset aspiration on tone"))

Effect of coda manner on pitch

tmp <- tonodb %>% select(CodaManner, EffectOnPitch) %>% separate_rows(CodaManner) %>%
  filter(CodaManner != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	falling	level	lowering	rising
cluster	0	1	0	1	0
fricative	1	3	0	1	0
obstruent	3	4	0	1	1
open	0	0	3	0	0
sonorant	1	3	3	1	0
stop	2	4	0	4	3

t <- data.frame(unclass(table(tmp)))
t <- t %>% select(lowering, level, elevating, rising, falling) %>% arrange(desc(lowering))
t %>% kable()

	lowering	level	elevating	rising	falling
stop	4	0	2	3	4
cluster	1	0	0	0	1
fricative	1	0	1	0	3
obstruent	1	0	3	1	4
sonorant	1	3	1	0	3
open	0	3	0	0	0
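As an alternative to `data.frame(unclass(table(tmp)))`, a wide contingency table can also be built with `count()` and `pivot_wider()`, which keeps the row labels as an ordinary column. This is a sketch on made-up data (`toy`), not the code used for the paper's tables.

```r
library(dplyr)
library(tidyr)

# Made-up rows mimicking the two columns selected above.
toy <- tibble(
  CodaManner = c("stop", "stop", "sonorant", "open"),
  EffectOnPitch = c("lowering", "falling", "level", "level")
)

wide <- toy %>%
  count(CodaManner, EffectOnPitch) %>%            # one row per observed combination
  pivot_wider(names_from = EffectOnPitch,
              values_from = n,
              values_fill = 0)                    # unobserved combinations become 0

print(wide)
```

The resulting tibble can then be reordered with `select()` and `arrange()` in the same way as the data frame above.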

print(xtable(t, type = "latex", caption="The effect of voicing on tone"))

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:41 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{rrrrrr}
##   \hline
##  & lowering & level & elevating & rising & falling \\ 
##   \hline
## stop &   4 &   0 &   2 &   3 &   4 \\ 
##   cluster &   1 &   0 &   0 &   0 &   1 \\ 
##   fricative &   1 &   0 &   1 &   0 &   3 \\ 
##   obstruent &   1 &   0 &   3 &   1 &   4 \\ 
##   sonorant &   1 &   3 &   1 &   0 &   3 \\ 
##   open &   0 &   3 &   0 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{The effect of voicing on tone} 
## \end{table}

Effect of coda phonation on pitch

tmp <- tonodb %>% select(CodaPhonation, EffectOnPitch) %>%
  filter(CodaPhonation != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	falling	lowering	rising
breathy	1	0	0
creaky	2	0	0
preaspirated	0	1	0
voiced	1	1	0
voiceless	2	0	1

print(xtable(table(tmp), type = "latex", caption="The effect of voice on pitch"))

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:41 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{rrrr}
##   \hline
##  & falling & lowering & rising \\ 
##   \hline
## breathy &   1 &   0 &   0 \\ 
##   creaky &   2 &   0 &   0 \\ 
##   preaspirated &   0 &   1 &   0 \\ 
##   voiced &   1 &   1 &   0 \\ 
##   voiceless &   2 &   0 &   1 \\ 
##    \hline
## \end{tabular}
## \caption{The effect of voice on pitch} 
## \end{table}

Effect of coda glottal on pitch

tmp <- tonodb %>% select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	falling	lowering	rising
/h/	1	2	1	0
/h/, glottal stop	1	2	0	1
glottal stop	2	4	3	3
glottalized	3	1	1	2
glottalized, non-glottalized	1	0	1	0
laryngeal	0	6	0	0
non-glottalized	0	1	0	0

t <- data.frame(unclass(table(tmp)))
t <- t %>% select(lowering, elevating, falling, rising)

# print(xtable(t, type = "latex", caption="The effect of coda glottal on pitch"))

Effect of vowel height on pitch

tmp <- tonodb %>% select(Height, EffectOnPitch) %>%
  filter(Height != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	falling	level	lowering	lowering, elevating	mid	no change	rising	rising, elevating	rising, lowering
high	51	1	0	4	0	0	1	1	1	0
low	0	2	0	47	0	0	0	0	0	1
mid	8	1	1	5	1	2	0	1	0	0

# print(xtable(table(tmp), type = "latex", caption="The effect of vowel height on pitch"))
table(tmp) %>% kable()

	elevating	falling	level	lowering	lowering, elevating	mid	no change	rising	rising, elevating	rising, lowering
high	51	1	0	4	0	0	1	1	1	0
low	0	2	0	47	0	0	0	0	0	1
mid	8	1	1	5	1	2	0	1	0	0

Effect of nucleus length on pitch

tmp <- tonodb %>% select(NucleusLength, EffectOnPitch) %>%
  filter(NucleusLength != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	falling	lowering	rising	rising-falling
long	1	1	1	1	1
short	3	0	1	0	0

# print(xtable(table(tmp), type = "latex", caption="The effect of nucleus length on pitch"))
table(tmp) %>% kable()

	elevating	falling	lowering	rising	rising-falling
long	1	1	1	1	1
short	3	0	1	0	0

Effect of nuclear +/- ATR on pitch

tmp <- tonodb %>% select(NucleusATR, EffectOnPitch) %>%
  filter(NucleusATR != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	lowering
-ATR	0	3
+ATR	3	0

# print(xtable(table(tmp), type = "latex", caption="The effect of nuclear +/- ATR on pitch"))
table(tmp) %>% kable()

	elevating	lowering
-ATR	0	3
+ATR	3	0

Number of cases/varieties of different types for each region

Africa

tmp <- tonodb %>% filter(Area == "Africa") %>% select(CodaGlottal, EffectOnPitch)
table(tmp) %>% kable()

elevating	falling	lowering

tmp <- tonodb %>% filter(Area == "Africa") %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

# Nothing here
# print(xtable(table(tmp), type = "latex", caption="Number of cases/varieties of different tonogenesis types for Africa"))

Asia

tmp <- tonodb %>% filter(Area == "Asia") %>% select(CodaGlottal, EffectOnPitch)
table(tmp) %>% kable()

	elevating	falling	rising
/h/	1	2	0
glottal stop	1	4	3
glottalized	1	0	0
non-glottalized	0	0	0

tmp <- tonodb %>% filter(Area == "Asia") %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	falling	rising
/h/	1	2	0
glottal stop	1	4	3
glottalized	1	0	0

# print(xtable(table(tmp), type = "latex", caption="Number of cases/varieties of different tonogenesis types for Asia"))

Europe

tmp <- tonodb %>% filter(Area == "Europe") %>% select(CodaGlottal, EffectOnPitch)
table(tmp) %>% kable()

	elevating	falling	level	no change	rising	rising-falling
glottalized	1	0	0	0	2	0
non-glottalized	0	1	0	0	0	0

tmp <- tonodb %>% filter(Area == "Europe") %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	falling	rising
glottalized	1	0	2
non-glottalized	0	1	0

# print(xtable(table(tmp), type = "latex", caption="Number of cases/varieties of different tonogenesis types for Europe"))

North America

tmp <- tonodb %>% filter(Area == "North America") %>% select(CodaGlottal, EffectOnPitch)
table(tmp) %>% kable()

	elevating	falling	lowering	rising
/h/	0	0	1	0
/h/, glottal stop	1	2	0	1
glottalized	1	1	1	0
glottalized, non-glottalized	1	0	1	0
laryngeal	0	6	0	0

tmp <- tonodb %>% filter(Area == "North America") %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	falling	lowering	rising
/h/	0	0	1	0
/h/, glottal stop	1	2	0	1
glottalized	1	1	1	0
glottalized, non-glottalized	1	0	1	0
laryngeal	0	6	0	0

# print(xtable(table(tmp), type = "latex", caption="Number of cases/varieties of different tonogenesis types for North America"))

Papunesia

tmp <- tonodb %>% filter(Area == "Papunesia") %>% select(CodaGlottal, EffectOnPitch)
table(tmp) %>% kable()

elevating	lowering	rising

tmp <- tonodb %>% filter(Area == "Papunesia") %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

# No results
# print(xtable(table(tmp), type = "latex", caption="Number of cases/varieties of different tonogenesis types for Papunesia"))

South America

tmp <- tonodb %>% filter(Area == "South America") %>% select(CodaGlottal, EffectOnPitch)
table(tmp) %>% kable()

	elevating	falling	lowering	rising
glottal stop	1	0	3	0

tmp <- tonodb %>% filter(Area == "South America") %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	lowering
glottal stop	1	3

# print(xtable(table(tmp), type = "latex", caption="Number of cases/varieties of different tonogenesis types for South America"))
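The per-region blocks above repeat the same filter and cross-tabulation with only the area changed. A small helper could produce all of them in one pass. The sketch below is illustrative only: `crosstab_by_area()` is a hypothetical function name, and `toy` is made-up data that mimics the `Area`, `CodaGlottal` and `EffectOnPitch` columns.

```r
library(dplyr)

# Made-up rows that mimic the three columns used above.
toy <- tibble(
  Area = c("Asia", "Asia", "Europe", "Africa"),
  CodaGlottal = c("/h/", "glottal stop", "glottalized", NA),
  EffectOnPitch = c("falling", "rising", "rising", "lowering")
)

# Hypothetical helper: one feature-by-pitch cross-tabulation per area.
crosstab_by_area <- function(data, feature) {
  # Drop rows where the feature or the pitch effect is missing or empty.
  kept <- data %>%
    filter(!is.na(.data[[feature]]), .data[[feature]] != "",
           !is.na(EffectOnPitch), EffectOnPitch != "")
  # Split by area and tabulate each piece separately.
  lapply(split(kept, kept$Area), function(d) table(d[[feature]], d$EffectOnPitch))
}

tables <- crosstab_by_area(toy, "CodaGlottal")
print(tables$Asia)
```

Areas with no usable rows (Africa in this toy example) simply do not appear in the resulting list.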

Area and tonogenesis specific tables

Onset aspiration in Asia

tmp <- tonodb %>% filter(Area == "Asia") %>%
  select(OnsetAspiration, EffectOnPitch) %>%
  filter(OnsetAspiration != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	falling	lowering	mid	rising
Aspirated	3	0	6	4	0
Aspirated, unaspirated	5	1	1	0	0
Breathy	0	0	0	0	1
Unaspirated	7	0	1	6	0

# print(xtable(table(tmp), type = "latex", caption="Onset aspiration in Asia"))

Coda glottal in Asia

tmp <- tonodb %>% filter(Area == "Asia") %>%
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	falling	rising
/h/	1	2	0
glottal stop	1	4	3
glottalized	1	0	0

# print(xtable(table(tmp), type = "latex", caption="Coda glottal in Asia"))

Coda manner in Asia

tmp <- tonodb %>% filter(Area == "Asia") %>%
  select(CodaManner, EffectOnPitch) %>%
  filter(CodaManner != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	falling	level	lowering	rising
fricative	1	3	0	0	0
obstruent	1	2	0	0	0
open	0	0	1	0	0
sonorant	0	1	1	0	0
sonorant, open	0	0	2	0	0
stop	1	4	0	1	3

# print(xtable(table(tmp), type = "latex", caption="Coda manner in Asia"))

Coda phonation type in Asia

tmp <- tonodb %>% filter(Area == "Asia") %>%
  select(CodaPhonation, EffectOnPitch) %>%
  filter(CodaPhonation != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	falling	lowering
breathy	1	0
voiced	0	1
voiceless	1	0

# print(xtable(table(tmp), type = "latex", caption="Coda phonation type in Asia"))

Nucleus height in Asia

tmp <- tonodb %>% filter(Area == "Asia") %>%
  select(NucleusHeight, EffectOnPitch) %>%
  filter(NucleusHeight != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	lowering
High	1	0
Low	0	1

# print(xtable(table(tmp), type = "latex", caption="Nucleus height in Asia"))

Onset voicing in Asia

tmp <- tonodb %>% filter(Area == "Asia") %>%
  select(OnsetVoicing, EffectOnPitch) %>%
  filter(OnsetVoicing != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

	elevating	falling	lowering	lowering, elevating	mid	rising
sonorant	1	0	0	0	0	0
Voiced	10	2	32	0	0	2
Voiced, voiceless	4	0	3	1	2	0
Voiceless	31	1	11	0	8	0

# print(xtable(table(tmp), type = "latex", caption="Onset voicing in Asia"))

Tonogenetic events by macroarea

Worldwide

tmp <- tonodb %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Cases of tonogenesis` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of languages` = n())
t <- left_join(cases, varieties)

## Joining with `by = join_by(Type)`

t <- t %>% arrange(desc(`Cases of tonogenesis`))

t %>% kable()

Type	Cases of tonogenesis	Number of languages
onset	133	41
coda	70	43
count	27	20
nucleus	27	16
syllable	27	20
stress	12	8
other	6	5
NA	1	1

# print(xtable(t, type = "latex", caption="Cases of tonogenesis by category"), include.rownames=FALSE)

t(t) %>% kable()

Type	onset	coda	count	nucleus	syllable	stress	other	NA
Cases of tonogenesis	133	70	27	27	27	12	6	1
Number of languages	41	43	20	16	20	8	5	1

# print(xtable(t(t), type = "latex", caption="Cases of tonogenesis by category"), include.rownames=FALSE)
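
The `count` and `syllable` rows carry identical figures in the table above. This appears consistent with `separate_rows()` using its default separator, which splits on any run of non-alphanumeric characters and therefore breaks `syllable-count` at the hyphen (elsewhere in these materials the raw `Type` column holds values such as `coda, syllable-count`). The following is a minimal sketch on invented toy rows, not on the database itself, showing the default behaviour next to an explicit comma separator:

```r
library(tidyr)

# Toy rows mimicking the Type column; the varieties "A" and "B" are invented.
toy <- data.frame(
  LanguageVariety = c("A", "B"),
  Type = c("coda, syllable-count", "onset"),
  stringsAsFactors = FALSE
)

# Default separator: any run of non-alphanumeric characters, so "-" splits too.
default_split <- separate_rows(toy, Type)

# Explicit separator: split only on a comma followed by optional whitespace.
comma_split <- separate_rows(toy, Type, sep = ",\\s*")

default_split$Type  # "coda" "syllable" "count" "onset"
comma_split$Type    # "coda" "syllable-count" "onset"
```

With the explicit separator, `syllable-count` would be reported as a single type rather than as two rows with the same counts.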

Africa

tmp <- tonodb  %>% filter(Area == "Africa") %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Number of cases` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of varieties` = n())
t <- left_join(cases, varieties)

## Joining with `by = join_by(Type)`

t %>% kable()

Type	Number of cases	Number of varieties
count	6	4
nucleus	7	4
onset	7	5
other	1	1
syllable	6	4

# print(xtable(t(t), type = "latex", caption="Tonogenetic events in Africa in the DTE"))
t(t) %>% kable()

Type	count	nucleus	onset	other	syllable
Number of cases	6	7	7	1	6
Number of varieties	4	4	5	1	4

Asia

tmp <- tonodb  %>% filter(Area == "Asia") %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Number of cases` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of varieties` = n())
t <- left_join(cases, varieties)

## Joining with `by = join_by(Type)`

t %>% kable()

Type	Number of cases	Number of varieties
coda	35	15
count	3	3
nucleus	4	2
onset	116	29
other	2	1
stress	2	1
syllable	3	3

# print(xtable(t(t), type = "latex", caption="Tonogenetic events in Asia in the DTE"))
t(t) %>% kable()

Type	coda	count	nucleus	onset	other	stress	syllable
Number of cases	35	3	4	116	2	2	3
Number of varieties	15	3	2	29	1	1	3

Europe

tmp <- tonodb  %>% filter(Area == "Europe") %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Number of cases` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of varieties` = n())
t <- left_join(cases, varieties)

## Joining with `by = join_by(Type)`

t %>% kable()

Type	Number of cases	Number of varieties
coda	8	5
count	8	6
nucleus	1	1
other	1	1
stress	5	4
syllable	8	6

# print(xtable(t(t), type = "latex", caption="Tonogenetic events in Europe in the DTE"))
t(t) %>% kable()

Type	coda	count	nucleus	other	stress	syllable
Number of cases	8	8	1	1	5	8
Number of varieties	5	6	1	1	4	6

North America

tmp <- tonodb  %>% filter(Area == "North America") %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Number of cases` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of varieties` = n())
t <- left_join(cases, varieties)

## Joining with `by = join_by(Type)`

t %>% kable()

Type	Number of cases	Number of varieties
coda	19	16
count	5	3
nucleus	8	5
onset	1	1
other	2	2
stress	4	2
syllable	5	3

# print(xtable(t(t), type = "latex", caption="Tonogenetic events in North America in the DTE"), include.colnames=FALSE)
t(t) %>% kable()

Type	coda	count	nucleus	onset	other	stress	syllable
Number of cases	19	5	8	1	2	4	5
Number of varieties	16	3	5	1	2	2	3

South America

tmp <- tonodb  %>% filter(Area == "South America") %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Number of cases` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of varieties` = n())
t <- left_join(cases, varieties)

## Joining with `by = join_by(Type)`

t %>% kable()

Type	Number of cases	Number of varieties
coda	6	5
count	1	1
nucleus	3	1
syllable	1	1

# print(xtable(t(t), type = "latex", caption="Tonogenetic events in South America in the DTE"))
t(t) %>% kable()

Type	coda	count	nucleus	syllable
Number of cases	6	1	3	1
Number of varieties	5	1	1	1

Papunesia

tmp <- tonodb  %>% filter(Area == "Papunesia") %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Number of cases` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of varieties` = n())
t <- left_join(cases, varieties)

## Joining with `by = join_by(Type)`

t %>% kable()

Type	Number of cases	Number of varieties
coda	1	1
count	4	3
nucleus	3	2
onset	9	6
stress	1	1
syllable	4	3

# print(xtable(t(t), type = "latex", caption="Tonogenetic events in Papunesia in the DTE"))
t(t) %>% kable()

Type	coda	count	nucleus	onset	stress	syllable
Number of cases	1	4	3	9	1	4
Number of varieties	1	3	2	6	1	3
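
The six per-area tables above repeat the same pipeline with only the area name changed. As a sketch of how that repetition could be wrapped up, the hypothetical helper `count_by_type()` below (not part of the original analysis) takes a data frame and an area name. It is shown on invented toy data and assumes the columns `Area`, `LanguageVariety` and `Type` used above.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: counts cases and distinct varieties per type for one area.
count_by_type <- function(df, area) {
  tmp <- df %>%
    filter(Area == area) %>%
    select(LanguageVariety, Type) %>%
    separate_rows(Type)
  cases <- tmp %>% count(Type, name = "Number of cases")
  varieties <- tmp %>% distinct() %>% count(Type, name = "Number of varieties")
  left_join(cases, varieties, by = "Type")
}

# Toy data standing in for tonodb; the values are invented for illustration.
toy <- data.frame(
  Area = c("Asia", "Asia", "Asia", "Europe"),
  LanguageVariety = c("X", "X", "Y", "Z"),
  Type = c("onset", "onset", "coda, onset", "stress"),
  stringsAsFactors = FALSE
)

asia <- count_by_type(toy, "Asia")
asia
```

Under these assumptions, a call such as `count_by_type(tonodb, "Papunesia")` would stand in for the corresponding block above. Passing `by = "Type"` to `left_join()` also avoids the join message.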

Examples from the database for the paper

tmp <- tonodb %>% select(ID, LanguageVariety, TriggeringContext, EffectOnPitch, Type ) %>% head(n=10)
tmp %>% kable()

ID	LanguageVariety	TriggeringContext	EffectOnPitch	Type
1	Vietnamese	Initial voiced stop + Falling tone	lowering	onset
2	Vietnamese	Initial voiceless stop + Falling tone	elevating	onset
3	Vietnamese	Final voiceless fricative	falling	coda
4	Punjabi	voiced aspirateed coda	falling	coda
5	Middle Chinese	final /h/	falling	coda
6	Cherokee	inital glide or final glottal consonant	falling	coda, onset
7	Lhasa Tibetan	final glottal stop	falling	coda
8	Khaling	Obstruent coda OR disyllable –> monosyllable	falling	coda, syllable-count
9	Proto-Mohawk-Oneida	lengthened accented vowel followed by a glottal stop or by * / h / plus a resonant consonant	falling	coda
10	Dzongkha	loss of a second syllable OR loss of a coda /-r/ or /-l/.	falling	coda, syllable-count

# print(xtable(tmp, type = "latex", caption="Example entries from the DTE"), include.rownames=FALSE)

A table showing the number of cases/languages for each type in each region

tmp <- tonodb %>% select(Area, LanguageVariety, Type) %>% separate_rows(Type)
# tmp <- tonodb %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Area, Type) %>% summarize(`Cases of tonogenesis` = n())

## `summarise()` has grouped output by 'Area'. You can override using the
## `.groups` argument.

varieties <- tmp %>% distinct() %>% group_by(Area, Type) %>% summarize(`Number of languages` = n())

## `summarise()` has grouped output by 'Area'. You can override using the
## `.groups` argument.

t <- left_join(cases, varieties)

## Joining with `by = join_by(Area, Type)`

t <- t %>% arrange(desc(`Cases of tonogenesis`))

tbl <- t %>% select(-`Number of languages`) %>% pivot_wider(names_from = Type, values_from = `Cases of tonogenesis`)
tbl

## # A tibble: 7 × 9
## # Groups:   Area [7]
##   Area          onset  coda count syllable nucleus stress other  `NA`
##   <chr>         <int> <int> <int>    <int>   <int>  <int> <int> <int>
## 1 Asia            116    35     3        3       4      2     2    NA
## 2 North America     1    19     5        5       8      4     2    NA
## 3 Papunesia         9     1     4        4       3      1    NA    NA
## 4 Europe           NA     8     8        8       1      5     1    NA
## 5 Africa            7    NA     6        6       7     NA     1    NA
## 6 South America    NA     6     1        1       3     NA    NA    NA
## 7 <NA>             NA     1    NA       NA       1     NA    NA     1

# print(xtable(tbl, type = "latex", caption="Tonogenesis events by area"), include.rownames=FALSE)

tbl <- t %>% select(-`Cases of tonogenesis`) %>% pivot_wider(names_from = Type, values_from = `Number of languages`)
tbl

## # A tibble: 7 × 9
## # Groups:   Area [7]
##   Area          onset  coda count syllable nucleus stress other  `NA`
##   <chr>         <int> <int> <int>    <int>   <int>  <int> <int> <int>
## 1 Asia             29    15     3        3       2      1     1    NA
## 2 North America     1    16     3        3       5      2     2    NA
## 3 Papunesia         6     1     3        3       2      1    NA    NA
## 4 Europe           NA     5     6        6       1      4     1    NA
## 5 Africa            5    NA     4        4       4     NA     1    NA
## 6 South America    NA     5     1        1       1     NA    NA    NA
## 7 <NA>             NA     1    NA       NA       1     NA    NA     1

# print(xtable(tbl, type = "latex", caption="Languages with tonogenesis events by area"), include.rownames=FALSE)

t$both_cases <- paste0(t$`Cases of tonogenesis`, " (", t$`Number of languages`, ")")
tbl <- t %>% select(-`Cases of tonogenesis`, -`Number of languages`) %>% pivot_wider(names_from = Type, values_from = both_cases)
tbl

## # A tibble: 7 × 9
## # Groups:   Area [7]
##   Area          onset    coda    count syllable nucleus stress other `NA` 
##   <chr>         <chr>    <chr>   <chr> <chr>    <chr>   <chr>  <chr> <chr>
## 1 Asia          116 (29) 35 (15) 3 (3) 3 (3)    4 (2)   2 (1)  2 (1) <NA> 
## 2 North America 1 (1)    19 (16) 5 (3) 5 (3)    8 (5)   4 (2)  2 (2) <NA> 
## 3 Papunesia     9 (6)    1 (1)   4 (3) 4 (3)    3 (2)   1 (1)  <NA>  <NA> 
## 4 Europe        <NA>     8 (5)   8 (6) 8 (6)    1 (1)   5 (4)  1 (1) <NA> 
## 5 Africa        7 (5)    <NA>    6 (4) 6 (4)    7 (4)   <NA>   1 (1) <NA> 
## 6 South America <NA>     6 (5)   1 (1) 1 (1)    3 (1)   <NA>   <NA>  <NA> 
## 7 <NA>          <NA>     1 (1)   <NA>  <NA>     1 (1)   <NA>   <NA>  1 (1)

# print(xtable(tbl, type = "latex", caption="Tonogenesis events (languages) by area"), include.rownames=FALSE)

m <- tonodb %>% select(Latitude, Longitude, LanguageVariety, Type) %>% distinct() %>% separate_rows(Type)
ggplot(data=m, aes(x=Longitude, y=Latitude, color=Type)) + 
  borders("world", colour="gray50", fill="gray50") + 
  geom_point() +
  theme_bw()

## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_point()`).

Multiple paths to the same result

Alluvial diagrams showing the relative frequencies of the types of tonogenetic events and their effects on tone height, contour and pitch.

x <- tonodb %>% select(Type, Height, Ordering) %>% filter(!is.na(Height)) %>% separate_rows(Type)
x <- x %>% group_by(Type, Height, Ordering) %>% summarize(Count = n())

## `summarise()` has grouped output by 'Type', 'Height'. You can override using
## the `.groups` argument.

x <- x %>% mutate(Freq = Count / sum(x$Count))
x <- x %>% arrange(desc(Count))

ggplot(data = x,
       aes(axis1 = Type, axis2 = Height, y = Count)) +
  geom_alluvium(aes(fill = Ordering)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()
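
The frequency computation above divides by `sum(x$Count)` because `x` is still grouped after `summarize()`, which is also what triggers the grouping message. A minimal sketch of an equivalent formulation, on invented toy data, drops the grouping with `.groups = "drop"` so that `sum(Count)` is the overall total:

```r
library(dplyr)

# Toy data standing in for the Type and Height columns; values are invented.
toy <- data.frame(
  Type = c("onset", "onset", "coda", "coda", "coda"),
  Height = c("high", "low", "high", "high", "low"),
  stringsAsFactors = FALSE
)

x <- toy %>%
  group_by(Type, Height) %>%
  # Dropping the groups silences the message and makes sum(Count) global.
  summarize(Count = n(), .groups = "drop") %>%
  mutate(Freq = Count / sum(Count)) %>%
  arrange(desc(Count))
x
```

The resulting `Freq` column sums to 1 across all rows, as in the tables in this section.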

x <- tonodb %>% select(Type, Height) %>% filter(!is.na(Height)) %>% separate_rows(Type)
x <- x %>% group_by(Type, Height) %>% summarize(Count = n())

## `summarise()` has grouped output by 'Type'. You can override using the
## `.groups` argument.

x <- x %>% mutate(Freq = Count / sum(x$Count))
x <- x %>% arrange(desc(Count))

ggplot(data = x,
       aes(axis1 = Height, axis2 = Type, y = Count)) +
  geom_alluvium(aes(fill = Type)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

x <- tonodb %>% select(Type, EffectOnPitch, Ordering) %>% filter(!is.na(EffectOnPitch)) %>% separate_rows(Type)
x <- x %>% group_by(Type, EffectOnPitch, Ordering) %>% summarize(Count = n())

## `summarise()` has grouped output by 'Type', 'EffectOnPitch'. You can override
## using the `.groups` argument.

x <- x %>% mutate(Freq = Count / sum(x$Count))
x <- x %>% arrange(desc(Count))
x <- x %>% filter(Count > 1) %>% filter(Type != "other")
x %>% kable()

Type	EffectOnPitch	Ordering	Count	Freq
onset	elevating	Broad - split	39	0.1529412
onset	lowering	Broad - split	39	0.1529412
coda	falling	Unclear	14	0.0549020
onset	mid	Broad - split	10	0.0392157
onset	elevating	Possibly Strict	8	0.0313725
onset	lowering	Possibly Strict	8	0.0313725
coda	falling	Possibly Strict	5	0.0196078
coda	elevating	Unclear	4	0.0156863
coda	falling	Broad - split	4	0.0156863
coda	rising	Unclear	4	0.0156863
nucleus	elevating	Broad - split	4	0.0156863
nucleus	lowering	Broad - split	4	0.0156863
onset	elevating	Unclear	4	0.0156863
coda	elevating	Possibly Strict	3	0.0117647
coda	lowering	Broad	3	0.0117647
coda	lowering	Possibly Strict	3	0.0117647
count	falling	Unclear	3	0.0117647
nucleus	elevating	Possibly Strict	3	0.0117647
nucleus	elevating	Unclear	3	0.0117647
nucleus	lowering	Possibly Strict	3	0.0117647
onset	falling	Broad - split	3	0.0117647
onset	lowering	Broad	3	0.0117647
syllable	falling	Unclear	3	0.0117647
coda	falling	Strict	2	0.0078431
coda	level	Strict	2	0.0078431
coda	level	Unclear	2	0.0078431
coda	rising	Possibly Strict	2	0.0078431
coda	rising	Strict	2	0.0078431
count	elevating	Unclear	2	0.0078431
nucleus	elevating	Broad	2	0.0078431
nucleus	lowering	Broad	2	0.0078431
onset	elevating	Strict	2	0.0078431
onset	lowering	Strict	2	0.0078431
onset	lowering	Unclear	2	0.0078431
onset	rising	Broad - split	2	0.0078431
onset	rising	Unclear	2	0.0078431
stress	rising	Unclear	2	0.0078431
syllable	elevating	Unclear	2	0.0078431

x %>% filter(!(Type %in% c("count", "stress", "syllable"))) %>%
  filter(!(EffectOnPitch %in% c("level", "mid"))) %>%
ggplot(aes(axis1 = Type, axis2 = EffectOnPitch, y = Count)) +
  geom_alluvium(aes(fill = Ordering)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

w <- x %>% filter(!(Type %in% c("count", "stress", "syllable"))) %>%
  filter(!(EffectOnPitch %in% c("level", "mid")))
w$Type <- factor(w$Type, levels=c("onset", "nucleus", "coda"))
w$EffectOnPitch <- factor(w$EffectOnPitch, levels=c("elevating", "lowering", "rising", "falling"))

w %>%
ggplot(aes(axis1 = Type, axis2 = EffectOnPitch, y = Count)) +
  geom_alluvium(aes(fill = Ordering)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

x <- tonodb %>% select(Type, Contour) %>% filter(!is.na(Contour)) %>% separate_rows(Type)
x <- x %>% group_by(Type, Contour) %>% summarize(Count = n())

## `summarise()` has grouped output by 'Type'. You can override using the
## `.groups` argument.

x <- x %>% mutate(Freq = Count / sum(x$Count))
x <- x %>% arrange(desc(Count))
x %>% kable()

Type	Contour	Count	Freq
coda	falling	26	0.2888889
onset	rising	11	0.1222222
onset	falling	9	0.1000000
onset	level	9	0.1000000
coda	rising	7	0.0777778
coda	level	5	0.0555556
count	falling	5	0.0555556
syllable	falling	5	0.0555556
stress	rising	3	0.0333333
count	rising	2	0.0222222
syllable	rising	2	0.0222222
count	rising-falling	1	0.0111111
nucleus	falling	1	0.0111111
nucleus	rising	1	0.0111111
nucleus	rising-falling	1	0.0111111
other	falling	1	0.0111111
syllable	rising-falling	1	0.0111111

ggplot(data = x,
       aes(axis1 = Contour, axis2 = Type, y = Count)) +
  geom_alluvium(aes(fill = Type)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

ggplot(data = x,
       aes(axis1 = Type, axis2 = Contour, y = Count)) +
  geom_alluvium(aes(fill = Type)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

x <- tonodb %>% select(Type, EffectOnPitch) %>% filter(!is.na(EffectOnPitch)) %>% separate_rows(Type)
x <- x %>% group_by(Type, EffectOnPitch) %>% summarize(Count = n())

## `summarise()` has grouped output by 'Type'. You can override using the
## `.groups` argument.

x <- x %>% mutate(Freq = Count / sum(x$Count))
x <- x %>% arrange(desc(Count))
x <- x %>% filter(Count > 1) %>% filter(Type != "other")
x %>% kable()

Type	EffectOnPitch	Count	Freq
onset	elevating	54	0.2117647
onset	lowering	54	0.2117647
coda	falling	26	0.1019608
nucleus	elevating	12	0.0470588
nucleus	lowering	10	0.0392157
onset	mid	10	0.0392157
coda	elevating	9	0.0352941
coda	lowering	9	0.0352941
coda	rising	8	0.0313725
coda	level	5	0.0196078
count	falling	5	0.0196078
onset	rising	5	0.0196078
syllable	falling	5	0.0196078
count	elevating	4	0.0156863
onset	falling	4	0.0156863
syllable	elevating	4	0.0156863
count	lowering	3	0.0117647
syllable	lowering	3	0.0117647
stress	elevating	2	0.0078431
stress	lowering	2	0.0078431
stress	rising	2	0.0078431

ggplot(data = x,
       aes(axis1 = EffectOnPitch, axis2 = Type, y = Count)) +
  geom_alluvium(aes(fill = Type)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

ggplot(data = x,
       aes(axis1 = Type, axis2 = EffectOnPitch, y = Count)) +
  geom_alluvium(aes(fill = Type)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

x <- tonodb %>% select(Type, Height) %>% filter(!is.na(Height)) %>% separate_rows(Type)
x <- x %>% group_by(Type, Height) %>% summarize(Count = n())

## `summarise()` has grouped output by 'Type'. You can override using the
## `.groups` argument.

x <- x %>% mutate(Freq = Count / sum(x$Count))
x <- x %>% arrange(desc(Count))

ggplot(x, aes(x=Height, y=Type, fill = Freq)) + 
  geom_tile() +
  theme_bw() +
  scale_y_discrete(limits = c("other", "wordtype", "stress", "nucleus", "coda", "onset"))

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_tile()`).

ggplot(x, aes(x=Type, y=Height, fill = Freq)) + 
  geom_tile() +
  theme_bw() +
  scale_x_discrete(limits = c("onset", "coda", "nucleus", "stress", "wordtype", "other")) +
  scale_y_discrete(limits = c("mid", "low", "high"))

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_tile()`).

Patterns in level vs contour height

It is more common for onset tonogenesis to have an elevating or lowering effect, and more common for coda tonogenesis to have a rising or falling effect.

type_height <- tonodb %>% select(Type, Height) %>% separate_rows(Type)
type_countour <- tonodb %>% select(Type, Contour) %>% separate_rows(Type)
table(type_height)

##           Height
## Type       high low mid
##   coda       14  10   0
##   count       5   3   0
##   nucleus    14   7   1
##   onset      27  32  17
##   other       3   1   0
##   stress      3   2   1
##   syllable    5   3   0

table(type_countour)

##           Contour
## Type       falling level rising rising-falling
##   coda          26     5      7              0
##   count          5     0      2              1
##   nucleus        1     0      1              1
##   onset          9     9     11              0
##   other          1     0      0              0
##   stress         0     0      3              0
##   syllable       5     0      2              1

th <- data.frame(unclass(table(type_height$Type, type_height$Height)))
tc <- data.frame(unclass(table(type_countour$Type, type_countour$Contour)))

th <- tibble::rownames_to_column(th, "Type")
tc <- tibble::rownames_to_column(tc, "Type")

tmp <- left_join(th, tc)

## Joining with `by = join_by(Type)`

tmp <- tmp %>% arrange(desc(high))
tmp %>% kable()

Type	high	low	mid	falling	level	rising	rising.falling
onset	27	32	17	9	9	11	0
coda	14	10	0	26	5	7	0
nucleus	14	7	1	1	0	1	1
count	5	3	0	5	0	2	1
syllable	5	3	0	5	0	2	1
other	3	1	0	1	0	0	0
stress	3	2	1	0	0	3	0

# print(xtable(tmp, type = "latex", caption=""), include.rownames=FALSE)

tmp <- tmp %>% rowwise() %>% mutate(height = sum(c(high, low, mid)))
tmp <- tmp %>% rowwise() %>% mutate(contour = sum(c(falling, level, rising, rising.falling)))
tmp

## # A tibble: 7 × 10
## # Rowwise: 
##   Type      high   low   mid falling level rising rising.falling height contour
##   <chr>    <int> <int> <int>   <int> <int>  <int>          <int>  <int>   <int>
## 1 onset       27    32    17       9     9     11              0     76      29
## 2 coda        14    10     0      26     5      7              0     24      38
## 3 nucleus     14     7     1       1     0      1              1     22       3
## 4 count        5     3     0       5     0      2              1      8       8
## 5 syllable     5     3     0       5     0      2              1      8       8
## 6 other        3     1     0       1     0      0              0      4       1
## 7 stress       3     2     1       0     0      3              0      6       3

t <- tmp %>% select(Type, height, contour)
t %>% kable()

Type	height	contour
onset	76	29
coda	24	38
nucleus	22	3
count	8	8
syllable	8	8
other	4	1
stress	6	3

print(xtable(t, type = "latex", caption=""), include.rownames=FALSE)

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrr}
##   \hline
## Type & height & contour \\ 
##   \hline
## onset &  76 &  29 \\ 
##   coda &  24 &  38 \\ 
##   nucleus &  22 &   3 \\ 
##   count &   8 &   8 \\ 
##   syllable &   8 &   8 \\ 
##   other &   4 &   1 \\ 
##   stress &   6 &   3 \\ 
##    \hline
## \end{tabular}
## \caption{} 
## \end{table}
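
To make the contrast between onset and coda tonogenesis easier to read, the height and contour counts can be turned into shares. The sketch below copies the onset and coda rows from the table above. Treating the two columns as if they partition the coded effects is an assumption made here for illustration only, since a single case may in principle be coded for both.

```r
# Counts copied from the onset and coda rows of the height/contour table.
counts <- data.frame(
  Type = c("onset", "coda"),
  height = c(76, 24),
  contour = c(29, 38)
)

# Share of height effects and contour effects within each type.
counts$height_share <- counts$height / (counts$height + counts$contour)
counts$contour_share <- 1 - counts$height_share
counts
```

Under that assumption, roughly 72% of the coded onset effects are height effects, while roughly 61% of the coded coda effects are contour effects.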

New tables for revise and resubmit

Strict vs broad.

# table(tonodb$Ordering, exclude = FALSE)
table(tonodb$Ordering)

## 
##           Broad   Broad - split Possibly Strict          Strict         Unclear 
##              20             116              43              17              62

t <- data.frame(table(tonodb$Ordering))
t <- t %>% rename(Ordering = Var1, Count = Freq)
t

##          Ordering Count
## 1           Broad    20
## 2   Broad - split   116
## 3 Possibly Strict    43
## 4          Strict    17
## 5         Unclear    62

print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis"), include.rownames=FALSE)

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lr}
##   \hline
## Ordering & Count \\ 
##   \hline
## Broad &  20 \\ 
##   Broad - split & 116 \\ 
##   Possibly Strict &  43 \\ 
##   Strict &  17 \\ 
##   Unclear &  62 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis} 
## \end{table}
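
The counts above can also be read as percentages. This is a small sketch using the five figures from the table. They sum to 258, one fewer than the 259 observations, presumably because `table()` omits a missing value by default.

```r
# Counts copied from the Ordering table above.
ordering <- c(
  "Broad" = 20,
  "Broad - split" = 116,
  "Possibly Strict" = 43,
  "Strict" = 17,
  "Unclear" = 62
)

# Percentage of each category among the cases with an Ordering value.
shares <- round(100 * prop.table(ordering), 1)
shares
```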

Plus something like this, where the numbers outside the parentheses represent cases and the numbers in parentheses represent languages.

tonodb %>% select(Type, Ordering)

## # A tibble: 259 × 2
##    Type                 Ordering       
##    <chr>                <chr>          
##  1 onset                Broad - split  
##  2 onset                Broad - split  
##  3 coda                 Possibly Strict
##  4 coda                 Unclear        
##  5 coda                 Strict         
##  6 coda, onset          Possibly Strict
##  7 coda                 Broad          
##  8 coda, syllable-count Unclear        
##  9 coda                 Possibly Strict
## 10 coda, syllable-count Unclear        
## # ℹ 249 more rows

table(tonodb$Type, tonodb$Ordering)

##                          
##                           Broad Broad - split Possibly Strict Strict Unclear
##   coda                        6             7              10     10      24
##   coda, nucleus               0             0               4      0       2
##   coda, onset                 0             0               1      0       1
##   coda, syllable-count        0             1               0      0       3
##   nucleus                     3             8               2      0       5
##   nucleus, coda               0             0               1      0       0
##   nucleus, onset              1             0               0      0       0
##   onset                       4            98              14      5       7
##   onset, other                0             0               2      0       0
##   other                       0             0               1      0       3
##   stress                      2             0               4      0       6
##   syllable-count              4             2               3      2      11
##   syllable-count, nucleus     0             0               1      0       0

t <- data.frame(unclass(table(tonodb$Type, tonodb$Ordering))) %>% rownames_to_column()
t <- t %>% rename(Type = rowname)
t

##                       Type Broad Broad...split Possibly.Strict Strict Unclear
## 1                     coda     6             7              10     10      24
## 2            coda, nucleus     0             0               4      0       2
## 3              coda, onset     0             0               1      0       1
## 4     coda, syllable-count     0             1               0      0       3
## 5                  nucleus     3             8               2      0       5
## 6            nucleus, coda     0             0               1      0       0
## 7           nucleus, onset     1             0               0      0       0
## 8                    onset     4            98              14      5       7
## 9             onset, other     0             0               2      0       0
## 10                   other     0             0               1      0       3
## 11                  stress     2             0               4      0       6
## 12          syllable-count     4             2               3      2      11
## 13 syllable-count, nucleus     0             0               1      0       0

print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis by class"), include.rownames=FALSE)

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrr}
##   \hline
## Type & Broad & Broad...split & Possibly.Strict & Strict & Unclear \\ 
##   \hline
## coda &   6 &   7 &  10 &  10 &  24 \\ 
##   coda, nucleus &   0 &   0 &   4 &   0 &   2 \\ 
##   coda, onset &   0 &   0 &   1 &   0 &   1 \\ 
##   coda, syllable-count &   0 &   1 &   0 &   0 &   3 \\ 
##   nucleus &   3 &   8 &   2 &   0 &   5 \\ 
##   nucleus, coda &   0 &   0 &   1 &   0 &   0 \\ 
##   nucleus, onset &   1 &   0 &   0 &   0 &   0 \\ 
##   onset &   4 &  98 &  14 &   5 &   7 \\ 
##   onset, other &   0 &   0 &   2 &   0 &   0 \\ 
##   other &   0 &   0 &   1 &   0 &   3 \\ 
##   stress &   2 &   0 &   4 &   0 &   6 \\ 
##   syllable-count &   4 &   2 &   3 &   2 &  11 \\ 
##   syllable-count, nucleus &   0 &   0 &   1 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis by class} 
## \end{table}

The numbers in parentheses are languages, counted here as distinct languages per type and ordering.

tmp <- tonodb %>% select(Type, Ordering, Language_ID) %>% distinct()
t <- data.frame(unclass(table(tmp$Type, tmp$Ordering))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t

##                       Type Broad Broad...split Possibly.Strict Strict Unclear
## 1                     coda     6             3               7      3      19
## 2            coda, nucleus     0             0               1      0       1
## 3              coda, onset     0             0               1      0       1
## 4     coda, syllable-count     0             1               0      0       3
## 5                  nucleus     2             4               1      0       4
## 6            nucleus, coda     0             0               1      0       0
## 7           nucleus, onset     1             0               0      0       0
## 8                    onset     3            17               7      3       5
## 9             onset, other     0             0               1      0       0
## 10                   other     0             0               1      0       3
## 11                  stress     1             0               2      0       5
## 12          syllable-count     4             1               3      1       7
## 13 syllable-count, nucleus     0             0               1      0       0

print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis by class by language"), include.rownames=FALSE)

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrr}
##   \hline
## Type & Broad & Broad...split & Possibly.Strict & Strict & Unclear \\ 
##   \hline
## coda &   6 &   3 &   7 &   3 &  19 \\ 
##   coda, nucleus &   0 &   0 &   1 &   0 &   1 \\ 
##   coda, onset &   0 &   0 &   1 &   0 &   1 \\ 
##   coda, syllable-count &   0 &   1 &   0 &   0 &   3 \\ 
##   nucleus &   2 &   4 &   1 &   0 &   4 \\ 
##   nucleus, coda &   0 &   0 &   1 &   0 &   0 \\ 
##   nucleus, onset &   1 &   0 &   0 &   0 &   0 \\ 
##   onset &   3 &  17 &   7 &   3 &   5 \\ 
##   onset, other &   0 &   0 &   1 &   0 &   0 \\ 
##   other &   0 &   0 &   1 &   0 &   3 \\ 
##   stress &   1 &   0 &   2 &   0 &   5 \\ 
##   syllable-count &   4 &   1 &   3 &   1 &   7 \\ 
##   syllable-count, nucleus &   0 &   0 &   1 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis by class by language} 
## \end{table}
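
The case counts and the language counts are printed as two separate tables. As a sketch of how they might be merged into a single "cases (languages)" layout, the example below pastes two matrices together cell by cell. It uses only the `coda` and `onset` rows copied from the two tables above, and the object names are illustrative.

```r
orderings <- c("Broad", "Broad - split", "Possibly Strict", "Strict", "Unclear")

# Case counts (coda and onset rows of the table by class).
cases <- matrix(c(6, 7, 10, 10, 24,
                  4, 98, 14, 5, 7),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("coda", "onset"), orderings))

# Language counts (coda and onset rows of the table by class by language).
langs <- matrix(c(6, 3, 7, 3, 19,
                  3, 17, 7, 3, 5),
                nrow = 2, byrow = TRUE,
                dimnames = dimnames(cases))

# paste0() works element-wise in column-major order, matching matrix() filling.
combined <- matrix(paste0(cases, " (", langs, ")"),
                   nrow = nrow(cases),
                   dimnames = dimnames(cases))
combined
```

The same approach should extend to the full tables, provided both have identical row and column order.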

Cases of tonogenesis by type and macroarea.

tmp <- tonodb %>% select(Type, Ordering, Macroarea)
t <- data.frame(unclass(table(tmp$Type, tmp$Macroarea))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t

##                       Type Africa Eurasia North.America Papunesia South.America
## 1                     coda      0      39            14         0             4
## 2            coda, nucleus      0       0             4         0             2
## 3              coda, onset      0       0             1         1             0
## 4     coda, syllable-count      0       4             0         0             0
## 5                  nucleus      7       5             3         2             1
## 6            nucleus, coda      0       0             0         0             0
## 7           nucleus, onset      0       0             0         1             0
## 8                    onset      7      91             0         7             0
## 9             onset, other      0       2             0         0             0
## 10                   other      1       1             2         0             0
## 11                  stress      0       7             4         1             0
## 12          syllable-count      6       7             4         4             1
## 13 syllable-count, nucleus      0       0             1         0             0

print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis by class by macroarea"), include.rownames=FALSE)

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrr}
##   \hline
## Type & Africa & Eurasia & North.America & Papunesia & South.America \\ 
##   \hline
## coda &   0 &  39 &  14 &   0 &   4 \\ 
##   coda, nucleus &   0 &   0 &   4 &   0 &   2 \\ 
##   coda, onset &   0 &   0 &   1 &   1 &   0 \\ 
##   coda, syllable-count &   0 &   4 &   0 &   0 &   0 \\ 
##   nucleus &   7 &   5 &   3 &   2 &   1 \\ 
##   nucleus, coda &   0 &   0 &   0 &   0 &   0 \\ 
##   nucleus, onset &   0 &   0 &   0 &   1 &   0 \\ 
##   onset &   7 &  91 &   0 &   7 &   0 \\ 
##   onset, other &   0 &   2 &   0 &   0 &   0 \\ 
##   other &   1 &   1 &   2 &   0 &   0 \\ 
##   stress &   0 &   7 &   4 &   1 &   0 \\ 
##   syllable-count &   6 &   7 &   4 &   4 &   1 \\ 
##   syllable-count, nucleus &   0 &   0 &   1 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis by class by macroarea} 
## \end{table}

Cases of tonogenesis by type and area. Unlike Macroarea, which collapses Asia and Europe into Eurasia, Area keeps the two apart. All rows.

tmp <- tonodb %>% select(Type, Area)
t <- data.frame(unclass(table(tmp$Type, tmp$Area))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t

##                       Type Africa Asia Europe North.America Papunesia
## 1                     coda      0   32      7            14         0
## 2            coda, nucleus      0    0      0             4         0
## 3              coda, onset      0    0      0             1         1
## 4     coda, syllable-count      0    3      1             0         0
## 5                  nucleus      7    4      1             3         2
## 6            nucleus, coda      0    0      0             0         0
## 7           nucleus, onset      0    0      0             0         1
## 8                    onset      7  114      0             0         7
## 9             onset, other      0    2      0             0         0
## 10                   other      1    0      1             2         0
## 11                  stress      0    2      5             4         1
## 12          syllable-count      6    0      7             4         4
## 13 syllable-count, nucleus      0    0      0             1         0
##    South.America
## 1              4
## 2              2
## 3              0
## 4              0
## 5              1
## 6              0
## 7              0
## 8              0
## 9              0
## 10             0
## 11             0
## 12             1
## 13             0

print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis per macroarea"), include.rownames=FALSE)

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrrr}
##   \hline
## Type & Africa & Asia & Europe & North.America & Papunesia & South.America \\ 
##   \hline
## coda &   0 &  32 &   7 &  14 &   0 &   4 \\ 
##   coda, nucleus &   0 &   0 &   0 &   4 &   0 &   2 \\ 
##   coda, onset &   0 &   0 &   0 &   1 &   1 &   0 \\ 
##   coda, syllable-count &   0 &   3 &   1 &   0 &   0 &   0 \\ 
##   nucleus &   7 &   4 &   1 &   3 &   2 &   1 \\ 
##   nucleus, coda &   0 &   0 &   0 &   0 &   0 &   0 \\ 
##   nucleus, onset &   0 &   0 &   0 &   0 &   1 &   0 \\ 
##   onset &   7 & 114 &   0 &   0 &   7 &   0 \\ 
##   onset, other &   0 &   2 &   0 &   0 &   0 &   0 \\ 
##   other &   1 &   0 &   1 &   2 &   0 &   0 \\ 
##   stress &   0 &   2 &   5 &   4 &   1 &   0 \\ 
##   syllable-count &   6 &   0 &   7 &   4 &   4 &   1 \\ 
##   syllable-count, nucleus &   0 &   0 &   0 &   1 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis per macroarea} 
## \end{table}
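
The column totals of the macroarea table and the area table above do not always agree: the onset row has 91 cases for Eurasia, but 114 for Asia and 0 for Europe. One possible explanation, not verified against the data here, is that some rows have no Macroarea value, since `table()` excludes `NA` by default. The following is a minimal sketch of that behaviour on made-up data; the `toy` data frame and its values are hypothetical and not taken from tonodb.

```r
# Hypothetical toy data (not from tonodb): one row lacks a Macroarea value.
toy <- data.frame(
  Type      = c("onset", "onset", "onset", "coda", "coda"),
  Area      = c("Asia", "Asia", "Asia", "Asia", "Europe"),
  Macroarea = c("Eurasia", "Eurasia", NA, "Eurasia", "Eurasia")
)

# Cross-tabulate by Area: every row is counted.
by_area <- table(toy$Type, toy$Area)

# Cross-tabulate by Macroarea: the row with NA is silently dropped.
by_macroarea <- table(toy$Type, toy$Macroarea)

# Passing useNA = "ifany" keeps the NA row as a column of its own.
by_macroarea_na <- table(toy$Type, toy$Macroarea, useNA = "ifany")

sum(by_area)          # all rows
sum(by_macroarea)     # rows with a non-missing Macroarea
sum(by_macroarea_na)  # all rows again, with NA shown explicitly
```

If missing macroareas are indeed the cause, adding `useNA = "ifany"` to the `table()` calls would make the gap visible in the output.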

Strict vs broad cases of tonogenesis by class by area, counting each language variety once per class and area. Unlike macroarea, which collapses Asia and Europe into Eurasia, area keeps the two separate.

tmp <- tonodb %>% select(Type, LanguageVariety, Area) %>% distinct()
t <- data.frame(unclass(table(tmp$Type, tmp$Area))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t

##                       Type Africa Asia Europe North.America Papunesia
## 1                     coda      0   15      5            13         0
## 2            coda, nucleus      0    0      0             2         0
## 3              coda, onset      0    0      0             1         1
## 4     coda, syllable-count      0    3      1             0         0
## 5                  nucleus      4    2      1             2         1
## 6            nucleus, coda      0    0      0             0         0
## 7           nucleus, onset      0    0      0             0         1
## 8                    onset      5   28      0             0         4
## 9             onset, other      0    1      0             0         0
## 10                   other      1    0      1             2         0
## 11                  stress      0    1      4             2         1
## 12          syllable-count      4    0      5             3         3
## 13 syllable-count, nucleus      0    0      0             1         0
##    South.America
## 1              4
## 2              1
## 3              0
## 4              0
## 5              1
## 6              0
## 7              0
## 8              0
## 9              0
## 10             0
## 11             0
## 12             1
## 13             0

print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis by language by macroarea"), include.rownames=FALSE)

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrrr}
##   \hline
## Type & Africa & Asia & Europe & North.America & Papunesia & South.America \\ 
##   \hline
## coda &   0 &  15 &   5 &  13 &   0 &   4 \\ 
##   coda, nucleus &   0 &   0 &   0 &   2 &   0 &   1 \\ 
##   coda, onset &   0 &   0 &   0 &   1 &   1 &   0 \\ 
##   coda, syllable-count &   0 &   3 &   1 &   0 &   0 &   0 \\ 
##   nucleus &   4 &   2 &   1 &   2 &   1 &   1 \\ 
##   nucleus, coda &   0 &   0 &   0 &   0 &   0 &   0 \\ 
##   nucleus, onset &   0 &   0 &   0 &   0 &   1 &   0 \\ 
##   onset &   5 &  28 &   0 &   0 &   4 &   0 \\ 
##   onset, other &   0 &   1 &   0 &   0 &   0 &   0 \\ 
##   other &   1 &   0 &   1 &   2 &   0 &   0 \\ 
##   stress &   0 &   1 &   4 &   2 &   1 &   0 \\ 
##   syllable-count &   4 &   0 &   5 &   3 &   3 &   1 \\ 
##   syllable-count, nucleus &   0 &   0 &   0 &   1 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis by language by macroarea} 
## \end{table}
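
The difference between the all-rows counts and the per-language counts comes from the call to `distinct()` before tabulating. The following is a minimal sketch on made-up data; the `toy` tibble and its values are hypothetical, and only the column names follow the code above.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data (not from tonodb): variety "A" has two onset cases.
toy <- tibble(
  Type            = c("onset", "onset", "onset", "coda"),
  LanguageVariety = c("A", "A", "B", "A"),
  Area            = c("Asia", "Asia", "Asia", "Asia")
)

# All rows: every case of tonogenesis is counted.
all_rows <- table(toy$Type, toy$Area)

# Per variety: duplicate (Type, LanguageVariety, Area) rows collapse to one,
# so a variety with several cases of the same type is counted once.
per_variety <- toy %>% distinct(Type, LanguageVariety, Area)
per_variety_counts <- table(per_variety$Type, per_variety$Area)

all_rows
per_variety_counts
```

In this toy example the onset count drops from 3 to 2, because variety "A" contributes two onset cases but only one distinct combination.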

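Strict vs broad cases of tonogenesis by class by macroarea. Note that the code below counts distinct combinations of Type, Ordering and Macroarea rather than distinct languages.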

tmp <- tonodb %>% select(Type, Ordering, Macroarea) %>% distinct()
t <- data.frame(unclass(table(tmp$Type, tmp$Macroarea))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t

##                       Type Africa Eurasia North.America Papunesia South.America
## 1                     coda      0       5             2         0             1
## 2            coda, nucleus      0       0             1         0             1
## 3              coda, onset      0       0             1         1             0
## 4     coda, syllable-count      0       2             0         0             0
## 5                  nucleus      2       2             2         1             1
## 6            nucleus, coda      0       0             0         0             0
## 7           nucleus, onset      0       0             0         1             0
## 8                    onset      2       5             0         2             0
## 9             onset, other      0       1             0         0             0
## 10                   other      1       1             2         0             0
## 11                  stress      0       3             2         1             0
## 12          syllable-count      3       3             3         2             1
## 13 syllable-count, nucleus      0       0             1         0             0

print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis by class by language by macroarea"), include.rownames=FALSE)

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrr}
##   \hline
## Type & Africa & Eurasia & North.America & Papunesia & South.America \\ 
##   \hline
## coda &   0 &   5 &   2 &   0 &   1 \\ 
##   coda, nucleus &   0 &   0 &   1 &   0 &   1 \\ 
##   coda, onset &   0 &   0 &   1 &   1 &   0 \\ 
##   coda, syllable-count &   0 &   2 &   0 &   0 &   0 \\ 
##   nucleus &   2 &   2 &   2 &   1 &   1 \\ 
##   nucleus, coda &   0 &   0 &   0 &   0 &   0 \\ 
##   nucleus, onset &   0 &   0 &   0 &   1 &   0 \\ 
##   onset &   2 &   5 &   0 &   2 &   0 \\ 
##   onset, other &   0 &   1 &   0 &   0 &   0 \\ 
##   other &   1 &   1 &   2 &   0 &   0 \\ 
##   stress &   0 &   3 &   2 &   1 &   0 \\ 
##   syllable-count &   3 &   3 &   3 &   2 &   1 \\ 
##   syllable-count, nucleus &   0 &   0 &   1 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis by class by language by macroarea} 
## \end{table}

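Ordering (Broad, Broad - split, Possibly Strict, Strict, Unclear) by area, counting all rows. The Ordering column is labelled Type in the output.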

tmp <- tonodb %>% select(Ordering, Area)
t <- data.frame(unclass(table(tmp$Ordering, tmp$Area))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t

##              Type Africa Asia Europe North.America Papunesia South.America
## 1           Broad      5    4      4             0         3             4
## 2   Broad - split     13  102      1             0         0             0
## 3 Possibly Strict      0   16      1            18         7             0
## 4          Strict      0   15      0             2         0             0
## 5         Unclear      3   20     16            13         6             4

print(xtable(t, type = "latex", caption="Type by rows by area"), include.rownames=FALSE)

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrrr}
##   \hline
## Type & Africa & Asia & Europe & North.America & Papunesia & South.America \\ 
##   \hline
## Broad &   5 &   4 &   4 &   0 &   3 &   4 \\ 
##   Broad - split &  13 & 102 &   1 &   0 &   0 &   0 \\ 
##   Possibly Strict &   0 &  16 &   1 &  18 &   7 &   0 \\ 
##   Strict &   0 &  15 &   0 &   2 &   0 &   0 \\ 
##   Unclear &   3 &  20 &  16 &  13 &   6 &   4 \\ 
##    \hline
## \end{tabular}
## \caption{Type by rows by area} 
## \end{table}

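Ordering by macroarea, counting each distinct language (Language_ID) once per ordering and macroarea. The Ordering column is labelled Type in the output.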

tmp <- tonodb %>% select(Language_ID, Ordering, Macroarea) %>% distinct()
t <- data.frame(unclass(table(tmp$Ordering, tmp$Macroarea))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t

##              Type Africa Eurasia North.America Papunesia South.America
## 1           Broad      5       6             0         2             4
## 2   Broad - split      6      17             0         0             0
## 3 Possibly Strict      0       8            11         4             0
## 4          Strict      0       6             1         0             0
## 5         Unclear      2      19             9         4             2

print(xtable(t, type = "latex", caption="Type by distinct languages"), include.rownames=FALSE)

## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrr}
##   \hline
## Type & Africa & Eurasia & North.America & Papunesia & South.America \\ 
##   \hline
## Broad &   5 &   6 &   0 &   2 &   4 \\ 
##   Broad - split &   6 &  17 &   0 &   0 &   0 \\ 
##   Possibly Strict &   0 &   8 &  11 &   4 &   0 \\ 
##   Strict &   0 &   6 &   1 &   0 &   0 \\ 
##   Unclear &   2 &  19 &   9 &   4 &   2 \\ 
##    \hline
## \end{tabular}
## \caption{Type by distinct languages} 
## \end{table}
