bambooforest/tono_db

Supplementary materials for: ‘Tonogenesis: a diachronic typology’

Steven Moran, Eitan Grossman and Lilja Maria Sæbø

26 April, 2025

Overview

Supplementary materials for “Tonogenesis: a diachronic typology” by Lilja Maria Sæbø, Eitan Grossman and Steven Moran, accepted in Diachronica.

The CLDF data are available here:

Setup

Load the libraries.

library(tidyverse)
library(knitr)
library(kableExtra)
library(xtable)
library(ggalluvial)

Load the tonodb CLDF data.

values <- 
  read_csv(url('https://raw.githubusercontent.com/cldf-datasets/tonodb/main/cldf/values.csv'))
languages <- 
  read_csv(url('https://raw.githubusercontent.com/cldf-datasets/tonodb/main/cldf/languages.csv'))
contributions <- 
  read_csv(url('https://raw.githubusercontent.com/cldf-datasets/tonodb/main/cldf/contributions.csv'))
parameters <- 
  read_csv(url('https://raw.githubusercontent.com/cldf-datasets/tonodb/main/cldf/parameters.csv'))
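The four `read_csv()` calls above repeat the same base URL. A minimal sketch of one way to build the URLs in a single place, assuming the tables stay under the `cldf/` directory of the `main` branch as in those calls; `cldf_base` and `cldf_url()` are hypothetical names, not part of the repository.

```r
# Hypothetical helper: build the raw GitHub URL for one CLDF table.
# The base URL is copied from the read_csv() calls above.
cldf_base <- "https://raw.githubusercontent.com/cldf-datasets/tonodb/main/cldf"

cldf_url <- function(table_name) {
  paste0(cldf_base, "/", table_name, ".csv")
}

table_names <- c("values", "languages", "contributions", "parameters")

# vapply() keeps the table names as names of the resulting character vector
urls <- vapply(table_names, cldf_url, character(1))
print(urls)

# With network access, the tables could then be read into a named list:
# tables <- lapply(urls, readr::read_csv)
```

Keeping the URLs in a named vector makes it easier to switch to a tagged release later by changing only `cldf_base`.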

Basics of the database contents

We have this many languages in our sample.

nrow(languages)
## [1] 97

And this many observations.

nrow(values)
## [1] 259

Let’s map our data points. Some rows are dropped from the plot because their latitude and longitude are NA; these entries are listed as dialects or language families.

ggplot(data=languages, aes(x=Longitude, y=Latitude)) + 
  borders("world", colour="gray50", fill="gray50") + 
  geom_point() +
  theme_bw()
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).

These are the missing data points for geographic location.

languages %>% filter(is.na(Latitude)) %>% select(ID, Name, Macroarea, Latitude, Longitude) %>% kable()

| ID | Name | Macroarea | Latitude | Longitude |
| --- | --- | --- | --- | --- |
| atha1247 | Athabaskan | NA | NA | NA |
| auks1239 | Aukshtaitish | Eurasia | NA | NA |
| cant1236 | Cantonese | Eurasia | NA | NA |
| cent2346 | Central Tibetan | NA | NA | NA |
| coas1300 | Coast Tsimshian | North America | NA | NA |
| east2280 | Eastern Baltic | NA | NA | NA |
| extr1245 | Extreme Southern New Caledonian | NA | NA | NA |
| kere1287 | Keresan | NA | NA | NA |
| mang1393 | Mangbetu-Asua | NA | NA | NA |
| metn1237 | Metnyo | Papunesia | NA | NA |
| midd1319 | Middle Franconian | Eurasia | NA | NA |
| moha1257 | Mohawk-Oneida | NA | NA | NA |
| newc1243 | New Caledonian | NA | NA | NA |
| nort3160 | North Germanic | NA | NA | NA |
| podo1243 | Podoko | NA | NA | NA |
| pwoo1239 | Pwo | NA | NA | NA |
| raja1258 | Raja Ampat Maya | NA | NA | NA |
| sind1278 | Sindhi-Lahnda | NA | NA | NA |
| slav1255 | Slavic | NA | NA | NA |
| taik1256 | Tai-Kadai | NA | NA | NA |
| tere1281 | Terena | South America | NA | NA |
| utsa1239 | Lhasa Tibetan | Eurasia | NA | NA |
| yeni1252 | Yeniseian | NA | NA | NA |
| zhuo1234 | Zhuoni | Eurasia | NA | NA |

We’ve gone through by hand and added approximate geocoordinates for visualization purposes, e.g., using Glottolog’s Swedish latitude and longitude for North Germanic.

Merge in the hand-assigned geocoordinates.

# There must be a saner way to do this!
hc <- read_csv('hand_coordinates.csv')
tmp <- left_join(languages, hc, by=c("ID"="ID", "Name"="Name"))
tmp <- tmp %>% mutate(Latitude.x = coalesce(Latitude.x, Latitude.y))
tmp <- tmp %>% mutate(Longitude.x = coalesce(Longitude.x, Longitude.y))
tmp <- tmp %>% select(-Latitude.y, -Longitude.y)  # drop both helper columns; each name needs its own minus sign
tmp <- tmp %>% rename(Latitude = Latitude.x)
tmp <- tmp %>% rename(Longitude = Longitude.x)
languages <- tmp
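The join, coalesce and rename steps above fill missing coordinates from a second table. A minimal sketch of a shorter alternative using `dplyr::rows_patch()`, which overwrites only `NA` values in matching rows. The toy tibbles below are hypothetical stand-ins for `languages` and `hand_coordinates.csv`, and the sketch assumes the hand-coded file contains only columns that also exist in `languages`.

```r
library(dplyr)

# Toy stand-in for `languages` (hypothetical IDs and values)
langs <- tibble(
  ID = c("aaaa1111", "bbbb2222", "cccc3333"),
  Latitude = c(10.5, NA, NA),
  Longitude = c(20.5, NA, NA)
)

# Toy stand-in for the hand-coded coordinates (hypothetical values)
hand <- tibble(
  ID = "bbbb2222",
  Latitude = 59.8,
  Longitude = 17.4
)

# rows_patch() fills NA cells in `langs` from matching rows of `hand`;
# existing non-NA values and unmatched rows are left untouched
patched <- rows_patch(langs, hand, by = "ID")
print(patched)
```

Because no suffixed `.x`/`.y` columns are created, there is nothing to drop or rename afterwards.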

Redo the map.

ggplot(data=languages, aes(x=Longitude, y=Latitude)) + 
  borders("world", colour="gray50", fill="gray50") + 
  geom_point() +
  theme_bw()

Here we can add some color by language family.

ggplot(data=languages, aes(x=Longitude, y=Latitude, color=family_id)) + 
  borders("world", colour="gray50", fill="gray50") + 
  geom_point() +
  theme_bw() +
  theme(legend.position="none")

  # ggtitle("Language varieties colored for language family")

How many data points per macroarea? (Note again several NAs.)

table(languages$Macroarea, exclude=FALSE)
## 
##        Africa       Eurasia North America     Papunesia South America 
##            11            40            17             7             6 
##          <NA> 
##            16

Some entries have no Glottolog macroarea, e.g., varieties that don’t have Glottocodes or entries with family-level codes.

languages %>% filter(is.na(Macroarea))
## # A tibble: 16 × 18
##    ID       Name  Macroarea Latitude Longitude Glottocode ISO639P3code family_id
##    <chr>    <chr> <chr>        <dbl>     <dbl> <chr>      <chr>        <chr>    
##  1 atha1247 Atha… <NA>        60.5      -151.  atha1247   <NA>         atha1245 
##  2 cent2346 Cent… <NA>        28.4        90.2 cent2346   <NA>         sino1245 
##  3 east2280 East… <NA>        56.8        24.3 east2280   <NA>         indo1319 
##  4 extr1245 Extr… <NA>       -22.1       167.  extr1245   <NA>         aust1307 
##  5 kere1287 Kere… <NA>        35.5      -106.  kere1287   <NA>         <NA>     
##  6 mang1393 Mang… <NA>         0.268      27.3 mang1393   <NA>         cent2225 
##  7 moha1257 Moha… <NA>        43.7       -74.7 moha1257   <NA>         iroq1247 
##  8 newc1243 New … <NA>       -20.9       167.  newc1243   <NA>         aust1307 
##  9 nort3160 Nort… <NA>        59.8        17.4 nort3160   <NA>         indo1319 
## 10 podo1243 Podo… <NA>        10.9        14.0 podo1243   <NA>         afro1255 
## 11 pwoo1239 Pwo   <NA>        18.0        99.6 pwoo1239   <NA>         sino1245 
## 12 raja1258 Raja… <NA>        -0.173     130.  raja1258   <NA>         aust1307 
## 13 sind1278 Sind… <NA>        30.1        75.3 sind1278   <NA>         indo1319 
## 14 slav1255 Slav… <NA>        49.9        15.1 slav1255   <NA>         indo1319 
## 15 taik1256 Tai-… <NA>        24.1       110.  taik1256   <NA>         <NA>     
## 16 yeni1252 Yeni… <NA>        63.8        87.5 yeni1252   <NA>         <NA>     
## # ℹ 10 more variables: parent_id <chr>, bookkeeping <lgl>, level <chr>,
## #   description <lgl>, markup_description <lgl>, child_family_count <dbl>,
## #   child_language_count <dbl>, child_dialect_count <dbl>, country_ids <chr>,
## #   Longitude.y <dbl>
# tmp <- languages %>% filter(is.na(Macroarea)) %>% select(ID, Name, Macroarea)
# write_csv(tmp, 'get_macroareas.csv')

# There must be a saner way to do this!
hc <- read_csv('hand_macroareas.csv')
tmp <- left_join(languages, hc, by=c("ID"="ID", "Name"="Name"))
tmp <- tmp %>% mutate(Macroarea.x = coalesce(Macroarea.x, Macroarea.y))
tmp <- tmp %>% select(-Macroarea.y)
tmp <- tmp %>% rename(Macroarea = Macroarea.x)
languages <- tmp
table(languages$Macroarea, exclude = FALSE)
## 
##        Africa       Eurasia North America     Papunesia South America 
##            13            48            20            10             6

And a quick look at our areas.

contributions %>% filter(is.na(Area)) # 
## # A tibble: 1 × 9
##      ID Contributor  Citation      Glottocode LanguageVariety Family Area  Notes
##   <dbl> <chr>        <chr>         <chr>      <chr>           <chr>  <chr> <chr>
## 1   105 Lilja Saeboe Lilja Saeboe… <NA>       Montagnais      Algic  <NA>  <NA> 
## # ℹ 1 more variable: BibTex <chr>
table(contributions$Area, exclude=FALSE)
## 
##        Africa          Asia        Europe North America     Papunesia 
##            13            39            14            21            10 
## South America          <NA> 
##             6             1

Tables for the paper

Create tables for the paper. First merge the tonodb tables.

tonodb <- left_join(values, languages, by=c("Language_ID"="ID"))

# Reduce the Contributor table and get the TonoDB Area column
tmp <- contributions %>% select(ID, Family, Area)
tonodb <- left_join(tonodb, tmp, by=c("Inventory_ID"="ID"))

# tonodb %>% filter(is.na(family_id))

# Rename wordtype to syllable-count -- TODO replace when database is updated
tonodb <- tonodb %>% mutate(Type = str_replace(Type, "wordtype", "syllable"))

# Fix the mistakes (TODO: rerun the CLDF creation script, which will fix these typos below)
tonodb$Ordering <- str_replace(tonodb$Ordering, "broad", "Broad")
tonodb$Ordering <- str_replace(tonodb$Ordering, "strict", "Strict")
tonodb %>% filter(Ordering=="broad")
## # A tibble: 0 × 54
## # ℹ 54 variables: ID <dbl>, Parameter_ID <chr>, Value <chr>, Language_ID <chr>,
## #   Inventory_ID <dbl>, LanguageVariety <chr>, Ordering <chr>, Ongoing <chr>,
## #   TriggeringContext <chr>, Tone <chr>, Extra <chr>, Height <chr>,
## #   Contour <chr>, Phonation <chr>, ToneDescription <chr>, ChaoNumerals <chr>,
## #   RestrictedEnviroment <chr>, Notes <chr>, EffectOnPitch <chr>,
## #   ResultantSystem <chr>, Type <chr>, Onset <chr>, OnsetManner <chr>,
## #   OnsetVoicing <chr>, OnsetAspiration <chr>, Coda <chr>, …
tonodb %>% filter(is.na(Ordering))
## # A tibble: 1 × 54
##      ID Parameter_ID     Value Language_ID Inventory_ID LanguageVariety Ordering
##   <dbl> <chr>            <chr> <chr>              <dbl> <chr>           <chr>   
## 1   259 8D966B2253A9170… high  <NA>                  NA <NA>            <NA>    
## # ℹ 47 more variables: Ongoing <chr>, TriggeringContext <chr>, Tone <chr>,
## #   Extra <chr>, Height <chr>, Contour <chr>, Phonation <chr>,
## #   ToneDescription <chr>, ChaoNumerals <chr>, RestrictedEnviroment <chr>,
## #   Notes <chr>, EffectOnPitch <chr>, ResultantSystem <chr>, Type <chr>,
## #   Onset <chr>, OnsetManner <chr>, OnsetVoicing <chr>, OnsetAspiration <chr>,
## #   Coda <chr>, CodaPhonation <chr>, CodaGlottal <chr>, CodaManner <chr>,
## #   Stress <chr>, SyllableCount <chr>, NucleusATR <chr>, NucleusLength <chr>, …

Distribution of the languages, families and cases of tonogenesis across different areas

x <- tonodb %>% select(Area, Language_ID) %>% distinct() %>% group_by(Area) %>% summarise(Languages = n())
y <- tonodb %>% select(Area, family_id) %>% distinct() %>% group_by(Area) %>% summarize(Families = n())
z <- tonodb %>% select(Area, TriggeringContext) %>% group_by(Area) %>% summarize(`Cases of tonogenesis` = n())

tmp <- left_join(x, y)
tmp <- left_join(tmp, z)
tmp <- tmp %>% arrange(desc(`Cases of tonogenesis`))
tmp %>% kable()

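| Area | Languages | Families | Cases of tonogenesis |
| --- | ---: | ---: | ---: |
| Asia | 37 | 9 | 157 |
| North America | 20 | 10 | 33 |
| Europe | 12 | 2 | 22 |
| Africa | 13 | 5 | 21 |
| Papunesia | 10 | 1 | 16 |
| South America | 6 | 3 | 8 |
| NA | 1 | 1 | 2 |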
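The three intermediate tables `x`, `y` and `z` can also be computed in one grouped summary. A minimal sketch, assuming the same column names as in `tonodb`; the toy rows are hypothetical. `n_distinct()` counts `NA` as a value by default, which matches the `distinct()` plus `n()` approach used above.

```r
library(dplyr)

# Toy stand-in for the merged `tonodb` table (hypothetical rows)
toy <- tibble(
  Area        = c("Asia", "Asia", "Asia", "Europe", "Europe"),
  Language_ID = c("l1", "l1", "l2", "l3", "l3"),
  family_id   = c("f1", "f1", "f1", "f2", "f2")
)

area_summary <- toy %>%
  group_by(Area) %>%
  summarise(
    Languages = n_distinct(Language_ID),   # distinct varieties per area
    Families = n_distinct(family_id),      # distinct families per area
    `Cases of tonogenesis` = n()           # one row per case
  ) %>%
  arrange(desc(`Cases of tonogenesis`))

print(area_summary)
```

A single summary also avoids the two `left_join()` calls and their join messages.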

# Still getting some NAs, let's drop them
table(tonodb$family_id, exclude = FALSE)
## 
## afro1255 algi1248 araw1281 atha1245 atla1278 aust1305 aust1307 cadd1255 
##        3        7        1        5        6       15       27        1 
## cent2225 chim1311 gong1255 hmon1336 indo1319 iroq1247 koma1264 kore1284 
##        4        1        2       10       23        8        6        2 
## maya1287 mong1349 nada1235 sino1245 taik1256 tsim1258 tuca1253 ural1272 
##        4        2        3       58       28        1        4        1 
## utoa1244 waka1280     <NA> 
##        2        2       33
tmp <- tmp %>% filter(!is.na(Area))
tmp %>% kable()

| Area | Languages | Families | Cases of tonogenesis |
| --- | ---: | ---: | ---: |
| Asia | 37 | 9 | 157 |
| North America | 20 | 10 | 33 |
| Europe | 12 | 2 | 22 |
| Africa | 13 | 5 | 21 |
| Papunesia | 10 | 1 | 16 |
| South America | 6 | 3 | 8 |

print(xtable(tmp, type = "latex", caption="Distribution of the languages, families and cases of tonogenesis across different areas"), include.rownames=FALSE)
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:40 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrr}
##   \hline
## Area & Languages & Families & Cases of tonogenesis \\ 
##   \hline
## Asia &  37 &   9 & 157 \\ 
##   North America &  20 &  10 &  33 \\ 
##   Europe &  12 &   2 &  22 \\ 
##   Africa &  13 &   5 &  21 \\ 
##   Papunesia &  10 &   1 &  16 \\ 
##   South America &   6 &   3 &   8 \\ 
##    \hline
## \end{tabular}
## \caption{Distribution of the languages, families and cases of tonogenesis across different areas} 
## \end{table}

Number of languages in different families

tmp <- tonodb %>% select(family_id, LanguageVariety) %>% distinct() %>% arrange(family_id, LanguageVariety) %>% group_by(family_id) %>% summarize(`Number of varieties` = n(), Languages = str_c(LanguageVariety, collapse=", "))

# We need the Glottolog family names
glottolog <- read_csv('data/languoid.csv')
families <- glottolog %>% filter(id %in% tmp$family_id) %>% select(id, name)
tmp <- left_join(tmp, families, by=c("family_id"="id"))
tmp <- tmp %>% select(name, `Number of varieties`, Languages)
tmp <- tmp %>% rename(Family = name)

tmp %>% kable()

| Family | Number of varieties | Languages |
| --- | ---: | --- |
| Afro-Asiatic | 2 | Iraqw, Podoko |
| Algic | 4 | Arapaho, Cheyenne, Kickapoo, Maliseet-Passamaquoddy |
| Arawakan | 1 | Terena |
| Athabaskan-Eyak-Tlingit | 3 | Proto-Athabaskan (tonal dialects) group one, Proto-Athabaskan (tonal dialects) group two, Sanya-Henya Tlingit |
| Atlantic-Congo | 5 | Bantu D30, Bila, Kohumono, Moba, Nupe |
| Austroasiatic | 5 | Hu, U, Vietnamese, Wester Kammu, Western Kammu |
| Austronesian | 12 | Cem, Central North New Caledonian languages, Far South New Caledonian langauges, Magey Matbat, Metnyo Ambel, Moor, Phan Rang Cham, Pre-proto-North Huon Gulf, Proto-Maˈya, Samoan, Utsat, Yerisiam |
| Caddoan | 1 | Caddo |
| Central Sudanic | 2 | Languages of the Mangbetu-Asua subgroup with three tones, Western Lugbara |
| Chimakuan | 1 | Quileute |
| Ta-Ne-Omotic | 2 | Gimira, Shinasha |
| Hmong-Mien | 1 | White Hmong |
| Indo-European | 14 | Auktaitian dialects of Lithuanian, Central Franconian, Central Scandinavian, East Baltic (Latvian and Lithuanian), East Slesvig, Late Proto-Slavic, Latvian, Limburgish, Lithuanian, Proto-Nordic, Punjabi, Scottish gaelic (Bernera), West Baltic (Prussian, Zealand Danish |
| Iroquoian | 3 | Cherokee, Mohawk, Proto-Mohawk-Oneida |
| Koman | 2 | Proto-Gwama, Proto-Opo |
| Koreanic | 1 | Korean |
| Mayan | 4 | Mocho’, San Bartolo Tzotzil, Uspanteko, Yucatec |
| Mongolic-Khitan | 1 | Mongour |
| Naduhup | 1 | Eastern Naduhup |
| Sino-Tibetan | 20 | Baima Tibetan, Brokpa, Burmese, Cantonese, Chitabu (bwe), Dzongkha, Geba, Khaling, Kurtöp, Lahu, Lhasa Tibetan, Middle Chinese, Phlong, Pwo Karen, Rikeze Tibetan, Sgaw Karen, Tokpe Gola (Tibetan), T’ientsin, Zhibo Tibetan, Zhuoni Tibetan |
| Tai-Kadai | 4 | Nakhon Si Thammarat Thai, Proto-Tai, Shan, Yung Chiang Kam |
| Tsimshian | 1 | Coast Tsimshian |
| Tucanoan | 4 | Barasana, Kubeo, Máíhɨ̃ki, Tatuyo |
| Uralic | 1 | Estonian |
| Uto-Aztecan | 1 | Hopi |
| Wakashan | 1 | Heiltsuk |
| NA | 9 | NA |

# print(xtable(tmp, type = "latex", caption="Number of languages in different language families"), include.rownames=FALSE)

Cases of tonogenesis sorted by triggering context

z <- tonodb %>% select(Type, LanguageVariety) %>% separate_rows(Type)
x <- z %>% group_by(Type) %>% summarize(`Cases of tonogenesis` = n()) %>% arrange()
y <- z %>% select(Type, LanguageVariety) %>% distinct() %>% group_by(Type) %>% summarize(`Number of languages` = n()) %>% arrange()

tmp <- left_join(x, y)
## Joining with `by = join_by(Type)`
tmp <- tmp %>% arrange(desc(`Cases of tonogenesis`))

# Remove NAs
# tmp <- tmp %>% filter(!is.na(Type))
# tmp %>% kable()

# rename to syllable-count
tmp <- tmp %>% mutate(Type = str_replace(Type, "syllable", "syllable-count"))

tmp %>% kable()

| Type | Cases of tonogenesis | Number of languages |
| --- | ---: | ---: |
| onset | 133 | 41 |
| coda | 70 | 43 |
| count | 27 | 20 |
| nucleus | 27 | 16 |
| syllable-count | 27 | 20 |
| stress | 12 | 8 |
| other | 6 | 5 |
| NA | 1 | 1 |

print(xtable(tmp, type = "latex", caption="Cases of tonogenesis by category"), include.rownames=FALSE)
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:41 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrr}
##   \hline
## Type & Cases of tonogenesis & Number of languages \\ 
##   \hline
## onset & 133 &  41 \\ 
##   coda &  70 &  43 \\ 
##   count &  27 &  20 \\ 
##   nucleus &  27 &  16 \\ 
##   syllable-count &  27 &  20 \\ 
##   stress &  12 &   8 \\ 
##   other &   6 &   5 \\ 
##    &   1 &   1 \\ 
##    \hline
## \end{tabular}
## \caption{Cases of tonogenesis by category} 
## \end{table}
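The `count` and `syllable-count` rows above have identical figures (27 cases, 20 languages). This is consistent with `separate_rows()` splitting a stored value such as `syllable-count` at the hyphen, since its default separator matches any run of non-alphanumeric characters. A minimal sketch of that behaviour on hypothetical toy values; whether the database actually separates multiple types with commas is an assumption here.

```r
library(tidyr)
library(tibble)

# Toy stand-in for the Type column (hypothetical values)
toy <- tibble(
  LanguageVariety = c("A", "B"),
  Type = c("onset, coda", "syllable-count")
)

# Default separator "[^[:alnum:].]+": commas, spaces and hyphens all split
default_split <- separate_rows(toy, Type)

# Explicit separator: split only on a comma followed by optional whitespace
comma_split <- separate_rows(toy, Type, sep = ",\\s*")

print(default_split$Type)
print(comma_split$Type)
```

With the explicit separator, `syllable-count` stays a single category and no separate `count` row is produced.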

Tonogenesis conditioned by voiced and voiceless (unaspirated) obstruents

# tmp <- tonodb %>% select(OnsetVoicing, EffectOnPitch)
# table(tmp)

# tmp <- tonodb %>% select(OnsetVoicing, EffectOnPitch) %>% filter(OnsetVoicing != "") %>% filter(EffectOnPitch != "")
# table(tmp)

# tmp <- tonodb %>% select(OnsetVoicing, EffectOnPitch) %>% 
#  filter(OnsetVoicing != "") %>% 
#  filter(EffectOnPitch != "") %>%
#  filter(OnsetVoicing %in% c("Voiced", "Voiceless"))
# table(tmp)

tmp <- tonodb %>% select(OnsetVoicing, EffectOnPitch) %>% 
  filter(OnsetVoicing != "") %>% 
  filter(EffectOnPitch != "") %>%
  filter(OnsetVoicing %in% c("Voiced", "Voiceless"))

t <- data.frame(unclass(table(tmp$OnsetVoicing, tmp$EffectOnPitch)))
t <- t %>% select(lowering, mid, elevating, rising, falling)

print(xtable(t, type = "latex", caption="Tonogenesis conditioned by voiced and voiceless (unaspirated) obstruents"))
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:41 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{rrrrrr}
##   \hline
##  & lowering & mid & elevating & rising & falling \\ 
##   \hline
## Voiced &  37 &   0 &  10 &   2 &   2 \\ 
##   Voiceless &  11 &   8 &  35 &   0 &   1 \\ 
##    \hline
## \end{tabular}
## \caption{Tonogenesis conditioned by voiced and voiceless (unaspirated) obstruents} 
## \end{table}
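The `select(lowering, mid, elevating, rising, falling)` step above fixes the column order, but it would fail if one of those effects had no observations after filtering. A minimal sketch of an alternative that sets the order through factor levels, so empty categories still appear as zero columns; the toy rows are hypothetical.

```r
# Toy stand-in for the filtered OnsetVoicing/EffectOnPitch columns (hypothetical rows)
toy <- data.frame(
  OnsetVoicing = c("Voiced", "Voiced", "Voiceless", "Voiceless"),
  EffectOnPitch = c("lowering", "lowering", "elevating", "mid")
)

# Fix the column order up front; values outside these levels become NA
# and are dropped by table()
effect_levels <- c("lowering", "mid", "elevating", "rising", "falling")
toy$EffectOnPitch <- factor(toy$EffectOnPitch, levels = effect_levels)

# Levels without observations ("rising", "falling") still get a zero column
crosstab <- table(toy$OnsetVoicing, toy$EffectOnPitch)
print(crosstab)
```

The same levels vector can be reused across the later cross-tabulations to keep their columns comparable.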

Tonogenesis triggered by coda consonants

tmp <- tonodb %>% select(CodaGlottal, EffectOnPitch) %>%
  filter(!is.na(CodaGlottal)) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "") %>%
  filter(EffectOnPitch %in% c("level", "rising", "falling"))
table(tmp) %>% kable()

|  | falling | rising |
| --- | ---: | ---: |
| /h/ | 2 | 0 |
| /h/, glottal stop | 2 | 1 |
| glottal stop | 4 | 3 |
| glottalized | 1 | 2 |
| laryngeal | 6 | 0 |
| non-glottalized | 1 | 0 |

# print(xtable(table(tmp), type = "latex", caption="Tonogenesis triggered by coda consonants"))

Tonogenesis based on vowel length

table(tonodb$Nucleus, tonodb$EffectOnPitch) %>% kable()

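|  | elevating | falling | level | lowering | lowering, elevating | mid | no change | rising | rising-falling | rising, elevating | rising, lowering |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| -ATR | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| -ATR and non-high vowel | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| +ATR | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| +ATR and high vowel | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| high vowel | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| long vowel | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| low vowel | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| other | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| short vowel | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| short, long | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| short, long, glottalic | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |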

tmp <- tonodb %>% select(Nucleus, EffectOnPitch) %>%
  filter(Nucleus != "") %>%
  filter(EffectOnPitch != "") %>%
  filter(Nucleus %in% c("long vowel", "short vowel"))

# print(xtable(table(tmp), type = "latex", caption="Tonogenesis based on vowel length"))

Tonogenesis based on vowel height

High/low is relative.

table(tonodb$Nucleus, tonodb$EffectOnPitch) %>% kable()

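|  | elevating | falling | level | lowering | lowering, elevating | mid | no change | rising | rising-falling | rising, elevating | rising, lowering |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| -ATR | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| -ATR and non-high vowel | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| +ATR | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| +ATR and high vowel | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| high vowel | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| long vowel | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| low vowel | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| other | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| short vowel | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| short, long | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| short, long, glottalic | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |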

tmp <- tonodb %>% select(Nucleus, EffectOnPitch) %>%
  filter(Nucleus != "") %>%
  filter(EffectOnPitch != "") %>%
  filter(Nucleus %in% c("high vowel", "low vowel"))

# print(xtable(table(tmp), type = "latex", caption="Tonogenesis based on vowel height – high/low is relative"))

Tonogenesis based on ATR

High/low is relative.

table(tonodb$Nucleus, tonodb$EffectOnPitch) %>% kable()

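|  | elevating | falling | level | lowering | lowering, elevating | mid | no change | rising | rising-falling | rising, elevating | rising, lowering |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| -ATR | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| -ATR and non-high vowel | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| +ATR | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| +ATR and high vowel | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| high vowel | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| long vowel | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| low vowel | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| other | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| short vowel | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| short, long | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| short, long, glottalic | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |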

tmp <- tonodb %>% select(Nucleus, EffectOnPitch) %>%
  filter(Nucleus != "") %>%
  filter(EffectOnPitch != "") %>%
  filter(Nucleus %in% c("+ATR", "-ATR"))

# print(xtable(table(tmp), type = "latex", caption="Tonogenesis based on ATR – high/low is relative"))

Effect of voicing on tone

In the DoTE (number of languages).

tmp <- tonodb %>% filter(Onset %in% c('voiceless', 'voiced'))
table(tmp$Onset, tmp$EffectOnPitch)
##            
##             elevating falling lowering rising
##   voiced            3       1       19      1
##   voiceless        15       0        2      0
t <- data.frame(unclass(table(tmp$Onset, tmp$EffectOnPitch)))
t <- t %>% select(lowering, elevating, rising, falling)
# print(xtable(t, type = "latex", caption="The effect of voicing on tone"))

Tonogenesis triggered by codas

In the DoTE (number of cases of tonogenesis).

# table(tonodb$Coda, tonodb$EffectOnPitch) %>% kable()

# tmp <- tonodb %>% select(Coda, EffectOnPitch) %>% filter_at(vars(Coda, EffectOnPitch),any_vars(!is.na(.)))
# table(tmp$Coda, tmp$EffectOnPitch) %>% kable()

# tmp <- tonodb %>% select(Coda, EffectOnPitch) %>% filter_at(vars(Coda, EffectOnPitch),all_vars(!is.na(.)))
# table(tmp$Coda, tmp$EffectOnPitch) %>% kable()

Onset aspiration by effect on pitch

tmp <- tonodb %>% select(OnsetAspiration, EffectOnPitch) %>%
  filter(OnsetAspiration != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

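|  | elevating | falling | lowering | mid | rising |
| --- | ---: | ---: | ---: | ---: | ---: |
| Aspirated | 5 | 0 | 7 | 4 | 0 |
| Aspirated, unaspirated | 5 | 1 | 1 | 0 | 0 |
| Breathy | 0 | 0 | 0 | 0 | 1 |
| Unaspirated | 7 | 0 | 3 | 6 | 0 |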

t <- data.frame(unclass(table(tmp)))
t <- t %>% select(lowering, mid, elevating, falling, rising)

# print(xtable(t, type = "latex", caption="The effect of voicing on tone"))

Effect of coda manner on pitch

tmp <- tonodb %>% select(CodaManner, EffectOnPitch) %>% separate_rows(CodaManner) %>%
  filter(CodaManner != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

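|  | elevating | falling | level | lowering | rising |
| --- | ---: | ---: | ---: | ---: | ---: |
| cluster | 0 | 1 | 0 | 1 | 0 |
| fricative | 1 | 3 | 0 | 1 | 0 |
| obstruent | 3 | 4 | 0 | 1 | 1 |
| open | 0 | 0 | 3 | 0 | 0 |
| sonorant | 1 | 3 | 3 | 1 | 0 |
| stop | 2 | 4 | 0 | 4 | 3 |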

t <- data.frame(unclass(table(tmp)))
t <- t %>% select(lowering, level, elevating, rising, falling) %>% arrange(desc(lowering))
t %>% kable()

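|  | lowering | level | elevating | rising | falling |
| --- | ---: | ---: | ---: | ---: | ---: |
| stop | 4 | 0 | 2 | 3 | 4 |
| cluster | 1 | 0 | 0 | 0 | 1 |
| fricative | 1 | 0 | 1 | 0 | 3 |
| obstruent | 1 | 0 | 3 | 1 | 4 |
| sonorant | 1 | 3 | 1 | 0 | 3 |
| open | 0 | 3 | 0 | 0 | 0 |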

print(xtable(t, type = "latex", caption="The effect of voicing on tone"))
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:41 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{rrrrrr}
##   \hline
##  & lowering & level & elevating & rising & falling \\ 
##   \hline
## stop &   4 &   0 &   2 &   3 &   4 \\ 
##   cluster &   1 &   0 &   0 &   0 &   1 \\ 
##   fricative &   1 &   0 &   1 &   0 &   3 \\ 
##   obstruent &   1 &   0 &   3 &   1 &   4 \\ 
##   sonorant &   1 &   3 &   1 &   0 &   3 \\ 
##   open &   0 &   3 &   0 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{The effect of voicing on tone} 
## \end{table}

Effect of coda phonation on pitch

tmp <- tonodb %>% select(CodaPhonation, EffectOnPitch) %>%
  filter(CodaPhonation != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

|  | falling | lowering | rising |
| --- | ---: | ---: | ---: |
| breathy | 1 | 0 | 0 |
| creaky | 2 | 0 | 0 |
| preaspirated | 0 | 1 | 0 |
| voiced | 1 | 1 | 0 |
| voiceless | 2 | 0 | 1 |

print(xtable(table(tmp), type = "latex", caption="The effect of voice on pitch"))
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:41 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{rrrr}
##   \hline
##  & falling & lowering & rising \\ 
##   \hline
## breathy &   1 &   0 &   0 \\ 
##   creaky &   2 &   0 &   0 \\ 
##   preaspirated &   0 &   1 &   0 \\ 
##   voiced &   1 &   1 &   0 \\ 
##   voiceless &   2 &   0 &   1 \\ 
##    \hline
## \end{tabular}
## \caption{The effect of voice on pitch} 
## \end{table}

Effect of coda glottal on pitch

tmp <- tonodb %>% select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

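|  | elevating | falling | lowering | rising |
| --- | ---: | ---: | ---: | ---: |
| /h/ | 1 | 2 | 1 | 0 |
| /h/, glottal stop | 1 | 2 | 0 | 1 |
| glottal stop | 2 | 4 | 3 | 3 |
| glottalized | 3 | 1 | 1 | 2 |
| glottalized, non-glottalized | 1 | 0 | 1 | 0 |
| laryngeal | 0 | 6 | 0 | 0 |
| non-glottalized | 0 | 1 | 0 | 0 |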

t <- data.frame(unclass(table(tmp)))
t <- t %>% select(lowering, elevating, falling, rising)

# print(xtable(t, type = "latex", caption="The effect of coda glottal on pitch"))

Effect of vowel height on pitch

tmp <- tonodb %>% select(Height, EffectOnPitch) %>%
  filter(Height != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

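|  | elevating | falling | level | lowering | lowering, elevating | mid | no change | rising | rising, elevating | rising, lowering |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| high | 51 | 1 | 0 | 4 | 0 | 0 | 1 | 1 | 1 | 0 |
| low | 0 | 2 | 0 | 47 | 0 | 0 | 0 | 0 | 0 | 1 |
| mid | 8 | 1 | 1 | 5 | 1 | 2 | 0 | 1 | 0 | 0 |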

# print(xtable(table(tmp), type = "latex", caption="The effect of vowel height on pitch"))
table(tmp) %>% kable()

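|  | elevating | falling | level | lowering | lowering, elevating | mid | no change | rising | rising, elevating | rising, lowering |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| high | 51 | 1 | 0 | 4 | 0 | 0 | 1 | 1 | 1 | 0 |
| low | 0 | 2 | 0 | 47 | 0 | 0 | 0 | 0 | 0 | 1 |
| mid | 8 | 1 | 1 | 5 | 1 | 2 | 0 | 1 | 0 | 0 |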

Effect of nucleus length on pitch

tmp <- tonodb %>% select(NucleusLength, EffectOnPitch) %>%
  filter(NucleusLength != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

|  | elevating | falling | lowering | rising | rising-falling |
| --- | ---: | ---: | ---: | ---: | ---: |
| long | 1 | 1 | 1 | 1 | 1 |
| short | 3 | 0 | 1 | 0 | 0 |

# print(xtable(table(tmp), type = "latex", caption="The effect of nucleus length on pitch"))
table(tmp) %>% kable()

|  | elevating | falling | lowering | rising | rising-falling |
| --- | ---: | ---: | ---: | ---: | ---: |
| long | 1 | 1 | 1 | 1 | 1 |
| short | 3 | 0 | 1 | 0 | 0 |

Effect of nuclear +/- ATR on pitch

tmp <- tonodb %>% select(NucleusATR, EffectOnPitch) %>%
  filter(NucleusATR != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

|  | elevating | lowering |
| --- | ---: | ---: |
| -ATR | 0 | 3 |
| +ATR | 3 | 0 |

# print(xtable(table(tmp), type = "latex", caption="The effect of nuclear +/- ATR on pitch"))
table(tmp) %>% kable()

|  | elevating | lowering |
| --- | ---: | ---: |
| -ATR | 0 | 3 |
| +ATR | 3 | 0 |

Number of cases/varieties of different types for each region

Africa

tmp <- tonodb %>% filter(Area == "Africa") %>% select(CodaGlottal, EffectOnPitch)
table(tmp) %>% kable()

|  | elevating | falling | lowering |
| --- | ---: | ---: | ---: |

tmp <- tonodb %>% filter(Area == "Africa") %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()
# Nothing here
# print(xtable(table(tmp), type = "latex", caption="Number of cases/varieties of different tonogenesis types for Africa"))

Asia

tmp <- tonodb %>% filter(Area == "Asia") %>% select(CodaGlottal, EffectOnPitch)
table(tmp) %>% kable()

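|  | elevating | falling | level | lowering | lowering, elevating | mid | no change | rising |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| /h/ | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| glottal stop | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 3 |
| glottalized | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| non-glottalized | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |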

tmp <- tonodb %>% filter(Area == "Asia") %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

|  | elevating | falling | rising |
| --- | ---: | ---: | ---: |
| /h/ | 1 | 2 | 0 |
| glottal stop | 1 | 4 | 3 |
| glottalized | 1 | 0 | 0 |

# print(xtable(table(tmp), type = "latex", caption="Number of cases/varieties of different tonogenesis types for Asia"))

Europe

tmp <- tonodb %>% filter(Area == "Europe") %>% select(CodaGlottal, EffectOnPitch)
table(tmp) %>% kable()

|  | elevating | falling | level | no change | rising | rising-falling |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| glottalized | 1 | 0 | 0 | 0 | 2 | 0 |
| non-glottalized | 0 | 1 | 0 | 0 | 0 | 0 |

tmp <- tonodb %>% filter(Area == "Europe") %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

|  | elevating | falling | rising |
| --- | ---: | ---: | ---: |
| glottalized | 1 | 0 | 2 |
| non-glottalized | 0 | 1 | 0 |

# print(xtable(table(tmp), type = "latex", caption="Number of cases/varieties of different tonogenesis types for Europe"))

North America

tmp <- tonodb %>% filter(Area == "North America") %>% select(CodaGlottal, EffectOnPitch)
table(tmp) %>% kable()

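|  | elevating | falling | lowering | rising | rising, elevating | rising, lowering |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| /h/ | 0 | 0 | 1 | 0 | 0 | 0 |
| /h/, glottal stop | 1 | 2 | 0 | 1 | 0 | 0 |
| glottalized | 1 | 1 | 1 | 0 | 0 | 0 |
| glottalized, non-glottalized | 1 | 0 | 1 | 0 | 0 | 0 |
| laryngeal | 0 | 6 | 0 | 0 | 0 | 0 |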

tmp <- tonodb %>% filter(Area == "North America") %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

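|  | elevating | falling | lowering | rising |
| --- | ---: | ---: | ---: | ---: |
| /h/ | 0 | 0 | 1 | 0 |
| /h/, glottal stop | 1 | 2 | 0 | 1 |
| glottalized | 1 | 1 | 1 | 0 |
| glottalized, non-glottalized | 1 | 0 | 1 | 0 |
| laryngeal | 0 | 6 | 0 | 0 |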

# print(xtable(table(tmp), type = "latex", caption="Number of cases/varieties of different tonogenesis types for North America"))

Papunesia

tmp <- tonodb %>% filter(Area == "Papunesia") %>% select(CodaGlottal, EffectOnPitch)
table(tmp) %>% kable()

|  | elevating | lowering | rising |
| --- | ---: | ---: | ---: |

tmp <- tonodb %>% filter(Area == "Papunesia") %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()
# No results
# print(xtable(table(tmp), type = "latex", caption="Number of cases/varieties of different tonogenesis types for Papunesia"))

South America

tmp <- tonodb %>% filter(Area == "South America") %>% select(CodaGlottal, EffectOnPitch)
table(tmp) %>% kable()

|  | elevating | falling | lowering | rising |
| --- | ---: | ---: | ---: | ---: |
| glottal stop | 1 | 0 | 3 | 0 |

tmp <- tonodb %>% filter(Area == "South America") %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

|  | elevating | lowering |
| --- | ---: | ---: |
| glottal stop | 1 | 3 |

# print(xtable(table(tmp), type = "latex", caption="Number of cases/varieties of different tonogenesis types for South America"))
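The per-area blocks above repeat the same filter, select and tabulate steps with only the area name changed. A minimal sketch of how they could be wrapped in one function and applied to every area; `crosstab_for_area()` is a hypothetical helper, and the toy rows are hypothetical stand-ins for `tonodb`.

```r
library(dplyr)

# Hypothetical helper: cross-tabulate one feature column against
# EffectOnPitch for a single area, dropping empty and NA values
crosstab_for_area <- function(data, area, feature) {
  sub <- data %>%
    filter(Area == area) %>%
    select(all_of(c(feature, "EffectOnPitch"))) %>%
    filter(.data[[feature]] != "", EffectOnPitch != "")
  table(sub)
}

# Toy stand-in for `tonodb` (hypothetical rows)
toy <- tibble(
  Area = c("Asia", "Asia", "Asia", "Europe"),
  CodaGlottal = c("glottal stop", "glottal stop", NA, "glottalized"),
  EffectOnPitch = c("rising", "falling", "lowering", "rising")
)

# One cross-tabulation per area, stored in a named list
areas <- c("Asia", "Europe")
tables <- lapply(setNames(areas, areas), crosstab_for_area,
                 data = toy, feature = "CodaGlottal")
print(tables$Asia)
```

Passing a different column name as `feature`, for example `"OnsetVoicing"`, would give the corresponding table without copying the block again.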

Area and tonogenesis specific tables

Onset aspiration in Asia

tmp <- tonodb %>% filter(Area == "Asia") %>% select(OnsetAspiration, EffectOnPitch) %>% 
  select(OnsetAspiration, EffectOnPitch) %>%
  filter(OnsetAspiration != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

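|  | elevating | falling | lowering | mid | rising |
| --- | ---: | ---: | ---: | ---: | ---: |
| Aspirated | 3 | 0 | 6 | 4 | 0 |
| Aspirated, unaspirated | 5 | 1 | 1 | 0 | 0 |
| Breathy | 0 | 0 | 0 | 0 | 1 |
| Unaspirated | 7 | 0 | 1 | 6 | 0 |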

# print(xtable(table(tmp), type = "latex", caption="Onset aspiration in Asia"))

Coda glottal in Asia

tmp <- tonodb %>% filter(Area == "Asia") %>% select(CodaGlottal, EffectOnPitch) %>% 
  select(CodaGlottal, EffectOnPitch) %>%
  filter(CodaGlottal != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

|  | elevating | falling | rising |
| --- | ---: | ---: | ---: |
| /h/ | 1 | 2 | 0 |
| glottal stop | 1 | 4 | 3 |
| glottalized | 1 | 0 | 0 |

# print(xtable(table(tmp), type = "latex", caption="Coda glottal in Asia"))

Coda manner in Asia

tmp <- tonodb %>% filter(Area == "Asia") %>% select(CodaManner, EffectOnPitch) %>% 
  select(CodaManner, EffectOnPitch) %>%
  filter(CodaManner != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

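|  | elevating | falling | level | lowering | rising |
| --- | ---: | ---: | ---: | ---: | ---: |
| fricative | 1 | 3 | 0 | 0 | 0 |
| obstruent | 1 | 2 | 0 | 0 | 0 |
| open | 0 | 0 | 1 | 0 | 0 |
| sonorant | 0 | 1 | 1 | 0 | 0 |
| sonorant, open | 0 | 0 | 2 | 0 | 0 |
| stop | 1 | 4 | 0 | 1 | 3 |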

# print(xtable(table(tmp), type = "latex", caption="Coda manner in Asia"))

Coda phonation type in Asia

tmp <- tonodb %>% filter(Area == "Asia") %>%
  select(CodaPhonation, EffectOnPitch) %>%
  filter(CodaPhonation != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

|           | falling | lowering |
|:----------|--------:|---------:|
| breathy   |       1 |        0 |
| voiced    |       0 |        1 |
| voiceless |       1 |        0 |

# print(xtable(table(tmp), type = "latex", caption="Coda phonation type in Asia"))

Nucleus height in Asia

tmp <- tonodb %>% filter(Area == "Asia") %>%
  select(NucleusHeight, EffectOnPitch) %>%
  filter(NucleusHeight != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

|      | elevating | lowering |
|:-----|----------:|---------:|
| High |         1 |        0 |
| Low  |         0 |        1 |

# print(xtable(table(tmp), type = "latex", caption="Nucleus height in Asia"))

Onset voicing in Asia

tmp <- tonodb %>% filter(Area == "Asia") %>%
  select(OnsetVoicing, EffectOnPitch) %>%
  filter(OnsetVoicing != "") %>%
  filter(EffectOnPitch != "")
table(tmp) %>% kable()

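|                   | elevating | falling | lowering | lowering, elevating | mid | rising |
|:------------------|----------:|--------:|---------:|--------------------:|----:|-------:|
| sonorant          |         1 |       0 |        0 |                   0 |   0 |      0 |
| Voiced            |        10 |       2 |       32 |                   0 |   0 |      2 |
| Voiced, voiceless |         4 |       0 |        3 |                   1 |   2 |      0 |
| Voiceless         |        31 |       1 |       11 |                   0 |   8 |      0 |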

# print(xtable(table(tmp), type = "latex", caption="Onset voicing in Asia"))

Tonogenetic events by macroarea

Worldwide

tmp <- tonodb %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Cases of tonogenesis` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of languages` = n())
t <- left_join(cases, varieties)
## Joining with `by = join_by(Type)`
t <- t %>% arrange(desc(`Cases of tonogenesis`))

t %>% kable()

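| Type     | Cases of tonogenesis | Number of languages |
|:---------|---------------------:|--------------------:|
| onset    |                  133 |                  41 |
| coda     |                   70 |                  43 |
| count    |                   27 |                  20 |
| nucleus  |                   27 |                  16 |
| syllable |                   27 |                  20 |
| stress   |                   12 |                   8 |
| other    |                    6 |                   5 |
| NA       |                    1 |                   1 |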

# print(xtable(t, type = "latex", caption="Cases of tonogenesis by category"), include.rownames=FALSE)
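The `count` and `syllable` rows carry identical totals in the table above. This is consistent with `separate_rows()` being called without a `sep` argument: its default separator is any run of non-alphanumeric characters, so a label such as `syllable-count` is split at the hyphen as well as at commas. The sketch below uses a small made-up data frame (the `toy` object is illustrative and is not read from the CLDF files) to show what an explicit comma separator would change. Whether to apply this to the real analysis is a decision for the authors.

```r
library(tidyr)

# Illustrative rows only; the Type labels mirror those shown in this document.
toy <- data.frame(
  LanguageVariety = c("Khaling", "Cherokee", "Vietnamese"),
  Type = c("coda, syllable-count", "coda, onset", "onset"),
  stringsAsFactors = FALSE
)

# Default separator: splits on commas, spaces AND the hyphen.
default_split <- separate_rows(toy, Type)

# Explicit separator: split on commas only, so "syllable-count" stays whole.
comma_split <- separate_rows(toy, Type, sep = ",\\s*")

unique(default_split$Type)
unique(comma_split$Type)
```

With the default separator the toy data yield four distinct labels (`coda`, `syllable`, `count`, `onset`); with the comma separator they yield three (`coda`, `syllable-count`, `onset`).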

t(t) %>% kable()

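| Type                 | onset | coda | count | nucleus | syllable | stress | other | NA |
|:---------------------|:------|:-----|:------|:--------|:---------|:-------|:------|:---|
| Cases of tonogenesis | 133   | 70   | 27    | 27      | 27       | 12     | 6     | 1  |
| Number of languages  | 41    | 43   | 20    | 16      | 20       | 8      | 5     | 1  |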

# print(xtable(t(t), type = "latex", caption="Cases of tonogenesis by category"), include.rownames=FALSE)

Africa

tmp <- tonodb  %>% filter(Area == "Africa") %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Number of cases` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of varieties` = n())
t <- left_join(cases, varieties)
## Joining with `by = join_by(Type)`
t %>% kable()

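| Type     | Number of cases | Number of varieties |
|:---------|----------------:|--------------------:|
| count    |               6 |                   4 |
| nucleus  |               7 |                   4 |
| onset    |               7 |                   5 |
| other    |               1 |                   1 |
| syllable |               6 |                   4 |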

# print(xtable(t(t), type = "latex", caption="Tonogenetic events in Africa in the DTE"))
t(t) %>% kable()

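| Type                | count | nucleus | onset | other | syllable |
|:--------------------|:------|:--------|:------|:------|:---------|
| Number of cases     | 6     | 7       | 7     | 1     | 6        |
| Number of varieties | 4     | 4       | 5     | 1     | 4        |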

Asia

tmp <- tonodb  %>% filter(Area == "Asia") %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Number of cases` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of varieties` = n())
t <- left_join(cases, varieties)
## Joining with `by = join_by(Type)`
t %>% kable()

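| Type     | Number of cases | Number of varieties |
|:---------|----------------:|--------------------:|
| coda     |              35 |                  15 |
| count    |               3 |                   3 |
| nucleus  |               4 |                   2 |
| onset    |             116 |                  29 |
| other    |               2 |                   1 |
| stress   |               2 |                   1 |
| syllable |               3 |                   3 |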

# print(xtable(t(t), type = "latex", caption="Tonogenetic events in Asia in the DTE"))
t(t) %>% kable()

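| Type                | coda | count | nucleus | onset | other | stress | syllable |
|:--------------------|:-----|:------|:--------|:------|:------|:-------|:---------|
| Number of cases     | 35   | 3     | 4       | 116   | 2     | 2      | 3        |
| Number of varieties | 15   | 3     | 2       | 29    | 1     | 1      | 3        |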

Europe

tmp <- tonodb  %>% filter(Area == "Europe") %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Number of cases` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of varieties` = n())
t <- left_join(cases, varieties)
## Joining with `by = join_by(Type)`
t %>% kable()

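| Type     | Number of cases | Number of varieties |
|:---------|----------------:|--------------------:|
| coda     |               8 |                   5 |
| count    |               8 |                   6 |
| nucleus  |               1 |                   1 |
| other    |               1 |                   1 |
| stress   |               5 |                   4 |
| syllable |               8 |                   6 |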

# print(xtable(t(t), type = "latex", caption="Tonogenetic events in Europe in the DTE"))
t(t) %>% kable()

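| Type                | coda | count | nucleus | other | stress | syllable |
|:--------------------|:-----|:------|:--------|:------|:-------|:---------|
| Number of cases     | 8    | 8     | 1       | 1     | 5      | 8        |
| Number of varieties | 5    | 6     | 1       | 1     | 4      | 6        |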

North America

tmp <- tonodb  %>% filter(Area == "North America") %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Number of cases` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of varieties` = n())
t <- left_join(cases, varieties)
## Joining with `by = join_by(Type)`
t %>% kable()

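| Type     | Number of cases | Number of varieties |
|:---------|----------------:|--------------------:|
| coda     |              19 |                  16 |
| count    |               5 |                   3 |
| nucleus  |               8 |                   5 |
| onset    |               1 |                   1 |
| other    |               2 |                   2 |
| stress   |               4 |                   2 |
| syllable |               5 |                   3 |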

# print(xtable(t(t), type = "latex", caption="Tonogenetic events in North America in the DTE"), include.colnames=FALSE)
t(t) %>% kable()

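| Type                | coda | count | nucleus | onset | other | stress | syllable |
|:--------------------|:-----|:------|:--------|:------|:------|:-------|:---------|
| Number of cases     | 19   | 5     | 8       | 1     | 2     | 4      | 5        |
| Number of varieties | 16   | 3     | 5       | 1     | 2     | 2      | 3        |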

South America

tmp <- tonodb  %>% filter(Area == "South America") %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Number of cases` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of varieties` = n())
t <- left_join(cases, varieties)
## Joining with `by = join_by(Type)`
t %>% kable()

| Type     | Number of cases | Number of varieties |
|:---------|----------------:|--------------------:|
| coda     |               6 |                   5 |
| count    |               1 |                   1 |
| nucleus  |               3 |                   1 |
| syllable |               1 |                   1 |

# print(xtable(t(t), type = "latex", caption="Tonogenetic events in South America in the DTE"))
t(t) %>% kable()

| Type                | coda | count | nucleus | syllable |
|:--------------------|:-----|:------|:--------|:---------|
| Number of cases     | 6    | 1     | 3       | 1        |
| Number of varieties | 5    | 1     | 1       | 1        |

Papunesia

tmp <- tonodb  %>% filter(Area == "Papunesia") %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Type) %>% summarize(`Number of cases` = n())
varieties <- tmp %>% distinct() %>% group_by(Type) %>% summarize(`Number of varieties` = n())
t <- left_join(cases, varieties)
## Joining with `by = join_by(Type)`
t %>% kable()

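| Type     | Number of cases | Number of varieties |
|:---------|----------------:|--------------------:|
| coda     |               1 |                   1 |
| count    |               4 |                   3 |
| nucleus  |               3 |                   2 |
| onset    |               9 |                   6 |
| stress   |               1 |                   1 |
| syllable |               4 |                   3 |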

# print(xtable(t(t), type = "latex", caption="Tonogenetic events in Papunesia in the DTE"))
t(t) %>% kable()

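| Type                | coda | count | nucleus | onset | stress | syllable |
|:--------------------|:-----|:------|:--------|:------|:-------|:---------|
| Number of cases     | 1    | 4     | 3       | 9     | 1      | 4        |
| Number of varieties | 1    | 3     | 2       | 6     | 1      | 3        |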

Examples from the database for the paper

tmp <- tonodb %>% select(ID, LanguageVariety, TriggeringContext, EffectOnPitch, Type ) %>% head(n=10)
tmp %>% kable()

| ID | LanguageVariety     | TriggeringContext | EffectOnPitch | Type |
|---:|:--------------------|:------------------|:--------------|:-----|
|  1 | Vietnamese          | Initial voiced stop + Falling tone | lowering | onset |
|  2 | Vietnamese          | Initial voiceless stop + Falling tone | elevating | onset |
|  3 | Vietnamese          | Final voiceless fricative | falling | coda |
|  4 | Punjabi             | voiced aspirateed coda | falling | coda |
|  5 | Middle Chinese      | final /h/ | falling | coda |
|  6 | Cherokee            | inital glide or final glottal consonant | falling | coda, onset |
|  7 | Lhasa Tibetan       | final glottal stop | falling | coda |
|  8 | Khaling             | Obstruent coda OR disyllable –> monosyllable | falling | coda, syllable-count |
|  9 | Proto-Mohawk-Oneida | lengthened accented vowel followed by a glottal stop or by * / h / plus a resonant consonant | falling | coda |
| 10 | Dzongkha            | loss of a second syllable OR loss of a coda /-r/ or /-l/. | falling | coda, syllable-count |

# print(xtable(tmp, type = "latex", caption="Example entries from the DTE"), include.rownames=FALSE)

A table showing the number of cases/languages for each type in each region

tmp <- tonodb %>% select(Area, LanguageVariety, Type) %>% separate_rows(Type)
# tmp <- tonodb %>% select(LanguageVariety, Type) %>% separate_rows(Type)
cases <- tmp %>% group_by(Area, Type) %>% summarize(`Cases of tonogenesis` = n())
## `summarise()` has grouped output by 'Area'. You can override using the
## `.groups` argument.
varieties <- tmp %>% distinct() %>% group_by(Area, Type) %>% summarize(`Number of languages` = n())
## `summarise()` has grouped output by 'Area'. You can override using the
## `.groups` argument.
t <- left_join(cases, varieties)
## Joining with `by = join_by(Area, Type)`
t <- t %>% arrange(desc(`Cases of tonogenesis`))
tbl <- t %>% select(-`Number of languages`) %>% pivot_wider(names_from = Type, values_from = `Cases of tonogenesis`)
tbl
## # A tibble: 7 × 9
## # Groups:   Area [7]
##   Area          onset  coda count syllable nucleus stress other  `NA`
##   <chr>         <int> <int> <int>    <int>   <int>  <int> <int> <int>
## 1 Asia            116    35     3        3       4      2     2    NA
## 2 North America     1    19     5        5       8      4     2    NA
## 3 Papunesia         9     1     4        4       3      1    NA    NA
## 4 Europe           NA     8     8        8       1      5     1    NA
## 5 Africa            7    NA     6        6       7     NA     1    NA
## 6 South America    NA     6     1        1       3     NA    NA    NA
## 7 <NA>             NA     1    NA       NA       1     NA    NA     1
# print(xtable(tbl, type = "latex", caption="Tonogenesis events by area"), include.rownames=FALSE)
tbl <- t %>% select(-`Cases of tonogenesis`) %>% pivot_wider(names_from = Type, values_from = `Number of languages`)
tbl
## # A tibble: 7 × 9
## # Groups:   Area [7]
##   Area          onset  coda count syllable nucleus stress other  `NA`
##   <chr>         <int> <int> <int>    <int>   <int>  <int> <int> <int>
## 1 Asia             29    15     3        3       2      1     1    NA
## 2 North America     1    16     3        3       5      2     2    NA
## 3 Papunesia         6     1     3        3       2      1    NA    NA
## 4 Europe           NA     5     6        6       1      4     1    NA
## 5 Africa            5    NA     4        4       4     NA     1    NA
## 6 South America    NA     5     1        1       1     NA    NA    NA
## 7 <NA>             NA     1    NA       NA       1     NA    NA     1
# print(xtable(tbl, type = "latex", caption="Languages with tonogenesis events by area"), include.rownames=FALSE)
t$both_cases <- paste0(t$`Cases of tonogenesis`, " (", t$`Number of languages`, ")")
tbl <- t %>% select(-`Cases of tonogenesis`, -`Number of languages`) %>% pivot_wider(names_from = Type, values_from = both_cases)
tbl
## # A tibble: 7 × 9
## # Groups:   Area [7]
##   Area          onset    coda    count syllable nucleus stress other `NA` 
##   <chr>         <chr>    <chr>   <chr> <chr>    <chr>   <chr>  <chr> <chr>
## 1 Asia          116 (29) 35 (15) 3 (3) 3 (3)    4 (2)   2 (1)  2 (1) <NA> 
## 2 North America 1 (1)    19 (16) 5 (3) 5 (3)    8 (5)   4 (2)  2 (2) <NA> 
## 3 Papunesia     9 (6)    1 (1)   4 (3) 4 (3)    3 (2)   1 (1)  <NA>  <NA> 
## 4 Europe        <NA>     8 (5)   8 (6) 8 (6)    1 (1)   5 (4)  1 (1) <NA> 
## 5 Africa        7 (5)    <NA>    6 (4) 6 (4)    7 (4)   <NA>   1 (1) <NA> 
## 6 South America <NA>     6 (5)   1 (1) 1 (1)    3 (1)   <NA>   <NA>  <NA> 
## 7 <NA>          <NA>     1 (1)   <NA>  <NA>     1 (1)   <NA>   <NA>  1 (1)
# print(xtable(tbl, type = "latex", caption="Tonogenesis events (languages) by area"), include.rownames=FALSE)
m <- tonodb %>% select(Latitude, Longitude, LanguageVariety, Type) %>% distinct() %>% separate_rows(Type)
ggplot(data=m, aes(x=Longitude, y=Latitude, color=Type)) + 
  borders("world", colour="gray50", fill="gray50") + 
  geom_point() +
  theme_bw()
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_point()`).
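The per-area summaries under "Tonogenetic events by macroarea" repeat the same three steps: split `Type`, count rows, and count distinct varieties. A minimal sketch of how these steps could be wrapped in one function follows. `summarise_types()` and the `toy` data are hypothetical names introduced here for illustration; the function keeps the default `separate_rows()` behaviour used throughout this document and assumes the columns `Area`, `LanguageVariety` and `Type`.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper, not part of the original analysis: cases and distinct
# varieties per tonogenesis type, optionally restricted to a single area.
summarise_types <- function(data, area = NULL) {
  if (!is.null(area)) {
    data <- data[data$Area %in% area, , drop = FALSE]
  }
  tmp <- data %>%
    select(LanguageVariety, Type) %>%
    separate_rows(Type)
  cases <- count(tmp, Type, name = "Number of cases")
  varieties <- tmp %>%
    distinct() %>%
    count(Type, name = "Number of varieties")
  left_join(cases, varieties, by = "Type")
}

# Made-up stand-in for tonodb, used only to exercise the function.
toy <- tibble(
  Area            = c("Asia", "Asia", "Asia", "Europe"),
  LanguageVariety = c("A", "A", "B", "C"),
  Type            = c("onset", "onset", "coda, onset", "stress")
)

summarise_types(toy, "Asia")
```

Calling `summarise_types(tonodb, "Africa")` would then stand in for one of the per-area chunks, and `summarise_types(tonodb)` for the worldwide one, assuming `tonodb` has the columns named above.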

Multiple paths to the same result

Alluvial diagrams showing how frequently each type of tonogenetic event co-occurs with each outcome (tone height, contour, or effect on pitch).

x <- tonodb %>% select(Type, Height, Ordering) %>% filter(!is.na(Height)) %>% separate_rows(Type)
x <- x %>% group_by(Type, Height, Ordering) %>% summarize(Count = n())
## `summarise()` has grouped output by 'Type', 'Height'. You can override using
## the `.groups` argument.
x <- x %>% mutate(Freq = Count / sum(x$Count))
x <- x %>% arrange(desc(Count))
ggplot(data = x,
       aes(axis1 = Type, axis2 = Height, y = Count)) +
  geom_alluvium(aes(fill = Ordering)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

x <- tonodb %>% select(Type, Height) %>% filter(!is.na(Height)) %>% separate_rows(Type)
x <- x %>% group_by(Type, Height) %>% summarize(Count = n())
## `summarise()` has grouped output by 'Type'. You can override using the
## `.groups` argument.
x <- x %>% mutate(Freq = Count / sum(x$Count))
x <- x %>% arrange(desc(Count))
ggplot(data = x,
       aes(axis1 = Height, axis2 = Type, y = Count)) +
  geom_alluvium(aes(fill = Type)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

x <- tonodb %>% select(Type, EffectOnPitch, Ordering) %>% filter(!is.na(EffectOnPitch)) %>% separate_rows(Type)
x <- x %>% group_by(Type, EffectOnPitch, Ordering) %>% summarize(Count = n())
## `summarise()` has grouped output by 'Type', 'EffectOnPitch'. You can override
## using the `.groups` argument.
x <- x %>% mutate(Freq = Count / sum(x$Count))
x <- x %>% arrange(desc(Count))
x <- x %>% filter(Count > 1) %>% filter(Type != "other")
x %>% kable()

| Type     | EffectOnPitch | Ordering        | Count |      Freq |
|:---------|:--------------|:----------------|------:|----------:|
| onset    | elevating     | Broad - split   |    39 | 0.1529412 |
| onset    | lowering      | Broad - split   |    39 | 0.1529412 |
| coda     | falling       | Unclear         |    14 | 0.0549020 |
| onset    | mid           | Broad - split   |    10 | 0.0392157 |
| onset    | elevating     | Possibly Strict |     8 | 0.0313725 |
| onset    | lowering      | Possibly Strict |     8 | 0.0313725 |
| coda     | falling       | Possibly Strict |     5 | 0.0196078 |
| coda     | elevating     | Unclear         |     4 | 0.0156863 |
| coda     | falling       | Broad - split   |     4 | 0.0156863 |
| coda     | rising        | Unclear         |     4 | 0.0156863 |
| nucleus  | elevating     | Broad - split   |     4 | 0.0156863 |
| nucleus  | lowering      | Broad - split   |     4 | 0.0156863 |
| onset    | elevating     | Unclear         |     4 | 0.0156863 |
| coda     | elevating     | Possibly Strict |     3 | 0.0117647 |
| coda     | lowering      | Broad           |     3 | 0.0117647 |
| coda     | lowering      | Possibly Strict |     3 | 0.0117647 |
| count    | falling       | Unclear         |     3 | 0.0117647 |
| nucleus  | elevating     | Possibly Strict |     3 | 0.0117647 |
| nucleus  | elevating     | Unclear         |     3 | 0.0117647 |
| nucleus  | lowering      | Possibly Strict |     3 | 0.0117647 |
| onset    | falling       | Broad - split   |     3 | 0.0117647 |
| onset    | lowering      | Broad           |     3 | 0.0117647 |
| syllable | falling       | Unclear         |     3 | 0.0117647 |
| coda     | falling       | Strict          |     2 | 0.0078431 |
| coda     | level         | Strict          |     2 | 0.0078431 |
| coda     | level         | Unclear         |     2 | 0.0078431 |
| coda     | rising        | Possibly Strict |     2 | 0.0078431 |
| coda     | rising        | Strict          |     2 | 0.0078431 |
| count    | elevating     | Unclear         |     2 | 0.0078431 |
| nucleus  | elevating     | Broad           |     2 | 0.0078431 |
| nucleus  | lowering      | Broad           |     2 | 0.0078431 |
| onset    | elevating     | Strict          |     2 | 0.0078431 |
| onset    | lowering      | Strict          |     2 | 0.0078431 |
| onset    | lowering      | Unclear         |     2 | 0.0078431 |
| onset    | rising        | Broad - split   |     2 | 0.0078431 |
| onset    | rising        | Unclear         |     2 | 0.0078431 |
| stress   | rising        | Unclear         |     2 | 0.0078431 |
| syllable | elevating     | Unclear         |     2 | 0.0078431 |

x %>% filter(!(Type %in% c("count", "stress", "syllable"))) %>%
  filter(!(EffectOnPitch %in% c("level", "mid"))) %>%
ggplot(aes(axis1 = Type, axis2 = EffectOnPitch, y = Count)) +
  geom_alluvium(aes(fill = Ordering)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

w <- x %>% filter(!(Type %in% c("count", "stress", "syllable"))) %>%
  filter(!(EffectOnPitch %in% c("level", "mid")))
w$Type <- factor(w$Type, levels=c("onset", "nucleus", "coda"))
w$EffectOnPitch <- factor(w$EffectOnPitch, levels=c("elevating", "lowering", "rising", "falling"))

w %>%
ggplot(aes(axis1 = Type, axis2 = EffectOnPitch, y = Count)) +
  geom_alluvium(aes(fill = Ordering)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

x <- tonodb %>% select(Type, Contour) %>% filter(!is.na(Contour)) %>% separate_rows(Type)
x <- x %>% group_by(Type, Contour) %>% summarize(Count = n())
## `summarise()` has grouped output by 'Type'. You can override using the
## `.groups` argument.
x <- x %>% mutate(Freq = Count / sum(x$Count))
x <- x %>% arrange(desc(Count))
x %>% kable()

| Type     | Contour        | Count |      Freq |
|:---------|:---------------|------:|----------:|
| coda     | falling        |    26 | 0.2888889 |
| onset    | rising         |    11 | 0.1222222 |
| onset    | falling        |     9 | 0.1000000 |
| onset    | level          |     9 | 0.1000000 |
| coda     | rising         |     7 | 0.0777778 |
| coda     | level          |     5 | 0.0555556 |
| count    | falling        |     5 | 0.0555556 |
| syllable | falling        |     5 | 0.0555556 |
| stress   | rising         |     3 | 0.0333333 |
| count    | rising         |     2 | 0.0222222 |
| syllable | rising         |     2 | 0.0222222 |
| count    | rising-falling |     1 | 0.0111111 |
| nucleus  | falling        |     1 | 0.0111111 |
| nucleus  | rising         |     1 | 0.0111111 |
| nucleus  | rising-falling |     1 | 0.0111111 |
| other    | falling        |     1 | 0.0111111 |
| syllable | rising-falling |     1 | 0.0111111 |

ggplot(data = x,
       aes(axis1 = Contour, axis2 = Type, y = Count)) +
  geom_alluvium(aes(fill = Type)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

ggplot(data = x,
       aes(axis1 = Type, axis2 = Contour, y = Count)) +
  geom_alluvium(aes(fill = Type)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

x <- tonodb %>% select(Type, EffectOnPitch) %>% filter(!is.na(EffectOnPitch)) %>% separate_rows(Type)
x <- x %>% group_by(Type, EffectOnPitch) %>% summarize(Count = n())
## `summarise()` has grouped output by 'Type'. You can override using the
## `.groups` argument.
x <- x %>% mutate(Freq = Count / sum(x$Count))
x <- x %>% arrange(desc(Count))
x <- x %>% filter(Count > 1) %>% filter(Type != "other")
x %>% kable()

| Type     | EffectOnPitch | Count |      Freq |
|:---------|:--------------|------:|----------:|
| onset    | elevating     |    54 | 0.2117647 |
| onset    | lowering      |    54 | 0.2117647 |
| coda     | falling       |    26 | 0.1019608 |
| nucleus  | elevating     |    12 | 0.0470588 |
| nucleus  | lowering      |    10 | 0.0392157 |
| onset    | mid           |    10 | 0.0392157 |
| coda     | elevating     |     9 | 0.0352941 |
| coda     | lowering      |     9 | 0.0352941 |
| coda     | rising        |     8 | 0.0313725 |
| coda     | level         |     5 | 0.0196078 |
| count    | falling       |     5 | 0.0196078 |
| onset    | rising        |     5 | 0.0196078 |
| syllable | falling       |     5 | 0.0196078 |
| count    | elevating     |     4 | 0.0156863 |
| onset    | falling       |     4 | 0.0156863 |
| syllable | elevating     |     4 | 0.0156863 |
| count    | lowering      |     3 | 0.0117647 |
| syllable | lowering      |     3 | 0.0117647 |
| stress   | elevating     |     2 | 0.0078431 |
| stress   | lowering      |     2 | 0.0078431 |
| stress   | rising        |     2 | 0.0078431 |

ggplot(data = x,
       aes(axis1 = EffectOnPitch, axis2 = Type, y = Count)) +
  geom_alluvium(aes(fill = Type)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

ggplot(data = x,
       aes(axis1 = Type, axis2 = EffectOnPitch, y = Count)) +
  geom_alluvium(aes(fill = Type)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Survey", "Response"),
                   expand = c(0.15, 0.05)) +
  theme_void()

x <- tonodb %>% select(Type, Height) %>% filter(!is.na(Height)) %>% separate_rows(Type)
x <- x %>% group_by(Type, Height) %>% summarize(Count = n())
## `summarise()` has grouped output by 'Type'. You can override using the
## `.groups` argument.
x <- x %>% mutate(Freq = Count / sum(x$Count))
x <- x %>% arrange(desc(Count))
ggplot(x, aes(x=Height, y=Type, fill = Freq)) + 
  geom_tile() +
  theme_bw() +
  # "wordtype" does not occur in Type; separate_rows() yields "syllable" and "count"
  scale_y_discrete(limits = c("other", "syllable", "count", "stress", "nucleus", "coda", "onset"))

ggplot(x, aes(x=Type, y=Height, fill = Freq)) + 
  geom_tile() +
  theme_bw() +
  # "wordtype" does not occur in Type; separate_rows() yields "syllable" and "count"
  scale_x_discrete(limits = c("onset", "coda", "nucleus", "stress", "count", "syllable", "other")) +
  scale_y_discrete(limits = c("mid", "low", "high"))

Patterns in level vs contour height

Onset tonogenesis more commonly has an elevating or lowering effect (a change in tone height), whereas coda tonogenesis more commonly has a rising or falling effect (a contour).

type_height <- tonodb %>% select(Type, Height) %>% separate_rows(Type)
type_countour <- tonodb %>% select(Type, Contour) %>% separate_rows(Type)
table(type_height)
##           Height
## Type       high low mid
##   coda       14  10   0
##   count       5   3   0
##   nucleus    14   7   1
##   onset      27  32  17
##   other       3   1   0
##   stress      3   2   1
##   syllable    5   3   0
table(type_countour)
##           Contour
## Type       falling level rising rising-falling
##   coda          26     5      7              0
##   count          5     0      2              1
##   nucleus        1     0      1              1
##   onset          9     9     11              0
##   other          1     0      0              0
##   stress         0     0      3              0
##   syllable       5     0      2              1
th <- data.frame(unclass(table(type_height$Type, type_height$Height)))
tc <- data.frame(unclass(table(type_countour$Type, type_countour$Contour)))

th <- tibble::rownames_to_column(th, "Type")
tc <- tibble::rownames_to_column(tc, "Type")

tmp <- left_join(th, tc)
## Joining with `by = join_by(Type)`
tmp <- tmp %>% arrange(desc(high))
tmp %>% kable()

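| Type     | high | low | mid | falling | level | rising | rising.falling |
|:---------|-----:|----:|----:|--------:|------:|-------:|---------------:|
| onset    |   27 |  32 |  17 |       9 |     9 |     11 |              0 |
| coda     |   14 |  10 |   0 |      26 |     5 |      7 |              0 |
| nucleus  |   14 |   7 |   1 |       1 |     0 |      1 |              1 |
| count    |    5 |   3 |   0 |       5 |     0 |      2 |              1 |
| syllable |    5 |   3 |   0 |       5 |     0 |      2 |              1 |
| other    |    3 |   1 |   0 |       1 |     0 |      0 |              0 |
| stress   |    3 |   2 |   1 |       0 |     0 |      3 |              0 |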

# print(xtable(tmp, type = "latex", caption=""), include.rownames=FALSE)

tmp <- tmp %>% rowwise() %>% mutate(height = sum(c(high, low, mid)))
tmp <- tmp %>% rowwise() %>% mutate(contour = sum(c(falling, level, rising, rising.falling)))
tmp
## # A tibble: 7 × 10
## # Rowwise: 
##   Type      high   low   mid falling level rising rising.falling height contour
##   <chr>    <int> <int> <int>   <int> <int>  <int>          <int>  <int>   <int>
## 1 onset       27    32    17       9     9     11              0     76      29
## 2 coda        14    10     0      26     5      7              0     24      38
## 3 nucleus     14     7     1       1     0      1              1     22       3
## 4 count        5     3     0       5     0      2              1      8       8
## 5 syllable     5     3     0       5     0      2              1      8       8
## 6 other        3     1     0       1     0      0              0      4       1
## 7 stress       3     2     1       0     0      3              0      6       3
t <- tmp %>% select(Type, height, contour)
t %>% kable()

| Type     | height | contour |
|:---------|-------:|--------:|
| onset    |     76 |      29 |
| coda     |     24 |      38 |
| nucleus  |     22 |       3 |
| count    |      8 |       8 |
| syllable |      8 |       8 |
| other    |      4 |       1 |
| stress   |      6 |       3 |

print(xtable(t, type = "latex", caption=""), include.rownames=FALSE)
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrr}
##   \hline
## Type & height & contour \\ 
##   \hline
## onset &  76 &  29 \\ 
##   coda &  24 &  38 \\ 
##   nucleus &  22 &   3 \\ 
##   count &   8 &   8 \\ 
##   syllable &   8 &   8 \\ 
##   other &   4 &   1 \\ 
##   stress &   6 &   3 \\ 
##    \hline
## \end{tabular}
## \caption{} 
## \end{table}
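To make the height versus contour comparison easier to read, the counts in the table above can be turned into shares. The sketch below copies those counts by hand into a data frame (`hc` and `height_share` are names introduced here for illustration) and computes, for each type, the proportion of coded outcomes that are height effects. It assumes the `height` and `contour` columns can be treated as two tallies for the same type, which is a simplification, since the two were counted from separate columns of the database.

```r
# Counts copied from the height/contour summary table in this document.
hc <- data.frame(
  Type    = c("onset", "coda", "nucleus", "count", "syllable", "other", "stress"),
  height  = c(76, 24, 22, 8, 8, 4, 6),
  contour = c(29, 38, 3, 8, 8, 1, 3)
)

# Share of each type's coded outcomes that are height (not contour) effects.
hc$height_share <- round(hc$height / (hc$height + hc$contour), 3)
hc
```

With these counts the share is about 0.72 for onset and about 0.39 for coda, in line with the statement that onset tonogenesis tends towards height effects and coda tonogenesis towards contours.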

New tables for revise and resubmit

Strict vs broad.

# table(tonodb$Ordering, exclude = FALSE)
table(tonodb$Ordering)
## 
##           Broad   Broad - split Possibly Strict          Strict         Unclear 
##              20             116              43              17              62
t <- data.frame(table(tonodb$Ordering))
t <- t %>% rename(Ordering = Var1, Count = Freq)
t
##          Ordering Count
## 1           Broad    20
## 2   Broad - split   116
## 3 Possibly Strict    43
## 4          Strict    17
## 5         Unclear    62
print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis"), include.rownames=FALSE)
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lr}
##   \hline
## Ordering & Count \\ 
##   \hline
## Broad &  20 \\ 
##   Broad - split & 116 \\ 
##   Possibly Strict &  43 \\ 
##   Strict &  17 \\ 
##   Unclear &  62 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis} 
## \end{table}

Plus something like this, where the numbers outside the parentheses represent cases and the numbers in parentheses represent languages.

tonodb %>% select(Type, Ordering)
## # A tibble: 259 × 2
##    Type                 Ordering       
##    <chr>                <chr>          
##  1 onset                Broad - split  
##  2 onset                Broad - split  
##  3 coda                 Possibly Strict
##  4 coda                 Unclear        
##  5 coda                 Strict         
##  6 coda, onset          Possibly Strict
##  7 coda                 Broad          
##  8 coda, syllable-count Unclear        
##  9 coda                 Possibly Strict
## 10 coda, syllable-count Unclear        
## # ℹ 249 more rows
table(tonodb$Type, tonodb$Ordering)
##                          
##                           Broad Broad - split Possibly Strict Strict Unclear
##   coda                        6             7              10     10      24
##   coda, nucleus               0             0               4      0       2
##   coda, onset                 0             0               1      0       1
##   coda, syllable-count        0             1               0      0       3
##   nucleus                     3             8               2      0       5
##   nucleus, coda               0             0               1      0       0
##   nucleus, onset              1             0               0      0       0
##   onset                       4            98              14      5       7
##   onset, other                0             0               2      0       0
##   other                       0             0               1      0       3
##   stress                      2             0               4      0       6
##   syllable-count              4             2               3      2      11
##   syllable-count, nucleus     0             0               1      0       0
t <- data.frame(unclass(table(tonodb$Type, tonodb$Ordering))) %>% rownames_to_column()
t <- t %>% rename(Type = rowname)
t
##                       Type Broad Broad...split Possibly.Strict Strict Unclear
## 1                     coda     6             7              10     10      24
## 2            coda, nucleus     0             0               4      0       2
## 3              coda, onset     0             0               1      0       1
## 4     coda, syllable-count     0             1               0      0       3
## 5                  nucleus     3             8               2      0       5
## 6            nucleus, coda     0             0               1      0       0
## 7           nucleus, onset     1             0               0      0       0
## 8                    onset     4            98              14      5       7
## 9             onset, other     0             0               2      0       0
## 10                   other     0             0               1      0       3
## 11                  stress     2             0               4      0       6
## 12          syllable-count     4             2               3      2      11
## 13 syllable-count, nucleus     0             0               1      0       0
print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis by class"), include.rownames=FALSE)
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrr}
##   \hline
## Type & Broad & Broad...split & Possibly.Strict & Strict & Unclear \\ 
##   \hline
## coda &   6 &   7 &  10 &  10 &  24 \\ 
##   coda, nucleus &   0 &   0 &   4 &   0 &   2 \\ 
##   coda, onset &   0 &   0 &   1 &   0 &   1 \\ 
##   coda, syllable-count &   0 &   1 &   0 &   0 &   3 \\ 
##   nucleus &   3 &   8 &   2 &   0 &   5 \\ 
##   nucleus, coda &   0 &   0 &   1 &   0 &   0 \\ 
##   nucleus, onset &   1 &   0 &   0 &   0 &   0 \\ 
##   onset &   4 &  98 &  14 &   5 &   7 \\ 
##   onset, other &   0 &   0 &   2 &   0 &   0 \\ 
##   other &   0 &   0 &   1 &   0 &   3 \\ 
##   stress &   2 &   0 &   4 &   0 &   6 \\ 
##   syllable-count &   4 &   2 &   3 &   2 &  11 \\ 
##   syllable-count, nucleus &   0 &   0 &   1 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis by class} 
## \end{table}
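The column headers `Broad...split` and `Possibly.Strict` in the output above arise because `data.frame()` passes column names through `make.names()` by default. If the original labels are wanted in the LaTeX table, one option is `check.names = FALSE`. The sketch below shows the difference on a made-up cross-tabulation (`toy`, `mangled` and `kept` are illustrative names).

```r
# Made-up Type x Ordering data; the labels mirror those used in this document.
toy <- data.frame(
  Type     = c("onset", "onset", "coda", "coda"),
  Ordering = c("Broad - split", "Possibly Strict", "Strict", "Broad - split")
)
tab <- table(toy$Type, toy$Ordering)

# Default behaviour: spaces and hyphens in column names are replaced by dots.
mangled <- data.frame(unclass(tab))

# check.names = FALSE keeps the labels exactly as they appear in the data.
kept <- data.frame(unclass(tab), check.names = FALSE)

names(mangled)
names(kept)
```

The same argument would also keep `rising-falling` from becoming `rising.falling` in the height and contour table.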

The same table counted by language; these are the numbers that go in parentheses.

tmp <- tonodb %>% select(Type, Ordering, Language_ID) %>% distinct()
t <- data.frame(unclass(table(tmp$Type, tmp$Ordering))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t
##                       Type Broad Broad...split Possibly.Strict Strict Unclear
## 1                     coda     6             3               7      3      19
## 2            coda, nucleus     0             0               1      0       1
## 3              coda, onset     0             0               1      0       1
## 4     coda, syllable-count     0             1               0      0       3
## 5                  nucleus     2             4               1      0       4
## 6            nucleus, coda     0             0               1      0       0
## 7           nucleus, onset     1             0               0      0       0
## 8                    onset     3            17               7      3       5
## 9             onset, other     0             0               1      0       0
## 10                   other     0             0               1      0       3
## 11                  stress     1             0               2      0       5
## 12          syllable-count     4             1               3      1       7
## 13 syllable-count, nucleus     0             0               1      0       0
print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis by class by language"), include.rownames=FALSE)
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrr}
##   \hline
## Type & Broad & Broad...split & Possibly.Strict & Strict & Unclear \\ 
##   \hline
## coda &   6 &   3 &   7 &   3 &  19 \\ 
##   coda, nucleus &   0 &   0 &   1 &   0 &   1 \\ 
##   coda, onset &   0 &   0 &   1 &   0 &   1 \\ 
##   coda, syllable-count &   0 &   1 &   0 &   0 &   3 \\ 
##   nucleus &   2 &   4 &   1 &   0 &   4 \\ 
##   nucleus, coda &   0 &   0 &   1 &   0 &   0 \\ 
##   nucleus, onset &   1 &   0 &   0 &   0 &   0 \\ 
##   onset &   3 &  17 &   7 &   3 &   5 \\ 
##   onset, other &   0 &   0 &   1 &   0 &   0 \\ 
##   other &   0 &   0 &   1 &   0 &   3 \\ 
##   stress &   1 &   0 &   2 &   0 &   5 \\ 
##   syllable-count &   4 &   1 &   3 &   1 &   7 \\ 
##   syllable-count, nucleus &   0 &   0 &   1 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis by class by language} 
## \end{table}
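The case counts and the language counts can be merged into a single "cases (languages)" table, mirroring the `both_cases` column built for the area table earlier. The following is a minimal sketch on made-up data; `toy`, `dedup` and `both` are illustrative names, and the `Language_ID` values are invented.

```r
# Made-up stand-in for tonodb with the three columns used above.
toy <- data.frame(
  Type        = c("onset", "onset", "onset", "coda", "coda"),
  Ordering    = c("Strict", "Strict", "Unclear", "Strict", "Unclear"),
  Language_ID = c("lang1", "lang1", "lang2", "lang3", "lang3")
)

# Cases: every row counts.
cases <- table(toy$Type, toy$Ordering)

# Languages: count each Type/Ordering/language combination once.
dedup <- unique(toy)
languages <- table(dedup$Type, dedup$Ordering)

# Both tables share the same dimnames, so cells can be pasted element-wise.
both <- matrix(
  paste0(cases, " (", languages, ")"),
  nrow = nrow(cases),
  dimnames = dimnames(cases)
)
both
```

In the toy data the `onset`/`Strict` cell reads `2 (1)`: two cases from a single language.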

Class by type by area.

tmp <- tonodb %>% select(Type, Ordering, Macroarea)
t <- data.frame(unclass(table(tmp$Type, tmp$Macroarea))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t
##                       Type Africa Eurasia North.America Papunesia South.America
## 1                     coda      0      39            14         0             4
## 2            coda, nucleus      0       0             4         0             2
## 3              coda, onset      0       0             1         1             0
## 4     coda, syllable-count      0       4             0         0             0
## 5                  nucleus      7       5             3         2             1
## 6            nucleus, coda      0       0             0         0             0
## 7           nucleus, onset      0       0             0         1             0
## 8                    onset      7      91             0         7             0
## 9             onset, other      0       2             0         0             0
## 10                   other      1       1             2         0             0
## 11                  stress      0       7             4         1             0
## 12          syllable-count      6       7             4         4             1
## 13 syllable-count, nucleus      0       0             1         0             0
print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis by class by macroarea"), include.rownames=FALSE)
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrr}
##   \hline
## Type & Africa & Eurasia & North.America & Papunesia & South.America \\ 
##   \hline
## coda &   0 &  39 &  14 &   0 &   4 \\ 
##   coda, nucleus &   0 &   0 &   4 &   0 &   2 \\ 
##   coda, onset &   0 &   0 &   1 &   1 &   0 \\ 
##   coda, syllable-count &   0 &   4 &   0 &   0 &   0 \\ 
##   nucleus &   7 &   5 &   3 &   2 &   1 \\ 
##   nucleus, coda &   0 &   0 &   0 &   0 &   0 \\ 
##   nucleus, onset &   0 &   0 &   0 &   1 &   0 \\ 
##   onset &   7 &  91 &   0 &   7 &   0 \\ 
##   onset, other &   0 &   2 &   0 &   0 &   0 \\ 
##   other &   1 &   1 &   2 &   0 &   0 \\ 
##   stress &   0 &   7 &   4 &   1 &   0 \\ 
##   syllable-count &   6 &   7 &   4 &   4 &   1 \\ 
##   syllable-count, nucleus &   0 &   0 &   1 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis by class by macroarea} 
## \end{table}
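The cross-tabulation steps used for these tables (tabulate two columns, convert to a data frame, move the row names into a Type column) repeat throughout this section. They could be wrapped in one function. The sketch below is one way to do that in base R. The name `crosstab_df` and the toy data are hypothetical and do not come from the repository. Unlike `data.frame(unclass(table(...)))`, this version does not rewrite column names such as "North America" to "North.America".

```r
# Hypothetical helper: cross-tabulate two columns of a data frame and return
# the counts as a data frame whose first column holds the row categories.
crosstab_df <- function(df, row_var, col_var, row_label = "Type") {
  counts <- table(df[[row_var]], df[[col_var]])  # NA values are dropped by default
  out <- as.data.frame.matrix(counts)            # one column per level of col_var
  out[[row_label]] <- rownames(out)              # move row names into a column
  rownames(out) <- NULL
  out[, c(row_label, setdiff(names(out), row_label)), drop = FALSE]
}

# Toy data for illustration only (not taken from the database).
toy <- data.frame(
  Type = c("coda", "coda", "onset", "onset", "onset"),
  Macroarea = c("Eurasia", "Africa", "Eurasia", "Eurasia", NA),
  stringsAsFactors = FALSE
)

res <- crosstab_df(toy, "Type", "Macroarea")
print(res)
```

With the real data, a call such as `crosstab_df(tonodb, "Type", "Macroarea")` would play the role of the three assignment lines used for each table, assuming `tonodb` is the joined data frame used in this section.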

Class by area, counting all rows. Unlike Macroarea, which collapses Asia and Europe into Eurasia, the Area column keeps them separate.

tmp <- tonodb %>% select(Type, Area)
t <- data.frame(unclass(table(tmp$Type, tmp$Area))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t
##                       Type Africa Asia Europe North.America Papunesia
## 1                     coda      0   32      7            14         0
## 2            coda, nucleus      0    0      0             4         0
## 3              coda, onset      0    0      0             1         1
## 4     coda, syllable-count      0    3      1             0         0
## 5                  nucleus      7    4      1             3         2
## 6            nucleus, coda      0    0      0             0         0
## 7           nucleus, onset      0    0      0             0         1
## 8                    onset      7  114      0             0         7
## 9             onset, other      0    2      0             0         0
## 10                   other      1    0      1             2         0
## 11                  stress      0    2      5             4         1
## 12          syllable-count      6    0      7             4         4
## 13 syllable-count, nucleus      0    0      0             1         0
##    South.America
## 1              4
## 2              2
## 3              0
## 4              0
## 5              1
## 6              0
## 7              0
## 8              0
## 9              0
## 10             0
## 11             0
## 12             1
## 13             0
print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis per macroarea"), include.rownames=FALSE)
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrrr}
##   \hline
## Type & Africa & Asia & Europe & North.America & Papunesia & South.America \\ 
##   \hline
## coda &   0 &  32 &   7 &  14 &   0 &   4 \\ 
##   coda, nucleus &   0 &   0 &   0 &   4 &   0 &   2 \\ 
##   coda, onset &   0 &   0 &   0 &   1 &   1 &   0 \\ 
##   coda, syllable-count &   0 &   3 &   1 &   0 &   0 &   0 \\ 
##   nucleus &   7 &   4 &   1 &   3 &   2 &   1 \\ 
##   nucleus, coda &   0 &   0 &   0 &   0 &   0 &   0 \\ 
##   nucleus, onset &   0 &   0 &   0 &   0 &   1 &   0 \\ 
##   onset &   7 & 114 &   0 &   0 &   7 &   0 \\ 
##   onset, other &   0 &   2 &   0 &   0 &   0 &   0 \\ 
##   other &   1 &   0 &   1 &   2 &   0 &   0 \\ 
##   stress &   0 &   2 &   5 &   4 &   1 &   0 \\ 
##   syllable-count &   6 &   0 &   7 &   4 &   4 &   1 \\ 
##   syllable-count, nucleus &   0 &   0 &   0 &   1 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis per macroarea} 
## \end{table}
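The Area totals here do not always match the Macroarea totals in the preceding table. For example, the onset row has 114 observations for Asia but only 91 for Eurasia. A plausible explanation, assuming some rows have no Macroarea value (the languages table lists NA macroareas for several entries), is that `table()` silently drops NA values by default. The sketch below uses invented toy data to show the effect and how `useNA = "ifany"` makes the dropped rows visible.

```r
# Toy data for illustration only: two of the three onset rows lack a Macroarea.
toy <- data.frame(
  Type = c("onset", "onset", "onset", "coda"),
  Macroarea = c("Eurasia", NA, NA, "Eurasia"),
  stringsAsFactors = FALSE
)

# Default behaviour: rows with an NA Macroarea are excluded from the counts.
default_counts <- table(toy$Type, toy$Macroarea)

# useNA = "ifany" adds an explicit NA column when missing values occur.
with_na <- table(toy$Type, toy$Macroarea, useNA = "ifany")

print(default_counts)
print(with_na)
```

Running the same comparison on the real data would show whether missing Macroarea values account for the difference.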

Class by area, counting each language variety once per class and area. As above, Area keeps Asia and Europe separate, whereas Macroarea collapses them into Eurasia.

tmp <- tonodb %>% select(Type, LanguageVariety, Area) %>% distinct()
t <- data.frame(unclass(table(tmp$Type, tmp$Area))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t
##                       Type Africa Asia Europe North.America Papunesia
## 1                     coda      0   15      5            13         0
## 2            coda, nucleus      0    0      0             2         0
## 3              coda, onset      0    0      0             1         1
## 4     coda, syllable-count      0    3      1             0         0
## 5                  nucleus      4    2      1             2         1
## 6            nucleus, coda      0    0      0             0         0
## 7           nucleus, onset      0    0      0             0         1
## 8                    onset      5   28      0             0         4
## 9             onset, other      0    1      0             0         0
## 10                   other      1    0      1             2         0
## 11                  stress      0    1      4             2         1
## 12          syllable-count      4    0      5             3         3
## 13 syllable-count, nucleus      0    0      0             1         0
##    South.America
## 1              4
## 2              1
## 3              0
## 4              0
## 5              1
## 6              0
## 7              0
## 8              0
## 9              0
## 10             0
## 11             0
## 12             1
## 13             0
print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis by language by macroarea"), include.rownames=FALSE)
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrrr}
##   \hline
## Type & Africa & Asia & Europe & North.America & Papunesia & South.America \\ 
##   \hline
## coda &   0 &  15 &   5 &  13 &   0 &   4 \\ 
##   coda, nucleus &   0 &   0 &   0 &   2 &   0 &   1 \\ 
##   coda, onset &   0 &   0 &   0 &   1 &   1 &   0 \\ 
##   coda, syllable-count &   0 &   3 &   1 &   0 &   0 &   0 \\ 
##   nucleus &   4 &   2 &   1 &   2 &   1 &   1 \\ 
##   nucleus, coda &   0 &   0 &   0 &   0 &   0 &   0 \\ 
##   nucleus, onset &   0 &   0 &   0 &   0 &   1 &   0 \\ 
##   onset &   5 &  28 &   0 &   0 &   4 &   0 \\ 
##   onset, other &   0 &   1 &   0 &   0 &   0 &   0 \\ 
##   other &   1 &   0 &   1 &   2 &   0 &   0 \\ 
##   stress &   0 &   1 &   4 &   2 &   1 &   0 \\ 
##   syllable-count &   4 &   0 &   5 &   3 &   3 &   1 \\ 
##   syllable-count, nucleus &   0 &   0 &   0 &   1 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis by language by macroarea} 
## \end{table}

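Strict vs broad cases of tonogenesis by class by macroarea. Note that the selection below keeps only Type, Ordering and Macroarea, so after `distinct()` each cell counts the distinct Ordering values attested for that class and macroarea rather than the number of languages.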

tmp <- tonodb %>% select(Type, Ordering, Macroarea) %>% distinct()
t <- data.frame(unclass(table(tmp$Type, tmp$Macroarea))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t
##                       Type Africa Eurasia North.America Papunesia South.America
## 1                     coda      0       5             2         0             1
## 2            coda, nucleus      0       0             1         0             1
## 3              coda, onset      0       0             1         1             0
## 4     coda, syllable-count      0       2             0         0             0
## 5                  nucleus      2       2             2         1             1
## 6            nucleus, coda      0       0             0         0             0
## 7           nucleus, onset      0       0             0         1             0
## 8                    onset      2       5             0         2             0
## 9             onset, other      0       1             0         0             0
## 10                   other      1       1             2         0             0
## 11                  stress      0       3             2         1             0
## 12          syllable-count      3       3             3         2             1
## 13 syllable-count, nucleus      0       0             1         0             0
print(xtable(t, type = "latex", caption="Strict vs broad cases of tonogenesis by class by language by macroarea"), include.rownames=FALSE)
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrr}
##   \hline
## Type & Africa & Eurasia & North.America & Papunesia & South.America \\ 
##   \hline
## coda &   0 &   5 &   2 &   0 &   1 \\ 
##   coda, nucleus &   0 &   0 &   1 &   0 &   1 \\ 
##   coda, onset &   0 &   0 &   1 &   1 &   0 \\ 
##   coda, syllable-count &   0 &   2 &   0 &   0 &   0 \\ 
##   nucleus &   2 &   2 &   2 &   1 &   1 \\ 
##   nucleus, coda &   0 &   0 &   0 &   0 &   0 \\ 
##   nucleus, onset &   0 &   0 &   0 &   1 &   0 \\ 
##   onset &   2 &   5 &   0 &   2 &   0 \\ 
##   onset, other &   0 &   1 &   0 &   0 &   0 \\ 
##   other &   1 &   1 &   2 &   0 &   0 \\ 
##   stress &   0 &   3 &   2 &   1 &   0 \\ 
##   syllable-count &   3 &   3 &   3 &   2 &   1 \\ 
##   syllable-count, nucleus &   0 &   0 &   1 &   0 &   0 \\ 
##    \hline
## \end{tabular}
## \caption{Strict vs broad cases of tonogenesis by class by language by macroarea} 
## \end{table}
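Which columns are kept before `distinct()` determines what each cell counts. The selection above keeps Type, Ordering and Macroarea, so duplicates are removed over those three columns. If the aim were to count languages, a language column such as LanguageVariety would have to be kept instead. That is an assumption about intent, not the authors' stated method. The sketch below contrasts the two on invented toy data, using base R `unique()` on the selected columns in place of `select() %>% distinct()`.

```r
# Toy data for illustration only: three language varieties, two Ordering values.
toy <- data.frame(
  LanguageVariety = c("A", "A", "B", "C"),
  Type = c("onset", "onset", "onset", "onset"),
  Ordering = c("Broad", "Broad", "Broad", "Strict"),
  Macroarea = c("Eurasia", "Eurasia", "Eurasia", "Eurasia"),
  stringsAsFactors = FALSE
)

# Deduplicating on Type, Ordering and Macroarea counts distinct Ordering values.
by_ordering <- unique(toy[, c("Type", "Ordering", "Macroarea")])
ordering_counts <- table(by_ordering$Type, by_ordering$Macroarea)

# Deduplicating on Type, LanguageVariety and Macroarea counts distinct varieties.
by_language <- unique(toy[, c("Type", "LanguageVariety", "Macroarea")])
language_counts <- table(by_language$Type, by_language$Macroarea)

print(ordering_counts)
print(language_counts)
```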

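Tonogenesis type (the Ordering column: Broad, Broad - split, Possibly Strict, Strict, Unclear) by area, counting all rows.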

tmp <- tonodb %>% select(Ordering, Area)
t <- data.frame(unclass(table(tmp$Ordering, tmp$Area))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t
##              Type Africa Asia Europe North.America Papunesia South.America
## 1           Broad      5    4      4             0         3             4
## 2   Broad - split     13  102      1             0         0             0
## 3 Possibly Strict      0   16      1            18         7             0
## 4          Strict      0   15      0             2         0             0
## 5         Unclear      3   20     16            13         6             4
print(xtable(t, type = "latex", caption="Type by rows by area"), include.rownames=FALSE)
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrrr}
##   \hline
## Type & Africa & Asia & Europe & North.America & Papunesia & South.America \\ 
##   \hline
## Broad &   5 &   4 &   4 &   0 &   3 &   4 \\ 
##   Broad - split &  13 & 102 &   1 &   0 &   0 &   0 \\ 
##   Possibly Strict &   0 &  16 &   1 &  18 &   7 &   0 \\ 
##   Strict &   0 &  15 &   0 &   2 &   0 &   0 \\ 
##   Unclear &   3 &  20 &  16 &  13 &   6 &   4 \\ 
##    \hline
## \end{tabular}
## \caption{Type by rows by area} 
## \end{table}

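Tonogenesis type (the Ordering column) by macroarea, counting each distinct language (Language_ID) once per type and macroarea.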

tmp <- tonodb %>% select(Language_ID, Ordering, Macroarea) %>% distinct()
t <- data.frame(unclass(table(tmp$Ordering, tmp$Macroarea))) %>% rownames_to_column() 
t <- t %>% rename(Type = rowname)
t
##              Type Africa Eurasia North.America Papunesia South.America
## 1           Broad      5       6             0         2             4
## 2   Broad - split      6      17             0         0             0
## 3 Possibly Strict      0       8            11         4             0
## 4          Strict      0       6             1         0             0
## 5         Unclear      2      19             9         4             2
print(xtable(t, type = "latex", caption="Type by distinct languages"), include.rownames=FALSE)
## % latex table generated in R 4.3.2 by xtable 1.8-4 package
## % Sat Apr 26 08:51:45 2025
## \begin{table}[ht]
## \centering
## \begin{tabular}{lrrrrr}
##   \hline
## Type & Africa & Eurasia & North.America & Papunesia & South.America \\ 
##   \hline
## Broad &   5 &   6 &   0 &   2 &   4 \\ 
##   Broad - split &   6 &  17 &   0 &   0 &   0 \\ 
##   Possibly Strict &   0 &   8 &  11 &   4 &   0 \\ 
##   Strict &   0 &   6 &   1 &   0 &   0 \\ 
##   Unclear &   2 &  19 &   9 &   4 &   2 \\ 
##    \hline
## \end{tabular}
## \caption{Type by distinct languages} 
## \end{table}
