Researchers from the University of California, Berkeley, and the
University of British Columbia have created a computer program that can
rapidly reconstruct “proto-languages” — the linguistic ancestors from
which families of modern languages evolved.
These earliest-known languages include Proto-Indo-European,
Proto-Afroasiatic and, in this case, Proto-Austronesian, which gave rise
to languages spoken in Southeast Asia, parts of continental Asia,
Australasia and the Pacific. The researchers plan to use the same
computational model to reconstruct indigenous North American
proto-languages.
Ancient languages
hold a treasure trove of information about the culture, politics and
commerce of millennia past. Yet, reconstructing them to reveal clues
into human history can require decades of painstaking work. Humans’
earliest written records date back less than 6,000 years, long after the
advent of many proto-languages.
While archeologists can catch direct glimpses of ancient languages in
written form, linguists typically use what is known as the “comparative
method” to probe the past. This method establishes relationships
between languages by identifying sounds that change with regularity
over time, revealing whether the languages descend from a common
ancestral language.
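The comparative method's key move, spotting sound correspondences that recur across many word pairs, can be sketched in a few lines. The word pairs and the naive position-by-position alignment below are invented illustrations, not real linguistic data:

```python
from collections import Counter

# Toy cognate pairs from two hypothetical related languages (invented
# examples). Each word is treated as a sequence of sounds.
cognates = [
    ("pater",  "fater"),
    ("pes",    "fes"),
    ("piscis", "fiskis"),
]

# Count position-by-position sound correspondences (naive alignment:
# compare positions up to the shorter word's length).
corr = Counter()
for a, b in cognates:
    for x, y in zip(a, b):
        if x != y:
            corr[(x, y)] += 1

# A correspondence that recurs across several cognates (here p : f) is
# evidence of a regular sound change, not chance resemblance.
regular = [pair for pair, n in corr.items() if n >= 2]
print(regular)  # → [('p', 'f')]
```

Real comparative work requires proper sequence alignment and much larger word lists, but the principle of counting recurring correspondences is the same.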
Using an algorithm known as a Markov chain Monte Carlo
sampler, the program sorted through sets of cognates, words in
different languages that share a common sound, history and origin, to
calculate the probability that each set derives from a particular
proto-language form. At
each step, it stored a hypothesized reconstruction for each cognate and
each ancestral language. The algorithm for ancestral word form
reconstruction is based on a fundamental Bayesian decision theoretic
concept called Bayes estimators.
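As a rough illustration of these two ingredients, assuming a drastically simplified setting, the sketch below runs a Metropolis-type MCMC sampler over candidate ancestral strings and then applies a Bayes estimator that picks the sampled form closest on average to the rest of the posterior sample. The similarity-based likelihood and the toy cognate set are invented stand-ins, not the paper's actual sound-change model:

```python
import math
import random
from collections import Counter
from difflib import SequenceMatcher

random.seed(0)

# Toy cognate set (invented, not the paper's data)
cognates = ["fire", "feuer", "fyr"]
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def log_like(form):
    # Placeholder log-likelihood: ancestral forms similar to all observed
    # cognates score higher. The factor 5 just sharpens the toy posterior.
    return 5 * sum(SequenceMatcher(None, form, w).ratio() for w in cognates)

def propose(form):
    # Randomly substitute, insert, or delete one character
    i = random.randrange(len(form) + 1)
    move = random.choice(["sub", "ins", "del"])
    if move == "ins":
        return form[:i] + random.choice(ALPHABET) + form[i:]
    i = min(i, len(form) - 1)
    if move == "del" and len(form) > 1:
        return form[:i] + form[i + 1:]
    return form[:i] + random.choice(ALPHABET) + form[i + 1:]

# Metropolis sampler: walk through the space of candidate ancestral forms,
# storing a hypothesized reconstruction at each step
current, samples = "fire", []
for _ in range(2000):
    cand = propose(current)
    delta = log_like(cand) - log_like(current)
    if random.random() < math.exp(min(0.0, delta)):
        current = cand
    samples.append(current)

# Bayes estimator: among post-burn-in samples, pick the form with the
# largest average similarity (smallest expected distance) to the posterior
counts = Counter(samples[500:])
total = sum(counts.values())
best = max(counts, key=lambda f: sum(
    n * SequenceMatcher(None, f, g).ratio() for g, n in counts.items()))
print(best)
```

Choosing the form that minimizes expected distance to the posterior, rather than simply the single highest-scoring sample, is what makes this a Bayes estimator in the decision-theoretic sense.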
“What excites me about this system is that it takes so many of the
great ideas that linguists have had about historical reconstruction, and
it automates them at a new scale: more data, more words, more
languages, but less time,” said Dan Klein, an associate professor of
computer science at UC Berkeley and co-author of the paper published
online in the journal Proceedings of the National Academy of Sciences.
The research team’s computational model uses probabilistic reasoning —
which combines logic and statistics to weigh competing hypotheses — to
reconstruct ancestral word forms across more than 600 Austronesian
languages from an existing database of more than 140,000 words,
replicating with 85 percent accuracy what linguists had done manually.
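Agreement between automatic and manual reconstructions is naturally scored with edit distance; a minimal sketch of such a comparison, using invented word pairs rather than the study's data:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# Hypothetical (automatic, manual) reconstruction pairs (invented examples)
pairs = [("bitu", "bituk"), ("lima", "lima"), ("anak", "anak")]

# Score each pair as 1 minus normalized edit distance, then average
scores = [1 - levenshtein(x, y) / max(len(x), len(y)) for x, y in pairs]
accuracy = sum(scores) / len(scores)
print(round(accuracy, 3))  # → 0.933
```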
While manual reconstruction is a meticulous process that can take
years, this system can perform a large-scale reconstruction in a matter
of days or even hours, researchers said. Not only will this program
speed up the ability of linguists to rebuild the world’s proto-languages
on a large scale, boosting our understanding of ancient civilizations
based on their vocabularies, but it can also provide clues to how
languages might change years from now.
“Our statistical model can be used to answer scientific questions
about languages over time, not only to make inferences about the past,
but also to extrapolate how language might change in the future,” said
Tom Griffiths, associate professor of psychology, director of UC
Berkeley’s Computational Cognitive Science Lab and another co-author of
the paper.
“To understand how language changes — which sounds are more likely to
change and what they will become — requires reconstructing and
analyzing massive amounts of ancestral word forms, which is where
automatic reconstructions play an important role,” said Alexandre
Bouchard-Côté, an assistant professor of statistics at the University of
British Columbia and lead author of the study, which he started while a
graduate student at UC Berkeley.
The UC Berkeley computational model is based on the established
linguistic theory that words evolve along the branches of a family tree —
much like a genealogical tree — reflecting linguistic relationships that
evolve over time, with the root and internal nodes representing
proto-languages and the leaves representing modern languages.
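The tree picture can be written down directly, with internal nodes for proto-languages and leaves for attested modern languages. The fragment below is a drastically simplified slice of the Austronesian family, kept tiny for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class LangNode:
    name: str                       # proto-language (internal) or modern language (leaf)
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children

    def leaves(self):
        # Modern languages sit at the leaves of the family tree
        if self.is_leaf():
            return [self.name]
        return [name for child in self.children for name in child.leaves()]

# A drastically simplified slice of the Austronesian family tree
root = LangNode("Proto-Austronesian", [
    LangNode("Proto-Malayo-Polynesian", [
        LangNode("Malay"),
        LangNode("Hawaiian"),
    ]),
    LangNode("Amis"),  # a Formosan language, shown as a direct branch
])

print(root.leaves())  # → ['Malay', 'Hawaiian', 'Amis']
```

In the full model, each edge of such a tree carries its own sound-change parameters, and word forms are inferred jointly at every internal node.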
“Because the sound changes and reconstructions are closely linked,
our system uses them to repeatedly improve each other,” Klein said. “It
first fixes its predicted sound changes and deduces better
reconstructions of the ancient forms. It then fixes the reconstructions
and re-analyzes the sound changes. These steps are repeated, and both
predictions gradually improve as the underlying structure emerges over
time.”
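The alternation Klein describes can be caricatured as coordinate ascent on a toy model: fix a single sound-change rate and deduce the best reconstructions, then fix the reconstructions and re-estimate the rate. All data and scoring below are invented miniatures, not the paper's actual model:

```python
import math

# Each cognate set: the word as attested in three modern languages (toy data)
sets = [("pater", "fater", "pater"), ("pes", "fes", "pes")]
# Candidate ancestral forms for each cognate set
candidates = [["pater", "fater"], ["pes", "fes"]]

def score(anc, moderns, theta):
    # Log-probability of the modern forms given the ancestor: each sound
    # survives with probability 1 - theta or changes with probability theta
    return sum(math.log(theta if a != b else 1 - theta)
               for mod in moderns for a, b in zip(anc, mod))

theta, recon = 0.5, [c[0] for c in candidates]
for _ in range(5):
    # Step 1: fix the sound-change rate, deduce the best reconstructions
    recon = [max(cands, key=lambda anc, m=mods: score(anc, m, theta))
             for cands, mods in zip(candidates, sets)]
    # Step 2: fix the reconstructions, re-estimate the sound-change rate
    changed = sum(a != b for r, mods in zip(recon, sets)
                  for mod in mods for a, b in zip(r, mod))
    total = sum(len(r) * len(mods) for r, mods in zip(recon, sets))
    theta = min(0.99, max(0.01, changed / total))

print(recon, round(theta, 3))  # → ['pater', 'pes'] 0.083
```

On this toy data the loop settles on the p-initial forms as ancestral together with a low change rate, mirroring how the real system's sound-change predictions and reconstructions sharpen each other over repeated passes.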
A. Bouchard-Côté et al., “Automated reconstruction of ancient
languages using probabilistic models of sound change,” Proceedings of
the National Academy of Sciences, 2013. DOI: 10.1073/pnas.1204678110
Source: The Daily Galaxy via University of California – Berkeley and National Academy of Sciences