The Language Processing Group is a research group in the Department of Language Science at the University of California, Irvine.

We study language processing in humans and machines. Our goal is to understand how human language works using computational modeling, machine learning, large datasets, behavioral experiments, and information theory.

People

Principal Investigator

Richard Futrell

Assistant Professor of Language Science, UC Irvine

Language, Information theory, Machine learning

Lab Members

Charles Torres

PhD Student, UC Irvine

cultural evolution, semantic representation, cognitive modeling

Huteng Dai

PhD Student, Rutgers University

Theoretical phonology, Computational and experimental methods, Formal Language Theory

Jiaxuan Li

PhD Student, UC Irvine

Sentence Processing, Semantic Representation, ERPs, Cognitive Modeling

Michael Hahn

PhD Student, Stanford University

Human language processing, Natural language processing, How cognition shapes language

Niels Dickson

PhD Student, UC Irvine

language acquisition, cognitive constraints and efficient communication, computational models

Shiva Upadhye

PhD Student, UC Irvine

Language Production and Comprehension, Semantic Representation, Cognitive Modeling

Weijie Xu

PhD Student, UC Irvine

Language Processing, Language Change, Crosslinguistic variation

Zeinab Kachakeche

PhD Student, UC Irvine

language efficiency, adjective order

Collaborating Students

Ethan Wilcox

PhD Student, Harvard University

Natural Language Processing, Linguistic Theory, Psycholinguistics

Himanshu Yadav

PhD Student, University of Potsdam

Quantitative structural complexity of languages, Individual difference in sentence processing

Neil Rathi

Senior, Palo Alto High School

linguistic typology, computational psycholinguistics, cognitive pressures on natural language

Peng Qian

PhD Student, MIT

Cognitive basis of language, Relevance of linguistic knowledge in learning, reasoning, and judgment

Sihan Chen

PhD Student, MIT

sentence processing, communicative efficiency of language, environmental pressure on language

Alums

Zachary Rosen

PhD Student, UC Irvine

metaphor comprehension, distributed semantics models

Selected Publications

Neil Rathi, Michael Hahn, Richard Futrell

November 2021, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

An Information-Theoretic Characterization of Morphological Fusion

Linguistic typology generally divides synthetic languages into groups based on their morphological fusion. However, this measure has long been thought to be best considered a matter of degree. We present an information-theoretic measure, called informational fusion, to quantify the degree of fusion of a given set of morphological features in a surface form, which naturally provides such a graded scale. Informational fusion is able to encapsulate not only concatenative, but also nonconcatenative morphological systems (e.g. Arabic), abstracting away from any notions of morpheme segmentation. We then show, on a sample of twenty-one languages, that our measure recapitulates the usual linguistic classifications for concatenative systems, and provides new measures for nonconcatenative ones. We also evaluate the long-standing hypotheses that more frequent forms are more fusional, and that paradigm size anticorrelates with degree of fusion. We do not find evidence for the idea that languages have characteristic levels of fusion; rather, the degree of fusion varies across part-of-speech within languages.

Michael Hahn, Judith Degen, Richard Futrell

January 2021, Psychological Review

Modeling word and morpheme order in natural language as an efficient tradeoff of memory and surprisal

Memory limitations are known to constrain language comprehension and production, and have been argued to account for crosslinguistic word order regularities. However, a systematic assessment of the role of memory limitations in language structure has proven elusive, in part because it is hard to extract precise large-scale quantitative generalizations about language from existing mechanistic models of memory use in sentence processing. We provide an architecture-independent information-theoretic formalization of memory limitations which enables a simple calculation of the memory efficiency of languages. Our notion of memory efficiency is based on the idea of a memory–surprisal tradeoff: a certain level of average surprisal per word can only be achieved at the cost of storing some amount of information about past context. Based on this notion of memory usage, we advance the Efficient Tradeoff Hypothesis: the order of elements in natural language is under pressure to enable favorable memory-surprisal tradeoffs. We derive that languages enable more efficient tradeoffs when they exhibit information locality: when predictive information about an element is concentrated in its recent past. We provide empirical evidence from three test domains in support of the Efficient Tradeoff Hypothesis: a reanalysis of a miniature artificial language learning experiment, a large-scale study of word order in corpora of 54 languages, and an analysis of morpheme order in two agglutinative languages. These results suggest that principles of order in natural language can be explained via highly generic cognitively motivated principles and lend support to efficiency-based models of the structure of human language.
