the computation and language lab
at the University of California, Berkeley



research

overview of our work and approach


people

information for current and prospective members


papers

our publications are freely available


resources

code, libraries, and data sets



We study the basic computational processes involved in human language and cognition.








overview: colala's research includes computational and experimental approaches to language acquisition, language processing, and understanding the core features of language. Our work draws on corpus methods, behavioral experiments with adults and children, computational and mathematical modeling, as well as state-of-the-art techniques in machine learning, information theory, statistics, computer science, and linguistics.

We collaborate closely with Tedlab at MIT, the CAOS lab at CMU, the Kidd Lab at UC Berkeley, and the Hayden Lab at the University of Minnesota.

We do fieldwork with an indigenous South American population, the Tsimane' of Bolivia, in collaboration with Tedlab and CBIDSI, to study universal aspects of human language, numerical cognition, and cognitive development.



philosophy: Most research in colala centers on creating fully formalized (implemented) theories that aim to address basic questions about human language and cognition with convergent evidence across methods. We work to develop open tools, data, and results, and to build an atmosphere that is inclusive, scientifically rigorous, and productive.



topics and selected papers:


facilities: Our experimental facilities include rooms for testing adults and children. Our computational facilities include dedicated HPC cores for running models and analysis, as well as several in-house web, corpus, GPU, and HPC servers.

• prospective students •


graduate students: Prospective graduate students should apply to the Department of Psychology at UC Berkeley and contact Steve Piantadosi. Applications are due in late November. Graduate students should be hard-working, full of ideas, skilled programmers, and capable of self-directed research. Students interested in the lab should bring strong quantitative skills, including knowledge of statistics and programming, as well as a serious background in some of the following areas: machine learning, mathematics, theoretical computer science, mathematical logic, cognitive development, language acquisition, language processing, or linguistics.

undergraduates: Undergraduates must be independent, creative, self-motivated, and good at programming. Our undergraduate research opportunities include work on current large-scale projects, as well as opportunities for self-directed research. Undergraduates will typically have a background in psychology, math, computer science, or linguistics. Apply here.

• graduate students •


Mark Gorenstein works on concepts, language, and the brain.

Umesh Singla is a first-year graduate student in the lab.

Dyana Muller works on concepts and neural computation.

Karthikeya Kaushik works on language and concepts.

• staff •


Holly Palmeri is the primary lab manager.

Gabriella Morales works on probability and inference with children.

Sarah Sommer works on number and algorithms with children.

• postdoctoral researchers •


Emily Sanford works on numerical cognition.

Vijay Marupudi works on number and algorithms.

• principal investigator •

The lab's PI is Steve Piantadosi. He is interested in understanding the computational mechanisms supporting human language learning and use.

• alums •





Isabelle Boni is a postdoc with ManyNumbers.

Benjamin Pitt is a faculty member at UMass Amherst.

Michelle Hurst is a faculty member at Rutgers.

David O'Shaughnessy works at the Center for Social Impact.

Shengyi Wu is a graduate student at MIT.

Margaret Bryer is faculty at the University of Wisconsin-Madison.

Sam Cheyette is a postdoc at MIT.

Yim Register is a PhD student at UW.

Eric Bigelow is a PhD student at Harvard.

Frank Mollica is a professor at the University of Melbourne.

Tania Cruz is a PhD student at Delaware.

Charlene Gallardo is a graduate student at Cal State LA.

• collaborators •

Researchers in the lab work in close collaboration with a number of labs around the country:





Josh Tenenbaum

Ted Gibson

Jessica Cantlon

Ben Hayden

Celeste Kidd

Dick Aslin

Ev Fedorenko

Dan Everett

Noah Goodman

Richard Futrell

Julian Jara-Ettinger

Julien Musolino

Ricardo Godoy

Kyle Mahowald

Isabelle Dautriche

Stephen Ferrigno


topics: modeling, language, information theory, psycholinguistics, concepts, number, development, neuroscience, cross-cultural

2025
[126] The End of Radical Concept Nativism.
2024
[125] Reliable Reasoning Beyond Natural Language. In arXiv.
[124] Uniquely human intelligence arose from expanded information capacity. In Nature Reviews Psychology.
[123] Limited information-processing capacity in vision explains number psychophysics. In Psychological Review.
[122] Response to difficulty drives variation in IQ test performance. In Open Mind, volume 8.
[121] Language is primarily a tool for communication rather than thought. In Nature.
[120] Continuous and Discrete Proportions Elicit Different Cognitive Strategies. In Cognition.
[119] Formalising the role of behaviour in neuroscience. In European Journal of Neuroscience.
[118] Why concepts are (probably) vectors. In Trends in Cognitive Sciences.
[117] Modern language models refute Chomsky's approach to language. Chapter in From fieldwork to linguistic theory: A tribute to Dan Everett (Empirically Oriented Theoretical Morphology and Syntax 15) (Edward Gibson, Moshe Poliak, eds.), Berlin: Language Science Press.
[116] Symbolic metaprogram search improves learning efficiency and explains rule learning in humans. In Nature Communications.
2023
[115] Language processing and language learning. Chapter in Bayesian Models of Cognition: Reverse Engineering the Mind (Thomas L. Griffiths, Nick Chater, Joshua Tenenbaum, eds.).
[114] Origins of Hierarchical Logical Reasoning. In Cognitive Science, volume 47.
[113] Cognitive Mechanisms Underlying Recursive Pattern Processing in Human Adults. In Cognitive Science, volume 47.
[112] The Plausibility of Sampling as an Algorithmic Theory of Sentence Processing. In Open Mind, volume 7.
[111] Latent diversity in conceptual representation. In Open Mind, volume 7.
[110] Diverse mathematical knowledge among indigenous Amazonians. In Proceedings of the National Academy of Sciences, volume 120.
[109] Trans-inclusive gender categories are cognitively natural. In Nature Human Behaviour, volume 7.
[108] The Algorithmic Origins of Counting. In Child Development, volume 94.
[107] How to enumerate trees from a context-free grammar. In arXiv.
[106] Learning as Bayesian inference over programs. Chapter in Bayesian Models of Cognition: Reverse Engineering the Mind (Thomas L. Griffiths, Nick Chater, Joshua Tenenbaum, eds.).
[105] No clear evidence for a left-to-right mental number line in insects. In Proceedings of the National Academy of Sciences (commentary).
[104] Real-time pragmatic inference across cultures: evidence from a non-industrialized society. In Journal of Experimental Psychology: General, volume 152.
[103] Sampling in Approximate Number Perception. In Proceedings of the Annual Meeting of the Cognitive Science Society.
[102] Children’s Estimation of Peripheral Information Drives Improvements in Approximate Number Sense. In Proceedings of the Annual Meeting of the Cognitive Science Society.
2022
[101] Culture and Commutativity. In Proceedings of the Annual Meeting of the Cognitive Science Society.
[100] Verbal counting and the timing of number acquisition in an indigenous Amazonian group. In PLOS ONE.
[99] Stochastic time-series analyses highlight the day-to-day dynamics of lexical frequencies. In Cognitive Science, volume 46.
[98] Investigating Adults’ Strategy Use During Proportional Comparison. In Proceedings of the Annual Meeting of the Cognitive Science Society.
[97] Reply to Kodner et al: Fundamental misunderstanding of both model and methods. In Proceedings of the National Academy of Sciences (response to commentary).
[96] Meaning without reference in large language models. In arXiv preprint arXiv:2208.02957.
[95] Reply to Murphy et al: Program induction can learn language. In Proceedings of the National Academy of Sciences (response to commentary).
[94] Exact number concepts are limited to the verbal count range. In Psychological Science, volume 33.
[93] Different reference frames on different axes: Space and language in indigenous Amazonians. In Science Advances, volume 8.
[92] Learning as programming: Efficient search in models of human concept learning. In Proceedings of the Cognitive Science Society.
[91] Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. In arXiv preprint arXiv:2206.04615.
[90] One model for the learning of language. In Proceedings of the National Academy of Sciences, volume 119.
2021
[89] The evolution of quantitative sensitivity. In Philosophical Transactions of the Royal Society B, volume 377.
[88] The psychophysics of number arise from resource-limited spatial memory. In Proceedings of the Cognitive Science Society.
[87] The Natural Stories Corpus. In Language Resources and Evaluation, volume 55.
[86] Logical Word Learning: The case of kinship. In Psychonomic Bulletin & Review.
[85] The Cultural Origins of Symbolic Number. In Psychological Review, volume 129.
[84] Probability, Belief, and the Richness of Cognition. Chapter in The Cognitive Science of Belief (Julien Musolino, Joseph Sommer, Pernille Hemmer, eds.), Cambridge University Press.
[83] The computational origin of representation. In Minds and Machines, volume 31.
[82] Spatial concepts of number, size, and time in an indigenous culture. In Science Advances, volume 7.
[81] Variation in spatial concepts: Different frames of reference on different axes. In Proceedings of the Cognitive Science Society.
[80] Uncontrolled corpus composition drives an apparent surge in cognitive distortions (Commentary on Bollen et al.). In Proceedings of the National Academy of Sciences (commentary).
2020
[79] A unified account of numerosity perception. In Nature Human Behaviour, volume 4.
[78] Recursive sequence generation in monkeys, children, US adults, and native Amazonians. In Science Advances, volume 6.
[77] Simple models of sequential processing cannot explain center-embedded generalizations. In Science Advances eLetters.
[76] A model of temporal connective acquisition. In Proceedings of the Cognitive Science Society.
[75] People Infer Recursive Visual Concepts from Just a Few Examples. In Computational Brain & Behavior, volume 3.
[74] Composition is the core driver of the language-selective network. In Neurobiology of Language, volume 1.
[73] Multi-directional mappings in the minds of the Tsimane': Size, time, and number on three spatial axes. In Proceedings of the Cognitive Science Society.
[72] The Child as Hacker. In Trends in Cognitive Sciences, volume 24.
[71] The neural basis of predictive pursuit. In Nature Neuroscience, volume 23.
2019
[70] Intrinsic whole number bias in an indigenous population. In Proceedings of the Cognitive Science Society.
[69] A primarily serial, foveal accumulator underlies approximate numerical estimation. In Proceedings of the National Academy of Sciences, volume 116.
[68] Why we should abandon the Semantic Subset Principle. In Language Learning and Development, volume 15.
[67] How Efficiency Shapes Human Language. In Trends in Cognitive Sciences, volume 23.
[66] One-to-one correspondence without Language. In Royal Society Open Science.
[65] Humans store about 1.5 megabytes of information during language acquisition. In Royal Society Open Science.
2018
[64] A threshold free model of number comparison. In PLOS ONE.
[63] Intrinsic whole number bias in humans. In Journal of Experimental Psychology: Human Perception and Performance, volume 44.
[62] Robust mixture modeling reveals category-free selectivity in reward region neuronal ensembles. In Journal of Neurophysiology, volume 119.
[61] Birth season and height among girls and boys below 12 years of age: Lasting effects and catch-up growth among native Amazonians in Bolivia. In Annals of Human Biology, volume 45.
[60] Word forms are structured for efficient use. In Cognitive Science, volume 42.
[59] Certainty is Primarily Determined by Past Performance during Concept Learning. In Open Mind, volume 2.
[58] Adults use gradient similarity information in compositional rules. In Proceedings of the Cognitive Science Society.
[57] Limits on Composition of Conceptual Operations in 9-Month-Olds. In Infancy, volume 23.
[56] One parameter is always enough. In AIP Advances, volume 8.
[55] Learning list concepts through program induction. In Proceedings of the Cognitive Science Society.
[54] Child stunting is associated with weaker human capital among native Amazonians. In American Journal of Human Biology, volume 30.
2017
[53] Knowledge transfer in a probabilistic Language of Thought. In Proceedings of the Cognitive Science Society.
[52] Wordform similarity increases with semantic similarity: an analysis of 100 languages. In Cognitive Science, volume 41.
[51] Words cluster phonetically beyond phonotactic regularities. In Cognition, volume 163.
[50] Universal and uniquely human factors in spontaneous number perception. In Nature Communications, volume 8.
[49] Color naming across languages reflects color use. In Proceedings of the National Academy of Sciences, volume 114.
[48] How data drives early word learning: A cross-linguistic waiting time analysis. In Open Mind, volume 1.
[47] An incremental information-theoretic buffer supports sentence processing. In Proceedings of the Cognitive Science Society.
[46] A Rational Constructivist Account of the Characteristic-to-Defining Shift. In Proceedings of the Cognitive Science Society.
[45] Learning abstract visual concepts via probabilistic program induction in a Language of Thought. In Cognition, volume 168.
[44] True Numerical Cognition in the Wild. In Psychological Science, volume 28.
2016
[43] Inferring priors in compositional cognitive models. In Proceedings of the Cognitive Science Society.
[42] A large dataset of generalization patterns in the number game. In Journal of Open Psychology Data, volume 4.
[41] A Corpus Investigation of Syntactic Embedding in Pirahã. In PLOS ONE.
[40] Mastery of the logic of natural numbers is not the result of mastery of counting: Evidence from late counters. In Developmental Science, volume 20.
[39] Native Amazonian Children Forego Egalitarianism When They Learn to Count. In Developmental Science, volume 19.
[38] What determines human certainty? In Proceedings of the Cognitive Science Society.
[37] A Hierarchical Probabilistic Language-of-Thought Model of Human Visual Concept Learning. In Proceedings of the Cognitive Science Society.
[36] Infinitely productive language can arise from chance under communicative pressure. In Journal of Language Evolution, volume 2.
[35] Compositional reasoning in early childhood. In PLOS ONE.
[34] Efficient estimation of Weber's W. In Behavior Research Methods, volume 48.
[33] Endogenous or exogenous? The data don’t say (Commentary on Han, Musolino, & Lidz 2016). In Proceedings of the National Academy of Sciences, volume 113.
[32] Extraordinary intelligence and the care of infants. In Proceedings of the National Academy of Sciences, volume 113.
[31] Four problems solved by the probabilistic Language of Thought. In Current Directions in Psychological Science, volume 25.
[30] The logical primitives of thought: Empirical foundations for compositional cognitive models. In Psychological Review, volume 123.
[29] A rational analysis of the approximate number system. In Psychonomic Bulletin & Review.
2015
[28] Cognition in reach: continuous statistical inference in optimal motor planning. In Proceedings of the Cognitive Science Society.
[27] The origins of counting algorithms. In Psychological Science, volume 26.
[26] Inferring the Tsimane's use of color categories from recognition memory. In Proceedings of the Cognitive Science Society.
[25] The perceptual foundation of linguistic context. In Proceedings of the Cognitive Science Society.
[24] Towards semantically rich and recursive word learning models. In Proceedings of the Cognitive Science Society.
[23] The dynamics of idealized attention in complex learning environments. In The 5th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics.
[22] Utility-free models of binomial choice can replicate predictions of utility models in many conditions. In Frontiers in Neuroscience.
[21] Problems in the philosophy of mathematics: A view from cognitive science. Chapter in Mathematics, Substance and Surmise: Views on the Meaning and Ontology of Mathematics (Ernest Davis, Philip J. Davis, eds.), Springer.
[20] Response: "Commentary: Utility-free heuristic models of two-option choice can mimic predictions of utility-stage models under many conditions". In Frontiers in Neuroscience, volume 9.
2014
[19] The Goldilocks Effect in Infant Auditory Attention. In Child Development, volume 85.
[18] Children's learning of number words in an indigenous farming-foraging group. In Developmental Science, volume 17.
[17] Quantitative Standards for Absolute Linguistic Universals. In Cognitive Science, volume 38.
[16] Rich analysis and rational models: Inferring individual behavior from infant looking data. In Developmental Science, volume 17.
[15] Zipf’s word frequency law in natural language: A critical review and future directions. In Psychonomic Bulletin & Review, volume 21.
2013
[14] Information content versus word length in natural language: A reply to Ferrer-i-Cancho and Moscoso del Prado Martin [arXiv:1209.1751]. In arXiv e-prints.
2012
[13] The Goldilocks Effect: Human Infants Allocate Attention to Visual Sequences That Are Neither Too Simple Nor Too Complex. In PLOS ONE.
[12] Bootstrapping in a language of thought: a formal model of numerical concept learning. In Cognition, volume 123.
[11] A corpus analysis of Pirahã grammar: An investigation of recursion. Talk presented at the LSA (by E. Gibson).
2011
[10] The communicative function of ambiguity in language. In Cognition, volume 122.
[9] Learning and the language of thought. PhD thesis, MIT.
[8] Reply to Reilly and Kean: Clarifications on word length and information content. In Proceedings of the National Academy of Sciences (response to commentary), volume 108.
[7] Word lengths are optimized for efficient communication. In Proceedings of the National Academy of Sciences, volume 108.
2010
[6] The Goldilocks Effect: Infants' preference for visual stimuli that are neither too predictable nor too surprising. In Proceedings of the Cognitive Science Society.
[5] Beyond Boolean logic: exploring representation languages for learning complex concepts. In Proceedings of the Cognitive Science Society.
2009
[4] The communicative lexicon hypothesis. In Proceedings of the Cognitive Science Society.
2008
[3] A Bayesian model of the acquisition of compositional semantics. In Proceedings of the Cognitive Science Society.
under review
[2] Indigenous Amazonians spontaneously use space to offload cognitive demands.
[1] Algorithm induction in indigenous Amazonian children.

• libraries & software •

A number of research software packages are actively developed by colala and available under the GNU General Public License:

  • LOTlib3 is a Python 3 library for modeling the learning of complex concepts as compositions of primitives in a language of thought. GPL3

  • Fleet is a C++ library with the same goal as LOTlib3, but much faster. It is currently the most actively developed of these packages and is used by the lab for a variety of LOT-learning projects. GPL3

  • pychuriso is a Python implementation of churiso inference, which learns Church encodings from simple relational datasets. GPL3

  • kelpy (kid experimental library in python) is a library for running simple psychology experiments in Python. It is intended primarily for making simple animated displays with simple responses for baby and child experiments. It is built on top of pygame and supports Tobii eye tracking. GPL3

• data & code •

Data from all completed and in-progress projects are available upon request.







2121 Berkeley Way
Department of Psychology
University of California, Berkeley
Berkeley, CA 94704