the computation and language lab
at the University of California, Berkeley



• about •

We study the basic computational processes involved in human language and cognition.








overview: colala's research includes computational and experimental approaches to language acquisition, language processing, and understanding the core features of language. Our work draws on corpus methods, behavioral experiments with adults and children, and computational and mathematical modeling, as well as state-of-the-art techniques in machine learning, information theory, statistics, computer science, and linguistics.

We collaborate closely with Tedlab at MIT, The Kidd Lab at UC Berkeley, The CAOS lab at CMU, and The Hayden Lab at the University of Minnesota.

We do fieldwork with an indigenous South American population, the Tsimane' of Bolivia, in collaboration with Tedlab and CBIDSI to study universal aspects of human language, numerical cognition, and cognitive development.



philosophy: Most research in colala centers on creating fully formalized (implemented) theories that aim to address basic questions about human language and cognition with convergent evidence across methods. We work to develop open tools, data, and results, and to build an atmosphere that is inclusive, scientifically rigorous, and productive.



topics and selected papers:


facilities: Our experimental facilities include rooms for testing adults and children. Our computational facilities include dedicated HPC cores for running models and analysis, as well as several in-house web, corpus, GPU, and HPC servers.

• open positions •


graduate students: Prospective graduate students should apply to the Department of Psychology at UC Berkeley and contact Steve. Applications are due in late November. Graduate students should be hard-working, full of ideas, skilled programmers, and capable of self-directed research. Students interested in the lab should bring strong quantitative skills, including knowledge of statistics and programming, as well as serious background in some of the following areas: machine learning, mathematics, theoretical computer science, mathematical logic, cognitive development, language acquisition, language processing, or linguistics.

undergraduates: Undergraduates must be independent, creative, self-motivated, and good at programming. Our undergraduate research opportunities include work on current large-scale projects, as well as opportunities for self-directed research. Undergraduates will typically have a background in psychology, math, computer science, or linguistics.

• graduate students •


Sam Cheyette works on number and learning

Isabelle Boni works on numerical cognition

• staff •


Holly Palmeri is the primary lab manager, joint with The Kidd Lab

Charlene Gallardo works on numerical cognition

Tania Cruz works with the Tsimane'

• postdoctoral researchers •


Ben Pitt works on numerical cognition

David O'Shaughnessy works with the Tsimane'

• principal investigator •

The lab's PI is Steve Piantadosi. He is interested in understanding the computational mechanisms supporting human language learning and use.







• collaborating students & staff •



Josh Rule collaborates at cocosci at MIT.

Kyle Mahowald collaborates at MIT.

Seng Bum (Michael) Yoo collaborates at the Hayden Lab.

Isabelle Dautriche collaborates at the Laboratoire de Sciences Cognitives et Psycholinguistique.

Santiago Alonso collaborates at the Concepts, Actions and Objects Lab.

Stephen Ferrigno collaborates at Harvard.

Sarah Koopman collaborates at the Concepts, Actions and Objects Lab.

Shirlene Wade collaborates at the Kidd Lab.

Maddie Pelz collaborates at the Early Child Cognition Lab at MIT.

Matt Overlan collaborates at the Jacobs Lab.

Louis Marti collaborates at the Kidd Lab.


• alums •



Frank Mollica (University of Melbourne, Edinburgh)

Yim Register (University of Washington)

Eric Bigelow (Uber)

• collaborating labs & PIs •

Researchers in the lab work in close collaboration with a number of labs at Berkeley and around the country:




Celeste Kidd
The Kidd Lab

Dick Aslin
Haskins Lab

Jessica Cantlon
The Concepts, Actions, and Objects lab

Ben Hayden
The Hayden lab at the University of Minnesota

Ted Gibson
Tedlab, the language lab at MIT

Noah Goodman
cocolab at Stanford

Ev Fedorenko
evlab at Harvard/MGH/MIT

Dan Everett

Josh Tenenbaum
cocosci at MIT

Richard Futrell

Julian Jara-Ettinger
compdevlab at Yale

Julien Musolino

Ricardo Godoy

• papers •


under review
[77]J. Register, F. Mollica, S. Piantadosi, "Semantic verification is flexible and sensitive to context", under review. [email]
[76]S. T. Piantadosi, "The computational origin of representation and conceptual change", under review. [pdf]
[75]D. M. O'Shaughnessy, E. Gibson, S. T. Piantadosi, "The Cultural Origins of Number", under review. [email]
[74]F. Mollica, S. T. Piantadosi, "Logical Word Learning: The case of kinship", under review. [email]
[73]F. Mollica, M. Siegelman, E. Diachek, S. T. Piantadosi, Z. Mineroff, R. Futrell, E. Fedorenko, "High local mutual information drives the response in the human language network", under review. [pdf]
[72]S. E. Koopman, J. F. Cantlon, S. T. Piantadosi, E. L. Maclean, et al., "The evolution of quantitative sensitivity", under review. [email]
[71]I. Boni, J. Jara-Ettinger, S. Sackstein, S. T. Piantadosi, "Verbal counting and number delay in an indigenous Amazonian group", under review. [email]
2019
[70]S. B. M. Yoo, S. T. Piantadosi, B. Hayden, "Monkeys predict trajectories of virtual prey using basic variables from Newtonian physics", Nature Neuroscience, 2019. [pdf][data]
[69]F. Mollica, S. T. Piantadosi, "Humans store 1.5 megabytes during language acquisition: information theoretic bounds.", Royal Society Open Science, 2019. [pdf][ELI5] Some rough estimates show that people know about 1.5 megabytes of information about language.
[68]B. Lake, S. T. Piantadosi, "Learning recursive visual concepts from examples", Computational Brain and Behavior, 2019. [pdf]
[67]S. E. Koopman, A. Arre, S. T. Piantadosi, J. F. Cantlon, "Understanding 1-to-1 Correspondence without Language", Royal Society Open Science, 2019. [email]
[66]E. Gibson, R. Futrell, S. T. Piantadosi, I. Dautriche, K. Mahowald, L. Bergen, R. Levy, "How Efficiency Shapes Human Language", Trends in Cognitive Sciences, 2019. [pdf][ELI5] Pressures for efficient usage can be found in the properties of human language.
[65]S. J. Cheyette, S. T. Piantadosi, "A primarily serial, foveal accumulator underlies approximate numerical estimation", Proceedings of the National Academy of Sciences, 2019. [pdf]
2018
[64]E. Undurraga, J. Behrman, S. Emmet, C. Kidd, W. Leonard, S. T. Piantadosi, V. Reyes-Garcia, A. Sharma, R. Zhang, R. Godoy, "Child stunting is associated with weaker human capital among native Amazonians", American Journal of Human Biology, vol. 30, 2018. [pdf]
[63]J. Rule, E. Schulz, S. T. Piantadosi, J. B. Tenenbaum, "Learning list concepts through program induction", in Proceedings of the Cognitive Science Society, 2018. [pdf]
[62]S. T. Piantadosi, "One parameter is always enough", AIP Advances, vol. 8, 2018. [pdf][ELI5] Shows how a single real number free parameter can fit any scatter plot.
[61]S. T. Piantadosi, H. Palmeri, R. Aslin, "Limits on Composition of Conceptual Operations in 9-Month-Olds", Infancy, vol. 23, 2018, pp. 310-324. [pdf][ELI5] Infants don't seem to correctly predict what happens when you put together two novel functions.
[60]L. A. Oey, F. Mollica, S. T. Piantadosi, "Adults use gradient similarity information in compositional rules", in Proceedings of the Cognitive Science Society, 2018. [pdf][ELI5] In learning tasks, people combine rules with continuous variables to form concepts (but our theories don't do this).
[59]K. Mahowald, I. Dautriche, E. Gibson, S. T. Piantadosi, "Word forms are structured for efficient use", Cognitive Science, 2018. [pdf][data][ELI5] Words that people use the most tend to have the easiest forms.
[58]M. Brabec, J. Behrman, S. Emmett, E. Gibson, C. Kidd, W. Leonard, M. Penny, S. T. Piantadosi, A. Sharma, S. Tanner, E. Undurraga, R. Godoy, "Birth season and height among girls and boys below 12 years of age: Lasting effects and catch-up growth among native Amazonians in Bolivia", Annals of Human Biology, vol. 45, 2018, pp. 299-313. [pdf]
[57]J. Musolino, K. L. d'Agostino, S. T. Piantadosi, "Why we should abandon the Semantic Subset Principle", Language Learning and Development, 2018. [pdf][ELI5] Children don't learn language by starting with the most specific possible meaning.
[56]T. Blanchard, S. T. Piantadosi, B. Hayden, "Robust mixture modeling reveals category-free selectivity in reward region neuronal ensembles", Journal of Neurophysiology, vol. 119, 2018, pp. 1305-1318. [pdf]
[55]S. Alonso-Diaz, S. T. Piantadosi, B. Hayden, J. F. Cantlon, "Intrinsic whole number bias in humans", Journal of Experimental Psychology: Human Perception and Performance, vol. 44, no. 9, 2018, pp. 1472-1481. [pdf]
[54]S. Alonso-Diaz, J. F. Cantlon, S. T. Piantadosi, "A threshold free model of number comparison", PLOS ONE, 2018. [pdf][ELI5] Hand movements to the larger of two numbers show that people continuously update their certainty throughout movement.
2017
[53]S. T. Piantadosi, J. Cantlon, "True Numerical Cognition in the Wild", Psychological Science, vol. 28, 2017, pp. 462-469. [pdf][ELI5] Wild primates make decisions about where to go according to number systems we can study in the lab and in people.
[52]M. C. Overlan, R. A. Jacobs, S. T. Piantadosi, "Learning abstract visual concepts via probabilistic program induction in a Language of Thought", Cognition, vol. 168, 2017, pp. 320-334. [pdf]
[51]F. Mollica, S. Wade, S. T. Piantadosi, "A Rational Constructivist Account of the Characteristic-to-Defining Shift", in Proceedings of the Cognitive Science Society, 2017. [pdf]
[50]F. Mollica, S. T. Piantadosi, "An incremental information-theoretic buffer supports sentence processing", in Proceedings of the Cognitive Science Society, 2017. [pdf][ELI5] Reading time patterns suggest that people store linguistic information in a FIFO buffer.
[49]F. Mollica, S. T. Piantadosi, "How data drives early word learning: A cross-linguistic waiting time analysis", Open Mind, 2017. [pdf][ELI5] The distribution of ages at which children learn early words suggests that they need about 10-20 informative examples of each.
[48]L. Martí, F. Mollica, S. T. Piantadosi, C. Kidd, "Certainty is Primarily Determined by Past Performance during Concept Learning", Open Mind, 2017. [pdf][ELI5] People's confidence that they have learned something correctly is primarily based on how well they see themselves doing, not on an idealized model of how certain they should be.
[47]E. Gibson, R. Futrell, J. Jara-Ettinger, K. Mahowald, L. Bergen, S. Ratnasingam, M. Gibson, S. T. Piantadosi, B. R. Conway, "Color naming across languages reflects color use", Proceedings of the National Academy of Sciences, vol. 114, no. 40, 2017, pp. 10785-10790. [pdf]
[46]R. Futrell, E. Gibson, H. Tily, I. Blank, A. Vishnevetsky, S. T. Piantadosi, E. Fedorenko, "The Natural Stories Corpus", arXiv, 2017. [pdf]
[45]S. Ferrigno, J. Jara-Ettinger, S. T. Piantadosi, J. Cantlon, "Universal and uniquely human factors in spontaneous number perception", Nature Communications, vol. 8, 2017. [pdf][ELI5] Indigenous adults, US adults, US kids, and primates all prefer to generalize according to number, instead of correlated dimensions like area.
[44]I. Dautriche, K. Mahowald, E. Gibson, A. Christophe, S. T. Piantadosi, "Words cluster phonetically beyond phonotactic regularities", Cognition, 2017. [pdf][ELI5] Words in language are more similar to each other than you would expect by chance.
[43]S. J. Cheyette, S. T. Piantadosi, "Knowledge transfer in a probabilistic Language of Thought", in Proceedings of the Cognitive Science Society, 2017. [pdf]
2016
[42]S. T. Piantadosi, "A rational analysis of the approximate number system", Psychonomic Bulletin and Review, 2016, pp. 1-10. [pdf][ELI5] [doi] The properties of the Approximate Number System can be explained by thinking about how to represent numbers with a low probability of confusion, relative to how often you have to represent them.
[41]S. T. Piantadosi, J. Tenenbaum, N. Goodman, "The logical primitives of thought: Empirical foundations for compositional cognitive models", Psychological Review, vol. 123, 2016, pp. 392-424. [pdf][ELI5] By modeling learning, we can infer what operations people are likely to put together to form complex concepts.
[40]S. T. Piantadosi, R. Jacobs, "Four problems solved by the probabilistic Language of Thought", Current Directions in Psychological Science, vol. 25, 2016, pp. 54-59. [pdf]
[39]S. T. Piantadosi, C. Kidd, "Extraordinary intelligence and the care of infants", Proceedings of the National Academy of Sciences, vol. 113, no. 25, 2016. [pdf][ELI5] Human-like intelligence may have evolved because helpless kids require smart parents, smart parents require big brains, and big brains lead to even more helpless kids.
[38]S. T. Piantadosi, C. Kidd, "Endogenous or exogenous? The data don’t say (Commentary on Han, Musolino, & Lidz 2016)", Proceedings of the National Academy of Sciences, vol. 113, no. 20, 2016. [pdf]
[37]S. T. Piantadosi, "Efficient estimation of Weber's W", Behavior Research Methods, vol. 48, 2016, pp. 42-52. [pdf]
[36]S. T. Piantadosi, R. Aslin, "Compositional reasoning in early childhood", PLOS ONE, 2016. [pdf][ELI5] Children can predict what happens when you put together two novel functions.
[35]S. T. Piantadosi, E. Fedorenko, "Infinitely productive language can arise from chance under communicative pressure", Journal of Language Evolution, vol. 2, 2016, pp. 141-147. [pdf][ELI5] Having brains that are complex won't tend to give you languages that generate a lot of sentences, but needing to communicate many meanings will.
[34]M. C. Overlan, R. A. Jacobs, S. T. Piantadosi, "A Hierarchical Probabilistic Language-of-Thought Model of Human Visual Concept Learning", in Proceedings of the Cognitive Science Society, 2016. [pdf]
[33]L. Martí, F. Mollica, S. T. Piantadosi, C. Kidd, "What determines human certainty?", in Proceedings of the Cognitive Science Society, 2016. [pdf]
[32]J. Jara-Ettinger, S. T. Piantadosi, E. Spelke, R. Levy, E. Gibson, "Mastery of the logic of natural numbers is not the result of mastery of counting: Evidence from late counters", Developmental Science, vol. 20, 2016, pp. e12459. [pdf][ELI5] Children's learning of counting is not closely tied to their understanding of quantity more abstractly.
[31]R. Futrell, L. Stearns, D. L. Everett, S. T. Piantadosi, E. Gibson, "A Corpus Investigation of Syntactic Embedding in Pirahã", PLOS ONE, 2016. [pdf][data][ELI5] There is no clear evidence for recursive structures in Pirahã.
[30]I. Dautriche, K. Mahowald, E. Gibson, S. T. Piantadosi, "Wordform similarity increases with semantic similarity: an analysis of 100 languages", Cognitive Science, 2016. [pdf][ELI5] Words that sound similar tend to have similar meanings.
[29]E. J. Bigelow, S. T. Piantadosi, "A large dataset of generalization patterns in the number game", Journal of Open Psychology Data, vol. 4, 2016. [pdf][data] [doi]
[28]E. J. Bigelow, S. T. Piantadosi, "Inferring priors in compositional cognitive models", in Proceedings of the Cognitive Science Society, 2016. [pdf]
2015
[27]S. T. Piantadosi, B. Hayden, "Response: 'Commentary: Utility-free heuristic models of two-option choice can mimic predictions of utility-stage models under many conditions'", Frontiers in Neuroscience, vol. 9, no. 299, 2015. [pdf] [doi]
[26]S. T. Piantadosi, "Problems in the philosophy of mathematics: A view from cognitive science", in Mathematics, Substance and Surmise: Views on the Meaning and Ontology of Mathematics, E. Davis, P. J. Davis, Eds., Springer, 2015. [pdf][ELI5] Philosophy is mainly just people failing to see that their own concepts are really imprecise.
[25]S. T. Piantadosi, B. Hayden, "Utility-free models of binomial choice can replicate predictions of utility models in many conditions", Frontiers in Neuroscience, 2015. [pdf][ELI5] You can make decisions which are equivalent to expected value decisions without ever computing expected value.
[24]M. Pelz, S. T. Piantadosi, C. Kidd, "The dynamics of idealized attention in complex learning environments", in The 5th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, 2015. [pdf]
[23]F. Mollica, S. T. Piantadosi, "Towards semantically rich and recursive word learning models", in Proceedings of the Cognitive Science Society, 2015. [pdf]
[22]F. Mollica, S. T. Piantadosi, M. K. Tanenhaus, "The perceptual foundation of linguistic context", in Proceedings of the Cognitive Science Society, 2015. [pdf]
[21]J. Jara-Ettinger, E. Gibson, C. Kidd, S. T. Piantadosi, "Native Amazonian Children Forego Egalitarianism When They Learn to Count", Developmental Science, 2015. [pdf] [doi]
[20]P. Hemmer, K. Persaud, C. Kidd, S. T. Piantadosi, "Inferring the Tsimane's use of color categories from recognition memory", in Proceedings of the Cognitive Science Society, 2015. [pdf]
[19]J. Cantlon, S. T. Piantadosi, S. Ferrigno, K. Hughes, A. Barnard, "The origins of counting algorithms", Psychological Science, 2015. [pdf][ELI5] Monkeys can follow a procedure very similar to counting, but approximate.
[18]S. Alonso-Diaz, J. Cantlon, S. T. Piantadosi, "Cognition in reach: continuous statistical inference in optimal motor planning", in Proceedings of the Cognitive Science Society, 2015. [pdf]
2014
[17]S. T. Piantadosi, C. Kidd, R. Aslin, "Rich analysis and rational models: Inferring individual behavior from infant looking data", Developmental Science, 2014, pp. 1-16. [pdf][ELI5] You can find U-shaped attentional patterns within individual kids.
[16]S. T. Piantadosi, E. Gibson, "Quantitative Standards for Absolute Linguistic Universals", Cognitive Science, vol. 38, no. 4, 2014, pp. 736-756. [pdf][ELI5] [doi] You can get statistical evidence that certain features of language are impossible, but it probably takes more languages than actually exist.
[15]S. T. Piantadosi, J. Jara-Ettinger, E. Gibson, "Children's learning of number words in an indigenous farming-foraging group", Developmental Science, vol. 17, no. 4, 2014, pp. 553-563. [pdf][ELI5] [doi] Children in the Tsimane', an indigenous group in Bolivia, go through the exact same series of stages in learning number words as US kids, but take much longer to do so.
[14]S. T. Piantadosi, "Zipf’s word frequency law in natural language: A critical review and future directions", Psychonomic Bulletin & Review, vol. 21, 2014, pp. 1112-1130. [pdf][ELI5] [doi] All theories of Zipf's law stink.
[13]C. Kidd, S. T. Piantadosi, R. N. Aslin, "The Goldilocks Effect in Infant Auditory Attention", Child Development, vol. 85, 2014, pp. 1795-1804. [pdf][ELI5] [doi] Infants show U-shaped attentional preferences where they tend to look away from things that are too complex, or too simple.
2013
[12]S. T. Piantadosi, H. Tily, E. Gibson, "Information content versus word length in natural language: A reply to Ferrer-i-Cancho and Moscoso del Prado Martin [arXiv:1209.1751]", ArXiv e-prints, 2013. [pdf][ELI5] Random typing models of language make absolutely no sense.
2012
[11]S. T. Piantadosi, L. Stearns, D. Everett, E. Gibson, "A corpus analysis of Pirahã grammar: An investigation of recursion", Talk presented at the LSA (by E. Gibson)., 2012. [pdf]
[10]S. T. Piantadosi, J. Tenenbaum, N. Goodman, "Bootstrapping in a language of thought: a formal model of numerical concept learning", Cognition, vol. 123, 2012, pp. 199-217. [pdf][ELI5] Children may learn counting by inferring the right (latent) algorithm to explain how their parents use number words.
[9]C. Kidd, S. T. Piantadosi, R. Aslin, "The Goldilocks Effect: Human Infants Allocate Attention to Visual Sequences That Are Neither Too Simple Nor Too Complex", PLoS ONE, 2012. [pdf][ELI5] Infants show U-shaped attentional preferences where they tend to look away from things that are too complex, or too simple.
2011
[8]S. T. Piantadosi, H. Tily, E. Gibson, "Word lengths are optimized for efficient communication", Proceedings of the National Academy of Sciences, vol. 108, no. 9, 2011, pp. 3526. [pdf][ELI5] Word length is better explained by a word's predictability than its frequency.
[7]S. T. Piantadosi, H. Tily, E. Gibson, "Reply to Reilly and Kean: Clarifications on word length and information content", Proceedings of the National Academy of Sciences, vol. 108, no. 20, 2011, pp. E109. [pdf]
[6]S. T. Piantadosi, "Learning and the language of thought.", Ph.D. dissertation, MIT, 2011. [pdf][ELI5] Learners have to infer algorithms in order to show human-like competence in number, language, and logical concepts.
[5]S. T. Piantadosi, H. Tily, E. Gibson, "The communicative function of ambiguity in language", Cognition, vol. 122, 2011, pp. 280-291. [pdf][ELI5] Contrary to Chomsky's claims, ambiguity is good for communication in language because it prevents you from saying things that are redundant with the context.
2010
[4]S. T. Piantadosi, J. Tenenbaum, N. Goodman, "Beyond Boolean logic: exploring representation languages for learning complex concepts", in Proceedings of the Cognitive Science Society, 2010. [pdf][ELI5] You can model learning curves to see which basis of logical operations people likely use.
[3]C. Kidd, S. T. Piantadosi, R. Aslin, "The Goldilocks Effect: Infants' preference for visual stimuli that are neither too predictable nor too surprising", in Proceedings of the Cognitive Science Society, 2010. [pdf][ELI5] Infants show U-shaped attentional preferences where they tend to look away from things that are too complex, or too simple.
2009
[2]S. T. Piantadosi, H. Tily, E. Gibson, "The communicative lexicon hypothesis", in Proceedings of the Cognitive Science Society, 2009, pp. 2582-2587. [pdf][ELI5] Lexicons in natural language are strongly shaped by a pressure for communicative efficiency.
2008
[1]S. T. Piantadosi, N. Goodman, B. Ellis, J. B. Tenenbaum, "A Bayesian model of the acquisition of compositional semantics", in Proceedings of the Cognitive Science Society, 2008. [pdf]

• libraries & software •

A number of research software packages are actively developed by colala and available under the GNU General Public License:

  • LOTlib is a library for modeling learning complex concepts as compositions of primitives in a language of thought. GPL3

  • kelpy (kid experimental library in python) is a library for running simple psychology experiments in python. It is intended primarily for making simple animated displays with simple responses for baby and child experiments. It is built on top of pygame and supports Tobii eyetracking. GPL3

  • ngrampy is a python library for manipulating large google ngram data sets, and computing measures such as average surprisal in context from Piantadosi, Tily, & Gibson (2011). Code is included to replicate and extend that finding. GPL3

  • ChurIso is a scheme library for recovering Church encodings from constraints (email for upcoming publication). GPL3

  • pychuriso is a python implementation of churiso inference. GPL3

  • WeberMCMC provides Bayesian data analysis for estimation of Weber ratios. In practice, incorporating the reliability of an estimate of W into statistics allows for more power and correctness. GPL3

  • GPUropolis performs Bayesian inference via Metropolis-Hastings on symbolic expressions, and is currently under heavy development. Code is highly parallelized using CUDA to run on graphics hardware. Includes several classic scientific data sets to test with. GPL3

  • BabylabDatabase Led by the Rochester Baby Lab, this software provides an open-source participant pool management tool. GPL3
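
As a rough illustration of the "average surprisal in context" measure mentioned for ngrampy above, here is a minimal sketch in plain Python. It does not use ngrampy's own interface: the function name average_surprisal, the bigram_counts input format, and the toy counts are assumptions made for this example only. For each word, the measure averages -log2 P(word | context) over all of the word's occurrences.

```python
import math
from collections import Counter, defaultdict


def average_surprisal(bigram_counts):
    """Return {word: mean surprisal in bits} from {(context, word): count}.

    Hypothetical helper for illustration; not part of ngrampy.
    """
    context_totals = Counter()  # how often each context occurs
    word_totals = Counter()     # how often each word occurs
    for (context, word), n in bigram_counts.items():
        context_totals[context] += n
        word_totals[word] += n

    total_bits = defaultdict(float)
    for (context, word), n in bigram_counts.items():
        p_word_given_context = n / context_totals[context]
        # Each of the n occurrences contributes -log2 P(word | context).
        total_bits[word] += n * -math.log2(p_word_given_context)

    return {w: total_bits[w] / word_totals[w] for w in word_totals}


# Toy counts (made up for the example, not real n-gram data).
toy_counts = {
    ("the", "dog"): 3,
    ("the", "cat"): 1,
    ("a", "dog"): 1,
    ("a", "cat"): 1,
}
result = average_surprisal(toy_counts)
```

With these toy counts, "cat" is less predictable from its contexts than "dog", so it receives the higher average surprisal.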

• data & code •

Data from all projects, completed and in progress, is available upon request.

• misc •

Extra programs, tutorials, stuff!







2121 Berkeley Way
Department of Psychology
University of California, Berkeley
Berkeley, CA, 94704