Approach
Research at Bigspin
We study how people actually work with AI agents — what works, what fails invisibly, and how decisions from both system builders and users impact these outcomes.
But the layer where humans and AI actually interact — where capability gets constructed turn by turn, where failures hide in plain sight, where some users develop fluent practice and others don’t — has been studied much less than its importance warrants.
Our research program is about that layer.
Library
Published research
Chief Scientist
Chris Potts
The research program is led by Chris Potts, Professor of Linguistics and Computer Science at Stanford and Professor in the Stanford NLP Group and the Stanford AI Lab. Chris has spent decades studying how humans communicate, how meaning is constructed in dialogue, and how computational systems can engage with both.
Bigspin’s research program builds directly on this foundation. When humans and AI systems collaborate, what happens between them is a form of dialogue with its own structure, its own failure modes, and its own dynamics — and that’s what we study.
Affiliation
Professor, Stanford NLP Group;
Professor, Stanford AI Lab (SAIL)
Current
Co-Creator
Stanford Sentiment Treebank (SST)
2013
Stanford Natural Language Inference Corpus (SNLI)
2015
DSPy — programming framework for language models
2023
Publications
Recognition
Test-of-time award, recursive deep learning;
Multiple best-paper awards
Industry
Past Amazon Scholar
Off-Hours
Passionate Skateboarder
How WE WORK
How we work
How the research gets made
We work from real conversation data — public where possible (WildChat, SWE-chat), partner data when shared. Not from controlled experiments or synthetic benchmarks.
Three methods working together
Our work combines three approaches against the same corpora. Structural signal extraction pulls measurable patterns from conversations — counts, tool use, scaffolding patterns. Interpretive annotation uses LLMs reading transcripts under structured schemas to surface qualitative signals. Statistical validation grounds the findings — cross-validation, bootstrap confidence intervals, control conditions.
What we hold ourselves to
Transparency about what the data does and doesn't support. Reports include what didn't work and what we cut. Claims match the confidence the data supports — directional patterns are labeled as such, robust effects are reported with their intervals, corpus demonstrations are kept separate from general claims about agent quality. We'd rather underclaim than overstate.
Who we're building this for
The longer-arc goal is to build the analytical practice that makes the human–AI interaction layer legible — to product builders, to AI labs, to researchers, and to the people whose work depends on getting these systems right.

Working with us
Teams building agent products can engage Bigspin to apply this analytical practice to their own user transcripts.
For research collaborations, citations, or media inquiries, contact hello@bigspin.ai

