We analyzed thousands of conversations in a first-of-its-kind study. The biggest failures were invisible.

Invisible Failures in AI

Bigspin is research-led

Bigspin is research-led

Bigspin is built from the ground up as a research-led product, driven by the thesis that solving the critical industry blind spot of invisible AI failures requires a radically new approach to product development.

Bigspin is built from the ground up as a research-led product, driven by the thesis that solving the critical industry blind spot of invisible AI failures requires a radically new approach to product development.

Approach

Research at Bigspin

We study how people actually work with AI agents — what works, what fails invisibly, and how decisions from both system builders and users impact these outcomes.

But the layer where humans and AI actually interact — where capability gets constructed turn by turn, where failures hide in plain sight, where some users develop fluent practice and others don’t — has been studied much less than its importance warrants.

Our research program is about that layer.

Chief Scientist

Chris Potts

The research program is led by Chris Potts, Professor of Linguistics and Computer Science at Stanford and Professor in the Stanford NLP Group and the Stanford AI Lab. Chris has spent decades studying how humans communicate, how meaning is constructed in dialogue, and how computational systems can engage with both. 

Bigspin’s research program builds directly on this foundation. When humans and AI systems collaborate, what happens between them is a form of dialogue with its own structure, its own failure modes, and its own dynamics — and that’s what we study.

The Bigspin team gathered outdoors
The Bigspin team gathered outdoors

Affiliation

Professor, Stanford NLP Group;
Professor, Stanford AI Lab (SAIL)

Current

Co-Creator

Stanford Sentiment Treebank (SST)

2013

Co-Creator

Stanford Natural Language Inference Corpus (SNLI)

2015

Co-Creator

DSPy — programming framework for language models

2023

Publications

Recognition

Test-of-time award, recursive deep learning;
Multiple best-paper awards

2013 – 2015

Industry

Past Amazon Scholar

2023

Off-Hours

Passionate Skateboarder

2023

How WE WORK

How we work

How the research gets made

We work from real conversation data — public where possible (WildChat, SWE-chat), partner data when shared. Not from controlled experiments or synthetic benchmarks.

Three methods working together

Our work combines three approaches against the same corpora. Structural signal extraction pulls measurable patterns from conversations — counts, tool use, scaffolding patterns. Interpretive annotation uses LLMs reading transcripts under structured schemas to surface qualitative signals. Statistical validation grounds the findings — cross-validation, bootstrap confidence intervals, control conditions.

What we hold ourselves to

Transparency about what the data does and doesn't support. Reports include what didn't work and what we cut. Claims match the confidence the data supports — directional patterns are labeled as such, robust effects are reported with their intervals, corpus demonstrations are kept separate from general claims about agent quality. We'd rather underclaim than overstate.

Who we're building this for

The longer-arc goal is to build the analytical practice that makes the human–AI interaction layer legible — to product builders, to AI labs, to researchers, and to the people whose work depends on getting these systems right.

David, founding engineer, working on a laptop outdoors in a skatepark
Working with us

Teams building agent products can engage Bigspin to apply this analytical practice to their own user transcripts.

For research collaborations, citations, or media inquiries, contact hello@bigspin.ai