# 🌟 Open Thoughts: Curating the Best Open Reasoning Datasets
**Open Thoughts** is a community effort by DataComp and Bespoke Labs to curate the best open reasoning datasets. Our first goal is to create a reasoning dataset to train state-of-the-art small reasoning models that surpass DeepSeek-R1-Distill-32B and DeepSeek-R1-Distill-7B on math and code reasoning benchmarks.
## 📊 Latest Results
Here are the latest results from our models, evaluated with our open-source evaluation tool, Evalchemy:
| Model | Dataset Size | AIME24 | AIME25 | MATH500 | GPQA-D | LCBv2 |
|---|---|---|---|---|---|---|
| LIMO-32B | 0.8k | 56.7 | 49.3 | 86.6 | 58.1 | 60.0 |
| s1-32B | 1k | 36.0 | 25.3 | 84.8 | 50.5 | 40.9 |
| s1.1-32B | 1k | 64.7 | 49.3 | 89.0 | 60.1 | 65.5 |
| OpenThinker-32B | 114k | 66.0 | 53.3 | 90.6 | 61.6 | 68.9 |
| R1-Distill-32B | 800k | 76.7 | 55.9 | 89.4 | 57.6 | 71.2 |
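One way to summarize the table is a simple mean over the five benchmarks. This is a minimal sketch, not an official metric: the scores are copied from the table above, and equal weighting of the benchmarks is an assumption.

```python
# Benchmark scores from the table above, in column order:
# AIME24, AIME25, MATH500, GPQA-D, LCBv2.
scores = {
    "LIMO-32B":        [56.7, 49.3, 86.6, 58.1, 60.0],
    "s1-32B":          [36.0, 25.3, 84.8, 50.5, 40.9],
    "s1.1-32B":        [64.7, 49.3, 89.0, 60.1, 65.5],
    "OpenThinker-32B": [66.0, 53.3, 90.6, 61.6, 68.9],
    "R1-Distill-32B":  [76.7, 55.9, 89.4, 57.6, 71.2],
}

# Unweighted mean per model (equal weighting is an assumption).
averages = {model: round(sum(vals) / len(vals), 1) for model, vals in scores.items()}

# Print models from strongest to weakest average.
for model, avg in sorted(averages.items(), key=lambda kv: -kv[1]):
    print(f"{model:16s} {avg}")
```

Under this rough summary, OpenThinker-32B sits close behind R1-Distill-32B while being trained on roughly 7× less data.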
## 👥 About Us
We are a team of researchers and engineers from Stanford, University of California Berkeley, University of Washington, Bespoke Labs, Juelich Supercomputing Center (JSC), LAION, UCLA, UNC Chapel Hill, and Toyota Research Institute, united around building the best datasets (and thus the best models). See our previous work at datacomp.ai and mlfoundations.
## 💡 Open Thoughts Supporters
Open Thoughts is supported by Bespoke Labs, NSF IFML, UT Austin Machine Learning Lab, Juelich Supercomputing Center, Toyota Research Institute, and Lambda Labs.
## 📢 Announcements
- February 12, 2025: Scaling up Open Reasoning with OpenThinker-32B
- January 30, 2025: Measuring Reasoning with Evalchemy
- January 28, 2025: Launching the Open Thoughts Project