SYNTHETIC SIMULATIONS
Pre-launch behavioral intelligence for product & marketing teams
Research Phase
AI-powered agentic simulation for pre-launch user behavior prediction

Know how your users will behave — before you ship a single thing.

An oft-cited Harvard Business School estimate puts new-product failure at 95% — driven not by poor execution, but by poor pre-launch understanding of the customer. Synthetic Simulations converts your real behavioral data into thousands of AI-powered digital twins that experience your ads, products, and content in a simulated social environment — delivering directional, segment-level signals at a fraction of the cost and time of A/B testing or focus groups.

The Problem

Every launch is a blind bet

A/B testing burns real users, requires traffic you don't have pre-launch, and takes weeks to reach significance. Focus groups cost $15,000–$75,000, take 4–6 weeks, and suffer from observer bias. Analytics are retrospective — they tell you what happened, not what will. Surveys capture stated preference, not revealed behavior.

The Solution

Simulate before you commit

We are building a system that creates digital twins from your actual behavioral data, runs them through your content or product in a simulated social environment at society scale (10,000+ agents), and delivers qualitative + quantitative predictions. Think of it as a weather forecast for user behavior — directional, probabilistic, and increasingly accurate the more domain data it has.

r = 0.85

Human–Simulation Correlation

Across 70 nationally representative U.S. survey experiments, AI-generated persona responses correlated with actual human treatment effects at r = 0.85.

Hewitt et al., 2024, building on Argyle et al., 2023
85%

Test-Retest Validity

1,052 interview-grounded agents replicated real participants' survey responses 85% as accurately as individuals replicate their own answers over two weeks.

Park et al., 2024/2025 — Stanford Generative Agents
< 0.2

RMSE in Social Dynamics

Tested against 198 real-world information-propagation cases, OASIS replicated observed spreading dynamics with a mean normalized RMSE under 0.2 — including group polarization and herd behavior.

Yang et al., 2024 — OASIS, arXiv:2411.11581
10K+

Scale Threshold

Critical group dynamics — virality, polarization, herd behavior, social amplification — only emerge reliably at ≥10,000 agents. Below that threshold, network effects disappear. Our simulations default to 10,000+ agents.

OASIS validation research
Principle 01

Fine-tune on your data, not generic personas

Off-the-shelf LLM personas reflect the modal internet user — not your customers. Our twin construction pipeline is designed to convert your behavioral logs, engagement history, and demographic signals into dynamic behavioral priors fine-tuned on traces of actual decisions your users made.

Principle 02

Run at society scale

The research is consistent on this point: critical group dynamics emerge reliably only at ≥10,000 agents. Our target architecture defaults to 10,000 agents, with OASIS infrastructure supporting up to 1 million — capturing the network effects that determine whether a campaign spreads or a feature gets adopted organically.

Principle 03

Measure group behavior, not individual prediction

We do not predict what any one user will do. We surface directional, segment-level signals: which segments engage, where drop-off concentrates, which variant resonates more strongly, how content propagates — and the qualitative reasoning behind why.

01

Connect Your Data

Behavioral logs, engagement history, CRM signals, and demographic data via our ingestion API or secure file upload.
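For illustration only, a single behavioral event sent to the ingestion API might resemble the sketch below. The endpoint shape, field names, and values are all hypothetical — this is not a published schema:

```python
# Hypothetical behavioral-event payload for ingestion.
# Every field name here is illustrative, not a documented API.
event = {
    "user_id": "u_1042",                  # pseudonymous identifier
    "timestamp": "2025-06-01T14:32:00Z",  # ISO-8601 event time
    "action": "click",                    # one of your logged action types
    "context": {"surface": "feed", "item_id": "ad_77"},
    "segment_hints": {"plan": "pro", "region": "EU"},
}
# Batches of events like this — or a secure CSV/Parquet upload —
# provide the decision traces that seed twin construction.
```

The key idea is that each record captures a real decision in context, not a stated preference.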

02

Build the Twins

Our pipeline will cluster your users into behavioral archetypes, initialize LLM agents with those profiles, and optionally fine-tune on your domain-specific action traces.
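The clustering step can be pictured with a toy k-means sketch. The features, user values, and two-archetype setup are invented for illustration; the production pipeline is more sophisticated than this:

```python
def kmeans(points, k, iters=20):
    """Tiny k-means; initializes centroids from the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each user to the nearest archetype centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Move each centroid to the mean of its assigned users.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

# Hypothetical normalized features: [sessions_per_week, ctr, dwell_time]
users = [
    [0.90, 0.80, 0.70],  # heavy, engaged user
    [0.20, 0.10, 0.15],  # light user
    [0.85, 0.75, 0.80],  # heavy, engaged user
    [0.15, 0.20, 0.10],  # light user
]
centroids, labels = kmeans(users, k=2)
# Users sharing a label share an archetype; each centroid seeds one twin profile.
```

Each resulting centroid becomes the behavioral prior for one twin archetype, optionally refined by fine-tuning on that cluster's action traces.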

03

Define the Scenario

Upload the ad creative, product flow, or content piece. Configure platform context, social network topology, and recommendation system weight.
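Conceptually, a scenario configuration could look like the following sketch. Every field name and value is illustrative — this is not the product's actual configuration API:

```python
# Hypothetical scenario configuration for one simulation run.
scenario = {
    "stimulus": {"type": "ad_creative", "variants": ["concept_a", "concept_b"]},
    "platform": "twitter_like",       # simulated platform context
    "network": {
        "topology": "scale_free",     # shape of the social graph
        "agents": 10_000,             # society-scale default
    },
    "recommender": {
        "interest_weight": 0.7,       # interest-based feed share
        "hot_score_weight": 0.3,      # trending / hot-score feed share
    },
}
```

The recommender weights reflect the dual recommendation system idea: how much each agent's feed is driven by personal interest versus platform-wide popularity.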

04

Run the Simulation

10,000+ agents will encounter your stimulus inside an OASIS environment, complete with a dual recommendation system and 21 human-like action types.

05

Get the Report

Segment-level signals, drop-off analysis, variant comparison, virality scores, and agent verbatims explaining why.
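The drop-off analysis in the report can be illustrated with a toy aggregation over simulated agent events. The funnel steps, segments, and event data below are invented for demonstration:

```python
from collections import defaultdict

FUNNEL = ["view", "click", "signup"]  # hypothetical funnel steps

def drop_off_by_segment(events):
    """events: list of (agent_id, segment, furthest_step_reached)."""
    reached = defaultdict(lambda: [0] * len(FUNNEL))
    for _, segment, step in events:
        # An agent that reached "signup" also counts for "view" and "click".
        for i in range(FUNNEL.index(step) + 1):
            reached[segment][i] += 1
    # Conversion at each step, relative to the segment's viewers.
    return {seg: [n / counts[0] for n in counts] for seg, counts in reached.items()}

events = [
    (1, "power_users", "signup"), (2, "power_users", "click"),
    (3, "casuals", "view"), (4, "casuals", "view"),
]
rates = drop_off_by_segment(events)
# power_users: [1.0, 1.0, 0.5]   casuals: [1.0, 0.0, 0.0]
```

Here drop-off concentrates at the click step for casual users — exactly the kind of directional, segment-level signal the report surfaces, alongside agent verbatims explaining why.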

10K+
Agents per simulation (target)
1M
Max agents (OASIS capacity)
21
Human-like action types (OASIS)
📣

Ad & Campaign Pre-Flight

Which creative variant resonates with which segment — before production spend. Test five concepts in the time it currently takes to brief one focus group.

🧩

Product Feature Validation

Where do users drop off, get confused, or disengage — before engineering commits? Surface friction points while they're still cheap to fix.

✍️

Content Strategy

Which topics, formats, and angles will drive engagement and spread? Use social graph dynamics to predict whether content will propagate virally or quietly die.

💰

Pricing & Messaging Tests

How do different segments respond to price framing, benefit emphasis, or urgency messaging? Run behavioral experiments at scale in minutes, not months.

This is
  • A pre-launch risk filter that raises confidence
  • Strong at group-level and directional signals
  • Calibrated to your behavioral data
  • Faster and cheaper than focus groups or A/B tests
  • Increasingly accurate with more of your data
  • Social-dynamics-aware (virality, herd effects)
This isn't
  • A replacement for real-world testing
  • A predictor of individual sequential behavior
  • Accurate out-of-the-box for all use cases
  • A guaranteed outcome
  • A general-purpose market research platform
  • A survey or polling tool
Built on
OASIS (Oxford / Shanghai AI Lab / CAMEL-AI / HKU / Max Planck) · Up to 1M concurrent agents · Dual recommendation system · Proprietary twin construction · Domain fine-tuning

Simulation is a weather forecast, not a crystal ball — directional, probabilistic, and increasingly accurate the more domain data it has. We are honest about what it can and cannot do.

— Our philosophy

Simile

Stanford spinout · $100M Series A · Feb 2026 · Fei-Fei Li, Andrej Karpathy, Index Ventures

General-purpose behavior foundation model — horizontal infrastructure for human behavior prediction. Early enterprise wins include CVS Health and Telstra. Reported 80% accuracy on earnings call prediction.

Our differentiation: Simile is horizontal infrastructure. We are a vertical product for specific product and marketing use cases, grounded in your behavioral data, with social simulation dynamics Simile does not offer.

Aaru

$1B headline valuation · Dec 2025 · Redpoint, a16z, Sequoia, Accenture Ventures

Simulates synthetic populations for market research and political prediction. Recreated EY's Global Wealth Research Report (normally 6 months) in a single day at 90% accuracy. Targets broad attitudinal research and polling.

Our differentiation: Aaru focuses on population-level attitudinal research. We focus on behavioral prediction — engagement, drop-off, virality — using your own first-party behavioral data and OASIS social graph dynamics.

Artificial Societies

YC W25 · $5.35M seed · London / San Francisco

~500,000 AI personas predicting how populations react to marketing content and brand messaging. Their "Reach" product predicts LinkedIn post virality at 83% accuracy vs. ~17% for ChatGPT.

Our differentiation: Artificial Societies targets social media content and LinkedIn. We target enterprise ad creative, product feature validation, and pre-launch simulation with proprietary behavioral data upload.

Blok

$7.5M seed · July 2025 · MaC Venture Capital

AI agents built from real user and product data to simulate software product usage before shipping. Focused on UX friction and onboarding flow testing for software development teams.

Our differentiation: Blok is a UX/QA tool for software teams. We run marketing and product hypothesis tests at society scale with emergent social dynamics for broader decision intelligence.

Synthetic Users

Gartner-recognized · Bootstrapped · $2–$27 per synthetic user

AI-powered individual user interviews for UX research. Strong product with a clear use case, no social simulation layer. Priced per synthetic user for 1:1 qualitative research at scale.

Our differentiation: Synthetic Users conducts 1:1 AI interviews. We run 10,000+ agent simulations capturing emergent group dynamics, social influence, and virality — phenomena that vanish at small scales.

Expected Parrot

YC F25 · MIT Sloan co-founder · Open source

Open-source Python library and no-code app for simulating customer surveys and pricing tests. Developer-focused with hybrid AI/human validation. Strong in academic and research contexts.

Our differentiation: Expected Parrot is a research tool for developers. We are an enterprise decision-intelligence product with behavioral data integration and society-scale social simulation.

What exactly is Synthetic Simulations?

Synthetic Simulations is a pre-launch behavioral intelligence platform. It converts your real user behavioral data — engagement history, click logs, CRM signals — into AI-powered digital twins. Those twins then experience your ads, product flows, and content inside a simulated social environment, returning directional, segment-level signals before you commit budget or ship code.

How accurate is the simulation compared to real user behavior?

At the group and segment level, the research is compelling. Hewitt et al. (2024), building on Argyle et al.'s "silicon sampling" work, found AI persona responses correlate with real human treatment effects at r = 0.85 across 70 nationally representative survey experiments. Stanford's Generative Agents (Park et al., 2024/2025) achieved 85% test-retest validity using 1,052 interview-grounded agents. OASIS (Yang et al., 2024) replicated real-world social spreading dynamics with normalized RMSE < 0.2 across 198 propagation cases. Individual sequential prediction is weaker — Lu et al. (2025) reported roughly 12% next-action accuracy without fine-tuning — which is why we fine-tune on your data.
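For readers unfamiliar with the two headline metrics, here is a toy Python sketch of Pearson correlation and range-normalized RMSE, computed on made-up engagement numbers (unrelated to the cited studies):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def normalized_rmse(pred, obs):
    """RMSE divided by the observed range (one common normalization)."""
    rmse = sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
    return rmse / (max(obs) - min(obs))

observed  = [0.10, 0.25, 0.40, 0.55, 0.70]  # made-up real engagement rates
simulated = [0.12, 0.22, 0.43, 0.50, 0.74]  # made-up agent-predicted rates

r = pearson_r(simulated, observed)
nrmse = normalized_rmse(simulated, observed)
```

High r means the simulation ranks and scales treatment effects like real humans do; low normalized RMSE means predicted dynamics stay close to observed ones relative to their overall spread.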

Why 10,000+ agents? Couldn't you run fewer?

You could, but the results would be qualitatively believable and behaviorally incomplete. OASIS research (Yang et al., 2024) established that critical group dynamics — virality, polarization, herd behavior, social amplification — only emerge reliably at ≥10,000 agents. Below that threshold, network effects disappear. The phenomena that determine whether a campaign spreads or a product feature gets adopted organically are invisible at small scale. We default to 10,000 agents and our infrastructure supports up to 1 million.

What data do I need to get started?

The platform is designed to work with behavioral logs, engagement history, CRM signals, and demographic data — anything that captures traces of real decisions your users have made. The more domain-specific behavioral data you provide, the higher the simulation accuracy. Fine-tuning a 7B model on real click data improved next-action accuracy by 45% relative to prompt-only baseline (Lu et al., 2025). You can start with what you have; accuracy compounds with data.

Can simulation replace A/B testing or focus groups?

No — and we say this explicitly. Simulation is a pre-launch risk filter, not a replacement for real-world validation. It is designed to filter weak hypotheses before they reach production, dramatically reducing the cost and time of what you do validate with real users. Think of it as a weather forecast: directional, probabilistic, and increasingly accurate with more domain data — not a guaranteed outcome.

What is OASIS and who built it?

OASIS (Yang et al., 2024, arXiv:2411.11581) is an open-source large-scale social simulation framework developed collaboratively across Oxford, Shanghai AI Laboratory, CAMEL-AI.org, University of Hong Kong, and Max Planck Institute. It supports up to 1 million concurrent agents, a dual recommendation system (interest-based and hot-score), dynamic social networks, and 21 human-like action types. It is published under the Apache 2.0 license. Synthetic Simulations builds the twin construction pipeline, scenario layer, and reporting system on top of OASIS.

How is this different from synthetic data generation?

Synthetic data generation creates statistically plausible records (rows, tokens, profiles) for model training or privacy compliance. Synthetic Simulations creates behaviorally grounded agents that act, react, and interact inside a dynamic social environment. The output is not a dataset — it is directional behavioral signals: engagement rates, drop-off points, virality curves, segment comparisons, and natural language reasoning from agents about why they responded as they did.