# Synthetic Simulations — Complete Site Content

This file contains the full text content of syntheticsimulations.com for LLM consumption.

Last updated: 2026-02-28
Canonical URL: https://www.syntheticsimulations.com/
Contact: gaurav.singh@syntheticsimulations.com

---

## Company Overview

**Synthetic Simulations** is a pre-launch behavioral intelligence platform for product and marketing teams. The platform converts real user behavioral data into AI-powered digital twins — LLM agents initialized with behavioral priors derived from actual user decisions — and runs those agents through ad creatives, product flows, and content inside a simulated social environment built on OASIS. The output is directional, segment-level behavioral signals: which segments engage, where drop-off concentrates, which creative variant resonates more strongly, and how content propagates through a simulated social graph.

Tagline: Pre-launch behavioral intelligence for product and marketing teams.

Current status: Research phase, seeking design partners and early collaborators.

---

## The Problem: The $140 Billion Failure Rate

Every year, more than 30,000 products launch in the United States. According to research from Harvard Business School, 95% fail — not because of poor execution, but because of poor pre-launch understanding of the customer. The single largest cause of failure, across category after category, is inadequate market insight: 35% of products fail due to lack of real market need, and another 35% fail because research was too slow, too small, or too late to inform the decision.
The tools available to product and marketing teams were built for a world where validation was expensive and slow:

- **A/B testing** burns real users, requires traffic volume you don't have pre-launch, and takes weeks to reach significance
- **Focus groups** cost $15,000–$75,000 per study, take 4–6 weeks to complete, and suffer from well-documented observer bias and social desirability effects
- **Analytics** are retrospective — they tell you what happened, not what will happen
- **Surveys** capture stated preference, not revealed behavior

The result: billions of dollars committed to product launches and ad campaigns based on gut feel, small-n qualitative feedback, and creative team confidence. The $63 billion in global ad fraud and the fact that 53% of B2B go-to-market spend is considered ineffective are symptoms of the same underlying failure — teams betting blind.

---

## The Solution: Agentic Simulation

A new class of research has emerged suggesting that large-scale agentic simulation — populations of LLM-powered agents initialized with real behavioral data — can surface directional behavioral signals before real-world exposure, faster and cheaper than any existing validation method. This work is being published in Nature, at NeurIPS, and at ACM UIST. It is also being backed with serious capital: Simile (the Stanford spinout built by the authors of Generative Agents) raised $100M in February 2026, and Aaru, which simulates synthetic populations for market research, hit a $1B headline valuation in December 2025 backed by Redpoint, a16z, and Sequoia.

Synthetic Simulations is building the domain-specific version of this thesis — calibrated for product and marketing decisions, grounded in your behavioral data, and honest about what simulation can and cannot do.

---

## What the Science Shows

### Where Simulation Performs Well

**OASIS (Yang et al., 2024, arXiv:2411.11581 — Oxford, Shanghai AI Lab, CAMEL-AI, HKU)**

The most rigorous large-scale social simulation validation to date.
Tested against 198 real-world Twitter information-propagation cases, OASIS replicated observed spreading dynamics with a mean normalized RMSE under 0.2. It reproduced group polarization, herd behavior, and misinformation amplification — phenomena that only emerged reliably at ≥10,000 agents. Below that threshold, critical group dynamics disappear. This finding directly shapes the Synthetic Simulations architecture: simulations run at 10,000+ agents by default.

**Stanford Generative Agents (Park et al., 2023, ACM UIST)**

Demonstrated emergent social coordination from a single seed instruction — 25 agents autonomously organized a Valentine's Day party across multiple social hops without any explicit programming for these behaviors. Human evaluators rated agent behavior as "highly believable." A follow-on study (Park et al., 2024/2025) used 1,052 interview-grounded agents to replicate real participants' survey responses 85% as accurately as the participants replicated their own answers over two weeks — a compelling test-retest validity benchmark.

**Silicon Sampling (Hewitt et al., 2024, building on Argyle et al., 2023)**

Demonstrated that across 70 nationally representative U.S. survey experiments, GPT-4-generated persona responses correlated with actual human treatment effects at r = 0.85. This holds across a large, heterogeneous sample of social science research.

**LLM Trust Behavior (Xie et al., 2024, NeurIPS)**

Validated GPT-4 as a credible proxy for human behavior in trust games — the foundational paradigm of behavioral economics. The same study identified specific, documented failure modes: LLMs show higher trust toward human counterparts than toward AI agents, exhibit gender-based trust asymmetries, and are more easily damaged by negative stimuli than enhanced by positive ones. Because these biases are documented, they can be compensated for.

### Where Simulation Falls Short

**Lu et al., 2025 (arXiv:2503.20749)**

The most important limiting paper in the field.
Using 31,865 real-world e-commerce shopping sessions and 230,965 individual user actions as ground truth, prompt-only LLM agents achieved only 11.86% accuracy at predicting the correct next user action in a sequential multi-turn browsing session. DeepSeek-R1, Llama, Claude — all fell into this range without fine-tuning. This is a hard constraint on what zero-shot simulation can do at the individual, sequential level.

**Toubia et al., 2025**

LLM simulations replicated approximately 50% of aggregate treatment effects in behavioral economics experiments — a floor, not a ceiling. Agents also consistently over-purchase and over-filter compared to real users, reflecting biases baked in by RLHF training.

### The Honest Synthesis

Simulation is strong at group-level and directional signals: which segments engage more, which creative variant surfaces stronger resonance, where drop-off concentrates, and how content propagates through a population with social influence dynamics. It is weak at individual-level sequential action prediction without domain-specific fine-tuning.

The good news: fine-tuning closes the gap substantially. Qwen2.5-7B fine-tuned on real click data achieved 17.26% action accuracy versus 11.86% for prompt-only DeepSeek-R1 — a 45% relative improvement. The Synthetic Simulations twin construction pipeline is built around this finding.

---

## Our Approach: Three Principles

### Principle 1: Fine-tune on your data, not generic personas

The Lu et al. findings are unambiguous: prompt-only simulation achieves ~12% individual action accuracy, while fine-tuning on domain-specific behavioral data achieves 17%+ and continues to improve with more data. Off-the-shelf LLM personas reflect the modal English-language internet user — not your customers.

The twin construction pipeline converts behavioral logs, engagement history, and demographic signals into agent initialization profiles.
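As an illustration of the archetype-clustering step, here is a minimal, self-contained sketch: plain k-means over toy per-user behavioral features, with each cluster center emitted as a behavioral prior. The feature names, data, and function names are hypothetical, not the production pipeline.

```python
import numpy as np

def cluster_archetypes(features, k=2, iters=50, seed=0):
    """Plain k-means: group per-user behavioral vectors into k archetypes."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # assign every user to the nearest archetype center
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():  # recompute center of each non-empty cluster
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers

# toy behavioral features: [sessions/week, click-through rate, share rate]
users = np.array([
    [5.0, 0.40, 0.10],   # heavy, engaged user
    [6.0, 0.50, 0.20],   # heavy, engaged user
    [1.0, 0.05, 0.00],   # light, passive user
    [2.0, 0.10, 0.01],   # light, passive user
])
labels, centers = cluster_archetypes(users, k=2)

# each archetype center becomes a behavioral prior in an agent profile
profiles = [{"archetype": int(j), "priors": centers[j].round(2).tolist()}
            for j in range(2)]
```

Each archetype center summarizes revealed behavior (e.g. high click-through, low sharing) and would seed the corresponding agents' initialization prompt or fine-tuning data, rather than a generic persona description.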
The resulting profiles are dynamic behavioral priors — fine-tuned on traces of actual decisions your users made — that seed agents with your customers' revealed preferences, not assumed ones.

### Principle 2: Run at society scale

The OASIS research is definitive: critical group dynamics — virality, polarization, herd behavior, social amplification — only emerge reliably at ≥10,000 agents. Smaller simulations are qualitatively believable but behaviorally incomplete: they miss the network effects that determine whether a campaign spreads or a feature gets adopted organically.

Default simulation size: 10,000 agents. Maximum supported: 1,000,000 agents.

### Principle 3: Measure group behavior, not individual prediction

Synthetic Simulations does not claim to predict what any one user will do. It measures:

- Which segments engage — and which ones ignore
- Where drop-off concentrates — before engineering commits
- Which creative variant surfaces stronger resonance — before production spend
- How content propagates through a simulated social graph — before publishing
- Agent reasoning — the qualitative texture of why agents behaved as they did

---

## How It Works: Five Steps

**Step 1 — Connect Your Data**

Behavioral logs, engagement history, CRM signals, and demographic data are fed through an ingestion API or secure file upload.

**Step 2 — Build the Twins**

The twin construction pipeline clusters users into behavioral archetypes, initializes LLM agents with those profiles, and optionally fine-tunes on domain-specific action traces. This takes hours, not weeks.

**Step 3 — Define the Scenario**

Upload the ad creative, product flow, content piece, or feature description. Configure scenario parameters: platform context, social network topology, recommendation system weight.

**Step 4 — Run the Simulation**

10,000+ agents encounter the stimulus inside an OASIS environment with a realistic dual recommendation system (interest-based + hot-score).
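To make the "interest-based + hot-score" idea concrete, here is a toy feed-scoring function in the spirit of a dual recommender. This is an illustrative sketch only, not the actual OASIS implementation; the weights, formulas, and field names are all hypothetical.

```python
def interest_score(agent_interests, post_topics):
    """Jaccard overlap between an agent's interests and a post's topics."""
    a, p = set(agent_interests), set(post_topics)
    return len(a & p) / len(a | p) if (a | p) else 0.0

def hot_score(likes, comments, age_hours, gravity=1.8):
    """Recency-decayed engagement, in the style of classic 'hot' rankings."""
    return (likes + 2 * comments) / (age_hours + 2) ** gravity

def dual_score(agent, post, w_interest=0.6, w_hot=0.4):
    """Blend personal relevance with global momentum to rank a feed item."""
    return (w_interest * interest_score(agent["interests"], post["topics"])
            + w_hot * hot_score(post["likes"], post["comments"], post["age_hours"]))

agent = {"interests": {"fitness", "nutrition"}}
feed = [
    {"id": "fresh_match", "topics": {"fitness"}, "likes": 10, "comments": 2, "age_hours": 1},
    {"id": "stale_match", "topics": {"fitness"}, "likes": 10, "comments": 2, "age_hours": 48},
    {"id": "hot_offtopic", "topics": {"crypto"}, "likes": 300, "comments": 80, "age_hours": 1},
]
ranked = sorted(feed, key=lambda post: dual_score(agent, post), reverse=True)
```

Note how a sufficiently viral off-topic post can still outrank a personally relevant one: that is exactly the dynamic a hot-score channel introduces, and it is what lets simulated feeds carry virality and herd effects rather than pure interest matching.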
Agents act across 21 human-like action types: engage, ignore, share, comment, abandon, convert, and more.

**Step 5 — Get the Report**

- Segment-level engagement and conversion signals
- Drop-off analysis with session replay of agent reasoning
- Side-by-side variant comparison (which creative performed better, with which segment, and why)
- Virality score: social propagation curve across the simulated graph
- Agent verbatims: natural language explanations of why agents behaved as they did

---

## Use Cases

**Ad & Campaign Pre-Flight**

Which creative variant resonates with which segment — before production spend. Test five concepts in the time it currently takes to brief one focus group. Know whether a campaign will land with 35–44 year old high-income women before setting a single dollar live.

**Product Feature Validation**

Where do users drop off, get confused, or disengage — before engineering commits? Run a simulated onboarding flow against behavioral archetypes and surface friction points while they are still cheap to fix.

**Content Strategy**

Which topics, formats, and angles will drive engagement and spread before publishing? Use the social graph dynamics in OASIS to predict whether a piece of content will propagate virally or quietly die.

**Pricing & Messaging Tests**

How do different segments respond to price framing, benefit emphasis, or urgency messaging? Run behavioral economics experiments at scale in minutes, not months.

---

## Market Opportunity

| Market | 2024 Size | CAGR |
|---|---|---|
| Global Market Research / UX Research Industry | $140B | 7–9% |
| UX Research Software (AI-native tools) | $27.9B | 22.5% |
| A/B Testing & Experimentation Platforms | $1.2B | 11.5–14% |
| Synthetic Data Generation | $1.8B | 33–39% |
| Digital Twin Software (simulation layer) | $14.7B | 37–65% |

The most directly addressable segment is the $27.9B UX and market research software market.
Enterprise research tools charge $25,000–$100,000+ per year, with per-study costs of $10,000–$75,000 and timelines of 4–6 weeks. Synthetic Simulations runs faster, cheaper, and at a scale no human panel can match.

---

## Competitive Landscape

**Simile** — Stanford spinout, $100M Series A (Feb 2026). Backed by Fei-Fei Li, Andrej Karpathy, Index Ventures. Building a general-purpose behavior foundation model — horizontal infrastructure for human behavior prediction. Early enterprise wins: CVS Health (retail display simulation), Telstra (pricing change simulation). Reported 80% accuracy on earnings call question prediction.

Differentiation: Simile is horizontal infrastructure; Synthetic Simulations is a vertical product for specific product/marketing use cases with behavioral data integration and social simulation.

**Aaru** — $1B headline valuation (Dec 2025). Backed by Redpoint, a16z, Sequoia, Accenture Ventures. Simulates synthetic populations for market research and political prediction. Recreated EY's Global Wealth Research Report (normally 6 months) in a single day with 90% accuracy.

Differentiation: Aaru focuses on population-level attitudinal research; Synthetic Simulations focuses on behavioral prediction (engagement, drop-off, virality) using first-party behavioral data with OASIS social network dynamics.

**Artificial Societies** — YC W25, $5.35M seed, London/SF. ~500,000 AI personas predicting how populations react to marketing content and brand messaging. "Reach" product predicts LinkedIn post virality at 83% accuracy vs. ~17% for ChatGPT.

Differentiation: Artificial Societies focuses on social media content and LinkedIn; Synthetic Simulations targets enterprise ad creative, product feature validation, and pre-launch simulation with proprietary behavioral data upload.

**Blok** — $7.5M seed (July 2025), MaC Venture Capital. AI agents from real user and product data to simulate software product usage before shipping.
Focused on UX friction and onboarding flow testing for software teams.

Differentiation: Blok is a UX/QA tool; Synthetic Simulations runs marketing and product hypothesis tests at society scale.

**Synthetic Users** — Gartner-recognized, bootstrapped. AI-powered individual user interviews for UX research at $2–$27 per synthetic user. No social simulation layer.

Differentiation: Synthetic Users does 1:1 AI interviews; Synthetic Simulations runs 10,000+ agent simulations capturing emergent group dynamics, social influence, and virality.

**Expected Parrot** — YC F25, MIT Sloan academic co-founder. Open-source Python library + no-code app for simulating customer surveys and pricing tests. Developer-focused, hybrid AI/human validation.

Differentiation: research tool vs. enterprise decision-intelligence product.

---

## Technical Foundation

Synthetic Simulations is built on OASIS (open-source, Apache 2.0 license) — the large-scale social simulation framework developed collaboratively across Oxford, Shanghai AI Laboratory, CAMEL-AI.org, University of Hong Kong, and Max Planck Institute (arXiv:2411.11581).
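The normalized-RMSE figure cited throughout (mean normalized RMSE under 0.2 against 198 real Twitter propagation cases) measures the distance between a simulated propagation curve and the observed one. Here is a minimal sketch of a metric in that style, with toy numbers; the exact normalization used in the OASIS paper may differ.

```python
import math

def normalized_rmse(simulated, observed):
    """RMSE between two equal-length curves, normalized by the observed range."""
    assert len(simulated) == len(observed) and observed
    mse = sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)
    spread = max(observed) - min(observed)
    return math.sqrt(mse) / spread if spread else math.sqrt(mse)

# cumulative share counts over six time steps (toy numbers)
observed  = [0, 120, 480, 900, 1100, 1180]
simulated = [0, 150, 430, 870, 1150, 1210]
score = normalized_rmse(simulated, observed)
```

A score near zero means the simulated curve tracks the observed one closely; normalizing by the observed range makes the metric comparable across propagation cases of very different magnitudes.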
**OASIS provides:**

- Scalability: up to 1,000,000 concurrent agents via distributed inference
- Realistic environment: dual recommendation system (interest-based + hot-score), dynamic social network, temporal simulation engine
- Action space: 21 human-like action types mirroring real platform affordances
- Validated accuracy: tested against 198 real-world Twitter cases with RMSE < 0.2

**Synthetic Simulations contributes above OASIS:**

- Twin construction pipeline: behavioral data ingestion → archetype clustering → agent initialization and domain fine-tuning
- Scenario layer: product/ad stimulus definition, scenario configuration, controlled injection into the simulation environment
- Reporting layer: segment analysis, variant comparison, virality scoring, agent verbatim extraction
- Calibration methodology: parallel testing framework for measuring prediction accuracy against real-world outcomes

**Tech stack:**

- OASIS (Oxford / Shanghai AI Lab / CAMEL-AI / HKU / Max Planck)
- Up to 1M concurrent agents
- Dual recommendation system
- Proprietary twin construction
- Domain fine-tuning

---

## What Synthetic Simulations Is — and Isn't

**This is:**

- A pre-launch risk filter that raises confidence
- Strong at group-level and directional signals
- Calibrated to your behavioral data
- Faster and cheaper than focus groups or A/B tests
- Increasingly accurate with more of your data
- Social-dynamics-aware (virality, herd effects)

**This isn't:**

- A replacement for real-world testing
- A predictor of individual sequential behavior
- Accurate out-of-the-box for all use cases
- A guaranteed outcome
- A general-purpose market research platform
- A survey or polling tool

---

## Frequently Asked Questions

**What exactly is Synthetic Simulations?**

Synthetic Simulations is a pre-launch behavioral intelligence platform. It converts your real user behavioral data — engagement history, click logs, CRM signals — into AI-powered digital twins.
Those twins then experience your ads, product flows, and content inside a simulated social environment, returning directional, segment-level signals before you commit budget or ship code.

**How accurate is the simulation compared to real user behavior?**

At the group and segment level, the research is compelling. Silicon Sampling (Hewitt et al., 2024) found AI persona responses correlate with real human treatment effects at r = 0.85 across 70 nationally representative survey experiments. Stanford Generative Agents (Park et al., 2024/2025) achieved 85% test-retest validity using 1,052 interview-grounded agents. OASIS (Yang et al., 2024) replicated real-world social spreading dynamics with RMSE < 0.2 across 198 Twitter cases. Individual sequential prediction is weaker — Lu et al. (2025) showed ~12% accuracy without fine-tuning — which is why agents are fine-tuned on domain-specific behavioral data.

**Why 10,000+ agents? Couldn't you run fewer?**

OASIS research (Yang et al., 2024) established that critical group dynamics — virality, polarization, herd behavior, social amplification — only emerge reliably at ≥10,000 agents. Below that threshold, network effects disappear. The phenomena that determine whether a campaign spreads or a product feature gets adopted organically are invisible at small scale. The platform defaults to 10,000 agents and supports up to 1 million.

**What data do I need to get started?**

The platform is designed to work with behavioral logs, engagement history, CRM signals, and demographic data — anything that captures traces of real decisions your users have made. Fine-tuning a 7B model on real click data improved next-action accuracy by 45% relative to a prompt-only baseline (Lu et al., 2025).

**Can simulation replace A/B testing or focus groups?**

No — and Synthetic Simulations says this explicitly. Simulation is a pre-launch risk filter, not a replacement for real-world validation.
It is designed to filter weak hypotheses before they reach production, reducing the cost and time of what you do validate with real users.

**What is OASIS and who built it?**

OASIS (Yang et al., 2024, arXiv:2411.11581) is an open-source large-scale social simulation framework developed collaboratively across Oxford, Shanghai AI Laboratory, CAMEL-AI.org, University of Hong Kong, and Max Planck Institute. It supports up to 1 million concurrent agents, a dual recommendation system, dynamic social networks, and 21 human-like action types. It is published under the Apache 2.0 license.

**How is this different from synthetic data generation?**

Synthetic data generation creates statistically plausible records for model training or privacy compliance. Synthetic Simulations creates behaviorally grounded agents that act, react, and interact inside a dynamic social environment. The output is directional behavioral signals: engagement rates, drop-off points, virality curves, segment comparisons, and natural language reasoning from agents.

**How is Synthetic Simulations different from competitors like Simile, Aaru, or Synthetic Users?**

Simile builds horizontal infrastructure for any domain. Aaru focuses on attitudinal market research and political polling. Synthetic Users runs 1:1 AI interviews. Synthetic Simulations differentiates by targeting specific product and marketing decisions — behavioral prediction using first-party data — with full society-scale simulation (10,000+ agents) capturing emergent social dynamics that attitudinal and interview-based tools cannot.

---

## Contact

Email: gaurav.singh@syntheticsimulations.com
Website: https://www.syntheticsimulations.com/
Status: Research phase — seeking design partners and early collaborators