Wizard of Oz experiment

Tomáš Veselý - supported by AI
1 day ago

We're building a comprehensive knowledge library about product development as part of our mission. The library is for anyone looking to make better decisions — primarily decisions about product development. Whether you're an inventor, a product manager, or a Chief Product Officer, using the right research methods and experiments increases your chances of building the right things for the right audience. Today we'll introduce the Wizard of Oz validation method.

When to Use This Experiment

Wizard of Oz is a good fit when you need to validate demand or behavior, especially for automation and data products. The user believes they're interacting with a finished automated system, while in reality a human hidden behind the scenes is handling the responses. Unlike the related Concierge method, where the user knows the service is being delivered manually by a person, Wizard of Oz preserves the illusion of automation, so what you're testing is how people react to a seemingly finished product. The method works well when:

building full automation would be expensive or time-consuming;
you're working with any type of automation or automation-related product;
you're working with any AI system or AI-related product;
the product is primarily a software product;
it's unclear whether there's any interest in the proposed solution at all;
you're testing a new interface (chatbot, voice assistant, AI) where user behavior is uncertain;
the team has neither the budget nor the capacity to build a working version of the automated behavior;
you need real behavioral data before setting the rules and logic of the automation.

Basic Experiment Principles

The core idea is to fake the automation in the product itself. The user works with a seemingly finished product, while behind the curtain a human, the "wizard," performs the tasks by hand. This lets you validate the value of an automation solution quickly, without building an expensive backend.

Define the goal and the task. Determine which assumption you're testing and pick a concrete task the user would normally do with the product. Prepare a script for the "wizard."
Build a believable interface. A simple clickable prototype or a real application interface whose hidden parts the wizard controls is enough. On the surface, it has to feel like a real product.
Set up the behind-the-scenes operation. The wizard simulates the system's responses (changing the screen, sending an email, delivering the expected output, writing the text the AI would supposedly generate) and sticks to the script so the illusion of automation stays consistent.
Run the test with users. Participants from the target group interact with the prototype believing it's fully automated. The wizard responds in real time while an observer records their behavior.
Collect data and read the signal. It's worth tracking the completion rate of the key action, the number of orders or payments, satisfaction, and repeat use. The signal to build real automation comes when users who believe in the automation repeatedly complete the key action (for example, pay) at a rate that clearly confirms demand, ideally judged against a threshold you set before the test.
Identify the risks. By its very nature, the method doesn't scale: manual operation can't handle large volumes or complex scenarios. Human operation introduces inconsistency that can distort the data. The method only works for simple, controlled flows of data and information, and it demands a lot of real-time coordination. Because it's built on deception, you also risk losing trust if the user sees through the illusion, which is why a follow-up debrief and ethical caution are in order.
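
The behind-the-scenes operation and the live test described above can be pictured as a small relay: the interface drops each user request into a store, the wizard sees what is still waiting and types a reply, and the interface shows that reply as if the system had produced it. Below is a minimal Python sketch of that idea, assuming an in-memory store and a single wizard. `WizardDesk`, `Request`, and all method names are hypothetical, made up for illustration, and not taken from any real tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    request_id: int
    user_id: str
    text: str
    reply: Optional[str] = None  # filled in later by the human wizard


@dataclass
class WizardDesk:
    """Hypothetical in-memory relay between the product UI and the wizard."""

    requests: dict[int, Request] = field(default_factory=dict)
    next_id: int = 1

    def submit(self, user_id: str, text: str) -> int:
        """Called by the user-facing interface; returns a ticket id."""
        request = Request(self.next_id, user_id, text)
        self.requests[request.request_id] = request
        self.next_id += 1
        return request.request_id

    def pending(self) -> list[Request]:
        """What the wizard sees: requests still waiting for a reply."""
        return [r for r in self.requests.values() if r.reply is None]

    def answer(self, request_id: int, reply: str) -> None:
        """Called by the wizard, ideally following the prepared script."""
        self.requests[request_id].reply = reply

    def poll(self, request_id: int) -> Optional[str]:
        """Called by the interface; None means the reply is not ready yet."""
        return self.requests[request_id].reply


desk = WizardDesk()
ticket = desk.submit("user-1", "Summarize my meeting notes")
first_poll = desk.poll(ticket)  # the wizard has not answered yet
desk.answer(desk.pending()[0].request_id, "Here is your summary: ...")
second_poll = desk.poll(ticket)  # now holds the wizard's reply
```

Because every request and reply passes through one store, the same records can double as the observation log for the later analysis. A real setup would also need persistence and a way to notify the wizard, which this sketch leaves out.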

Real-World Experiment Example

Link to research: We charged $100/month for an AI that was really just two guys surviving on pizza.

In 2017, Fireflies.ai founders Sam Udotong and Krish Ramineni wanted to validate whether companies would pay for an assistant that joins a meeting and sends notes from it. Instead of building a transcription engine, they took the manual route: for $100 a month, they offered what appeared to be an AI assistant that would "join the meeting." In reality, one of the founders quietly dialed into the call under the name "Fred from Fireflies.ai," stayed silent throughout, and took notes by hand.

From the customer's perspective, it looked like an automated tool — the summary arrived roughly ten minutes after the meeting ended. This way, the founders handled around a hundred meetings, and the revenue covered their San Francisco rent. Taking notes by hand also showed them exactly what a high-quality note should look like, which later directly shaped the development of the real AI.

Once demand was proven, the team built a real transcription engine (working by the end of 2018). The "two guys surviving on pizza" experiment grew into a company worth over a billion dollars.

What Can Be Tested With This Experiment?

The method's main strength is validating whether real demand for an automated solution exists, before you build expensive technology. In particular, you can test:

Demand validation: whether people actually want the proposed product. The signal is that they complete the key action (place an order, sign up, use a given part of the app) even when the output arrives manually and with a delay.
Automation output validation: whether the output of the automation has the expected quality and form. The signal is that people use the service repeatedly.
Willingness-to-pay validation: whether the interest is strong enough to pay for. The signal is a real payment or a binding order for the simulated service.
Interaction and expectation validation: how users use the interface and what they expect from it. The signal is the points where they get stuck and the steps they skip.
Language and intent validation for AI and voice interfaces: how people phrase their queries and which words they understand. The signal is the natural phrasings the wizard captures without a real algorithm.
Understanding the effort to build the final automation solution: which tasks are required to deliver the output to the user, and whether they can actually be automated. The signal is a map of the manual steps that will later need to be automated, which serves as a roadmap.
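
Several of the signals listed above (key-action completion, repeat use, and payment) come down to simple ratios over the session log. The Python sketch below shows one way to tally them. The records and field names (`completed_key_action`, `paid`) are invented purely for illustration and are not data from any real experiment. What counts as a convincing rate is an assumption each team has to set for itself.

```python
from collections import Counter

# Made-up session log for illustration only; one record per test session.
sessions = [
    {"user": "a", "completed_key_action": True, "paid": True},
    {"user": "a", "completed_key_action": True, "paid": False},
    {"user": "b", "completed_key_action": False, "paid": False},
    {"user": "c", "completed_key_action": True, "paid": True},
    {"user": "d", "completed_key_action": False, "paid": False},
]

# Demand: share of sessions in which the key action was completed.
completed = sum(1 for s in sessions if s["completed_key_action"])
completion_rate = completed / len(sessions)

# Willingness to pay: share of distinct users who paid at least once.
users = {s["user"] for s in sessions}
paying_users = {s["user"] for s in sessions if s["paid"]}
payment_rate = len(paying_users) / len(users)

# Repeat use: share of distinct users with more than one session.
sessions_per_user = Counter(s["user"] for s in sessions)
repeat_users = sum(1 for count in sessions_per_user.values() if count > 1)
repeat_rate = repeat_users / len(users)

print(f"completion: {completion_rate:.0%}")
print(f"paying users: {payment_rate:.0%}")
print(f"repeat users: {repeat_rate:.0%}")
```

Counting payment and repeat use per user rather than per session keeps one very active participant from inflating the result, which matters with the small samples typical of this method.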