A/B Testing Experiment

Tomáš Veselý - supported by AI
Jan 2

We're building a comprehensive knowledge library on product development as part of our mission. It is for anyone who wants to make better decisions, primarily about product development. Whether you're an inventor, a product manager, or a Chief Product Officer, choosing the right research methods and experiments increases your chances of building the right things for the right audience. Today we'll introduce A/B testing as a validation method.

When to Use This Experiment?

A/B testing is primarily a tool for improving business metrics by changing the product. It is a general framework for measuring how changes affect user behavior. Because users are assigned to variants at random, the method can isolate the effect of one specific change from everything else that happens at the same time, with an uncertainty that can be quantified statistically. The goal is to find changes that have a positive impact on product metrics.

Basic Experiment Principles

The core idea behind A/B testing is to test different product variants against a chosen metric on a small portion of all users. If the new variant performs significantly better on that metric, the change is rolled out to everyone.

The A/B testing process follows these steps:

Formulate a hypothesis: Testing starts by defining the problem and a testable assumption, which states what change will be made and what specific, measurable impact it is expected to have.
Select metrics: A primary metric is chosen to determine the success of the test, along with secondary metrics to monitor side effects.
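Prepare the variants: Two versions of the product are created, a control version (A) representing the current state and a variant version (B) containing the tested change. Only one variable is changed, so that any difference in the metrics can be attributed to that change.
Determine the sample size: Before launch, calculate how many users are needed to detect an effect of the expected size with reasonable confidence. The number depends on the baseline value of the primary metric, the smallest effect worth detecting, the chosen significance level, and the desired statistical power. It can range from hundreds to many thousands of users per variant, and more when the expected effect is small.
Randomization (launching the test): A small portion of existing product users is randomly split between variants A and B. Random assignment ensures that the two groups differ, on average, only in the variant they see, which removes systematic bias in who ends up in which group.
Analyze the results: Once the data is collected, the difference between the variants is evaluated and tested for statistical significance, typically with a p-value, to judge how likely a difference this large would be if the change had no real effect.
Reflect on limitations: The evaluation takes into account factors such as seasonality or the novelty effect, a temporary spike in interest caused by the newness of the change rather than its quality.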
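The sample-size, randomization, and analysis steps above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a reference implementation: the primary metric is assumed to be a conversion-style rate (a share of users), the sample size uses the common normal-approximation formula for two proportions, assignment uses hash-based bucketing (one common way to randomize so that a returning user always sees the same variant), and the analysis is a two-sided two-proportion z-test. All function names, parameters, and the experiment name (`sample_size_per_variant`, `assign_variant`, `two_proportion_z_test`, `exposure`, `"blue-links-test"`) are hypothetical.

```python
import hashlib
import math
from statistics import NormalDist
from typing import Optional

# Hypothetical helpers illustrating three steps of an A/B test.
# Assumption: the primary metric is a conversion-style rate (share of users).


def sample_size_per_variant(p_baseline: float, mde: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in EACH variant to detect an absolute lift `mde`
    over `p_baseline` with a two-sided test (normal approximation)."""
    p1, p2 = p_baseline, p_baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)


def assign_variant(user_id: str, experiment: str,
                   exposure: float = 0.1) -> Optional[str]:
    """Deterministically enrol an `exposure` share of users and split them 50/50.

    Hashing the experiment name together with the user id gives a stable,
    effectively random bucket, so the same user always lands in the same
    variant. Returns None for users who are not part of the experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 2 ** 32  # roughly uniform in [0, 1)
    if bucket >= exposure:
        return None  # user keeps the current product and is not measured
    return "A" if bucket < exposure / 2 else "B"


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for the difference in conversion rates.
    Returns (z statistic, p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # common rate assumed under "no effect"
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both groups all-or-nothing: no evidence of a difference
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Illustrative numbers only: 10% baseline, aiming to detect +2 percentage points.
    print(sample_size_per_variant(0.10, 0.02))  # about 3,800 users per variant
    print(assign_variant("user-42", "blue-links-test"))
    z, p = two_proportion_z_test(conv_a=400, n_a=4000, conv_b=480, n_b=4000)
    print(f"z = {z:.2f}, p = {p:.4f}")  # a small p suggests the lift is unlikely to be chance
```

Hashing instead of drawing a random number at request time is a design choice: it needs no stored assignment table and keeps each user's experience consistent across sessions. In a real test, the analysis would be run once the precomputed sample size has been reached; checking the p-value repeatedly and stopping at the first significant result inflates the false-positive rate.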

Real-World Experiment Example

Link to research: Behind Bing's blue links

An example of A/B testing's business impact is Microsoft's experiment with the Bing search engine. The product team faced a decision about which color to use for links in search results. Instead of choosing subjectively, several shades of blue were tested.

The experiment data showed that one specific shade of blue (code #0044CC) led to a significantly higher click-through rate and user engagement. After this variant was deployed, the resulting increase in annual revenue was estimated at $80 to $100 million. This case illustrates that even a visually negligible change can have a large economic impact in a digital product. Without experimental validation, an effect like this would most likely have gone unnoticed.

What Can Be Tested With This Experiment?

Common examples of A/B testing include optimizing the following metrics:

Conversion Rate: verifying which changes affect the conversion rate (the percentage of users who complete a desired action, such as a purchase or signing up for the product).
Click-Through Rate: verifying which changes affect the click-through rate, for example on a button in an email.
Retention Rate: verifying which changes affect retention, the share of users who keep using the product over time (its inverse is churn), for example after changing the intro video that welcomes new users.
Revenue: verifying which changes affect company revenue, for example whether a higher price drove away too many customers.
User Experience: verifying which changes affect UX measures, such as error rates or task completion success.
Desirability: verifying interest in a feature that does not yet exist, for example with so-called Fake Door tests, which measure clicks on a button for a feature that has not been built yet.
Algorithms: verifying which algorithm has the better business impact, for example one that reduces travel distance in last-mile delivery.
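Several of the metrics above are simple ratios computed separately for each variant. A minimal Python sketch, assuming a hypothetical per-user record (`UserRecord`) whose fields are illustrative rather than any real analytics schema; in particular, the 30-day retention cutoff is an arbitrary example.

```python
from dataclasses import dataclass


@dataclass
class UserRecord:
    """Hypothetical per-user summary collected during the test."""
    variant: str          # "A" (control) or "B" (variant)
    impressions: int      # times the tested element, e.g. a button, was shown
    clicks: int           # times that element was clicked
    converted: bool       # completed the desired action (purchase, sign-up)
    active_day_30: bool   # still active 30 days after joining (example cutoff)
    revenue: float        # revenue attributed to this user


def summarize(records: list[UserRecord], variant: str) -> dict[str, float]:
    """Compute common A/B test metrics for one variant."""
    rows = [r for r in records if r.variant == variant]
    if not rows:
        raise ValueError(f"no users in variant {variant!r}")
    users = len(rows)
    impressions = sum(r.impressions for r in rows)
    return {
        # share of users who completed the desired action
        "conversion_rate": sum(r.converted for r in rows) / users,
        # clicks per impression of the tested element
        "click_through_rate": sum(r.clicks for r in rows) / impressions if impressions else 0.0,
        # share of users still active after the window; churn = 1 - retention
        "retention_rate": sum(r.active_day_30 for r in rows) / users,
        # average revenue per user
        "revenue_per_user": sum(r.revenue for r in rows) / users,
    }


records = [
    UserRecord("A", impressions=10, clicks=1, converted=True, active_day_30=True, revenue=20.0),
    UserRecord("A", impressions=10, clicks=0, converted=False, active_day_30=False, revenue=0.0),
    UserRecord("B", impressions=5, clicks=2, converted=True, active_day_30=True, revenue=30.0),
]
for v in ("A", "B"):
    print(v, summarize(records, v))
```

Rates such as conversion or retention suit a test for proportions, while revenue per user is a continuous and often skewed quantity and usually calls for a different significance test, for example a t-test or a bootstrap.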