How to use A/B testing for better product design

There’s more than one way to use this tool
A/B testing is a useful tool for developing better product designs, not just for evaluating landing pages.

In a classic A/B test, you’re metrics-driven and want to pick whatever test variant ends up with the higher numbers. This is a useful tool, but is only applicable to scenarios like signup flows where the conversion is obvious. This post will describe some different tactics that are metrics-informed and end up as an aid to your product design process, rather than driving it.

The tactics I’ll describe are for:

  • Updating your product without negatively impacting numbers
  • Streamlining your product by measuring and removing unused features
  • Designing for the right level of prominence
Let’s get started…

Updating your product without negatively impacting numbers
Product teams are constantly pushing small updates to their products in response to customers and what’s happening in the market. When an update affects a key part of the product, particularly the main signup flow or core viral loop, it’s often important to ensure that it doesn’t hurt the numbers.

For example, let’s say you’re building a new social site and you have a Facebook-integrated “friend finder” option that you want to add. If you build this and test it, you’ll likely find that since it’s unoptimized, it’ll have worse initial numbers. A classic A/B test will often eliminate the new design because it performs worse. But instead of killing it prematurely, you can use an A/B test to iteratively “bake” the new design with a small % of users until it’s ready to replace the old one.

If you know that it’s important to have this type of Facebook integration in your product design, what you do is leave it in, but only expose 10% of your users to it. Then keep making small updates to the design, working on the copy, call to action, and other aspects, until the new design performs as well as the original.

In this way, you can update your product without impacting the numbers negatively. And unlike a classic A/B test where you aim to just pick a winner, instead you are using it to incrementally benchmark a new design until it’s ready to replace the existing one. For this, you are design-led because you know you want to execute this product in a particular way, but you use the A/B test as a safety net to make sure you don’t push out something that’s not ready.
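As a rough illustration of exposing only a small share of users to the new design, here is a minimal sketch of deterministic bucketing. The function name `in_rollout`, the experiment name, and the hashing scheme are assumptions made for this example, not something the post prescribes; any stable assignment method would do.

```python
import hashlib


def in_rollout(user_id: str, experiment: str, percent: float) -> bool:
    """Return True if this user should see the new design.

    Hypothetical helper: the same user always gets the same answer for a
    given experiment, so their experience stays stable while you iterate.
    """
    # Hash the experiment name together with the user id so that
    # different experiments bucket users independently of each other.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Map the first 8 hex digits onto a bucket in the range [0, 10000).
    bucket = int(digest[:8], 16) % 10000
    # percent=10 exposes buckets 0..999, i.e. roughly 10% of users.
    return bucket < percent * 100


# Example: show the (hypothetical) friend finder to about 10% of users.
if in_rollout("user-42", "friend_finder", 10):
    design = "new friend finder"
else:
    design = "original flow"
```

Because assignment depends only on the user id and experiment name, you can keep the percentage fixed while you revise copy and calls to action, then raise it once the new design matches the original.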

Streamlining your product by measuring feature usage
There’s an important design principle that says, “Do less, but better.” I’ll elaborate on my POV of this philosophy in a future post. For now, the point is that many product teams struggle to remove features, or even to quantify which features go unused.

For example, you might have a legacy feature that suggests people to follow on your social site, which you’d like to replace with a Facebook-based “friend finder” screen instead. Sometimes it can be difficult to get rid of navigation on something like this because it’s not clear how many people are really using it and how that affects their behavior overall, especially for new users.

A nifty way to handle this is to run an A/B test that removes the feature for one group, and get the following information back:

  • How many people actually get exposed to this feature? (Based on what % of people get added into the experiment versus your active users during the test’s time period)
  • What metrics are affected by people who have this feature removed? (As long as the metrics are neutral to positive, then you can remove it safely)
  • If some metrics are bad, can you counteract it by adding something else to the new design?
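The first two questions above can be sketched in code. This is only one way to do it: the function names, the 95% normal-approximation interval, and the `tolerance` margin are assumptions chosen for illustration, and all the numbers are made up.

```python
import math


def exposure_rate(users_in_experiment: int, active_users: int) -> float:
    """Share of active users who hit the feature and entered the test."""
    return users_in_experiment / active_users


def diff_interval(conv_control: int, n_control: int,
                  conv_removed: int, n_removed: int,
                  z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for (removed rate - control rate)."""
    p_c = conv_control / n_control
    p_r = conv_removed / n_removed
    # Standard error of the difference between two proportions.
    se = math.sqrt(p_c * (1 - p_c) / n_control + p_r * (1 - p_r) / n_removed)
    diff = p_r - p_c
    return diff - z * se, diff + z * se


def safe_to_remove(conv_control: int, n_control: int,
                   conv_removed: int, n_removed: int,
                   tolerance: float = 0.01) -> bool:
    """Treat the metric as neutral-or-better if the plausible drop
    is no larger than `tolerance` (an assumed, team-chosen margin)."""
    low, _ = diff_interval(conv_control, n_control, conv_removed, n_removed)
    return low >= -tolerance


# Hypothetical numbers: 1,200 of 20,000 active users entered the test.
print(f"exposure: {exposure_rate(1200, 20000):.1%}")
# Hypothetical retention counts for control vs. feature-removed groups.
print("safe to remove:", safe_to_remove(480, 600, 486, 600, tolerance=0.05))
```

A low exposure rate is itself a signal that the feature is rarely reached. For the metric check, you would repeat the comparison for each metric you care about and pick the tolerance before looking at the results.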

Similar to the process of updating your product, the important notion here is that you have a particular action you want to take on a design level (simplify the UX) and you use the A/B test as a tool to aid that design goal. In this case, rather than going with whatever has better metrics, the goal is to go with the better design as long as it’s neutral or better on the numbers.

Designing for the right level of prominence
As you model out the key metrics for your product, there are often important assumptions that need to be made on things like what % of your users invite their friends, or how many friends they invite, etc. Oftentimes, entire product strategies hinge on making sure that certain kinds of metrics get hit: it could mean the difference between being a viral eyeballs business versus one based on lifetime value and ad spend.

From a product standpoint, this manifests itself as trying to figure out how prominent to make things like “Invite friends” or “Import your addressbook” or “Subscribe to the Pro version.” To build a great UX, you often want to make something as low-prominence as possible while still making sure it’s easy and accessible for users.

A/B testing can help a lot here since you can test multiple versions of prominence and see where it takes you. If you want to prove that a model is even possible (for example, in the very best case could we get 20% of our users to invite their friends?) then you can make a popup that asks for friend invites constantly and see if you are even close. The point here isn’t that you would ever actually close the experiment with the obnoxious popup, but rather that it helps you do a sensitivity analysis of what might even be possible, to see which values are realistic within your model.
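To make the sensitivity check concrete, here is a small sketch that compares the invite rate from each prominence variant against the rate your model assumes. The variant names and counts are invented for illustration, and `model_is_plausible` is a hypothetical helper, not an established method.

```python
def invite_rate(inviters: int, users: int) -> float:
    """Fraction of users in a variant who sent at least one invite."""
    return inviters / users


def model_is_plausible(best_case_rate: float, assumed_rate: float) -> bool:
    """The model's assumption is only reachable if even the most
    aggressive variant gets at least that high."""
    return best_case_rate >= assumed_rate


# Hypothetical results: (users who invited, users in the variant).
variants = {
    "footer_link": (150, 5000),
    "sidebar_button": (400, 5000),
    "constant_popup": (900, 5000),  # the obnoxious upper bound
}

rates = {name: invite_rate(i, n) for name, (i, n) in variants.items()}
ceiling = max(rates.values())

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")

# Does the model's assumed 20% invite rate fit under the observed ceiling?
print("20% assumption plausible:", model_is_plausible(ceiling, 0.20))
```

In this made-up data even the constant popup falls short of 20%, which would suggest revisiting the model rather than the design. A real analysis would also account for sampling noise before drawing that conclusion.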

You can use this technique hand-in-hand with the other ones listed above so that you eventually take a high-prominence version of it and iterate until it’s acceptable to show to 100% of the users.

Final thoughts
The thing that all of these ideas share is that you are using A/B testing as a tool to aid a broader and stronger design POV rather than slavishly following whatever has the better metrics outcome. As others have discussed before, it’s the difference between data-informed and data-driven. Many features you’ll want to build into your product have lots of qualitative value, even if the short-term quantitative benefits are difficult to measure or not there at all. Using these advanced tactics lets you continue to push out dramatic new designs without hurting the metrics your business depends on.
