@andrewchen


5 warning signs: Does A/B testing lead to crappy products?


Above: Hollywood sequels follow from risk-averse design decisions, like the widely panned Godfather Part 3

The dangers of the metrics-driven design process
Many readers of this blog are expert practitioners of metrics-driven product development, and with this audience in mind, my post today is on the dangers of going overboard with analytics.

I think that this is an important topic because the metrics-driven philosophy has come to dominate the Facebook/OpenSocial ecosystem, with negative consequences. App developers have pursued short-term goals and easy money – leading to many copycat and uninspired products.

At the same time, it’s clear that A/B testing and metrics culture serve only to generate more data points, and what you do with that data is up to you. Entrepreneurs still have to make smart decisions to reach successful outcomes. (Thus, my answer to the title question is that no, A/B testing does NOT lead to crappy products, but poor decision-making around data absolutely can.)

So let’s talk about the dangers of being overly metrics-driven – here are five key issues that can come up:

  1. Risk-averse design
  2. Lack of cohesion
  3. Quitting too early
  4. Customer hitchhiking
  5. Metrics doesn’t replace strategy 

Let’s dive in deeper…

#1 Risk-averse design
The first big issue is that when you design for metrics, it’s easy to become risk-averse. Why try to create an innovative interaction when something proven like status/blogging/friends/profiles/forums/mafia/etc already exists? By copying something, you’re more likely to converge quickly on a mediocre outcome, rather than spending a ton of effort potentially creating something bad – but of course this also eliminates amazing, ecstatic design outcomes as well.

This risk-averse product design can lead to watered down experiences that combine a mish-mash of features that your audience has already seen elsewhere, and done better too. So while it’s an efficient use of effort, it’s unlikely that your experience will ever be a great one. It’s a recipe for mediocrity. 

Risk aversion is responsible for a whole bunch of bad product decisions outside of the Internet industry as well: Why do Hollywood sequels get made, even though they are usually much worse than the original? Why do companies continually do “brand extensions” that dilute the value of their brand position? The reason is that it’s an efficient thing to do, and it’s pretty easy to make some money even if the end product is not that great. But it’ll hurt in the long run, since these products will inherently be mediocre rather than great.

In my opinion, the only way to avoid this is to never get lazy about design, and to always take the time to create innovative product experiences. Of course you’ll always have parts of your product which will borrow from the tried-and-true, yet I think it’s always important that the core of the experience is differentiated and compelling.

#2 Lack of cohesion
As hinted above, the next issue is that A/B tested designs often create severe inconsistency within an experience. The bottom-up design process that results from lots of split testing is likely to come up with many local effects, which may overrule global design principles.

Here’s a thought experiment to demonstrate this: Let’s say you tested every form input on your website, with different labels, fonts, sizes, buttons, etc. You’re likely, if you picked the best-performing candidate, to have wildly different looking forms across the site. While it may perform better, it also makes the experience inconsistent and confusing.

Ultimately, I think resolving this has to do with striking a balance between global design principles and local effects. One great way to do this is to split out the extremely critical parts of your product funnel to be locally optimized, and keep the rest of the experience the same. For a social gaming site, the viral loop and the transaction funnel should be optimized separately, whereas the core of the game experience should be very internally consistent.

#3 Quitting too early
Another way to get to uninspired products is to quit too early while iterating an experience because of early test data. When metrics are easy to collect on a new product feature, it’s often very tempting to launch a very rough feature and use the initial metrics to judge the success of the overall project. And unfortunately, when the numbers are negative, there can be a huge urge to quit early – this is a very human reaction to not wanting to waste a bunch of time on something that’s perceived to be failing.

Sometimes a product requires features A, B, and C to work right, and if you’ve only done A, it’s hard to figure out how the entire experience will work out. Maybe the overall data is negative? Or maybe it inspires dynamics that go away once all the features are bundled? Interim data is often just that – interim. People are great at extrapolating from data, but sometimes the right approach is just to play out your hand, see where things go, and evaluate once the entire design process has completed.

#4 Customer hitchhiking
A colleague of mine once used the term “customer hitchhiking” to describe how it’s easy to follow the customer on whatever they want to do, rather than having an internal vision of where YOU want to go. This can happen whenever the data overrules internal discussion and resists interpretation, because it’s so uncompromising as hard evidence. The important thing to remember, of course, is that the analysis is only as good as the analyst, and it’s up to the entrepreneur to put the data into the context of strategy, design, team, and all the other perspectives that occur within a business.

Today, I think you see a lot of this customer hitchhiking whenever companies string together a bunch of unrelated features just to please a target audience. This reminds me of what is often called the “portal strategy” of the late-90s. Just combine a bunch of stuff in one place, and off you go. The danger of that, of course, is that it leads to an incoherent user experience, incoherent company direction, and numerous other sources of confusion.

In the Facebook/OpenSocial ecosystem, of course, this manifests itself as companies that have many unrelated apps. You can dress this up as a “portfolio” or a “platform” but at the same time, it can be a recipe for crappy product experiences.

#5 Metrics doesn’t replace strategy
What do you think it would be like to write a novel, one sentence at a time, without thinking about the broader plot? I’m sure it’d be a terrible novel, and similarly, I bet that testing one feature at a time is likely to lead to a crappy product.

Ultimately, every startup needs to decide what they want to do when they grow up – this is a combination of entrepreneurial judgement, instinct, and strategy.

Every startup has to figure out how big the market is, they have to deliver a compelling product, and they need a powerful marketing strategy to get their services in front of millions of people. Without a long-term vision of how these things will happen, an excessive amount of A/B testing will surely lead to a tiny business.

To use a mountaineering analogy: Metrics can be very useful in helping you scale the mountain once you’re on the right one – but how do you figure out whether you’re scaling the right peak? Analytics are unlikely to help you there.
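
To make the analogy a bit more concrete, here is a minimal sketch of greedy hill climbing in Python. The two-peak landscape and the names `LANDSCAPE` and `hill_climb` are made up for illustration and are not drawn from any real product data. Think of each step as an A/B test that keeps whichever neighboring variant scores higher.

```python
# Hypothetical "product landscape": index = design variant, value = metric score.
# There is a small peak at index 2 and a much taller one at index 8.
LANDSCAPE = [1, 3, 5, 4, 2, 3, 6, 9, 12, 10]


def hill_climb(start, landscape=LANDSCAPE):
    """Greedy local search: move to the better neighbor until none is better."""
    position = start
    while True:
        # Only the immediate neighbors are ever tested against the current variant.
        neighbors = [i for i in (position - 1, position + 1)
                     if 0 <= i < len(landscape)]
        best = max(neighbors, key=lambda i: landscape[i])
        if landscape[best] <= landscape[position]:
            return position  # no neighbor improves the metric: a (local) peak
        position = best


if __name__ == "__main__":
    for start in (0, 5):
        peak = hill_climb(start)
        print(f"start={start} -> peak at {peak}, score {LANDSCAPE[peak]}")
```

Starting at variant 0, the climb stalls on the small peak (score 5); starting at variant 5, it reaches the tall one (score 12). The optimizer is identical in both runs and only the starting point differs, which is the part the metrics can’t choose for you.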

Conclusions
My point on this – nothing is ever a silver bullet, and as much as I am an evangelist for metrics-driven approaches to startup building, I’m also very aware of the shortcomings. In general, these tools are great for optimizing specific, local outcomes, but they need to be combined with a larger framework to reach big successes.

Ultimately, quantitative metrics are just another piece of data that can be used to guide decision-making for product design – you have to combine this with all the other bits of information to get it right.

Agree or disagree? Have more examples? Leave me a comment! 


  • mattmaroon

    Related to #2, hilariously about 15 minutes ago my cofounder and I were debating whether or not to test two things separately or together. It's always nice when I can fire up Google Reader and get a quick answer to the question.

  • http://blog.jimnovo.com/ Jim Novo

    Proof of the dangers in over-reliance on MVT, which touches several of your points above, is the idea of the local maximum:

    http://en.wikipedia.org/wiki/Maxima_and_minima

    If you have not formed a Strategy, if you don't really understand your Markets and Customers, how do you know you are choosing the optimal combination of ideas to test? Testing is proof of hypothesis, not ideation itself.

  • http://www.optimizeandprophesize.com/ jonathanmendez

    Great post Andrew. The qualitative areas (strategy & creative) and the process of testing are THE MOST important parts to ensure overall success. My experience has shown one of the best strategies to mitigate failure is by creating a testing roadmap at the outset that 1) understands and quantifies each test's place in the overall success metric of the product 2) accounts for how results will impact follow-on tests 3) encourages test cycles for iteration based on results. Every good test needs a plan and every good product needs a testing plan.

  • http://openambition.com peter zaballos

    You beg a larger question about the integrity of the design up front. A lot of what points 1-5 refer to can be the second order effects of not having developed and wrung out your overall experience, which includes understanding where you need to have consistency of experience. You can isolate the areas where, within the overall experience, you can optimize and improve, and then I would argue you gain the benefit of testing without corrupting the integrity of the customer experience or the long term direction of your service/product.

    That said, your strategy and plan are only as good as the assumptions that underpin it, and therein lies another hazard you allude to. Your strategy is critical, and you need to adhere to it, up to the point where you have data telling you one or more of your underlying assumptions that frame it have changed. Then you need to be able to revise your strategy and plan, and move off the old one and onto this new one. I wrote a post about a similar situation with regards to how to handle the operating plan for a startup http://openambition.com/2008/12/09/why-the-numb

    It takes discipline, and I would argue experience, to be able to walk this line. Your A/B testing of a call-to-action may reveal that the value proposition in general needs to be assessed, that neither A nor B is the issue. What will prevent this from becoming a tail-chasing exercise is having the discipline to construct a strategy and experience with clearly articulated metrics, and then measure the performance along the way, and understanding what affects an underlying assumption, and what affects a tactical source of performance improvement.

  • http://www.kickstartall.com Mary Sullivan

    The problem with customer hitchhiking comes from listening only to the customer's feature description and not bothering to understand what the customer really wants to accomplish. Some probing questions and genuine listening can help you design features/capabilities that are a fit with your design vision and yet produce something customers really want.

  • nick

    Great comment on a very thought-provoking post.

  • http://andrewchen.typepad.com Andrew Chen

    glad I could help ;-)

  • http://andrewchen.typepad.com Andrew Chen

    I think you should test the crazier thing ;-) Part of my post is about taking more risk with design, to reach global max outcomes, but also to take the time to make things consistent if it pans out

  • http://andrewchen.typepad.com Andrew Chen

    yep, totally agree with you – often when I've had conversations on this topic, I talk about hill-climbing and local versus global max, etc. Absolutely the right way to think about it.

    Great comment.

  • http://andrewchen.typepad.com Andrew Chen

    jonathan – you and I have also talked about the idea that by taking MORE design risk and testing more left-field candidates, that can actually lead to a better outcome. That might be true for a particular landing page, but also true for product features in general!

  • http://andrewchen.typepad.com Andrew Chen

    Yes, it's definitely a delicate tension – and very much a judgment call. The two extremes are clearly both bad:

    1) Go too far on strategy and upfront design – then you are potentially executing a plan without soliciting external input, and making a quick learning cycle into a long (and expensive) one. Classic waterfall process failure.

    2) Go too far on metrics and optimization – then you're going towards a local maximum by being too sensitive to initial conditions and data.

    As you said, this is all about discipline and experience to figure out when it's time to go one way versus the other – both extremes are bad.

  • Vengroff

    That's a great way to talk about the process. Following the analogy further, in optimization theory, there are a number of well-established techniques designed to ensure you don't get caught in local minima all the time. Perhaps the most relevant here is simulated annealing (http://en.wikipedia.org/wiki/Simulated_annealing). In SA, you are initially willing to take large random steps, which is analogous to exploring a broad spectrum of strategies. Over time, as you zero in on some good candidate strategies, you shift to more traditional hill-climbing optimization.

  • http://www.optimizeandprophesize.com/ jonathanmendez

    indeed. the biggest risk is not being risky — no matter what you are testing.

  • http://www.cindyalvarez.com cindyalvarez

    A variation on “testing doesn't replace strategy” – it doesn't replace design, either.

    If you start with a fundamentally flawed concept you can't A/B test your way out of it. “Old-school” product design methodologies like persona analysis and interviews don't give you the same degree of quantitative data, but they give you a solid, -human-, view on what you need to build.

  • http://propercloth.com Seph

    Great post. Love #3 about quitting too early. I think a lot of good ideas are thrown out when the first rough implementation doesn't get traction right away.

  • http://www.icopartners.com/blog Diane

    Great post, it's particularly interesting for creative apps like games (which very often tend to fall in the opposite excess)

  • dl

    I think point 5 is a big one here…

    the networks and studios can write novels on this… you just stated what has been debated for almost a decade in Hollywood.

    and will continue to be debated in this new industry.

    I keep saying we have seen the growth pattern in the entertainment industry since the Hollywood sign was Hollywoodland.

  • http://www.optimizeandprophesize.com/ jonathanmendez

    cindy – i would argue those old school strategies are what often lead to the “customer hitchhiking” andrew mentions in #4. it's only when you find the latent needs of your customer that you truly have hit on something important and there's only two ways to do that i know of. testing and observation in the wild.

  • http://www.cindyalvarez.com cindyalvarez

    So, I would agree with you that too few traditional product designers are truly “observing in the wild”. (I've seen user testing where only the “official question and answer” was written up – ignoring literally HOURS of customer insights about their general behavior and frustrations.)

    But what do you test before you've built something? If you have a competitor doing something similar, you can test their site or software. If you know an analogous task, you can test it. But if not, you do some persona analysis, you talk to some customers and you quickly prototype something for them to bang on. (This does NOT mean you ask the customer what they want – http://www.cindyalvarez.com/decisionmaking/appr… – but that you listen to what they're doing and what they want to be doing.)

    If you skip that first step, you could be so far off the mark that your testing doesn't tell you much. At most companies, most of the time, if you put a VCR out there and user-tested it, your customer feedback doesn't lead you to TiVo.

  • http://www.smartpoppy.com.au Mark

    I think A/B testing should be restricted to tactics. Is it better 2 or 3 columns? Green button or red button? The fundamental concept should still be well and truly in people's heads.

  • http://www.owenhodda.com Owen Hodda

    Great article. I think this highlights the need to never allow one part of your process to operate in a vacuum. Metrics and A/B testing alone will give little insight without other user needs analysis or user testing. In my experience, using one to prove/disprove the other is when really useful results are seen.
    I particularly like what is coming out of the other comments; A/B testing should be used to trial more radical and left of centre ideas rather than slight variations on the tried and true. Only if tried and true is first proven to be the best practice should you spend time refining that.

  • MoWidgets

    Great reminder that analysis is a means to an end. It supports the process; it should not drive it, or else we lose sight of the goal.

  • http://www.geckogo.com geckogo

    Awesome post as always Andrew. :) I agree, it’s super important to step back and think – what’s my overall objective? Am I trying to optimize the right things at this point in time? While A/B testing is a great way to refine something that has already got potential, I think the testing has to stem off a general hypothesis or product vision that includes a unique draw. There’s no way to literally optimize the heck out of everything and I agree things tend to converge and go vanilla pretty fast if you don’t keep things in check. (though I know this but still get caught doing it. :p)

    And I have an example of something that I’m glad we didn’t give up on. The first version of our home page flash map sucked pretty hard. It comprised “Best destinations for <activity> and <activity 2> in <month>.” We had a popup with information, but nobody got it. It was such a different implementation that people kept telling us it was confusing. We were close to taking it off and going with something more standard. But luckily we stuck with this basic concept and just tried different ideas around it, and our current homepage country map is among our site’s most loved features.

  • http://www.linkedin.com/in/thesubjective thesubjective

    Well said.

  • http://www.merchantcircle.com BTS

    Andrew

    This is a great post. I am an investor, board member, or founder in a few companies driven by this metrics-driven approach and am often conflicted on this issue. At its most basic level, I drive my teams to think about three things beyond the analytics:

    1. Where do we want to go ?

    2. How do we want the user to feel ?

    3. What are the highest beta (craziest ideas) we can add to the product regardless of the data ?

  • http://www.codebelay.com/blog/ Barce

    This is a really good question you raise. I think Greg at Tagged.com would have a very different answer, since he tried 9 A/B driven designs and picked the one with the best analytics. The month that happened Tagged popped from 1 million to 9 million, but that was 2 years ago.

    On the other hand, Amazon still runs A/B several times a day on its site. I would say that for products it works, but for services like EC2 it has really hurt its usability. And I love EC2, I just wish it'd be easier to get to the info I want.

  • http://iaaxpage.wordpress.com Iaax Page

    Great post, I think that strategy is keystone to entrepreneurs, and such strategy must be based on the opinions of experts such as Paul Buchheit and Alan Cooper, not to let the user become the designer of the application but to understand her in order to know what is it that she wants to accomplish, how it is useful for her and whether or not it fits into your strategy!

    Iaax Page

  • TH1977

    Thanks for addressing these thought provoking issues. Yet here's a larger question specific to marketing communications design: What's the overarching goal of a successful marketing program ( SEO, PPC, multi media, online, print, direct marketing, landing pages, whatever?) The client's perspective is likely that the goal of any mar com effort is to: generate leads, increase sales, draw in qualified prospects, help fill the prospect pipeline…basically help develop business. So it makes perfect sense, in the case of marketing design, to test the vehicle and obtain quantitative metrics to help accomplish such goals.

  • incolas

    Game developer Blizzard has been one of the first companies to listen to user feedback to create amazing experiences in their games. And that contributed to second to none success. Starcraft has been a top 20 best seller in PC games for the past 11 years, and World of Warcraft has been the absolute market leader for the past 4 years. Still, as a Starcraft fan back in the day, I could tell how gamers sometimes were mad at Blizzard for not listening to them. Because, yes, Blizzard is still the main driving force behind their games. When I read every day about metrics driven product development, I always try and keep in mind that Blizzard achieved huge success by listening to gamers, but not necessarily by applying everything they suggested, whether vocally or behaviouraly (sorry for the bad word, but i'm sure you get it).

  • http://nsavides.wordpress.com Nick Savides

    Thanks for the post.

    I'm more of a design-minded guy who realizes that metrics are important to understand, so it is interesting to read about your take as a metrics guy who values good design. Metrics seem to be useful for figuring out what people are doing: what is popular and profitable right now. They are less useful for figuring out the why or the merit of the what. Metrics reveal that more people use Windows machines than Macs, for example, but the numbers alone do not explain which machine provides a more satisfying experience, and whether a more satisfying experience is even something worth pursuing.

    Even if you did a survey to figure this out, you would be asking people to make subjective judgments, and not all subjective judgments are equal. A trained composer would be able to better appreciate the nuances of a textured symphony in ways that the musically uninitiated could not. But, since there are more musically unsophisticated people in the world than world-class musicians, the metrics could sometimes stack up against the truly excellent in favor of the mediocre and more familiar.

    While our society likes to talk about the box office that movies did, most movie fans prefer movies created by a few individuals with highly refined skills and sensibilities rather than the market-tested products that are produced by metrics-minded bureaucrats. Put another way, metrics can help us profit from well-designed stuff, but they cannot, on their own, generate the great designs that capture our imaginations and our wallets.

  • popart

    “A camel is a horse designed by committee”
    : )

  • http://www.market-by-numbers.com Brant

    Metrics doesn’t replace strategy

    And process doesn't replace intuition. One reason large companies lose their cutting edge is because they replace risk with process. Process and metrics are important, but gut feel has its place. Ultimately, the “vision” is based on gut-feel. There's a fine line entrepreneurs must walk between pursuing the vision vs. following the customer. Lewis Mumford (architect and sociologist), when asked where to put a sidewalk, said “watch where they walk and pave their path.” I guess in this analogy, the sidewalk is the vision, but it will not likely weave the exact path you originally thought.

    The power of process and metrics is not only that they help determine the path, but they can be used to halt failure. Both intuition and painting by numbers will lead to mistakes. But do you have phase gates that indicate failure? Track leading indicators that foretell problems? Mistakes are okay as long as you recognize them as soon as possible and don't repeat them.

    Love your blog!

    Brant

  • http://www.userinsight.com consumer product research

    Great post, i have learned so much in that and i think it will help me a lot.

  • David Chu

    This may be an oversimplification, but this is the way I look at the difference between design and metric driven thinking.

    Let's say you have a process:
    A –> B –> C

    Design-Driven Thinking would ask:
    “How do I design out B to get a person from A –> C directly?”

    Metric-Driven Thinking would ask:
    “How do I get a person from A –> B faster and then from B –> C faster?”

    The main reason I think that business schools don't focus on design is because the term leaves too much to interpretation. If you ask 10 people the definition of 'metrics' and 'design', you'll get a lot more agreement for 'metrics'.

    The other thing that holds design back IMO, is that most of the 'design' companies I see today are more focused on aesthetics than functionality. True design combines the two.

    That said, I'm not an expert and I'd really like to hear your opinions.

  • http://rexduffdixon.com/ RexDixon

    Have you uploaded any of your more successful A/B Tests to http://www.abtests.com/ ? Have you had a sec to check out Performable yet? ( http://www.performable.com/ ) – Do you think their services are on the right track?

  • rococo911

    A/B testing doesn't necessarily lead to low quality products. I enjoyed Mario Puzo's The Godfather 3 although I could notice some of the flaws you pointed out. I guess each film has its custom signs that reflect its quality and level of originality.
