A/B Testing

A vs B: Becoming The Ultimate Testing Champion Recording & Highlights

This action-packed panel explores best practices for marketing experimentation, answers audience questions, and dives into real-life testing examples.

In case you missed it, PathFactory brought B2B marketing all-stars together for a scintillating live panel discussion on all things A/B testing. The panel features the brilliant minds of Mimi Rosenheim of Demandbase, Takeshi Young of Optimizely, Jesse Ariss of Vidyard, Suneet Bhatt of Crazy Egg, and Mark Bornstein of ON24, and is moderated by PathFactory’s VP of Marketing, Elle Woulfe. The action-packed hour explores best practices for marketing experimentation, answers audience questions, and dives into real-life testing examples. You can check out the panel recording here (and read highlights of the discussion below!):

Elle Woulfe: The fact is, only about a third of marketers are currently using testing technology. Why is that statistic so staggeringly low? I think in B2B marketing, everybody knows they should be testing. So why aren’t people doing it? 

“Only about a third of marketers are currently using testing technology. Why is that statistic so staggeringly low?”

Testing can be overwhelming because there are a lot of different things you can test. But in today’s world, where B2B buyers are so consumerized and bring B2C considerations to their purchase, we have to deliver on those expectations. That makes testing even more important.

According to Gartner, over 70% of marketers aren’t using or deploying A/B tests at all. Why is that?

Takeshi Young: That’s a pretty surprising statistic given how critical testing is to the digital marketing world. What may be holding some people back is that it really requires a change in culture to become a company in which testing is common practice. I will say that the companies that are succeeding in the digital economy… have really embraced experimentation as a core part of their cultural values. I think that’s what it takes to succeed. If you want a shot at competing against these companies, you need to start testing.

“It really requires a change in culture to become a company in which testing is common practice.”

Elle Woulfe: It’s often a top-down thing. It’s embraced at the cultural level, from the highest levels of management, and becomes woven into the fabric of the organization.

Suneet Bhatt: We know people want to test. We’ve run some experiments ourselves. We’ve prompted marketers with an invitation to test and we’ve seen an 85% click-through rate where people had that idea and incentive and motivation to click through. By the time they get to the bottom of the test to actually launch it, we see a massive drop off. We know they want to, we know they see value in it. They’re not doing it.

We just published a blog post on this topic today. Testing is no different than taking your medicine, exercising, eating vegetables. We know it’s good for us, it doesn’t mean that we necessarily do it.

“Testing is no different than taking your medicine, exercising, eating vegetables. We know it’s good for us, it doesn’t mean that we necessarily do it.”

Suneet Bhatt: There’s this basic anxiety about taking that first step. I think as an industry, we’ve kind of failed our people. We have made it feel complicated and convoluted and heavily technical. We scare people with things like statistical significance. As a result, we’re sort of getting in their way. We’ve put up a lot of obstacles, there’s some human reasons, some psychological reasons, some technical reasons. 

“There’s just this basic anxiety about taking that first step… people don’t necessarily know what to do or how to do or how to take that first step.”

Elle Woulfe:  I think testing used to be a lot harder than it is today. There are a lot of tools that make it really easy today. People are just kind of intimidated and afraid. 

Jesse Ariss: That anxiety around A/B testing happens all the time. I mean, when I got invited to this webinar and I saw the word test, I thought I was going to have to do a test, and I was a little bit nervous about that. But what I really want to focus on is that our company [Vidyard], and I’m sure most of the companies out there, are only successful if the customers who use the product are successful.

What are some key business questions A/B testing can help answer?

Jesse Ariss: A lot of folks say, “Let’s A/B test and see if we can squeeze two or three percent out of that conversion rate on whatever our goal is.” But for me, I think A/B testing is not so much about boosting that one or two percent on the conversion end. It’s more about boosting the user experience. Hopefully, by improving the user experience for one or two of our customers, the conversion rate will follow. We’re thinking about the customer and how they’re using the website or watching that video. It’s not a silver bullet. Our goal is to just get one or two percent better every single day.

“I think A/B testing is not so much about boosting that one or two percent on the conversion end. I think it’s more about boosting the user experience.”

Mimi Rosenheim: As an account-based marketing person at Demandbase, part of what I am looking at is how digital actually plays a role in account-based marketing, especially when I’m measured on pipeline.

Vanity metrics like click-throughs and CTRs and CPMs don’t actually mean a whole lot to my management. When I’m thinking about testing I’m thinking about using control groups. I have A/B tests where I’m going to advertise to these accounts and not advertise to those accounts. So I can see what channels are actually influencing the deal cycle and the things that are most important to our management. I’ll be looking at messaging. I’ll be looking at the impacts of certain technologies or functionalities like personalization. The kinds of questions that I’m looking at for testing are much more business oriented.

Mark Bornstein: I think that really effective A/B testing can completely transform the familiar adage: know your customer. Even some of the most sophisticated marketers tend to think of the ICP, or ideal customer profile, as a narrow set of demographics, roles, and responsibilities. A/B testing is how we learn the tendencies and preferences of our prospects and customers. Which we don’t spend enough time doing. If done right, it can really transform how we market to them. Are your people morning social media people or afternoon social media people? Do they respond more to promotions that challenge the status quo, or promise change? Do they respond more to plain text or HTML? Bullets or no bullets?

We need to get beyond the idea of simply understanding key business challenges and thinking that that’s somehow enough to successfully market to people. I think that through A/B testing we can really effectively learn our prospects’ preferences. We can learn about their habits. I think that is how we’re going to profoundly change the effectiveness of our marketing.

“A/B testing is how we learn the tendencies and preferences of our prospects and customers. Which we don’t spend enough time doing.”

How can you get started with A/B testing, despite feelings of inertia and self-consciousness?

Takeshi Young: There are a lot of different things you could be testing. I would start by looking at your most important business challenges or KPIs. That might be generating more leads. Or, for an eCommerce company, that might be decreasing shopping cart abandonment rate. Just start with a goal in mind. Then generate hypotheses about the different areas of the site where you can optimize those KPIs. Digging into your data is a great way to find clues for those hypotheses.

“There are a lot of different things you could be testing. I would start by looking at your most important business challenges or KPIs.”

At Optimizely we have brainstorming sessions for ideas on how to improve KPIs for specific areas of our site. Then we pop them into Program Management and, from there, we prioritize the ideas that we want to test based on the potential of each idea: the likelihood that the test will succeed, the impact it could have if successful, and the level of effort, meaning how difficult it is to implement.

Suneet Bhatt: When we talk about testing, we’re not just talking about A/B testing; that’s part of the problem. We’re talking about an experiment versus a control. As a result, I think the first thing you have to do is change your mindset: stop talking about a test, which has a measured outcome and a clear score at the end of it, and start talking about experimentation, which gives you the space to learn and make progress.

“I think the first thing we have to do is change our mindset… stop talking about testing and start talking about experimentation, which gives you the space to learn and make progress.”

Suneet Bhatt: I think the second thing you do is take the pressure off. It’s not about testing next week. It’s not about testing in two weeks. It’s about building the confidence to test. I don’t believe people are paying enough attention to their customers and to the data they have in a way that they can find inspiration.

Set 15 minutes on your calendar once a week and do nothing more than pay attention to your customers. Use heat maps and scroll maps and Google Analytics. Look at where people are clicking, look at where people are engaging and spending time. Set a baseline and get familiar with how your users are engaging with your website, your app, your product, whatever it might be.

Then a month out, you will be overflowing with ideas you want to test.  Then run the most obvious test you think you can run.  That’s how I would think about overcoming that inertia that all of us have faced at some point when it comes to testing.

Elle Woulfe: We do a lot of testing through our email communications, using our own platform, in terms of how we position different assets and different pieces of content. Even testing different ads using things like persona level personalization. Of course, we could always be testing more. But even the small things we’ve done have helped us improve what we’re doing and deliver an experience that’s better for the buyer. I love that.

“Even the small things we’ve done have helped us improve what we’re doing and deliver an experience that’s better for the buyer.”

Jesse Ariss: How many times have you been in a meeting with your content folks and your web folks, looking at the copy, and you say, “You know what? I don’t know about that, but let’s A/B test it,” and then you never actually get around to it? This is one of the fundamental problems that we see. Instead of A/B testing being driven that way, let’s use the data that we have to identify what we should be A/B testing.

What’s a common mistake people make when A/B testing?

Jesse Ariss: There’s often this urge to test more than one thing at a time. That is the biggest no-no when it comes to A/B testing. You test one thing. If you move a button, you sure as heck don’t touch the navigation bar, and you sure as heck don’t change the thumbnail on that video. Because then your results are going to be completely skewed.

“There’s often this urge to test more than one thing at a time. That is the biggest no-no when it comes to A/B testing. You test one thing.”

Jesse Ariss: Then the next question is: how many people do you want to A/B test this with? What is statistically significant when you’re a B2B company with half-year lead cycles or sales cycles and 15 customers in the pipeline? How do you determine statistical significance there?

So there are all these questions you have to figure out. It’s going to be unique for every industry. 
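Jesse’s question about significance at B2B volumes is easy to see in numbers. Below is a minimal sketch, not from the panel, of a pooled two-proportion z-test; the conversion counts and sample sizes are invented, but they illustrate why 15 opportunities per variant rarely reach significance while the same relative lift does at web-scale traffic.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates,
    using a pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 15 opportunities per variant (a B2B-sized sample): 20% vs. 40% conversion
print(two_proportion_p_value(conv_a=3, n_a=15, conv_b=6, n_b=15))          # ~0.23, not significant
# The same relative lift at web-scale traffic
print(two_proportion_p_value(conv_a=300, n_a=1500, conv_b=600, n_b=1500))  # far below 0.05
```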

Elle Woulfe: And isolate the variable. That’s always very important. I see this go wrong many times. It’s like, “Well, we changed these three things.” And I’m like, well, how do you know which thing actually worked? You can’t do that. You’ve got to pick one.

What are some examples of A/B tests that you or a customer ran that yielded interesting or unexpected results?

Mark Bornstein: For me, the most interesting results come from the testing we do on email sends. We go crazy. I think where A/B testing fails is in this idea of not being able to have control because there are too many factors. If you’re really disciplined and you A/B test very specific elements against each other, those tests can be really effective. We A/B test endlessly, every single element of our email promotions. It’s so interesting to watch how preferences keep changing.

“It’s so interesting to watch how preferences keep changing.”

For instance, for a long time image banners tested much, much higher than plain banners. Then a few years ago, that completely changed. Now, emails with image banners don’t test as well for us and we’re using very plain banners. But I bet if we tested again in two years, that could flip again.

When you come up with a discovery you still have to keep testing it. Otherwise, you may get a surprise further on down the line.

Mimi Rosenheim: We’ve done a couple of different tests and I’m going to give you two:

We did a test around name-based personalization. We wanted to see the impact of personalization on the website. I should mention that inherent in our tests is making sure that we’re only testing with the people we care the most about, not our entire set of site visitors. Within that test, we saw an 85% increase in click-through when we added that personalization. So it was really important for us to figure out what that impact was.

Second, we wanted to see what the impact of advertising was in general, even just awareness advertising on our target account list. We sequestered a percentage of the target account list and blacked out advertising. We did everything else the same. This is an example of using an A/B testing methodology where you’re isolating a tactic, in this case an entire digital channel. We saw a 45% increase in pipeline for the audience that had the advertising versus the audience that didn’t. Not just that, but the deal sizes were significantly larger as well.

Those are two methods you can use where you take something really, really simple, test it, and do it quickly. If you’re going to start testing, just put a line in the sand and pick something really easy that doesn’t require a ton of development. Something super, super low bar, so that it gives you that practice. Then you can get more and more sophisticated.

“If you’re going to start testing just put a line in the sand and pick something really easy… So that it gives you that practice. Then you can get more and more sophisticated.”
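As a rough illustration of the holdout approach Mimi describes, here is a minimal sketch of sequestering part of a target account list before a campaign. The account names and the 20% holdout fraction are placeholders, not Demandbase’s actual setup; pipeline and deal size for the two groups can then be compared at the end of the measurement window.

```python
import random

def split_holdout(accounts, holdout_fraction=0.2, seed=42):
    """Randomly sequester a fraction of a target account list as a holdout
    (advertising blacked out); the rest become the treatment group."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(accounts)
    rng.shuffle(shuffled)
    cutoff = int(len(shuffled) * holdout_fraction)
    return {"holdout": shuffled[:cutoff], "treatment": shuffled[cutoff:]}

# Hypothetical target account list
groups = split_holdout(["Acme Corp", "Globex", "Initech", "Umbrella", "Hooli"])
print(groups["holdout"])    # accounts with advertising blacked out
print(groups["treatment"])  # accounts that see the campaign
```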

Jesse Ariss: We do a lot of video. With our product, you can A/B test thumbnails for videos, which is really cool. So obviously we do that, because everybody uses their own product. We did an ABC test on the landing page of our website. We tested three thumbnails. We had a cute guy (I would consider him cute). We had a dog that was just by itself. And we had a screenshot of the actual product itself. Cute guy, dog, screenshot of the product. Which one do you think outperformed the others by 15%? Any guesses?

Mimi Rosenheim: Dog.

Elle Woulfe: I mean everyone at PathFactory would say dog.

Jesse Ariss: Nailed it, dog. Dog every time. Just dog.

Suneet Bhatt: In terms of tests, we’ve started heavily testing churn. We created a churn funnel with a company called Brightback. When people click to cancel, we ask them for a reason why, and then we A/B test offers based on that reason. We’ve been able to decrease churn by offering anything from name your price, to we’ll do it for you, to discounts. That’s something new and it’s been fun to test. For a company that’s been around for 14 years, we have a lot of legacy customers that we need to find a way to catch.

Takeshi Young: One example from Optimizely that was pretty surprising for us was a really small change we made on our site that had a pretty significant impact on our bottom line. The test was basically just changing the CTA on our contact sales buttons. Obviously, as a B2B company, getting people to contact sales is a really high-value action that users can take on our site. We tested seven different CTAs, including “contact sales,” “talk to us,” and “contact us.” We found that the one that actually ended up getting the most clicks was “get started,” which got 129% more clicks. So more than double! Beyond just clicks on the CTA, we saw a 40% increase in actual contact sales submissions, just by changing some of the copy on the form as well.

“We found that the [CTA] that ended up getting the most clicks was “get started.” Which got 129% more clicks.”

At Optimizely we typically recommend being bold and testing big changes when running experiments, because you’re more likely to reach statistical significance. But sometimes even just changing a few words on your site can have a massive impact on your results.

What are some best practices around documenting tests so they can be easily shared and repeated?

Takeshi Young: I’m going to be tooting our own horn a little bit here, but Optimizely has a product called Program Management. It encompasses the whole experimentation process. It’s where you store all of your experiment ideas and record results from experiments. Everything is collected in one easy place so that you can access it later and easily share the learnings. So it’s not just a bunch of Google Docs or a bunch of Microsoft Word documents sitting on your computer. Everything’s organized in one place.

Mimi Rosenheim: I’m a big fan of spreadsheets, like Google Spreadsheets, and pivot tables. A lot of our tests tend to fall into different categories. You can create a data page that has what you tested and some of the parameters. Then you can either filter or create a pivot table. If you don’t have a technology that helps you manage and orchestrate those tests, you can start to create a knowledge base. It helps to think about your tests in terms of repeatable structures. Even if one is a really small CRO test and another is audience testing, they have some commonalities.

Make sure you think about the timing too because what you can then do is say, “well, you know what? That was a year ago, things have changed. Maybe we want to retest that.” Or, “let’s test it in a different season because seasonality might make a difference.” Like end of a quarter versus the beginning of a quarter.

“Make sure you think about the timing too because what you can then do is say, “well, you know what? That was a year ago, things have changed. Maybe we want to retest that.””

Suneet Bhatt: I think the important thing is to pay attention to how your company communicates. So, in the absence of a great tool like Optimizely, pay attention to how your company communicates and focus on getting out your surfboard and riding those communication channels.

I would focus on two things. One, think about what the system of record is for engineering information or requirements, or for that handoff between the business and engineering.

The second thing I would do is evangelize it. Create a Slack channel or send a weekly email with the tests you’ve run. That way you’re making sure you’re storing it in a place that people commonly look at and refer to, and ensuring that gospel spreads across the company.

“Evangelize [your A/B tests]. Create a Slack channel or send a weekly email with the tests you’ve run… so that gospel spreads across the company.”

Mark Bornstein: When you share your victories and can show the difference your tests make, it not only gets people excited, it also gets everybody coming up with their own ideas, and things start to roll. So I’m also a big fan of making sure that you evangelize the results you’re getting from tests.

Elle Woulfe: Yeah, agreed. I think that’s how you start to create a culture of experimentation. I am a big believer that internal communication is as important as external communication. When marketing gets to present at company all-hands, we’ll share test results with the whole company. This way they know that we have a testing culture and it’s something we routinely do. Which I think helps them to feel a lot more confident in marketing’s approach.

When you have a laundry list of things you want to test, how do you prioritize?

Suneet Bhatt: Intercom has a great model and formula that we have adopted at Crazy Egg. It’s easy to understand. It’s called RICE: reach, impact, and confidence are the numerators, and effort is the denominator. Reach is the number of people that you will reach. Impact is some quantifiable metric. Confidence is whether you believe you’re going to be able to do it. Effort is how many days it’s going to take to actually execute the test.
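For readers who want to try this, here is a minimal sketch of the RICE arithmetic Suneet describes. The scoring scales follow Intercom’s published convention, and the backlog items and numbers below are invented for illustration.

```python
def rice_score(reach, impact, confidence, effort_days):
    """RICE prioritization: (reach * impact * confidence) / effort.
    reach: people affected per period; impact: estimated effect size;
    confidence: 0 to 1; effort_days: person-days to build and run the test."""
    return (reach * impact * confidence) / effort_days

# Hypothetical backlog of test ideas
backlog = {
    "New pricing-page CTA copy": rice_score(reach=5000, impact=2, confidence=0.8, effort_days=2),
    "Redesign demo-request form": rice_score(reach=1200, impact=3, confidence=0.5, effort_days=10),
}
for idea, score in sorted(backlog.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{idea}: {score:.0f}")
```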

Jesse Ariss: One thing we haven’t talked about is that if it’s not broke, maybe you don’t need to fix it. It could be an unpopular opinion, but if you’ve got a high-performing page or call to action, you’ll probably want to focus your testing on something else. Even though you might be able to squeeze two or three percent out of it, because it’s an experiment you might actually lose 10%. In terms of choosing where to start, focus on those quick wins on those underperforming CTAs or events.

“In terms of choosing where to start [with A/B testing], focus on those quick wins on those underperforming CTAs or events.”

Mimi Rosenheim: When you’re doing testing, I think it’s really important to make sure that you are focused on that primary goal, like CTR, but also on the downstream impact of it. Make sure you’re focused on that too. Are you delivering on expectations? Are you delighting your customers? That’s what you want to be doing the entire time.

How often should you repeat tests?

Mimi Rosenheim: I don’t do as much time-bound testing as I do audience testing. How is my mid-market audience reacting versus my enterprise audience? Or what is my customer segment doing versus my prospect segment? It’s more important for me to figure out those things than seasonality.

Takeshi Young: We do more sequential testing rather than kind of time-based testing. Rather than just running a test once and then moving on to the next test and then revisiting it six months down the line, we tend to run an experiment. If it doesn’t work, we’ll do a follow-up experiment to test out an alternative hypothesis or try to figure out why the first one didn’t work. If that doesn’t work, we’ll run another test to follow up on that one.

“Rather than just running a test once and then moving on to the next test and then revisiting it six months down the line, we tend to run an experiment.”

Mark Bornstein: We believe in running multiple tests on a single hypothesis. Then we retest that hypothesis over a period of time. It’s not a very long period of time. It’s usually six months at the most.

Are there any results that you can get from a test that are bad results?

Jesse Ariss: I think you folks know the answer to this question. But there is no such thing as a bad result. Any data is going to be valuable data. It’s all about setting the expectation and then seeing if it drives conversion down the line.

“There is no such thing as a bad result. Any data is going to be valuable data.”

Suneet Bhatt: There is such a thing as bad data and bad results. For example, if you have data but you didn’t have a hypothesis against which to measure it, then I think you haven’t really learned. Another example is when you let a test outrun its conclusion. Sometimes people let tests run for the sake of statistical significance, and I think there are times when you don’t necessarily need to do that. When there are obvious and clear winners, you can accelerate testing by not worrying so much about that.

Mimi Rosenheim: I think there’s no such thing as bad data. I think there are poorly constructed tests. And I think there are people with an agenda who interpret the results poorly. It’s incumbent upon marketers to make sure that we are very diligent about understanding what the tests were, how they worked, and whether there was a little bit of editorialization in presenting the data. You can pretty much make data say anything you want it to. If you’re a good Jedi storyteller.

“It’s incumbent upon marketers to make sure that we are very diligent… You can pretty much make data say anything you want it to. If you’re a good Jedi storyteller.”

What’s the biggest mistake people often make with A/B testing?

Jesse Ariss: The biggest mistake would be testing too many things at a time; we know that. When we say test one thing at a time, that doesn’t necessarily mean testing only one piece of CTA copy. As I mentioned, you can run an ABCDEFG test. But as long as you’re testing just that one variable, you’re fine.

Mimi Rosenheim: Not knowing what question you’re trying to answer and what problem you’re trying to solve with the test. Because if you don’t understand that, not only will you not know what variables to test, but you may not have the right instrumentation set up to get your results.

Mark Bornstein: I think the biggest mistake people make is making broad assumptions based on a single idea. And testing a single idea in a single way. You need to really think about all the different factors that go into what could be any sort of answer.

“I think the biggest mistake people make is making broad assumptions based on a single idea.”

Takeshi Young: One big mistake that I see often is just not running experiments that are big enough. Obviously, when we look at market leaders like Google and Amazon, those companies can change the color of their links and see statistically significant results. But for most of us who aren’t getting hundreds of millions of visitors a day, especially in the B2B space, you need to make bigger changes in order to see statistical significance. I would say test bigger things.

“For most of us who aren’t getting hundreds of millions of visitors a day, especially in the B2B space, you need to make bigger changes in order to see statistical significance. I would say test bigger things.”
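Takeshi’s point can be made concrete with the standard sample-size approximation for comparing two conversion rates. This back-of-the-envelope sketch is not from the panel; the 3% baseline rate and the lifts are made up, but they show how much more traffic a small lift demands.

```python
import math

def visitors_per_variant(baseline_rate, relative_lift, z_alpha=1.96, z_power=0.84):
    """Approximate visitors needed per variant to detect a relative lift in a
    conversion rate at ~95% confidence and ~80% power (normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical 3% baseline conversion rate
print(visitors_per_variant(0.03, 0.05))  # 5% relative lift: roughly 200,000 visitors per variant
print(visitors_per_variant(0.03, 0.50))  # 50% relative lift: roughly 2,500 visitors per variant
```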

Suneet Bhatt: The biggest problem I see is that people don’t ask or start with the right question. If you make the test about the right question, which is founded in the right data, founded in the right hypothesis, I think the experiments flow naturally. They become “the what” to that “why.” 

Have any questions about A/B testing? Ask them in the comments below!