Thursday, February 14, 2013

Upgrade To The Next Generation Mobile App Analytics Platform

Spring is right around the corner, and it's never a bad idea to get a head start on your spring cleaning! The same goes for your mobile app analytics solution. Have you switched over to our new Mobile App Analytics Platform (v2) yet? The benefits are numerous:

A more powerful mobile SDK

  • We are providing a new mobile app analytics solution, solving the problem that there is currently no single repository to understand end-to-end value of mobile app users. This is supported by a more powerful mobile SDK (v2.0) that is easy to implement.

“One stop shop” for app measurement

  • Understanding app performance holistically through acquisition, engagement and outcome is critical to improve mobile app results, optimize user engagement and increase revenue generated. Our new reports show the entire lifecycle. 

Improve ROI and engagement

  • App developers and brands can make better, more comprehensive data-driven decisions for mobile investments with better reports. For example, marketers can optimize their mobile programs to improve ROI and app developers can improve in-app engagement.  

Though we call it “Version 2,” the truth is that we didn’t just “upgrade” our original platform. We decided to rebuild the whole thing from scratch. Our team at Google Analytics has reimagined mobile app analytics and created a brand new experience tailored specifically for mobile app developers, providing reports on the data you care most about, in the language you understand. In addition, we completely rebuilt our Android and iOS SDKs to be lighter, faster, and more efficient.

We’re continuing to build and add features to this platform all the time. So if you haven’t migrated yet, now is the perfect time to do it and find out exactly what you're missing out on. 

To make it easy to migrate, we’ve put together a Migration Guide for Android and iOS to help you make the move.

And if you’re new to Mobile App Analytics, check out the Getting Started Guide for Android or iOS to get your app up and running with Google Analytics in minutes.

Posted by Calvin Lee, Google Analytics Team

Wednesday, February 13, 2013

Multi-Currency E-Commerce Support In Google Analytics

We’ve listened to your feedback and have heard the requests loud and clear: E-Commerce should support multiple currencies. We’re pleased to announce the launch of this feature which will be rolling out to all users over the next few weeks. 

Multi-Currency support for eCommerce provides Google Analytics users with the ability to track transaction metrics (total revenue, tax, and shipping & handling) in multiple local currencies within a single web property. Google Analytics then converts them into a single currency based on your profile setting. This provides key benefits for e-commerce brands looking to conduct analysis across an international customer base and helps make some previously complex reporting easier.

New metrics supported in multi-currency

Multi-Currency tracking code implementation:

Multi-currency is supported by web tracking and the Android SDK (iOS SDK support is coming soon).

The ‘currency code’ is a global setting that can be set via the tracker’s ‘_set()’ method. It only needs to be set once, unless the page is sending multiple transactions in separate currencies.
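As a rough illustration for web tracking, the sketch below assumes the classic asynchronous ga.js snippet and its `_gaq` command queue. The order values are made up; only the `_set`, `_addTrans` and `_trackTrans` commands come from ga.js itself.

```javascript
// ga.js asynchronous command queue; the standard snippet defines it the same way.
var _gaq = _gaq || [];

// Set the currency once; transactions pushed afterwards are recorded in it.
_gaq.push(['_set', 'currencyCode', 'EUR']);

// Hypothetical order values, for illustration only.
_gaq.push(['_addTrans',
  '1234',          // order ID
  'Example Store', // affiliation
  '28.50',         // total, in EUR
  '1.50',          // tax
  '5.00',          // shipping
  'Berlin',        // city
  'Berlin',        // state or province
  'Germany'        // country
]);
_gaq.push(['_trackTrans']); // send the transaction
```

If a single page reports transactions in different currencies, the same `_set` command would be pushed again before each transaction that uses a different code.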

Here are a few other things we think you’ll want to know:

How is the conversion rate decided?

The conversion rate is pulled from the currency server that serves Google Billing. The value used is the daily exchange rate for the day before the hit date. See a technical overview for additional information.
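The "day before the hit date" rule can be sketched as follows. This is only an illustration of the lookup logic, not how Google Analytics is implemented: the function names, the shape of the rate table and the rate itself are all invented.

```javascript
// Return the calendar day before an ISO date string such as '2013-03-01'.
function previousDay(isoDate) {
  var date = new Date(isoDate + 'T00:00:00Z');
  date.setUTCDate(date.getUTCDate() - 1);
  return date.toISOString().slice(0, 10);
}

// Convert a local-currency amount into the profile currency using the
// rate recorded for the day before the hit.
function toProfileCurrency(amount, currency, hitDate, dailyRates) {
  var rateDay = previousDay(hitDate);
  var rates = dailyRates[rateDay];
  if (!rates || rates[currency] === undefined) {
    throw new Error('No ' + currency + ' rate for ' + rateDay);
  }
  return amount * rates[currency];
}

// Made-up rate: profile-currency units per one unit of local currency.
var exampleRates = { '2013-02-13': { EUR: 1.25 } };
var converted = toProfileCurrency(100, 'EUR', '2013-02-14', exampleRates);
```

Here a hit on February 14 is converted with the February 13 rate.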

Which currencies does GA support?

We support the currencies that are available in the GA profile currency dropdown menu. Right now, 31 currencies are supported.

Currency dropdown menu

Which currency code should I use?

A full list of the currency codes shared across Google products is available on the Google Developers site.

Can I retro-process my historical transaction data?

Local and global values are both available only from the day you start using multi-currency support. 

Several companies have already started using Multi-Currency in Google Analytics and are seeing great results. One of our Certified Partners, Blast Analytics & Marketing, helped their client implement this feature. David Tjen, Director of Analytics at reports:

"Google Analytics' new multi-currency feature increases sales metric accuracy for As an international brand, the AllPosters family of sites supports 20 currencies across 25 countries. Previously, manual adjustments were required before we could read sales metrics in Google Analytics when we had transactions with large currency conversion ratios to the US dollar, such as the Mexican Peso and Japanese Yen. The simple code update solves the issue by automatically converting all transactions to the primary currency on each site, providing sales metrics that allow us to make faster decisions with our web analytics data.”

To get started today, view our help center page with detailed instructions on how to begin.

Posted by Wayne Xu, Google Analytics team

Monday, February 11, 2013

Verify Your Measurement Setup With Tag Assistant

Google Analytics is, at its core, a simple and powerful tool. But once you start to customize the code to take advantage of all the flexibility available you may find yourself needing some help troubleshooting a nagging issue. 

A new Chrome Extension created by engineers here at Google aims to make troubleshooting tag installs much easier. Tag Assistant highlights errors and warnings and provides useful suggestions for Google's most widely adopted tags, including Google Analytics, Google Tag Manager, AdWords Conversion Tracking, the new Remarketing Tag, Trusted Stores and Floodlight. 

After installing the extension, Tag Assistant will alert you if tags are found on any page you are currently browsing. For each tag we will tell you if it appears to be working or if we notice any problems with your implementation. Tag Assistant will even make recommendations on how to improve your installation if we notice possible optimizations. For example, if you have two or more tags implemented separately, we might suggest that you migrate to Google Tag Manager instead. 

How does it work? Tag Assistant looks for errors in two different ways. First, we check the source code for common errors like forgetting to include a closing </script> tag. Second, we review the HTTP headers to ensure that we are getting the expected responses. 
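To give a feel for the first kind of check, here is a deliberately naive sketch that compares the number of opening and closing script tags in a page's source. It is not Tag Assistant's actual logic, and the function name is invented for this example.

```javascript
// Count opening and closing script tags in a page's HTML source.
// Naive on purpose: a string literal containing "<script" would be miscounted.
function findUnclosedScripts(html) {
  var opened = (html.match(/<script\b/gi) || []).length;
  var closed = (html.match(/<\/script\s*>/gi) || []).length;
  return { opened: opened, closed: closed, balanced: opened === closed };
}

// The second tag below is never closed, so the page is flagged as unbalanced.
var report = findUnclosedScripts(
  '<script>var a = 1;</script><script src="tracker.js">'
);
```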

Since launching in October of 2012 we have collected a lot of your feedback and have added dozens of new checks. Over the course of the year we will be adding more checks that will make the Tag Assistant more accurate and helpful. 

We encourage you to try it out for yourself by installing it via the Chrome Web Store. If you have feedback on new checks to add or if you have questions about the tool, join our Google+ community where our team and users can help you out.

Posted by Geoff Pitchford, Google Tag Assistant PM

Friday, February 8, 2013

Google Tag Manager: Implementation webinar video, cheat-sheet, and Q&A

Last Tuesday, we held a webinar on the technical implementation of Google Tag Manager, a free tool that makes it easy for marketers to add and update website tags, freeing up webmaster time while providing users with more reliable data and insights. This technical session includes a more in-depth look than our introductory webinar, illustrating how the product operates in a live environment and showing how flexible Google Tag Manager is for enterprise systems.

Watch the webinar video here for:

  • Step-by-step implementation process + live product demo

  • Advanced use cases, including the Data Layer API

  • Best practices and common pitfalls

And don’t forget to download our handy implementation Cheat-Sheet, which outlines each of the steps involved in migrating onto Google Tag Manager.

Click here to download the Implementation Cheat-Sheet:

And as usual, we like to provide a recap of some of the top questions we received during the webinar. Please note that this webinar is intended for technical audiences, so some of the Q&A below gets into the nitty-gritty technical details. If you’re less experienced technically, we invite you to check out our forum or reach out to one of our certified partners for implementation assistance.

Questions and Answers

Where can I find more detailed information about all of this stuff?

In addition to the walkthrough we provide in the webinar and our Cheat-Sheet, you can find a detailed description of the implementation process in the Google Developer docs, and helpful articles about how to use the Google Tag Manager user interface in our Help Center, including some notes about what to think about before you begin implementing. And as noted above, if you still have questions, check out our forum or reach out to one of our certified partners for implementation assistance.

Where can I place the GTM snippet? Can I put it in <head>? Does placing it in the footer have any adverse effects? Can I place the data layer in <head>?

The recommended location for the GTM snippet is just after the opening <body> tag. The only exception to this would be in the case where you want to declare page-level metadata by declaring the data layer immediately above the GTM snippet.

The GTM snippet can be deployed later in the page, like the footer, but doing so increases the time before the snippet loads. This can cause incremental amounts of data loss, since the user could navigate away before all your tags finish loading.

We do not recommend placing the GTM snippet in <head>, because the snippet contains an <iframe> for the <noscript> case. Iframes are not officially supported in <head> by any browser and might cause unexpected behavior.
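For the page-level metadata case mentioned above, the data layer is a plain JavaScript array declared before the container snippet runs. A minimal sketch follows; the key names and values are hypothetical.

```javascript
// Declared immediately above the GTM container snippet so the values are
// available as soon as the container loads. Key names are hypothetical.
var dataLayer = [{
  pageCategory: 'signup',
  visitorType: 'high-value'
}];
```

Macros in the container can then read `pageCategory` and `visitorType` without any further changes to the page.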

What should I do about collecting macros and tagging events if I don’t have access to my client’s site or if IT is too busy?

If you can’t access values on the page via the data layer, there are several different Macro types to help you capture data without needing a code change. These include DOM element, DOM attribute, and JS variable macros. Simply input the ID or variable names, and the macro will pull out the data for you. NOTE: If you go this route, you may want to accompany the tag being fired with an “{{event}} equals gtm.dom” rule. This makes sure the element has loaded in the page before you request it, so you don’t get an undefined macro value.

If you're trying to add events to the page, currently this requires code changes. We're working on a solution that doesn't need code changes, but in the meantime we've heard of a couple of folks using the Custom HTML template to inject the dataLayer.push() API into relevant parts of the page. We can’t guarantee this as a solution due to the asynchronous nature of tag loading in Google Tag Manager, but we have heard some success stories.
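A minimal sketch of the `dataLayer.push()` call referred to above; the handler name, event name and extra key are invented for illustration.

```javascript
// Reuse the page's data layer if it exists, otherwise start a new one.
var dataLayer = dataLayer || [];

// Hypothetical handler; wire it to whatever interaction should fire tags.
function onNewsletterSignup(email) {
  dataLayer.push({
    event: 'newsletterSignup',  // a firing rule can match on this event name
    hasEmail: Boolean(email)    // extra value that a macro could read
  });
}

onNewsletterSignup('reader@example.com');
```

A tag whose firing rule is “{{event}} equals newsletterSignup” would then fire each time the handler runs.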

How do I do cross-domain tracking in Google Analytics using Google Tag Manager?

It's now possible to do cross-domain tracking in GA using the custom HTML template and a new track type within the Google Analytics tag template. We've got some exciting things in the works here to make cross-domain tracking even easier; stay tuned for more soon.

Do you have any account and container setup best practices? What if I’m an agency? What if I have separate sites for mobile and desktop?

In general, an account should be owned by a single advertiser or publisher. Within each account, there can be multiple containers, and containers should be split according to how the site or sites are managed. For instance, if separate marketing teams manage different countries, and therefore probably use different tag vendors, then there should be a separate container per country. If you have a mobile site and a desktop site that use the same tags across both subdomains, then you should probably only use a single container. We have found that one container per domain is pretty standard, but there are always different situations that call for a different setup.

If you’re an agency, we strongly recommend that your client create the initial Google Tag Manager account and container and then add you to the container. Google Tag Manager includes user permissions controls as well as multi-account access to make it easier for agencies and clients to work together.

Are all tags with document.write off limits? Are there any workarounds?

Most tags that utilize document.write are just trying to construct an image pixel with dynamic parameters using JavaScript. Luckily, our Custom Image Tag allows you to construct an image pixel with dynamic parameters. Look at the tag you’re trying to add, pick out the URL, paste it into the Image URL field, and then add any dynamic variables by using the {{macro}} syntax. See the live demo in the webinar video above for an example of how to do this.

Do not add tags that contain document.write in either the initial snippet or in any linked JavaScript. Doing so will cause undesirable effects.
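To see why the Custom Image Tag is usually enough, it helps to look at what a typical document.write pixel actually assembles: a URL with a few dynamic query parameters. The sketch below builds such a URL; the endpoint and parameter names are invented, and in Google Tag Manager the same values would be supplied through {{macro}} placeholders rather than code.

```javascript
// Build an image-pixel URL from a base URL and a map of dynamic parameters.
function buildPixelUrl(baseUrl, params) {
  var query = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
  return baseUrl + '?' + query;
}

// Hypothetical vendor endpoint and values.
var pixelUrl = buildPixelUrl('https://pixel.example.com/track', {
  orderId: 'A 1',
  total: 9.5
});
```

The resulting string is what would go in the Image URL field, with each fixed value replaced by the macro that supplies it.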

How do Google Analytics events differ from Google Tag Manager events?

An event in Google Tag Manager simply signals a point at which other tags could fire. It does not collect any data itself. GTM events are used in tag firing rules to trigger other tags.

Google Analytics events are actually data events, and can be set up in Google Tag Manager via the Google Analytics template, tracking type “Event”. This tag sends data to Google Analytics to be reported on within the Google Analytics interface.


We hope the webinar and Q&A will help you implement Google Tag Manager smoothly and easily -- many businesses, including GoPro, are already enjoying easier tagging. Keep watching this blog for more tips and tricks!

Monday, February 4, 2013

Win moments that matter in 2013 with Learn with Google webinars

A version of the following post originally appeared on the Inside AdWords Blog.

What was your business’ New Year’s resolution, and how do you plan to keep it? At Google, ours is to help make the web work for you. Our new series of Learn with Google webinars will teach you how to use digital to build brand awareness and give you the tools you need to drive sales. By tapping into technology that works together across your business needs, you can resolve to win moments that matter in 2013.

Check out our upcoming live webinars:

Build Awareness

02/12 [Multiscreen] Brand Building in a Multiscreen World

02/20 [YouTube] How to Build your Business with YouTube Video Ads

03/05 [Social] How to Use Google+ and Make Social Work for You

03/12 [Mobile] Understanding Mobile Ads Across Marketing Objectives

03/27 [Wildfire by Google] The Call for Converged Media

Drive Sales

02/07 [Search] Your Shelf Space on Google: Get Started with Google Shopping

02/26 [YouTube] From Awareness to Sales: Making the Most of Video Remarketing

02/27 [Search] What's New and Next in AdWords

03/06 [Display] Biggest Loser: Digital Ad Spend Edition

03/13 [Mobile] The Full Value of Mobile

03/20 [Display] Getting Started with Dynamic Remarketing

Visit our webinar site to register for any of the sessions and to access past webinars on-demand. You can also stay up-to-date on the schedule by adding our Learn with Google Webinar calendar to your own Google calendar to automatically see upcoming webinars.

During our last series of webinars, attendees had the chance to win a Nexus 7. Our lucky winner was Donella Cohen, who is happily enjoying her new tablet. Check out our upcoming webinars for another chance to win!

Learn with Google is a program to help businesses succeed through winning moments that matter, enabling better decisions and constantly innovating. We hope that you’ll use these best practices and how-to’s to maximize the impact of digital and grow your business. We’re looking forward to seeing you at an upcoming session!

Posted by Erin Molnar, Learn With Google

Friday, February 1, 2013

Optimize Your Website with SiteApps and GA

Google Analytics excels at collecting an incredible amount of information about how visitors interact with the web and mobile properties of its users. This data provides marketers and analysts who know what they’re looking for with an incredibly powerful platform to understand what’s working and what’s not. To those who aren’t sure what they’re looking for though, all of this information can be overwhelming and make it easy to take no action at all.

SiteApps enables businesses to get instantaneous, free recommendations on how to optimize their website based on their Google Analytics data. SiteApps’ technology runs hundreds of automated analyses on its customers’ web data to identify opportunities for improvement. Based on these tailored recommendations, SiteApps then enables businesses to install apps from their marketplace to help solve these problems.

One of SiteApps’ customers is a family-owned home furnishings designer that was having difficulty maintaining their eCommerce presence while still focusing on the day-to-day operations of their brick and mortar retail store. Within minutes of signing up for SiteApps, they were able to identify dozens of opportunities for site optimization. By installing the apps that were recommended to them, they were able to create a compelling web presence that increased their conversion rate by 108% and led to 65% more time spent on site by visitors. This led to a substantial increase in revenue for the business simply by unlocking the power of their web analytics data.

“Our business is completely based on data. It’s incredibly important to us that customers know - or learn - just how valuable their data is,” says Phillip Klien, co-founder of SiteApps. “We consider Google Analytics the foundation for our platform and use the results to help customers make the most of the data their website produces.”

SiteApps is free to try and takes a matter of minutes to set up. Give it a try today to see what you can uncover from your web analytics.

Posted by the Google Analytics team

Monday, January 28, 2013

Dashboards, Advanced Segments, And Custom Reports For Your Business Needs

We’ve heard you loud and clear that getting started on Google Analytics can be challenging. It’s such a robust tool with a variety of reports, filters, and customizations that for a new user it can be overwhelming to figure out where to look first for the data and insights that will enable you to make better decisions. For more advanced users it can be time consuming to build out different variations of reports and dashboards to get the clearest snapshot of your performance. That is why we’ve created the Google Analytics Solution Gallery.

The Google Analytics Solution Gallery hosts the top Dashboards, Advanced Segments and Custom Reports, which you can quickly and easily import into your own account to see how your website is performing on key metrics. It helps you to filter through the noise to see the metrics that matter for your type of business: Ecommerce, Brand, Content Publishers. If you're not familiar with Dashboards, Advanced Segments and Custom Reports, check out these links to our help center for detailed descriptions on how they work and the insights they can help provide.

Solution examples

Here are a few examples of the solutions that you can download into your account to see how the analysis works with your data.

  • Social sharing report - Content is king, but only if you know what it's up to. Learn what content from your website visitors are sharing and how they're sharing it. 

  • Publisher dashboard - Bloggers can use this dashboard to see where readers come from and what they do on the site.

  • Engaged traffic advanced segment - Measure traffic from high-value visitors who view at least three pages AND spend more than three minutes on your site. Why do these people love your site? Find out!

How do I add these to my account?

We’ve designed it so it’s easy to get started. Simply go to the Google Analytics Solution Gallery and pick from the drop-down menu the solutions that will be most helpful for your business. Select from Publisher, Ecommerce, Social, Mobile, Brand, etc. Hit “Download” for the solution you want to see in your account. If you are not already logged into Google Analytics, we’ll ask you to sign in. Then you’ll be asked whether you want to accept this solution into your account and which Web Profile you want to apply it to. After you select that, it will be in your account and your own data will populate the report.

We’re planning on expanding on this list of top solutions throughout the year so be sure to check back and see what we’ve added. A big thank you to Justin Cutroni & Avinash Kaushik for supplying many of the solutions currently available.

Posted by Ian Myszenski, Google Analytics team

Friday, January 25, 2013

Digital Analytics Association Awards Are Back

It’s that time of year again - award season. No, not Hollywood awards, Digital Analytics awards! 

The Digital Analytics Association has announced its list of nominees for the DAA Awards of Excellence. These awards celebrate the outstanding contribution to our profession of individuals, agencies, vendors and practitioners.

This year we’re honored to be nominated for two awards.

Google Tag Manager has been nominated for New Technology of the Year. Launched in October 2012, Google Tag Manager has helped many companies simplify the tag management process.

Google, as an organization, has been nominated in the category Agency/Vendor of the year. 

We’re incredibly humbled by these nominations - thank you. Our goal is to provide all businesses with the ability to improve their performance using data. We’re excited to be part of this community and we look forward to an even more amazing future.

In addition, a few Googlers have been nominated for individual awards:

Eduardo Cereto Carvalho and Krista Seiden have been nominated for Digital Analytics Rising Star.

Our Analytics Advocate, Justin Cutroni, and our Digital Marketing Evangelist, Avinash Kaushik, who travel the world sharing Analytics love, have each been nominated as Most Influential Industry Contributor (individual).

If you’re a DAA member make sure you vote by February 6. Winners will be announced at the 2013 DAA Gala in San Francisco on April 16. Tickets are available now.

Posted by the Google Analytics Team

finds that traditional conversion tracking significantly undervalues non-brand search

The following post originally appeared on the Inside AdWords Blog.

Understanding the true impact of advertising

Advertisers have a fundamental need to understand the effectiveness of their advertising. Unfortunately, determining the true impact of advertising on consumer behavior is deceptively difficult. This difficulty in measurement is especially applicable to advertising on non-brand (i.e. generic) search terms, where ROI may be driven indirectly over multiple interactions that include downstream brand search activities. Advertising effectiveness is often estimated using standard tracking processes that rely upon ‘Last Click’ attribution. However, ‘Last Click’ based tracking can significantly underestimate the true value of non-brand search advertising. This fact was recently demonstrated by, a leading travel brand, using a randomized experiment - the most rigorous method of measurement.

Experimental Approach

recently conducted an online geo-experiment to measure the effectiveness of their non-brand search advertising on Google AdWords. The study included offline and online conversions. The analysis used a mathematical model to account for seasonality and city-level differences in sales. Cities were randomly assigned to either a test or a control group. The test group received non-brand search advertising during the 12 week test period, while the control group did not receive such advertising during the same period. The benefit of this approach is that it allows statements to be made regarding the causal relationship between non-brand search advertising and the volume of conversions - the real impact of the marketing spend.

Download the full case study here.


The results of the experiment indicate that the overall effectiveness of the non-brand search advertising is 43% greater1 than the estimate generated by’s standard online tracking system.

The true impact of the non-brand search advertising is significantly larger than the ‘Last Click’ estimate because it accounts for

  • upper funnel changes in user behavior that are not visible to a ‘Last Click’ tracking system, and

  • the impact of non-brand search on sales from online and offline channels.

This improved understanding of the true value of non-brand search advertising has given the opportunity to revise their marketing strategy and make better budgeting decisions.

How can you benefit?

As proven by this study, ‘Last Click’ measurement can significantly understate the true effectiveness of search advertising. Advertisers should look to assess the performance of non-brand terms using additional metrics beyond ‘Last Click’ conversions. For example, advertisers should review the new first click conversions and assist metrics available in AdWords and Google Analytics. Ideally, advertisers will design and carry out experiments of their own to understand how non-brand search works to drive sales.

Read more about AdWords Search Funnels

Read more about Google Analytics Multi-Channel Funnels

-- Anish Acharya, Industry Analyst, Google; Stefan F. Schnabl, Product Manager, Google; Gabriel Hughes, Head of Attribution, Google; and Jon Vaver, Senior Quantitative Analyst, Google contributed to this report.

1 This result has a 95% Bayesian confidence interval of [1.17, 1.66].

Posted by Sara Jablon Moked, Google Analytics Team

Thursday, January 24, 2013

Increasing Your Analytics Productivity With UI Improvements

We’re always working on making Analytics easier for you to use. Since launching the latest version of Google Analytics (v5), we’ve been collecting qualitative and quantitative feedback from our users in order to improve the experience. Below is a summary of the latest updates. Some you may already be using, but all will be available shortly if you’re not seeing them yet. 

Make your dashboards better with new widgets and layout options

Use maps, devices and bar chart widgets to create a perfectly tailored dashboard for your audience. Get creative with these and produce, share and export custom dashboards that look exactly how you want, with the metrics that matter to you. We have also introduced improvements that let you customize the layout of your dashboards to better suit individual needs. In addition, dashboards now support advanced segments!

Get to your most frequently used reports quicker

You’ll notice we’ve made the sidebar of Google Analytics even more user-friendly, including quick access to your all-important shortcuts:

If you’re not already creating Shortcuts, read more about them and get started today. We have also enabled shortcuts for real-time reports, which allows you to set up a specific region to see its traffic in real-time, for example.

Navigate to recently used reports and profiles quicker with Recent History

Ever browse around Analytics and want to go back to a previous report? Instead of digging for the report, we’ve made it even simpler when you use Recent History.

Improving search functionality

Better Search allows you to search across all reports, shortcuts and dashboards all at once to find what you need.

Keyboard shortcuts

In case you've never seen them, Google Analytics does have some keyboard shortcuts. Be sure you’re using them to move around faster. Here are a few useful ones:

Search: s , / (Access to the quick search list)

Account List: Shift + a (access to the quick account list)

Set date range: d + t (set the date range to today)

On screen guide: Shift + ? (view the complete list of shortcuts)

Easier YoY Date Comparison

The new quick selection option lets you choose the previous year to prefill the date range, making year-over-year analysis faster.

Export to Excel & Google Docs 

Exporting keeps getting better, and now includes native Excel XLSX support and Google Docs:

We hope you find these improvements useful. As always, feel free to let us know how we can make Analytics even more usable so you can get the information you need and take action faster.

Posted by Nikhil Roy, Google Analytics Team

Wednesday, January 23, 2013

Multi-armed Bandit Experiments

This article describes the statistical engine behind Google Analytics Content Experiments. Google Analytics uses a multi-armed bandit approach to managing online experiments. A multi-armed bandit is a type of experiment where:

  • The goal is to find the best or most profitable action

  • The randomization distribution can be updated as the experiment progresses

The name "multi-armed bandit" describes a hypothetical experiment where you face several slot machines ("one-armed bandits") with potentially different expected payouts. You want to find the slot machine with the best payout rate, but you also want to maximize your winnings. The fundamental tension is between "exploiting" arms that have performed well in the past and "exploring" new or seemingly inferior arms in case they might perform even better. There are highly developed mathematical models for managing the bandit problem, which we use in Google Analytics content experiments.

This document starts with some general background on the use of multi-armed bandits in Analytics. Then it presents two examples of simulated experiments run using our multi-armed bandit algorithm. It then addresses some frequently asked questions, and concludes with an appendix describing technical computational and theoretical details.


How bandits work

Twice per day, we take a fresh look at your experiment to see how each of the variations has performed, and we adjust the fraction of traffic that each variation will receive going forward. A variation that appears to be doing well gets more traffic, and a variation that is clearly underperforming gets less. The adjustments we make are based on a statistical formula (see the appendix if you want details) that considers sample size and performance metrics together, so we can be confident that we’re adjusting for real performance differences and not just random chance. As the experiment progresses, we learn more and more about the relative payoffs, and so do a better job in choosing good variations.


Experiments based on multi-armed bandits are typically much more efficient than "classical" A-B experiments based on statistical-hypothesis testing. They’re just as statistically valid, and in many circumstances they can produce answers far more quickly. They’re more efficient because they move traffic towards winning variations gradually, instead of forcing you to wait for a "final answer" at the end of an experiment. They’re faster because samples that would have gone to obviously inferior variations can be assigned to potential winners. The extra data collected on the high-performing variations can help separate the "good" arms from the "best" ones more quickly.

Basically, bandits make experiments more efficient, so you can try more of them. You can also allocate a larger fraction of your traffic to your experiments, because traffic will be automatically steered to better performing pages.


A simple A/B test

Suppose you’ve got a conversion rate of 4% on your site. You experiment with a new version of the site that actually generates conversions 5% of the time. You don’t know the true conversion rates of course, which is why you’re experimenting, but let’s suppose you’d like your experiment to be able to detect a 5% conversion rate as statistically significant with 95% probability. A standard power calculation1 tells you that you need 22,330 observations (11,165 in each arm) to have a 95% chance of detecting a .04 to .05 shift in conversion rates. Suppose you get 100 visits per day to the experiment, so the experiment will take 223 days to complete. In a standard experiment you wait 223 days, run the hypothesis test, and get your answer.
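The sample size above can be reproduced approximately with the textbook formula for comparing two proportions. The sketch below assumes a two-sided 5% significance level, which the text does not state explicitly, and hard-codes the two normal quantiles it needs; the function and variable names are chosen for this example.

```javascript
// Standard normal quantiles, hard-coded to keep the sketch dependency-free.
var Z_ALPHA = 1.959964; // two-sided 5% significance level (97.5th percentile)
var Z_BETA = 1.644854;  // 95% power (95th percentile)

// Approximate sample size per arm for detecting a shift from p1 to p2.
function sampleSizePerArm(p1, p2) {
  var pBar = (p1 + p2) / 2;
  var nullTerm = Z_ALPHA * Math.sqrt(2 * pBar * (1 - pBar));
  var altTerm = Z_BETA * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.pow(nullTerm + altTerm, 2) / Math.pow(p1 - p2, 2);
}

var perArm = sampleSizePerArm(0.04, 0.05);
var days = Math.floor((2 * perArm) / 100); // both arms, at 100 visits per day
```

Under these assumptions `perArm` lands within a visit or two of the 11,165 quoted above, and `days` matches the 223-day duration.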

Now let’s manage the 100 visits each day through the multi-armed bandit. On the first day about 50 visits are assigned to each arm, and we look at the results. We use Bayes' theorem to compute the probability that the variation is better than the original². One minus this number is the probability that the original is better. Let’s suppose the original got really lucky on the first day, and it appears to have a 70% chance of being superior. Then we assign it 70% of the traffic on the second day, and the variation gets 30%. At the end of the second day we accumulate all the traffic we’ve seen so far (over both days), and recompute the probability that each arm is best. That gives us the serving weights for day 3. We repeat this process until a set of stopping rules has been satisfied (we’ll say more about stopping rules below).
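The daily update can be sketched in a few lines of Python. This is an illustration only: it assumes a uniform Beta(1, 1) prior on each arm's conversion rate and estimates the probabilities by Monte Carlo, which may differ from the calculation described in the appendix. The function name and the day-1 counts are hypothetical.

```python
import random


def prob_each_arm_is_best(conversions, visits, draws=20000, seed=0):
    """Estimate, for each arm, the posterior probability that it is the best.

    Assumes independent Beta(1 + conversions, 1 + non-conversions)
    posteriors, i.e. a uniform prior (an assumption of this sketch).
    """
    rng = random.Random(seed)
    wins = [0] * len(visits)
    for _ in range(draws):
        # One plausible conversion rate per arm, drawn from its posterior.
        sampled = [
            rng.betavariate(1 + c, 1 + (n - c))
            for c, n in zip(conversions, visits)
        ]
        wins[sampled.index(max(sampled))] += 1
    return [w / draws for w in wins]


# Hypothetical day-1 results: the original (arm 0) had a slightly luckier day.
weights = prob_each_arm_is_best(conversions=[3, 2], visits=[50, 50])
print("serving weights for day 2:", weights)
```

The returned probabilities sum to one, so they can be used directly as the next day's traffic split. On later days you would pass in the cumulative counts rather than a single day's.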

Figure 1 shows a simulation of what can happen with this setup. In it, you can see the serving weights for the original (the black line) and the variation (the red dotted line), essentially alternating back and forth until the variation eventually crosses the line of 95% confidence. (The two percentages must add to 100%, so when one goes up the other goes down.) The experiment finished in 66 days, so it saved you 157 days of testing.

Figure 1. A simulation of the optimal arm probabilities for a simple two-armed experiment. These weights give the fraction of the traffic allocated to each arm on each day.

Of course this is just one example. We re-ran the simulation 500 times to see how well the bandit fares in repeated sampling. The distribution of results is shown in Figure 2. On average the test ended 175 days sooner than the classical test based on the power calculation. The average savings was 97.5 conversions.

Figure 2. The distributions of the amount of time saved and the number of conversions saved vs. a classical experiment planned by a power calculation. Assumes an original with 4% CvR and a variation with 5% CvR.

But what about statistical validity? If we’re using less data, doesn’t that mean we’re increasing the error rate? Not really. Out of the 500 experiments shown above, the bandit found the correct arm in 482 of them. That’s 96.4%, or an error rate of 3.6%, which is about the same as the classical test’s. There were a few experiments where the bandit actually took longer than the power analysis suggested, but only in about 1% of the cases (5 out of 500).

We also ran the opposite experiment, where the original had a 5% success rate and the variation had 4%. The results were essentially symmetric. Again the bandit found the correct arm 482 times out of 500. The average time saved relative to the classical experiment was 171.8 days, and the average number of conversions saved was 98.7.

Stopping the experiment

By default, we force the bandit to run for at least two weeks. After that, we keep track of two metrics.

The first is the probability that each variation beats the original. If we’re 95% sure that a variation beats the original then Google Analytics declares that a winner has been found. Both the two-week minimum duration and the 95% confidence level can be adjusted by the user.

The second metric that we monitor is the "potential value remaining in the experiment", which is particularly useful when there are multiple arms. At any point in the experiment there is a "champion" arm believed to be the best. If the experiment ended "now", the champion is the arm you would choose. The "value remaining" in an experiment is the amount of increased conversion rate you could get by switching away from the champion. The whole point of experimenting is to search for this value. If you’re 100% sure that the champion is the best arm, then there is no value remaining in the experiment, and thus no point in experimenting. But if you’re only 70% sure that an arm is optimal, then there is a 30% chance that another arm is better, and we can use Bayes’ rule to work out the distribution of how much better it is. (See the appendix for computational details.)

Google Analytics ends the experiment when there’s at least a 95% probability that the value remaining in the experiment is less than 1% of the champion’s conversion rate. That’s a 1% improvement, not a one percentage point improvement. So if the best arm has a conversion rate of 4%, then we end the experiment if the value remaining in the experiment is less than .04 percentage points of CvR.
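Here is a rough Python sketch of that stopping check. It is an illustration rather than the production calculation: it assumes uniform Beta(1, 1) priors, picks the champion as the arm that wins the most posterior draws, and reads "95% probability" as the 95th percentile of the simulated value remaining. The function name and example counts are hypothetical.

```python
import random


def value_remaining_check(conversions, visits, draws=5000, seed=0,
                          threshold=0.01, confidence=0.95):
    """Return (champion index, value-remaining percentile, stop flag)."""
    rng = random.Random(seed)
    # Each row is one joint draw of conversion rates from the posteriors.
    samples = [
        [rng.betavariate(1 + c, 1 + (n - c)) for c, n in zip(conversions, visits)]
        for _ in range(draws)
    ]
    # Champion: the arm most often best across the draws.
    wins = [0] * len(visits)
    for s in samples:
        wins[s.index(max(s))] += 1
    champion = wins.index(max(wins))
    # Value remaining, relative to the champion's rate, in each draw.
    # It is zero whenever the champion is the best arm in that draw.
    remaining = sorted((max(s) - s[champion]) / s[champion] for s in samples)
    percentile = remaining[min(draws - 1, int(confidence * draws))]
    return champion, percentile, percentile < threshold


# Lots of data and a clear gap: little value left in experimenting.
print(value_remaining_check([400, 500], [10000, 10000]))
# Very little data: switching arms could still pay off, so keep going.
print(value_remaining_check([2, 3], [50, 50]))
```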

Ending an experiment based on the potential value remaining is nice because it handles ties well. For example, in an experiment with many arms, it can happen that two or more arms perform about the same, so it does not matter which is chosen. You wouldn’t want to run the experiment until you found the optimal arm (because there are two optimal arms). You just want to run the experiment until you’re sure that switching arms won’t help you very much.

More complex experiments

The multi-armed bandit’s edge over classical experiments increases as the experiments get more complicated. You probably have more than one idea for how to improve your web page, so you probably have more than one variation that you’d like to test. Let’s assume you have 5 variations plus the original. You’re going to compare the original to the best-performing variation, so you need some sort of adjustment to account for multiple comparisons. The Bonferroni correction is an easy (if somewhat conservative) adjustment, which can be implemented by dividing the significance level of the hypothesis test by the number of comparisons (the number of arms minus one). Thus we do the standard power calculation with a significance level of .05 / (6 - 1), and find that we need 15,307 observations in each arm of the experiment. With 6 arms that’s a total of 91,842 observations. At 100 visits per day the experiment would have to run for 919 days (over two and a half years). In real life it usually wouldn’t make sense to run an experiment for that long, but we can still do the thought experiment as a simulation.
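The Bonferroni-adjusted numbers can be approximated with a short Python sketch. As before, this assumes the normal approximation for a two-sided, two-proportion test rather than the exact routine used for the article, and the names are ours. Results should agree with the figures above to within a few observations.

```python
from math import sqrt
from statistics import NormalDist


def sample_size_per_arm(p1, p2, power, sig_level):
    """Normal-approximation sample size per arm (an assumption of this sketch)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - sig_level / 2)
    z_power = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    sd_null = sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ((z_alpha * sd_null + z_power * sd_alt) / (p1 - p2)) ** 2


arms = 6
bonferroni_level = 0.05 / (arms - 1)  # one comparison per variation
n_bonferroni = sample_size_per_arm(0.04, 0.05, power=0.95,
                                   sig_level=bonferroni_level)
total_observations = arms * n_bonferroni
days_classical = total_observations / 100  # at 100 visits per day
print(f"{n_bonferroni:.0f} per arm, {total_observations:.0f} total, "
      f"about {days_classical:.0f} days")
```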

Now let’s run the 6-arm experiment through the bandit simulator. Again, we will assume an original arm with a 4% conversion rate, and an optimal arm with a 5% conversion rate. The other 4 arms include one suboptimal arm that beats the original with conversion rate of 4.5%, and three inferior arms with rates of 3%, 2%, and 3.5%. Figure 3 shows the distribution of results. The average experiment duration is 88 days (vs. 919 days for the classical experiment), and the average number of saved conversions is 1,173. There is a long tail to the distribution of experiment durations (they don’t always end quickly), but even in the worst cases, running the experiment as a bandit saved over 800 conversions relative to the classical experiment.

Figure 3. Savings from a six-armed experiment, relative to a Bonferroni adjusted power calculation for a classical experiment. The left panel shows the number of days required to end the experiment, with the vertical line showing the time required by the classical power calculation. The right panel shows the number of conversions that were saved by the bandit.

The cost savings are partly attributable to ending the experiment more quickly, and partly attributable to the experiment being less wasteful while it is running. Figure 4 shows the history of the serving weights for all the arms in the first of our 500 simulation runs. There is some early confusion as the bandit sorts out which arms perform well and which do not, but the very poorly performing arms are heavily downweighted very quickly. In this case, the original arm has a "lucky run" to begin the experiment, so it survives longer than some other competing arms. But after about 50 days, things have settled down into a two-horse race between the original and the ultimate winner. Once the other arms are effectively eliminated, the original and the ultimate winner split the 100 observations per day between them. Notice how the bandit is allocating observations efficiently from an economic standpoint (they’re flowing to the arms most likely to give a good return), as well as from a statistical standpoint (they’re flowing to the arms that we most want to learn about).

Figure 4. History of the serving weights for one of the 6-armed experiments.

Figure 5 shows the daily cost of running the multi-armed bandit relative to an "oracle" strategy of always playing arm 2, the optimal arm. (Of course this is unfair because in real life we don’t know which arm is optimal, but it is a useful baseline.) On average, each observation allocated to the original costs us .01 of a conversion, because the conversion rate for the original is .01 less than arm 2. Likewise, each observation allocated to arm 5 (for example) costs us .03 conversions because its conversion rate is .03 less than arm 2. If we multiply the number of observations assigned to each arm by the arm’s cost, and then sum across arms, we get the cost of running the experiment for that day. In the classical experiment, each arm is allocated 100 / 6 visits per day (on average, depending on how partial observations are allocated). It works out that the classical experiment costs us 1.333 conversions each day it is run. The red line in Figure 5 shows the cost to run the bandit each day. As time moves on, the experiment becomes less and less wasteful as inferior arms are given less weight.
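The daily cost arithmetic is easy to reproduce. In the Python sketch below, the arm order is inferred from the text (arm 2 is optimal and arm 5 costs .03 per visit), and the "late-stage" bandit allocation is a made-up example rather than output from our simulation.

```python
# Conversion rates for arms 1 through 6; arm 1 is the original.
rates = [0.04, 0.05, 0.045, 0.03, 0.02, 0.035]
best_rate = max(rates)  # the oracle always plays this arm

# Expected conversions lost per visit sent to each arm.
cost_per_visit = [best_rate - r for r in rates]


def daily_cost(visits_by_arm):
    """Expected conversions lost in one day, relative to the oracle."""
    return sum(v * c for v, c in zip(visits_by_arm, cost_per_visit))


# Classical experiment: 100 visits split evenly across 6 arms every day.
classical_cost = daily_cost([100 / 6] * 6)

# Hypothetical late-stage bandit day: only the original and arm 2 get traffic.
late_bandit_cost = daily_cost([45, 55, 0, 0, 0, 0])

print(f"classical: {classical_cost:.3f} conversions per day")
print(f"bandit (hypothetical late day): {late_bandit_cost:.3f} conversions per day")
```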

Figure 5. Cost per day of running the bandit experiment. The constant cost per day of running the classical experiment is shown by the horizontal dashed line.

¹ The R function power.prop.test performed all the power calculations in this article.

² See the appendix if you really want the details of the calculation. You can skip them if you don’t.

Posted by Steven L. Scott, PhD, Sr. Economic Analyst, Google