Statistical calculations in A/Bn tests
This article documents the detailed statistical calculations used in manual A/Bn tests in Adobe Target. Definitions are provided for Conversion Rate, Confidence Interval of Conversion Rate, Lift, Confidence Interval for Lift, and Confidence.
           
          
Mean performance
The following section explains the calculations used in the previous illustration.
Conversion Rate and Revenue Per Visitor (RPV) Campaigns
The following illustration shows Conversion Rate, Confidence Interval of Conversion Rate, and the number of Conversions in a Target report. For example, the first line shows that for Experience A, the Conversion Rate is 25.81% with a Confidence Interval of ±7.7%, and 32 conversions were recorded. Given that 124 Visitors saw the experience, this equates to 32/124 = 25.81%.
           
          
The conversion rate or mean, μ_ν, for each experience ν in an experiment is defined as the ratio of the sum of the metric to the number of units assigned to that experience, N_ν:
$$\mu_\nu = \frac{\sum_{i=1}^{N_\nu} Y_{i\nu}}{N_\nu}$$
Here,
- Y_iν is the value of the metric for each unit i that has been assigned to a given experience ν.
- The sum over units i depends on the choice of counting methodology:
  - If Visitors is used as the counting methodology, each unit is a unique visitor, defined as a unique participant in the activity for the life of the activity.
  - If Visits is used as the counting methodology, each unit is a unique visit, defined as a unique participant in an experience during a Target session (with a unique sessionId). When the sessionId changes, or the visitor reaches the conversion step, a new visit is counted.
  - If Activity Impressions is used as the counting methodology, each unit is a unique impression, counted each time a visitor loads any page of the activity.
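As a minimal sketch of the definition above, the mean can be computed by summing the metric over the units assigned to one experience and dividing by the number of units. The function name `mean_performance` is hypothetical, and the data simply restates the Experience A example (32 conversions among 124 visitors) with Visitors as the counting unit.

```python
def mean_performance(metric_values):
    """Return the mean of the metric over all units assigned to one experience."""
    if not metric_values:
        raise ValueError("at least one unit is required")
    return sum(metric_values) / len(metric_values)


# Experience A from the report example: 124 visitors, 32 of whom converted.
# A binary metric is 1 for a converting unit and 0 otherwise.
experience_a = [1] * 32 + [0] * 92
conversion_rate = mean_performance(experience_a)
print(f"{conversion_rate:.2%}")  # 25.81%
```

For a Revenue Per Visitor campaign the same function would apply, with each value being the revenue attributed to that unit instead of 0 or 1.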
 
Confidence Interval of Mean/Conversion Rate
The confidence interval of the conversion rate is intuitively defined as the range of possible conversion rates that is consistent with the underlying data.
When running experiments, the conversion rate for a given experience is an estimate of the “true” conversion rate. To quantify the uncertainty in this estimate, Target uses a confidence interval. Target always reports a 95% confidence interval, which means that, over many repeated experiments, about 95% of the intervals calculated this way include the true conversion rate of the experience.
A “Confidence” number is also reported next to the currently leading or winning experience. This figure is not reported until the leading experience’s Confidence reaches at least 60%. If two experiences are present in the activity, this number represents the confidence level that the experience is performing better than the other experience. If more than two experiences are present in the activity, this number represents the confidence level that the experience is performing better than the defined “Control” experience. If the “Control” experience is winning, no “Confidence” figure is reported.
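A 95% confidence interval of the conversion rate μ_ν is defined as the range of values:
$$\mu_\nu \pm 1.96\,\mathrm{SE}_\nu$$
Where the standard error of the mean is defined as
$$\mathrm{SE}_\nu = \frac{\sigma_\nu}{\sqrt{N_\nu}}$$
Where an unbiased estimate of the sample standard deviation is used:
$$\sigma_\nu = \sqrt{\frac{1}{N_\nu - 1}\sum_{i=1}^{N_\nu}\left(Y_{i\nu} - \mu_\nu\right)^2}$$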
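When the campaign is a conversion rate campaign (that is, the conversion metric is binary), the standard error reduces to:
$$\mathrm{SE}_\nu = \sqrt{\frac{\mu_\nu\,(1 - \mu_\nu)}{N_\nu}}$$
For a binary metric the unbiased estimate gives N_ν − 1 in the denominator; for large samples the difference is negligible.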
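The interval for the Experience A example can be checked with a short sketch. The function names are hypothetical, the multiplier 1.96 is the usual normal quantile for a two-sided 95% interval, and the binary shortcut is assumed to use N rather than N − 1 in the denominator.

```python
import math
import statistics

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def standard_error(metric_values):
    """Standard error of the mean, using the unbiased (N - 1) sample standard deviation."""
    return statistics.stdev(metric_values) / math.sqrt(len(metric_values))


def binary_standard_error(conversions, units):
    """Large-sample shortcut for a binary metric: sqrt(p * (1 - p) / N)."""
    rate = conversions / units
    return math.sqrt(rate * (1 - rate) / units)


# Experience A: 32 conversions among 124 visitors.
values = [1] * 32 + [0] * 92
half_width = Z_95 * binary_standard_error(32, 124)
print(f"±{half_width:.1%}")  # ±7.7%

# The general formula gives nearly the same standard error for this sample.
general_half_width = Z_95 * standard_error(values)
```

The general `standard_error` function also applies to non-binary metrics such as revenue per visitor, where no shortcut is available.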
Lift
The following illustration shows Lift and Confidence Interval of Lift in a Target report. The number is the midpoint of the lift bounds, and the arrow indicates whether the lift is positive or negative. The arrow displays in grey until the confidence passes 95%. After the confidence passes that threshold, the arrow is green for a positive lift or red for a negative lift.
           
          
The lift between an experience ν and the control experience ν0 is the relative “delta” in conversion rates, defined as
$$\mathrm{Lift}(\nu, \nu_0) = \frac{\mu_\nu - \mu_{\nu_0}}{\mu_{\nu_0}}$$
Where the individual conversion rates are as defined above. More simply,
Lift(Experience N) = (Performance_Experience_N - Performance_Control)/ Performance_Control
If the conversion rate of the control experience ν0 is 0, the division is undefined and there is no lift.
Confidence Interval of Lift
The boxplot graph in the Average Lift and Confidence Interval column represents the average value and 95% Confidence Interval of Lift. The boxplot is grey when there is any overlap between the confidence interval of a given non-control experience and the confidence interval of the control experience. The boxplot is green or red when the range of a given experience’s confidence interval is entirely above or below the confidence interval of the control experience.
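The standard error of the lift between an experience ν and the control experience ν0 is defined as:
$$\mathrm{SE}_{\mathrm{Lift}}(\nu, \nu_0) = \sqrt{\frac{\mathrm{SE}_\nu^2}{\mu_{\nu_0}^2} + \frac{\mu_\nu^2\,\mathrm{SE}_{\nu_0}^2}{\mu_{\nu_0}^4}}$$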
Then the 95% Confidence Interval of the lift is:
$$\mathrm{Lift}(\nu, \nu_0) \pm 1.96\,\mathrm{SE}_{\mathrm{Lift}}(\nu, \nu_0)$$
This calculation uses the “Delta” method, and is described in more detail in this document.
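The lift and its interval can be sketched as follows, assuming the standard first-order Delta method approximation for the ratio of two independent estimates. The function names are hypothetical, and the control figures (20 conversions among 120 visitors) are invented for illustration; only the Experience A figures come from the report example.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def binary_se(conversions, units):
    """Standard error of a binary conversion rate (large-sample form)."""
    rate = conversions / units
    return math.sqrt(rate * (1 - rate) / units)


def lift_with_interval(mean, se, control_mean, control_se):
    """Return (lift, lower, upper), or None when the control mean is zero."""
    if control_mean == 0:
        return None  # no lift can be computed against a zero control rate
    lift = (mean - control_mean) / control_mean
    ratio = mean / control_mean
    # First-order Delta method for mean / control_mean with independent estimates:
    # Var ~ se^2 / control_mean^2 + mean^2 * control_se^2 / control_mean^4
    se_lift = math.sqrt((se / control_mean) ** 2 + (ratio * control_se / control_mean) ** 2)
    return lift, lift - Z_95 * se_lift, lift + Z_95 * se_lift


# Experience A (from the report example) against a hypothetical control.
lift, lower, upper = lift_with_interval(
    32 / 124, binary_se(32, 124),
    20 / 120, binary_se(20, 120),
)
print(f"lift {lift:.1%}, 95% interval [{lower:.1%}, {upper:.1%}]")
```

The variance is written in its expanded form so that the sketch never divides by the experience's own mean, which may legitimately be zero.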
Confidence
The last column shows the confidence in a Target report. The confidence of an experience is the probability (expressed as a percentage) of obtaining a result less extreme than the one that is observed, given that the null hypothesis is true. In terms of p-values, the confidence displayed is 1 - p-value. Intuitively, higher confidence means that it is less likely that the control and non-control experience have equal conversion rates.
In Target, a two-tailed Welch’s t-test is performed between the test experience and the control experience to test whether their means are the same. Because we usually do not know before running the experiment whether the sample sizes and variances of the two groups are the same, and because Target allows unequal percentages of traffic to be sent to each experience, we do not assume that the variance of each experience is equal. Thus, Welch’s t-test is chosen instead of Student’s t-test.
To perform Welch’s t-test, we first calculate the t-statistic and the degrees of freedom, then run a two-tailed t-test to generate the p-value. Finally, we calculate the confidence from the p-value.
The t-statistic is defined as the difference of the means of two independent random variables, ν and ν0, divided by the standard error of the difference:
$$t = \frac{\mu_\nu - \mu_{\nu_0}}{\mathrm{SE}(\mu_\nu - \mu_{\nu_0})}$$
Where μ_ν and μ_ν0 are the means of ν and ν0 respectively, and the standard error of the difference between μ_ν and μ_ν0 is given by:
$$\mathrm{SE}(\mu_\nu - \mu_{\nu_0}) = \sqrt{\frac{\sigma_\nu^2}{N_\nu} + \frac{\sigma_{\nu_0}^2}{N_{\nu_0}}}$$
Where σ²_ν and σ²_ν0 are the variances of the two experiences ν and ν0 respectively, and N_ν and N_ν0 are the sample sizes for ν and ν0 respectively.
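For Welch’s t-test, the degrees of freedom are calculated as follows:
$$df = \frac{\left(\frac{\sigma_\nu^2}{N_\nu} + \frac{\sigma_{\nu_0}^2}{N_{\nu_0}}\right)^2}{\frac{\left(\sigma_\nu^2 / N_\nu\right)^2}{df_\nu} + \frac{\left(\sigma_{\nu_0}^2 / N_{\nu_0}\right)^2}{df_{\nu_0}}}$$
And the degrees of freedom for ν and ν0 are defined as:
$$df_\nu = N_\nu - 1$$
$$df_{\nu_0} = N_{\nu_0} - 1$$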
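Then the p-value can be computed from the area in the tails of the t-distribution:
$$p = 2\int_{|t|}^{\infty} f(x; df)\,dx$$
where f(x; df) is the probability density of the t-distribution with df degrees of freedom.
Finally, the confidence reported in Target is defined as:
$$\mathrm{Confidence} = (1 - p) \times 100\%$$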
Performing Calculations offline
The downloaded CSV report includes only raw data and does not include calculated metrics, such as revenue per visitor, lift, or confidence used for A/B tests.
To compute these statistical quantities, download the Target Complete Confidence Calculator Excel file and enter the activity’s values.
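For readers who prefer a script to a spreadsheet, the Welch’s t-test steps from the Confidence section can be sketched in Python with the standard library alone. This is an illustrative sketch, not the calculator’s implementation: the function names are hypothetical, the control figures (20 conversions among 120 visitors) are invented, and Simpson’s rule is simply one convenient way to integrate the t-density. It relies on the identity that the confidence, 1 - p, equals the area under the density between -|t| and |t|.

```python
import math


def t_density(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def binary_variance(conversions, units):
    """Unbiased sample variance of a binary metric."""
    rate = conversions / units
    return rate * (1 - rate) * units / (units - 1)


def welch_confidence(mean, var, n, control_mean, control_var, control_n, steps=2000):
    """Return (t, df, confidence) for a two-tailed Welch's t-test.

    steps must be even (Simpson's rule); at least one variance must be non-zero.
    """
    a, b = var / n, control_var / control_n
    t = (mean - control_mean) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n - 1) + b ** 2 / (control_n - 1))
    # Confidence = 1 - p = area between -|t| and |t| = 2 * area over [0, |t|].
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    return t, df, 2 * total * h / 3


# Experience A (from the report example) against a hypothetical control.
t, df, confidence = welch_confidence(
    32 / 124, binary_variance(32, 124), 124,
    20 / 120, binary_variance(20, 120), 120,
)
print(f"t = {t:.3f}, df = {df:.1f}, confidence = {confidence:.1%}")
```

For a revenue metric, the per-unit sample variance would replace `binary_variance`; the rest of the calculation is unchanged.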