Two-Factor ANOVA Analysis - A Programmer's Approach

Introduction

A two-factor ANOVA, also known as a factorial ANOVA, tests whether each of two factors influences the outcome of an experiment, and whether the two factors interact. For example, we can perform a two-factor ANOVA if we want to study how the proportion of popcorn kernels that pop depends on the size of the pot and on the type of fat used.

In other words, we want to know what influences the proportion of popcorn kernels popped. Is it the pot size, the fat, or both?

The two factors in this example are:

Factor A = pot sizes
Factor B = type of fat


Hypothesis Testing

The null and alternative hypotheses of a two-factor ANOVA can be described as follows:


HoA : a1 == a2 == ... == aI == 0 “No Factor A Effect”
HaA : at least one ai != 0

HoB : b1 == b2 == ... == bJ == 0 “No Factor B Effect”
HaB : at least one bj != 0

HoAB : every (ab)ij == 0 “No Interaction Effect”
HaAB : at least one (ab)ij != 0
            

If the p-value is less than the level of significance (usually 0.05 if not specified), we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis. Equivalently, we reject the null hypothesis if the test statistic is greater than or equal to the critical value.
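
The two decision rules can be sketched as a pair of small helpers. The function names and the sample numbers below are hypothetical, chosen only to show both outcomes; they are not part of any library.

```python
def reject_by_p_value(p_value, alpha=0.05):
    """Reject the null hypothesis when the p-value is below alpha."""
    return p_value < alpha


def reject_by_critical_value(test_statistic, critical_value):
    """Reject the null hypothesis when the statistic reaches the critical value."""
    return test_statistic >= critical_value


# Hypothetical numbers, for illustration only.
print(reject_by_p_value(0.03))             # True: 0.03 < 0.05
print(reject_by_p_value(0.20))             # False: fail to reject
print(reject_by_critical_value(5.1, 4.0))  # True: 5.1 >= 4.0
```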

Things we need to calculate

Note: The notation normally used to explain a concept or a mathematical formula is replaced here by programming syntax, primarily Python-like pseudocode.


let I = number of levels in Factor A
let J = number of levels in Factor B

# sum of the observations in level i of Factor A divided by how many there are
let mean_of_i = sum(factor_A[i]) / len(factor_A[i])

# sum of the observations in level j of Factor B divided by how many there are
let mean_of_j = sum(factor_B[j]) / len(factor_B[j])

# sum of all your data divided by the total number of observations
let grand_mean = sum(data) / len(data)

let SST = total sum of squares
let SSA = sum of squares for Factor A
let SSB = sum of squares for Factor B
let SSAB = sum of squares for the interaction between Factor A and Factor B
let SSE = error sum of squares
let MSA = mean square for Factor A
let MSB = mean square for Factor B
let MSAB = mean square for the interaction
let MSE = mean square for error
let Fa = test statistic for Factor A
let Fb = test statistic for Factor B
let Fab = test statistic for the interaction
            

The Calculations

To demonstrate a two-factor analysis, let's work through an example. We are given a dataset of arithmetic test scores for a group of boys and girls aged 10, 11, and 12, and we want to figure out whether the scores are influenced by gender, age, or both.

Refer to the dataset below:

Dataset

Gender  Score  Age
Boy     4      10 Years Old
Boy     6      10 Years Old
Boy     8      10 Years Old
Girl    4      10 Years Old
Girl    8      10 Years Old
Girl    9      10 Years Old
Boy     6      11 Years Old
Boy     6      11 Years Old
Boy     9      11 Years Old
Girl    7      11 Years Old
Girl    10     11 Years Old
Girl    13     11 Years Old
Boy     8      12 Years Old
Boy     9      12 Years Old
Boy     13     12 Years Old
Girl    12     12 Years Old
Girl    14     12 Years Old
Girl    16     12 Years Old

A convenient way to organize the calculations is to create a table of means for Factor A and Factor B. In this dataset, Gender is Factor A and Age is Factor B. Which factor gets which label is arbitrary; the labels are just a way to name our sets.

Table of Means

          10 Year Olds   11 Year Olds   12 Year Olds   Average
Boys      6              7              10             7.7
Girls     7              10             14             10.3
Average   6.5            8.5            12             Grand Mean: 9
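
If you would rather not compute the table by hand, here is a minimal sketch that rebuilds it from the dataset. The `records` list and the `mean_score` helper are names made up for this illustration; the scores are copied from the dataset above.

```python
from statistics import mean

# (gender, age, score) records copied from the dataset.
records = [
    ("Boy", 10, 4), ("Boy", 10, 6), ("Boy", 10, 8),
    ("Girl", 10, 4), ("Girl", 10, 8), ("Girl", 10, 9),
    ("Boy", 11, 6), ("Boy", 11, 6), ("Boy", 11, 9),
    ("Girl", 11, 7), ("Girl", 11, 10), ("Girl", 11, 13),
    ("Boy", 12, 8), ("Boy", 12, 9), ("Boy", 12, 13),
    ("Girl", 12, 12), ("Girl", 12, 14), ("Girl", 12, 16),
]
genders = ["Boy", "Girl"]
ages = [10, 11, 12]


def mean_score(gender=None, age=None):
    """Mean score over the records matching the given gender and/or age."""
    return mean(s for g, a, s in records
                if gender in (None, g) and age in (None, a))


cell_means = {(g, a): mean_score(g, a) for g in genders for a in ages}
row_means = {g: mean_score(gender=g) for g in genders}
col_means = {a: mean_score(age=a) for a in ages}
grand_mean = mean_score()

# Print one row per gender, then the column averages and the grand mean.
for g in genders:
    cells = "  ".join(f"{cell_means[(g, a)]:>5.1f}" for a in ages)
    print(f"{g:<8}{cells}  {row_means[g]:>5.1f}")
averages = "  ".join(f"{col_means[a]:>5.1f}" for a in ages)
print(f"{'Average':<8}{averages}  {grand_mean:>5.1f}")
```

The printed values are rounded to one decimal place, which is why the boys' average of 69 / 9 shows up as 7.7.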

Calculations

Next, we calculate the variables we defined earlier.

To calculate the SST


# grand_mean is read as a global; assign it before calling (see get_grand_mean)
def get_sst(data):
    sst = 0
    for item in data:
        sst += (item - grand_mean)**2
    return sst
            

To calculate the SSA and SSB


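# data is a list of groups, one per level of the factor;
# grand_mean is read as a global, so assign it before calling
def get_ss(data):
    ss = 0
    for group in data:
        group_mean = sum(group) / len(group)

        # every item in the group contributes the same squared deviation
        ss += len(group) * (group_mean - grand_mean)**2

    return ss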
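
To see where the numbers come from, here is the same arithmetic worked by hand for this dataset. This is only a sketch: the group sums and means are taken from the table of means, and `Fraction` is used so the results come out exact instead of as rounded floats.

```python
from fractions import Fraction

grand_mean = 9  # from the table of means

# Factor A (gender): 9 scores per group, summing to 69 (boys) and 93 (girls).
boys_mean = Fraction(69, 9)
girls_mean = Fraction(93, 9)
ssa = 9 * (boys_mean - grand_mean) ** 2 + 9 * (girls_mean - grand_mean) ** 2

# Factor B (age): 6 scores per group, with means 6.5, 8.5, and 12.
age_means = [Fraction(13, 2), Fraction(17, 2), Fraction(12)]
ssb = sum(6 * (m - grand_mean) ** 2 for m in age_means)

print(ssa, ssb)  # 32 93
```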
            

To calculate the grand mean


def get_grand_mean(data):
    # named total so that the built-in sum() is not shadowed
    total = 0
    item_count = 0

    for group in data:
        for item in group:
            total += item
            item_count += 1

    return total / item_count
            

To calculate the SSE


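# data is grouped by Factor A level, then by Factor B level,
# so each sub_group is one cell of the table of means
def get_sse(data):
    sse = 0
    df = 0
    for group in data:
        for sub_group in group:
            cell_mean = sum(sub_group) / len(sub_group)
            df += len(sub_group) - 1

            for item in sub_group:
                sse += (item - cell_mean)**2

    return sse, df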
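
It can help to look at what each cell contributes to the SSE. The sketch below breaks the total down per cell for this dataset; the `cells` dictionary and the `within_by_cell` name are made up for this illustration.

```python
# Scores for each (gender, age) cell, copied from the dataset.
cells = {
    ("Boy", 10): [4, 6, 8],
    ("Boy", 11): [6, 6, 9],
    ("Boy", 12): [8, 9, 13],
    ("Girl", 10): [4, 8, 9],
    ("Girl", 11): [7, 10, 13],
    ("Girl", 12): [12, 14, 16],
}

within_by_cell = {}
sse = 0
df = 0
for label, scores in cells.items():
    cell_mean = sum(scores) / len(scores)
    # squared deviations of each score from its own cell mean
    within = sum((x - cell_mean) ** 2 for x in scores)
    within_by_cell[label] = within
    sse += within
    df += len(scores) - 1  # each cell loses one degree of freedom
    print(label, within)

print(sse, df)  # 68.0 12
```

The six cells contribute 8, 6, 14, 14, 18, and 8, which add up to the SSE of 68 on 12 degrees of freedom.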
            

To start using these functions, first we need to format our data appropriately


# All of the data
raw_data = [4,6,8,6,6,9,8,9,13,4,8,9,7,10,13,12,14,16]

# Organized by [boys' scores] [girls' scores]
factor_A = [[4,6,8,6,6,9,8,9,13],[4,8,9,7,10,13,12,14,16]]

# Organized by [10 year olds] [11 year olds] [12 year olds]
factor_B = [[4,6,8,4,8,9],[6,6,9,7,10,13],[8,9,13,12,14,16]]

# Organized by
# [[boys 10 year old scores],[boys 11 year old scores],[boys 12 year old scores]]
# [[girls 10 year old scores],[girls 11 year old scores],[girls 12 year old scores]]
grouped_data = [[[4,6,8],[6,6,9],[8,9,13]],[[4,8,9],[7,10,13],[12,14,16]]]
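
Since the same 18 scores are typed out four times, it is easy to introduce a typo. One way to guard against that, sketched here with a hypothetical `flatten` helper, is to check that every layout contains exactly the same scores and that the groupings agree with each other.

```python
raw_data = [4, 6, 8, 6, 6, 9, 8, 9, 13, 4, 8, 9, 7, 10, 13, 12, 14, 16]
factor_A = [[4, 6, 8, 6, 6, 9, 8, 9, 13], [4, 8, 9, 7, 10, 13, 12, 14, 16]]
factor_B = [[4, 6, 8, 4, 8, 9], [6, 6, 9, 7, 10, 13], [8, 9, 13, 12, 14, 16]]
grouped_data = [[[4, 6, 8], [6, 6, 9], [8, 9, 13]],
                [[4, 8, 9], [7, 10, 13], [12, 14, 16]]]


def flatten(nested):
    """Recursively flatten nested lists into one flat list of scores."""
    flat = []
    for element in nested:
        if isinstance(element, list):
            flat.extend(flatten(element))
        else:
            flat.append(element)
    return flat


# Every layout must hold the same scores as raw_data, in any order.
for name, layout in [("factor_A", factor_A),
                     ("factor_B", factor_B),
                     ("grouped_data", grouped_data)]:
    assert sorted(flatten(layout)) == sorted(raw_data), name

# Each row of grouped_data is one gender, so it must match factor_A.
rows = [flatten(row) for row in grouped_data]
assert rows == factor_A

# Each column of grouped_data is one age, so it must match factor_B.
columns = [flatten([row[k] for row in grouped_data]) for k in range(3)]
assert columns == factor_B

print("all layouts agree")
```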
            

The rest of the calculations


grand_mean = get_grand_mean(factor_A) # or factor_B
SST = get_sst(raw_data)
SSA = get_ss(factor_A)
SSB = get_ss(factor_B)
SSE = get_sse(grouped_data) # returns (sse, df)
SSAB = SST - SSA - SSB - SSE[0]

i = len(factor_A) # number of levels in Factor A
j = len(factor_B) # number of levels in Factor B

df1 = i - 1 # Factor A
df2 = j - 1 # Factor B
df3 = SSE[1] # error
df4 = df1 * df2 # interaction
df5 = df1 + df2 + df3 + df4 # total

MSA = SSA / df1
MSB = SSB / df2
MSE = SSE[0] / df3
MSAB = SSAB / df4

# Test statistic
Fa = MSA / MSE
Fb = MSB / MSE
Fab = MSAB / MSE
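
Above, SSAB is obtained by subtraction. As a cross-check, it can also be computed directly from the means. The sketch below assumes a balanced design (every cell has the same number of scores, as in this dataset) and uses its own small `mean` helper; the variable names are made up for this illustration.

```python
grouped_data = [[[4, 6, 8], [6, 6, 9], [8, 9, 13]],
                [[4, 8, 9], [7, 10, 13], [12, 14, 16]]]


def mean(values):
    return sum(values) / len(values)


n_a = len(grouped_data)     # levels of Factor A (gender)
n_b = len(grouped_data[0])  # levels of Factor B (age)

cell_means = [[mean(cell) for cell in row] for row in grouped_data]
row_means = [mean([x for cell in row for x in cell]) for row in grouped_data]
col_means = [mean([x for row in grouped_data for x in row[b]])
             for b in range(n_b)]
grand_mean = mean([x for row in grouped_data for cell in row for x in cell])

ssab = 0.0
for a in range(n_a):
    for b in range(n_b):
        # what is left of the cell mean after removing both main effects
        effect = cell_means[a][b] - row_means[a] - col_means[b] + grand_mean
        ssab += len(grouped_data[a][b]) * effect ** 2

print(round(ssab, 6))  # 7.0
```

Getting the same 7.0 both ways is a good sign that SST, SSA, SSB, and SSE were all computed correctly.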
            

Your results should be the following:

   
1 2 12 2 17 # df1, df2, df3, df4, df5 respectively

SSA 32.0
SSB 93.0
SSE 68.0
SST 200.0
SSAB 7.0

MSA 32.0
MSB 46.5
MSE 5.666666666666667
MSAB 3.5

Test statistics
Fa 5.647058823529411
Fb 8.205882352941176
Fab 0.6176470588235293
            

Interpreting the Results

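After calculating the test statistics, we need to decide, for each one, whether to reject the null hypothesis. The critical value comes from the F distribution at the chosen level of significance, with the degrees of freedom of the effect being tested (df1, df2, or df4) as the numerator and the error degrees of freedom (df3) as the denominator. If Fa is greater than or equal to the critical value, we reject the null hypothesis for Factor A. Otherwise, we fail to reject it. The same is true for the Fb and Fab test statistics.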
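
As a sketch of that last step for our example, the code below compares each test statistic with its critical value. The critical values are an assumption on our part: they are the upper 5% points of the F distribution read from a standard F table and rounded to three decimals, so check them against your own table or statistics package, and swap them out if you use a different level of significance.

```python
# Test statistics and degrees of freedom from the results above:
# name -> (F statistic, numerator df, denominator df)
results = {
    "Fa": (5.647058823529411, 1, 12),
    "Fb": (8.205882352941176, 2, 12),
    "Fab": (0.6176470588235293, 2, 12),
}

# Assumed critical values at alpha = 0.05, taken from a standard F table.
critical_values = {(1, 12): 4.747, (2, 12): 3.885}

decisions = {}
for name, (f_stat, df_num, df_den) in results.items():
    f_crit = critical_values[(df_num, df_den)]
    decisions[name] = f_stat >= f_crit
    verdict = "reject H0" if decisions[name] else "fail to reject H0"
    print(f"{name} = {f_stat:.3f} vs critical {f_crit}: {verdict}")
```

With these values, Fa and Fb both exceed their critical values while Fab does not. At the 0.05 level, that suggests gender and age each influence the scores, with no evidence of an interaction between them.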