A two-factor ANOVA, a type of factorial ANOVA, tests whether each of two factors, and the interaction between them, influences the outcome of an experiment. For example, we can run a two-factor ANOVA to study how the proportion of popcorn kernels popped changes with the pot size and with the type of fat used.
In other words, we want to know what influences the proportion of popcorn kernels popped: pot size, fat, or both?
The two factors in this example are:
Factor A = pot sizes
Factor B = type of fat
The null and alternative hypotheses of a two-factor ANOVA can be described as follows:
HoA : a1 == a2 == ... == aI == 0 “No Factor A Effect”
HaA : at least one ai != 0
HoB : b1 == b2 == ... == bJ == 0 “No Factor B Effect”
HaB : at least one bj != 0
HoAB : no interaction between Factor A and Factor B
HaAB : Factor A and Factor B interact
If the p-value is less than the level of significance (usually 0.05 if not specified), we reject the null hypothesis. Otherwise, we fail to reject it. Equivalently, we reject the null hypothesis when the test statistic is greater than or equal to the critical value.
Note: The notation used to explain a concept or a mathematical formula is written in programming syntax, primarily Python-style pseudocode.
    let I = number of levels of Factor A
    let J = number of levels of Factor B
    # mean of level i of Factor A: sum of its observations divided by how many there are
    let mean_of_i = sum(factor_A[i]) / len(factor_A[i])
    # mean of level j of Factor B: sum of its observations divided by how many there are
    let mean_of_j = sum(factor_B[j]) / len(factor_B[j])
    # grand mean: sum of all the data divided by the total number of observations
    let grand_mean = sum(data) / len(data)
    let SST = total sum of squares
    let SSA = sum of squares for Factor A
    let SSB = sum of squares for Factor B
    let SSAB = sum of squares for the interaction of both factors
    let SSE = error sum of squares
    let MSA = mean square for Factor A
    let MSB = mean square for Factor B
    let MSAB = mean square for the interaction of both factors
    let MSE = mean square for error
    let Fa = test statistic for Factor A
    let Fb = test statistic for Factor B
    let Fab = test statistic for the interaction of both factors
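For reference, the quantities named above can be written as formulas. This is a sketch that assumes a balanced design: I and J are taken as the number of levels of Factor A and Factor B, and K (a symbol not used in the original) is the number of observations in each cell. A dot in a subscript means the mean is taken over that index.

```latex
\begin{aligned}
SST  &= \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} \left(x_{ijk} - \bar{x}_{\cdot\cdot\cdot}\right)^2 \\
SSA  &= JK \sum_{i=1}^{I} \left(\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot}\right)^2 \\
SSB  &= IK \sum_{j=1}^{J} \left(\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot}\right)^2 \\
SSE  &= \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} \left(x_{ijk} - \bar{x}_{ij\cdot}\right)^2 \\
SSAB &= SST - SSA - SSB - SSE \\
MSA  &= \frac{SSA}{I-1}, \quad
MSB   = \frac{SSB}{J-1}, \quad
MSAB  = \frac{SSAB}{(I-1)(J-1)}, \quad
MSE   = \frac{SSE}{IJ(K-1)} \\
F_a  &= \frac{MSA}{MSE}, \quad
F_b   = \frac{MSB}{MSE}, \quad
F_{ab} = \frac{MSAB}{MSE}
\end{aligned}
```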
To demonstrate a two-factor analysis, let's work through an example. We are given a dataset of arithmetic test scores for a group of boys and girls aged 10, 11, and 12, and we want to figure out whether the scores are influenced by gender, age, or both.
Refer to the dataset below:
|Gender||Score||Age|
|Boy||4||10 Years Old|
|Boy||6||10 Years Old|
|Boy||8||10 Years Old|
|Girl||4||10 Years Old|
|Girl||8||10 Years Old|
|Girl||9||10 Years Old|
|Boy||6||11 Years Old|
|Boy||6||11 Years Old|
|Boy||9||11 Years Old|
|Girl||7||11 Years Old|
|Girl||10||11 Years Old|
|Girl||13||11 Years Old|
|Boy||8||12 Years Old|
|Boy||9||12 Years Old|
|Boy||13||12 Years Old|
|Girl||12||12 Years Old|
|Girl||14||12 Years Old|
|Girl||16||12 Years Old|
The best way to do the calculations is to create a table of means for Factor A and Factor B. In this dataset, we represent Gender as Factor A and Age as Factor B. Which factor is called A and which is called B does not matter; the letters are just labels for our sets.
|Gender||10 Year Olds||11 Year Olds||12 Year Olds||Average|
|Boys||6||7||10||7.67|
|Girls||7||10||14||10.33|
|Average||6.5||8.5||12||Grand Mean: 9|
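The table of means can be reproduced with a few lines of Python. This is a minimal sketch; the names `grouped`, `mean`, `cell_means`, `row_means`, `col_means`, and `grand` are chosen here for illustration and are not part of the original walkthrough.

```python
# Scores grouped as [gender][age group], copied from the dataset above.
grouped = [
    [[4, 6, 8], [6, 6, 9], [8, 9, 13]],      # boys: ages 10, 11, 12
    [[4, 8, 9], [7, 10, 13], [12, 14, 16]],  # girls: ages 10, 11, 12
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Mean of each gender/age cell.
cell_means = [[mean(cell) for cell in row] for row in grouped]
# Mean of each gender across all ages (right-hand "Average" column).
row_means = [mean([x for cell in row for x in cell]) for row in grouped]
# Mean of each age group across both genders (bottom "Average" row).
col_means = [
    mean([x for row in grouped for x in row[j]])
    for j in range(len(grouped[0]))
]
# Mean of every score in the dataset.
grand = mean([x for row in grouped for cell in row for x in cell])

for label, means, avg in zip(["Boys", "Girls"], cell_means, row_means):
    print(label, means, round(avg, 2))
print("Average", col_means, "Grand Mean:", grand)
```

Because every cell holds the same number of scores, averaging the cell means gives the same row and column averages as averaging the raw scores.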
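Next, we calculate the quantities we defined earlier.
To calculate the SST:
    def get_sst(data):
        # Total sum of squares: squared distance of every score from the grand mean.
        # Reads the global grand_mean, which must be set before this is called.
        sst = 0
        for item in data:
            sst += (item - grand_mean) ** 2
        return sst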
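To calculate the SSA and SSB:
    def get_ss(data):
        # Sum of squares for one factor; data holds one list of scores per level.
        # Reads the global grand_mean, which must be set before this is called.
        ss = 0
        for group in data:
            group_mean = sum(group) / len(group)
            # Each score contributes the squared deviation of its group mean.
            for item in group:
                ss += (group_mean - grand_mean) ** 2
        return ss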
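To calculate the grand mean:
    def get_grand_mean(data):
        # data is a list of groups; average over every score in every group.
        total = 0  # named total so the built-in sum() is not shadowed
        item_count = 0
        for group in data:
            for item in group:
                total += item
                item_count += 1
        return total / item_count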
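To calculate the SSE:
    def get_sse(data):
        # data[i][j] is the list of scores in one cell (one gender at one age).
        # Returns the error sum of squares and the error degrees of freedom.
        sse = 0
        df = 0
        for group in data:
            for sub_group in group:
                cell_mean = sum(sub_group) / len(sub_group)
                df += len(sub_group) - 1
                for item in sub_group:
                    sse += (item - cell_mean) ** 2
        return sse, df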
To start using these functions, we first need to format our data appropriately:
    # All of the data
    raw_data = [4, 6, 8, 6, 6, 9, 8, 9, 13, 4, 8, 9, 7, 10, 13, 12, 14, 16]
    # Organized by [boys' scores], [girls' scores]
    factor_A = [[4, 6, 8, 6, 6, 9, 8, 9, 13], [4, 8, 9, 7, 10, 13, 12, 14, 16]]
    # Organized by [10 year olds], [11 year olds], [12 year olds]
    factor_B = [[4, 6, 8, 4, 8, 9], [6, 6, 9, 7, 10, 13], [8, 9, 13, 12, 14, 16]]
    # Organized by
    # [[boys' 10 year old scores], [boys' 11 year old scores], [boys' 12 year old scores]]
    # [[girls' 10 year old scores], [girls' 11 year old scores], [girls' 12 year old scores]]
    grouped_data = [[[4, 6, 8], [6, 6, 9], [8, 9, 13]],
                    [[4, 8, 9], [7, 10, 13], [12, 14, 16]]]
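Typing the same scores into four separate lists invites copy errors. As an alternative sketch, the groupings can be derived from one list of records. The `records`, `genders`, and `ages` names are introduced here for illustration, and the ages follow the 10/11/12 groups stated in the example.

```python
# (gender, score, age) records taken from the dataset table.
records = [
    ("Boy", 4, 10), ("Boy", 6, 10), ("Boy", 8, 10),
    ("Girl", 4, 10), ("Girl", 8, 10), ("Girl", 9, 10),
    ("Boy", 6, 11), ("Boy", 6, 11), ("Boy", 9, 11),
    ("Girl", 7, 11), ("Girl", 10, 11), ("Girl", 13, 11),
    ("Boy", 8, 12), ("Boy", 9, 12), ("Boy", 13, 12),
    ("Girl", 12, 12), ("Girl", 14, 12), ("Girl", 16, 12),
]
genders = ["Boy", "Girl"]  # levels of Factor A
ages = [10, 11, 12]        # levels of Factor B

# grouped_data[i][j] holds the scores for gender i at age j.
grouped_data = [
    [[s for g, s, a in records if g == gender and a == age] for age in ages]
    for gender in genders
]
# One flat list of scores per gender.
factor_A = [[s for cell in row for s in cell] for row in grouped_data]
# One flat list of scores per age group (boys first, then girls).
factor_B = [
    [s for row in grouped_data for s in row[j]] for j in range(len(ages))
]
# Every score in a single list.
raw_data = [s for row in factor_A for s in row]

print(factor_A)
print(factor_B)
```

The derived lists match the hand-written ones, so either form can be passed to the functions defined earlier.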
The rest of the calculations:
    grand_mean = get_grand_mean(factor_A)  # or factor_B
    SST = get_sst(raw_data)
    SSA = get_ss(factor_A)
    SSB = get_ss(factor_B)
    SSE, df3 = get_sse(grouped_data)  # get_sse returns (SSE, error degrees of freedom)
    SSAB = SST - SSA - SSB - SSE
    I = len(factor_A)  # number of levels of Factor A
    J = len(factor_B)  # number of levels of Factor B
    df1 = I - 1      # Factor A
    df2 = J - 1      # Factor B
    df4 = df1 * df2  # interaction
    df5 = df1 + df2 + df3 + df4  # total
    MSA = SSA / df1
    MSB = SSB / df2
    MSE = SSE / df3
    MSAB = SSAB / df4
    # Test statistics
    Fa = MSA / MSE
    Fb = MSB / MSE
    Fab = MSAB / MSE
Your results should be the following:
    1 2 12 2 17  # df1, df2, df3, df4, df5 respectively
    SSA 32.0
    SSB 93.0
    SSE 68.0
    SST 200.0
    SSAB 7.0
    MSA 32.0
    MSB 46.5
    MSE 5.666666666666667
    MSAB 3.5
    Test statistics
    Fa 5.647058823529411
    Fb 8.205882352941176
    Fab 0.6176470588235293
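The steps above can also be bundled into one function that computes the grand mean itself instead of reading a global variable. This is a sketch under the assumption of a balanced design (the same number of scores in every cell), which is what makes the subtraction for SSAB valid; the function name `two_way_anova` and the dictionary keys are chosen here for illustration.

```python
def two_way_anova(grouped):
    """grouped[i][j] is the list of scores for level i of Factor A
    and level j of Factor B. Assumes equal cell sizes."""
    I, J = len(grouped), len(grouped[0])
    all_scores = [x for row in grouped for cell in row for x in cell]
    n = len(all_scores)
    grand_mean = sum(all_scores) / n

    # Total sum of squares.
    sst = sum((x - grand_mean) ** 2 for x in all_scores)

    # Factor A: one level per row of `grouped`.
    ssa = 0.0
    for row in grouped:
        level = [x for cell in row for x in cell]
        ssa += len(level) * (sum(level) / len(level) - grand_mean) ** 2

    # Factor B: one level per column of `grouped`.
    ssb = 0.0
    for j in range(J):
        level = [x for row in grouped for x in row[j]]
        ssb += len(level) * (sum(level) / len(level) - grand_mean) ** 2

    # Error: spread of each score around its own cell mean.
    sse, df_error = 0.0, 0
    for row in grouped:
        for cell in row:
            cell_mean = sum(cell) / len(cell)
            sse += sum((x - cell_mean) ** 2 for x in cell)
            df_error += len(cell) - 1

    ssab = sst - ssa - ssb - sse  # interaction, by subtraction
    df_a, df_b = I - 1, J - 1
    df_ab = df_a * df_b
    mse = sse / df_error
    return {
        "df": (df_a, df_b, df_error, df_ab, n - 1),
        "SST": sst, "SSA": ssa, "SSB": ssb, "SSE": sse, "SSAB": ssab,
        "Fa": (ssa / df_a) / mse,
        "Fb": (ssb / df_b) / mse,
        "Fab": (ssab / df_ab) / mse,
    }


grouped_data = [
    [[4, 6, 8], [6, 6, 9], [8, 9, 13]],      # boys: ages 10, 11, 12
    [[4, 8, 9], [7, 10, 13], [12, 14, 16]],  # girls: ages 10, 11, 12
]
result = two_way_anova(grouped_data)
for name, value in result.items():
    print(name, value if name == "df" else round(value, 4))
```

Passing only the nested list keeps the three views of the data (by gender, by age, and by cell) in sync, since they are all derived inside the function.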
After calculating the test statistics, we compare each one with its critical value to decide whether to reject the corresponding null hypothesis. If Fa is greater than or equal to the critical value, we reject the null hypothesis of no Factor A effect. Otherwise, we fail to reject it. The same is true for the Fb and Fab test statistics.
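The comparison can be sketched as follows. The critical values are an assumption of this example: they are the upper 5% points of the F distribution with 12 denominator degrees of freedom as read from a standard F table, rounded to two decimals (with SciPy installed, `scipy.stats.f.ppf(0.95, dfn, dfd)` should give the same values). The names `tests`, `CRITICAL_005`, and `decisions` are illustrative.

```python
ERROR_DF = 12  # df3 from the results

# name -> (test statistic, numerator degrees of freedom), from the results.
tests = {
    "Factor A (gender)": (5.647058823529411, 1),
    "Factor B (age)": (8.205882352941176, 2),
    "Interaction": (0.6176470588235293, 2),
}

# Assumption: 5% critical values from a standard F table for 12 error
# degrees of freedom, keyed by numerator degrees of freedom.
CRITICAL_005 = {1: 4.75, 2: 3.89}

decisions = {}
for name, (f_stat, df_num) in tests.items():
    critical = CRITICAL_005[df_num]
    # Reject when the statistic is greater than or equal to the critical value.
    decisions[name] = f_stat >= critical
    verdict = "reject H0" if decisions[name] else "fail to reject H0"
    print(f"{name}: F({df_num}, {ERROR_DF}) = {f_stat:.3f}, "
          f"critical = {critical} -> {verdict}")
```

Under these assumed critical values, the gender and age effects are significant at the 0.05 level, while the interaction is not.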