Single-Factor ANOVA - A Programmer's Approach

Introduction

ANalysis Of VAriance, also known as ANOVA, is a method for analyzing data from more than two groups. The goal is to decide whether or not the population means of those groups are equal. If you are only comparing two groups, you can simply use the t-test to decide whether there is a difference in population means.


Application

Some example applications of ANOVA:

• Compare the gasoline mileage of three makes of car (Ford, Toyota, Mercedes-Benz), using a sample of 20 cars of each make
• Compare the effects of a blood-pressure-lowering medication in patients from three different risk groups (healthy, pre-hypertension, hypertension)
• Give groups of 20 students the same test under different time limits (60, 90, or 120 minutes) and compare the results


Hypothesis Testing

As mentioned before, the goal is to figure out whether the population means differ. To that end, the null and alternative hypotheses of a one-way ANOVA test are as follows:


let m1, m2, ..., mI = population means of the I groups

Ho : m1 == m2 == m3 == ... == mI
Ha : at least two of the population means are different
              


Main Idea

The main idea is to compute the ANOVA test statistic, denoted F, and compare it to a critical value. Equivalently, we can compare the p-value of F to the level of significance. Both rules lead to the same decision:

If the test statistic is greater than the critical value, we reject Ho. Otherwise, we fail to reject Ho.
If the p-value is less than the level of significance, we reject Ho. Otherwise, we fail to reject Ho.


Calculations

We need to calculate the following:

*Note: the calculations below assume that every group has the same number of elements.

let SST = Total sum of squares

let SSTr = Treatment sum of squares

let SSE = error sum of squares

let MSTr = Mean Square for Treatment 

let MSE = Mean Square for Error

let F = MSTr / MSE

let cm = Correction for the mean (square of the sum of all data, divided by the number of data points)
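
For reference, the quantities above can be written out explicitly. What follows is a sketch of the usual shortcut formulas for a balanced design, assuming I groups with J elements each. The notation x_ij (element j of group i), x_i. (total of group i) and x_.. (grand total of all data) is introduced here for illustration and is not part of the definitions above.

```latex
\begin{align*}
cm   &= \frac{x_{..}^{2}}{IJ} \\
SST  &= \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^{2} - cm \\
SSTr &= \frac{1}{J}\sum_{i=1}^{I} x_{i.}^{2} - cm \\
SSE  &= SST - SSTr \\
MSTr &= \frac{SSTr}{I - 1} \\
MSE  &= \frac{SSE}{I(J - 1)} \\
F    &= \frac{MSTr}{MSE}
\end{align*}
```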
                

To Calculate


let data = [[5.2,4.5,6.0,6.1,6.7,5.8],
            [6.5, 8.0, 6.1, 7.5, 5.9, 5.6],
            [5.8, 4.7, 6.4, 4.9, 6.0, 5.2],
            [8.3, 6.1, 7.8, 7.0, 5.5, 7.2]]

let I = number of groups
let J = number of elements in each group

let cm = 0, SSTr = 0, SST = 0

for group in data:

	cm += sum(group)              # running total of all data
	SSTr += sum(group)**2 / J

	for item in group:
		SST += item**2

cm = cm**2 / (I * J)          # correction for the mean
SSTr -= cm
SST -= cm
SSE = SST - SSTr

MSTr = SSTr / (I - 1)
MSE = SSE / (I * (J - 1))

F = MSTr / MSE

if F > critical_value:
	Reject Ho
else:
	Fail to reject Ho
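
The pseudocode above can be turned into a short runnable Python sketch. It assumes a balanced design (every group has the same number of elements) and uses the same shortcut formulas. The function name `one_way_anova` is made up for this illustration.

```python
def one_way_anova(data):
    """Return (F, df1, df2) for a balanced single-factor design.

    `data` is a list of groups, each a list of numbers of equal length.
    """
    I = len(data)       # number of groups
    J = len(data[0])    # number of elements in each group
    if any(len(group) != J for group in data):
        raise ValueError("every group must have the same number of elements")

    grand_total = sum(sum(group) for group in data)
    cm = grand_total ** 2 / (I * J)    # correction for the mean

    SSTr = sum(sum(group) ** 2 / J for group in data) - cm
    SST = sum(item ** 2 for group in data for item in group) - cm
    SSE = SST - SSTr

    MSTr = SSTr / (I - 1)
    MSE = SSE / (I * (J - 1))
    return MSTr / MSE, I - 1, I * (J - 1)


data = [[5.2, 4.5, 6.0, 6.1, 6.7, 5.8],
        [6.5, 8.0, 6.1, 7.5, 5.9, 5.6],
        [5.8, 4.7, 6.4, 4.9, 6.0, 5.2],
        [8.3, 6.1, 7.8, 7.0, 5.5, 7.2]]

F, df1, df2 = one_way_anova(data)
print(f"F = {F:.2f} with df1 = {df1}, df2 = {df2}")   # F is about 3.96
```

For the example data this gives F of roughly 3.96 with 3 and 20 degrees of freedom, which is the value to compare against the critical value.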
                

To Calculate the Critical Value

Refer to an F table, using df1 to pick the column and df2 to pick the row

let df1 = I - 1         #column
let df2 = I * (J - 1)   #row
let significance_level = given

func critical_value(df1, df2, significance_level):
  return refer to the F-table here
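
If no F table is at hand, the critical value can also be computed numerically. The sketch below is one possible approach using only the Python standard library: it evaluates the upper tail of the F distribution through the regularized incomplete beta function (continued-fraction method) and then finds the critical value by bisection. The function names are made up for this illustration. In practice a statistics library such as SciPy (`scipy.stats.f.ppf`) is likely a better choice.

```python
import math


def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies an even step, then an odd step.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the CDF of a Beta(a, b) distribution at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_p_value(f_stat, df1, df2):
    """Upper-tail probability P(F > f_stat) for an F(df1, df2) distribution."""
    if f_stat <= 0.0:
        return 1.0
    x = df2 / (df2 + df1 * f_stat)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


def f_critical_value(df1, df2, significance_level):
    """Find the f whose upper-tail probability equals significance_level."""
    lo, hi = 0.0, 1.0
    while f_p_value(hi, df1, df2) > significance_level:
        hi *= 2.0                      # grow the bracket until it contains f
    for _ in range(100):               # bisection on the decreasing p-value
        mid = (lo + hi) / 2.0
        if f_p_value(mid, df1, df2) > significance_level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(f_critical_value(3, 20, 0.05), 2))   # about 3.10, as in an F table
```

For the example data (I = 4, J = 6), df1 = 3 and df2 = 20, so at a significance level of 0.05 the critical value is about 3.10. `f_p_value` gives the p-value for the second form of the decision rule.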
                


Tukey's Procedure

When we reject the null hypothesis, further investigation may be in order. We do this by performing Tukey's procedure, also called the T procedure.

Tukey's procedure helps us identify which pairs of means differ significantly. If we fail to reject the null hypothesis, we have no evidence that the population means differ, so there is nothing further to compare. Otherwise, at least two of the population means differ, and Tukey's procedure tells us which ones.

Calculate the following

Refer to a Q table (studentized range table)

func getQ(number_of_means, degrees_of_freedom, significance_level):
     return refer to the Q table here


let t = q * sqrt(MSE / J)   # J = number of elements in each group

# get the absolute difference of each pair of sample means and compare it with the t value.
for each pair of groups i, j:

	let abs_diff = abs(mean[i] - mean[j])

	if abs_diff > t:
		There is a significant difference between groups i and j
	else:
		There is no significant difference between groups i and j
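
The pairwise comparison can be sketched in runnable Python as follows. This is a minimal illustration assuming a balanced design. The function name `tukey_comparisons` is made up here, and q is not computed: it has to be looked up in a Q table by the caller. The value 3.96 used below is what a typical studentized range table lists for 4 means, 20 degrees of freedom and a 0.05 significance level, so check it against your own table.

```python
from itertools import combinations
from math import sqrt


def tukey_comparisons(data, q):
    """Return (t, results) where results holds one tuple per pair of groups:
    (i, j, absolute difference of sample means, is_significant).
    """
    I = len(data)       # number of groups
    J = len(data[0])    # number of elements in each group
    means = [sum(group) / J for group in data]

    # Within-group sum of squares; equal to SSE = SST - SSTr
    sse = sum((item - m) ** 2
              for group, m in zip(data, means)
              for item in group)
    mse = sse / (I * (J - 1))

    t = q * sqrt(mse / J)   # threshold for a significant difference
    results = []
    for i, j in combinations(range(I), 2):
        diff = abs(means[i] - means[j])
        results.append((i, j, diff, diff > t))
    return t, results


data = [[5.2, 4.5, 6.0, 6.1, 6.7, 5.8],
        [6.5, 8.0, 6.1, 7.5, 5.9, 5.6],
        [5.8, 4.7, 6.4, 4.9, 6.0, 5.2],
        [8.3, 6.1, 7.8, 7.0, 5.5, 7.2]]

t, results = tukey_comparisons(data, q=3.96)   # q taken from a Q table
for i, j, diff, significant in results:
    verdict = "differ" if significant else "no significant difference"
    print(f"groups {i} and {j}: |diff| = {diff:.2f} -> {verdict}")
```

With this q, the threshold t is about 1.41, and only the third and fourth groups (indices 2 and 3) have sample means that differ by more than that.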