Single-Factor ANOVA - A Programmer's Approach


ANalysis Of VAriance, also known as ANOVA, is a method for analyzing data from more than two groups. The goal is to decide whether or not the population means of those groups are equal. If you are comparing only two groups, you can simply use a t-test to decide whether their population means differ.


Some example applications of ANOVA:

• Compare the gasoline mileage of Fords, Toyotas, and Mercedes-Benzes, using a sample of 20 cars of each make
• Compare the effects of a blood-pressure-lowering medication in patients from three different risk groups (healthy, pre-hypertension, hypertension)
• Give groups of 20 students the same test with different time limits (60, 90, or 120 minutes) and compare the results

Hypothesis Testing

As mentioned before, the goal is to figure out whether the population means differ. To that end, the hypotheses of a one-way ANOVA test are as follows:

let m1, m2, ..., mI = population means of groups 1 through I

Ho : m1 == m2 == m3 == ... == mI
Ha : at least two of the population means are different

Main Idea

The main idea is to compute the ANOVA test statistic, denoted F, and compare it to the critical value.

If the test statistic is greater than the critical value, we reject Ho. Otherwise, we fail to reject Ho.
Equivalently, if the p-value is less than the level of significance, we reject Ho. Otherwise, we fail to reject Ho.


We need to calculate the following:

*Note: the calculations below assume that each group has the same number of elements.

let SST = Total sum of squares

let SSTr = Treatment sum of squares

let SSE = Error sum of squares = SST - SSTr

let MSTr = Mean Square for Treatment

let MSE = Mean Square for Error

let F = MSTr / MSE

let cm = Correction for the mean = (sum of all data)^2 / (number of data points)
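
For reference, the quantities above can be written out as computing formulas. This is only a sketch for the equal-group-size case with I groups of J elements each. The symbols x_{ij} (element j of group i), x_{i.} (total of group i), and x_{..} (total of all data) are notation assumed here for illustration.

```latex
\begin{aligned}
\mathrm{cm}   &= \frac{x_{\cdot\cdot}^{2}}{IJ} \\
\mathrm{SST}  &= \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^{2} - \mathrm{cm} \\
\mathrm{SSTr} &= \frac{1}{J}\sum_{i=1}^{I} x_{i\cdot}^{2} - \mathrm{cm} \\
\mathrm{SSE}  &= \mathrm{SST} - \mathrm{SSTr} \\
\mathrm{MSTr} &= \frac{\mathrm{SSTr}}{I-1}, \qquad
\mathrm{MSE}   = \frac{\mathrm{SSE}}{I(J-1)}, \qquad
F = \frac{\mathrm{MSTr}}{\mathrm{MSE}}
\end{aligned}
```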

To Calculate

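data = [[5.2, 4.5, 6.0, 6.1, 6.7, 5.8],
        [6.5, 8.0, 6.1, 7.5, 5.9, 5.6],
        [5.8, 4.7, 6.4, 4.9, 6.0, 5.2],
        [8.3, 6.1, 7.8, 7.0, 5.5, 7.2]]

I = len(data)       # number of groups
J = len(data[0])    # number of elements in each group

# cm starts as the sum of all data, then becomes the correction for the mean
cm = sum(sum(group) for group in data)
cm = cm**2 / (I * J)

SSTr = 0.0
SST = 0.0
for group in data:
    SSTr += sum(group)**2 / len(group)
    for item in group:
        SST += item**2

SSTr -= cm
SST -= cm
SSE = SST - SSTr

MSTr = SSTr / (I - 1)
MSE = SSE / (I * (J - 1))

F = MSTr / MSE

# critical_value is looked up in the F table (see the next section)
if F > critical_value:
    print("Reject Ho")
else:
    print("Fail to reject Ho")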
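
As a cross-check on the calculation above, the same F statistic can be obtained directly from deviations around the group means instead of the correction-for-the-mean shortcut. The following is a minimal sketch, assuming equal group sizes; the function name one_way_anova is hypothetical and chosen only for illustration.

```python
def one_way_anova(groups):
    """Return (SSTr, SSE, F) for equal-sized groups, using deviations from means."""
    I = len(groups)       # number of groups
    J = len(groups[0])    # number of elements per group
    if any(len(g) != J for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    group_means = [sum(g) / J for g in groups]
    # Averaging the group means gives the grand mean only because sizes are equal.
    grand_mean = sum(group_means) / I

    # Between-group variation: how far each group mean is from the grand mean.
    sstr = J * sum((m - grand_mean) ** 2 for m in group_means)
    # Within-group variation: how far each element is from its own group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    mstr = sstr / (I - 1)
    mse = sse / (I * (J - 1))
    return sstr, sse, mstr / mse


data = [[5.2, 4.5, 6.0, 6.1, 6.7, 5.8],
        [6.5, 8.0, 6.1, 7.5, 5.9, 5.6],
        [5.8, 4.7, 6.4, 4.9, 6.0, 5.2],
        [8.3, 6.1, 7.8, 7.0, 5.5, 7.2]]

sstr, sse, f_stat = one_way_anova(data)
print(f"SSTr={sstr:.4f}  SSE={sse:.4f}  F={f_stat:.4f}")
```

For the sample data this gives an F of roughly 3.96, which should agree with the shortcut calculation up to floating-point rounding.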

To Calculate the Critical Value

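Look up the critical value in an F table, using df1 as the column and df2 as the row.

let df1 = I - 1         # numerator degrees of freedom (column)
let df2 = I * (J - 1)   # denominator degrees of freedom (row)
let significance_level = given

func critical_value(df1, df2, significance_level):
  return the F-table entry at column df1 and row df2 for significance_level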
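
If no F table is at hand, the critical value can also be approximated numerically. The sketch below is one possible approach, not a replacement for a statistics library: it integrates the Beta density with Simpson's rule to get the upper-tail probability of the F distribution (which is also the p-value), then bisects to find the critical value. The names f_survival and f_critical_value are hypothetical, and the code assumes df1 >= 2 and df2 >= 2 so that the integrand stays finite.

```python
import math


def f_survival(f_value, df1, df2, steps=2000):
    """Approximate P(F > f_value) for an F(df1, df2) distribution.

    Uses the identity P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1*f),
    where I_x is the regularized incomplete beta function.
    """
    if f_value <= 0:
        return 1.0
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_value)   # upper integration limit, 0 < x < 1

    # 1 / B(a, b), computed through log-gamma for numerical stability
    inv_beta = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

    def density(t):
        return t ** (a - 1) * (1 - t) ** (b - 1) * inv_beta

    # Composite Simpson's rule on [0, x]; steps must be even
    h = x / steps
    total = density(0.0) + density(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    return total * h / 3


def f_critical_value(df1, df2, significance_level):
    """Find f such that P(F > f) equals significance_level, by bisection."""
    lo, hi = 0.0, 1.0
    while f_survival(hi, df1, df2) > significance_level:
        hi *= 2.0                      # expand until the tail is small enough
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if f_survival(mid, df1, df2) > significance_level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Example: I = 4 groups of J = 6 elements gives df1 = 3 and df2 = 20
print(round(f_critical_value(3, 20, 0.05), 2))
```

The same f_survival function gives the p-value when called with the observed F, so both decision rules from the Main Idea section can be checked against each other.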

Tukey's Procedure

When we reject the null hypothesis, further investigation may be in order. We do this by performing Tukey's procedure, also called the T procedure.

Tukey's procedure helps us find which pairs of sample means differ significantly. If we fail to reject the null hypothesis, we have no evidence that the population means differ, so no follow-up is needed. Otherwise, at least one of the means is not the same as the rest, and Tukey's procedure tells us which groups differ.

Calculate the following

Refer to the Q table (studentized range table)

func getQ(number_of_means, degrees_of_freedom, significance_level):
     return the Q-table entry for number_of_means and degrees_of_freedom at significance_level

let q = getQ(I, I * (J - 1), significance_level)
let t = q * sqrt(MSE / J)

# get the absolute difference of every pair of sample means and compare it with the t value
for each pair of groups i, j with i < j:
	let abs_diff = abs(mean[i] - mean[j])
	if abs_diff > t:
		There is a significant difference between groups i and j
	else:
		There is no significant difference between groups i and j
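
The pairwise comparison can be sketched in Python as follows. This is a minimal illustration that assumes equal group sizes and takes q as an input rather than computing it. The value q = 3.96 used in the example is the usual table entry for 4 means, 20 degrees of freedom, and a 0.05 significance level; treat it as an assumption and confirm it against your own Q table. The function names are hypothetical.

```python
import math
from itertools import combinations


def mean_square_error(groups):
    """MSE for equal-sized groups: within-group squared deviations / (I * (J - 1))."""
    I, J = len(groups), len(groups[0])
    sse = 0.0
    for g in groups:
        m = sum(g) / J
        sse += sum((x - m) ** 2 for x in g)
    return sse / (I * (J - 1))


def tukey_comparisons(groups, q):
    """Return the threshold and a list of (i, j, abs difference, is_significant)."""
    J = len(groups[0])
    means = [sum(g) / J for g in groups]
    threshold = q * math.sqrt(mean_square_error(groups) / J)
    results = []
    for i, j in combinations(range(len(groups)), 2):
        diff = abs(means[i] - means[j])
        results.append((i, j, diff, diff > threshold))
    return threshold, results


data = [[5.2, 4.5, 6.0, 6.1, 6.7, 5.8],
        [6.5, 8.0, 6.1, 7.5, 5.9, 5.6],
        [5.8, 4.7, 6.4, 4.9, 6.0, 5.2],
        [8.3, 6.1, 7.8, 7.0, 5.5, 7.2]]

q = 3.96  # assumed Q-table value for 4 means, 20 degrees of freedom, alpha = 0.05
threshold, results = tukey_comparisons(data, q)
print(f"threshold = {threshold:.3f}")
for i, j, diff, significant in results:
    verdict = "significant" if significant else "not significant"
    print(f"groups {i + 1} and {j + 1}: |difference| = {diff:.3f} ({verdict})")
```

With this q, only the third and fourth groups of the sample data differ by more than the threshold.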