Hypothesis Testing

Introduction

A statistical hypothesis is a claim about a population parameter such as the mean or a proportion. A test involves two contradictory statements: a null hypothesis, denoted H0, and an alternative hypothesis, denoted Ha (not the same as alternative facts). The null hypothesis is the claim that is initially assumed to be true, while the alternative contradicts the null hypothesis.


The Goal

The goal is to decide whether to reject or fail to reject the null hypothesis. We never test the alternative directly; we test the null hypothesis. Rejecting the null hypothesis means the data favor the alternative. Failing to reject it means we keep the null hypothesis as our working claim, which is not the same as proving it true.


Testing Procedure

• The null hypothesis should always be phrased as an equality, while the alternative hypothesis is phrased as an inequality (less than, greater than, or not equal to).

• A test statistic is calculated from the sample data. Its form depends on what you are testing and how you are testing it. If you are performing a one-way or two-way ANOVA, the test statistic is the F statistic, and if you are relying on the Central Limit Theorem to treat a sample mean as approximately normal, you would use the z-score.

• A rejection region is the set of test statistic values for which you reject the null hypothesis. If the calculated test statistic falls in this region, you reject; otherwise, you fail to reject.
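
As a rough illustration of the rejection region idea for a z statistic, here is a minimal sketch using only Python's standard library. The function name in_rejection_region and its tail argument are made up for this example, and it assumes the test statistic follows a standard normal distribution under the null hypothesis.

```python
from statistics import NormalDist


def in_rejection_region(z, alpha=0.05, tail="two-sided"):
    """Return True when a z statistic falls in the rejection region.

    Hypothetical helper: assumes the statistic is standard normal under H0.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        # Split alpha between both tails; the critical value is about 1.96 for alpha = 0.05
        critical = std_normal.inv_cdf(1 - alpha / 2)
        return abs(z) >= critical
    if tail == "upper":
        return z >= std_normal.inv_cdf(1 - alpha)
    if tail == "lower":
        return z <= std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two-sided', 'upper', or 'lower'")


print(in_rejection_region(2.3))  # True: 2.3 lies beyond the two-sided critical value
print(in_rejection_region(1.2))  # False: 1.2 lies inside it
```

The tail you pick should match the direction of the alternative hypothesis: "two-sided" for not equal to, "upper" for greater than, and "lower" for less than.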


Hypothesis Testing with p-values

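The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. For a z test, it is a tail area under the standard normal bell curve. If the p-value is smaller than the significance level (usually 0.05 if not specified), then we reject the null hypothesis. Otherwise, we fail to reject.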
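
To make the decision rule concrete, the sketch below turns a z statistic into a p-value with the standard library and compares it to the significance level. The helper name p_value_from_z and the example value z = 2.0 are assumptions chosen for illustration only.

```python
from statistics import NormalDist


def p_value_from_z(z, tail="two-sided"):
    """Tail area under the standard normal curve for a z statistic (hypothetical helper)."""
    cdf = NormalDist().cdf
    if tail == "upper":
        return 1 - cdf(z)
    if tail == "lower":
        return cdf(z)
    if tail == "two-sided":
        # Both tails count as "at least as extreme"
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'two-sided', 'upper', or 'lower'")


alpha = 0.05  # significance level
p = p_value_from_z(2.0)
decision = "reject H0" if p < alpha else "fail to reject H0"
print(round(p, 4), decision)  # 0.0455 reject H0
```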


Test Statistics

Tests for a Population Mean

The test statistic for a population mean is as follows:


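let p_0 = sample mean
let p_1 = given (hypothesized) mean

let s = sample standard deviation (a point estimate of the population standard deviation)

def test_statistic_population_mean(data, p_1):
    n = len(data)
    p_0 = sum(data) / n  # sample mean

    # Sample standard deviation, with n - 1 in the denominator
    s = 0
    for x in data:
        s += (x - p_0)**2
    s = (s / (n - 1))**0.5

    # Distance between the sample mean and the given mean, in standard errors
    return (p_0 - p_1) / (s / n**0.5)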
            

If the sample size is greater than or equal to 40, we reference the z-table. Otherwise, we refer to the t-table with n - 1 degrees of freedom, with the caveat that we are assuming the population is normal. Of course, you can always find the p-value using a computer.
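
The table choice above can be sketched in code. This is only an illustration: the function name one_sample_mean_test, the dictionary it returns, and the example data are all made up, and the p-value is computed only in the large-sample case because the standard library has no t distribution.

```python
from statistics import NormalDist, mean, stdev


def one_sample_mean_test(data, given_mean):
    """Compute the test statistic for a mean and report which table applies (hypothetical helper)."""
    n = len(data)
    # stdev divides by n - 1, so it is the sample standard deviation
    statistic = (mean(data) - given_mean) / (stdev(data) / n**0.5)
    if n >= 40:
        # Large sample: two-sided p-value from the standard normal curve
        p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))
        return {"statistic": statistic, "table": "z", "p_value": p_value}
    # Small sample: look the statistic up in a t-table with n - 1 degrees of freedom
    return {"statistic": statistic, "table": "t", "df": n - 1}


small = one_sample_mean_test([2, 4, 4, 4, 5, 5, 7, 9], given_mean=4)
print(small["table"], small["df"], round(small["statistic"], 4))  # t 7 1.3229
```

With only 8 observations the sketch points to the t-table with 7 degrees of freedom, and the conclusion is only as good as the assumption that the population is normal.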

Tests for a Population Proportion


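The test statistic for a population proportion is as follows:

let n = sample size
let p_0 = sample proportion
let p_1 = given (hypothesized) proportion

def test_statistic_population_proportion(p_0, p_1, n):
    # The standard error uses the given proportion p_1, since the null hypothesis is assumed true
    return (p_0 - p_1) / (((p_1 * (1 - p_1)) / n)**0.5)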
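
As a worked example of a proportion test from start to finish, the sketch below uses invented numbers (60 successes in 100 trials against a given proportion of 0.5). The function name proportion_z_test is hypothetical, and the normal approximation it relies on is usually considered reasonable only when n times the given proportion and n times one minus the given proportion are both at least about 10.

```python
from statistics import NormalDist


def proportion_z_test(successes, n, given_proportion):
    """Two-sided z test for a single proportion (hypothetical helper)."""
    sample_proportion = successes / n
    # Standard error under the null hypothesis
    standard_error = (given_proportion * (1 - given_proportion) / n) ** 0.5
    z = (sample_proportion - given_proportion) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = proportion_z_test(successes=60, n=100, given_proportion=0.5)
print(round(z, 2), round(p_value, 4))  # 2.0 0.0455
```

Since 0.0455 is smaller than 0.05, this invented sample would lead us to reject the null hypothesis at the usual significance level.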