# Analysis of Variance (ANOVA)

An ANOVA test is a type of hypothesis test that evaluates whether the means
of two or more groups are equal. The hypotheses of ANOVA are

- H0: Every group mean is the same as the grand mean.
- H1: Not every group mean is the same as the grand mean.

Typical parameters/terms associated with ANOVA:

- Variance (mean square): sum of squares (SS) / degrees of freedom (df)
- SSB: sum of squares between groups
- SSW: sum of squares within groups
- SST: total sum of squares (SST = SSB + SSW)
- F statistic: variability between groups / variability within groups, i.e. (SSB / \(DF_{B}\)) / (SSW / \(DF_{W}\))
- p: the number of groups
- n: group size (number of observations in each group; \(n_{k}\) for group k when group sizes differ)
- \(DF_{T}\): total degrees of freedom (n*p - 1 for equal group sizes)
- \(DF_{B}\): degrees of freedom between groups (p - 1)
- \(DF_{W}\): degrees of freedom within groups, \(\sum_{k=1}^{p} (n_{k} - 1)\), which equals p*(n - 1) for equal group sizes
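The terms listed above can be worked through by hand on a tiny dataset. The following is a minimal sketch using only the Python standard library; the data values and variable names are hypothetical and chosen so the arithmetic is easy to follow.

```python
from statistics import fmean

# Hypothetical data: p = 3 groups with n = 3 observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [x for g in groups for x in g]
grand_mean = fmean(all_values)
group_means = [fmean(g) for g in groups]

# SSB: group size times the squared distance of each group mean from the grand mean.
ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# SSW: squared distance of each observation from its own group mean.
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
# SST: squared distance of each observation from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

# Degrees of freedom.
p = len(groups)
df_b = p - 1
df_w = sum(len(g) - 1 for g in groups)
df_t = len(all_values) - 1

# Mean squares (variances) and the F statistic.
msb = ssb / df_b
msw = ssw / df_w
f_stat = msb / msw

print(f"SSB={ssb:.3f} SSW={ssw:.3f} SST={sst:.3f} F={f_stat:.3f}")
```

For this data SSB = 26, SSW = 6, and SST = 32, so SST = SSB + SSW holds, and with \(DF_{B}\) = 2 and \(DF_{W}\) = 6 the F statistic is 13.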

## One-way ANOVA

One-way ANOVA is ANOVA applied to a dataset with a single categorical
predictor (factor), typically with more than two groups.
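As an illustration, a one-way ANOVA can be run with `scipy.stats.f_oneway`. This is a sketch that assumes SciPy is installed; the three groups and the 0.05 significance level are hypothetical choices, not values from this page.

```python
from scipy import stats

# Hypothetical observations for three levels of a single factor.
group_a = [1, 2, 3]
group_b = [2, 3, 4]
group_c = [5, 6, 7]

# f_oneway returns the F statistic and the p-value of the test.
result = stats.f_oneway(group_a, group_b, group_c)

alpha = 0.05  # assumed conventional significance level
reject_h0 = result.pvalue < alpha

print(f"F = {result.statistic:.3f}, p = {result.pvalue:.4f}, reject H0: {reject_h0}")
```

A p-value below the chosen significance level suggests that not all group means are the same, but it does not say which groups differ.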

## Assumptions of ANOVA

### 1. Normal data

The samples come from a normally distributed population. The main
concerns are skewness and kurtosis. Outliers should also be addressed; ideally,
abs(standardized residuals) < 2.5. However, a violation of the
normality assumption does not significantly affect the F statistic: the
F-test is very robust if the group sizes are equal and the data are
identically distributed. Normality can be checked with the following methods:

- density plot
- QQ plot
- histogram
- D’Agostino skewness test
- Shapiro-Wilk normality test

The null hypothesis of the Shapiro-Wilk test is that the samples come from a
normal distribution, so a small p-value is evidence against normality.
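The numerical checks above can be sketched with SciPy, assuming NumPy and SciPy are available. The simulated data, the seed, and the variable names are hypothetical; the tests are applied to the ANOVA residuals (each observation minus its group mean), which is one common choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical data: three groups of 20 observations each.
groups = [rng.normal(loc=mu, scale=1.0, size=20) for mu in (5.0, 5.5, 7.0)]

# ANOVA residuals: each observation minus its own group mean.
residuals = np.concatenate([g - g.mean() for g in groups])

# Standardize with the pooled within-group standard deviation, sqrt(SSW / DF_W).
df_w = residuals.size - len(groups)
standardized = residuals / np.sqrt((residuals ** 2).sum() / df_w)
n_large = int((np.abs(standardized) > 2.5).sum())

# Shapiro-Wilk test: H0 is that the residuals come from a normal distribution.
shapiro_stat, shapiro_p = stats.shapiro(residuals)
# D'Agostino skewness test: H0 is that the skewness matches a normal distribution.
skew_stat, skew_p = stats.skewtest(residuals)

print(f"|standardized residual| > 2.5: {n_large}")
print(f"Shapiro-Wilk p = {shapiro_p:.3f}, skewness test p = {skew_p:.3f}")
```

The plots in the list (density plot, QQ plot, histogram) are usually worth inspecting alongside these p-values, since the formal tests become very sensitive with large samples.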

### 2. Equality of variances

In other words, the variance is homogeneous across groups. This assumption
matters most when the group sizes are different. It can be checked with the
following methods:

- Levene’s test
- Bartlett’s test
- Box plot
- Scatterplot of ANOVA residuals vs. predicted (fitted) values

Bartlett’s test assumes that the data are normally distributed. The null
hypothesis of Bartlett’s test is that the variances are equal across all
groups.

Levene’s test should be used if the data do not come from a normal
distribution.
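Both tests are available in SciPy. The sketch below assumes NumPy and SciPy are installed and uses simulated, hypothetical groups of unequal size, one of which has a larger spread.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical data: (standard deviation, group size) for three groups.
specs = [(1.0, 15), (1.0, 25), (3.0, 20)]
groups = [rng.normal(loc=0.0, scale=sd, size=n) for sd, n in specs]

# Levene's test; center="median" (SciPy's default) is the more robust variant
# when the data may not be normal.
levene_stat, levene_p = stats.levene(*groups, center="median")

# Bartlett's test; relies on the data being normally distributed.
bartlett_stat, bartlett_p = stats.bartlett(*groups)

print(f"Levene:   W = {levene_stat:.3f}, p = {levene_p:.4f}")
print(f"Bartlett: T = {bartlett_stat:.3f}, p = {bartlett_p:.4f}")
```

For both tests a small p-value is evidence against equal variances.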

### 3. Independence of observations

There is no formal test to verify this assumption. It should be
handled in the design-of-experiment phase by ensuring random sampling.
Note that the F-test is not robust to violations of this assumption.

## Post-hoc Test

One-way ANOVA does not compare the difference in means between each pair of
groups. So, once it turns out that the group means are not all the same,
a post-hoc test may be necessary to identify which groups differ
significantly in mean.

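- Bonferroni: tests all possible pairs, with the significance level divided by the number of comparisons, so it is conservative.
- Tukey’s test
- Kruskal-Wallis (strictly a non-parametric alternative to one-way ANOVA rather than a pairwise post-hoc test)
- Dunnett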
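The Bonferroni approach can be sketched as pairwise t-tests whose p-values are multiplied by the number of comparisons. This is an illustrative sketch, assuming SciPy is installed; the group labels, data, and the 0.05 significance level are hypothetical, and each pair is compared with an ordinary two-sample t-test that assumes equal variances.

```python
from itertools import combinations
from scipy import stats

# Hypothetical data for three groups.
groups = {"a": [1, 2, 3], "b": [2, 3, 4], "c": [5, 6, 7]}

pairs = list(combinations(groups, 2))  # all possible pairs of group labels
m = len(pairs)                         # number of comparisons
alpha = 0.05                           # assumed family-wise significance level

results = {}
for g1, g2 in pairs:
    # Two-sample t-test for this pair (equal variances assumed).
    t_stat, p_raw = stats.ttest_ind(groups[g1], groups[g2])
    # Bonferroni adjustment: multiply by the number of comparisons, cap at 1.
    p_adj = min(1.0, p_raw * m)
    results[(g1, g2)] = (p_raw, p_adj, p_adj < alpha)

for (g1, g2), (p_raw, p_adj, significant) in results.items():
    print(f"{g1} vs {g2}: raw p = {p_raw:.4f}, adjusted p = {p_adj:.4f}, "
          f"significant: {significant}")
```

Multiplying each p-value by m is equivalent to comparing the raw p-values against alpha / m, which is why the method becomes more conservative as the number of groups grows.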
