Analysis of Variance (ANOVA) is a powerful statistical tool used to analyze the variance among group means and determine whether these differences are statistically significant. It’s commonly used in various fields such as agriculture, biology, psychology, and more to test hypotheses about different groups. Below are some practical examples:
The effect of different diets on the growth of fishes
Comparing the height of three different species of a plant
The type of flour used for baking bread
Data for ANOVA could be collected through designed experiments or by sampling from populations. This article helps explain how ANOVA analyzes variance and identifies situations when these differences are significant, using both simulated and real data.
Often we Analysis of Variance (ANOVA) to analyze the variances to find if different cases results in similar outcome and if the difference is significant. Following are some simple examples,
Consider the following model with \(i=3\) groups and \(j=n\) observations,
\[
y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \; i = 1, 2, 3 \texttt{ and } j = 1, 2, \ldots n
\]
where, \(\tau_i\) represetns the effect corresponding to group \(i\) and \(\varepsilon_{ij} \sim \mathrm{N}(0, \sigma^2)\), the usual assumption of linear model. In order to better understand how ANOVA finds the differences between groups and how the group mean and their standard deviation influence the results from ANOVA, we will explore the following four cases:
anova_result <- Model[, map_df(.x = fit,.f =~ broom::tidy(anova(.x))), by = Cases] %>%rename(DF = df,SSE = sumsq,MSE = meansq,Statistic = statistic,`p value`= p.value)anova_result %>%mutate(Cases =case_when( Cases =="Case1"~"Case 1: Similar group means with high variation within the groups", Cases =="Case2"~"Case 2: Similar group means with low variation within the groups", Cases =="Case3"~"Case 3: Distant group means with high variation within the groups",TRUE~"Case 4: Distant group means with low variation within the groups" ) ) %>% gt::gt(groupname_col ="Cases", rowname_col ="term") %>% gt::fmt_number(columns =4:6) %>% gt::fmt_number(columns =7, decimals =4) %>% gt::sub_missing(missing_text ="") %>% gt::tab_options(table.width ="100%",row_group.font.weight ="600" )
DF
SSE
MSE
Statistic
p value
Case 1: Similar group means with high variation within the groups
Group
2
32.09
16.05
0.64
0.5304
Residuals
147
3,704.26
25.20
Case 2: Similar group means with low variation within the groups
Group
2
1.98
0.99
1.00
0.3714
Residuals
147
146.03
0.99
Case 3: Distant group means with high variation within the groups
Group
2
1,951.86
975.93
50.15
0.0000
Residuals
147
2,860.66
19.46
Case 4: Distant group means with low variation within the groups
The results show a high p-value, indicating no significant difference between the groups due to high within-group variability.
Here, the p-value is still high, suggesting no significant difference, but the small variance within groups provides clearer insights compared to Case 1.
Despite the high variation within groups, the distant group means lead to a low p-value, indicating statistically significant differences among the groups.
With low within-group variation and distant means, the p-value remains extremely low, strongly indicating significant group differences.
In conclusion, ANOVA helps determine if there are significant differences between multiple group means by comparing variances within groups to variances between groups. The power of ANOVA lies in its ability to detect even subtle differences when variations are minimal within groups.