Design of Experiment and Analysis of Variance

class: center, middle, inverse, title-slide

# Design of Experiment and Analysis of Variance
## Repetition
### Raju Rimal
### 29 Aug, 2017

---

background-image: url(https://www.nmbu.no/sites/all/themes/nmbu_university/images/logo-nb.png)

???

Image credit: [NMBU](https://www.nmbu.no/sites/all/themes/nmbu_university/images/logo-nb.png)

---
class: center, middle, inverse

# Statistical Inference
## Hypothesis Testing

---

.left-column[
# Inference Steps

]
.right-column[

### Steps

1. Make a hypothesis

2. Collect data

3. Calculate test-statistic

4. Compare the test statistic with its theoretical distribution <br>(t-distribution, F-distribution, `$\chi^2$` distribution)

5. Conclusion and decision

### Remember

For an estimate we always use a .red[hat]. For instance, the sample mean `$\bar{x} = \hat{\mu}$` is an estimate of the population mean `$\mu$`.

Similarly, `$\hat{\sigma}$` is an estimate of population standard deviation `$\sigma$`.

]

---
class: spread

.left-column[
# Hypothesis
## Null Hypothesis

### Exam questions
- <a href="images/2011-1.png" data-fancybox >2011: 2(a)</a>
- <a href="images/2012-1.png" data-fancybox >2012: 2(b)</a>
]

.right-column[
### Use cases

One sample t-test
: `$H_0: \mu = 5$`

Two sample t-test
: `$H_0: \mu_1 = \mu_2$`

ANOVA model
: `$H_0: \tau_1 = \tau_2 = \tau_3 = 0$`

Random effect Model
: `$H_0: \sigma_\tau^2 = 0$`

.red[_Always write hypotheses in terms of **population parameters**_]
]

---
class: spread

.left-column[
# Hypothesis
## Alternative Hypothesis

### Exam questions
- <a href="images/2011-1.png" data-fancybox >2011: 2(a)</a>
- <a href="images/2012-1.png" data-fancybox >2012: 2(b)</a>
]

.right-column[

### One sided vs Two sided

<details>
<summary>Test if expected variance of $x$ is greater than 11.</summary>
\[H_1: \sigma^2 > 11\]
</details>

<details>
<summary>Are the three soya groups significantly different?</summary>
\[H_1: \mu_i \ne \mu_j \text{ for at least one pair } i \ne j\]
If $\tau_i = \mu_i - \mu$ is the effect of group $i$,
\[H_1: \tau_i \ne 0 \text{ for at least one } i = 1, 2, 3\]
</details>

<details>
<summary>Can we conclude that the `soya` diets give, on average, higher protein than the `non-soya` diet?</summary>
\[H_1: \Gamma < 0 \text{ where, } \Gamma = \tau_1 - \frac{1}{2}(\tau_2 + \tau_3)\]
</details>

<details>
<summary>Does fat percent vary across these randomly chosen farms?</summary>
\[H_1: \sigma_\tau^2 > 0\]
</details>
]

---

.left-column[
# T Statistic
## T-test

<div class="side-caption">The .italics[darker region] under the curve is the .italics[rejection region]. If the calculated t-value lies in this region, we reject the null hypothesis.</div>
]

.right-column[

### One sample mean
`$$\text{t-statistic} = \dfrac{\bar{y} - \mu_0}{\mathrm{SE}(\bar{y})} = \dfrac{\bar{y} - \mu_0}{\hat{\sigma}/\sqrt{n}} \sim t_{n-1}$$`
where `$\mu_0$` is the value of `$\mu$` under `$H_0$`.
### Difference between two groups
`$$\text{t-statistic} = \dfrac{\bar{y}_1 - \bar{y}_2}{\mathrm{SE}(\bar{y}_1 - \bar{y}_2)} = \dfrac{\bar{y}_1 - \bar{y}_2}{S_\text{pooled}\sqrt{\cfrac{1}{n_1} + \cfrac{1}{n_2}}} \sim t_{N-a}$$`
### C.I. for the true difference between two groups
`$$\left[\bar{y}_1 - \bar{y}_2 \pm t_{\alpha/2, N-a} \times S_\text{pooled}\sqrt{\cfrac{1}{n_1} + \cfrac{1}{n_2}} \right]$$`

**Remember:** Here `$N = n_1 + n_2$` is the total number of observations in all groups and `$a = 2$` is the number of groups. In a two-sided test, reject `$H_0$` when `$|t|$` exceeds the critical value `$t_{\alpha/2}$` with the matching degrees of freedom.
]

---

.left-column[
# T Statistic
## Example
<img src="Day1_files/figure-html/unnamed-chunk-4-1.png" width="90%" />

```
  Modified Unmodified
1     16.9       16.6
2     16.4       16.8
```
]
.right-column[
### Two sample t-test

```

Two Sample t-test

data:  Modified and Unmodified
t = -2, df = 20, p-value = 0.04
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.54239 -0.00961
sample estimates:
       mean of x        mean of y  pooled std.dev. 
          16.766           17.042            0.284 
```
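
The t-value and confidence interval can be recomputed by hand from the summary values in this output. The Python sketch below is only an illustration: the function name `pooled_t_test` is hypothetical, `n = 10` per group is an assumption (inferred from the 18 residual degrees of freedom in the ANOVA table), and `2.101` is the tabulated `$t_{0.025, 18}$`.

```python
import math

def pooled_t_test(mean1, mean2, s_pooled, n1, n2, t_crit):
    """Pooled two-sample t-statistic and confidence interval for mu1 - mu2."""
    se = s_pooled * math.sqrt(1 / n1 + 1 / n2)  # SE of the difference in means
    diff = mean1 - mean2
    t_stat = diff / se
    margin = t_crit * se
    return t_stat, (diff - margin, diff + margin)

# Means and pooled std.dev. are the rounded values printed above.
# Assumptions: n = 10 per group, t_crit = t_{0.025, 18} = 2.101 (t-table).
t_stat, ci = pooled_t_test(16.766, 17.042, 0.284, 10, 10, t_crit=2.101)
print(f"t = {t_stat:.2f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

The result agrees with the printed interval up to rounding of the pooled standard deviation.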

### ANOVA test

```
Analysis of Variance Table
          Df Sum Sq Mean Sq F value Pr(>F)  
key        1  0.381   0.381    4.74  0.043 *
Residuals 18  1.447   0.080                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
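
As a rough check using only the rounded numbers printed above: with two groups, the pooled t-test and the one-way ANOVA are equivalent, so the MSE equals the pooled variance and the F-value equals the squared t-value (the printed `t = -2` is rounded).

`$$S_\text{pooled}^2 = 0.284^2 \approx 0.08 \approx \text{MSE}, \qquad |t| = \sqrt{F} = \sqrt{4.74} \approx 2.18$$`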

<h3><a href="javascript:;" data-fancybox data-src="#hidden-content">Pooled Variance</a></h3>

.hidden[
<div id="hidden-content">
<h3>Pooled Variance</h3>
The pooled variance $(S_\text{pooled}^2)$ is the same as the MSE in ANOVA. We can calculate it as,
\[S_\text{pooled}^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\]
</div>
]
]

---

.left-column[
# `$\chi^2$` Statistic
## Chisq Test

<div class = "side-caption">
The shaded region covers .italics[5%] area of the curve.
</div>

]

.right-column[
### Variance Test
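`$$\chi^2\text{ statistic } = \frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}$$`
where `$\sigma_0^2$` is the variance under `$H_0$`; compare the statistic with the critical value `$\chi^2_{\alpha, n-1}$`.

### ANOVA: Confidence Interval for `$\sigma^2$` from MSE `$(\hat{\sigma}^2)$`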

`$$\left[\dfrac{(N-a)\text{MSE}}{\chi^2_{\alpha/2, N-a}}, \dfrac{(N-a)\text{MSE}}{\chi^2_{1-\alpha/2, N-a}}\right] =
\left[\dfrac{\text{SSE}}{\chi^2_{\alpha/2, N-a}}, \dfrac{\text{SSE}}{\chi^2_{1-\alpha/2, N-a}}\right]$$`

### Things to remember
- `$(N-a)$` is the residual degrees of freedom
- `$(N-a)\text{MSE} = \text{SSE}$`
- `$\chi^2_{N-a}$` is asymmetric, unlike the t-distribution
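
A minimal Python sketch of the interval above. The function name `variance_ci` is hypothetical; SSE = 851962 with 21 residual degrees of freedom is taken from the one-way ANOVA table shown later in these slides, and the two critical values are assumed to be read from a `$\chi^2$` table for a 95% interval.

```python
def variance_ci(sse, chi2_upper, chi2_lower):
    """CI for sigma^2: [SSE / chi2_{alpha/2}, SSE / chi2_{1 - alpha/2}]."""
    return sse / chi2_upper, sse / chi2_lower

# Assumed table values for 21 d.f.:
# chi2_{0.025, 21} = 35.479 (upper tail), chi2_{0.975, 21} = 10.283
lower, upper = variance_ci(851962, chi2_upper=35.479, chi2_lower=10.283)
print(f"95% CI for sigma^2: [{lower:.1f}, {upper:.1f}]")
```

Because the `$\chi^2$` distribution is asymmetric, the interval is not centred on the MSE.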

]

---

.left-column[
# F Statistic
## F test

<img src="Day1_files/figure-html/unnamed-chunk-11-1.png" width="90%" />
<div class = "side-caption">
The ratio of two independent chi-square variables, each divided by its degrees of freedom, follows an F-distribution
</div>
]

.right-column[
### Testing difference in variability of two groups
`$$H_0: \sigma_1^2 = \sigma_2^2 \text{ vs } H_1: \sigma_1^2 \ne \sigma_2^2$$`
### F statistic
If `$S_1^2$` and `$S_2^2$` are the sample variances of the two groups, with `$(n_i - 1)S_i^2/\sigma_i^2 \sim \chi^2_{n_i - 1}$`, then under `$H_0$`,
`$$F = \frac{S_1^2}{S_2^2} \sim F_{n_1-1, n_2 - 1}$$`
### Secret Trick
`$$F_{(1-\alpha), n_1, n_2} = \frac{1}{F_{\alpha, n_2, n_1}}$$`
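
For instance, taking the tabulated upper 5% point `$F_{0.05, 8, 7} = 3.726$` (the table value used in the soya example), the corresponding lower point with the degrees of freedom swapped is:

`$$F_{0.95, 7, 8} = \frac{1}{F_{0.05, 8, 7}} = \frac{1}{3.726} \approx 0.268$$`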
]

---

.left-column[
# F Statistic
## Example

```
    group  mean   var n
 not.soya 77.75 11.36 8
     soya 82.89 21.86 9
```
]

.right-column[
### Soya diet experiment
`$$\text{F-value} = \frac{S_1^2}{S_2^2} = \frac{21.861}{11.357} = 1.925 \sim F_{8, 7}$$`

The critical F-value from the <a href="images/F0.05.png" data-fancybox >Table</a> is 3.726

Since `$1.925 < 3.726$`, we do not reject `$H_0$`: there is no significant difference in variation between the two groups.
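
The comparison can be sketched in a few lines of Python. The function name `variance_ratio_test` is hypothetical; the variances and the critical value 3.726 are the ones quoted on this slide.

```python
def variance_ratio_test(var_num, var_den, f_crit):
    """Return the F-value and whether it exceeds the critical value."""
    f_value = var_num / var_den
    return f_value, f_value > f_crit

# Soya variance (n = 9) over non-soya variance (n = 8)
f_value, reject = variance_ratio_test(21.861, 11.357, f_crit=3.726)
print(f"F = {f_value:.3f}, reject H0: {reject}")
```

With these numbers the script prints `F = 1.925, reject H0: False`, because the F-value is below the critical value.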
]

---
class: center, middle, inverse

# ANOVA Model
---

.left-column[
# ANOVA Model
## Assumptions

<img src="Day1_files/figure-html/unnamed-chunk-17-1.png" width="90%" />
<div class="side-caption">
<ol>
<li>Errors are random and independent</li>
<li>Errors are normally distributed with mean 0 and constant variance $\sigma^2$</li>
</ol>
</div>
]
.right-column[
### Mean Model
`$$y_{ij} = \mu_i + \varepsilon_{ij}$$`
Here,
`$\mu_i$` is the mean of group `$i$`; `$i = 1, 2, \ldots, a$` and `$j = 1, \ldots, n$`

### Effect Model
`$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$`
Here, we split `$\mu_i$` (group mean) into the overall mean `$(\mu)$` and the effect `$(\tau_i)$` of group `$i$` as `$\mu_i = \mu + \tau_i$`. In this case we have `$\sum_{i = 1}^a{\tau_i} = 0$`, e.g., `$\tau_1 + \tau_2 + \tau_3 = 0$` when `$a = 3$`.

### Assumptions
In the ANOVA model, we assume `$\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2)$`
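
To make the effect model concrete, the Python sketch below simulates data from `$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$` and estimates the effects as group mean minus overall mean. All parameter values (`mu`, `tau`, `sigma`, `n`) are hypothetical and chosen only for illustration.

```python
import random
import statistics

random.seed(1)

# Hypothetical parameters: overall mean, group effects (sum to zero),
# error standard deviation and observations per group
mu, tau, sigma, n = 50.0, [-2.0, 0.5, 1.5], 1.0, 8

# y_ij = mu + tau_i + eps_ij, with eps_ij ~ NID(0, sigma^2)
data = [[mu + t + random.gauss(0, sigma) for _ in range(n)] for t in tau]

grand_mean = statistics.mean(y for group in data for y in group)
tau_hat = [statistics.mean(group) - grand_mean for group in data]

print("estimated effects:", [round(t, 2) for t in tau_hat])
print("sum of estimated effects:", round(sum(tau_hat), 10))
```

In a balanced design the estimated effects sum to zero (up to floating-point error), mirroring the constraint on `$\tau_i$`.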
]

---

.left-column[
# ANOVA Model
## ANOVA table

```
          Df  Sum Sq Mean Sq F value
Weight1    2 1012033  506017    12.5
Residuals 21  851962   40570        
```
]
.right-column[
<img src="images/oneway-anova.png" width="80%" style="display: block; margin: auto;" />

.left-40-column[
#### Hypothesis

`\begin{aligned}
H_0: \tau_i &= 0 \text{ for all } i = 1, 2, 3 \\
H_1: \tau_i &\ne 0 \text{ for at least one }i
\end{aligned}`

#### Decision
From the <a href="images/F005-2.png" data-fancybox>F-table</a>, `$F_0 > F_c$`. So we reject `$H_0$` at the 5% level of significance.

]
.right-60-column[
<img src="Day1_files/figure-html/unnamed-chunk-21-1.png" width="90%" style="display: block; margin: auto 0 auto auto;" />
]
]
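
How the entries of a one-way ANOVA table are built can be sketched in Python. The function name `one_way_anova` and the small data set are hypothetical (they are not the data behind the table on this slide); the critical value 5.14 is the tabulated `$F_{0.05, 2, 6}$`.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA table entries from a list of groups."""
    n_total = sum(len(g) for g in groups)
    a = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group (treatment) and within-group (error) sums of squares
    ss_treat = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_treat, df_error = a - 1, n_total - a
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    return {"df": (df_treat, df_error), "SS": (ss_treat, ss_error),
            "MS": (ms_treat, ms_error), "F": ms_treat / ms_error}

# Hypothetical data: three groups with three observations each
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
f_crit = 5.14  # assumed table value F_{0.05, 2, 6}
print(table, "reject H0:", table["F"] > f_crit)
```

The same arithmetic applies to the table on this slide: each mean square is the sum of squares divided by its degrees of freedom, and the F-value is their ratio, 506017 / 40570 ≈ 12.5.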

---

class: inverse, center, middle

# Exam Questions

---
.left-column[
# ANOVA Model
## Exam Questions
- <a href="images/2012-2b-question.png" data-fancybox="gallery">2012:</a>
<a href="images/2012-2bcd.png" data-fancybox = "gallery">2(b), 2(c)</a>
- <a href="images/2012-appendix-45.png" data-fancybox="gallery">2012: Appendix 4-5</a>
]
.right-column[

### Question 2(b)
**Hypothesis:**
`\begin{aligned}
H_0 &:\tau_1 = \tau_2 = \tau_3 = 0 \\
H_1 &:\text{At least one }\tau_i \ne 0, i = 1, 2, 3
\end{aligned}`

The p-value is given in the ANOVA table in Appendix 4.
**If the p-value is _less than_ the level of significance `$\alpha$`, we reject `$H_0$`**

### Question 2(c)
Since `$\mu_i = \mu + \tau_i$`, and all group means `$\mu_1, \mu_2, \mu_3$` and the overall mean `$\mu$` are given, you can find `$\tau_1, \tau_2$` and `$\tau_3$` as `$\tau_i = \mu_i - \mu$`.
]