Design of Experiment and Analysis of Variance

class: center, middle, inverse, title-slide

# Design of Experiment and Analysis of Variance
## Repetition
### Raju Rimal
### 30 Aug, 2017

---

background-image: url(https://www.nmbu.no/sites/all/themes/nmbu_university/images/logo-nb.png)

???

Image credit: [NMBU](https://www.nmbu.no/sites/all/themes/nmbu_university/images/logo-nb.png)

---
class: center, middle, inverse

# ANOVA Model

---

.left-column[
# ANOVA Model
## ANOVA table

```
          Df  Sum Sq Mean Sq F value
Weight1    2 1012033  506017    12.5
Residuals 21  851962   40570        
```
]
.right-column[

.left-40-column[
#### Hypothesis

`\begin{aligned}
H_0: \tau_i &= 0 \text{ for } i = 1, 2, 3 \\
H_1: \tau_i &\ne 0 \text{ for at least one }i
\end{aligned}`

#### Decision
From the <a href="images/F005-2.png" data-fancybox>F-table</a>, `$F_0 > F_c$`, so we reject `$H_0$` at the 5% significance level.

]
.right-60-column[
<img src="Day2_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto 0 auto auto;" />
]

### Questions
a) What is the 95% **confidence interval for the error variance**?

b) We found a significant effect of the first child's weight on the second child, but _how different is the effect of `High` from `Low` and `Medium`_?

]

---

.left-column[
# ANOVA Model
## Confidence Interval
<img src="Day2_files/figure-html/unnamed-chunk-6-1.png" width="90%" />
<img src="Day2_files/figure-html/unnamed-chunk-7-1.png" width="90%" />

```
          Df  Sum Sq Mean Sq F value
Weight1    2 1012033  506017    12.5
Residuals 21  851962   40570        
```

]
.right-column[
### Confidence interval of error variance
`$$\dfrac{\text{SSE}}{\chi^2_{\alpha/2, N-a}} 
\le \sigma^2 \le
\dfrac{\text{SSE}}{\chi^2_{1-(\alpha/2), N-a}}$$`

Substituting the values from the ANOVA table into the formula above,
`$$\left[
\dfrac{851962.5}{35.479}, 
\dfrac{851962.5}{10.283}
\right] = \left[
24013.2, 
82852.4
\right]$$`

At the **95% confidence level**, the true error variance `$\sigma^2$` lies in the interval (24013.2, 82852.4).
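
The calculation above can be sketched in a few lines of Python. This is a minimal sketch that assumes the tabulated quantiles 35.479 and 10.283 quoted on this slide (21 degrees of freedom); the function name `variance_interval` is illustrative only. Because the quantiles are rounded to three decimals, the printed bounds may differ slightly from the values shown above.

```python
# Values copied from the slide: SSE from the ANOVA table and the
# chi-square quantiles for N - a = 21 degrees of freedom (from a table).
sse = 851962.5
chi2_upper = 35.479  # chi^2_{alpha/2, 21}
chi2_lower = 10.283  # chi^2_{1 - alpha/2, 21}


def variance_interval(sse, chi2_upper, chi2_lower):
    """Return (lower, upper) bounds of the confidence interval for sigma^2."""
    # Dividing by the larger quantile gives the lower bound, and vice versa.
    return sse / chi2_upper, sse / chi2_lower


lower, upper = variance_interval(sse, chi2_upper, chi2_lower)
print(f"{lower:.1f} <= sigma^2 <= {upper:.1f}")
```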

<h3><a href="images/2011-1b.png" data-fancybox="2011">Exercise: Exam 2011 1(b)</a><a href="images/2011-a2.png" data-fancybox="2011"></a></h3>
]

---
.left-column[
# ANOVA Model
## Post-hoc Test
<img src="Day2_files/figure-html/unnamed-chunk-9-1.png" width="90%" />
.side-caption[
Think about what will happen if you increase the significance level from 0.05 to 0.1.
]
].right-column[
### Tukey Pairwise Comparison
For multiple comparisons, Tukey suggested the **studentized range statistic**,
`$$q = 
\dfrac{\bar{y}_\text{max} - \bar{y}_\text{min}}
{\sqrt{\text{MSE}/n}}$$`
Compare `$q$` with `$q_\alpha(a, f)$` found in <a href="images/qtable005.png" data-fancybox="gallery">table</a><a href="images/qtable001.png" data-fancybox="gallery"></a>. We declare a pair to be significantly different if `$q > q_{\alpha}(a, f)$`.

### Confidence Interval
`$$(\bar{y}_i - \bar{y}_j) \pm q_\alpha(a, f)\sqrt{\frac{\text{MSE}}{n}}$$`
`$$(\bar{y}_i - \bar{y}_j) - q_\alpha(a, f)\sqrt{\frac{\text{MSE}}{n}} \le \mu_i - \mu_j \le (\bar{y}_i - \bar{y}_j) + q_\alpha(a, f)\sqrt{\frac{\text{MSE}}{n}}$$`
Equivalently, a pair differs significantly when this interval does not contain zero.
]

---

.left-column[
# ANOVA Model
## Post-hoc Example
<img src="Day2_files/figure-html/unnamed-chunk-10-1.png" width="90%" />

```
 Weight1 mean n
    High 3816 8
     Low 3316 8
  Medium 3614 8
```

].right-column[
<h3><a href = "images/2013-2.png" data-fancybox="gallery">Exam 2013, 2(d)</a></h3>

From <a href="images/2013-a1.png" data-fancybox="gallery">Appendix 1</a> and <a href="images/2013-a4.png" data-fancybox="gallery">Appendix 4</a>, we have,

```
          Df  Sum Sq Mean Sq F value   Pr(>F)
Weight1    2 1012033  506017    12.5 0.000269
Residuals 21  851962   40570      NA       NA
```

Using the <a href="images/qtable005.png" data-fancybox="gallery">studentized range table</a>, `$q_{0.05}(3, 21) = 3.565$`. So,

`$$T_{0.05} = q_{0.05}(3, 21)\sqrt{\frac{40569.643}{8}} = 253.845$$`

We have three possible pairs for comparison: `Low-High`, `Medium-High` and `Medium-Low`. The differences in their means are <code>-500, -202.5, 297.5</code> respectively.

Comparing the absolute differences with `$T_{0.05} = 253.845$`, we find that `Low-High` and `Medium-Low` differ significantly at the 5% significance level, while `Medium-High` does not.
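
The comparison can be sketched in Python as follows. This is a sketch under stated assumptions: it uses the rounded group means from the table on the left and the tabulated `$q_{0.05}(3, 21) = 3.565$`, so the differences and the threshold agree with the slide only up to rounding. The variable names are illustrative.

```python
import math
from itertools import combinations

# Values copied from the slide; q_crit is the tabulated q_{0.05}(3, 21).
means = {"High": 3816, "Low": 3316, "Medium": 3614}  # rounded group means
mse, n, q_crit = 40569.643, 8, 3.565

# Tukey's threshold: the smallest absolute difference declared significant.
threshold = q_crit * math.sqrt(mse / n)

results = {}
for a, b in combinations(sorted(means), 2):
    diff = means[b] - means[a]
    interval = (diff - threshold, diff + threshold)
    # Significant exactly when the interval excludes zero.
    results[f"{b}-{a}"] = (diff, interval, abs(diff) > threshold)

for pair, (diff, (lo, hi), significant) in results.items():
    print(f"{pair}: diff={diff}, CI=({lo:.1f}, {hi:.1f}), significant={significant}")
```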

<h3><a href="images/2015-1e.png" data-fancybox="2015">Exam 2015, 1(e)</a><a href="images/2015-a2.png" data-fancybox="2015"></a></h3>
]

---

.left-column[
# ANOVA Model
## Contrast test
<img src="Day2_files/figure-html/unnamed-chunk-13-1.png" width="90%" />
].right-column[
### Use cases

- Comparing one group with the average of other groups
- Comparing treatments with control

<h3><a href="images/2014-2d.png" data-fancybox="2011">Exam 2014: 2(d)</a></h3>
**Hypothesis:**
`$$H_0: \frac{1}{2}(\mu_1 + \mu_2) - \frac{1}{2}(\mu_3 + \mu_4) = 0 \text{ vs }
H_1: \frac{1}{2}(\mu_1 + \mu_2)- \frac{1}{2}(\mu_3 + \mu_4) \ne 0$$`

**Decision:**

From <a href="images/2014-a8.png" data-fancybox="2011">Table 8</a>, `$\text{p-value }(0.065) > 0.05$`.
<details>
<summary><em>What should we conclude?</em></summary>
.small[
At the 5% significance level we cannot reject the null hypothesis. So, we cannot claim that the expected value differs significantly between January-June and July-December.
]
</details>
]

---

.left-column[
# ANOVA Model
## Contrast Calculation
<img src="Day2_files/figure-html/unnamed-chunk-14-1.png" width="90%" />
<a href="images/2014-2d.png" data-fancybox="2011">Continue on Exam 2014: 2(d)</a><a href="images/2014-a8.png" data-fancybox="2011"></a>
].right-column[
Before any computation, **formulate a hypothesis**.

### Estimate of Contrast
`$$\Gamma = \sum_{i = 1}^a{c_i\mu_i} = \frac{1}{2}(\mu_1 + \mu_2) - \frac{1}{2}(\mu_3 + \mu_4), \qquad \hat{\Gamma} = \sum_{i = 1}^a{c_i\bar{y}_i} = -5.5$$`
Here the contrast coefficients `$c_i$` are `$1/2, 1/2, -1/2, -1/2$`; contrast coefficients always sum to zero. To estimate `$\Gamma$`, replace each `$\mu_i$` with the corresponding sample mean. Use <a href="images/2014-a6.png" data-fancybox>Table 6</a> for the following calculations.

### Standard Error and test statistic
`\begin{aligned}
\text{SE}(\hat{\Gamma}) = \sqrt{\frac{\text{MSE}}{n}\sum_{i = 1}^a{c_i^2}} = 2.82 && 
t = \frac{\hat{\Gamma}}{\text{SE}(\hat{\Gamma})} \sim t_{N-a} \text{ under } H_0
\end{aligned}`
Reject `$H_0$` if `$|t| > t_{\alpha/2, N-a}$`.
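
A minimal Python sketch of the contrast calculation, assuming a balanced design with `n` observations per group. The helper names (`is_contrast`, `contrast_estimate`, `contrast_se`) are illustrative. The group means and MSE come from Table 6, which is not reproduced here, so the sketch only reuses the reported values `-5.5` and `2.82` for the test statistic.

```python
import math

# Contrast coefficients from the slide.
coeffs = [0.5, 0.5, -0.5, -0.5]


def is_contrast(coeffs, tol=1e-12):
    """Coefficients form a contrast when they sum to zero."""
    return abs(sum(coeffs)) < tol


def contrast_estimate(coeffs, sample_means):
    """Gamma-hat: the contrast applied to the sample means."""
    return sum(c * m for c, m in zip(coeffs, sample_means))


def contrast_se(coeffs, mse, n):
    """Standard error of Gamma-hat in a balanced design (n per group)."""
    return math.sqrt(mse / n * sum(c * c for c in coeffs))


# Reported values from the slide, used directly for the test statistic.
gamma_hat, se = -5.5, 2.82
t_stat = gamma_hat / se

print(f"valid contrast: {is_contrast(coeffs)}")
print(f"sum of squared coefficients: {sum(c * c for c in coeffs)}")
print(f"t = {t_stat:.2f}")
```

For these coefficients the squared coefficients sum to one, so the standard error reduces to `$\sqrt{\text{MSE}/n}$`.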
]

---

.left-column[
# ANOVA Model
## Random Effect Model
### Random factor
`$$\tau_i \sim \mathrm{N}(0, \sigma_\tau^2)$$`
### Fixed factor
`$$\sum_{i = 1}^a{\tau_i} = 0$$`
].right-column[
### The Model
`$$y_{ij} = \mu + \tau_i + \varepsilon_{ij} \text{ where, } \varepsilon_{ij} \sim \mathrm{N}(0, \sigma^2)\text{ and } \tau_i \sim \mathrm{N}(0, \sigma^2_\tau)$$`
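
To make the model concrete, the following Python sketch simulates data from it. All parameter values (`mu=100`, `sigma_tau=10`, `sigma=5`, 4 levels, 6 observations each) are hypothetical and chosen only for illustration, as is the function name.

```python
import random


def simulate_random_effects(mu, sigma_tau, sigma, a, n, seed=0):
    """Simulate y_ij = mu + tau_i + eps_ij for a levels with n observations each."""
    rng = random.Random(seed)
    data = []
    for _ in range(a):
        # One random effect per level, shared by all observations in that level.
        tau_i = rng.gauss(0, sigma_tau)
        data.append([mu + tau_i + rng.gauss(0, sigma) for _ in range(n)])
    return data


# Hypothetical parameter values, for illustration only.
data = simulate_random_effects(mu=100, sigma_tau=10, sigma=5, a=4, n=6)
for i, group in enumerate(data, start=1):
    print(f"level {i}: mean = {sum(group) / len(group):.1f}")
```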

### Why Random effect model
_Specific levels are not of interest_; general variation is more important. Levels of a factor are **randomly selected**. For example, if a specific `besetning` (farm) is not of interest, we randomly sample farms from a population of farms.

Usually, **blocks** are treated as a random factor.

### A comparison with Fixed Effect Model
_Specific levels of a factor are more important_ than their general variation. These levels are **specifically chosen**. For instance, when comparing a new drug with an old one, the particular drugs themselves are of interest.
]

---

.left-column[
# ANOVA Model
## Random Effect Model

<img src="Day2_files/figure-html/unnamed-chunk-15-1.png" width="90%" />
].right-column[
<h3><a href="images/2011-1a.png" data-fancybox="gallery">Exam 2011: 1(a)</a></h3>
**Hypothesis:**
`$$H_0: \sigma_\tau^2 = 0 \text{ vs } H_1: \sigma_\tau^2 > 0$$`
From <a href="images/2011-a2.png" data-fancybox="gallery">Appendix 2</a>, the p-value is larger than `$\alpha = 0.05$`, so we cannot reject the null hypothesis. At the **5% significance level**, there is _no significant variation_ between the farms.

### Estimates of Variance Components
`$$\text{Total variation: } \mathrm{var}(y_{ij}) = \sigma_\tau^2 + \sigma^2$$`
We estimate them as,
`$$\hat{\sigma}^2 = \text{MSE} \text{ and } \hat{\sigma}_\tau^2 = \frac{\text{MS}_\text{treatment} - \text{MSE}}{n}$$`
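
The estimators above can be sketched in Python. The Exam 2011 figures are not reproduced on this slide, so the example reuses the mean squares from the `Weight1` ANOVA table purely as an arithmetic illustration; `Weight1` was analysed as a fixed factor earlier, and treating it as random here is an assumption made only to have numbers to plug in.

```python
def variance_components(ms_treatment, mse, n):
    """Return (sigma^2-hat, sigma_tau^2-hat) for a balanced one-way random effect model."""
    sigma2_hat = mse
    sigma2_tau_hat = (ms_treatment - mse) / n
    return sigma2_hat, sigma2_tau_hat


# Mean squares from the Weight1 table, reused for illustration only.
sigma2_hat, sigma2_tau_hat = variance_components(ms_treatment=506017, mse=40570, n=8)
print(f"sigma^2-hat = {sigma2_hat}, sigma_tau^2-hat = {sigma2_tau_hat}")
```

If `MS_treatment` is smaller than `MSE`, this estimate is negative; a common convention is to report it as zero.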
]

---

.left-column[
# ANOVA Model
## Random Effect Model

- <a href="images/2011-1cd.png" data-fancybox>Exam 2011: 1(c), 1(d)</a>
- <a href="images/2012-1c.png" data-fancybox>Exam 2012: 1(c)</a>
- <a href="images/2013-1c.png" data-fancybox>Exam 2013: 1(c)</a>
- <a href="images/2014-1e.png" data-fancybox>Exam 2014: 1(e)</a>
- <a href="images/2015-2c.png" data-fancybox>Exam 2015: 2(c)</a>

].right-column[
### Intraclass Correlation Coefficient
The proportion of the total variation that is due to variation between groups.
`$$\rho = \frac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma^2}$$`
Using the estimates of the variance components, we can estimate the intraclass correlation coefficient.
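
As a sketch, the estimate can be computed directly from the mean squares of a balanced design. The function name is illustrative, truncating a negative between-group estimate at zero is an assumption (a common convention, not stated in the slides), and the `Weight1` mean squares are reused purely as an arithmetic illustration even though that factor was analysed as fixed.

```python
def intraclass_correlation(ms_treatment, mse, n):
    """Estimate rho = sigma_tau^2 / (sigma_tau^2 + sigma^2) from mean squares."""
    # Assumption: a negative between-group estimate is truncated at zero.
    sigma2_tau_hat = max((ms_treatment - mse) / n, 0.0)
    return sigma2_tau_hat / (sigma2_tau_hat + mse)


# Mean squares from the Weight1 table, reused for illustration only.
rho_hat = intraclass_correlation(ms_treatment=506017, mse=40570, n=8)
print(f"rho-hat = {rho_hat:.3f}")
```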

### Confidence interval of overall mean
The `$100(1-\alpha)\%$` confidence interval for the overall mean `$\mu$` under the **random effect model** is,
`$$\hat{\mu} \pm t_{\alpha/2, a-1}\sqrt{\frac{\text{MS}_\text{treatment}}{N}}$$`
(Refer to Thore's Lecture on Random effect Model)
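
A minimal Python sketch of this interval, assuming a balanced design and a critical value read from a t-table. The example again borrows the `Weight1` numbers (grand mean of the three group means, `$N = 24$`, `$a = 3$`) only to illustrate the arithmetic, with the standard table value `$t_{0.025, 2} = 4.303$`; the function name is illustrative.

```python
import math


def overall_mean_interval(grand_mean, ms_treatment, n_total, t_crit):
    """Confidence interval for mu in a balanced one-way random effect model."""
    half_width = t_crit * math.sqrt(ms_treatment / n_total)
    return grand_mean - half_width, grand_mean + half_width


# Illustration only: Weight1 figures treated as if the factor were random.
group_means = [3816, 3316, 3614]
grand_mean = sum(group_means) / len(group_means)  # balanced design
lower, upper = overall_mean_interval(grand_mean, ms_treatment=506017,
                                     n_total=24, t_crit=4.303)
print(f"{lower:.1f} <= mu <= {upper:.1f}")
```

Note that the degrees of freedom are `$a - 1$`, not `$N - a$`, so with few levels the interval is wide.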
]