Design of Experiment and Analysis of Variance

class: center, middle, inverse, title-slide

# Design of Experiment and Analysis of Variance
## Repetition
### Raju Rimal
### 30 Aug, 2017

---

background-image: url(https://www.nmbu.no/sites/all/themes/nmbu_university/images/logo-nb.png)

???

Image credit: [NMBU](https://www.nmbu.no/sites/all/themes/nmbu_university/images/logo-nb.png)

---
class: center, middle, inverse

# ANOVA Model

---

.left-column[
# ANOVA Model
## Random Effect Model

- <a href="images/2011-1cd.png" data-fancybox="2011" data-caption="Exam 2011: 1(c) and 1(d)">Exam 2011: 1(c), 1(d)</a><a href="images/2011-a2.png" data-fancybox="2011" data-caption="Appendix 2"></a>
- <a href="images/2012-1c.png" data-fancybox="2012" data-caption="Exam 2012: 1(c)">Exam 2012: 1(c)</a><a href="images/2012-a2.png" data-fancybox="2012" data-caption="Appendix 2"></a>
- <a href="images/2013-1c.png" data-fancybox="2013" data-caption="Exam 2013: 1(c)">Exam 2013: 1(c)</a><a href="images/2013-a2.png" data-fancybox="2013" data-caption="Appendix 2"></a>
- <a href="images/2014-1e.png" data-fancybox="2014" data-caption="Exam 2014: 1(e)">Exam 2014: 1(e)</a><a href="images/2014-a4.png" data-fancybox="2014" data-caption="Table 4"></a>
- <a href="images/2015-2c.png" data-fancybox="2015" data-caption="Exam 2015: 2(c)">Exam 2015: 2(c)</a><a href="images/2015-a6.png" data-fancybox="2015" data-caption="Table 6"></a>

].right-column[
### Intraclass Correlation Coefficient
Ratio of the variation between groups to the total variation.
`$$\rho = \frac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma^2}$$`
Using estimates of the variance components, we can estimate the **intraclass correlation coefficient**.
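
As a minimal sketch, the estimate can be computed from an ANOVA table, assuming a balanced one-way random effects model with `n` observations per group and the usual method-of-moments estimators `$\hat{\sigma}^2 = \text{MSE}$` and `$\hat{\sigma}_\tau^2 = (\text{MS}_\text{treatment} - \text{MSE})/n$`. The function names and the input numbers are hypothetical and are not taken from the exam tables.

```python
def variance_components(ms_treatment, mse, n):
    """Method-of-moments estimates for a balanced one-way random effects model."""
    sigma2_hat = mse
    # A negative estimate is truncated at zero (a common convention, assumed here)
    sigma2_tau_hat = max((ms_treatment - mse) / n, 0.0)
    return sigma2_tau_hat, sigma2_hat


def intraclass_correlation(ms_treatment, mse, n):
    """Estimated rho = sigma_tau^2 / (sigma_tau^2 + sigma^2)."""
    sigma2_tau_hat, sigma2_hat = variance_components(ms_treatment, mse, n)
    return sigma2_tau_hat / (sigma2_tau_hat + sigma2_hat)


# Hypothetical mean squares, for illustration only
rho_hat = intraclass_correlation(ms_treatment=50.0, mse=2.0, n=4)
print(round(rho_hat, 3))
```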

### Confidence interval of overall mean
The `$100(1-\alpha)\%$` confidence interval for the overall mean `$\mu$` in a random effect model is,
`$$\hat{\mu} \pm t_{\alpha/2, a-1}\sqrt{\frac{\text{MS}_\text{treatment}}{N}}$$`
(Refer to Thore's Lecture on Random effect Model)
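
The interval can be sketched as follows, assuming a balanced design and a critical value `$t_{\alpha/2, a-1}$` read from a t table. The function name and all input numbers are hypothetical; only `$t_{0.025, 4} = 2.776$` is a standard table value.

```python
import math


def overall_mean_ci(mu_hat, ms_treatment, n_total, t_crit):
    """CI for the overall mean: mu_hat +/- t_crit * sqrt(MS_treatment / N)."""
    half_width = t_crit * math.sqrt(ms_treatment / n_total)
    return mu_hat - half_width, mu_hat + half_width


# Hypothetical example: a = 5 groups, N = 20 observations,
# t_{0.025, 4} = 2.776 from a t table
lower, upper = overall_mean_ci(mu_hat=4.0, ms_treatment=0.8, n_total=20, t_crit=2.776)
print(round(lower, 4), round(upper, 4))
```

Note that the degrees of freedom are `$a - 1$`, the number of groups minus one, not `$N - a$`.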
]

---

.left-column[
# ANOVA Model
## Random Effect Model
### Intraclass Correlation
`$$\rho = \frac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma^2}$$`
### CI for overall mean
`$$\hat{\mu} \pm t_{\alpha/2, a-1}\sqrt{\frac{\text{MS}_\text{treatment}}{N}}$$`

].right-column[
### Interpretation of Intraclass correlation coefficient
- *Proportion of variation* between groups to total variation
- Correlation between observations **within the same group**
- In the `besettning` and `fettprosent` example, a correlation of 0.90 means that most of the variation in `fettprosent` is explained by `besettning`. The `cows` within each besettning are therefore very similar, and two observations from the same besettning have a correlation of 0.90.

### Interpretation of Overall Mean
We can extend the interpretation of the overall mean to the whole population.

For example, if `besetning` (farms) is a random factor, then the overall mean refers to the average `fettprosent` (fat percentage) in the milk from the entire population of farms.
]

---

.left-column[
# ANOVA Model
## Random Effect Model

<img src="Day3_files/figure-html/unnamed-chunk-2-1.png" width="90%" />
.side-caption[
- <a href="images/2014-a4.png" data-fancybox="2014">Table 4: Anova Output</a>
- <a href="images/F0025.png" data-fancybox="2014">F distribution table at 0.025 level</a>
]].right-column[
### Confidence interval for correlation
`$L$` and `$U$` give the confidence interval for `$\sigma_\tau^2/\sigma^2$`.

`\begin{aligned}
L &= \frac{1}{n}\left(
\frac{\text{MS}_\text{treatments}}{\text{MSE}} \frac{1}{F_{\alpha/2, a-1, N-a}} - 1
\right) \\
U &= \frac{1}{n}\left(
\frac{\text{MS}_\text{treatments}}{\text{MSE}} \frac{1}{F_{1-\alpha/2, a-1, N-a}} - 1
\right)
\end{aligned}`

So, the confidence interval for `$\rho = \frac{\sigma_\tau^2}{(\sigma_\tau^2 + \sigma^2)}$` is,
`$$\frac{L}{1 + L} \le \frac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma^2} \le \frac{U}{1 + U}$$`

Here, `$L = 8.39$` and `$U = 158.385$`, so, the confidence interval for `$\rho$` is (0.893, 0.994).
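
The two steps can be sketched as below. The F values are passed in as upper-tail critical values read from an F table, matching the notation `$F_{\alpha/2, a-1, N-a}$` used above; the function names are hypothetical. Only the values `$L = 8.39$` and `$U = 158.385$` come from this slide.

```python
def ratio_bounds(ms_treatment, mse, n, f_upper, f_lower):
    """L and U for sigma_tau^2 / sigma^2 in a balanced design.

    f_upper is F_{alpha/2, a-1, N-a} and f_lower is F_{1-alpha/2, a-1, N-a},
    both upper-tail critical values taken from an F table.
    """
    f0 = ms_treatment / mse
    return (f0 / f_upper - 1) / n, (f0 / f_lower - 1) / n


def icc_interval(L, U):
    """Map the bounds for sigma_tau^2 / sigma^2 to bounds for rho."""
    return L / (1 + L), U / (1 + U)


# Values of L and U reported on this slide
rho_low, rho_high = icc_interval(8.39, 158.385)
print(rho_low, rho_high)
```

The result agrees with the reported interval up to rounding of `$L$` and `$U$`.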
]

---

.left-column[
# ANOVA Model
## Two factors

<img src="Day3_files/figure-html/unnamed-chunk-5-1.png" width="90%" />
.side-caption[
- Do we need interaction?
- How about Gender, is it significant?
- What can we see if interaction is not significant?
- Is blocking a two factor model?
]].right-column[

### ANOVA model with two factors
`$$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}$$`
where `$\varepsilon_{ijk} \sim N(0, \sigma^2)$`, `$i = 1, 2, \ldots, a \; (a = 3)$`, `$j = 1, 2, \ldots, b \; (b = 2)$` and `$k = 1, 2, \ldots, n \; (n = 4)$`.
When `$\mu$` is the overall mean, we have the constraints,

`\begin{aligned}
\sum_{i = 1}^a{\tau_i} = 0, && \sum_{j = 1}^b{\beta_j} = 0, && \sum_{i = 1}^a\sum_{j = 1}^b{(\tau\beta)_{ij}} = 0
\end{aligned}`

```
Analysis of Variance Table

Response: Weight2
               Df  Sum Sq Mean Sq F value  Pr(>F)    
Weight1         2 1012033  506017   13.03 0.00032 ***
Gender          1  124704  124704    3.21 0.08992 .  
Weight1:Gender  2   28433   14217    0.37 0.69842    
Residuals      18  698825   38824                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
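
As a quick check of how the table is built, the mean squares and F values can be recomputed from the `Df` and `Sum Sq` columns alone. This is only a sketch of the arithmetic; the variable names are hypothetical.

```python
# Df and Sum Sq copied from the ANOVA table above
rows = {
    "Weight1": (2, 1012033),
    "Gender": (1, 124704),
    "Weight1:Gender": (2, 28433),
}
resid_df, resid_ss = 18, 698825

# Mean square error = residual sum of squares / residual degrees of freedom
mse = resid_ss / resid_df

f_values = {}
for term, (df, ss) in rows.items():
    mean_sq = ss / df
    # Each F value compares the term's mean square with the MSE
    f_values[term] = mean_sq / mse
    print(f"{term:15s} MS = {mean_sq:9.0f}  F = {f_values[term]:.2f}")
```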
]

---

.left-column[
# ANOVA Model
## Two factors

```
                          Estimate
(Intercept)                 3582.1
Weight1(High)                234.2
Weight1(Low)                -265.8
Gender(Boy)                   72.1
Weight1(High):Gender(Boy)     29.2
Weight1(Low):Gender(Boy)     -48.3
```

].right-column[

### Prediction
<a href="images/2013-2ab.png" data-fancybox="2013">Exam 2013: 2(b)</a><a href="images/2013-a3.png" data-fancybox="2013"></a> asks us to predict the weight of the second child (a girl) when the first child has `High` weight.

**The Model:**
`$$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}$$`
where, `$\varepsilon_{ijk} \sim N(0, \sigma^2)$`, `$i = 1, 2, 3$`, `$j = 1, 2$` and `$k = 1, 2, 3, 4$`
So, the predicted weight for a `Girl` whose first sibling has `High` weight is,

`$$\hat{y}_{\texttt{High, Girl}} = \hat{\mu} + \hat{\tau}_\texttt{High} + \hat{\beta}_\texttt{Girl} + (\widehat{\tau\beta})_\texttt{High, Girl}$$`
.right-column[

```
       factor2 Weight1(High) Weight1(Low) Weight1(Medium)
1  Gender(Boy)          29.2        -48.3              NA
2 Gender(Girl)            NA           NA              NA
```
].left-column[

```
          factor1 coef
1   Weight1(High)  234
2    Weight1(Low) -266
3 Weight1(Medium)   NA
```
]
.full-width[
`$$\hat{y}_{\texttt{High, Girl}} = 3582.083 + (234.167) + (-72.083) + (-29.167) = 3715\text{ gram}$$`
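
The `NA` entries in the tables can be filled in with the sum-to-zero constraints, which is what the calculation above does implicitly. The sketch below assumes sum-to-zero coding, so each missing effect is minus the sum of the reported ones. The three-decimal values are taken from the equation above and the `Low` values from the rounded coefficient table; the function name is hypothetical.

```python
# Reported estimates
mu = 3582.083
tau = {"High": 234.167, "Low": -265.8}
beta = {"Boy": 72.083}
tau_beta = {("High", "Boy"): 29.167, ("Low", "Boy"): -48.3}

# Fill in the unreported levels so that the effects sum to zero
tau["Medium"] = -(tau["High"] + tau["Low"])
beta["Girl"] = -beta["Boy"]
for weight1 in ("High", "Low"):
    tau_beta[(weight1, "Girl")] = -tau_beta[(weight1, "Boy")]


def predict(weight1, gender):
    """Predicted weight: mu + tau_i + beta_j + (tau beta)_ij."""
    return mu + tau[weight1] + beta[gender] + tau_beta[(weight1, gender)]


y_high_girl = predict("High", "Girl")
print(round(y_high_girl))
```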
]]

---

.left-column[
# ANOVA Model
## Two factors
<img src="Day3_files/figure-html/unnamed-chunk-12-1.png" width="90%" />
<img src="Day3_files/figure-html/unnamed-chunk-13-1.png" width="90%" />

].right-column[
### Reducing a Two-Factor Model
In compulsory assignment 3(c), you are asked to choose between <a href="javascript:;" data-fancybox data-src="#hidden-content">two models.</a>

.hidden[
<div id="hidden-content">
<h3>Model 1</h3>

<table>
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> Df </th>
   <th style="text-align:right;"> Sum Sq </th>
   <th style="text-align:right;"> Mean Sq </th>
   <th style="text-align:right;"> F value </th>
   <th style="text-align:right;"> Pr(&gt;F) </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Fortype </td>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 49.3 </td>
   <td style="text-align:right;"> 12.32 </td>
   <td style="text-align:right;"> 9.54 </td>
   <td style="text-align:right;"> 0.001 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Dommer </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 29.0 </td>
   <td style="text-align:right;"> 9.67 </td>
   <td style="text-align:right;"> 7.48 </td>
   <td style="text-align:right;"> 0.004 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Residuals </td>
   <td style="text-align:right;"> 12 </td>
   <td style="text-align:right;"> 15.5 </td>
   <td style="text-align:right;"> 1.29 </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
</tbody>
</table>

<h3>Model 2</h3>
<table>
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> Df </th>
   <th style="text-align:right;"> Sum Sq </th>
   <th style="text-align:right;"> Mean Sq </th>
   <th style="text-align:right;"> F value </th>
   <th style="text-align:right;"> Pr(&gt;F) </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Fortype </td>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 49.3 </td>
   <td style="text-align:right;"> 12.32 </td>
   <td style="text-align:right;"> 4.15 </td>
   <td style="text-align:right;"> 0.018 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Residuals </td>
   <td style="text-align:right;"> 15 </td>
   <td style="text-align:right;"> 44.5 </td>
   <td style="text-align:right;"> 2.97 </td>
   <td style="text-align:right;">  </td>
   <td style="text-align:right;">  </td>
  </tr>
</tbody>
</table>

</div>
]

<details>
<summary>What happens when <code>Dommer</code> is removed from <code>Model 1</code>?</summary>

When we remove a significant factor, its variation is added to the residuals, which raises the model error. As the model error increases, the differences between the treatments become harder to detect.

</details>
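
The numbers in the two tables show this directly. The sketch below moves the `Dommer` sum of squares and degrees of freedom into the residuals of Model 1 and recomputes the F value for `Fortype`; the variable names are hypothetical.

```python
# Df and Sum Sq taken from the Model 1 table
fortype = {"df": 4, "ss": 49.3}
dommer = {"df": 3, "ss": 29.0}
resid_model1 = {"df": 12, "ss": 15.5}

# Dropping Dommer pools its Df and Sum Sq into the residuals (Model 2)
resid_model2 = {
    "df": resid_model1["df"] + dommer["df"],
    "ss": resid_model1["ss"] + dommer["ss"],
}

ms_fortype = fortype["ss"] / fortype["df"]
f_model1 = ms_fortype / (resid_model1["ss"] / resid_model1["df"])
f_model2 = ms_fortype / (resid_model2["ss"] / resid_model2["df"])

print(resid_model2)
print(round(f_model1, 2), round(f_model2, 2))
```

The mean square for `Fortype` is unchanged; only the denominator of the F ratio grows.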

### Interaction Term and Degree of freedom

With only one observation for each combination of `Fortype` and `Dommer`, we cannot include an _interaction term_ in the model.

There would be no _degrees of freedom_ left for the residuals. So, we would only be able to find the estimates, but could not perform any kind of test for their significance.
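
The bookkeeping can be sketched as follows, assuming a balanced design with `a` levels of the first factor, `b` levels of the second and `n` observations per cell. Here `a = 5` and `b = 4` follow from the degrees of freedom in the Model 1 table; the function name is hypothetical.

```python
def two_factor_df(a, b, n, interaction=True):
    """Degrees of freedom in a balanced two-factor ANOVA."""
    total = a * b * n - 1
    df_a, df_b = a - 1, b - 1
    df_ab = (a - 1) * (b - 1) if interaction else 0
    # Whatever is not used by the model terms is left for the residuals
    residual = total - df_a - df_b - df_ab
    return {"A": df_a, "B": df_b, "AB": df_ab, "Residual": residual}


# Fortype (5 levels) and Dommer (4 levels), one observation per cell
with_interaction = two_factor_df(5, 4, 1, interaction=True)
without_interaction = two_factor_df(5, 4, 1, interaction=False)
print(with_interaction)
print(without_interaction)
```

Without the interaction term, the 12 degrees of freedom go to the residuals, as in Model 1.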
]

---

.left-column[
# ANOVA Model
## Model Assessment

- Errors should be random, i.e. free from any kind of pattern

- Errors should have constant variance across all the groups

].right-column[
### Assumption of random error with constant variance
<img src="Day3_files/figure-html/unnamed-chunk-18-1.png" width="48%" /><img src="Day3_files/figure-html/unnamed-chunk-18-2.png" width="48%" />
### Normality of Error term
]

---

.left-column[
# ANOVA Model
## Model Assessment

- Errors (residuals) should be randomly distributed

- The residuals should lie close to the reference line in the normal Q-Q plot

- You can also look at a histogram and/or density plot and compare it with the normal distribution

].right-column[
### Assumption of random error with constant variance

### Normality of Error term
<img src="Day3_files/figure-html/unnamed-chunk-20-1.png" width="45%" /><img src="Day3_files/figure-html/unnamed-chunk-20-2.png" width="45%" /><img src="Day3_files/figure-html/unnamed-chunk-20-3.png" width="45%" /><img src="Day3_files/figure-html/unnamed-chunk-20-4.png" width="45%" />
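
A minimal sketch of what a normal Q-Q plot compares: the sorted, standardized residuals against theoretical standard normal quantiles. The plotting positions `(i - 0.5) / n` are one common convention and are an assumption here, and the residuals are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(residuals):
    """Return (theoretical quantile, standardized residual) pairs."""
    n = len(residuals)
    ordered = sorted(residuals)
    center, spread = mean(ordered), stdev(ordered)
    standard_normal = NormalDist()
    points = []
    for i, r in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # plotting position (assumed convention)
        points.append((standard_normal.inv_cdf(p), (r - center) / spread))
    return points


# Hypothetical residuals, for illustration only
residuals = [-1.2, 0.4, 0.1, -0.3, 0.9, -0.6, 0.5, 0.2]
for theoretical, observed in normal_qq_points(residuals):
    print(f"{theoretical:6.2f} {observed:6.2f}")
```

If the residuals are roughly normal, the pairs fall close to the line `y = x`.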
]

---
class: center, middle, inverse

# Best of Luck
# Lykke til

---
class: center, middle, inverse