Experimental Design in Education
Educational Statistics and Research Methods (ESRM) Program*
University of Arkansas
2025-03-07
Now, think about our example of the effects of teaching methods (M1, M2, M3, M4) and measurement forms (F1, F2, F3, F4) on math performance.
Answer
The RCBD utilizes an additive model (two-way ANOVA without interaction)
Both the treatments and blocks can be considered as random effects rather than fixed effects, if the levels were selected at random from a population of possible treatments or blocks. We consider this case later, but it does not change the test for a treatment effect.
What are the consequences of not blocking if we should have? Generally the unexplained error in the model will be larger, and therefore the test of the treatment effect less powerful.
How to determine the sample size in the RCBD?
\[ Y_{ij} = \mu + \tau_i + \rho_j + \epsilon_{ij} \]
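where \(\mu\) is the overall mean, \(\tau_i\) is the effect of the \(i\)-th treatment, \(\rho_j\) is the effect of the \(j\)-th block, and \(\epsilon_{ij}\) is the random error. For the sums of squares below: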
\(\bar{y}_{. .}\): the grand mean across all factor levels. \(y_{i j}\): the observed outcome for each individual.
\(\bar{y}_{i .}\): marginal means of treatment groups; \(\bar{y}_{. j}\): marginal means of blocks
We can partition the total sum of squares of outcome Y: \(\mathrm{SS}_{\mathrm{T}}=\sum \sum\left(y_{i j}-\bar{y}_{. .}\right)^{2}\) into:
\[ \mathrm{SS}_{\mathrm{T}}= n_b \sum\left(\bar{y}_{i .}-\bar{y}_{. .}\right)^{2}+ n_a \sum\left(\bar{y}_{. j}-\bar{y}_{. .}\right)^{2}+\sum \sum\left(y_{i j}-\bar{y}_{i .}-\bar{y}_{. j}+\bar{y}_{. .}\right)^{2} \]
\(\mathrm{SS}_{\mathrm{treatment}}= n_b \sum\left(\bar{y}_{i .}-\bar{y}_{. .}\right)^{2}\) with \(\mathrm{df} = n_a -1\), where \(n_a\) is the number of treatments
\(\mathrm{SS}_{\mathrm{block}}= n_a \sum\left(\bar{y}_{. j}-\bar{y}_{. .}\right)^{2}\) with \(\mathrm{df} = n_b -1\), where \(n_b\) is the number of blocks
\(\mathrm{SS}_{\mathrm{Residual}}= \sum \sum\left(y_{i j}-\bar{y}_{i .}-\bar{y}_{. j}+\bar{y}_{. .}\right)^{2}\) with \(\mathrm{df} = (n_a-1)(n_b -1)\)
\[ \mathrm{SS}_{\mathrm{T}} = \mathrm{SS}_{\mathrm{treatment}} + \mathrm{SS}_{\mathrm{block}} + \mathrm{SS}_{\mathrm{Residual}} \]
\[ SS_{Total} = \sum_{i=1}^{n_a}\sum_{j=1}^{n_b}(y_{ij})^2-(\sum_{i=1}^{n_a}\sum_{j=1}^{n_b}y_{ij})^2/N \]
\[ SS_{Treatment} = \frac{1}{n_b}\sum{(y_{i.})}^2 -(\sum_{i=1}^{n_a}\sum_{j=1}^{n_b}y_{ij})^2/N \]
\[ SS_{Block} = \frac{1}{n_a}\sum{(y_{.j})}^2 -(\sum_{i=1}^{n_a}\sum_{j=1}^{n_b}y_{ij})^2/N \]
Background
An experiment was designed to study the performance of four different detergents in cleaning clothes. The following "cleanness" readings (higher = cleaner) were obtained with specially designed equipment for three different types of common stains. Is there a difference between the detergents?
| Detergent | Stain1 | Stain2 | Stain3 |
|---|---|---|---|
| 1 | 45 | 43 | 51 |
| 2 | 47 | 46 | 52 |
| 3 | 48 | 50 | 55 |
| 4 | 42 | 37 | 49 |
Marginal Sums of treatment: \(y_{i.}\); R code: rowSums(detergents[, 2:4])
Marginal Sums of Stain: \(y_{.j}\); R code: colSums(detergents[, 2:4])
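The marginal sums above presuppose a data frame named `detergents` whose columns 2 to 4 hold the stain readings. A minimal sketch, assuming that layout (the data frame construction is not shown in the original; the column names are taken from the table):

```r
# Assumed layout: one row per detergent, one column per stain (block)
detergents <- data.frame(
  Detergent = 1:4,
  Stain1 = c(45, 47, 48, 42),
  Stain2 = c(43, 46, 50, 37),
  Stain3 = c(51, 52, 55, 49)
)

# Marginal sums y_i. (treatments) and y_.j (blocks)
treatment_marginal_Sums <- rowSums(detergents[, 2:4])
block_marginal_Sums <- colSums(detergents[, 2:4])

# Grand mean over all 12 readings
grand_mean <- mean(as.matrix(detergents[, 2:4]))

treatment_marginal_Sums
block_marginal_Sums
grand_mean
```

`as.matrix()` is used for the grand mean because `mean()` does not accept a data frame directly.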
SS_treatment:
[1] 110.9167
SS_block:
[1] 135.1667
\[ F = \frac{SS_{\mathrm{treatment}}/(n_a-1)}{SS_\mathrm{residual}/ ((n_a-1)(n_b-1))} \]
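As a worked check with the detergent example's values (\(n_a = 4\) detergents, \(n_b = 3\) stains, \(SS_{\mathrm{treatment}} = 110.9167\), \(SS_{\mathrm{residual}} = 18.8333\)), the treatment degrees of freedom are \(n_a - 1 = 3\) and the residual degrees of freedom are \((n_a-1)(n_b-1) = 6\):
\[ F = \frac{110.9167/3}{18.8333/6} = \frac{36.97}{3.14} \approx 11.78 \]
This agrees with the F value reported for Detergent in the ANOVA table.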
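# Marginal sums and grand mean; columns 2:4 of detergents hold the stain readings
treatment_marginal_Sums <- rowSums(detergents[, 2:4])
block_marginal_Sums <- colSums(detergents[, 2:4])
grand_mean <- mean(as.matrix(detergents[, 2:4]))
SS_total <- sum((detergents[, 2:4])^2) - (sum(detergents[, 2:4]))^2 / 12
SS_treatment <- 3 * sum((treatment_marginal_Sums / 3 - grand_mean)^2)
SS_block <- 4 * sum((block_marginal_Sums / 4 - grand_mean)^2)
SS_residual <- SS_total - SS_treatment - SS_block
cat("Sum of square of residual errors:\n")
SS_residual
Sum of square of residual errors: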
[1] 18.83333
Df Sum Sq Mean Sq F value Pr(>F)
Detergent 3 110.92 36.97 11.78 0.00631 **
Stain 2 135.17 67.58 21.53 0.00183 **
Residuals 6 18.83 3.14
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation: The detergents differ significantly (F(3, 6) = 11.78, p = .006), and stain type was a useful blocking factor (F(2, 6) = 21.53, p = .002).
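The ANOVA table above can be reproduced with `aov()`. The sketch below is an assumption about how the data were arranged: it reshapes the readings into a long-format data frame (the names `detergents_long` and `Cleanness` are hypothetical) and, for contrast, also fits the model without the block.

```r
# Long format: one row per (detergent, stain) reading
detergents_long <- data.frame(
  Detergent = factor(rep(1:4, times = 3)),
  Stain = factor(rep(c("Stain1", "Stain2", "Stain3"), each = 4)),
  Cleanness = c(45, 47, 48, 42,   # Stain1
                43, 46, 50, 37,   # Stain2
                51, 52, 55, 49)   # Stain3
)

# RCBD: additive model, treatment + block, no interaction
fit_rcbd <- aov(Cleanness ~ Detergent + Stain, data = detergents_long)
summary(fit_rcbd)

# For contrast: ignoring the block leaves the stain variation in the error term
fit_crd <- aov(Cleanness ~ Detergent, data = detergents_long)
summary(fit_crd)
```

In the second fit the stain sum of squares is absorbed into the residual, so the F statistic for Detergent drops. This illustrates the loss of power from not blocking when blocking was warranted.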

Note
DV: Midterm Score (in cells)
IV: Time (2 levels: AM and PM)
Nuisance factor: Tutor (4 levels)
Now:
Thus, we can partition the effects into three parts:
Sum of squares due to treatments (IV = Time),
Sum of squares due to the blocking factor,
and Sum of squares due to error.
We do not model an interaction in blocked designs (we will discuss this later).

\[ SS_{\mathrm{Total}}=\sum_{i=1}^{n}(y_{ij}-\bar{y}_{..})^2 \]
[1] 9.325
[1] 0.0875
Do this for everyone and then sum over all people \(\sum_{i=1}^{n}(y_{ij}-\bar{y}_{..})^2\)
However, when calculating the sums of squares \(SS_{Model}\), \(SS_{Block}\), and \(SS_{error}\), we need to compute "marginal means".
For example:

\[ SS_{Total} = \sum_{i=1}^{n}(y_{ij}-\bar{y}_{..})^2=15060.48 \]
\[ SS_{Model(time)} = \sum_{a=1}^{a}n_a(\bar{y}_{a.}-\bar{y}_{..})^2=4489.02 \]
where \(n_a\) is the group size for AM/PM and \(\bar{y}_{a.}\) are the marginal means for AM and PM {22.95, 13.43}
This is similar to how we computed \(SS_{Model}\) before: subtract the grand mean from each marginal group mean, square the difference, weight it by the group size, and sum over all groups.
\[ SS_{Block} = \sum_{b=1}^{b}n_{b}(\bar{y}_{.b}-\bar{y}_{..})^2=3239.43 \]
Technically, the blocking factor is just another IV (one that we are not interested in, or that falls outside the scope of the research question).
Note
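Under \(\alpha=.05\), for the "Model" factor (Time), we have \(df_{Model} = 1\) and \(df_{error} = 195\), so \(F_{crit}=3.89\); the observed F exceeds this value, so the effect of Time is significant.
Similarly, for the "Blocking" factor (Tutor), we have \(df_{block} = 3\) and \(df_{error} = 195\), so \(F_{crit}=2.65\); the observed F exceeds this value, so the Tutor effect is also significant.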
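The observed F statistics are not printed in the text, but they can be recovered from the sums of squares reported above. This is a sketch that assumes the error sum of squares is obtained by subtraction, as in the partition of the total sum of squares; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom reported above
SS_total <- 15060.48
SS_time <- 4489.02    # treatment (Time)
SS_tutor <- 3239.43   # block (Tutor)
df_time <- 1
df_tutor <- 3
df_error <- 195

# Error sum of squares by subtraction, then the error mean square
SS_error <- SS_total - SS_time - SS_tutor
MS_error <- SS_error / df_error

# F = (mean square of the effect) / (mean square of the error)
F_time <- (SS_time / df_time) / MS_error
F_tutor <- (SS_tutor / df_tutor) / MS_error

# Upper 5% critical values of the F distribution
crit_time <- qf(0.95, df_time, df_error)
crit_tutor <- qf(0.95, df_tutor, df_error)

c(F_time = F_time, crit_time = crit_time)
c(F_tutor = F_tutor, crit_tutor = crit_tutor)
```

Both observed F values are far above their critical values, consistent with the conclusion that Time and Tutor are significant.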
The Rockwell hardness test
The Rockwell hardness test is a hardness test based on indentation hardness of a material. The Rockwell test measures the depth of penetration of an indenter under a large load (major load) compared to the penetration made by a preload (minor load).
Metal Tip Hardness
1 Metal1 Tip1 9.9
2 Metal2 Tip1 9.5
3 Metal3 Tip2 9.4
4 Metal4 Tip2 9.3
5 Metal5 Tip3 9.6
6 Metal6 Tip3 9.0
7 Metal7 Tip4 9.8
8 Metal8 Tip4 9.1
If we conduct this as a blocked experiment, we assign all four tips to the same test specimen, with each tip tested at a randomly assigned location on that specimen. Since each treatment occurs once in each block, the number of test specimens is the number of replicates.
Back to the hardness testing example: the experimenter may well want to test the tips (treatment) across specimens (blocks) of various hardness levels, which shows the importance of blocking. To conduct this experiment as an RCBD, we assign all 4 tips to each specimen.
Suppose that we use b = 4 blocks as shown in the table below:
We are primarily interested in testing the equality of treatment means, but now we have the ability to remove the variability associated with the nuisance factor (the blocks) through the grouping of the experimental units prior to having assigned the treatments.
library(tibble)  # tribble()
library(gt)      # gt(), tab_header(), tab_spanner(), tab_options()

# One column per test coupon (block); each tip appears exactly once per coupon
tribble(
  ~`1`, ~`2`, ~`3`, ~`4`,
  "Tip 3", "Tip 3", "Tip 2", "Tip 1",
  "Tip 1", "Tip 4", "Tip 1", "Tip 4",
  "Tip 4", "Tip 2", "Tip 3", "Tip 2",
  "Tip 2", "Tip 1", "Tip 4", "Tip 3"
) |>
  gt() |>
  tab_header(
    title = "The Hardness Testing Experiment",
    subtitle = "Randomized Complete Block Design"
  ) |>
  tab_spanner(
    label = "Test Coupon (Block)",
    columns = everything()
  ) |>
  tab_options(
    table.width = px(500),
    table.font.size = px(20)
  )

The Hardness Testing Experiment: Randomized Complete Block Design (columns are test coupons, i.e., blocks)
| 1 | 2 | 3 | 4 |
|---|---|---|---|
| Tip 3 | Tip 3 | Tip 2 | Tip 1 |
| Tip 1 | Tip 4 | Tip 1 | Tip 4 |
| Tip 4 | Tip 2 | Tip 3 | Tip 2 |
| Tip 2 | Tip 1 | Tip 4 | Tip 3 |
Important
Notice the two-way structure of the experiment. Here we have four blocks, and within each block the four tips are assigned in random order.
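The within-block randomization can be sketched as follows. This is an illustration, not the procedure used to produce the layout above: the seed is arbitrary and the object names (`tips`, `layout`) are hypothetical.

```r
set.seed(2025)  # arbitrary seed, only for reproducibility

tips <- paste("Tip", 1:4)
n_blocks <- 4

# Independently shuffle the four tips within each coupon (block)
layout <- sapply(seq_len(n_blocks), function(b) sample(tips))
colnames(layout) <- paste("Coupon", seq_len(n_blocks))

layout
```

Each column is a separate random permutation of the four tips, so every tip appears exactly once per coupon, which is what makes the block design complete.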
The output of aov() is shown below. We can see four levels of Tip and four levels of Coupon:
Df Sum Sq Mean Sq F value Pr(>F)
Tip 3 0.385 0.12833 14.44 0.000871 ***
Coupon 3 0.825 0.27500 30.94 4.52e-05 ***
Residuals 9 0.080 0.00889
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note
The Analysis of Variance table shows three degrees of freedom for Tip, three for Coupon, and nine for the residual (error).
The ratio of the treatment mean square to the error mean square gives an F ratio of 14.44, which is highly significant: it exceeds the 99.9th percentile (the upper .001 critical value) of the F distribution with three and nine degrees of freedom.
Our two-way analysis also provides a test for the block factor, Coupon. The ANOVA shows that this factor is also significant, with F = 30.94. So, there is a large amount of variation in hardness between the pieces of metal.
This is why we used specimen (or coupon) as our blocking factor. We expected in advance that it would account for a large amount of variation. By including block in the model and in the analysis, we removed this large portion of the variation, such that the residual error is quite small. By including a block factor in the model, the error variance is reduced, and the test on treatments is more powerful.
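The significance statement for Tip can be checked against the F distribution directly. A minimal sketch using base R's `qf()` and `pf()`, with the rounded F value taken from the ANOVA table (so the p-value may differ slightly in the last digits):

```r
F_tip <- 14.44   # observed F for Tip, rounded, from the ANOVA table
df1 <- 3         # treatment degrees of freedom
df2 <- 9         # residual degrees of freedom

# 99.9th percentile of F(3, 9), i.e. the upper .001 critical value
crit_001 <- qf(0.999, df1, df2)

# Upper-tail probability of the observed F
p_value <- pf(F_tip, df1, df2, lower.tail = FALSE)

F_tip > crit_001
p_value
```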
Tip N_Tip Hardness_Tip Coupon N_Coupon Hardness_Coupon
1 1 4 9.575 1 4 9.400
2 2 4 9.600 2 4 9.425
3 3 4 9.450 3 4 9.725
4 4 4 9.875 4 4 9.950
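Because the design is balanced, the treatment and block sums of squares in the ANOVA table can be recovered from these marginal means alone. A sketch, with the means copied from the table above and hypothetical variable names:

```r
# Marginal means from the table above (4 observations per level)
tip_means <- c(9.575, 9.600, 9.450, 9.875)
coupon_means <- c(9.400, 9.425, 9.725, 9.950)
n_per_level <- 4

# With a balanced design the grand mean is the mean of the marginal means
grand_mean <- mean(tip_means)

# SS = (observations per level) * sum of squared deviations from the grand mean
SS_tip <- n_per_level * sum((tip_means - grand_mean)^2)
SS_coupon <- n_per_level * sum((coupon_means - grand_mean)^2)

c(SS_tip = SS_tip, SS_coupon = SS_coupon)
```

The results should match the Sum Sq column for Tip and Coupon in the ANOVA output.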
y gender env
1 5.5 male ah
2 5.0 male ac
3 4.0 female ah
4 6.2 female ac
Try to obtain (1) the sum of squares for treatment and block (2) F-statistics. Then, interpret the results.
Df Sum Sq Mean Sq F value Pr(>F)
gender 1 0.0225 0.0225 0.012 0.930
env 1 0.7225 0.7225 0.396 0.642
Residuals 1 1.8225 1.8225
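One way to work the exercise is sketched below. It assumes gender is the treatment and env is the block, and that the four rows shown are the full data set; the data frame name `perf` is hypothetical.

```r
perf <- data.frame(
  y = c(5.5, 5.0, 4.0, 6.2),
  gender = factor(c("male", "male", "female", "female")),
  env = factor(c("ah", "ac", "ah", "ac"))
)

# Grand mean and marginal means
grand_mean <- mean(perf$y)
gender_means <- tapply(perf$y, perf$gender, mean)
env_means <- tapply(perf$y, perf$env, mean)

# Each level of gender and of env contains 2 observations
SS_gender <- 2 * sum((gender_means - grand_mean)^2)
SS_env <- 2 * sum((env_means - grand_mean)^2)
SS_total <- sum((perf$y - grand_mean)^2)
SS_resid <- SS_total - SS_gender - SS_env

c(SS_gender = SS_gender, SS_env = SS_env, SS_resid = SS_resid)

# Same partition via aov(): treatment + block, no interaction
fit <- aov(y ~ gender + env, data = perf)
summary(fit)
```

With only one residual degree of freedom, the F tests here have very little power, so the example is best read as practice with the arithmetic.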
The block (env) mean square, 0.7225, is smaller than the residual mean square, 1.8225; i.e., blocking was not necessary here. And because the p-value for gender is 0.930 > 0.05 (5% significance level), we fail to reject the null hypothesis: there is not sufficient evidence that females and males differ in performance.
GENDER, STATUS
Schools, Classes
In this lecture, we covered:
The aov() function with formula: Outcome ~ Treatment + Block

ESRM 64503