Lecture 01: Welcome to ESRM 64103

Experimental design in Education

Jihong Zhang*, Ph.D

Educational Statistics and Research Methods (ESRM) Program*

University of Arkansas

2025-01-13

1 Overview

Presentation Outline

  1. Self introduction

  2. The syllabus

    1. Goal of this class

    2. Delivery method of class materials

    3. Assignments

    4. Policy

    5. Weekly schedule

  3. Brief introduction to statistical software

Self Introduction

University of Kansas (2015-2019)

University of Iowa (2019-2022)

Chinese University of Hong Kong (2022-2023)

My job

Your turn

  • Tell me:

    1. Your name
    2. Department name
    3. Which year in your program
    4. Why you are interested in Experimental design

About this class

  • No examination

  • Light homework: 3-5 assignments with multiple-choice questions and light calculations

  • Focus on the big picture and general direction rather than statistical details

  • A lot of examples

  • All materials use R!

This Course is:

  1. Required for the Graduate Certificate in Educational Statistics and Research Methods
  2. A prerequisite for Multiple Regression and Applied Multivariate Statistics
  3. Designed to provide a broad picture of one of the most popular research designs: group comparisons

Learning Path of Graduate Certificate

Statistics Learning Path:

  1. ESRM 64003 — Educational Statistics and Data Processing: Establish fundamental statistical skills.
  2. ESRM 64103 — Experimental Design: Learn research design methodologies.
  3. ESRM 64203 — Multiple Regression: Explore complex relationships between variables.
  4. Elective: ESRM 64503 — Applied Multivariate Statistics: Extend analysis to multiple dependent and independent variables.
  5. Elective: ESRM 65203 — Structural Equation Modeling: Develop expertise in complex model testing.

Measurement Learning Path:

  1. ESRM 64003 — Educational Statistics and Data Processing: Gain foundational knowledge in statistics.
  2. ESRM 64103 — Experimental Design: Acquire skills in designing educational research.
  3. ESRM 64203 — Multiple Regression: Understand the influence of multiple predictors.
  4. Elective: ESRM 66503 — Measurement and Evaluation: Focus on assessment and evaluation techniques.
  5. Elective: ESRM 67503 — Item Response Theory: Specialize in modern test theory applications.

Class Time

  1. Unit 1 (17:00 - 17:45): Lectures about concepts
  2. Unit 2 (18:00 - 18:45): Examples
  3. Unit 3 (19:00 - 19:45): Self-practice with R code on your laptop, plus time for questions

What To Expect This Semester

  • Philosophy: Focus on accessibility + learning-by-doing

    • This class heavily emphasizes hands-on, task-oriented practice

    • No anxiety-prone tasks (e.g., hand calculations, memorizing formulas)

    • No anxiety-prone methods of evaluation (e.g., timed tests)

  • Materials:

    • Lecture slides present concepts—the what and the why

    • Example documents: reinforce the concepts and demonstrate the how using software—R packages

    • All available at the course website (hosted outside of Blackboard)

      • Let me show you how to use the website

Assignments and Grading

  • Participants will have the opportunity to earn up to 100 total points in this course.

  • Up to 88 points can be earned from homework assignments (3-6 assignments in total)

  • Up to 12 points may be earned from submitting in-class quizzes. In-class quizzes will be given at random times in class. These will be graded on effort only; incorrect answers will not be penalized.

  • Bonus points (10 points)

    • There may be other opportunities to earn extra credit at the instructor’s discretion.
  • Late Assignment:

    • Assignments submitted any time after the deadline will incur a 1-point penalty. However, extensions will be granted as needed for extenuating circumstances (e.g., conferences, comprehensive exams, family obligations) if requested at least two weeks in advance of the due date.

Homework delivery

  1. Online Microsoft form or Word document via Email
  2. https://www.menti.com/alu3mgbzvd6g

Our Other Responsibilities

  • My job (besides providing materials and assignments):

    • Answer questions via email, in individual meetings, or in group-based Zoom office hours; you can each work on homework during office hours and get immediate assistance (and then keep working)

      • Email me first
  • Your job (in descending order of timely importance):

    • Ask questions—preferably in class, but any time is better than none

    • Frequently review the class material, focusing on mastering the vocabulary, logic, and procedural skills

    • Don’t wait until the last minute to start homework, and don’t be afraid to ask for help if you get stuck on one thing for more than 15 minutes

      • Please email me (jzhang@uark.edu) a screenshot of your code+error so I can respond easily
    • Practice using the software to implement the techniques you are learning on data you care about

    • Do the readings for a broader perspective and additional examples (best after the lecture)

More About Your Experience in this Class

  • Attendance: Strongly recommended but not required

    • Please do not attend in-person if you might be sick!

    • Please do not attend if you received the inclement weather notification

    • You can also join the class via Zoom

    • You won’t miss out: I will post YouTube recordings (audio + screen share) on the course website by request.

  • Changes will be sent via email by 9 am on class days

    • I will update the homework and in-class quiz links on class days. If a link is not uploaded, there are two possibilities: (1) I forgot to upload it, in which case I will upload it later and notify you by email; or (2) I decided not to upload it or to remove it.

    • I may change to Zoom-only for dangerous weather or if I am sick.

Statistical Software

  • I will show examples primarily using R and R packages. Some important R packages include:

    • tidyverse: a collection of R packages for data cleaning and data transformation

    • ggplot2: a popular package for data visualization

  • Why not SPSS?

    • SPSS could only be used for some, but not all, of our content

    • More importantly, it doesn’t have as much room to grow; R has many new packages being developed via CRAN and GitHub

  • Why not SAS?

    • SAS is not open source, meaning that we cannot check the source code if something goes wrong

    • SAS is also commercial, but R is free

What We Will Cover

  • Hypothesis Testing
  • ANOVA
    • One-way
    • Two-way
    • Repeated-measure
  • ANCOVA
  • Linear Mixed Model

2 Unit 2: Brief Introduction to R

Why R?

  • There are some points to consider

    • R packages are only as good as their authors (so little quality control)

    • Syntax and capabilities are idiosyncratic to the packages

  • The good things are:

    • If you really master R, you can do things yourself (e.g., write your own algorithm for a complex model)

    • You can check the source code of R packages and know where issues come from

    • You can communicate with R package authors and provide some suggestions

    • You can become an R package author yourself and be famous

What is R

  • R is a comprehensive statistical and graphical programming language

  • We can use the R language via multiple interfaces or IDEs, e.g., the terminal, VS Code, or RStudio.

    • We will mainly focus on RStudio because of its convenience

    • RStudio is a product of the company Posit and is free for personal use

More RStudio

Installation of R and RStudio

  • You can download and install R base via r-project.org (currently R-4.4.1)

  • Then, after the installation of R, you can download RStudio via posit.co (currently)

  • After installing R and RStudio, you can open RStudio to start programming in R.

    • However, a fresh R installation only includes the base packages

    • To enhance its utility, most users will install R packages for certain purposes

R packages

  • R packages are uploaded to platforms (e.g., CRAN or GitHub) by researchers or companies

    • Those R packages typically have version numbers. Some functions may be available in one version (like Ver. 1.1) but not in other versions.

    • Do not upgrade your packages if your code is running well

  • R users are free to download and use those R packages

    • To download a certain package, you need to know the package name

    • For example, if you want to download the latest version of the tidyverse package, you can type the following command in the console panel of RStudio

    install.packages("tidyverse")
    • Or, if you want to install an older version of a package

    library(devtools)  # provides install_version()
    install_version("tidyverse", version = "1.3.0", repos = "http://cran.us.r-project.org")
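
Before installing or pinning a version, it can help to check what is already on your machine. The following is a minimal sketch using only base R functions; the helper name package_status is made up for illustration and does not come from any package.

```r
# Hypothetical helper (not part of any package): report whether a package
# is installed and, if so, which version is installed.
package_status <- function(pkg) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  installed <- requireNamespace(pkg, quietly = TRUE)
  version <- if (installed) as.character(packageVersion(pkg)) else NA_character_
  list(package = pkg, installed = installed, version = version)
}

# "stats" ships with every R installation, so it is always found
status <- package_status("stats")
print(status)

# Install tidyverse only when it is missing (commented out: needs internet)
# if (!package_status("tidyverse")$installed) install.packages("tidyverse")
```

Checking the installed version first makes it easier to follow the advice above about not upgrading packages when your code already runs well.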

More about R packages

  • CRAN (Comprehensive R Archive Network) is a network of servers around the world that store identical, up-to-date versions of code and documentation for R.
    • It contains the most stable versions of packages.
    • Most of the time, we download packages from CRAN
  • GitHub is used for the fast-moving development of R packages
    • It contains the most up-to-date versions of packages, which may be unstable

    • You can download a package from GitHub using the pak package

      pak::pak("tidyverse/ggplot2")
    • You can update the package and its dependencies

      pak::pkg_install("ggplot2", upgrade = TRUE)

R functions

  • To perform certain tasks, you need to use functions contained in R packages

    • There are two ways of using R functions

    • Direct way: call the function as package::function(), e.g., pak::pkg_install(); you don’t have to load the package first

    • Use-after-load way: load the package into your session first; then you can call the function by name without specifying the package name

      library(pak)  # attach pak so its functions can be called by name
      pkg_install("ggplot2", upgrade = TRUE)
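
The two calling styles can be compared side by side without installing anything. This sketch uses the tools package, which ships with R but is not attached by default; the file name lecture01.R is just an illustrative string.

```r
# Direct way: package::function(), no loading needed
ext_direct <- tools::file_ext("lecture01.R")

# Use-after-load way: attach the package, then call the function by name
library(tools)
ext_loaded <- file_ext("lecture01.R")

# Both styles call the same function and give the same result
identical(ext_direct, ext_loaded)
```

The direct way is handy for one-off calls and makes it obvious which package a function comes from; the use-after-load way saves typing when you call many functions from the same package.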

R functions (Cont.)

  • How do you know whether you have already loaded a package?

  • You can use the sessionInfo() function

    sessionInfo()
    R version 4.4.1 (2024-06-14)
    Platform: aarch64-apple-darwin20
    Running under: macOS 15.2
    
    Matrix products: default
    BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
    LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
    
    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    
    time zone: America/Chicago
    tzcode source: internal
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
    [1] cmdstanr_0.8.1.9000 rstan_2.32.6        StanHeaders_2.32.10
    
    loaded via a namespace (and not attached):
     [1] tensorA_0.36.2.1     utf8_1.2.4           generics_0.1.3      
     [4] digest_0.6.37        magrittr_2.0.3       evaluate_1.0.1      
     [7] grid_4.4.1           fastmap_1.2.0        jsonlite_1.8.9      
    [10] processx_3.8.4       pkgbuild_1.4.5       backports_1.5.0     
    [13] ps_1.8.1             gridExtra_2.3        fansi_1.0.6         
    [16] QuickJSR_1.4.0       scales_1.3.0         codetools_0.2-20    
    [19] abind_1.4-8          cli_3.6.3            rlang_1.1.4         
    [22] munsell_0.5.1        yaml_2.3.10          tools_4.4.1         
    [25] inline_0.3.20        parallel_4.4.1       checkmate_2.3.2     
    [28] dplyr_1.1.4          colorspace_2.1-1     ggplot2_3.5.1       
    [31] curl_6.0.0           vctrs_0.6.5          posterior_1.6.0     
    [34] R6_2.5.1             matrixStats_1.4.1    stats4_4.4.1        
    [37] lifecycle_1.0.4      V8_6.0.0             pkgconfig_2.0.3     
    [40] RcppParallel_5.1.9   pillar_1.9.0         gtable_0.3.6        
    [43] loo_2.8.0            glue_1.8.0           Rcpp_1.0.13-1       
    [46] xfun_0.49            tibble_3.2.1         tidyselect_1.2.1    
    [49] rstudioapi_0.17.1    knitr_1.49           htmltools_0.5.8.1   
    [52] rmarkdown_2.29       compiler_4.4.1       distributional_0.5.0
  • It outputs several pieces of information:

    • R version, operating system, matrix operation libraries (BLAS/LAPACK), locale

    • Attached packages (you can call the functions of those packages by name)

    • Packages loaded via a namespace (and not attached): you cannot call their functions by name alone, so you need to attach them with library() or require() first
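
The attached/loaded distinction can also be checked programmatically. This is a small sketch using base R only; the variable names (attached, loaded, loaded_only) and the helper is_attached are illustrative choices.

```r
# Packages attached to the search path (their functions are callable by name)
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))

# Namespaces loaded in the session (whether attached or not)
loaded <- loadedNamespaces()

# Loaded but not attached: call these as pkg::fun() or attach with library()
loaded_only <- setdiff(loaded, attached)

# Hypothetical helper: is a given package attached right now?
is_attached <- function(pkg) paste0("package:", pkg) %in% search()

is_attached("base")  # base is always attached
print(loaded_only)
```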

Run R code

  • After you finish R script, you have multiple ways of running the code:

    • Method 1: you can click the Run button at the top right of RStudio

    • Method 2: you can select certain code and press Ctrl + Enter (Win) or Command + Return (Mac)

    • Method 3: you can run Rscript [FILENAME].R in a terminal to run the whole script

    • Method 4: you can use an R notebook to run R code interactively

                          Script file is .R       Script file is .rmd or .qmd
Run the whole script      Method 1, Method 3      Method 4
Run part of the script    Method 2                Method 4
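
To try these methods, you could save a few lines like the following as a script. The file name hello.R and the scores are made up for illustration.

```r
# hello.R (hypothetical file name)
# Whole script from a terminal:  Rscript hello.R
# Part of the script in RStudio: select lines, then Ctrl + Enter / Command + Return

scores <- c(82, 90, 75, 88)   # made-up example data
avg <- mean(scores)           # average of the four scores
cat("Mean score:", avg, "\n")
```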

Exercise 01: Make friends with R

Summary

  1. Note that the syllabus, schedule, and all materials are uploaded online the week before class.
  2. We learned that R, RStudio, and Quarto (.qmd) can be used to execute R code/syntax.
  3. In-class quizzes will be given at random. They should be quick and easy, so don’t stress about them!
  4. Office hours are 2 PM - 4 PM on Mondays. Feel free to stop by my office and ask questions.
  5. Next week, we will start to learn some basics of Experimental Design.