22 Sample Size and Power

22.1 Key Publications

Start here, its short:

Snijders TAB (2005). Power and Sample Size in Multilevel Linear Models. In: Everitt BS, Howell DC (Hrsg.). Encyclopedia of Statistics in Behavioral Science. Chichester, UK: John Wiley and Sons, Ltd. doi: 10.1002/0470013192.bsa492

This paper includes some nice power curves for reference:

Scherbaum, C. A., & Ferreter, J. M. (2009). Estimating statistical power and required sample sizes for organizational research using multilevel modeling. Organizational Research Methods, 12(2), 347-367.

This paper tabulated the effect of number of clusters, size of cluser, and ICC:

Maas, C. J., & Hox, J. J. (2005). Sufficient Sample Sizes for Multilevel Modeling. Methodology, 1(3), 86-92.

This paper focuses on binary outcomes in hierarchical or clustered strkucture:

Moineddin R, Matheson FI, Glazier RH. (2007). A simulation study of sample size for multilevel logistic regression models. BMC medical research methodology, 7, 34. doi:10.1186/1471-2288-7-34

This paper presents a data simulation method for estimating power for commonly used relationships research designs (via MPlus) and includes two worked examples from relationships research.

Lane, S. P., & Hennes, E. P. (2018). Power struggles: Estimating sample size for multilevel relationships research. Journal of Social and Personal Relationships, 35(1), 7-31.

This paper is very clean and organized with clear notation, tables, and figures. It investigates the performance of random effect binary outcome multilevel models under varying methods of estimation, level-1 and level-2 sample size, outcome prevalence, variance component sizes, and number of predictors using SAS software

Schoeneberger, J. A. (2016). The impact of sample size and other factors when estimating multilevel logistic models. The Journal of Experimental Education, 84(2), 373-397.

This paper’s focus is three level models:

Kerkhoff, D., & Nussbeck, F. W. (2019). The influence of sample size on parameter estimates in three-level random-effects models. Frontiers in psychology, 10.

22.2 R packages

22.2.1 powerlmm

powerlmm package described in:

Raudenbush, S. W., and L. Xiao-Feng (2001). “Effects of Study Duration, Frequency of Observation, and Sample Size on Power in Studies of Group Differences in Polynomial Change.” Psychological Methods 6 (4): 387–401.

Kristoffer Magnusson has posted an examle walk-through called Power Analysis for Two-level Longitudinal Models with Missing Data

You can also access an interactive shiny interfaces with the following code (once you install the package in R):

library(powerlmm)
shiny_powerlmm()

22.2.2 simr

simr package computed power analysis for generalised linear mixed models (GLMMs) by Monte Carlo simulation and is designed to work with models fit using the ‘lme4’ package.

It includes tools for:

running a power analysis for a given model and design; and
calculating power curves to assess trade‐offs between power and sample size

The paper below presents a tutorial using a simple example of count data with mixed effects (with structure representative of environmental monitoring data) to guide the user along a gentle learning curve, adding only a few commands or options at a time.

Green, P., & MacLeod, C. J. (2016). SIMR: an R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493-498.

22.2.3 sjstats::smpsize_lmm()

Note: this is for ‘standard designs’ and is very simple

sjstats::smpsize_lmm() compute an approximated sample size for linear mixed models (two-level-designs), based on power-calculation for standard design and adjusted for design effect for 2-level-designs.

Cohen J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.
Hsieh FY, Lavori PW, Cohen HJ, Feussner JR (2003). An Overview of Variance Inflation Factors for Sample-Size Calculation. Evaluation and the Health Professions 26: 239-257.
Snijders TAB (2005). Power and Sample Size in Multilevel Linear Models. In: Everitt BS, Howell DC (Hrsg.). Encyclopedia of Statistics in Behavioral Science. Chichester, UK: John Wiley and Sons, Ltd. doi: 10.1002/0470013192.bsa492

22.2.4 MLPowSim

MLPowSim is a free-download that guides you through questions and then writes R Syntax for you based on your responses.

22.3 Online Interactive Interfaces

No, G*Power won’t help you with this.

22.3.1 GLIMMPSE

GLIMMPSE 2.0 from the University of Colorado Denver, School of Public Health (NIH)

Sample Size Shop Website
Slides two examples start on slide 62

22.3.2 WebPower

WebPower Statistical Power Analysis Online or http://psychstat.org/crt2arm

Miao (“Michelle”) Yang, Mar 2015

22.4 Stand-alone Computer Programs

22.4.1 Optimal Design

Note: Works on Windows but not Mac OS.

Optimal Design was created by Steve Raudenbush and colleagues.

THis program estimates power using the intraclass correlation, effect size, a level, and sample sizes for cluster-randomized, multisite, and repeated measures designs.

The user can manipulate one factor at a time to examine the impact on power.

All results are presented graphically as power curves, which is helpful for understanding how power could be affected by particular changes in sample sizes, effect sizes, and intraclass correlations.

The program is user friendly and comes with extensive documentation.

Raudenbush, S.W. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods, 2: 173–185.
Raudenbush, S.W., & Liu, X.-F. (2000). Statistical power and optimal design for multisite randomized trials. Psychological Methods, 5: 199–213.
Raudenbush, S.W., & Liu, Xiao-Feng. (2001). Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change. Psychological Methods, 6: 387–401.

22.4.2 PinT

PinT: Power in Two-levels was created by Tom Snijders, Roel Bosker, and Henk Guldemond.

This is the oldest program, but it can be used to estimate the standard errors of simple fixed effects and cross-level interactions. It can provide standard error estimates for a variety of complex models.

The major difficulty in using this program is that it requires the user to input the means, variances, and covariances for all explanatory variables and the variance and covariance for the random effects.
The major advantage is that an extensive user manual is available and the formulas used by the program are presented in Snijders and Bosker (1993). This program is recommended for models that include several Level 1 or Level 2 variables.
Snijders, T.A.B. & Bosker, R.J. (1993). Standard errors and sample sizes for two-level research. Journal of Educational Statistics, 18: 237–259.

22.4.3 RMASS2

RMASS2 calculates the sample size for a two-group repeated measures design, allowing for attrition, according to:

Hedeker, D., Gibbons, R.D., & Waternaux, C. (1999). Sample size estimation for longitudinal designs with attrition: comparing time-related contrasts between two groups. Journal of Educational and Behavioral Statistics, 24:70–93.

22.4.4 ACluster

ACluster calculates required sample sizes for various types of cluster randomized designs, not only for continuous but also for binary and time-to-event outcomes, as described in:

Donner, A., & Klar, N. (2000). Design and Analysis of Cluster Randomization Trials in Health Research. London: Arnold.