22 Sample Size and Power
22.1 Key Publications
Start here, it's short:
- Snijders TAB (2005). Power and Sample Size in Multilevel Linear Models. In: Everitt BS, Howell DC (Eds.). Encyclopedia of Statistics in Behavioral Science. Chichester, UK: John Wiley and Sons, Ltd. doi: 10.1002/0470013192.bsa492
This paper includes some nice power curves for reference:
- Scherbaum, C. A., & Ferreter, J. M. (2009). Estimating statistical power and required sample sizes for organizational research using multilevel modeling. Organizational Research Methods, 12(2), 347-367.
This paper tabulates the effects of the number of clusters, the cluster size, and the ICC:
- Maas, C. J., & Hox, J. J. (2005). Sufficient Sample Sizes for Multilevel Modeling. Methodology, 1(3), 86-92.
This paper focuses on binary outcomes in hierarchical or clustered structures:
- Moineddin R, Matheson FI, Glazier RH. (2007). A simulation study of sample size for multilevel logistic regression models. BMC medical research methodology, 7, 34. doi:10.1186/1471-2288-7-34
This paper presents a data simulation method for estimating power in commonly used relationships research designs (via Mplus) and includes two worked examples from relationships research:
- Lane, S. P., & Hennes, E. P. (2018). Power struggles: Estimating sample size for multilevel relationships research. Journal of Social and Personal Relationships, 35(1), 7-31.
This paper is very clean and organized, with clear notation, tables, and figures. Using SAS, it investigates how random-effects multilevel models for binary outcomes perform under varying estimation methods, level-1 and level-2 sample sizes, outcome prevalence, variance component sizes, and numbers of predictors:
- Schoeneberger, J. A. (2016). The impact of sample size and other factors when estimating multilevel logistic models. The Journal of Experimental Education, 84(2), 373-397.
This paper focuses on three-level models:
- Kerkhoff, D., & Nussbeck, F. W. (2019). The influence of sample size on parameter estimates in three-level random-effects models. Frontiers in psychology, 10.
22.2 R packages
22.2.1 powerlmm
The powerlmm package is described in:
- Raudenbush, S. W., & Liu, X.-F. (2001). “Effects of Study Duration, Frequency of Observation, and Sample Size on Power in Studies of Group Differences in Polynomial Change.” Psychological Methods 6 (4): 387–401.
Kristoffer Magnusson has posted an example walk-through called “Power Analysis for Two-level Longitudinal Models with Missing Data”.
You can also access an interactive Shiny interface with the following code (once you have installed the package in R):
library(powerlmm)
shiny_powerlmm()
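For a scripted alternative to the Shiny interface, power can also be computed directly from a set of study parameters. The sketch below follows the style of the package README. Every numeric value is an illustrative assumption rather than a recommendation, and the argument names should be checked against the installed version of powerlmm.

```r
library(powerlmm)

# Illustrative three-level longitudinal design (all numbers are assumptions)
p <- study_parameters(
  n1 = 11,                 # measurements per subject
  n2 = 10,                 # subjects per cluster
  n3 = 5,                  # clusters per treatment arm
  icc_pre_subject = 0.5,   # share of baseline variance between subjects
  icc_pre_cluster = 0,     # share of baseline variance between clusters
  icc_slope = 0.05,        # share of slope variance at the cluster level
  var_ratio = 0.02,        # slope variance relative to residual variance
  effect_size = cohend(-0.8, standardizer = "pretest_SD")
)

# Analytical power for the time-by-treatment interaction
pw <- get_power(p)
pw
```

Changing one argument at a time (for example `n2` or `icc_slope`) and re-running `get_power()` gives a quick feel for which design feature drives power.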
22.2.2 simr
The simr package computes power for generalised linear mixed models (GLMMs) by Monte Carlo simulation and is designed to work with models fit using the ‘lme4’ package.
It includes tools for:
- running a power analysis for a given model and design; and
- calculating power curves to assess trade-offs between power and sample size.
The paper below presents a tutorial using a simple example of count data with mixed effects (with structure representative of environmental monitoring data) to guide the user along a gentle learning curve, adding only a few commands or options at a time.
- Green, P., & MacLeod, C. J. (2016). SIMR: an R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493-498.
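The two tools listed above can be sketched as follows, assuming simr and lme4 are installed. This is not the count-data example from the tutorial paper. It is a simpler linear mixed model built from scratch, and the fixed effects, variances, group labels, and number of simulations are all made-up values chosen to keep the run short.

```r
library(simr)

simrOptions(progress = FALSE)  # silence progress bars
set.seed(123)

# Hypothetical design: 3 groups, each measured at x = 1..10
covars <- expand.grid(x = 1:10, g = letters[1:3])

# Build a model with assumed parameters instead of fitting pilot data
model <- makeLmer(
  y ~ x + (1 | g),
  fixef   = c(2, -0.1),  # assumed intercept and slope
  VarCorr = 0.5,         # assumed random-intercept variance
  sigma   = 1,           # assumed residual standard deviation
  data    = covars
)

# 1. Power for the slope of x under the current design
ps <- powerSim(model, nsim = 50)
print(ps)

# 2. Power curve: extend to 15 groups, then evaluate at 3, 9, and 15 groups
model_big <- extend(model, along = "g", n = 15)
pc <- powerCurve(model_big, along = "g", breaks = c(3, 9, 15), nsim = 50)
print(pc)
```

With only 50 simulations per point the estimates are noisy. For a real study, a much larger `nsim` would be the usual choice, at the cost of run time.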
22.2.3 sjstats::smpsize_lmm()
Note: this is for ‘standard designs’ and is very simple
sjstats::smpsize_lmm() computes an approximate sample size for linear mixed models (two-level designs), based on a power calculation for a standard design and adjusted for the design effect of the two-level structure.
Cohen J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,NJ: Lawrence Erlbaum.
Hsieh FY, Lavori PW, Cohen HJ, Feussner JR (2003). An Overview of Variance Inflation Factors for Sample-Size Calculation. Evaluation and the Health Professions 26: 239-257.
Snijders TAB (2005). Power and Sample Size in Multilevel Linear Models. In: Everitt BS, Howell DC (Eds.). Encyclopedia of Statistics in Behavioral Science. Chichester, UK: John Wiley and Sons, Ltd. doi: 10.1002/0470013192.bsa492
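The design-effect adjustment behind this kind of approximation can be illustrated in base R. The sketch below is not the sjstats implementation. It assumes a two-group comparison, uses `stats::power.t.test()` for the standard-design sample size, and inflates it by the usual design effect 1 + (n - 1) * ICC. The helper name `approx_n_two_level` and the input values are hypothetical.

```r
# Hypothetical helper: approximate total sample size for a two-level design
approx_n_two_level <- function(eff_size, n_per_cluster, icc,
                               power = 0.80, sig_level = 0.05) {
  # Per-group n for an ordinary two-sample t-test (no clustering)
  n_simple <- stats::power.t.test(delta = eff_size, sd = 1,
                                  power = power,
                                  sig.level = sig_level)$n
  # Variance inflation caused by clustering
  deff <- 1 + (n_per_cluster - 1) * icc
  # Inflate the total (two-group) sample size and convert to clusters
  n_subjects <- ceiling(2 * n_simple * deff)
  list(design_effect = deff,
       subjects = n_subjects,
       clusters = ceiling(n_subjects / n_per_cluster))
}

# Assumed inputs: Cohen's d = 0.3, 30 subjects per cluster, ICC = 0.05
res <- approx_n_two_level(eff_size = 0.3, n_per_cluster = 30, icc = 0.05)
res
```

Even a small ICC matters here: with 30 subjects per cluster and an ICC of 0.05, the required sample is roughly two and a half times what the unclustered calculation suggests.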
22.2.4 MLPowSim
MLPowSim is a free download that guides you through a series of questions and then writes R syntax for you based on your responses.
22.3 Online Interactive Interfaces
No, G*Power won’t help you with this.
22.3.1 GLIMMPSE
GLIMMPSE 2.0 from the University of Colorado Denver, School of Public Health (NIH)
Slides: two examples start on slide 62.
22.4 Stand-alone Computer Programs
22.4.1 Optimal Design
Note: Works on Windows but not Mac OS.
Optimal Design was created by Steve Raudenbush and colleagues.
This program estimates power using the intraclass correlation, effect size, α level, and sample sizes for cluster-randomized, multisite, and repeated measures designs.
The user can manipulate one factor at a time to examine the impact on power.
All results are presented graphically as power curves, which is helpful for understanding how power could be affected by particular changes in sample sizes, effect sizes, and intraclass correlations.
The program is user friendly and comes with extensive documentation.
Raudenbush, S.W. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods, 2: 173–185.
Raudenbush, S.W., & Liu, X.-F. (2000). Statistical power and optimal design for multisite randomized trials. Psychological Methods, 5: 199–213.
Raudenbush, S.W., & Liu, Xiao-Feng. (2001). Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change. Psychological Methods, 6: 387–401.
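For readers without access to Windows, the kind of power curve described above can be approximated in base R for the simplest case. The sketch below is not Optimal Design's code. It assumes a balanced two-arm cluster-randomized trial with a standardized outcome (total variance 1), a noncentral t-test with J - 2 degrees of freedom, and a variance of the treatment effect of 4 * (ICC + (1 - ICC) / n) / J, following Raudenbush (1997). The function name `crt_power` and all input values are hypothetical.

```r
# Hypothetical helper: power of a two-arm cluster-randomized trial
#   delta = standardized effect size, icc = intraclass correlation,
#   n = subjects per cluster, J = total number of clusters (J/2 per arm)
crt_power <- function(delta, icc, n, J, alpha = 0.05) {
  se     <- sqrt(4 * (icc + (1 - icc) / n) / J)  # SE of the treatment effect
  ncp    <- delta / se                           # noncentrality parameter
  df     <- J - 2
  t_crit <- qt(1 - alpha / 2, df)
  # Two-sided rejection probability under the alternative
  pt(t_crit, df, ncp = ncp, lower.tail = FALSE) + pt(-t_crit, df, ncp = ncp)
}

# Vary one factor at a time: here, the number of clusters
clusters   <- seq(10, 60, by = 10)
power_by_J <- sapply(clusters, function(J) {
  crt_power(delta = 0.4, icc = 0.05, n = 20, J = J)
})
data.frame(clusters, power = round(power_by_J, 3))
```

Swapping the varied argument (for example `icc` or `n`) while holding the others fixed mirrors the one-factor-at-a-time exploration that the program offers graphically.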
22.4.2 PinT
PinT: Power in Two-levels was created by Tom Snijders, Roel Bosker, and Henk Guldemond.
This is the oldest program, but it can be used to estimate the standard errors of simple fixed effects and cross-level interactions. It can provide standard error estimates for a variety of complex models.
The major difficulty in using this program is that it requires the user to input the means, variances, and covariances for all explanatory variables and the variance and covariance for the random effects.
The major advantage is that an extensive user manual is available and the formulas used by the program are presented in Snijders and Bosker (1993). This program is recommended for models that include several Level 1 or Level 2 variables.
Snijders, T.A.B. & Bosker, R.J. (1993). Standard errors and sample sizes for two-level research. Journal of Educational Statistics, 18: 237–259.
22.4.3 RMASS2
RMASS2 calculates the sample size for a two-group repeated measures design, allowing for attrition, according to:
- Hedeker, D., Gibbons, R.D., & Waternaux, C. (1999). Sample size estimation for longitudinal designs with attrition: comparing time-related contrasts between two groups. Journal of Educational and Behavioral Statistics, 24:70–93.
22.4.4 ACluster
ACluster calculates required sample sizes for various types of cluster randomized designs, not only for continuous but also for binary and time-to-event outcomes, as described in:
- Donner, A., & Klar, N. (2000). Design and Analysis of Cluster Randomization Trials in Health Research. London: Arnold.