In the world of research, missing data is a common problem. For example, participants may refuse to respond to a sensitive question because they are uncomfortable answering it. In longitudinal studies, some participants drop out and are not available for all measurements.

Researchers have typically relied on traditional approaches to deal with missing data (e.g., listwise deletion, mean substitution). However, participants with missing data often differ systematically from those with complete data. When they do, traditional approaches can produce biased and unreliable estimates, which in turn may lead to incorrect interpretations of results.

Recommended approaches

Full information maximum likelihood (FIML) and multiple imputation (MI) are two approaches considered by methodologists as state-of-the-art because simulation studies have repeatedly shown they yield less biased estimates than traditional approaches.

FIML is considered a model-based approach for dealing with missing data. It handles the missing data and estimates parameters and standard errors in one step. The software implementing this approach reads the raw data, computes each case's contribution to the likelihood from whatever data are available for that case, and finds the parameter values that maximize the sum of those contributions. Many commercial software programs offer the FIML approach (e.g., Mplus, Stata, SAS). The open source R environment also offers packages for conducting FIML.
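To make the case-by-case idea concrete, here is a minimal Python sketch of the quantity FIML maximizes. It assumes the variables are multivariate normal and uses NumPy; the function name fiml_loglik and the toy data are hypothetical, and this is not how any particular package implements FIML internally. Each case contributes a log-likelihood based only on the variables it actually has.

```python
import numpy as np

def fiml_loglik(data, mu, sigma):
    """Sum of casewise multivariate-normal log-likelihoods (observed values only)."""
    total = 0.0
    for row in data:
        obs = ~np.isnan(row)            # which variables this case actually has
        if not obs.any():
            continue                    # a fully missing case contributes nothing
        diff = row[obs] - mu[obs]
        sub = sigma[np.ix_(obs, obs)]   # covariance block for the observed variables
        _, logdet = np.linalg.slogdet(sub)
        quad = diff @ np.linalg.solve(sub, diff)
        total += -0.5 * (obs.sum() * np.log(2 * np.pi) + logdet + quad)
    return total

# Hypothetical toy data: three cases, two variables, NaN marks a missing value.
data = np.array([[1.0, 2.0],
                 [0.5, np.nan],
                 [np.nan, 1.5]])
mu = np.array([0.5, 1.5])                    # candidate means
sigma = np.array([[1.0, 0.3], [0.3, 1.0]])   # candidate covariance matrix
print(fiml_loglik(data, mu, sigma))
```

An estimation routine would search for the means and covariances (or model parameters that imply them) that make this sum as large as possible. No case is deleted and no value is filled in along the way.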

MI uses three steps to deal with missing data:

  • Multiple imputed datasets are created that replace the missing values with plausible values that preserve the variability in the original data.
  • Each imputed dataset is analyzed separately, producing one set of results per dataset.
  • The multiple results are then pooled to get one set of parameter estimates.
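The three steps above can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions, not a replacement for dedicated MI software: the imputation model is a single linear regression of y on x fitted to a bootstrap sample of the complete cases, the analysis model is just the mean of y, and all names (impute_once, analyze, pool) and the simulated data are hypothetical. The pooling step follows Rubin's rules.

```python
import numpy as np

def impute_once(x, y, rng):
    """Step 1: fill missing y with regression predictions plus random noise."""
    obs = ~np.isnan(y)
    # Bootstrap the complete cases so parameter uncertainty carries into the imputations.
    idx = rng.choice(np.flatnonzero(obs), size=obs.sum(), replace=True)
    slope, intercept = np.polyfit(x[idx], y[idx], 1)
    resid_sd = np.std(y[idx] - (intercept + slope * x[idx]), ddof=2)
    filled = y.copy()
    n_missing = (~obs).sum()
    # The added noise keeps the variability of the original data.
    filled[~obs] = intercept + slope * x[~obs] + rng.normal(0, resid_sd, n_missing)
    return filled

def analyze(y):
    """Step 2: the analysis model; here the mean of y and its squared standard error."""
    return y.mean(), y.var(ddof=1) / len(y)

def pool(estimates, variances):
    """Step 3: Rubin's rules for combining results across imputed datasets."""
    m = len(estimates)
    q_bar = np.mean(estimates)             # pooled point estimate
    within = np.mean(variances)            # average within-imputation variance
    between = np.var(estimates, ddof=1)    # variance of estimates across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, np.sqrt(total)           # pooled estimate and pooled standard error

# Hypothetical simulated data with about 30% of y missing.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 + 0.5 * x + rng.normal(size=200)
y[rng.random(200) < 0.3] = np.nan

results = [analyze(impute_once(x, y, rng)) for _ in range(20)]
estimate, std_error = pool([r[0] for r in results], [r[1] for r in results])
print(estimate, std_error)
```

The between-imputation term in the pooled variance is what carries the uncertainty due to the missing values into the final standard error.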

Many commercial software programs can be used to conduct MI (e.g., Mplus, SPSS, Stata). Free options also are available (e.g., R and Blimp).

Missing data myths

Q. Does MI just make up data?

A. No. The goal of MI is to preserve the information in the original data, not to treat the imputed values as real data.

Q. Why should I include auxiliary variables (e.g., demographics related to missingness)?

A. Incorporating a number of auxiliary variables helps increase statistical power and reduces bias in parameter estimates.

Writing it up

What are some of the elements that should be included when writing about missing data?

  • The percentage of missing data for each variable and the overall percentage of missing data
  • The missing data mechanism: missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR)
  • The software that was used
  • The approach for dealing with the missing data
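The first item in the list above, the percentages of missing data, is straightforward to compute. The following Python sketch uses only the standard library; the function names and the small survey dataset are hypothetical and serve only to illustrate the calculation.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def percent_missing(columns):
    """Return the percent missing per variable and the overall percent missing."""
    per_variable = {}
    missing_cells = 0
    total_cells = 0
    for name, values in columns.items():
        n_missing = sum(is_missing(v) for v in values)
        per_variable[name] = 100 * n_missing / len(values)
        missing_cells += n_missing
        total_cells += len(values)
    return per_variable, 100 * missing_cells / total_cells

# Hypothetical survey data: one list of responses per variable.
survey = {
    "age": [34, 51, None, 28],
    "income": [None, 42000, None, 39000],
    "score": [3.5, 4.0, 2.5, 3.0],
}
per_variable, overall = percent_missing(survey)
print(per_variable)
print(overall)
```

Here the overall figure is the share of missing cells across all variables, which is one common definition. Some reports instead give the share of participants with at least one missing value, so it helps to state which one is being reported.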

RMC resources for missing data

The RMC has just launched a resource and tutorial for dealing with missing data in our online RMC Methods Corner (rmc.ehe.osu.edu) that includes:

  • A detailed discussion of missing data, including information about the missing data mechanisms.
  • The recommended approaches for dealing with missingness (i.e., FIML and MI).
  • Available software for conducting FIML and MI.
  • How to conduct FIML in Mplus and R.
  • How to conduct MI in Mplus, R and SPSS.
