Missing values and analysis

Valid and missing cases

Automatic handling of missing values is one of the key features of any statistical package. To avoid embarrassing mistakes, it is essential to know how many observations your current analysis is actually based on. Especially with multivariate procedures, the automatic deletion of cases with missing values (listwise deletion: dropping every case that is missing on any of the variables involved) can reduce the number of valid observations drastically if you are not careful. As always, a preliminary diagnosis of your variables helps you avoid this, but you should still check, for every procedure you run, that the number of valid observations included in the analysis is sufficient.
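A minimal Python sketch of that check, under loudly stated assumptions: this is not SPSS code, cases are plain dicts, `None` stands in for a system-missing value, and `USER_MISSING` holds hypothetical user-defined missing codes. It only models what listwise deletion does to N as variables are added.

```python
# Illustrative sketch only: not SPSS, just a model of listwise deletion.
# Assumptions: None plays the role of SYSMIS; USER_MISSING lists hypothetical
# user-defined missing codes per variable.
USER_MISSING = {"age": {-1}, "income": {99999}, "educ": {9}}


def is_missing(var, value):
    """True for a system-missing value (None) or a declared user-missing code."""
    return value is None or value in USER_MISSING.get(var, set())


def listwise_n(cases, variables):
    """Number of cases valid on every variable in `variables` (listwise deletion)."""
    return sum(
        all(not is_missing(var, case.get(var)) for var in variables)
        for case in cases
    )


def check_n(cases, variables, minimum):
    """Print the N a listwise procedure would use; True if it reaches `minimum`."""
    n = listwise_n(cases, variables)
    print(f"{len(cases)} cases in file, {n} valid on all of {variables}")
    return n >= minimum


# Hypothetical data: each dict is one case.
cases = [
    {"age": 34, "income": 42000, "educ": 3},
    {"age": -1, "income": 38000, "educ": 2},
    {"age": 51, "income": 99999, "educ": 4},
    {"age": 29, "income": None, "educ": 9},
    {"age": 45, "income": 51000, "educ": 1},
]

check_n(cases, ["age"], minimum=3)                    # 4 valid -> True
check_n(cases, ["age", "income", "educ"], minimum=3)  # 2 valid -> False
```

The minimum of 3 is arbitrary; what counts as "sufficient" depends on the analysis and the number of variables in it.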

Most procedures in SPSS report the numbers of valid and missing values, usually at the very beginning of the procedure output. This often means that you have to scroll back up to the start of the output to find it.

Let us have a look at some examples:

Frequencies

FREQUENCIES produces an output table labelled Statistics that shows the overall numbers of valid and missing values for each variable.

The frequency table itself lists all frequencies, including those for user-missing values and for system-missing values (SYSMIS, labelled "System").
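To make the distinction concrete, here is a small Python sketch (not SPSS) that splits counts the way the frequency table does. Assumptions: `None` stands for SYSMIS, and the code 8 is a hypothetical user-missing "don't know" code. The two percentage columns follow the Percent / Valid Percent convention of SPSS frequency tables: Percent is based on all cases, Valid Percent on valid cases only.

```python
from collections import Counter


def frequency_table(values, user_missing):
    """Split value counts into valid, user-missing and system-missing parts."""
    valid = Counter(v for v in values if v is not None and v not in user_missing)
    user = Counter(v for v in values if v is not None and v in user_missing)
    system = sum(1 for v in values if v is None)
    return valid, user, system


# Hypothetical item: 1 = yes, 2 = no, 8 = don't know (declared user-missing),
# None = system-missing.
answers = [1, 2, 1, 8, None, 2, 1, 8, None, 1]
valid, user, system = frequency_table(answers, user_missing={8})

total = len(answers)
n_valid = sum(valid.values())
for value, count in sorted(valid.items()):
    # Percent uses all cases; Valid Percent uses only the valid cases.
    print(f"Valid   {value}: {count:2}  {100 * count / total:5.1f}%  {100 * count / n_valid:5.1f}%")
for value, count in sorted(user.items()):
    print(f"Missing {value}: {count:2}  {100 * count / total:5.1f}%")
print(f"Missing System: {system:2}  {100 * system / total:5.1f}%")
```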

Crosstabs

Like many (but not all) statistical procedures, CROSSTABS begins its output with a Case Processing Summary for the tables it produces. Note that here the different types of missing values (user-missing and system-missing) are not distinguished.
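The logic behind such a summary can be sketched in Python as follows. This is an illustration, not SPSS code: a case is assumed to count as valid only if both its row and column values are valid, `None` stands for SYSMIS, and the code 9 is a hypothetical user-missing "refused" code. As in the summary described above, both kinds of missing values are lumped together.

```python
def case_processing_summary(pairs, row_missing=frozenset(), col_missing=frozenset()):
    """Valid / Missing / Total counts and percentages for a two-way table.

    A case is valid only if both its row and column values are valid; user- and
    system-missing values are lumped together under "Missing".
    """

    def missing(value, codes):
        return value is None or value in codes

    total = len(pairs)
    valid = sum(
        1 for r, c in pairs if not missing(r, row_missing) and not missing(c, col_missing)
    )
    n_missing = total - valid
    return {
        "Valid": (valid, 100 * valid / total),
        "Missing": (n_missing, 100 * n_missing / total),
        "Total": (total, 100.0),
    }


# Hypothetical (sex, vote) pairs: vote code 9 = refused (user-missing), None = SYSMIS.
pairs = [(1, 1), (2, 2), (1, 9), (None, 1), (2, 1), (1, None), (2, 2), (1, 1)]
summary = case_processing_summary(pairs, col_missing={9})
for label, (n, pct) in summary.items():
    print(f"{label:8}{n:4}{pct:7.1f}%")
```

Here three of the eight cases drop out: one for a user-missing vote, one for a system-missing sex and one for a system-missing vote, yet the summary shows them only as "Missing 3".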

Regression

Regression is one of the exceptions: it does not produce a Case Processing Summary, and by default it shows no information at all about the number of observations included and excluded (because of missing values). This information has to be requested explicitly: click Statistics in the Regression dialog and select Descriptives.

Descriptives

For interval-scaled variables (in fact for all variables, if you disregard some of the statistics produced), Analyze > Descriptive Statistics > Descriptives produces a simple table of descriptive statistics for all listed variables. It is therefore a good idea to run it for all the variables you intend to analyze together, e.g. in a multiple regression.

The table shows the number of valid cases for each variable separately and, in its last row (Valid N (listwise)), the listwise N: the number of cases that are valid on all variables in the table at once. If these variables went together into a multiple regression with the default listwise deletion, the regression would be based on that number of observations.
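The difference between the per-variable N and the listwise N can be reproduced with a short Python sketch (not SPSS). Assumptions: the data are hypothetical equal-length columns, and user-missing codes have already been converted to `None`, which stands for any missing value.

```python
def descriptives_n(data):
    """Per-variable valid N and the listwise N, as in a Descriptives table."""
    per_variable = {
        name: sum(v is not None for v in values) for name, values in data.items()
    }
    # A case counts toward the listwise N only if it is valid on every variable.
    listwise = sum(all(v is not None for v in row) for row in zip(*data.values()))
    return per_variable, listwise


# Hypothetical columns of equal length; None = missing (user-missing codes
# are assumed to have been converted to None beforehand).
data = {
    "age": [34, None, 51, 29, 45, 38],
    "income": [42000, 38000, None, None, 51000, 47000],
    "educ": [3, 2, 4, 1, None, 2],
}
per_variable, listwise = descriptives_n(data)
for name, n in per_variable.items():
    print(f"{name:20}{n:3}")
print(f"{'Valid N (listwise)':20}{listwise:3}")
```

No variable has fewer than 4 valid values, yet only 2 cases are complete, because the missing values fall on different cases.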

A second, more extreme example: one of the variables is mostly missing, and together with the additional missing cases on the other variables this reduces the listwise N to even fewer valid observations than that variable has on its own.
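One way to pin down the variable responsible in such a situation is to recompute the listwise N with each variable left out in turn. This drop-one check is a hypothetical diagnostic sketched in Python, not an SPSS feature; the data and the mostly-missing `hours` variable are invented, and `None` stands for any missing value.

```python
def listwise_n(data, names):
    """Number of cases valid (not None) on all variables in `names`."""
    return sum(all(v is not None for v in row) for row in zip(*(data[n] for n in names)))


def drop_one_report(data):
    """Listwise N for all variables, and the listwise N with each one left out."""
    names = list(data)
    full = listwise_n(data, names)
    without = {
        name: listwise_n(data, [other for other in names if other != name])
        for name in names
    }
    return full, without


data = {
    "age": [34, 41, 51, 29, 45, 38, 60, None],
    "income": [42000, 38000, None, 35000, 51000, 47000, 39000, 44000],
    "hours": [None, None, 40, None, None, None, 35, None],  # mostly missing
}
full, without = drop_one_report(data)
print("listwise N, all variables:", full)
for name, n in without.items():
    print(f"  without {name}: {n}")
```

Leaving out `hours` raises the listwise N from 1 to 6, which points to it as the culprit; whether that variable can be dropped from the analysis is a substantive decision, not a technical one.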