Missing values and analysis | SPSS
Automatic handling of missing values is one of the key features of any statistical package. To avoid silly mistakes, it is essential to know how many observations your current analysis is really based on. With multivariate procedures in particular, the automatic deletion of missing values can drastically reduce the number of valid observations if you are not careful. As always, a preliminary diagnosis of your variables helps you avoid this. Even so, check with every procedure you run that the number of valid observations included in the analysis is sufficient.
Most procedures in SPSS show information on valid and missing values, usually at the very beginning of the procedure output. This often means that you will have to scroll back up to the start of the procedure output to find it.
Let us have a look at some examples:
Frequencies produces an output table labelled Statistics, showing the overall numbers of valid and missing values.
The frequency table clearly shows all frequencies, including those for user-missing and SYSMIS ("system") values.
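The distinction between valid, user-missing and system-missing values can be sketched in a few lines. This is a minimal Python illustration, not SPSS code. It assumes a hypothetical encoding in which `None` stands for a system-missing (SYSMIS) value and `user_missing` is the set of codes declared as user missing. The data and the code 9 ("no answer") are invented for the example.

```python
from collections import Counter

# Hypothetical encoding (not SPSS's own data format):
# None stands for a system-missing value (SYSMIS, an empty cell),
# and user_missing holds the codes declared as user missing.
def frequency_table(values, user_missing):
    """Split raw values into valid, user-missing and system-missing counts."""
    valid = Counter()
    user = Counter()
    sysmis = 0
    for v in values:
        if v is None:
            sysmis += 1          # empty cell -> system missing
        elif v in user_missing:
            user[v] += 1         # declared missing code -> user missing
        else:
            valid[v] += 1        # everything else counts as valid
    return valid, user, sysmis

# Invented answers coded 1-5, with 9 = "no answer" declared user missing.
answers = [1, 2, 2, 5, 9, None, 3, 9, None, 2]
valid, user, sysmis = frequency_table(answers, user_missing={9})
n_valid = sum(valid.values())
n_missing = sum(user.values()) + sysmis

print("Valid:", dict(sorted(valid.items())), "N =", n_valid)
print("User missing:", dict(user), "System missing:", sysmis)
print("Total missing:", n_missing)
```

Here 6 of the 10 cases are valid. The 4 missing cases split into 2 user-missing and 2 system-missing, which is the breakdown a frequency table reports separately.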
CROSSTABS, like many (but not all) statistical procedures, produces a Case Processing Summary for the tables it creates. Note that here the different types of missing values (user-missing and system-missing) are not distinguished.
Regression is one of those exceptions: it does not produce a Case Processing Summary, and by default it shows no information at all about the number of included and rejected (missing-value) observations.
The table shown has to be requested explicitly in the regression dialog (select Descriptives).
For interval-scaled variables (in fact, for all variables, if you disregard some of the statistics produced),
produces a simple table with descriptive statistics for all variables listed. It is a good idea to produce this table for all the variables you intend to analyze simultaneously, e.g. in a multiple regression.
You can see the number of valid cases for each variable separately and, at the end of the table, the listwise N. This is the total number of observations that are valid on all variables in the table considered together. If these variables went together into a multiple regression, the analysis would be based on that number of observations.
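The difference between the per-variable N and the listwise N can be made concrete with a short calculation. This is a minimal Python sketch with invented data, not SPSS code. It assumes that every missing value, whether user-missing or system-missing, has already been converted to `None`. The variable names `age`, `income` and `educ` are hypothetical.

```python
# Invented data: one list per variable, one position per case.
# Assumption: all missing values (user or system) are already None.
data = {
    "age":    [34, 51, None, 28, 45, 39, 60, None],
    "income": [2100, None, 3400, 1800, 2900, None, 4100, 2500],
    "educ":   [12, 16, 10, None, 14, 12, 18, 11],
}

def valid_n(values):
    """Number of non-missing values for a single variable."""
    return sum(v is not None for v in values)

def listwise_n(columns):
    """Number of cases that are non-missing on every variable at once."""
    return sum(all(v is not None for v in case) for case in zip(*columns))

for name, values in data.items():
    print(f"{name:<8} N = {valid_n(values)}")
print(f"Listwise N = {listwise_n(list(data.values()))}")
```

Each variable has 6 or 7 valid values, but only 3 cases are complete on all three variables. Because the missing values fall on different cases, a regression using all three would be based on those 3 cases.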
This second example illustrates an extreme situation: one of the variables has mostly missing values. Together with some additional missing cases in the other variables, this reduces the number of valid observations even further.
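When one sparsely filled variable dominates the loss of cases, it helps to see how the listwise N changes if each variable is left out in turn. The following is a minimal Python sketch of that check, with invented data. It is not an SPSS feature, and it assumes all missing values are coded as `None`. The variable `sparse` is a hypothetical stand-in for the mostly-missing variable in the extreme situation just described.

```python
# Invented data; None marks a missing value (assumption: user and system
# missing values have both been converted to None beforehand).
data = {
    "x1":     [1, 2, None, 4, 5, 6, 7, 8, 9, 10],
    "x2":     [3, None, 1, 2, 5, 4, None, 2, 3, 1],
    "sparse": [None, None, 7, None, 2, None, None, None, 5, None],
}

def listwise_n(columns):
    """Number of cases that are non-missing on every given variable."""
    return sum(all(v is not None for v in case) for case in zip(*columns))

n_all = listwise_n(list(data.values()))
print("Listwise N, all variables:", n_all)

# Drop one variable at a time to see which one costs the most cases.
n_without = {}
for name in data:
    others = [values for key, values in data.items() if key != name]
    n_without[name] = listwise_n(others)
    print(f"Listwise N without {name}: {n_without[name]}")
```

With all three variables only 2 cases remain. Leaving out `sparse` raises the listwise N to 7, while leaving out `x1` or `x2` barely changes it. This points to the mostly-missing variable as the main cause.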