There is no specific command in Stata to remove outliers from an analysis or from the data; you first have to find out which observations are outliers and then remove them.
A more general solution is to define numerically what an outlier is and then specify the appropriate selection, i.e. find the inner fences.
One way of doing this is to use the lv command, which displays, among other information, the inner fences. These values can then be used in a logical expression:
keep if inrange(var,lf,uf)
where var is the variable of interest, and lf and uf are the lower and upper inner fences as shown in the output of the lv command.
A solution that does not require producing output and copying values by hand is to compute the inner fences from the first and third quartiles (the hinges) and the interquartile range. Unfortunately, the lv command does not store the fences themselves in its results, but it does provide the information needed: r(l_F) and r(u_F) contain the hinges (the lower and upper fourths). Therefore the expressions
r(l_F)-(1.5*(r(u_F) - r(l_F)))
r(u_F)+(1.5*(r(u_F) - r(l_F)))
correspond to the lower and upper fences, set at 1.5 interquartile ranges below the lower and above the upper quartile (fourth). So we can now write:
quietly: lv gnpgrow | With quietly, the output is suppressed but the results are still stored in r(). |
keep if inrange(gnpgrow,r(l_F)-(1.5*(r(u_F) - r(l_F))) ,r(u_F)+(1.5*(r(u_F) - r(l_F)))) | Keep only the observations inside the fences |
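The same steps can be made easier to read and check by storing the fences in scalars before filtering. This is only a sketch: the scalar names iqr, fence_lo and fence_hi are illustrative and not part of lv, and gnpgrow is the example variable used above.

```stata
* Compute the hinges without printing the letter-value display
quietly lv gnpgrow

* Interquartile range and inner fences, built from the stored hinges
scalar iqr      = r(u_F) - r(l_F)
scalar fence_lo = r(l_F) - 1.5*scalar(iqr)
scalar fence_hi = r(u_F) + 1.5*scalar(iqr)

* Show the fences so they can be checked before any data are dropped
display "lower fence: " scalar(fence_lo) "   upper fence: " scalar(fence_hi)

* Keep only the observations inside the fences
keep if inrange(gnpgrow, scalar(fence_lo), scalar(fence_hi))
```

Referring to the scalars through scalar() avoids confusion if the dataset happens to contain variables with the same names.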
Note that this will also delete all observations in which the variable is missing: Stata treats missing values as larger than any nonmissing number, so they never fall inside the fences. If you wish to keep them, add a check for missing values to the if condition.
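One possible way to add that check is Stata's missing() function, as sketched here for the example variable gnpgrow. The second variant is an alternative that flags outliers instead of dropping them; the variable name outlier is illustrative.

```stata
* Variant 1: drop outliers but keep observations where gnpgrow is missing
quietly lv gnpgrow
keep if inrange(gnpgrow, r(l_F) - 1.5*(r(u_F) - r(l_F)), r(u_F) + 1.5*(r(u_F) - r(l_F))) | missing(gnpgrow)

* Variant 2: flag outliers without deleting anything
* (outlier is 1 outside the fences, 0 inside, and missing where gnpgrow is missing)
quietly lv gnpgrow
generate byte outlier = !inrange(gnpgrow, r(l_F) - 1.5*(r(u_F) - r(l_F)), r(u_F) + 1.5*(r(u_F) - r(l_F))) if !missing(gnpgrow)
```

With the flag in place, an analysis can be restricted with a condition such as if outlier == 0, leaving the dataset itself unchanged.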