Dummy variables in Stata
Dummy (logical) variables in Stata take values of 0, 1 and missing.
The most common use of dummy variables is in modelling, for instance in regression (which we will use as a general example below). For this use you do not need to create dummy variables yourself: the variable list of any command can contain factor variables and factor-variable operators, which generate the indicator (dummy) variables on the fly.
When you generate indicator variables (dummy variables, contrasts) from a categorical variable such as the continent variable, you need to omit one of the categories (the base or reference category). In all regression examples below one of the continents is omitted, i.e. the regression output shows five of the six continents. By default the first (smallest) value is used as the reference category; the ib operator lets you choose another base value.
regress infmor urb i.continent | 5 indicator variables, the first continent is the base category |
regress infmor urb ib2.continent | continent 2 is the base |
regress infmor urb ib(first).continent | First continent is the base, same as i.continent |
regress infmor urb ib(last).continent | Last continent is base |
regress infmor urb ib(freq).continent | The continent with the highest frequency count is base |
If you wish to contrast a specific continent, e.g. Asia, against all others, you can write (both forms are equivalent)
regress infmor 1.continent
regress infmor i1.continent
See the Stata documentation on factor variables (help fvvarlist) for further variations.
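To see that a single-level factor term is just a dummy in disguise, the following minimal sketch compares it with a hand-made indicator. It assumes the auto dataset shipped with Stata as a stand-in for the country data, with rep78 playing the role of continent; the variable name rep3 is made up for illustration.

```stata
* Assumption: the shipped auto dataset replaces the country data used in the text
sysuse auto, clear

* Factor-variable notation: indicator for rep78 == 3 against all other categories
regress price 3.rep78

* The same contrast with a hand-made dummy, kept missing where rep78 is missing
generate byte rep3 = (rep78 == 3) if !missing(rep78)
regress price rep3
```

Both regressions should report the same coefficient, since 3.rep78 and rep3 are 1 for the same observations and both are missing where rep78 is missing.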
Generate a dummy variable: countries with urbanization of 50% or less = 0, above 50% = 1
generate urbdum = 0
replace urbdum = 1 if urb>50
Or shorter
generate urbdum= (urb>50)
generate urbdum= (urb>50) works because Stata evaluates the expression urb>50 to 1 when it is true and to 0 otherwise (false).
There is, however, a problem with this when the variable has missing values. Stata treats missing values as larger than any number, so urb>50 is true when urb is missing and the dummy is set to 1. To avoid this, you need to handle missing values explicitly, namely
generate urbdum=0
replace urbdum=1 if urb>50
replace urbdum= . if missing(urb)
or
generate urbdum1= urb>50 if !missing(urb)
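The missing-value pitfall can be checked with a short sketch. It assumes the auto dataset shipped with Stata, whose rep78 variable contains some missing values; the names hi_naive and hi_safe are invented for this illustration.

```stata
* Assumption: the shipped auto dataset replaces the country data used in the text
sysuse auto, clear

* Naive version: missing rep78 counts as "greater than 3" and becomes 1
generate byte hi_naive = (rep78 > 3)

* Safe version: the dummy stays missing where rep78 is missing
generate byte hi_safe = (rep78 > 3) if !missing(rep78)

* Cross-tabulate both versions, showing the missing cells
tabulate hi_naive hi_safe, missing

* These checks pass silently if the behaviour is as described
assert hi_naive == 1 if missing(rep78)
assert missing(hi_safe) if missing(rep78)
```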
Examples showing how to create a dummy variable from a categorical variable, continent here:
generate Asia=continent==1
generate America=continent==4 | continent==5
This creates a variable with a value of 1 if the condition is true and 0 if the condition is false. (In Stata, logical values are represented by 0/1 for false/true.)
The tabulate command has an option to automatically generate dummy variables from a categorical variable:
tabulate continent, generate(cont)
This produces one dummy variable per category of continent, named cont1, cont2 and so on.
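A minimal sketch of what tabulate with generate() leaves behind, again assuming the auto dataset shipped with Stata in place of the country data. The stub repcat and the helper variable ncat are made-up names.

```stata
* Assumption: the shipped auto dataset replaces the country data used in the text
sysuse auto, clear

* One dummy per observed category of rep78: repcat1, repcat2, ...
tabulate rep78, generate(repcat)
describe repcat*

* Each observation with a non-missing rep78 falls into exactly one category
egen byte ncat = rowtotal(repcat*)
assert ncat == 1 if !missing(rep78)

* The dummies are missing where rep78 is missing
assert missing(repcat1) if missing(rep78)
```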
recode can also be used as shown here:
recode v3 (min/20=1 Rich) (else=0 Not_rich), generate(d6) label(Dummy_richcountry)
Labels can be specified directly. The above example creates a new variable d6 from v3: values of 20 or below are set to 1 (labelled "Rich" in the new variable) and all other values to 0 (labelled "Not_rich").
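One point to watch with the recode approach is that else matches every remaining value, including missing values, so they end up as 0. The sketch below shows one possible way around this by using the nonmissing keyword instead; it assumes the auto dataset shipped with Stata, and the names poor and poor_lbl are hypothetical.

```stata
* Assumption: the shipped auto dataset replaces the country data used in the text
sysuse auto, clear

* nonmissing covers all remaining non-missing values; missing values match no
* rule and are therefore carried over unchanged into the new variable
recode rep78 (min/2 = 1 "Poor") (nonmissing = 0 "Not poor"), generate(poor) label(poor_lbl)

* Inspect the result, including the missing cells
tabulate rep78 poor, missing

* Passes silently if missing values were preserved
assert missing(poor) if missing(rep78)
```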
Because dummy variables are logical variables, you can use them directly with if to simplify filters. Assuming that you have created America with generate America=continent==4 | continent==5, you can simply write
list urb infmor country if America
This lists only American countries.
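When a dummy is used as a filter, it helps to remember that if treats any nonzero value as true, and that includes missing values. The following sketch illustrates this under the assumption that the auto dataset shipped with Stata stands in for the country data; foreign is a 0/1 variable in that dataset, and hi_safe is a made-up dummy that contains missing values.

```stata
* Assumption: the shipped auto dataset replaces the country data used in the text
sysuse auto, clear

* foreign has no missing values, so both filters select the same observations
list make price if foreign
count if foreign
count if foreign == 1

* A dummy with missing values: a bare "if" also selects the missing cases
generate byte hi_safe = (rep78 > 3) if !missing(rep78)
count if hi_safe        // true cases plus observations where hi_safe is missing
count if hi_safe == 1   // true cases only
```

A dummy built from a condition without an if qualifier, such as America above, never contains missing values, so the short form is safe there.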