(This article was originally published at Econometrics Beat: Dave Giles' Blog, and syndicated at StatsBlogs.)

In econometrics we often use "dummy variables" to allow for changes in estimated coefficients when the data fall into one "regime" or another. An obvious example is when we use such variables to allow for the different "seasons" in quarterly time-series data.
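As a concrete (and entirely made-up) illustration of the seasonal case, here is a minimal Python sketch of how 0/1 seasonal dummies might be constructed for quarterly data:

```python
import numpy as np

# Hypothetical quarterly data: the quarter (1-4) of each observation.
quarter = np.array([1, 2, 3, 4, 1, 2, 3, 4])

# One 0/1 dummy per season; in a model with an intercept we would include
# only three of the four, to avoid the "dummy variable trap".
Q1 = (quarter == 1).astype(int)
Q2 = (quarter == 2).astype(int)
Q3 = (quarter == 3).astype(int)

print(Q1)  # [1 0 0 0 1 0 0 0]
```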

I've posted about dummy variables several times in the past - *e.g.*, here. However, there's one important point that seems to come up from time to time in emails that I receive from readers of this blog. I thought that a few comments here might be helpful.


The following variable can legitimately be called a "dummy variable":

D_i = 1 ; if a certain condition holds
D_i = 0 ; otherwise.

The following variable is *not* a dummy variable:

N_i = 0 ; if condition A holds
N_i = 1 ; if condition B holds (where A and B are mutually exclusive conditions)
N_i = 2 ; otherwise. (Call this condition C, say.)
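To make the two codings concrete, here is a minimal Python sketch (the sequence of conditions is invented for illustration) that constructs D_i and N_i side by side:

```python
import numpy as np

# Hypothetical data: each observation falls under condition A, B, or C.
conditions = np.array(["A", "B", "A", "C", "B", "C", "A"])

# A legitimate dummy variable: 1 if a certain condition holds (here, A), 0 otherwise.
D = (conditions == "A").astype(int)

# Not a dummy variable: coding A, B, C as 0, 1, 2 imposes an ordering and a spacing.
N = np.select([conditions == "A", conditions == "B"], [0, 1], default=2)

print(D)  # [1 0 1 0 0 0 1]
print(N)  # [0 1 0 2 1 2 0]
```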

Let's see what's different about D_i and N_i, and then we can consider some further examples.

Let's add D_i as a regressor in a regression model. For simplicity I'll just add it (rather than interact it with another regressor) so that it just shifts the intercept. However, this doesn't affect any of the points that I make below.

So, our model is:

y_i = α + β x_i + γ D_i + u_i ,

where u_i is the random error term.

If D_i = 1, then the intercept is (α + γ); and if D_i = 0, then the intercept is just α. The estimated (positive or negative) "shift" in the intercept is just the estimate of γ that we obtain when we use (say) OLS. *The data entirely determine the magnitude of this shift.*

On the other hand, suppose that we replace D_i by N_i in our model:

y_i = α + β x_i + γ N_i + u_i .

Now, if condition A holds, then the intercept is α; if condition B holds, then the intercept is (α + γ); and otherwise the intercept is (α + 2γ). Regardless of what the data tell us by way of an estimate for γ, the shift in the estimated intercept from condition A to condition C *is constrained to be* twice the shift that we estimate from condition A to condition B.

We've essentially pre-judged part of the answer and imposed it before we even estimated the model! Generally, this is *not* something that we'd want to do.

You might now ask yourself, does it make sense to use any of the following "dummy variables" as regressors?

- D_i = 1, if condition A holds ; D_i = -1, if condition A does not hold.
- D_i = 0, if condition A holds ; D_i = 1, if condition B holds ; D_i = -1, if condition C holds.

(In the second case, conditions A, B, and C are mutually exclusive and totally exhaustive.)

**Please comment on the article here:** **Econometrics Beat: Dave Giles' Blog**