Business and economic statisticians are generally interested in how a dependent variable is related to more than one independent variable. Whereas a simple regression includes only one independent (explanatory) variable, a multiple regression involves two or more. There are two important reasons why a multiple regression must often be used instead of a simple regression:
1) The dependent variable can often be predicted more accurately if more than one independent variable is used. For example, the total monthly consumption of a household may depend not only on its income but also on its total assets, so the consumption function is written as:
$$Y_i = A + B_1 X_{1i} + B_2 X_{2i} + e_i$$
where $Y$ = consumption, $X_1$ = income, and $X_2$ = assets.
2) If a dependent variable depends on more than one independent variable, a simple regression of the dependent variable on only one of them will generally give a biased estimate of that variable's effect on the dependent variable. The bias arises whenever the omitted variable is correlated with the included one.
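Both reasons can be illustrated with a small numerical sketch. It assumes NumPy is available and uses made-up household figures (in thousands of dollars), not data from the text: consumption is generated exactly as 2 + 0.6 × income + 0.1 × assets, with no error term, and assets tend to rise with income. The helper `ols` is a hypothetical name introduced only for this illustration.

```python
import numpy as np


def ols(y, *regressors):
    """Hypothetical helper: least-squares estimates [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])  # column of ones = intercept A
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Made-up data (thousands of dollars). Assets are positively correlated with income.
income = np.array([20.0, 30.0, 40.0, 50.0, 60.0, 70.0])
assets = np.array([50.0, 80.0, 90.0, 140.0, 150.0, 200.0])
consumption = 2 + 0.6 * income + 0.1 * assets  # exact relation, for illustration only

multiple = ols(consumption, income, assets)  # estimates A, B1, B2
simple = ols(consumption, income)            # omits assets

print("multiple regression [A, B1, B2]:", multiple.round(3))
print("simple regression   [A, B1]:    ", simple.round(3))
```

The multiple regression recovers the income effect of 0.6, while the simple regression's income slope comes out at about 0.889: because assets move with income, the simple regression attributes part of the assets effect to income.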
The basic logic of multiple regression is the same as that of simple regression, although deriving the least-squares formulae is more involved and requires matrix algebra. For your purposes, it is enough to know that the t-tests for individual coefficients are carried out in the same way; in both cases the degrees of freedom are $n - k - 1$, where $n$ is the number of observations and $k$ the number of independent variables (so $n - 2$ in a simple regression). The coefficient of determination and the standard error of regression are also defined the same way. Many statistical software packages (such as Microsoft Excel) will do the necessary calculations for you.
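For readers curious about what such software computes, here is a minimal sketch of the matrix formula b = (X'X)⁻¹X'y together with the coefficient standard errors, t-statistics, R² and standard error of regression. It assumes NumPy; the function name `regression_summary` and the six data points are hypothetical illustrations, not taken from the text.

```python
import numpy as np


def regression_summary(y, X_vars):
    """Hypothetical helper: OLS with an intercept via b = (X'X)^{-1} X'y.

    X_vars is an n-by-k array holding the k independent variables.
    """
    y = np.asarray(y, dtype=float)
    n, k = X_vars.shape
    X = np.column_stack([np.ones(n), X_vars])  # prepend the intercept column
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                      # least-squares coefficients
    resid = y - X @ b
    df = n - k - 1                             # residual degrees of freedom
    s = np.sqrt(resid @ resid / df)            # standard error of regression
    se = s * np.sqrt(np.diag(XtX_inv))         # standard errors of the coefficients
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return {"b": b, "se": se, "t": b / se, "df": df, "s": s, "r2": r2}


# Made-up household data (thousands of dollars), for illustration only.
income = np.array([20.0, 30.0, 40.0, 50.0, 60.0, 70.0])
assets = np.array([50.0, 80.0, 90.0, 140.0, 150.0, 200.0])
consumption = np.array([19.5, 27.6, 35.4, 45.5, 53.3, 63.8])

X_vars = np.column_stack([income, assets])
result = regression_summary(consumption, X_vars)
for name, b, se, t in zip(["A", "B1", "B2"], result["b"], result["se"], result["t"]):
    print(f"{name}: b={b:.4f}  se={se:.4f}  t={t:.2f}")
print(f"R^2={result['r2']:.4f}  s={result['s']:.4f}  df={result['df']}")
```

Each t-statistic would then be compared with the critical value from a t table with `df` degrees of freedom, exactly as in simple regression. Inverting X'X explicitly mirrors the textbook formula; for larger or ill-conditioned problems `np.linalg.lstsq` is the numerically safer choice.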
Dummy Variable Techniques:
Qualitative variables (for example, male/female, white/nonwhite, rents/owns a home) are often represented by dummy variables in regression analysis. A dummy variable $H_i$ takes one of two values, conventionally 0 and 1. For example, the variable takes the value 1 if the individual is male and 0 if female. A regression equation can then be written as
$$Y_i = A + B_1 X_i + B_2 H_i + e_i$$
where $Y$ is (for example) the wage rate of an employee, $X$ = education, and $H$ is a dummy variable indicating the gender of the individual.
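A minimal sketch of turning a qualitative variable into a 0/1 dummy and assembling the columns for the wage equation above. It assumes NumPy; the records (gender labels and years of education) are hypothetical, not from the text.

```python
import numpy as np

# Hypothetical records: gender as a text label, years of education as a number.
gender = ["male", "female", "female", "male", "female"]
education = [12, 16, 14, 10, 18]

# Dummy variable H: 1 if male, 0 if female (the coding used in the text).
H = np.array([1 if g == "male" else 0 for g in gender])

# Design matrix for Y = A + B1*X + B2*H + e: a column of ones (for A), X, and H.
X = np.column_stack([np.ones(len(H)), education, H])

print(H)
print(X)
```

Only one dummy is needed for two categories. Adding a second column for "female" alongside the column of ones would make the columns perfectly collinear, and the coefficients could not be estimated.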
To interpret this regression, note that for males (since $H = 1$ for all males) the regression becomes
$$Y_i = A + B_1 X_i + B_2 + e_i = (A + B_2) + B_1 X_i + e_i,$$
whereas for females (since $H = 0$ for all females) it is
$$Y_i = A + B_1 X_i + e_i.$$
Thus, the coefficient of the dummy variable, $B_2$, is the additional value of the intercept for males; the slope is the same for the two groups. The two regressions (one for males, one for females) are depicted in Figure 13.4 (page 531).
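The intercept-shift interpretation can be checked numerically. The coefficient values below (A = 5.0, B1 = 0.8, B2 = 1.5) are hypothetical, chosen only to show that the male-female gap in fitted values equals B2 at every education level, while one more year of education raises the fitted value by B1 in both groups.

```python
# Hypothetical coefficient values, for illustration only.
A, B1, B2 = 5.0, 0.8, 1.5


def predicted_wage(education, male):
    """Fitted value A + B1*X + B2*H, with H = 1 for males and 0 for females."""
    H = 1 if male else 0
    return A + B1 * education + B2 * H


for x in (10, 12, 16):
    male_fit = predicted_wage(x, True)
    female_fit = predicted_wage(x, False)
    # The gap is B2 at every education level: parallel lines, different intercepts.
    print(f"education={x}: male={male_fit:.2f} female={female_fit:.2f} "
          f"gap={male_fit - female_fit:.2f}")
```

Because the model has no interaction between H and X, the two fitted lines are parallel; letting the slope also differ by gender would require an additional term such as H × X.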
Let us look at the example in Table 13.5 on page 530.
A regression of savings ($S$) on income ($I$) and home ownership (dummy variable $H$) can be written as:
$$S_i = A + B I_i + C H_i + e_i.$$
Based on the data given in Table 13.5, the following regression equation was estimated:
$$\hat{S}_i = -0.320 + 0.067 I_i + 0.827 H_i.$$
According to this estimated equation, a $1000 increase in income results in an increase in savings of 0.067 thousand dollars, or $67. Holding income constant, a home-owning family saves 0.827 thousand dollars ($827) more than a renting family.
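The interpretation of the estimated savings equation can be reproduced with simple arithmetic. This sketch uses the three estimated coefficients quoted above and assumes, as the text implies, that savings and income are measured in thousands of dollars. The income level of 30 (thousand) is an arbitrary hypothetical value; because the equation is linear, the two effects do not depend on it.

```python
# Estimated coefficients from the savings equation above
# (savings and income in thousands of dollars; H = 1 for home owners, 0 for renters).
A_HAT, B_HAT, C_HAT = -0.320, 0.067, 0.827


def predicted_savings(income, owns_home):
    """Fitted savings, in thousands of dollars."""
    return A_HAT + B_HAT * income + C_HAT * (1 if owns_home else 0)


# One-unit (i.e. 1000-dollar) increase in income, holding home ownership fixed.
income_effect = predicted_savings(31, False) - predicted_savings(30, False)
# Home owner versus renter at the same (hypothetical) income of 30 thousand.
ownership_effect = predicted_savings(30, True) - predicted_savings(30, False)

print(f"income effect: {income_effect * 1000:.0f} dollars")
print(f"ownership effect: {ownership_effect * 1000:.0f} dollars")
```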