Lecture 13

Multiple Regression

(Chapter 13)

 

Business and economic statisticians are generally interested in the way in which a dependent variable is related to more than one independent variable.

Whereas a simple regression includes only one independent (explanatory) variable, multiple regression involves two or more independent variables. There are two important reasons why a multiple regression must often be used instead of a simple regression.

 

1)  One can frequently predict the dependent variable more accurately if more than one independent variable is used. For example, the total monthly consumption of a household may be affected not only by its income but also by its total assets. The consumption function can then be written as:

 

Yi = A + B1 X1i + B2 X2i + ei

 

where Y = consumption, X1 = income, and X2 = assets.

 

2)  If a dependent variable depends on more than one independent variable, a simple regression of the dependent variable on a single independent variable will generally give a biased estimate of that variable's effect, because the included variable picks up part of the effect of any omitted variable with which it is correlated.
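Both reasons can be illustrated with a small simulation. The sketch below is not from the text: the data are synthetic, and the "true" coefficients (0.6 on income, 0.1 on assets) and the assumed link between assets and income are made up for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic households (assumed values, not data from the text).
income = rng.normal(50.0, 10.0, n)
assets = 2.0 * income + rng.normal(0.0, 10.0, n)   # assets are correlated with income
consumption = 5.0 + 0.6 * income + 0.1 * assets + rng.normal(0.0, 2.0, n)

def ols(y, *regressors):
    """Least-squares fit of y on a constant plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef          # coefficients and residuals

simple_coef, simple_resid = ols(consumption, income)
multi_coef, multi_resid = ols(consumption, income, assets)

# The simple regression attributes part of the effect of assets to income:
# its slope is roughly 0.6 + 0.1 * 2.0 = 0.8 rather than 0.6.
print("income slope, simple regression:  ", round(simple_coef[1], 3))
print("income slope, multiple regression:", round(multi_coef[1], 3))

# Adding the second variable also leaves smaller residuals.
print("residual std, simple:  ", round(simple_resid.std(), 3))
print("residual std, multiple:", round(multi_resid.std(), 3))
```

The size of the bias here depends entirely on the assumed correlation between income and assets; if the two were uncorrelated, the simple-regression slope would be close to 0.6.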


 

 The logic of multiple regression is basically the same as that of simple regression, even though the derivation of the least-squares formulae is more involved and requires matrix algebra. For your purposes, it is enough to know that the t-tests for individual coefficients are carried out in the same way, and that the coefficient of determination and the standard error of the regression are defined in the same way. Many software packages (such as Microsoft Excel) will do the necessary calculations for you.
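For readers curious about what the software does, here is a minimal sketch of the matrix calculation. It assumes the usual least-squares formula b = (X'X)^-1 X'y and n - k - 1 degrees of freedom for k independent variables. The function name `ols_summary` and the eight households of data are hypothetical, not taken from the text.

```python
import numpy as np

def ols_summary(X, y):
    """Least-squares fit of y on a constant plus the columns of X."""
    n, k = X.shape                       # n observations, k independent variables
    Xc = np.column_stack([np.ones(n), X])
    XtX_inv = np.linalg.inv(Xc.T @ Xc)
    coef = XtX_inv @ Xc.T @ y            # b = (X'X)^-1 X'y
    resid = y - Xc @ coef
    sse = float(resid @ resid)           # unexplained (error) sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))
    s = np.sqrt(sse / (n - k - 1))       # standard error of the regression
    se = s * np.sqrt(np.diag(XtX_inv))   # standard error of each coefficient
    return {"coef": coef, "se": se, "t": coef / se, "r2": 1 - sse / sst, "s": s}

# Hypothetical data for 8 households (thousands of dollars).
income = np.array([20, 25, 30, 35, 40, 45, 50, 55], dtype=float)
assets = np.array([5, 12, 8, 20, 15, 30, 22, 40], dtype=float)
consumption = np.array([18, 22, 25, 30, 33, 38, 41, 47], dtype=float)

result = ols_summary(np.column_stack([income, assets]), consumption)
for name in ("coef", "se", "t"):
    print(name, np.round(result[name], 3))
print("R^2:", round(result["r2"], 3), " s:", round(result["s"], 3))
```

Each t value is a coefficient divided by its standard error, exactly as in simple regression; only the degrees of freedom change with the number of independent variables.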

 

Dummy Variable Techniques:

 

Often qualitative variables (for example, male or female, white/nonwhite, rents or owns a home, etc.) are represented by dummy variables in a regression analysis.

A dummy variable Hi takes one of two values, 0 or 1; which group is coded 1 is arbitrary. For example, the variable takes the value 1 if the individual is male, and 0 if female. Thus, a regression equation can be written as

       Yi = A + B1 Xi + B2 Hi + ei

 

where Y is (for example) the wage rate of an employee, X is education, and H is a dummy variable indicating the gender of the individual.

In order to interpret this regression, note that for males the regression becomes (since H = 1 for all males):

 

 Yi = A + B1 Xi + B2 + ei

    = (A + B2) + B1 Xi + ei

 

whereas for females the regression is (since H = 0 for all females):

 

Yi = A + B1 Xi + ei.

 

Thus, the coefficient of the dummy variable is the additional value of the intercept for males, the slope being the same for the two groups. The two regressions (one for males, one for females) are depicted in Figure 13.4 (page 531).
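The intercept shift can be checked numerically. The following sketch uses made-up wage data built without the error term, so that least squares recovers the assumed coefficients exactly; the values 5.0, 1.5 and 2.0 are hypothetical and do not come from the text.

```python
import numpy as np

# Hypothetical employees: years of education and a gender dummy (1 = male, 0 = female).
education = np.array([10, 12, 14, 16, 10, 12, 14, 16], dtype=float)
male = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)

# Assumed coefficients, no error term (illustration only).
A_true, B1_true, B2_true = 5.0, 1.5, 2.0
wage = A_true + B1_true * education + B2_true * male

# Regress wage on a constant, education, and the dummy variable.
X = np.column_stack([np.ones_like(education), education, male])
(A, B1, B2), *_ = np.linalg.lstsq(X, wage, rcond=None)

intercept_female = A        # H = 0
intercept_male = A + B2     # H = 1
print("common slope on education:", round(B1, 3))
print("intercept for females:    ", round(intercept_female, 3))
print("intercept for males:      ", round(intercept_male, 3))
```

The two fitted lines are parallel: they share the slope B1 and differ only in the intercept, by the amount B2.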

 

Let us look at the example in Table 13.5 on page 530.

A regression of savings (S) on income (I) and home ownership (dummy variable, H) can be written as:

Si = A + B Ii + C Hi + ei.

 

Based on data given in Table 13.5, the following regression equation was estimated:

 

Ŝi = -.320 + .067 Ii + .827 Hi.

According to this estimated equation, with income and savings measured in thousands of dollars, a $1,000 increase in income raises predicted savings by .067 thousand dollars, or $67, holding home ownership constant. Holding income constant, a home-owning family saves .827 thousand dollars (or $827) more than a renting family.
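The interpretation of the two coefficients can be verified by plugging values into the estimated equation. In the sketch below the coefficients are those reported above; the function name `predicted_savings` and the income level of 30 (thousand dollars) are hypothetical choices for illustration.

```python
def predicted_savings(income, owns_home):
    """Predicted savings in thousands of dollars.

    income    -- family income in thousands of dollars
    owns_home -- 1 if the family owns its home, 0 if it rents
    """
    return -0.320 + 0.067 * income + 0.827 * owns_home

# Effect of an extra $1,000 of income for a renting family (hypothetical income level).
income_effect = predicted_savings(31, 0) - predicted_savings(30, 0)

# Effect of home ownership at the same income.
ownership_effect = predicted_savings(30, 1) - predicted_savings(30, 0)

print("extra savings per $1,000 of income (thousands):", round(income_effect, 3))
print("extra savings for home owners (thousands):     ", round(ownership_effect, 3))
```

Because the equation is linear with a common slope, both differences are the same at every income level.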