Lecture 9

 

Regression and Correlation Techniques

Read Chapter 12 of the Text

 

 

Regression and correlation techniques are concerned with how one variable is related to another. In this chapter we describe the nature of these techniques and how they are applied. Specifically, our objectives are:

1. To show how to calculate a sample regression line.

2. To indicate how the sample regression line can be used to predict the value of the dependent variable from a known value of the independent variable.

3. To calculate and interpret the sample correlation coefficient, and to test whether the population correlation is zero.

 

REGRESSION ANALYSIS:

 

Regression analysis identifies a data-based equation that can be used to estimate the unknown value of one variable (the dependent variable) from the known value of another variable (the independent variable).

Example: A hosiery mill wants to produce 4 tons of output next month but does not know how much it will cost. Regression analysis can be used to estimate the cost from the known value of output.

 

Scatter Diagram: Plotting the available data on Y against X gives a scatter diagram. Example: Table 12.1 and Figure 12.1 (page 452).

A scatter diagram provides a good visual portrait of the relationship between the dependent and the independent variable, and gives a first impression of the answers to the following three important questions:

Is the relationship direct or inverse?

Is the relationship linear or non-linear?

How strong is the relationship?

See Figure 12.2.
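
As a numeric companion to the scatter diagram, the sketch below computes the sample correlation coefficient r, whose sign suggests whether the relationship is direct or inverse and whose magnitude suggests how strong a linear relationship is. The data are hypothetical and are not the values in Table 12.1; the function name sample_correlation is also just an illustrative choice. Note that r says nothing about whether the relationship is linear or non-linear, so the plot is still needed for that question.

```python
import math


def sample_correlation(x, y):
    """Return the sample correlation coefficient r for paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustrative data (NOT Table 12.1 from the text).
output_tons = [1, 2, 3, 4, 5]   # independent variable X
cost = [2, 3, 5, 4, 6]          # dependent variable Y

r = sample_correlation(output_tons, cost)
direction = "direct" if r > 0 else "inverse"
print(f"r = {r:.2f}, so the relationship appears {direction}")
```

A value of r close to +1 or -1 points to a strong linear relationship; a value near 0 points to a weak one.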

 

THE LINEAR REGRESSION MODEL:

A model is a simplified or idealized representation of the real world.

First, the statistician visualizes the population of all relevant pairs of values of Y and X.

See Figure 12.3.

The probability distribution of Y, given a specified value of X, is called the conditional probability distribution of Y and is denoted by

                   P (Y|X)

The mean of this conditional distribution is denoted by μY.X, and its standard deviation is denoted by σY.X.

Regression analysis makes the following assumptions about the conditional probability distribution of Y:

1) μY.X = A + B.X, a linear function of X.

2) σY.X is the same for every value of X.

3) The values of Y are independent of one another.

4) P (Y|X) is Normal with mean μY.X and standard deviation σY.X. This assumption is needed for tests of significance, but not for point estimates.

 

 These assumptions together imply

 

              Yi = A + B.Xi + ei,   i=1, 2, …, N.

 

Here ei is the error term: a random amount that is added to A + B.Xi.
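
To make the model concrete, the following sketch generates values of Y from the equation above by adding a Normal error with mean zero and constant standard deviation to A + B.Xi. The values chosen for A, B and the standard deviation, the X values, and the function name simulate_population are all assumptions made for illustration; none of them come from the text.

```python
import random


def simulate_population(A, B, sigma, x_values, seed=0):
    """Generate Yi = A + B*Xi + ei with independent Normal(0, sigma) errors."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    y_values = []
    for x in x_values:
        e = rng.gauss(0.0, sigma)  # error term: mean 0, same sigma for every X
        y_values.append(A + B * x + e)
    return y_values


# Hypothetical parameter values, chosen only for illustration.
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
y_values = simulate_population(A=1.0, B=2.0, sigma=0.5, x_values=x_values)

for x, y in zip(x_values, y_values):
    print(f"X = {x}, Y = {y:.2f}, population line A + B*X = {1.0 + 2.0 * x:.2f}")
```

Setting sigma to 0 removes the error term, so every Y falls exactly on the population regression line; larger values of sigma scatter the points more widely around it.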

The Regression Model: see Figure 12.4.

The population regression line.

 

The Sample Regression line:

 

Yhat = a + bX, where a and b are point estimates of A and B, respectively.
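
One common way to obtain a and b is the method of least squares, which this section has not yet introduced; the sketch below assumes that method. The data are hypothetical (not Table 12.1), the units are unspecified, and the function name fit_line is an illustrative choice. The last step mirrors the hosiery mill example by predicting cost at an output of 4 tons.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the sample line Yhat = a + b*X."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all X values are identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx               # slope estimate
    a = mean_y - b * mean_x     # intercept estimate
    return a, b


# Hypothetical illustrative data (NOT Table 12.1 from the text).
output_tons = [1, 2, 3, 4, 5]   # independent variable X
cost = [2, 3, 5, 4, 6]          # dependent variable Y

a, b = fit_line(output_tons, cost)   # approximately a = 1.3, b = 0.9
predicted_cost = a + b * 4           # Yhat at X = 4 tons, approximately 4.9
print(f"Yhat = {a:.2f} + {b:.2f} X; predicted cost at 4 tons = {predicted_cost:.2f}")
```

The estimates a and b depend on the particular sample drawn, so a different sample from the same population would generally give a somewhat different line.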