Regression and correlation techniques are concerned with how one variable is related to another. In this chapter we describe the nature of these techniques and how they are applied. Specifically, our objectives are:
1. To show how to calculate a sample regression line.
2. To indicate how the sample regression line can be used to predict the value of the dependent variable on the basis of knowledge of the independent variable.
3. To calculate and interpret the sample correlation coefficient, and to test whether the population correlation is zero.
REGRESSION ANALYSIS:
Regression analysis identifies a data-based equation that can be used to estimate the unknown value of one variable (the dependent variable) based on the known value of the other variable (the independent variable).
Example: A hosiery mill wants to produce 4 tons of output next month, but does not know how much it will cost. Regression analysis can be used to estimate the cost based on the known value of output.
Scatter Diagram: If one plots available data on Y and X, one gets a scatter diagram.
Example: Table 12.1 and Figure 12.1 (page 452).
A scatter diagram provides a good visual portrait of the relationship between the dependent and the independent variable, and gives impressions about the following three important questions:
Is the relationship direct or inverse?
Is the relationship linear or non-linear?
How strong is the relationship?
See Figure 12.2.
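As a rough numerical companion to the first question, the sign of the sample covariance between X and Y indicates whether the relationship is direct or inverse. The sketch below is illustrative only: the function name `direction` and the data are hypothetical (they are not taken from Table 12.1), and it does not replace looking at the plot, which is still needed to judge linearity.

```python
from statistics import mean

def direction(x, y):
    """Classify a relationship as direct or inverse from the sign of the sample covariance."""
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance: sum of products of deviations, divided by n - 1.
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    if cov > 0:
        return "direct"
    if cov < 0:
        return "inverse"
    return "no linear relationship"

# Hypothetical data, made up for illustration.
output = [1, 2, 3, 4, 5]
cost = [3, 5, 6, 8, 11]
print(direction(output, cost))
```

The covariance only gives the direction. How strong the relationship is can be judged with the sample correlation coefficient, which rescales the covariance to lie between -1 and 1.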
THE LINEAR REGRESSION MODEL:
A model is a simplified or idealized representation of the real world.
First, the statistician visualizes the population of all relevant pairs of values of Y and X.
See Figure 12.3.
The probability distribution of Y, given a specified value of X, is called the conditional probability distribution of Y, denoted by P(Y|X).
The mean of this conditional distribution is denoted by μY.X and the standard deviation of this distribution is denoted by σY.X.
Regression analysis makes the following assumptions about the conditional probability distribution of Y:
1) μY.X = A + BX, a linear function of X.
2) σY.X is the same for all values of X.
3) The values of Y are independent of one another.
4) P(Y|X) is Normal with mean μY.X and standard deviation σY.X. This assumption is necessary for tests of significance, but not for point estimates.
Under these assumptions, each observed value of Y can be written as:
Yi = A + BXi + ei, i = 1, 2, …, N.
Essentially, ei is the error term, a random amount that is added to A + BXi.
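One way to get a feel for the model is to simulate from it. The following is a minimal sketch, assuming Normal errors with mean zero and the same standard deviation at every X, drawn independently of one another. The function name `simulate_y` and the parameter values are hypothetical, chosen only for illustration.

```python
import random

def simulate_y(x_values, A, B, sigma, seed=0):
    """Draw one Y for each X from Y = A + B*X + e, with e ~ Normal(0, sigma)."""
    rng = random.Random(seed)  # seeded so the draws are reproducible
    return [A + B * x + rng.gauss(0.0, sigma) for x in x_values]

# Hypothetical population parameters, not estimates from any real data.
A, B, sigma = 2.0, 1.5, 0.5
x_values = [1, 2, 3, 4, 5]
y_values = simulate_y(x_values, A, B, sigma)

# The error term is what remains after removing A + B*X from each Y.
errors = [y - (A + B * x) for x, y in zip(x_values, y_values)]
print(y_values)
print(errors)
```

Setting `sigma` to zero removes the error term entirely, so every simulated Y falls exactly on the line A + BX.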
The Regression Model - Figure 12.4
The population regression line.
The Sample Regression Line:
Yhat = a + bX, where a and b are point estimates of A and B respectively.
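These notes do not show at this point how a and b are computed. The usual choice is the method of least squares, which picks a and b to minimize the sum of squared deviations of the observed Y values from the line. Below is a minimal sketch, assuming least squares. The function names `fit_line` and `predict` and the data are hypothetical, and the numbers are made up rather than taken from Table 12.1.

```python
from statistics import mean

def fit_line(x, y):
    """Least squares estimates (a, b) for the sample regression line Yhat = a + b*X."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares of deviations from the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

def predict(a, b, x_new):
    """Predicted value of the dependent variable at x_new."""
    return a + b * x_new

# Hypothetical output (tons) and cost figures, for illustration only.
output = [1, 2, 3, 5, 6]
cost = [3, 5, 6, 9, 12]
a, b = fit_line(output, cost)
print(a, b)
print(predict(a, b, 4))  # estimated cost at 4 tons of output
```

The last line mirrors the hosiery mill question: once a and b are in hand, the estimated cost for a planned output is read off the sample regression line.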