
Least-Squares Regression

Let us suppose that the given data $ (x_{i},y_{i})$, $ i=1,\ldots,n$ are inexact and carry substantial error right from the source where they are obtained. Experimental data are usually scattered and are a good example of inexact data. Polynomial interpolation is inappropriate in such cases. To understand this, let us look at the following graphical representation of some scattered data:

Fig. 1(a): Scattered data.

Fig. 1(b): A polynomial fit oscillating beyond the range of the data.

Fig. 1(c): An approximate fit for the data.

A look at the data in Fig. 1(a) tells us that the data have an increasing trend, i.e. higher values of $ y$ are associated with higher values of $ x$. If we fit an eighth-order interpolation polynomial, as in Fig. 1(b), it passes through the data exactly, but it oscillates because of the scattered nature of the data and goes well beyond the range suggested by the data. Hence a more appropriate approach is to find a function, as shown in Fig. 1(c), that fits the shape or general trend of the data. One of the standard techniques for finding such a fit is least-squares regression.
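The overshoot described above can be checked numerically. The following is a minimal sketch in plain Python, assuming nine made-up data points (they are not the data of Fig. 1); the helper name `lagrange_eval` is hypothetical. It evaluates the eighth-order interpolation polynomial in Lagrange form on a dense grid and compares its range with the range of the data.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Hypothetical scattered data with an increasing trend
# (9 points, so the interpolant has degree 8).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 3.5, 2.0, 5.5, 4.0, 7.5, 6.0, 9.5, 8.0]

# Dense grid on [0, 8]; it contains every data abscissa.
grid = [i / 20 for i in range(161)]
values = [lagrange_eval(xs, ys, x) for x in grid]

print("data range:        ", min(ys), max(ys))
print("interpolant range: ", min(values), max(values))
```

Because the grid contains the data abscissas, the interpolant's range is never narrower than the data range; how far it overshoots depends on the particular data chosen.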

2.4.1 Least Square Method:

The principle of least squares is one of the most popular methods for fitting a curve to given data. Let $ (x_{1},y_{1})$, $ (x_{2},y_{2}),\ldots,(x_{n},y_{n})$ be $ n$ observations from an experiment. We are interested in finding a curve

$\displaystyle y=f(x)\qquad(1)$

closely fitting the given data of size $ n$. At $ x=x_{1}$, the observed value of $ y$ is $ y_{1}$, while the expected value of $ y$ from the curve (1) is $ f(x_{1})$. Let us define the residual by

$\displaystyle e_{1}=y_{1}-f(x_{1})\qquad(2)$

Likewise, the residuals at all other points are given by

$\displaystyle e_{2}=y_{2}-f(x_{2}),\quad\ldots,\quad e_{n}=y_{n}-f(x_{n})\qquad(3)$

Some of the residuals may be positive and some may be negative. We would like to find the curve fitting the given data such that the residual at every $ x_{i}$ is as small as possible. Since residuals of opposite sign would cancel in a plain sum, and since we would like to give equal importance to all the residuals, it is desirable to consider the sum of the squares of these residuals, say $ E$, and to find the curve that minimizes $ E$. Thus, we consider

$\displaystyle E=\sum_{i=1}^{n}\left[y_{i}-f(x_{i})\right]^{2}\qquad(4)$

and find the best representative curve (1) that minimizes (4).
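As an illustration of the quantity being minimized, here is a minimal Python sketch that evaluates the sum of squared residuals $ y_{i}-f(x_{i})$ for two candidate curves. The data values and the two candidate lines are made up for illustration, and the function name `sum_squared_residuals` is hypothetical.

```python
def sum_squared_residuals(f, xs, ys):
    """Return E = sum over i of (y_i - f(x_i))**2 for a candidate curve f."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations (not taken from the text).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

def line_a(x):
    return 2.0 * x + 1.0  # candidate curve that follows the data closely

def line_b(x):
    return 1.0 * x + 2.0  # candidate curve that follows the data poorly

E_a = sum_squared_residuals(line_a, xs, ys)
E_b = sum_squared_residuals(line_b, xs, ys)
print(E_a, E_b)  # the candidate with the smaller E is the better fit
```

For these values, `line_a` gives $ E\approx 0.10$ and `line_b` gives $ E\approx 5.5$, so `line_a` is the better of the two in the least-squares sense.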

2.4.2 Least Square Fit of a Straight Line

Suppose that we are given a data set $ (x_{1},y_{1}),(x_{2},y_{2}),(x_{3},y_{3}),\ldots,(x_{n},y_{n})$ of observations from an experiment, and that we are interested in fitting a straight line

$\displaystyle y=ax+b\qquad(1)$

to the given data. Find the $ n$ residuals $ e_{i}$ by:

$\displaystyle e_{i}=y_{i}-(ax_{i}+b),\qquad i=1,2,\ldots,n\qquad(2)$

Now consider the sum of the squares of the residuals, i.e.

$\displaystyle E=\sum_{i=1}^{n}\left[y_{i}-(ax_{i}+b)\right]^{2}\qquad(3)$

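Note that $ E$ is a function of the parameters $ a$ and $ b$. We need to find $ a$ and $ b$ such that $ E$ is minimum. The necessary condition for $ E$ to be minimum is given by:

$\displaystyle \frac{\partial E}{\partial a}=0,\qquad \frac{\partial E}{\partial b}=0\qquad(4)$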

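The condition $ \frac{\partial E}{\partial a}=0$ yields:

$\displaystyle -2\sum_{i=1}^{n}\left[y_{i}-(ax_{i}+b)\right]x_{i}=0$

i.e.

$\displaystyle a\sum_{i=1}^{n}x_{i}^{2}+b\sum_{i=1}^{n}x_{i}=\sum_{i=1}^{n}x_{i}y_{i}\qquad(5)$

Similarly, the condition $ \frac{\partial E}{\partial b}=0$ yields

$\displaystyle a\sum_{i=1}^{n}x_{i}+nb=\sum_{i=1}^{n}y_{i}\qquad(6)$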

Equations (5) and (6) are called the normal equations; they are solved to obtain the desired values of $ a$ and $ b$.
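The two normal equations for the line $ y=ax+b$ form a $ 2\times 2$ linear system in $ a$ and $ b$, so they can be solved in closed form. A minimal Python sketch follows; the function name `fit_line` and the sample data are hypothetical, and the closed-form expressions come from eliminating $ b$ between the two equations.

```python
def fit_line(xs, ys):
    """Least-squares line y = a*x + b, found by solving the normal equations
         a*sum(x^2) + b*sum(x) = sum(x*y)
         a*sum(x)   + n*b      = sum(y)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # zero when all x values coincide
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    a = (n * sxy - sx * sy) / det
    b = (sy - a * sx) / n
    return a, b

# Hypothetical observations (not taken from the text).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
a, b = fit_line(xs, ys)
print("a =", a, " b =", b)
```

When the data lie exactly on a line, the fit recovers that line; for example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` gives a slope of 2 and an intercept of 1 up to rounding.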

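When $ a$ and $ b$ satisfy the normal equations, the expression for $ E$, i.e. (3), can be rewritten in a convenient way as follows:

$\displaystyle E=\sum_{i=1}^{n}y_{i}^{2}-a\sum_{i=1}^{n}x_{i}y_{i}-b\sum_{i=1}^{n}y_{i}$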

Example: Using the method of least squares, find an equation of the form

$\displaystyle y=ax+b$

that fits the following data:

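Solution: Consider the normal equations of the least-squares fit of a straight line, i.e.

$\displaystyle a\sum_{i=1}^{n}x_{i}^{2}+b\sum_{i=1}^{n}x_{i}=\sum_{i=1}^{n}x_{i}y_{i}\qquad(1)$

$\displaystyle a\sum_{i=1}^{n}x_{i}+nb=\sum_{i=1}^{n}y_{i}\qquad(2)$

Here $ n=5$.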

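From the given data, we have

$\displaystyle \sum x_{i}=10,\qquad \sum x_{i}^{2}=30,\qquad \sum y_{i}=76,\qquad \sum x_{i}y_{i}=243$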

Therefore the normal equations are given by:

$\displaystyle 30a+10b=243\qquad(3)$

$\displaystyle 10a+5b=76\qquad(4)$

On solving (3) and (4) we get

$\displaystyle a=9.1,\qquad b=-3\qquad(5)$

Hence the required fit for the given data is

$\displaystyle y=9.1x-3\qquad(6)$
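The solution of the two normal equations in this example can be checked with a few lines of Python. This is a minimal sketch using Cramer's rule on the system $ 30a+10b=243$, $ 10a+5b=76$; the helper name `solve_2x2` is hypothetical.

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve  a11*x + a12*y = b1,  a21*x + a22*y = b2  by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

# Coefficients taken from the normal equations of the example.
a, b = solve_2x2(30, 10, 243, 10, 5, 76)
print(a, b)  # prints: 9.1 -3.0
```

Substituting back, $ 30(9.1)+10(-3)=243$ and $ 10(9.1)+5(-3)=76$, which agrees with both equations.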

