3.3 General Linear Least Squares Technique
3.3.1 Choose your model functions wisely
It is possible to set up the normal equations for any arbitrary set of functions (just as we did for the quadratic formula above). The difficulty is that the more complicated the functions, the more difficult the formulation, and the more the risk that the solution to the normal equations becomes ill-behaved. This happens when the functions chosen are not very orthogonal (that is they are more similar than not), or when the measurements actually made are not well posed to distinguish between the differing functions. The latter happens more than people realize. Finally, the choice of functions don't really do a good job of matching the actual observations. Regardless of the underlying reason, when this happens the solutions end up being a delicate balance between very large numbers. This may yield the minimum chi squared for the actual measurements, but outside of the range of the measurements, or sometimes in between data points (if there are gaps) the solution does rather strange things. A classic example (and often the worst offender) is a polynomial regression to an abruptly changing data set which is otherwise very quiescent. Consider, for example a step function in data, which you try desperately to fit with increasing order of polynomial.
The left hand figure above shows the data, which is 0 for X less
than 0, and 1 for X greater than 0 (with a little noise added
on). The right hand plot shows the efforts of various order polynomial
fits ranging from order 3 (cubic) up to 13th order.
Note that you have to go to very high order (the cyan curve is
13th order) to begin to approximate the step function's
sharpness, but the price you pay is tight little oscillations
where you have data (kind of an induced "ringing") plus
wildly aberrant behavior outside of the data range. The latter
may be particularly troublesome if you wish to try extrapolating
your fits beyond your measurement range.
GoTo Lecture TOC
3.3.2 There Is an Easier Way: the Design Matrix Approach
The solution by means of the normal equations can be unstable, and it is rather tedious to have to build them when the functions are more complicated. Now we'll show you an easier way. By the way, the term "Linear Least Squares" means not that the functions you are fitting are linear, but rather that the formulation is linear in the coefficients. Thus you could have a function to fit that looks like, for example
which can be fit with linear least squares, as long as there are
no coefficients inside the parenthetically closed arguments to
the non-linear functions. For example, the following is definitely
a nonlinear regression prospect:
The culprit is the a2 term, which appears as part of the argument of a non-linear function. We will deal with this kind of problem in section 3.4.
Now about this supposed easy way. Well, think of constructing
a series of simultaneous equations with one equation for each
measurement (we'll use the nasty little equation we gave as an
example):
The above can be represented by a matrix called the design matrix which would look like:
It would be a N x 4 matrix (N measurements and 4 columns). Remember that all of the elements in this design matrix are simply numbers that you calculate from your x-data. Then you would have a column vector consisting of your 4 unknown coefficients (which you want to solve for):
And the column vector of your N observations:
The matrix version of these equations now reduces to Aa = y. Looks simple, doesn't it? But what about the weighting factors (the sigmas)? Well, just like the straight line case, you just divide your x entries and your y entries by sigma-squared. We won't repeat the matrices listed above, but for example the ith row of the design matrix would look like
and the ith element of the y-vector would also be divided by sigma-squared. Everything else proceeds as before.
3.3.3 Solving the Design Matrix Equation with SVD
Now that we've shown you how to build a design matrix, etc., we'll show you what to do. You cannot solve this as a set of simultaneous equations, because it is overdetermined. That is, there are more equations than unknowns. Now since data is always imperfect, the data points will never agree on the true coefficients. The equations will always be inconsistent to some extent. This is equivalent to saying that not all the points will all lie on the regression line. But what you want to do is to minimize the square of the distance between the regression function and the observations. That is, you want to minimize the function |Aa -y|2. This is exactly what singular value decomposition does for you. So in MATLAB you enter the commands (after constructing the matrices, of course)
[U,S,V] = svd(A,0); % SVD of design matrix W = diag(1./diag(S)); % not checked for zero singular values a = V*W*(U'*y); % your coefficients! Covmat = V*W.^2*V'; % compute covariance matrix [N M] = size(A); % size of design matrix (row X column) redchisqr = sum((A*a-y).^2)/(N-M); % compute reduced chi squared sa = sqrt(redchisqr*diag(Covmat)); % uncertainties in coefficients
And that's your answer. Note that the primes mean matrix transposes in MATLAB, and in the second line I've skipped an important step of checking for zeroes in the singular value list before inverting. The second line is a bit tricky: it is equivalent to taking the diagonal of S, inverting the individual elements of the resulting vector, and then reconstructing a square matrix with those new elements on the diagonal. The calculations look obscure, but they are efficient and powerful. You can use this 6 lines of code as an engine to a general linear least squares regression program. All you need to do is build the design matrix and data vector for the specific linear model you want to fit.
3.3.4 Multidimensional Regressions
What if you want to fit your data to higher dimensions? For example, suppose you wanted to model the distribution of dissolved oxygen at some depth level in the North Atlantic. You might do this, for example, if you had observations in an area of the North Atlantic and you were interested in calculating the large scale gradient (rate of change with distance) was. Supposing further that you believed that the distribution was best fit with a biquadratic function, that is, a function which was quadratic in both x (longitude) and y (latitude). Then your data, z would be modeled after
Be careful not to be confused here, because we are now using "y" as an independent variable, rather than an observation, and we have introduced "z" as your dependent variable (observation).Actually, this whole thing sounds more complicated than it is, but it is mathematically the same as the general linear least squares, and you already have the equations. You just calculate your design matrix just like before, as
the right hand data column vector would be
and your coefficient matrix is given by
which you then solve with the MATLAB code we showed you above.
Not too hard. You might want to look at an m-file called surfit.m
which does an unweighted two dimensional fit to an arbitrary order
polynomial. You might look at it to see how you might extend it
to weighted fits (with the sigmas).
GoTo Lecture TOC
3.3.5 Transformably Linear Models
There is one obvious case where the model equation is non-linear
in its coefficients, but you can transform your data to make the
model linear. Consider the model equation
If you take the logarithm of this equation, it reduces to
So all you have to do is to take the log of your results (yi's) do a linear fit, and transform the first coefficient by taking the exp of the intercept. You can thus avoid doing non-linear fits sometimes in this fashion.
Also, beware the "non-coefficient" problem. That is, you can sometimes introduce two model functions that are identical in behavior. The net result is that the normal equations become exactly singular. The SVD approach above, however, will give you an answer (after telling you there is a problem by giving you a zero singular value). Consider the following equation:
which looks reasonable, until you realize that a1 and
exp(a2) are essentially the same basis functions (both
constants multiplying the exp(a3x) term).
GoTo the Next Section
GoTo Lecture TOC
GoTo 12.747 TOC