Instructions: Download all your files (by shift clicking on the highlighted file names) before starting MATLAB. Write an m-file to reproduce your analysis and e-mail the m-file to the Instructor.
1). UNSTABLE ALGORITHMS AND ROUND-OFF ERROR: The m-file named goldmean.m demonstrates how an unstable algorithm can interact with roundoff error in an unhealthy way. Download the m-file by shift-clicking (or right-clicking) on its name, save it in your current/working directory. Startup up MATLAB and type goldmean.
This algorithm is based on an iterative formula for the powers of the so-called "Golden Mean":
It can shown (we leave the algebra to you if you should doubt it) that:
This m-file will calculate
both ways: the hard way (raising the first equation to each
successive power) and then the easy way (by calculating
and making use of the fact
is
always 1, then applying the recursion). When it has finished it will print on the screen a table of
four columns: the power,
the hard way,
the recursion way, and the relative error (difference
divided by the "correct" answer). Notice that MATLAB divides all of the numbers by 104, what
we see is that by the 16th iteration we are starting to get a deviation between the two, supposedly
identical, numbers. You will see that by the 38th iteration the two different ways of calculating
are diverging by quite a bit (on the order of 10%)! The results will be in the MatLab variable
R and will be placed in your MATLAB workspace, so be sure to have your m-file display this
variable. Now make a plot of the iteration (power) vs. the absolute value of the relative error,
now you can "see" how this kind of error can explode in certain types of calculations.
2). USE MATLAB'S STUDENT'S T-TEST: You suspect that there are two distinct populations of NO3 values in a nutrient data set, one between 2000 and 3000 meters and the other below 3000 m. In this problem you will use some of MATLAB's powerful data manipulation capabilities to analyze this problem. You formulate a null Hypothesis that there is no difference
and we can test this hypothesis regarding their means using MATLAB's TTEST2 function
First we must load the data. Shift-click on no3.dat will download into your current/working directory a NO3 data set from the North Atlantic. Now you can load the data into MATLAB after starting it up by issuing the following command:
load no3.dat
MATLAB will now have a variable in its workspace called no3. To make things simpler on us (and to protect ourselves from overwriting the original data) we will create a new variable equal to no3 by:
D=no3;
don't forget the ";" or you'll get a lot of numbers on the screen. Let's look at this data first, use the plot command:
plot(D(:,2),D(:,1),'+');
What have we done? We have told MATLAB to plot column 2 of D (the NO3 values) on the x-axis and column 1 of D (the depth) on the y-axis and we told it to represent each data point with only a "+" sign, no connecting lines. But it doesn't look right. That's because it is "upside down". We can fix this by multiplying every element of column 1 by -1:
D(:,1)=-1*D(:,1);
Try that plot command again. Now we can use a VERY powerful feature of MATLAB to sub-select this data into two more variables:
Du=D(D(:,1)<=-2000 & D(:,1)>-3000,:); Dl=D(D(:,1)<=-3000,:);
What have we done here? We've used MATLAB's self-referential capabilities and told it to set variable Du (for upper) to all D's where the first column of D was less than or equal to -2000 AND greater than -3000 meters. In the next line we've set Dl (for lower) to all D's where the first column of D is less than or equal to -3000. Try plotting these variables the same way you plotted D, you should get data points only in the specified depth ranges. Now we're ready to apply the t-test:
[H,sig,ci]=ttest2(Du(:,2),Dl(:,2),0.05,0)
Do the results surprise you? Calculate the mean and standard deviation for each group of NO3using:
Dmu=mean(Du(:,2)) Dsu=std(Du(:,2))
Do the same for Dl (call them Dml and Dsl). You'll see that the means are not that far apart, but one glance at the plots tells you something else might be going on. Make sure your m-file makes it clear what your decision was: reject or keep the null hypothesis and report the values for Dmu, Dsu, Dml, Dsl. Issue the command:
help ttest2
what are the odds that you got the results you did by chance assuming the null hypothesis is true? What was your probability of committing a type-I error?
3). Type I and II Regressions: Since it ultimately depends on the molecular diffusivity of constituent species, the processes of bubble injection and gas exchange may fractionate in favor of one gas over another. This may go in one of two directions: (1) you may enrich slower diffusing gases because they escape less readily, or (2) you may enrich in faster diffusing gases because they are more easily injected. As a test, consider the distributions of two gases, He.dat and Ne.dat for which you have measurements in surface waters at a variety of locations. Both measurements have uncertainties associated both with measurement errors and natural variability beyond our control. Perform linear regressions to distinguish between the two models, which predict a ratio of He:Ne of 0.956 for the first case, and 1.027 for the second. Do this both by type I regression (He on Ne and Ne on He) and by type II regression. How do the results compare, and explain the differences.
4). General Linear Regression: An experiment is constructed to measure the dissociation rate
of a very unstable compound in the presence of UV. It involves the creation of the substance by
chemical reaction and then flashing the solution with UV while simultaneously measuring its
concentration. The data are stored in photo.dat which has the time (in milliseconds) in the first
column, and the compound's concentration (in arbitrary units) in the second column. The
assumption is that the UV driven consumption rate is a "zeroeth order" rate, that is a constant rate
independent of concentration or time. The problem is that when you installed the probe, the
shielding on the cable lead was damaged and the signal was seriously contaminated with 60 hertz
electrical noise from surrounding appliances. You must then do a general linear regression which
accounts for this "contamination", using the model equation
and of
course, you know that f=60. Also try the regression without the last term (i.e., just as a straight
line fit). Compare your results (the slopes) and explain.
5). Non-linear Regression: You need to separate two overlapping peaks on a sloping background from a chromatographic record. For the sake of simplicity, assume that the peaks are gaussian, and that the background is changing linearly with time. The data are contained in chromo.dat (chromo(:,1) is the time in minutes, chromo(:,2) is the detector response) and you must use non-linear regression. Give the best estimate elution times, peak widths and the amount of material (area under the gaussian curve). Also give uncertainties in your results (remember the propagation of errors). Devise a statistical test to determine if the background is really changing with time (your null hypothesis might be Ho: a2=0). For reference, the area under a gaussian curve: