Helpful references for this assignment:
Clark, I., 1979, Practical Geostatistics, Chap. 1, 2, and 5
Davis, 1986, Statistics and Data Analysis in Geology, Chap. 4 & 5.
Journel, A.G. and Ch.J. Huijbregts, 1978, Mining Geostatistics, Chap.
2-4.
Marcotte, D. 1991, Cokriging with MATLAB, Computers and
Geosciences, 17(9): 1265-1280.
Marcotte, D. 1996, Fast variogram computation with FFT, Computers
and Geosciences, 22(10), 1175-1186.
Cressie, N. 1993, Statistics for Spatial Data, Chap. 1, 2, and 3.
1). Now the fun begins (that is if you like making maps). For some people satellite images are just as good as maps, for some, even better. In this assignment we are going to objectively analyze a sequence of three AVHRR images of SST from the 10o by 10o area surrounding Bermuda. A common problem in satellite data analysis is the gaps in the imagery retrieved, these missing data points can be due to a number of reasons but the most common one for data collected in the visible and thermal to near infrared is obscuration by clouds. The three images we are going to "analyze" are a sequence of AVHRR PathFinder SST images from 1987 (days 130, 131, and 132). At this point you should download the following binary image files:
A1987130h09da_sst.bin
A1987131h09da_sst.bin
A1987132h09da_sst.bin
Since these are binary images they will require a little manipulation so that we can Krige them in MatLab. So we are providing you with a couple of programs and ancillary data sets, they are:
cokri.m the main cokriging program;
cokri2.m called by cokri.m;
trans.m called by cokri2.m;
means.m calculates row averages, used by
cokri2.m;
surfit.m the trend surface fitting program;
excise.m removes any rows in a matrix containing
Nan's:
byte2sst.m a program to convert binary satellite data
into SSTs;
purp_map.mat a MatLab MAT file with a color map
useful for satellite data;
ps05_init.m an initialization code to get you
started with images in MatLab.
The first code you will run is ps05_init.m. It reads in the binary images and makes plots of the three successive day SSTs near Bermuda, you should open up this program and take a look at how it does this. We give you this code because MatLab handles images slightly differently than matrices, even though they are both stored as matrices. We could get real fancy and spend a lot of time getting the coastlines and other map affectations into this plot, but the "simple" image plots made with imagesc in this code will suffice. If you look real close you can see the five pixels near the center of the images that are grayed out, this is where the islands of Bermuda are and we chose a color scale so that they would come out gray. Additionally, we also arranged the color scale so that the Nan's would plot as white pixels, indicating missing data; these are the pixels we are going to use cokriging to "fill in".
Additionally, some of the calculations in this assignment take a long time to complete (cpu's keep getting faster and faster and by 2008 this is becoming less of a concern, if your code runs correctly the very first time). Calculating the semivariances for the three images took over seven and half hours on a Sun Ultra Sparc workstation doing it the straightforward way in 2000. Although considerable speed can be gained by using Marcotte's FFT approach, to be fair, not enough detail of this approach was covered in class to expect you to use the code. But we are going to give you the m-files so that you can use them for your own projects, impress friends at cocktail parties, and otherwise add to your arsenal of software:
variof1.m function to compute variogram or covariogram;
variof2.m function to compute variograms,
cross-variograms, covariograms, cross-covariograms and
pseudo-cross-variograms.
For a more complete description of these programs check out Marcotte (1996) an article from Computers and Geosciences.
For these reasons we are not going to ask you to calculate the semivariances and are going to just give them to you in the following file:
Nevertheless, there will still be some lengthy calculations and you should not be discouraged by this. Try to block out a good chunk of time to complete this assignment. And because of this, the manner in which you hand in this assignment will be a little different. Rather than running all of your programs all the way through (about 20 hours of watching numbers flip by on our screens), we're asking you to make PostScript plots of your results (save them as encapsulated postscript, .eps) and send them in with your code (we still want to see your code, it's how we give you partial credit).
a). As we said above the first thing to do is run ps05_init.m. When you get your plot on the screen study it, notice the patterns of missing data; study the code, notice how the imagesc command is used. The first plot we want you to save is this one, you create it using the following command:
print -depsc2 ps05_init.eps
But do try to think of a more student-identifying name for the plot; the above command will save the current plot (most recently active graphics window) as a color, encapsulated PostScript (.eps) file named ps05_init.eps.
Look in your work space, you'll see that you have the three images as matrices; look in the ps05_init.m code, you'll see that the binary values have been converted to SST's using one of the programs we gave you above. Additionally ps05_init.m also converted all of the -3oC data points to Nan's, or missing values.
b). Now, the next step is to do trend surface analysis on your data and remove any trend surface to create the "stationary" data fields that you will extract the semivariances from and, later, objectively analyze. But before you go any farther we suggest you remove the pixels representing Bermuda from the images and store the indices of these pixels (so you can overlay Bermuda back onto the images when you're done), try using MatLab's find.
Because the semivariances took too long to generate we have to give you a number of pieces of information (more about this in the next part of the problem). Before you can fit a trend surface to the images you will need the data in tuplet form (x,y,z groups) and since we had to choose the longitude and latitude grid to generate the semivariances we give the code of how to generate the x,y grid here.
for ilat=1:ncols
for ilon=1:nrows
Alon((ilat-1)*114+ilon)=-69+(ilon-1)*(10/(nrows-1));
Alat((ilat-1)*114+ilon)=37-(ilat-1)*(10/(ncols-1));
end
end
Note that these "coordinates" are in degrees longitude and latitude, just a quick word about this. If the data coordinates were in different units then we would need to standardize these coordinates (otherwise km in the horizontal and m in the vertical produce very flatten out grids). But because we are looking at a small piece of the world's surface where one degree in latitude is approximately equal to one degree of longitude we can get away with using longitude/latitude for coordinates (and it makes making maps a little easier since we can use the coordinates directly without having to convert them from some standardized coordinate system).
The next step will be to vectorize the image data so that it can be combined with the longitude and latitude data, unroll the data matrix using:
X1temp=X1sst(:);
etc. We need to assemble a data matrix that we can do something with. The number of rows in this matrix must be the same in each column. It is a simple matter to reattach the longitudes and latitudes to the "unrolled" SSTs. It might look something like:
X1temp=[Alon' Alat' X1temp];
this produces an n by 3 matrix of tuplets. Of course you'll need to do a similar thing for the other two images.
You may think that you are now ready to "fit" a trend surface to the data using surfit.m, but there's one more thing to consider, Nan's are sticky and if you pass Nan's to surfit.m you'll get all Nan's back, so you need to remove them first. Fortunately for you we also provided you with a m-file function excise.m that will remove only those rows with Nan's.
Here's where we tell you we decided that using a first order trend surface was adequate for rendering the residuals into a near stationary state and used those residuals to compute the semivariances. If you want to use another trend surface, say a quadratic, you'll have to recompute the semivariances as they are dependent upon the the residuals, which are dependent on the trend surface removed. This is a long-winded way of saying, compute a first order trend surface for each image tuplet set and create a trend surface and residual image for each day. Save the trend and detrended tuplets, you'll need them later.
Make a 3x3 plot of the original images, their trend surfaces, and the resultant residual fields. Save this plot in your second PostScript (.eps) file. Plotting them as images will save disk space greatly, the above plot if stored as individual plotted points is approx. 38MB in size, whereas the corresponding image-based plot is only a couple hundred KB.
c). In this part you will need to load the pre-computed semivariances we gave you. In this single file you will find the following variables: S11x, S11y, S22x, S22y, S33x, S33y, S12x, S12x, S13x, S13y, S23x, S23y. The "11" refers to the cross-semivariance of the first image upon the first, in other words the ordinary semivariance. The "23" in turn refers to the cross-semivariances between the second and third image. The distance between the grid points (i.e. the lags) can be computed from the information given about the images; this information was used in generating the semivariances.
As we mentioned earlier we chose a first order trend surface as being adequate for removing any large scale trend in the data and rendering the data nearly stationary. We could get a statistically significant improvement in fit by going to a second order trend surface, but the improvement in attaining stationarity was minimal. This is due to the fact that the mesoscale variability scales correspond to relatively high order trend surfaces and it is not feasible to remove them in this fashion. One could obtain a better objective interpolation by using a low-pass filter instead of a trend surface, but we think a trend surface makes the point and is doable in a time period shorter than the life of the universe (you do want to finish this problem set before you defend your thesis, nicht wahr?). Additionally we chose a grid that was equivalent to the original grid of pixels. These are "9km" images and 10o yields an image that is 114 by 114 (12,996 potential grid points). We don't recommend that you create a grid with more than twice as many points as data points you start with and in this case each image has more than half of its pixels with valid data.
Now plot the empirical semivariances versus their lags, we suggest a 3x2 subplot arrangement, hold each of these subplots for later over plotting with the fitted thoretical semivariograms. Make your m-file plot these semivariograms using commands modeled after:
subplot(331);plot(lags,S11x,'r+',lags,S11y,'bo');
Examine these scatter plots and decide on the following: is there any evidence for anisotropy? are there any embedded periodic structures? if there are, decide how many of the empirical semivariances you will pass to your fitting routine (you don't necessarily have to use them all), which theoretical semivariogram would fit well these empirical estimates? is there any evidence for a nugget? If you decide to include the nugget effect in your theoretical semivariograms be sure to use the reduced sill, which is nothing more than the sill minus the nugget. An example of the exponential model with a reduced sill approach is given here:

or one could use the spherical model, given by:

Where co is the nugget, c the sill, and a the range.
Now fit your choice to the empirical semivariograms (using nlleasqr.m) and report the fit parameters and their uncertainties (the following part will generate a lot of output to the screen, so write these values out to a file). Plot these fit models on top of the empirical points in the 3x2 plot and save this as your third PostScript (.eps) file.
d). Now we are ready to cokrige. You may, by now, be wondering, "what the heck is this cokriging"? As you recall from lecture, kriging is a gridding technique that allows the user to determine the best, unbiased, weighted, linear combination of data points at each grid node estimation. Best of all, kriging also gives you an estimate of the uncertainty for each grid node calculation. If kriging is done to one variable (property) at a time, cokriging is done to p properties in one execution. It gains a lot in the exploitation of the covariance information between variables, that is to say from the cross-semivariance information.
The "clever" part about this assignment is that by treating each image as just another property, we can cokrige all three images at once. To do this you will need to combine all of the data, along with the grid, into one n by d+p matrix (n=number of data points, d=number of dimensions in the grid, and p=number of properties). Unlike surfit.m, cokri.m can handle Nan's as missing values and you will need to put together all of the data with Nan's put back in so each column has the same number of rows. The data goes into a variable called x.
Now you must generate the coordinates of the grid points you will be cokriging to, these will be referred to as your m grid points and they will be stored in a variable called x0 (in this case n = m).
There are two more variables that need to be set that are of critical importance in the cokriging program: model and c. As you might expect model describes to the program the semivariogram models you have settled on. Each row of model has some of the parameters for the model, in cokri.m it is possible to have embedded models (i.e. a nugget model can be embedded in a linear model). The variable c contains the rest of the story for the semivariogram model, if r is the number of models declared in model, then c will contain r, p by p matrices containing the information to be combined with that model. If a nugget model is used first, then the first p by p submatrix in c should contain the nuggets. Read the help information at the header of cokri.m (>>help cokrig), our online notes about the program at Lecture 5, section 5, and we suggest you read over Marcotte (1991). Report here your model and c variables.
There are a few more variables to set and for consistency's sake we suggest you use the following values:
itype=1; avg=[mean(Xdt(~isnan(Xdt(:,3)),3)) mean(Xdt(~isnan(Xdt(:,4)),4)) mean(Xdt(~isnan(Xdt(:,5)),5))]; block=[1 1]; nd=[1 1]; ival=0; nk=40; % number of nearest points to use rad=1; % search radius
ntok=1;
although we will be willing to hear arguments as to why different values should be used (Xdt was our detrended master variable, x). Note that the meaning of the variable avg was left out of the explanation in cokri.m and we find it is necessary to set it, if you go into the code and root around you'll see that it is essential in cokri2.m.
Now, cokrige the data. WARNING: it may take up to 90 mins to cokrige your data, so leave time (Sun UltraSPARC 300MHz: 90 min, WinME 1Ghz Pentium IV: 22.75 min, 2 GHz LinTel: 10.5 min). Make another 3x3 plot of the original images, the objectively analyzed results in x0s (don't forget to add the trend surface back in, don't forget Bermuda either!), and the error fields. Notice how the error fields (s) compare to distribution of missing data in the original images. Make this plot your fourth and final PostScript (.eps) file.