% Problem Set #3 for 12.747 Fall 2000 % % This answer is a combination of Tim Kenna and David Glover's answers % % Started 1998:10:02 TCK % Modif'd 2000:10:27 DMG % Modif'd 2000:11:08 DMG disp('Part a:'); disp('After loading the data in, we sort it by depth and create'); disp('variable Xupp and Xlwr. Note that both variables have the lat., long.,'); disp('and depth stripped off. The reason why we strip off this information'); disp('is that eigen analysis of any data matrix is sensitive to patterns'); disp('and regularities that may be present. This will almost certainly be'); disp('true for depth as most oceanographic data exhibit a strong depth'); disp('dependence due to vertical stratification of the water column. In a'); disp('similar fashion, latitude and to perhaps a lesser extent longitude'); disp('will also exhibit trends due to changing water masses etc. We will'); disp('now do priciple component analysis using pca2. We will use the 1 option'); disp('to tell pca2 to standardize the data before analysis for two reasons,'); disp('because the data are in different units, and we are asked to make'); disp('judgements about eigenvalue contributions to total variance and the'); disp('communalities of the extracted factors. This is made much easier'); disp('by standardizing the data first. As asked, I will display the'); disp('eigenvectors, the eigenvalue summary tables and the factor matrices'); disp('for both data subsets Xupp and Xlwr.'); disp(' '); load satl.dat; disp('Xupp=satl(satl(:,3)<=1000,4:10);') Xupp=satl(satl(:,3)<=1000,4:10); disp('Xlwr=satl(satl(:,3)>1000,4:10);') Xlwr=satl(satl(:,3)>1000,4:10); disp('');pause [U1,SUM1,AR1,Sr1]=pca2(Xupp,1); [U2,SUM2,AR2,Sr2]=pca2(Xlwr,1); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% disp(' '); disp('Part b:'); disp('[U1,SUM1,AR1,Sr1]=pca2(Xupp,1);'); U1 SUM1 AR1 disp('[U2,SUM2,AR2,Sr2]=pca2(Xlwr,1);'); U2 SUM2 AR2 disp(' '); disp('Above is the information requested for the two data sets. The U'); disp('SUM, and AR matrices are the eigen vectors, eigenvalues summaries,'); disp('and factor matrix respectively. '); pause disp(' '); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% disp('Part c:'); disp('If we define the communalities as the sums of the of the squares of '); disp('the factor loading matrix for each variable, the totals are the'); disp('amount of variance each variable retained in the factors. At this');; disp('point if we have done everything correctly, they should all equal');; disp('one. In PCA, we have removed no data and thus lost no variance. One'); disp('thing to note here about MATLAB is that the "sum" command sums down'); disp('the column unless you tell it something different. What I did, was'); disp('use the command:'); disp(' '); disp('Communalities_both=[sum(AR1''.^2)'' sum(AR2''.^2)'']'); disp(' '); disp('This command first transposes the factor matrix, then squares each'); disp('individual element, sums the columns which used to be the rows'); disp('and finally displays the communalities for both data sets. '); pause disp(' '); Communalities_both=[sum(AR1'.^2)' sum(AR2'.^2)'] pause disp(' '); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% disp('Part d:'); disp('We must now decide how many of these factors are really'); disp('necessary to characterize the data set for the problem at hand.'); disp('First let''s look at SUM1 again to examine the eigenvalues and see'); disp('how much variance each one accounts for. The structure of the summary'); disp('tables is as follows: the first column elements are the eigenvalues,'); disp('the second column elements are the amount of the total variance each'); disp('one accounts for and the third column elements are the cumulative'); disp('variance. '); pause disp(' '); SUM1 disp('Inspection of SUM1 reveals that the first three factors account for'); disp('over 92% of the variance. You might be tempted to reduce your number'); disp('of factors to three based on this, but it is the communalities that'); disp('describe the efficiency each factor has in accounting for variance on'); disp('a variable by variable basis. I display below a table called CC1'); disp('which are the cumulative communalities (it is a looping program, look'); disp('at the m-file if you want to see how I did it). '); pause disp(' '); for i=1:7 for j=1:7 CC1(i,j)=sum(AR1(i,1:j).^2); end end CC1 disp('Each column in CC1 is the cumulative communality of each factor for'); disp('each property. Had we elected to keep only three factors, our'); disp('communalities would be found in the third column. The communality'); disp('for SiO2 is weaker (67.55%) than the others. A quick look at'); disp('column 4 reveals the improved communality (in SiO2 and salinity) so I'); disp('choose to keep 4 factors, for the moment. Now, let''s look at the deep'); disp('water. '); pause disp(' '); SUM2 disp('For this data set, the first two factors account for just over 81%.'); disp('Looking at the third eigenvalue, which is close to 1, it might be'); disp('easier in the case of the deep water to just choose 3 factors, but'); disp('let''s create CC2 and see what the communalities tell us. '); pause disp(' '); for i=1:7 for j=1:7 CC2(i,j)=sum(AR2(i,1:j).^2); end end CC2 disp('At three factors salinity is still weak (85.09%). A glance at the'); disp('forth factor seems to rectify this problem, so for the time being'); disp('I''ll keep four factors for the deep waters as well.'); pause disp(' '); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% disp('Part e:'); disp('Using varimax.m to perform a varimax rotation on the kept'); disp('factors for each data subset, we expect to obtain'); disp('a change of loadings on each factor. Hopefully, the'); disp('rotation will give us better separation of loadings for specific'); disp('variables. The communalities, however, will not change as the'); disp('total amount of variance accounted for is dictated by the number of'); disp('factors kept. Note: I have already assigned lding before running'); disp('varimax, so all you have to do is type "return" and follow the'); disp('varimax prompts. '); pause disp(' '); lding=AR1(:,1:4) disp('Above is the lding variable for the shallow water varimax rotation.'); disp('Before we do the rotation let''s examine the factor loadings.'); disp('We see that most properties are heavily loaded on factor 1 (some'); disp('negatively, some positively), and none of the factors, to any great'); disp('extent, are loaded onto factor 4. This is a good case for rotation,'); disp('By doing the varimax rotation, we hope to see a better separation of'); disp('factor loadings. Remember to just type "return" at the K>> prompt.'); disp(''); pause disp(' '); varimax S1=lding'*lding; B1=lding*inv(S1); SR1=standardiz(Xupp)*B1; Upper=[satl(satl(:,3)<=1000,1:3) SR1]; %pause disp(' '); disp('As expected, the communalities don''t change. Note the first column'); disp('are the communalities before rotation, and the second column is after'); disp('rotation. '); Communalities_Shallow=[CC1(:,4) hjsq] pause disp(' '); Rotated_Factor_Loadings_Shallow=lding disp('After the rotation, we can observe a better separation of loadings.'); disp('Notice that temperature and salinity are strongly loaded onto'); disp('factor one in a negative sense, the nutrients PO4 and NO3 are'); disp('still loaded onto the first factor, but some strength of the'); disp('loading has decreased from the original, unroatated, factor'); disp('loadings. Additionally, O2 has moved off strongly onto'); disp('factor two, NO2 onto factor three, and SiO2 onto factor four.'); disp('Interpretation will be given at the end of the answer. '); pause disp(' '); lding=AR2(:,1:4) disp('Above is the lding variable for the deep water before varimax'); disp('rotation. Again we see most properties, with the exception of NO2 and'); disp('perhaps T and O2, are loaded onto factor 1. Factor 2 has T'); disp('(and O2? SiO2?), Factor 3 has NO2, and again there doesn''t seem'); disp('to be anything loaded onto Factor 4. When prompted by varimax,'); disp('just type "return". '); pause disp(' '); varimax S2=lding'*lding; B2=lding*inv(S2); SR2=standardiz(Xlwr)*B2; Lower=[satl(satl(:,3)>1000,1:3) SR2]; %pause disp(' '); disp('Here are the communalities for the deep water before and after'); disp('rotation; again these do not change. '); Communalities_Deep=[CC2(:,4) hjsq] pause disp(' '); Rotated_Factor_Loadings_Deep=lding disp('After the rotation, we observe several things. PO4, NO3,'); disp('O2 and salinity are all pretty heavily loaded onto Factor 1.'); disp('Temperature and SiO2 are heavily loaded onto Factor 2 (note that'); disp('they are inversely correlated). And NO2 is loaded heavily onto'); disp('Factor 3, but there are still no variables loaded onto Factor 4'); disp('to any great extent, why? Interpretation has been put off to the'); disp('end, but consider the following plot: '); figure; subplot(121);plot(Xupp(:,5),Xupp(:,6),'+');axis('square'); xlabel('NO3');ylabel('NO2');title('Upper 1000m') subplot(122);plot(Xlwr(:,5),Xlwr(:,6),'+');axis('square'); xlabel('NO3');ylabel('NO2');title('Below 1000m') pause disp(' '); disp('Knowing that the detection limit of most NO2 measurements is'); disp('somewhere around 0.01 umol/kg we might conclude that the NO2'); disp('data in this data set at depth is something of a Red Herring.'); disp('The measurements are so close to the detection limit (or below)'); disp('that the measurements really carry no useful information (they'); disp('only got one of three possible values, compare this with the NO2'); disp('values in the upper 1000m). So, we could just decide that three'); disp('factors is all we need at depth OR we could rerun the analysis'); disp('without the NO2 data. Let''s do both! '); pause close all; disp(' '); lding=AR2(:,1:3) disp('Above is the lding variable for the deep water before varimax'); disp('rotation. Everything is the same as above except there are'); disp('only three factors kept this time. '); pause disp(' '); varimax S2=lding'*lding; B2=lding*inv(S2); SR2=standardiz(Xlwr)*B2; Lower=[satl(satl(:,3)>1000,1:3) SR2]; %pause disp(' '); disp('Here are the communalities for the deep water before and after'); disp('rotation; again these do not change, but note that'); disp('there''s only 85.09% of the salinity variance accounted for with'); disp('three factors. '); Communalities_Deep=[CC2(:,3) hjsq] pause disp(' '); Rotated_Factor_Loadings_Deep=lding disp('Now we see that salinity, O2, PO4 and NO3 all load well onto the'); disp('first factor, while temperature and SiO2 are on the second and'); disp('NO2 is on the third. The separation is pretty good, but could we'); disp('have done better if we hadn''t dragged that Red Herring through'); disp('our analysis? Let''s rerun the analysis without NO2. '); pause disp(' '); disp('First let''s make a new variable Xlwr2: '); pause disp(' '); disp('Xlwr2=satl(satl(:,3)>1000,[4:8,10]);') Xlwr2=satl(satl(:,3)>1000,[4:8,10]); disp('And then rerun PCA2 on it and look at the summary table: '); pause disp(' '); disp('[U22,SUM22,AR22,Sr22]=pca2(Xlwr2,1);'); [U22,SUM22,AR22,Sr22]=pca2(Xlwr2,1); SUM22 disp('From this it looks as though only two factors might be'); disp('necessary, but let''s check the communalities: '); pause disp(' '); for i=1:6 for j=1:6 CC22(i,j)=sum(AR22(i,1:j).^2); end end CC22 disp('Again, in order to pick up the variance in the salinity we need'); disp('to go to one more factor. So, for the time being, I will choose'); disp('three factors. '); pause disp(' '); lding=AR22(:,1:3) disp('These are the loadings before factor rotation. Again we have'); disp('most of the variables loaded onto the first factor and not much'); disp('of anything loaded onto the third factor. So let''s VARIMAX'); disp('rotate. '); pause disp(' '); varimax S22=lding'*lding; B22=lding*inv(S22); SR22=standardiz(Xlwr2)*B22; Lower2=[satl(satl(:,3)>1000,1:3) SR22]; %pause disp(' '); Rotated_Factor_Loadings_Deep2=lding; disp('These are the rotated factor loadings for the deep waters and'); disp('EEK! It looks worse! While we have a nice separation of O2 onto'); disp('the first factor, temperature and SiO2 onto the second factor,'); disp('and salinity loaded onto the third; PO4 and NO3 seem to be split'); disp('between the first and third factor. Maybe we should have stayed'); disp('with the Red Herring, but wait and see the interpretation'); disp('section at the end. '); pause disp(' '); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% disp('Part f:'); disp('While I was doing the varimax rotation I was using the rotated factor'); disp('loading to calculate the rotated factor scores (see the code). This'); disp('allowed to me to reuse the variable lding. I now present here the'); disp('first 10 rows of the Upper and Lower rotated factor scores with their'); disp('Longitude, Latitude and Depths reattached. '); pause disp(' '); format short g First_10_Upper=Upper(1:10,:) disp(' '); disp('Hit return to see the lower water column factor scores'); disp(' '); pause First_10_Lower=Lower(1:10,:) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% disp('Part g:') disp('Now, we can make comparison of factor/factor plots with'); disp('property/property plots. I originally plotted all the factors against'); disp('one another as well as all properties against one another. No, the'); disp('results are not a Rorschach test, but there are an awful lot of plots'); disp('to look at. I think the main point to get from the plots is the'); disp('relationship between properties and factors. I think for the purposes'); disp('of this excercise, comparing just a few plots will give you the'); disp('idea. Let''s look at the shallow water first. '); pause disp(' '); format RFLS=Rotated_Factor_Loadings_Shallow; RFLS(abs(RFLS)<=0.5)=0 disp('Now, what I have done here is a simple trick that is often used'); disp('in displaying rotated factor loadings. I have simply set any'); disp('factor loading that accounts for less than 25% (you can choose a'); disp('smaller or larger amount depending on your problem) of the'); disp('variance of that particular variable to zero. This makes the'); disp('rotated factor loadings table a little bit more easy to'); disp('interpret.'); disp(' '); disp('What is seen here is that temperature, salinity, PO4 and NO3'); disp('load onto the first factor while factor 2 has O2, factor 3 has'); disp('NO2 and factor 4 has SiO2. From this we would expect the'); disp('variables loaded onto the same factor to produce'); disp('property-property plots similar to, say, Factor 1 vs. Factor 1'); disp('plot AND property-property plots from different factors to look'); disp('different. Observe Figure 1. '); disp(' '); figure(1); subplot(221);plot(Upper(:,4),Upper(:,4),'+'); xlabel('Factor 1');ylabel('Factor 1'); subplot(222);plot(Xupp(:,1),Xupp(:,2),'+'); xlabel('Temperature');ylabel('Salinity'); subplot(223);plot(Xupp(:,4),Xupp(:,5),'+'); xlabel('Phosphate');ylabel('Nitrate'); subplot(224);plot(Xupp(:,4),Xupp(:,6),'+'); xlabel('Phosphate');ylabel('Nitrite'); %pause; disp('Notice they don''t always look the same. This is because we HAVE'); disp('thrown away some of the variance in the data by subselecting the'); disp('number of factors (compare subplot(221) with subplot(222).'); disp(' '); pause; disp('Now let''s look at different factors and their corresponding'); disp('property-property plots. See Figure 2. '); disp(' '); figure(2); subplot(221);plot(Upper(:,4),Upper(:,5),'+'); xlabel('Factor 1');ylabel('Factor 2'); subplot(222);plot(Xupp(:,1),Xupp(:,3),'+'); xlabel('Temperature');ylabel('Oxygen'); subplot(223);plot(Upper(:,4),Upper(:,7),'+'); xlabel('Factor 1');ylabel('Factor 4'); subplot(224);plot(Xupp(:,1),Xupp(:,7),'+'); xlabel('Temperature');ylabel('Silica'); pause disp('Now for the deep waters, let''s look at the rotated factor'); disp('loading the same way we did for the surface waters. '); disp(' '); pause RFLD=Rotated_Factor_Loadings_Deep; RFLD(abs(RFLD)<=0.5)=0 disp('Now we see that salinity, O2, PO4, and NO3 load together on the'); disp('first factor, while Factor 2 has temperature and SiO2 (note the'); disp('opposite sign), and Factor 3 has the Red Herring (NO2). Let''s'); disp('make some scatter plots. See Figure 3.'); disp(' '); figure(3); subplot(321);plot(Lower(:,4),Lower(:,4),'+'); xlabel('Factor 1');ylabel('Factor 1'); subplot(322);plot(Xlwr(:,2),Xlwr(:,5),'+'); xlabel('Salinity');ylabel('Nitrate'); subplot(323);plot(Lower(:,5),Lower(:,6),'+'); xlabel('Factor 2');ylabel('Factor 3'); subplot(324);plot(Xlwr(:,1),Xlwr(:,6),'+'); xlabel('Temperature');ylabel('Nitrite'); subplot(325);plot(Lower(:,4),Lower(:,5),'+'); xlabel('Factor 1');ylabel('Factor 2'); subplot(326);plot(Xlwr(:,1),Xlwr(:,2),'+'); xlabel('Temperature');ylabel('Salinity'); pause disp(' '); disp('Now suppose we look at the plots from the factor rotation done'); disp('without NO2. Again we start with the simplified factor'); disp('loadings.'); RFLD=Rotated_Factor_Loadings_Deep2; RFLD(abs(RFLD)<=0.5)=0 disp('We have a nice separation of O2 onto the first factor,'); disp('temperature and SiO2 onto the second factor, and salinity loaded'); disp('onto the third; and PO4 and NO3 seem to be split between the'); disp('first and third factor. What do the scatter plots reveal?'); disp('See Figure 4. '); figure(4); subplot(321);plot(Lower2(:,4),Lower2(:,4),'+'); xlabel('Factor 1');ylabel('Factor 1'); subplot(322);plot(Xlwr(:,3),Xlwr(:,5),'+'); xlabel('Oxygen');ylabel('Nitrate'); subplot(323);plot(Lower2(:,5),Lower2(:,6),'+'); xlabel('Factor 2');ylabel('Factor 3'); subplot(324);plot(Xlwr(:,1),Xlwr(:,2),'+'); xlabel('Temperature');ylabel('Salinity'); subplot(325);plot(Lower(:,4),Lower(:,5),'+'); xlabel('Factor 1');ylabel('Factor 2'); subplot(326);plot(Xlwr(:,3),Xlwr(:,7),'+'); xlabel('Oxygen');ylabel('Silica'); pause disp(' '); disp('Looking at Figures 3 and 4 tells us something about the'); disp('appropriateness of the data set analyzed this way, see'); disp('interpretations. '); pause disp(' '); disp('********** End of Problem Set 3 **********'); disp(' '); disp('Here begins a brief interpretation of the results. THIS IS NOT A'); disp('REQUIRED PART OF THE PROBLEM SET, but many of you have wondered:'); disp('what the heck happened? '); pause disp(' '); disp('Well, let''s begin with the surface waters. If we look again at'); disp('the rotated factor loadings (for four factors) we can see an'); disp('interpretable pattern. '); RFLS pause disp(' '); disp('Clearly temperature, salinity, PO4 and NO3 follow each other on'); disp('factor 1 and can be interpreted as the close connection between'); disp('thermohaline effects and biogeochemical recycling, but O2, NO2'); disp('and SiO2 load onto factors 2, 3 and 4 respectively. Why? '); disp(' '); pause; disp('One of the main assumptions of factor analysis is that the data'); disp('do not represent the mixture of two or more separate populations'); disp('(upon which the underlying processes, read factors, may be'); disp('acting at different rates). According to Stramma and England'); disp('("On the water masses and mean circulation of the South Atlantic'); disp('Ocean", JGR, 104(C9): 20,863-20,883) the upper 1000 m of South'); disp('Atlantic waters are composed most of SACW (South Atlantic'); disp('Central Water). But there''s this little problem of of the AAIW'); disp('(Antarctic Intermediate Water), whose main mass is centered at'); disp('about 800 m and has a subsurface salinity minimum and an O2'); disp('maximum. Why doesn''t salinity and O2 load together? because the'); disp('O2 max. is slightly above the salinity min. and we only got part'); disp('of the AAIW in our "surface" waters. The situation for SiO2 is'); disp('similarly complicated by different water masses. NO2 off by'); disp('itself may represent the process of nitrification. '); disp(' '); pause disp('In the deep waters the problem is even worse, look at the'); disp('rotated factor loadings once again:'); pause; RFLD disp('From a 1000 m downwards we have AAIW (Antarctic Intermediate'); disp('Waters), CDW (Circumpolar Deep Waters), NADW (North Atlantic'); disp('Deep Waters) and AABW (Antarctice Bottom Waters). Even though we'); disp('have violated the mixing of populations rule, some patterns'); disp('still stand out. O2 and PO4 and NO3 load inversely (more or'); disp('less) onto factor 1, this is likely to be due to the inverse'); disp('relationship between the consumption of O2 while organic matter'); disp('is being remineralized. Temperature and SiO2 load (inversely)'); disp('onto factor 2, this is undoubtable due to the inverse'); disp('relationship between temperature and the dissolution of silica.'); disp('The loadings on factor 3 are a little more complicated to'); disp('interpret due to the mixtures of water masses. '); disp(' '); pause; disp('What should we have done? Prior to doing the factor analysis'); disp('(R-mode) we should have carried out a Q-mode (cluster) analysis'); disp('to identify the water masses. Separating the data into, say,'); disp('five groups (SACW, AAIW, CDW, NADW, AABW) and then performing'); disp('separate factor analysis might have resulted in a more'); disp('straightforward interpretation of the results.'); disp(' '); disp('END END END END END END END END END END END END END END END');