Optimal Interpolation

We used optimal interpolation to merge the two data sets. This technique requires a background field, which was created from the AOA and WOA data fields as described in the paper: 3.2 Specifics. Each 1X1 degree grid point was assigned an associated background value.

Qualitatively, the Optimal Interpolation routine (oi) works as follows:
At each 1X1 degree grid point where oi wants to estimate a value, a number of surrounding data points are sampled. Based on the distance between these points, their distances from the grid point, and their error values, each sampled data point is assigned an "alpha" weight. The background field at each data point is then subtracted from each data value. These new values are multiplied by their alpha weights and then added together (in effect providing a "weighted average" of the deviation of each data point from its assosciated background value). This value plus the background field at the grid point in question is the final value calculated by the oi routine.

The Parameters of OI:

1. Number of Points used in the Calculation (n):
The oi routine uses only the n closest data points when calculating a value for a grid point.
2. The Radius of Influence (md):
The oi routine only searches for data points within a radius of md kilometers. If it is not able to find n points within this limit, it will use only the points it was able to find. Hence, some grid points may be calculated from fewer than n of its neighbors if they are too far away.
3. Correlation Length (c):
The oi routine uses gaussians to determine the influence a data point will have based on the distance from other data points, and the distance to the grid point in question. The gaussian that determines relative influence based on the distances between the data points looks like this:

r(k,j) = exp -[d(k,j)^2/c^2]

where k and j represent two different data points. d(k,j) is the distance between these points, and c is the correlation length. The resulting r(k,j) is a value between 0 and 1. r(k,k) will equal 1, since d(k,k)=0 (the distance from a point to itself). Similarly, as c gets large, the r(k,j) values will again approach 1. The greater this value, the more influence it has, so we see that as c increases, points seperated by greater distances will contribute more and more in the overall weighting scheme.

The gaussian function that determines the influence of each data point based on it's distance from the grid point is:

s(j) = exp -[d(j)^2/c^2]

where d(j) is now the distance from the jth data point to the grid point. So to calculate one grid point value, the oi routine creates an nXn matrix of r(k,j) values, and an nX1 column matrix of s(j) values. It uses these values, and the error associated with each data point to calculate an alpha weight for each data point.

4. The error:
Each data point used in the oi routine is assigned an error value. This value reflects how well the point is trusted, and is proportional to the variance of the combined WOA and AOA data field. A high error value results in a lower alpha weight, causing the associated data point's overall influence in the calculation to be reduced.

The Formulation of OI:

The oi matrix equation used to calculate the alpha weights is set up as follows:

(R + EI)A = S

R is the nXn matrix of the r(k,j) values (distance of points to each other)
E is a 1Xn column of e(j) error values
I is an nXn identity matrix
A is the 1Xn column of alpha weights
S is the 1xn column of s(j) values (distance of each point to the grid point where value is to be calculated)

Optimal Interpolation IDL Code

The code described below doesn't include the entire oi program but only the actual optimal interpolation part of the code. It is meant to be of assistance for those who wish to implement the technique. Prior to this, the data first had to be synthesized in a number of ways, which will be largely personalized depending on your data and/or artistic whim:

;The oi part of the routine begins with 2 FOR statements, to
;cause oi to cycle through all the grid points:

    FOR i=0,lat do begin    ;(where lat goes from 89.5 to -89.5)
    FOR ii=0,lon do begin   ;(where lon goes from .5 to 359.5)
         ;HERE IS FIND THE 20 CLOSEST DATA POINTS (if possible)
         ;TO THE lat(i),lon(ii) GRID POINT, WHICH IS THE POINT WHERE OI 
         ;THE VARIABLE: 
                data=fltarr(4,nuse)  --> lat, lon, value, background
      .                              (where nuse is the # of data pts found)
;nuse= # of data points. Usually, nuse=20. Sometimes there aren't 20 points 
;close enough, though, so nuse can be less.

r=fltarr(nuse,nuse)   ;The distance from each data pt to every other 
s=fltarr(nuse)        ;The distance from each point to the grid point
alpha=fltarr(nuse)    ;The final weight assigned to the data point

    for j=0,nuse-1 do begin
	for k=0,nuse-1 do begin

             ;new_distLL is a function which finds the distance between
             ;points. It take arguments: new_distLL(lat1,lon1,lat2,lon2).
	     r(k,j)=exp( -( (d_n)^2/(c^2)))  ;c is the correlation length,
	                                     ;defined at the begining of the
	                                     ;program (not shown).
	r(j,j)=r(j,j)+error(2,j)  ;r(j,j)=1 because r(j,j)=exp(0).
	                          ;"error" contains the the error associated
	                          ;        with each data point in the 
	                          ;        calculation. It's a 3 by nuse array:
	                          ;        [latitude, longitude, error]
	rinv=invert(r,status)     ;The equation we need to solve for alpha is
	                          ;[R][alpha]=[S], so we need to invert R 
	                          ;(which is r+error, now). 
	if status ne 0 then begin ;This keeps tabs on whether the matricies
	b=b+1                     ;are ill-conditioned. b will stay 0 if
	endif                     ;all goes well.

; for every grid point, calculate the weighting coefficients (alpha), 
; and the expectation value (p_hat).

	for m=0,nuse-1 do begin
            alpha(m)=total(rinv(*,m)*s)  ;each alpha value is found one at a time
                                         ;with the appropriate matrix mult.

            ;Weighted sum of expectation deviation from the background (bg). 
            ;data(3,m) is the bg value. 
            p_hat=p_hat + alpha(m)*(data(2,m)-data(3,m))

endif else p_hat=0   ;This endif refers to a previous condition not included 
                     ;here that checked to see if nuse=0. If no data points are
                     ;close enough to the grid point, the final value will just
                     ;be the background value.
	p(0,i*360.+ii)=la(i)                 ;i and ii index the lat and lon
	p(1,i*360.+ii)=lo(ii)                          grid points. 
	p(2,i*360.+ii)=p_hat+bg(2,i*360.+ii) ;<-- weighted average with the bg
	                                          added back in.
endfor  ;ends the ii (longitude) loop
endfor  ;ends the  i (latitude) loop

        output=filepath+prefix+tag+depth+'.dat'  ;prefix,tag,depth are all
                                                 ;data specific identifiers,
                                                 ;like: summer.temp.00, or 
                                                 ;some such. 
        printf,file,p & FREE_LUN,file	


Return to Contents