

The TRMM satellite will be running out of fuel in the next couple of years. In order to build on its successes, NASA and JAXA have decided to launch a pair of followon satellites starting as early as 2013, and architect them along with other platforms carrying precipitationsensitive radiometers into a constellation of satellites that can detect, estimate and monitor precipitation globally. Dubbed the Global Precipitation Measurement (GPM) mission, its main objectives are
Our task is to make the best use of the core satellite's measurements to produce
estimates of the instantaneous rain field (at least means and covariances, and
possibly additional details about the nature of the uncertainties) within the field
of view of the core satellite's instruments. The latter will include a TMIlike
radiometer ("GMI") with a few higherfrequency channels, a TRMMlike 14GHz radar, and a new 35GHz radar
whose beams match the 14 GHz radar.
Since the swaths of the different instruments will be different, in order to "make the best use of the measurements" one must consider 3 portions of the largest swath (the GMI's):
The flow chart of the reference algorithm and a brief summary are provided at the
end of the discussion.
(or, should one start with radiances then incorporate the radar, or the other way around?) The simplest way to determine how best to estimate means and covariances from a combination of instruments while retaining mathematical rigor is to use the Bayesian context. Specifically, let's lump together all the rain variables that we want to estimate within one radar beam into a single vector called R, consisting of the rain mixingratio profiles, some information about Drop Size Distribution (DSD) profiles such as the mean drop sizes, and eventually some information about cloud liquid water. Let us also assume for now that we will have a reasonable estimate of the freezing level so that one will know when to interpret "rain" as snow, "DSD" as frozen hydrometeor size distribution, and "cloud liquid water" as cloud ice (though I seriously doubt that we will be able to say anything about the latter with any confidence). Let's similarly lump together all the measured radar reflectivities within a given radar beam into a single vector called Z (so, in case A, Z consists of 14GHz and 35GHz reflectivity profiles and the two PathIntegrated Attenuations PIA's, whether derived by reference to the surface over the ocean or through ancillary global soil moisture and roughness measurements over land; Z consists only of the 14GHz reflectivities and PIA in case B; and it is empty in case C). And let's call T the vector of measured brightness temperatures (allow me to be vague for now about what exactly is in T, though it should definitely contains the STAR radiances if they are available). In this notation, the challenge is to apply Bayes's rule to calculate the probability density function One way is the "radiometer first" approach and goes as follows:
(where "p(...... )" is shorthand for the probability density function of whatever is to the left of the "" conditioned on (i.e. given) whatever is to the right of it, and where, in the last equation, G represents the 0mean Gaussian (or lognormal, where more appropriate) distribution, with the covariance matrix as a subscript). The first equation is obtained by "leaving T in the background" and applying Bayes's rule to R and Z (that gets the reflectivities out of the way, i.e. to be incorporated last, and concentrates on getting some info about R from T first); to go from the first line to the second, apply Bayes's rule to the second factor on the right. The final expression, read from right to left, says the following: start with an a priori database of the allowable R's, compare the radiances associated to each one with the measured radiances (t(R) is shorthand for the forwardradiativetransfer calculated brightness temperatures to be associated with a candidate scenario R), then compare the reflectivities associated to each R with the measured Z (in the first term, z(R;T) represents the reflectivities calculated for each R with the help of T to whatever extent the radiances are useful in calculating the reflectivities). One might be able to simplify () by combining the first two factors, though great care must be taken in doing so as Z and T are not independent (if for no other reason than the fact that the reflectivities and the lowerfrequency radiances are strongly correlated with the PIA's). As the 3 terms in the final equation indicate, for this approach to work one must figure out
The other approach to calculate Q (the "radar first" approach) goes as follows:
This time, the first equation is obtained by "leaving Z in the background" and applying Bayes's rule to R and T (this gets the radiances out of the way, to be incorporated last, and concentrates on extracting all the information that Z carries about R first); the second line is identical to the first except the two terms have been interchanged (so that it will be easier to compare the two approaches). The final expression, read from left to right, says the following: start by inverting the reflectivities Z into as much information as possible about R, or more precisely about that subset R_{0} of R consisting of the variables that the radar is most sensitive to and can best constrain (those would be the rain mixingratio profiles and the mean drop size profiles in case A, just the rain mixingratio profiles in case B), then, calling R_{1} the remaining rain variables (essentially the spread of the drop size spectrum and the deviation of the cloud water from the prescribed mean in case A, while in case B these "remaining variables" would consist mainly of the mean drop size), synthesize the expected radiances for the different possible values of R_{1} and compare with the measured brightness temperatures, and finally average over the a priori distribution of the variables R_{1} which the radar could not help constrain. For this approach to be consistent, care must be taken in subsetting the rain vector R so that the "core" rain variables R_{0} (essentially the rain mixing ratios) are independent from the additional rain variables R_{1}  this is best accomplished by defining the nonmixingratio variables in the first place so that their (mean) correlation with the rain concentrations is "factored out". Once this is done, for this approach to work, one must figure out
Since there are two ways to proceed to calculate Q, the main question has to be: how different are the two approaches, and which is better suited in what portion of the swath? The answer is clear in region C, the "outer swath". Since no radar measurements are available in that case, and since the "radar first" approach depends crucially on extracting detailed information from the radar before comparing with the radiometer, the only option is the "radiometer first" approach. One simply ignores the first term in equation (^{}) (this amounts to letting its covariance matrix "tend to infinity", i.e. one reinterprets the nonexistence of any radar measurements as the existence of infinitely variable noise) and one must rely on the a priori database, its precomputed (mean) radiances, and their covariances, to weigh each candidate scenario according to how closely it matches the measured radiances. Because the "radiometer first" approach is the only one available in case C, and if for no other reason than continuity across swath boundaries, one would tend to prefer the same approach for the remaining cases. Any reasons not to do so would have to stem from drawbacks that are not shared by the "radar first" approach. And indeed there are three such drawbacks. The first is in the a priori probabilities, represented in both cases by the third factor in the equations above. Specifically, equation (^{}) relies strongly on an a priori database, which is assumed to contain all allowed rain scenarios, in proportions that are representative of actual precipitation. While equation (^{}) similarly makes an a priori assumption, the latter specifically does not concern the rain mixing ratios; rather, it has to do with the a priori distribution of those precipitation parameters that cannot be constrained by the radar (the raindecorrelated DSD parameters in case B, the secondorder raindecorrelated DSD parameters and the rainadjusted cloud liquid mixing ratio in case A). The second drawback has to do with the amount of detail that can be kept in incorporating the radar measurements, represented by the first factor in the equations above. Specifically, equation (^{}) requires that the radar reflectivities be synthesized from one's candidate rain scenarios. Since these are produced by a cloudresolving model, they have a very coarse vertical resolution, and are therefore incapable of synthesizing reflectivities on a resolution as fine as that of the radar measurements. This is in contrast with equation (^{}), which specifically requires that the radar profiles be inverted into rain mixingratio profiles, at the same fine resolution as the radar measurements. The third drawback is the detection problem. Indeed, the discussion so far has centered exclusively on estimating the rain, assuming that rain was properly detected in the first place. However, it is of course impossible to make quantitative estimates if one is not confident of having detected precipitation. With the radars, this process is well understood in general: once the receiver operating characteristic is determined, one need only specify a tolerable falsealarm probability to obtain a detection threshold for any individual range bin. In our particular application however, the problem is not so much to detect for each individual range bin as it is to detect precipitation in the beam as a whole, and to subsequently specify its "top" and "bottom" ranges (since, aside from falsealarm rates for individual bins, an important concern is to avoid unrealistic results such as rain disappearing and reappearing haphazardly as one moves consecutively from bin to bin within a given beam). These concerns have been addressed quite successfully by the TRMM radar group. On the other hand, in the case of the passive radiometers, the brightness temperatures are not as directly sensitive to precipitation per se, though rulebased screens have been rather successfully used starting with SSM/I to detect precipitation over the ocean. However even in that case, the discrepancy between the TRMM's PR and TMI detections is large, even at high rain rates. Part of that discrepancy is no doubt due to the different geometries and passage times of the two instruments. Still, the passive radiometer screens have not been as rigorously justified as the radar detection thresholds. In practical terms, the "radiometer first" approach could well encounter cases where rain is detected and preliminarily inverted according to the radiometer measurements, only to find that the radar signals are all noise and therefore provide no information to complete the estimation. For these reasons, the "radiometer first" approach has to be implemented in region C, and the "radar first" approach has to be preferred in region A. While the latter might seem the more suitable approach for region B too, the continuity argument would dictate that both approaches be implemented there. This will help us better understand the effect on the estimates of the a priori assumptions and of the mismatching resolutions.
(and, specifically, will SRT be used?) The previous section already lists the specific algorithm developments that will be required. Here they are again:
The only difference between item 1' and item 1 is the fact that, in this case, the radiances need to be computed in real time, possibly at different incidence angles (in the case of STAR). This will require the development of a scheme to parametrize sufficiently accurately the forward radiative transfer calculations in order to produce a virtual (or actual) lookup table which efficiently associates to a set of precipitation variables (and given "background" clearair temperatures) the corresponding brightness temperatures at the required frequencies, polarizations and incidence angles. Item 2 seems to require merely that one be able to synthesize radar reflectivities and attenuations from a given rain profile (i.e. it appears one only needs socalled "ZR" and "kR" relations). In fact, the problem is more subtle than that, because one must also assume that one is given the corresponding radiances as well. This important detail makes the problem tricky, because the lowfrequency radiances are highly correlated with the expected PIA, and therefore the assumption that the radiances are specified (and equal to the ones measured) is almost equivalent to assuming that the PIA is given, up to a nottoolarge uncertainty. This directly translates into a constraint on the DSD. In other words, the radar reflectivities (and PIA) have to be synthesized from the given database precipitation profile and under the DSD constraints placed by the given radiances. There is one further reason why the forward calculation of the radar reflectivities is somewhat more difficult than it appears, and that is the vertical inhomogeneity. Since the input rain variables come from a "cloudresolving" simulation, they represent averages over several hundred vertical meters. At 14 GHz, the attenuation coefficient of rain is proportional to the rain rate, and equals about 2 dB/km at mixing ratios of about 1 g/m^{3}. This implies that, already at moderately high rain rates, the radar return from the bottom of a 1km thick layer can easily be 60% weaker ( ~ 2x2 dB attenuation) than the return from the top of the same layer. Such differences can be accounted for in the forward calculation by properly averaging the twoway attenuation within a given range bin. To complete item 2, one must calculate the covariance matrix of the reflectivity profile given the rain profile. Analyses of the TRMM data have shown that the vertical reflectivities are far from mutually independent; in fact, 90% of the vertical variability of the subfreezinglevel 14 GHz reflectivities can be accounted for by the first 4 principal components. This strongly suggests that a diagonal (sum of the square differences) comparison of the first 4 principal components should be sufficient to approximate the pdf reasonably accurately. This observation solves the problem of the mismatched resolutions (since the first 4 principal components are linear combinations of the 1km average reflectivities), and it reduces the covariance problem to one of estimating the conditional variances of each of the principal components, conditioned on the rain profiles (and DSD information). This task will be discussed in more detail in the next section.
Finally, item 3 represents one of the most important aspects of the algorithm development effort, since it entails the development of a dualfrequency radar profiling algorithm (for use within the inner swath). The main idea behind using two radar channels is to exploit the difference in the attenuation at the two frequencies to help identify the underlying drop size distribution as well as the rain mixing ratio. Indeed, figure 1 shows the ratios of a drop's actual (Mie) radar backscattering crosssection to that approximated by the Rayleigh assumption ( ~ D^{6} for a drop of diameter D), and verifies that at both frequencies the Rayleigh assumption is quite adequate for most of the raindrop size range. However, as figure 2 shows, the extinction cross section quickly diverges from Rayleigh ( ~ D^{3}) for drop diameters greater than 1 mm, and does so in significantly different ways at the two frequencies. These observations lead one to expect that, while the radar returns from very light rain (which consists mainly of small drops) will not be appreciably different at the two frequencies, the returns from more significant rain will be exploitably different, largely due to the different characteristics of the attenuation, and this difference should allow one to estimate the rain mixing ratio as well as to characterize the drop size distribution to first order (e.g. by estimating the mean drop diameter), as a function of height. Before proceeding with the details of this portion of the algorithm, it is important to identify the specific variables that will be estimated and which up to this point have been conveniently hidden under the label R = (R_{0}, R_{1}). In the inner swath (case A), R_{0} consists of the rain mixingratio M_{R} and mean dropdiameter profiles D_{0} (i.e. a pair of variables per radar range bin), while R_{1} consists of the 35GHz surface backscattering crosssection along with the rainnormalized cloud water content "deviation" C_{dev}, defined as follows: call µ the correlation coefficient of the natural logarithms of the cloud water mixing ratio ln(M_{c}) and the rain mixing ratio ln(M_{R}) within a bin inside the cloud, and define the new variable C_{dev} as C_{dev} = M_{c}/M_{R} ^{µ}, so that ln(C_{dev}) and ln(M_{R}) are by definition uncorrelated. The correlation coefficient µ will need to be estimated offline, and may turn out to depend on a range bin's relative altitude between the bottom of the cloud and the freezing level (as well as on the rain regime, maritime vs continental etc). The reason and C_{dev} must be singled out as the main complementary variables to the rain variables R_{0} = (M_{R}, D_{0}) is that cloud absorption is expected to have a significant impact on the radar reflectivities. Because cloud water droplets are very small relative to both wavelengths, their reflectivity is negligible. For the same reason, however, the attenuation coefficient due to cloud water k_{c} is proportional to the cloud water mixing ratio M_{c}: at 14 GHz, 0.14 < k_{c}/M_{c} < 0.2 (if M_{c} is expressed in g/m^{3} and k is in dB/km); at 35 GHz, 0.83 < k_{c}/M_{c} < 1.4. For example, even a very dense cloud, carrying 2 g/m^{3} of water, will absorb the 14 GHz signal at a rate of only 0.28 to 0.4 dB/km (depending on the proportion of large droplets). By contrast, the same cloud will absorb the 35 GHz signal at a rate of up to 2.8 dB/km, a very large attenuation that would cause serious biases in the estimation if it is not properly accounted for. That is what the variable C_{dev} will accomplish. In addition to the rainy regions, the nonprecipitating clouds in "clear" areas will typically remain undetected (because their droplets are small), but their attenuation will "cast a shadow" on the surface, thereby making it harder to estimate the integrated 35GHz attenuation over the ocean by comparing the rainy surface return with any averaged "clearair" (yet possibly cloudy) returns. A priori, the magnitude of this additional uncertainty could be comparable to the cloud effect within the rain. That is the reason why we need to define a variable, , specifically to reduce this uncertainty. While water vapor will have a nonnegligible attenuating effect at 35 GHz (up to 0.2 dB/km at the surface), it is assumed that the temperature data available for each retrieval will be sufficiently accurate to determine the height of the bottom of the cloud. Since it is reasonable to assume 100% relative humidity within the cloud, the errors that the watervapor variability will cause should be negligibly small. We are finally ready to describe the main part of this algorithm, namely the inversion of the reflectivity profiles Z_{14} and Z_{35} into profiles of M_{R} and D_{0}, with the help of the measured PIA's, and of variables and C_{dev} representing the error in the surface backscattering cross section and in the cloud liquid water. The measurements and unknowns are related at each range bin by simple equations:
When the ancillary temperature data implies that the precipitation must consist of frozen hydrometeor, the same procedure still applies, as long as one interprets M_{R} as the graupel/aggregates/snow mixing ratio (the altitude and measured reflectivities can help in identifying the appropriate type), and D_{0} as the hydrometeor mean equivalent diameter (at both radar frequencies, the assumptions that the hydrometeors are spheres should be adequate). The uncertainty in the hydrometeor type and density will have to be accounted for in the covariance matrices. Indeed, there remains to describe the procedure to calculate the covariance matrix C(r). To do that, the sources of uncertainty that have not been explicitly included in the variables R = (R_{0}, R_{1}) must be identified, and their effect must then be quantified as the entries of the matrix C(r). The sources of uncertainty that one can easily identify a priori are
Last, we describe the inversion in the intermediate swath. In this case (B), the definitions of both R_{0} and R_{1} need to be scaled back, because we have fewer measurements (for example, it is simply unrealistic to expect a singlefrequency radar to shed any light on vertical details of the drop size distribution), and because these measurements are much less sensitive to variables such as cloud water and vapor. In this case, R_{0} consists of the rain mixing ratio profiles, while R_{1} consists of the rainadjusted mean drop diameter D' (defined just like C_{dev} above). Essentially, D' can be viewed as the parameter which specifies a (Z,k)R relation. This case is essentially identical to the case of TRMM, and we therefore intend to use the triedandtested radar inversion algorithm used for TRMM's PR. As equation (^{}) specifies, we need to start with an a priori distribution of D' ( = R_{1}). The latter will be used to reduce the uncertainty in the measured PIA (since, given a DSD, i.e. given a value of D', the PIA can be directly expressed as an integral of the measured reflectivities, say ZIA(D')). By comparing the ZIA(D')'s to the measured PIA, a preliminary constraint on D' is obtained, as well as a hopefully sharper estimate of the PIA itself (at this stage, the two become essentially equivalent). Finally, for each value of this "D'/PIA" parameter, the reflectivity profile is inverted into a rain profile, and the mean profile is obtained by averaging according to the conditional density of D'/PIA. This is very much like the procedure followed in the case of TRMM. One important source of uncertainty was ignored in this intermediate case, and that is the vertical variability of the DSD. The latter will be accounted for in the covariance matrix using offline forward simulations exactly as described in the previous case (A).
(or, what do we need to figure out between now and launch to fill the holes in the algorithm?) The first section identified two specific activities that are key to justifying the assumptions in the algorithm, namely
Task 4 requires the analysis of DSD data from various rain regimes, the characterization of the shapes of these distributions using mutuallyindependent parameters (or at least parameters whose joint distributions are carefully described), and the calculation of the vertical autocorrelation of each parameter. Task 4 also requires the analysis of ocean surface 35GHz backscattering crosssection data to determine the variability of and its dependence on the surface wind. Task 5 requires Mie scattering calculations to quantify the radar signatures (effective reflectivity factor and attenuation coefficient) as a function of the parameters used to describe the DSD. While calculating the signatures of the liquid portion of the cloud and precipitation is relatively straightforward, specific attention will need to be paid to the modeling of the radar response in the melting layer. Task 6 was discussed in great detail at the end of the previous section, and requires the quantification, using forward simulations, of the effects of 1) the inexactness of the DSD representation, 2) the inhomogeneity of the precipitation, 3) the variability of the water vapor, and 4) the numerical inversion, on the radarestimated precipitation profiles, and the subsequent efficient tabulation of the covariances as a function of the first 4 principal components of the rain rate profile. Task 7 is similar to task 6, in that it requires the quantification and efficient tabulation of the covariance matrix of the calculated brightness temperatures corresponding to a given profile of rain mixing ratios and a given rainadjusted mean drop diameter, or more specifically that portion of the covariance matrix which represents the uncertainties due to 1) the horizontal inhomogeneity of the precipitation, 2) the effect of the cloud , and 3) the variability of the surface temperature. Note that this task achieves much the same goal as that of calculating the covariances in the radiometeronly algorithms, except that in the latter case the variability of the DSD must also be simulated. Task 8 is also very similar to task 6, in that it too requires the quantification and efficient tabulation, using forward simulations, of the covariances of the calculated 14GHz radar reflectivities associated to a given database precipitation profile, and which account for the uncertainties due to 1) the vertical and horizontal inhomogeneities in the precipitation and 2) the variability in the dropsize distribution that is unaccounted for by the PIA constraint (i.e. essentially all the uncertainty beyond knowledge of a single mean "rainadjusted" diameter for the whole column). Task 9 involves the analysis of data or cloudsimulation results to determine the mean cloud mixing ratio (and its variance) that is associated to a given rain mixing ratio as a function of altitude (and, quite probably, of rain regime). The core reference algorithm will have to distinguish between three regions: the outer swath (region C) seen only by the radiometer(s), the inner swath (region A) seen by both radar channels along with the radiometer(s), and the intermediate swath (region B) where the radiances are supplemented by 14 GHz radar reflectivities only.
Within the inner swath, the algorithm will first invert the 2frequency radar measurements into as much detailed information about the rain and DSD profiles as they allow, keeping track of uncertainties in the PIA's and the cloud water content and the ambiguities at low rain mixing ratios. The algorithm will then synthesize the corresponding radiances and compare with the radiometer measurements to constrain the variables that the radars could not constrain sufficiently. the flow chart of the algorithm is as follows:
Within the intermediate region, the algorithm will perform two parallel estimates. The "radarfirst" estimate is obtained by first estimating the pdf of the DSD, taking into account the PIA and its uncertainty, then inverting the single frequency radar measurements into rain mixing ratio profiles for each plausible DSD (up to this point the procedure is very much like the TRMM radar algorithm), and finally synthesizing the corresponding radiances and comparing them with the radiometer measurements to constrain the DSD further. The "radiometerfirst" estimate is obtained by first comparing the measured brightness temperatures with those in the appropriately constrained database, then, for each plausible match, estimating the pdf of the DSD with the help of the PIA and calculating the corresponding radar reflectivities, which will finally be compared to the measurements to further constrain the bestmatching database precipitation profiles. The flow chart is:
In the outer region, the algorithm is the same as the radiometer algorithm, with some subsetting of the database to reflect the higher likelihood of the precipitation being similar to that observed within the inner swaths. Preliminary simulations using cloudmodel results as well as storm "snapshots" synthesized from highresolution airborne radar measurements as well as from TRMM overpasses show that the uncertainty in the estimates of the surface rain rate should be substantially smaller with the core suite of instruments than it was with the TRMM radar and radiometer. Figure 4 shows a scatter plot of the surface rain rates from the simulations, with the estimates on the vertical axis, the actual values on the horizontal axis, and in the case where the estimates where obtained from a TRMMlike singlefrequency radar. Figure 5 shows the corresponding results obtained using a 2frequencyradar algorithm along the lines described above. Figure 6 summarizes the main conclusion: GPMcore's innerswath algorithm should be able to estimate rain rates to within a standard deviation of about 20% for rain rates between 1.5 and 12 mm/hr (as opposed to the TRMM radar's 40%), though we will be hardpressed to achieve such a small uncertainty at lighter rain rates (because the corresponding 14GHz and 35GHz signatures are not exploitably different) or at higher rain rates (because the 35GHz attenuation will blank the echoes). One particular case in the simulations illustrates the DSD estimates rather well: figure 7 shows the simulated vertical distribution of the rainadjusted mean drop size that was superposed on the PRprecipitation estimated during a TRMM overpass of hurricane Bonnie in August, 1998; and figure 8 shows the estimates obtained from the prototype 2frequencyradar algorithm. As expected, the estimates are quite accurate away from very light and very heavy rain areas.
One final expected result has to do with the possible bias in the TRMM radar estimates.
Indeed, a single frequency radar cannot shed any light on the vertical variability of
the DSD. As a result, any assumptions that inevitably have to made about this variability
can result in biases in one's singleradarfrequency estimates of the rain, which would
not affect the 2radarfrequency estimates since in that case one generically has enough
information to estimate the DSD to firstorder from the measurements without a priori
assumptions. Figures 9 and 10 confirm that this is indeed the case. The figures were
created by grouping together simulation cases with smallerthanaverage mean drop size,
especially aloft, into a "black" group, while cases with largerthanaverage mean drop size,
especially near the surface, were grouped into a "red" group. Figure 9 shows the
singleradarfrequency estimates vs actual rain rates in both groups, and
Figure 10 shows the 2radarfrequencies estimates (the
axes represent rain rates on a logarithmic scale, starting at 1.6 mm/hr and ending
at 12.2 mm/hr, with a tick mark at 5 mm/hr). Manifestly,
the singlefrequency estimates misinterpret the different DSD variability
in the two groups as a bias, while the dual radar frequencies retain enough information
to prevent such a bias.
Wait! there's more! That's right, the estimates of the combined radar+radiometer retrievals can then be used to train the radiometers to make estimates of the vertical structure of the underlying precipitation when there is no simultaneous radar data. Here are the preprints of two papers (in .pdf format) that have recently appeared in JGR:



