Introduction

Variations in precipitation critically influence society and ecosystems, affecting water resources, agriculture, and flood risks1,2,3. Climate change has already amplified precipitation variability, leading to more frequent and severe weather events4. Understanding and mitigating the impacts of precipitation extremes requires accurate historical records at a spatial and temporal resolution that captures the high variability of rainfall5,6,7. Observation-based rainfall products can only partially fulfill this requirement. Station networks have long records but are not dense enough in most parts of the world8 and thus lack spatial representativeness. In contrast, satellite rainfall products provide homogeneous spatial coverage but only limited temporal coverage. In addition, they suffer from considerable errors due to their complex rainfall retrieval methods and exhibit spatial and temporal inhomogeneities9,10,11.

Assimilation of historical meteorological observations in first-principle-based physical simulations enables modeling of consistent, comprehensive, and long records of atmospheric conditions12. In the last decade, such reanalyses have accelerated scientific research in hydrological modeling13,14, flood prediction15, the calculation of climate change-related costs16,17, and the training of data-driven weather forecasting models18,19,20,21. However, existing global reanalyses still have significant limitations. The heterogeneous density of assimilated observations and the low spatio-temporal model resolution lead to uncertainties and biases12,22. In particular, the resolution of current reanalysis products cannot represent the complex spatio-temporal structure of rainfall. This leads to a significant underestimation of extreme values, which are crucial for the impact analysis of severe weather events23,24,25,26,27. Running higher-resolution global reanalyses is currently not feasible due to the immense computational demand28,29,30.

Downscaling can be used to increase the spatial and temporal resolution of coarse-resolution global models, either dynamically, that is, by running a local-area high-resolution model, or by statistical post-processing. While dynamical downscaling is again limited by computational resources, statistical methods are computationally efficient and can be applied globally. However, traditional statistical approaches are not capable of generating realistic high-resolution rainfall fields with correct spatio-temporal patterns31 and extreme values. Recently, advanced downscaling approaches leveraging deep neural networks have proven to be capable of this task. Successful applications have been shown for spatial and spatio-temporal super-resolution32,33,34,35,36, and regional spatial downscaling37,38,39,40,41. Nevertheless, a skillful global sub-hourly, km-scale downscaling of precipitation data has remained a challenging problem.

Here, we present spateGAN-ERA5, a conditional generative adversarial network for robust deep learning-based spatio-temporal downscaling of ERA5 precipitation data. Our model transforms hourly, 24 km (~0.25°) resolved ERA5 precipitation estimates into rainfields that resemble weather radar observations at a resolution of 10 min and 2 km. SpateGAN-ERA5 is trained on high-resolution quantitative precipitation estimates (QPE) from a gauge-adjusted and climatology-corrected weather radar product in Germany and is evaluated across three climatically diverse regions on the globe. The model generalizes well outside the training domain and enables computationally efficient global rainfall downscaling to a resolution that is fine enough to capture the spatio-temporal complexity of rainfall, especially for rainfall events with convective cells. It generates realistic extreme value distributions, spatial structures, and advection patterns, all in a well-calibrated ensemble that addresses the underdetermined nature of the downscaling problem. Thus, spateGAN-ERA5 significantly advances downscaling methodologies and opens up a wide field of possible scientific investigations in a variety of domains like hydrology, risk analysis, or agriculture.

Results

Generative spatio-temporal downscaling of global ERA5 precipitation

For global downscaling of ERA5 precipitation data we use a conditional generative adversarial network (cGAN) with ERA5 convective (CP) and large-scale precipitation (LSP) as the coarse condition and gauge-adjusted weather radar data as the high-resolution reference (see Fig. 1b). The downscaling of hourly ERA5 precipitation fields with a spatial resolution of 24 km is performed by a generator model producing a field with a 12-times higher spatial and a 6-times higher temporal resolution. Specifically, the generator processes CP and LSP input patches with a size of 28 by 28 grid cells and 16 time steps. To provide more contextual information, the input is four times the domain size of the actual downscaled area (see Fig. 2a).
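The patch dimensions above can be checked with a little arithmetic. The following is a minimal sketch, not part of the published code; the function name and the `context_ratio` parameter are hypothetical, and the factor of two per dimension is our reading of "four times the domain size" together with the 8 h × 336 km × 336 km target domain given in the Methods.

```python
def patch_geometry(cells=28, steps=16, dx_km=24, spatial_factor=12,
                   temporal_factor=6, context_ratio=2):
    """Derive target-domain and output sizes from the stated input patch size.

    context_ratio is an assumption: the input side length (and duration)
    is taken to be twice that of the downscaled target domain.
    """
    target_cells = cells // context_ratio   # ERA5 cells per side in the target domain
    target_steps = steps // context_ratio   # hourly steps in the target domain
    return {
        "input_extent_km": cells * dx_km,
        "target_extent_km": target_cells * dx_km,
        "target_hours": target_steps,
        "output_pixels": target_cells * spatial_factor,   # before any edge trimming
        "output_steps": target_steps * temporal_factor,   # before any edge trimming
        "output_dx_km": dx_km / spatial_factor,
        "output_dt_min": 60 / temporal_factor,
    }


geometry = patch_geometry()
```

Under these assumptions, one input patch covers 672 km per side and yields a high-resolution block of 168 × 168 pixels and 48 time steps before boundary regions are removed.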

Fig. 1: Model and evaluation area overview for spatio-temporal downscaling of global ERA5 precipitation estimates.

spateGAN-ERA5 can transform coarse-resolution ERA5 rainfields shown in (a) into high-resolution rainfields (2 km, 10 min.) as they would be observed by a gauge-adjusted radar product regarding their spatial structures and rain rate distribution. b Schematic of the model operating on patchwise downscaling of km gridded convective and large-scale ERA5 precipitation variables in a probabilistic manner. c Global downscaling predictions enabled by patch stitching provide continuous rainfields (full resolution map shown in ancillary files Fig. A1). From the area marked by the red box, patches are drawn and used for model training. Downscaling performance is evaluated using radar observations as a comparison from the regions marked by the orange boxes. d Detailed highlight shows the resolved resolution in time and space, and the comparison with the Australian weather radar observations.

Fig. 2: Case study of performance on a challenging precipitation event starting on 03.07.21 in the US with observed convective cells.

a Model input patches consisting of larger ERA5 data, i.e., the convective and large-scale precipitation contribution to the total precipitation sum. b Location of the radar observation. c Observations, spateGAN-ERA5 predictions, and rainFARM downscaling for the target domain in 10-min increments from t to t + 50 min. and as a coarsened version approximating ERA5 resolution. d 1-D cutouts showing spateGAN-ERA5 ensemble members for a specific pixel along the temporal dimension (top panel) and a horizontal cross-section for one time step (bottom panel). e Distribution for temporally aggregated data with 2 km and 1 h resolution, including maps shown in (c) and the previous and following 6 h. A severe precipitation warning threshold of the German Weather Service is set at 25 mm/h.

A main feature of GANs is the custom learnable loss function (the discriminator). In our model, this enables the generation of realistic fields that fulfill a wide range of statistical and structural criteria for precipitation. The applied neural network architecture extends the spateGAN model established for a weather radar video-super-resolution approach32 and is described in the “Model description” section. The model is trained using high-quality gauge-adjusted weather radar data provided by the German Meteorological Service (DWD) from the years 2009–202042. Details on the adversarial training procedure are given in the “Training and model selection” section. The model is efficient, fast, and small enough to run on a single NVIDIA-Tesla-V100 GPU, downscaling one patch in 0.04 s in inference mode. Data-parallel training on 3 A100 80 GB GPUs took 4 days.

Table 1 Overview of used rainfall observation datasets

Global fields are produced by stitching overlapping high-resolution patches (see the “Data preparation” section). We evaluate the downscaling skill by comparison to weather radar data from the year 2021 in three different countries (Germany, USA, Australia) that cover a wide range of climatic conditions (see Fig. 1c). Performance is compared to the stochastic rainfall downscaling method rainFARM, which is based on the extrapolation of the power spectrum to smaller scales43,44, and to trilinear interpolation as a simple baseline method (see the “Reference methods” section).

Case study

We select a variety of meteorologically interesting events (Fig. 2 and Supplementary Figs. 5–9) to showcase the spatio-temporal downscaling performance of spateGAN-ERA5 and how this overcomes the inherent limitations of ERA5 precipitation data.

Here, we focus on the event in Fig. 2 showing convective cells in the United States as observed by the MRMS dataset, which are at a scale known not to be resolvable by ERA545. Even when compared to coarsened radar observations at ERA5 resolution, ERA5 shows too little variance (see Fig. 2e) and underestimates extreme values (see Supplementary Information Section 1.5). Being able to reconstruct such small-scale rainfall cells is of particular interest to improve ERA5 precipitation estimates in regions and seasons with a high amount of convective precipitation, such as the tropics and extratropics22.

SpateGAN-ERA5 is able to reconstruct convective rainfall fields with small-scale structures and plausible rain rates, including heavy local rainfall. The rain cells show temporal continuity, hardly allowing for a qualitative differentiation between observed and predicted rainfields (see video V1 in ancillary files). Predicted rainfall may occur at a misplaced spatial or temporal position, but with a magnitude similar to the associated radar observation (see Fig. 2d). This misplacement is not solely due to the underdetermined nature of the downscaling problem but also reflects differences between ERA5 and radar data on a coarser scale. The probabilistic nature of spateGAN-ERA5 accounts for such uncertainties, but is also constrained by the contextual information provided by ERA5. For example, the predicted ensemble shows greater variability in intensity than in spatial or temporal localization.

RainFARM fails to reconstruct small-scale convective cells, overestimates the spatial extent of rainfall, and underestimates extremes. By design, rainFARM mostly coincides with ERA5 at the coarse resolution, limiting spatio-temporal disaggregation. This leads to only slightly more granular rainfall fields than using simple interpolation techniques.

Skillful representation of extreme values

To get a more complete picture of the extreme value statistics of spateGAN-ERA5, we analyze data from different climatic regions in the US, Germany, and Australia (see the “Evaluation” section), including severe tropical rainfall events in Australia, highlighted in Supplementary Information Section 1.4.

The fractions skill score (FSS) (Fig. 3a) shows that only for the smallest rain rate threshold and up to a spatial scale of 16 km do the interpolated ERA5 and rainFARM rainfields have a higher location accuracy than a single ensemble member of spateGAN-ERA5. Considering an increased spatial scale or an ensemble of predictions, the generative model consistently outperforms the other methods across all rain rate thresholds. For intense rainfall larger than 5 mm/h, spateGAN-ERA5 is the only model with acceptable skill. The relative improvement in terms of ΔmFSS when considering spateGAN-ERA5 as a downscaling technique instead of an interpolated ERA5 is highest for Australia, the dataset where interpolation has the lowest absolute mFSS. This is followed by Germany, the training region, evaluated over an out-of-sample time period, and the US (see Supplementary Table 1). For the tropical dataset, ΔmFSS is slightly below that of the US; however, the overall skill is highest. This indicates a strong ability of the model to generalize well outside its training domain.

Fig. 3: Investigation of the downscaling distribution reconstruction skill for the evaluation datasets in Germany, the US, and Australia in 2021.

a Fractions skill score (FSS) for thresholds 0.1, 1, 3, and 5 mm/h and a temporal scale of 1 h. We report the slightly improved probabilistic ensemble FSS for rainFARM. b Distribution comparison showing multiple spateGAN-ERA5 ensemble members.

The distributions shown in Fig. 3b further support spateGAN-ERA5’s capability in predicting plausible extreme values. Predictions generally follow the reference’s lognormal distribution for Australia, the tropics (see Supplementary Fig. 10b), and Germany, which is physically reasonable46,47. Different characteristics in the US are physically more implausible and thus likely due to systematic errors in the non-gauge-adjusted MRMS radar data. Overall, spateGAN-ERA5 underestimates the frequency of strong precipitation for Australia and the US and overestimates it for Germany. Since spateGAN-ERA5 follows the average precipitation amount of ERA5 (see the “Model description” section), this is in agreement with the biases of the individual evaluation datasets as shown in Supplementary Table 1.

In terms of pixel-wise deterministic skill (MAE and RMSE), ERA5 interpolation and rainFARM show the best results (Supplementary Table 1). However, this is mainly due to their tendency to produce smoother rainfields with dampened extreme values, avoiding a double penalty for misplaced small-scale events. Since we aim for sharp probabilistic estimates, we use these scores with caution. The superior CRPS shows that spateGAN-ERA5 predictions have the highest ensemble skill. The ensemble quality, important for a correct representation of extremes, is analyzed by rank histograms (Supplementary Fig. 1). They show a well-calibrated ensemble with a slight under-dispersive tendency for spateGAN-ERA5 and an unfavorable, heavily under-dispersive tendency for rainFARM.

Spatial plausibility of highly resolved rainfall fields

Spatial and temporal patterns of rainfall are the tangible result of the physical processes that drive precipitation formation and evolution in the atmosphere48,49. Accurately reconstructing these patterns presents a considerable challenge, especially when using data-driven models, which lack a priori knowledge of the underlying atmospheric physics50,51. These models must learn to reproduce sharp gradients, coherent advection structures, and multi-scale variability from limited, coarsely resolved, and potentially biased training data. We consider weather radar observations as a sufficient reference to allow for the statistical analysis of such spatio-temporal patterns. The qualitative assessment of the “Generative spatio-temporal downscaling of global ERA5 precipitation” section suggests that spateGAN-ERA5 predictions are hardly distinguishable from real radar observations, while ERA5 interpolation produces blurry rainfields. Visually, rainFARM only slightly improves over the interpolation. To quantify this observation, we use the radially averaged power spectral density (RAPSD). As a measure of anisotropy, a key aspect of specific spatial patterns often caused by horizontal advection49, we define the linear eccentricity in terms of spatial autocorrelation in the “Evaluation” section.

This analysis uses a subset of each evaluation dataset, described in the “Data preparation” section, focusing on cases with greater consistency between ERA5 and radar observations. For the RAPSD (shown in Fig. 4a), spateGAN-ERA5 largely replicates the power spectrum of the radar observations in Germany, with slight deviations at the smallest wavelengths close to the target resolution. In the US, Australia, and the tropics (see Supplementary Fig. 10c), an underestimation of all wavelengths is apparent. These discrepancies in RAPSD can be traced back to the mean field biases of ERA5, which are stronger for more extreme events22, and by design, not corrected by spateGAN-ERA5. When focusing solely on spatial characteristics and disregarding a multiplicative bias, the normalized RAPSD shows an almost perfect alignment between predictions and observations for all datasets.
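To make the distinction between RAPSD and normalized RAPSD concrete, the following NumPy sketch computes a radially averaged power spectrum for a square 2D field. It is an illustration under stated assumptions, not the evaluation code used here: the function name `rapsd`, the integer-wavenumber binning, the scaling of the power by the number of pixels, and the normalization by the sum of the spectrum are all our own choices.

```python
import numpy as np


def rapsd(field, dx_km=2.0, normalize=False):
    """Radially averaged power spectral density of a square 2D field.

    Returns (wavelength_km, spectrum) for integer wavenumbers 1..n//2.
    Binning and normalization conventions are assumptions of this sketch.
    """
    n = field.shape[0]
    if field.shape != (n, n):
        raise ValueError("expected a square 2D field")
    # 2D power spectrum with the zero frequency shifted to the centre
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2 / n**2
    # integer wavenumber offsets from the centre of the shifted spectrum
    ky, kx = np.indices((n, n)) - n // 2
    radius = np.rint(np.hypot(kx, ky)).astype(int)
    max_r = n // 2
    mask = (radius >= 1) & (radius <= max_r)
    sums = np.bincount(radius[mask], weights=power[mask], minlength=max_r + 1)[1:]
    counts = np.bincount(radius[mask], minlength=max_r + 1)[1:]
    spectrum = sums / counts
    if normalize:
        # dividing by the total removes any multiplicative bias of the field
        spectrum = spectrum / spectrum.sum()
    wavelength_km = n * dx_km / np.arange(1, max_r + 1)
    return wavelength_km, spectrum
```

With this definition, halving all rain rates lowers the RAPSD by a factor of four at every wavelength but leaves the normalized RAPSD unchanged, which is why the normalized curve isolates spatial structure from a mean field bias.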

Fig. 4: Spatial characteristic scores for a subset of the evaluation datasets in Germany, the US, and Australia in 2021 (see description in the “Data preparation” section).

a Mean radially averaged power spectral density (RAPSD) and mean normalized radially averaged power spectral density (dashed line). b Distribution of linear eccentricity of the 2D autocorrelation representation (0.5 Pearson correlation coefficient ellipse).

ERA5 interpolation produces overly smoothed rainfields, resulting in a considerably lower RAPSD and normalized RAPSD for shorter wavelengths. RainFARM slightly improves the power spectrum, increasing the amplitude for wavelengths between the ERA5 resolution of 24 km and the final 2 km resolution. However, the method introduces a physically unrealistic jump in the power spectrum at 24 km47,52. The temporal power spectral density shows a similar behavior of all methods for the temporal dimension (Supplementary Fig. 2).

Linear eccentricity is analyzed in Fig. 4b and illustrated for a single field in Supplementary Fig. 3. The spateGAN-ERA5 distribution of the score is close to the observations while rainFARM stays similar to the ERA5 interpolation, providing rainfields that are highly autocorrelated for a large spatial lag. SpateGAN-ERA5 produces small-scale features that resemble the radar observations in terms of size, orientation, and eccentricity (see Supplementary Fig. 4).

Discussion

A critical issue in atmospheric sciences is the extent to which deep learning models trained for a specific region can generalize to other regions. Additionally, discrepancies between modeled and observed data distributions persist, particularly in free-running climate simulations, which can become entirely decoupled from observations beyond a certain lead time. In the context of downscaling, a widely adopted approach involves training super-resolution models that do not rely on perfectly matched input-output pairs. However, if the synthetically coarsened training data deviate significantly from the actual climate model output distribution, this mismatch can lead to a degradation in model performance during inference37.

Our own analysis highlights the significant discrepancy between coarsened radar observations and ERA5 precipitation, both in terms of extreme value distributions and spatio-temporal structures. This mismatch stems, partly, from the limited convective parameterization schemes in numerical modeling22. We show that a carefully designed training sampling scheme, which can be described as training on loosely paired images, results in a high downscaling performance. This involves selecting ERA5 model input samples that closely match their corresponding observation targets and choosing Germany as the primary training region, where the agreement between reanalysis data and targets is relatively high. While tested on reanalysis data, this training idea is generic and can, e.g., be applied to loosely paired images from nudged climate simulations and observation data, so that the trained model can then downscale traditional climate model scenarios in inference.

To evaluate the generalization capabilities of spateGAN-ERA5, we tested its performance on spatial domains and time periods not included in the training data. This is particularly relevant given that high-quality meteorological observations with fine spatial and temporal resolution are only sparsely available. Training a downscaling model to represent the high variability of precipitation on a global scale is, therefore, inherently challenging using observations alone. Our findings indicate that spateGAN-ERA5 exhibits robust performance even outside the training region, demonstrating its ability to reconstruct precipitation fields in climatologically distinct environments. In particular, the model also performs well in tropical regions such as northern Australia, where high-intensity rainfall is dominated by convective processes that differ fundamentally from the precipitation dynamics of mid-latitude regions like Germany. In some cases, it even exceeds its performance within the training domain, depending on the evaluation metric. This suggests a strong generalization capacity, so that the model can be applied on an extended scale to provide a global precipitation product with improved rainfall distribution characteristics. By leveraging the full historical record of the ERA5 reanalysis dataset, extending back to 1940, spateGAN-ERA5 can provide high-resolution precipitation reconstructions for an unprecedentedly long record, which is a significant advancement over conventional precipitation datasets.

To evaluate the quality of the downscaled precipitation fields, we consider a variety of spatial structures and pixel-wise scores and conduct an event-based analysis to reflect the diverse variability and characteristics of rainfall. Given that precipitation is inherently difficult to model due to its high variability and intermittency, we could show spateGAN-ERA5’s ability to disaggregate and reconstruct the statistical properties of rainfields across temporal and spatial scales, with plausible extreme values that are completely missing within the initial low-resolution input data. To generate realistic rain events from this data, a mere extrapolation of the spatial power spectrum of ERA5 as performed by rainFARM proved to be insufficient for the given problem. SpateGAN-ERA5, as a generative model, shows high structural similarities between its predictions and the reference datasets, as can be shown by evaluating the RAPSD, temporal PSD, and linear eccentricity in the “Spatial plausibility of highly resolved rainfall fields” section. Furthermore, the probabilistic, yet computationally efficient, method explicitly accounts for downscaling-related uncertainties when refining precipitation fields in space and time. The model architecture and training methodology are designed to be adaptable, making it applicable to other precipitation datasets and resolutions, thereby serving as a versatile tool for various scientific and operational applications.

As shown in the “Case study” section and Fig. 2e, spateGAN-ERA5 is the only presented method that is able to reconstruct a distribution similar to the observations, with predictions of larger rainfall intensities in the severe weather warning range. This demonstrates its potential for enabling more accurate hydrological modeling, particularly in flood risk assessments, where detailed precipitation fields are essential for simulating extreme rainfall events and their impacts. The ability to generate high-resolution precipitation maps several orders of magnitude faster than traditional dynamical downscaling methods addresses critical needs in meteorological and hydrological research. In the context of climate impact studies, spateGAN-ERA5 facilitates improved assessments of long-term precipitation trends and variability, helping to refine projections of extreme events under different climate scenarios. Its ability to reconstruct convective rainfall events, which are often missing in traditional climate model outputs, makes it particularly useful for assessing localized hazards, such as flash floods, and informing disaster risk management strategies.

Methods

SpateGAN-ERA5 performs spatio-temporal downscaling of ERA5 precipitation estimates, increasing the resolution from 24 km and 1 h to 2 km and 10 min. The model receives input patches of the ERA5 variables convective and large-scale precipitation of size 16 h × 672 km × 672 km and performs the downscaling for a centered domain of 8 h × 336 km × 336 km. We trained the model in Germany, where a consistently high-resolution and high-quality reference dataset is available through the gauge-adjusted and climatology-corrected radar product RADKLIM-YW provided by the German Meteorological Service, and where a high agreement between ERA5 precipitation and observation data can be shown (see Fig. 3)22. Global downscaling is achieved by downscaling and stitching overlapping patches.

Model description

We build on the successful precipitation video-super-resolution approach, spateGAN32, consisting of a generator, trained in an adversarial manner with a discriminator model. The main ERA5 downscaling generator model (see Fig. 5) comprises four consecutive components that make use of 3D-convolutional residual blocks (Res3D) to capture spatio-temporal dependencies.

Fig. 5: Schematic overview of the spateGAN-ERA5 generator and discriminator model architecture using Residual Blocks (Res3D).

During inference, boundary regions are removed, and an ERA5 mean field bias adjustment is applied.

First, the ERA5 convective and large-scale precipitation input data are processed at their initial resolution. Second, the data pass through a U-Net-like downsampling and skip connection with an added cropping operation. This allows the model to process data at multiple resolutions, consider global and local features, and focus on the target domain at an early model stage. Third, the spatial and temporal resolution of the input data is successively increased, and the structures of the rainfields are refined by 4 upsampling blocks, including bilinear and linear interpolation and Res3D blocks. Finally, three subsequent Res3D blocks adjust fine-scale structures and limit the prediction range to positive values using a Softplus activation function.

Temporally constant dropout (p = 0.2) at three different generator depths introduces scale and rain event-dependent perturbation at low, mid, and high frequencies and enables spatio-temporally continuous probabilistic downscaling. The perturbation in combination with the ensemble loss supports the model in reconstructing the missing tail of the ERA5 precipitation distribution.
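As an illustration of what "temporally constant" means for the dropout perturbation, the NumPy sketch below draws one dropout mask per feature channel and spatial location and reuses it for every time step. This is a sketch under assumptions rather than the actual layer: the function name, the (channels, time, height, width) layout, the mask granularity, and the inverted-dropout rescaling by 1/(1 − p) are not specified in the text and are chosen here for illustration only.

```python
import numpy as np


def temporally_constant_dropout(features, p=0.2, seed=None):
    """Apply dropout with a mask that is shared across the time axis.

    features: array of shape (channels, time, height, width) (assumed layout).
    seed: fixing the seed reproduces the same mask, i.e. the same perturbation.
    """
    rng = np.random.default_rng(seed)
    c, _, h, w = features.shape
    # one keep/drop decision per (channel, y, x); the singleton time axis
    # broadcasts the same decision to all time steps
    keep = rng.random((c, 1, h, w)) >= p
    # inverted-dropout rescaling (an assumption of this sketch)
    return features * keep / (1.0 - p)
```

Because the mask does not change along the time axis, a perturbed feature stays perturbed for the whole sequence, which is one way to obtain temporally continuous ensemble members; calling the function repeatedly with the same seed reproduces the same member.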

During inference, i.e., for evaluation and global prediction, we apply three additional operations. First, we freeze the dropout seed for each produced ensemble member, which improves the spatio-temporal consistency of the rainfields compared to a random perturbation in space and time. Second, we cut the outermost edges (24 km and 1 h) to remove boundary effects. For global predictions, this routine differs slightly, as described in the “Data preparation” section. Third, we apply a patchwise mean field bias correction to the predictions53 by multiplying the predicted rainfall field by a single factor so that its average matches the average rainfall amount of the associated ERA5 input patch. This ensures that the provided ERA5 precipitation amount is preserved and that the model can be applied in regions where the ERA5 bias strongly deviates from the training distribution.
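The patchwise mean field bias correction can be sketched in a few lines of NumPy. This is a minimal illustration, not the released implementation: the function name and the `eps` guard are hypothetical, and we assume that the ERA5 convective and large-scale precipitation arrays cover the same space-time domain as the prediction and are given as rain rates in the same unit.

```python
import numpy as np


def mean_field_bias_correction(prediction, era5_cp, era5_lsp, eps=1e-9):
    """Scale a predicted patch by one factor to match the ERA5 patch mean.

    prediction: high-resolution rain rates for one patch.
    era5_cp, era5_lsp: coarse convective and large-scale rain rates for the
    same domain (assumed to share units with the prediction).
    """
    target_mean = float(np.mean(era5_cp + era5_lsp))
    predicted_mean = float(np.mean(prediction))
    if predicted_mean < eps:
        # a dry prediction cannot be rescaled; return it unchanged
        return prediction.copy()
    return prediction * (target_mean / predicted_mean)
```

A single multiplicative factor changes the rainfall amount but not the spatial pattern or the position of rain cells, so the structure generated by the network is preserved.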

The overall design of the generator is memory efficient and can be run in inference on smaller GPUs (10 GB per sample). This allows the application of our model by a broad research community, not only those with access to the latest-generation GPUs with large memory.

The discriminator (see Fig. 5) is trained simultaneously with the generator. Its inputs are the temporal sequences of high-resolution prediction or observation, as well as the coarse-resolution context provided to the generator. Its training objective is to decide if the high-resolution field is real or artificially generated. The loss function is binary cross-entropy. Within the model, the high-resolution and low-resolution data are treated separately, and as a first step, Gaussian noise (mean = 1, std. 0.05) is added to the input data to prevent the model from learning to distinguish rainfields based on quantization characteristics. A series of Res3D blocks then processes the data and extracts spatio-temporal features. The coarse and high-resolution inputs are concatenated at a late stage to encourage a comparison based on latent features extracted on multiple resolutions. The discriminator model is thereby used as a powerful dynamical loss function for the generator, which learns to discriminate structure- and distribution-related rainfall characteristics.

Model details

For downsampling operations, the skip connection of the Res3D blocks includes a 3D-convolutional layer with a kernel size of 1 and instance normalization to harmonize the dimensions. All remaining convolutional layers in the networks use a kernel size of 3.

The generator uses 3D reflection padding in all layers with 3D convolution, and the discriminator uses zero padding. Except for the first, second, and last Res3D block of the generator, the first Res3D block of the high-resolution discriminator path, and the two Res3D blocks of the low-resolution discriminator path, we apply instance-normalized convolutions. For the generator, we use a feature dimension of 96. For the discriminator, the high-resolution features are 128, 128, 128, and 64, and the low-resolution features are 64 and 32. After concatenation, the final Res3D block decreases the features to 64, which are compressed to 1 within the last 3D-convolutional layer.

The specific model architecture stems from an iterative optimization process that started during our investigations for precipitation video-super-resolution in ref. 32 and was further developed for the task of ERA5 precipitation downscaling. Thereby, we also tried, e.g., state-of-the-art vision transformer network layers as a generator, which did not result in a performance improvement and led us to stick to the well-proven 3D Residual layers. In general, an extensive hyperparameter optimization is desirable. Due to the computational complexity, long training runs, and limited computational resources, we could not test all possible parameter combinations and therefore cannot state that spateGAN-ERA5 provides the best possible results. We would rather invite the research community to build on our work and further improve the downscaling of precipitation data.

Objective function

As an objective function, we use a well-known stepwise adversarial training strategy54,55.

The discriminator D receives the ERA5 context X and target observations Y or predictions \(\hat{Y}\) of the generator and is trained to minimize the binary cross-entropy loss

$${{\mathcal{L}}}_{D}=-{{\mathbb{E}}}_{X,Y}[\log D(X,Y)]-{{\mathbb{E}}}_{X,\hat{Y}}[\log (1-D(X,\hat{Y}))]$$
(1)

The generator loss includes an adversarial loss

$${{\mathcal{L}}}_{{\rm{GAN}}}(G)=-{{\mathbb{E}}}_{X}[\log D(X,G(X))]$$
(2)

and an ensemble L1-loss defined as

$${{\mathcal{L}}}_{{\rm{L1}}}(G)=\overline{\left| Y-\frac{1}{3}\mathop{\sum }\limits_{i=1}^{3}{\hat{Y}}_{i}\right| },$$
(3)

which compares high-resolution targets to the ensemble mean prediction of 3 members \({\hat{Y}}_{1},{\hat{Y}}_{2},{\hat{Y}}_{3}\). This ensures that the predictions remain close to the ground truth while reducing the double penalty of small convective cells or heavy precipitation misplaced during training.

The total generator loss is

$${{\mathcal{L}}}_{G}={{\mathcal{L}}}_{{\rm{GAN}}}(G)+{{\mathcal{L}}}_{{\rm{L1}}}(G)$$
(4)
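Equations (1) to (4) can be written out numerically as follows. This NumPy sketch only evaluates the loss values for given discriminator outputs and fields; it is not a training loop, and the function names and the clipping constant `eps` (added for numerical stability) are our own assumptions.

```python
import numpy as np


def discriminator_loss(d_real, d_fake, eps=1e-7):
    """Eq. (1): binary cross-entropy on discriminator outputs in (0, 1)."""
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))


def ensemble_l1_loss(target, members):
    """Eq. (3): mean absolute error between target and ensemble mean.

    members: array of shape (n_members, ...) holding the predictions.
    """
    return np.mean(np.abs(target - np.mean(members, axis=0)))


def generator_loss(d_fake, target, members, eps=1e-7):
    """Eq. (4): adversarial term, Eq. (2), plus ensemble L1 term, Eq. (3)."""
    adversarial = -np.mean(np.log(np.clip(d_fake, eps, 1.0 - eps)))
    return adversarial + ensemble_l1_loss(target, members)
```

Note that the L1 term vanishes whenever the ensemble mean matches the target, even if individual members differ from it. This is how the ensemble formulation softens the double penalty for slightly misplaced cells compared with a per-member L1 loss.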

Training and model selection

The model is trained for \(2\times 10^{5}\) adversarial training steps. The learning rate is \(1\times 10^{-4}\) for the generator and \(2\times 10^{-4}\) for the discriminator. Both models are optimized with AdamW56 (generator: \(\beta_{1}=0.0\) and \(\beta_{2}=0.999\); discriminator: \(\beta_{1}=0.0\) and \(\beta_{2}=0.5\)). We employ data-parallel training on 3 Nvidia A100 GPUs with 80 GB of memory each for 4 days. The batch size is set to 9 per training step. In inference mode, downscaling 1 patch takes 0.04 s on one A100 GPU.

We save all model weights after every 250 training steps and identify the best generator training state by downscaling and evaluating the independent model selection dataset 4.7. We select the final model by calculating the average of the ensemble FSS (meFSS) of the thresholds 0.1, 1, 3, 5, and 8 mm/h, spatial scales 1, 4, 8, 16, 32, 64, and 128 km, and temporal scale of 1 h. This considers the ensemble quality and location accuracy for different categories of rainfall intensities, independent of the heavily skewed distribution of rainfall.

Evaluation

We verify the performance of the downscaling methods using a set of quantitative scores, since no single metric is capable of capturing the complexity of highly resolved rainfields. The root mean square error (RMSE) is a pixel-wise error computed for a single predicted ensemble member:

$$RMSE=\sqrt{\overline{{(Y-{\hat{Y}}_{i})}^{2}}}$$
(5)

The continuous ranked probability score (CRPS)57 measures the prediction accuracy by accounting for the ensemble spread and bias. The cumulative distribution function (CDF) of the predicted ensemble at a specific point and time step (\(\hat{F}(xt)\)) is compared to the observed rainfall y.

$$CRPS(\hat{F},y)=\mathop{\int}\nolimits_{-\infty }^{\infty }{(\hat{F}(xt)-1(xt\ge y))}^{2}dxt$$
(6)
$$1(xt\ge y)\mapsto \left\{\begin{array}{ll}0:\quad xt < y\\ 1:\quad xt\ge y\end{array}\right.$$
(7)

We report the CRPS as the average CRPS for each dataset. For deterministic methods (ensemble size of 1), i.e., for interpolated ERA5, this score reduces to the mean absolute error (MAE).

$$MAE=\overline{\left\vert (Y-{\hat{Y}}_{i})\right\vert }$$
(8)
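For a finite ensemble, the CRPS integral can be evaluated without numerical integration. The sketch below uses a standard identity for the empirical CDF of the members, CRPS = E|X − y| − ½ E|X − X′|; the function name and the one-dimensional input layout are assumptions made for the example.

```python
import numpy as np


def crps_ensemble(members, y):
    """CRPS of an ensemble at one pixel and time step.

    members: 1D array of ensemble predictions; y: observed value.
    Equals the integral form when F_hat is the empirical CDF of the members.
    """
    members = np.asarray(members, dtype=float)
    accuracy = np.mean(np.abs(members - y))
    spread = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return accuracy - spread
```

With a single member the spread term vanishes and the score equals the absolute error, which is why the dataset average reduces to the MAE for deterministic methods.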

The fractions skill score (FSS)58,59 is defined as

$$FSS=1-\frac{\overline{{({f}_{\hat{Y}}-{f}_{Y})}^{2}}}{\overline{{{f}_{\hat{Y}}}^{2}}+\overline{{f}_{Y}^{2}}},$$
(9)

where fY (resp. \({f}_{\hat{Y}}\)) is the fraction of pixels within a spatial and temporal (s, t) neighborhood that exceed a certain observed (resp. predicted) rainfall intensity threshold (σ). The averaging is performed over the respective neighborhoods of all locations and time steps of each evaluation dataset. For the ensemble FSS, the fraction of ensemble members exceeding (σ) is considered.
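The following sketch shows how Eq. (9) can be evaluated for a single 2D field and a purely spatial neighborhood. It is an illustration under simplifying assumptions: the temporal neighborhood and the ensemble fraction are omitted, the window is given in pixels and assumed to be odd, and fields are zero-padded at the border. The function names are hypothetical.

```python
import numpy as np


def neighborhood_fraction(exceed, window):
    """Fraction of exceeding pixels in a window x window box around each pixel."""
    pad = window // 2
    padded = np.pad(exceed.astype(float), pad)
    # Summed-area table with a leading row and column of zeros.
    sat = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = exceed.shape
    box = (sat[window:window + h, window:window + w]
           - sat[:h, window:window + w]
           - sat[window:window + h, :w]
           + sat[:h, :w])
    return box / window**2


def fss(obs, pred, threshold, window):
    """Fractions skill score for one threshold and one (odd) window size."""
    f_obs = neighborhood_fraction(obs > threshold, window)
    f_pred = neighborhood_fraction(pred > threshold, window)
    mse = np.mean((f_pred - f_obs) ** 2)
    ref = np.mean(f_pred ** 2) + np.mean(f_obs ** 2)
    return 1.0 - mse / ref if ref > 0 else np.nan
```

A rain cell displaced by one pixel scores 0 at a window of one pixel but a positive value at larger windows, which is how the score tolerates small location errors.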

We calculate the mean FSS (mFSS) or mean ensemble FSS (meFSS) over a set of different scales (s = 0, 4, 8, 16, 32, 64, 128, 256 km, t = 1 h) and thresholds (σ = 0.1, 1, 3, 5 mm/h). The ΔmFSS is the relative deviation of the meFSS of rainFARM and spateGAN-ERA5 from the mFSS of interpolated ERA5, expressed as a percentage, and illustrates the performance benefit of using an alternative downscaling method instead of pure interpolation. For data on ERA5 resolution, the mFSS considers spatial scales of 0, 24, 96, and 192 km and rain thresholds of 0.1, 1, 3, and 5 mm/h.

The radially averaged power spectral density (RAPSD) and power spectral density (PSD)60,61 measure how power is distributed across spatial and temporal frequencies. The temporal PSD thereby acts as an indicator of plausible advection. The RAPSD is calculated for single images using the PySTEPS62 implementation and is averaged for each evaluation dataset. The PSD is calculated along the temporal dimension for each pixel and for each week of the evaluation datasets and is afterwards averaged for each dataset. Additionally, we report the normalized RAPSD and PSD, where the power spectrum of each image or time sequence is normalized so that it sums to one.
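To make the RAPSD concrete, the sketch below computes it for one square image with NumPy. It is not the PySTEPS implementation used in the study; the binning by rounded integer wavenumber and the power normalization are assumptions of this example.

```python
import numpy as np


def rapsd(field):
    """Radially averaged power spectral density of a square 2D field.

    Returns the mean power for integer wavenumbers 1 .. n // 2.
    """
    n = field.shape[0]
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2 / n**2
    freq = np.fft.fftshift(np.fft.fftfreq(n)) * n  # integer wavenumbers
    ky, kx = np.meshgrid(freq, freq, indexing="ij")
    radius = np.rint(np.hypot(kx, ky)).astype(int).ravel()
    sums = np.bincount(radius, weights=power.ravel())
    counts = np.bincount(radius)
    max_r = n // 2
    return sums[1:max_r + 1] / counts[1:max_r + 1]  # drop the zero frequency


def normalized_rapsd(field):
    """RAPSD rescaled so that it sums to one."""
    spectrum = rapsd(field)
    return spectrum / spectrum.sum()
```

For a field that contains a single wave with four cycles across the image, the spectrum peaks at wavenumber 4.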

We use rank histograms63,64 to validate the variability and reliability of an ensemble of probabilistic rainfall predictions. For each pixel and time step of the evaluation datasets, 100 ensemble predictions are considered in increasing order, and the normalized rank r of the actual observation value is determined. Perfectly calibrated ensembles show a uniformly distributed r, where predictions and observations stem from the same distribution.
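A rank histogram can be built from the normalized ranks as sketched below. The random tie-breaking is an assumption of this example (ties are frequent for rainfall because many values are exactly zero), as are the function name and the array layout.

```python
import numpy as np


def normalized_ranks(members, obs, seed=0):
    """Normalized rank of each observation within its ensemble.

    members: array (n_members, n_points); obs: array (n_points,).
    Returns values in [0, 1]; ties are broken uniformly at random.
    """
    rng = np.random.default_rng(seed)
    below = np.sum(members < obs, axis=0)
    ties = np.sum(members == obs, axis=0)
    rank = below + rng.integers(0, ties + 1)
    return rank / members.shape[0]
```

Passing the result to `np.histogram(r, bins=..., range=(0, 1))` gives the histogram; a flat histogram indicates a calibrated ensemble, while a U shape indicates too little spread.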

We investigate the spatial anisotropy of rainfields by calculating the autocorrelation of single images of observations and predictions for spatial lags from 0 to 60 km in x and y direction65. We fit an ellipse to the 0.5 Pearson Correlation Coefficient (PCC) contour line of each individual autocorrelation field and retrieve the length of the major axis a and the length of the minor axis b to determine the linear eccentricity

$$ec{c}_{l}=\sqrt{{a}^{2}-{b}^{2}},$$
(10)

eccentricity

$$ecc=\sqrt{1-\frac{{b}^{2}}{{a}^{2}}},$$
(11)

and size

$$size=\sqrt{a* b}.$$
(12)

Furthermore, we compute the orientation of the ellipse, i.e., of the major axis, in degrees.
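As a short worked example of Eqs. (10)–(12), the descriptors can be computed from the two axis lengths. The function name and the returned dictionary keys are hypothetical; the ellipse fit itself is not shown.

```python
import math


def ellipse_descriptors(a, b):
    """Linear eccentricity, eccentricity, and size for axis lengths a >= b > 0."""
    if not a >= b > 0:
        raise ValueError("expected a >= b > 0")
    return {
        "ecc_l": math.sqrt(a**2 - b**2),      # Eq. (10)
        "ecc": math.sqrt(1.0 - b**2 / a**2),  # Eq. (11)
        "size": math.sqrt(a * b),             # Eq. (12)
    }
```

For a = 5 and b = 3 this gives a linear eccentricity of 4 and an eccentricity of 0.8; a circular autocorrelation field (a = b) gives 0 for both.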

We define the BIAS as

$$BIAS=\frac{\overline{Y}-\overline{X}}{\overline{Y}}$$
(13)

where \(\overline{X}\) is the average predicted precipitation amount of each evaluation region and \(\overline{Y}\) is the average observed rainfall.

During evaluation, spateGAN-ERA5 downscales patches that overlap in the temporal dimension to generate a continuous sequence of temporally consistent rainfields, by keeping the central 2 h of each patch. For the case study videos, a linear blending approach is applied to 1 h overlapping periods, with weights decaying from 1 to 0, effectively smoothing out minor remaining temporal discontinuities in the predictions.
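The linear blending used for the case study videos can be sketched as follows. The example assumes two sequences of shape (time, height, width) whose last and first `overlap` time steps cover the same period (6 steps for 1 h at 10 min resolution); the function name is hypothetical.

```python
import numpy as np


def blend_sequences(first, second, overlap):
    """Join two (t, h, w) sequences with a linear cross-fade over `overlap` steps."""
    weight = np.linspace(1.0, 0.0, overlap)[:, None, None]  # weight of `first`
    mixed = weight * first[-overlap:] + (1.0 - weight) * second[:overlap]
    return np.concatenate([first[:-overlap], mixed, second[overlap:]], axis=0)
```

The weight of the earlier patch decays from 1 to 0 across the overlap, so the output moves gradually from one prediction to the next instead of switching abruptly.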

In total, the probabilistic model performance is evaluated using 100 ensemble members for calculating the rank histograms and the CRPS shown in Supplementary Fig. 1 and Supplementary Table 1. For the ensemble FSS and meFSS, we use only 6 members since the score converges at a small ensemble size. For the presented evaluation, rain rates smaller than 0.01 in all compared datasets are set to zero.

Datasets

The model input and, therefore, the only dataset required for applying spateGAN-ERA5 are the convective and large-scale precipitation variables from the ERA5 reanalysis. The model is trained using gauge-adjusted and climatology-corrected radar data in Germany. We use two additional radar datasets from the United States and Australia for evaluation to test the model’s ability to generalize outside of its training distribution. Although including data from the US and Australia in model training may seem like an obvious choice, we deliberately refrained from doing so. Pure radar observations can be highly error-prone and do not match the quality of a sophisticated, gauge-adjusted, and climatologically corrected product such as RADKLIM-YW. Because other high-resolution data are lacking, we use these radar observations to get an indication of spateGAN-ERA5’s generalization capabilities.

ERA5 dataset

The ERA5 reanalysis provides global, hourly model data spanning the past 70 years12,66. It integrates observational data with numerical model predictions through advanced data assimilation techniques, resulting in a high-quality benchmark dataset. For precipitation, the ERA5 4D-var system assimilates hourly NCEP stage IV gauge-adjusted weather radar precipitation information over the US67,68. In this study, we used the years 2009–2021, where ERA5 aligns with the available radar data. We utilize the variables convective and large-scale precipitation of hourly ERA5 data as input for spatio-temporal downscaling. Including additional variables as input, such as wind components, temperature, pressure level, etc., did not enhance overall performance in the presented setup.

We do not use finer resolved ERA5-land precipitation estimates, since they lack valuable scale-related information29,69,70, exclude oceans and coastal areas, and have a higher release latency71.

Despite the known limitations of ERA5 precipitation estimates, which include spatially heterogeneous quality, biases12,22, a tendency to smooth out local extremes due to the coarse resolution of 0.25° and 1 h72, and limitations in modeling convective events45,73, the product is most commonly used in environmental research.

RADKLIM-YW Germany

For training, model selection, and part of the validation of spateGAN-ERA5, we use the gauge-adjusted and climatology-corrected weather radar product RADKLIM-YW provided by the German Meteorological Service (DWD) as target data42,74.

This product is a composite of precipitation information from a network of 16 C-band weather radars. It is adjusted by approximately 1000 rain gauges that are homogeneously distributed in Germany with a density of one gauge per 330 km2. In addition to the RADOLAN gauge adjustment, effects like range-dependent underestimation and beam blockage are covered by an additional climatological correction.

The grid extent is 900 km × 1100 km in polar stereographic projection, covering almost all of Germany and its surrounding border regions, with a spatial resolution of 1 km × 1 km and a temporal resolution of 5 min. Each grid cell represents a 5 min. rainfall sum with a quantization of 0.01 mm. Regions not covered by the 150 km measurement radii of the radars and missing measured values are marked as “NaN.” For our investigation, we used data on the provided km grid, coarsened to 2 km and 10 min. resolution. We use the years 2009–2020 for model training, the first half of the year 2021 for model selection, and the second half for evaluation, which prevents data leakage and tests the generalization abilities. For evaluation, we select two fixed locations of the size 336 km × 336 km, highlighted in Fig. 1, covering almost the entire country.

Multi-Radar Multi-Sensor System (MRMS) United States

For validation purposes, we use the radar composite from the Multi-Radar Multi-Sensor (MRMS) system comprising 146 WSR-88D radars covering the US and 30 Canadian radars75,76. Climatic conditions in the United States have a high variability, ranging from continental, subtropical, and Mediterranean to tropical.

The MRMS dataset we use covers the time period from July to December 2021 and is not gauge-adjusted. Alternative gauge-adjusted QPE products are not available at a sub-hourly resolution and, therefore, are not suitable for most parts of our analysis.

MRMS covers the region from 20° to 55° latitude North and 130° to 60° longitude West with a resolution of 0.01° in both latitude and longitude directions. The temporal resolution is 2 min. We select 6 regions exhibiting a high radar quality and covering different climatic regions of the country (see Fig. 1, yellow boxes). For evaluation, we regrid the radar observations of each of the 6 locations to their associated regular km UTM projection and downsample them to 2 km and 10 min. resolution. Each location has a domain size of 336 km × 336 km.

Australian Radar Network

We additionally use quantitative precipitation estimates from the Australian operational radar network77.

We select data from 6 different C-band weather radars, covering subtropical regions across the country, for the period from July to December 2021 (see Fig. 1). The individual locations have a radar coverage of 150 km and are selected based on low beam blockage, data availability, and a homogeneous spatial distribution. Three of these radar sites operate Doppler radars. The QPE is gauge-adjusted but strongly depends on the availability of the heterogeneously distributed rain gauge observations78. We observed an increased bias between ERA5 and the Australian radar data, and the radar quality may be a larger factor here than in Germany (see Supplementary Table 1). The product has a spatial grid resolution of 0.5 km × 0.5 km in an Albers Conical Equal Area projection and a temporal interval of 5, 6, or 10 min. For evaluation, we downsample the observations to 2 km and 10 min. resolution. Due to the smaller radar coverage, each location has a domain size of 280 km × 280 km.

Additionally, we select two of the northernmost radar observation stations, located in Darwin and Weipa, Australia, and use the time period from January to March 2021 to cover tropical rainfall regimes in our separate investigation shown in the Supplementary Information X.

Data preparation

Observation data and ERA5 precipitation estimates are adjusted to be used for model training, selection, evaluation, and global inference as described below.

Training and model selection dataset

For training, we draw random target samples from RADKLIM-YW, each with 48 continuous radar observation time steps and a size of 168 × 168 pixels, i.e., 8 h and 336 km × 336 km. The associated model input is obtained by first interpolating ERA5 data to the target grid and afterwards downsampling the extracted patches to 24 km and 1 h to approximate the initial resolution.

Since little to no rain falls in the training region of Germany most of the time, we apply a subsampling routine that selects only samples with a sufficient number of wet pixels and sufficient total precipitation in both input and target. For each randomly drawn sample, the following conditions must be fulfilled by the ERA5 input X and the RADKLIM-YW observation y:

1. X and y do not contain missing values.

2. The 66th percentiles of the pixel values in X and y exceed ε1, where \({\varepsilon }_{1}=| -50\varepsilon^{\prime} +500|\) and \(\varepsilon^{\prime}\) is drawn from Lognormal(0, 1).

3. \({\sum }_{h,w,t}X > {\varepsilon }_{2}\) and \({\sum }_{h,w,t}y > {\varepsilon }_{2}\), where \({\varepsilon }_{2}=| -450\varepsilon +4500|\) and ε is drawn from Lognormal(0, 1).
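The three conditions above can be sketched as an acceptance test. This is an illustration only: the function names are hypothetical, the units of X and y are taken as given in the data, and the two thresholds are assumed to be drawn independently for every candidate sample.

```python
import numpy as np


def draw_thresholds(rng):
    """Draw the dynamic thresholds eps1 and eps2 from Lognormal(0, 1) variates."""
    eps1 = abs(-50.0 * rng.lognormal(0.0, 1.0) + 500.0)
    eps2 = abs(-450.0 * rng.lognormal(0.0, 1.0) + 4500.0)
    return eps1, eps2


def accept_sample(x, y, eps1, eps2):
    """Return True if the ERA5 input x and the radar target y pass all conditions."""
    # Condition 1: no missing values.
    if np.isnan(x).any() or np.isnan(y).any():
        return False
    # Condition 2: the 66th percentile of both fields exceeds eps1.
    if np.quantile(x, 0.66) <= eps1 or np.quantile(y, 0.66) <= eps1:
        return False
    # Condition 3: the total over height, width, and time exceeds eps2.
    return bool(x.sum() > eps2 and y.sum() > eps2)
```

Because the thresholds are redrawn for each candidate, samples with moderate rainfall are still accepted occasionally, while nearly dry samples are almost always rejected.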

The distribution of the thresholds ε1 and ε2 is shown in Fig. 6 and roughly reflects the inverse probability of drawing samples that match the given thresholds. The resulting number of observation samples contained in the training data is about 20,000 (850 GB). During training, we apply standard data augmentation79 in the form of a rotation (90° or 270°) or reflection (vertical or horizontal) to every alternate sample passed to the model, increasing sample diversity and reducing directional biases, particularly for Germany with its dominant westerly wind patterns.
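A possible form of this augmentation is sketched below for arrays of shape (time, height, width) with square spatial dimensions. The random choice among the four operations and the function name are assumptions of this example; selecting every alternate sample is left to the caller.

```python
import numpy as np

# The four spatial operations named in the text, applied to axes (height, width).
OPS = [
    lambda a: np.rot90(a, k=1, axes=(1, 2)),  # rotation by 90 degrees
    lambda a: np.rot90(a, k=3, axes=(1, 2)),  # rotation by 270 degrees
    lambda a: np.flip(a, axis=1),             # vertical reflection
    lambda a: np.flip(a, axis=2),             # horizontal reflection
]


def augment(x, y, rng):
    """Apply one randomly chosen operation to both input x and target y."""
    op = OPS[rng.integers(len(OPS))]
    return op(x), op(y)
```

Applying the same operation to input and target keeps the coarse and the high-resolution fields spatially aligned.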

Fig. 6: Probability density functions (PDF) of the dynamic thresholds used in the subsampling routine. a PDF of the rainfall intensity threshold ε1 and b PDF of the total precipitation amount threshold ε2.

For model selection, we randomly draw additional samples and apply the subsampling routine to them. We select 1000 samples from the temporally independent time period (January–June 2021). We adjust the average rainfall of the targets of this dataset, using a scalar multiplication, so that it matches the average rainfall of the corresponding ERA5 data. This supports the identification of a model state that tends to modify the average precipitation of the ERA5 input samples less drastically and allows the model to be applied outside the training region.

Evaluation dataset

SpateGAN-ERA5 is evaluated using a temporally and spatially independent dataset. The evaluation period contains the first week of each month from July to December 2021 and, for evaluating tropical rainfall events in Australia, the first week of each month from January to March 2021. The data is sampled using fixed patch locations in the US, Germany, and Australia, highlighted in Fig. 1. For the associated ERA5 samples, the data are projected to the observation grid and afterwards interpolated to 24 km resolution. The domain size is 672 km and includes the previous and following 8 h of the evaluation observation time period.

To analyze the spatial characteristics of the predicted rainfields, i.e., radially averaged power spectral density and anisotropy, we select a subset of each evaluation dataset that exhibits greater consistency between ERA5 and radar observations. This subset includes cases where interpolated ERA5 achieves an mFSS score exceeding 0.2.

In addition to the high-resolution observations and predictions, we evaluate the performance of the individual downscaling methods and datasets on a coarser resolution, approximately that of ERA5 (see Supplementary Information Section 1.5). To this end, we average the observations and predictions of the evaluation datasets to a spatial resolution of 24 km using 2D average pooling and aggregate the temporal dimension to 1 h resolution.
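The coarsening step can be illustrated with a block average in NumPy. The sketch assumes fields of shape (time, height, width) at 10 min and 2 km resolution that hold rain rates, so that temporal aggregation is also a mean; for accumulated sums the temporal axis would be summed instead. The factors 6 and 12 follow from 1 h / 10 min and 24 km / 2 km.

```python
import numpy as np


def coarsen(field, t_factor=6, s_factor=12):
    """Block-average a (t, h, w) field to 1 h and 24 km resolution."""
    t, h, w = field.shape
    if t % t_factor or h % s_factor or w % s_factor:
        raise ValueError("field dimensions must be multiples of the factors")
    blocks = field.reshape(t // t_factor, t_factor,
                           h // s_factor, s_factor,
                           w // s_factor, s_factor)
    return blocks.mean(axis=(1, 3, 5))
```

A 336 km × 336 km patch covering 8 h, i.e., an array of shape (48, 168, 168), is reduced to shape (8, 14, 14).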

Generation of global fields

We define a processing pipeline for producing seamless global high-resolution precipitation maps from a deep learning model that operates on patchwise downscaling.

First, ERA5 data on its original lat-lon grid is segmented into patches. Each patch covers a regular spatial extent of 672 km × 672 km. We calculate the necessary ERA5 lat-lon coordinates to maintain these patches with the required spatial extent by using the Haversine formula. To simplify the process, the latitude center coordinate of each patch is used to determine the longitudinal extent. Resulting spatial distortions in the longitude directions can be neglected due to the small patch sizes. In comparison to the evaluation and training datasets, where ERA5 is regridded onto a regular kilometer grid using the radar observation projection or UTM projection, this is a more efficient method for global high-resolution mapping. The patches are designed to overlap, such that the target prediction domain of 336 km × 336 km overlaps by approximately 10% in both latitude and longitude directions.
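The patch geometry can be sketched by inverting the Haversine formula along a meridian and along the parallel through the patch center. The Earth radius value and the function names are assumptions of this example, and the patch tiling and overlap logic are not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def patch_extent_deg(center_lat, size_km=672.0):
    """Latitude and longitude extent in degrees of a size_km wide patch.

    The longitude extent is evaluated at the center latitude only.
    """
    dlat = math.degrees(size_km / EARTH_RADIUS_KM)
    ratio = math.sin(size_km / (2 * EARTH_RADIUS_KM)) / math.cos(math.radians(center_lat))
    if ratio >= 1.0:
        raise ValueError("patch is too close to a pole for this approximation")
    dlon = math.degrees(2 * math.asin(ratio))
    return dlat, dlon
```

The latitude extent is the same everywhere, while the longitude extent grows towards the poles, which is why the latitude of the patch center enters the calculation.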

The generated patches are then interpolated onto a regular grid with dimensions of 672 km × 672 km using nearest neighbor interpolation. This data has an approximate resolution of 24 km and enters the spateGAN-ERA5 model as input data. Downscaling of patches on a km-grid ensures that the model receives data that does not exhibit any latitude-dependent spatial distortion of physical properties. After downscaling, spateGAN-ERA5 applies a mean field bias adjustment. Due to extensive areas of uncertain, low-intensity rainfall in the ERA5 dataset, particularly over ocean regions, all ERA5 rain rates below 0.1 mm/h are set to zero for this adjustment. The resulting downscaled high-resolution patches are seamlessly interpolated onto a global latitude-longitude grid with a resolution of 0.018°, which corresponds to approximately 2 km at the equator.

To combine the individual overlapping patches, a linear weighting (decaying from 1 to 0 while approaching the border of the patch) is applied in the overlapping regions. This blending process ensures smoother transitions between patches, aiming for continuous large-scale rainfall field circulation (see Supplementary Fig. 13).

Reference methods

RainFARM is a statistical downscaling approach implemented in the PySTEPS package62. It produces small-scale variability by a stochastic process that estimates and extends the spectral slope from each coarse input patch with an estimated scaling factor while preserving key statistical properties. Most importantly, rainFARM produces an isotropic spatial distribution and preserves the rainfall amount when aggregated to the initial resolution.

RainFARM, therefore, serves as a suitable baseline method. Similar to our deep learning approach, it does not rely on additional input data such as atmospheric variables or orography. It was specifically developed for meteorological-scale downscaling, has been successfully applied in various downscaling studies across different contexts33,37,80 and allows for the generation of multiple ensemble members.

In our study, we apply spatial downscaling of ERA5 total precipitation using the advanced spectral rainFARM algorithm44, followed by temporal interpolation. The probabilistic downscaling is conducted using a different fixed random seed for the stochastic component of the method.

For this particular problem, the performance was better than applying the combined spatio-temporal downscaling operation described in ref. 43. Downscaling the individual ERA5 variables (convective and large-scale precipitation) separately and then aggregating them leads to negligible differences. Unlike spateGAN-ERA5, rainFARM downscales patches of the whole ERA5 input domain of 672 km × 672 km and 16 h, and its output is afterwards cropped to match the domain of the radar observations from the evaluation datasets.

Trilinear interpolation of ERA5 total precipitation, in both space dimensions and the time dimension, serves as a simple baseline where the ERA5 rainfall information can be compared to the high-resolution radar observation without an artificial generation of small-scale features. We interpolate the projected ERA5 data on the coarse km grid described in the “Data preparation” section.