# GloSEM: High-resolution global estimates of present and future soil displacement in croplands by water erosion

The GloSEM platform follows the same principles of most RUSLE-type models or more-complex catchment-scale process-based models, with a driving force (erosivity of the climate [R-factor]expressed in MJ mm h−1 he has−1 yr−1), a resistance term (erodability of the soil [K-factor]expressed in Mg h MJ−1 hmm−1), and other factors representing the farming choice, such as topographical conformation of the field (dimensionless LS-factor), cropping system (dimensionless C-factor) and soil conservation practices (dimensionless P-factor). The long-term annual average soil displacement (Mg ha−1 yr−1) estimates rest on a multiplicative equation (Eq. [1]) of these environmental factors:

$${rm{A}}={rm{R}}cdot {rm{L}}cdot {rm{S}}cdot {rm{K}}cdot {rm{ C}}cdot {rm{P}}$$

(1)

Using a GIS-based gridded modeling approach applied to a RUSLE-type algorithm, such as GloSEM, corresponds to the hypothesis that each cell is independent of the others with respect to soil displacement. The provided estimates of soil displacement by water erosion refer to the amount of sediment displaced within each cell. The model does not, in any way, include areas of slope that experience net deposition over the long term1. This is because the displaced soil amount is not routed downslope across each cell, from hill slope to sink area or riverine system, through a transport/deposition capacity module. This RUSLE-based soil displacement prediction scheme is preferred to other process-based physical models because the latter require a large number of input data that are not yet mature enough for global-scale applications.

### Model parameterization

The six environmental factors described in Eq. (1) were computed as follows.

### Land-cover and management factor

The (dimensionless) C-factor measures the combined effect of the interrelated soil-cover and management variables on the soil erosion process. Here, for its spatial computation, we followed the pathway developed in previous GloSEM applications9.32. However, in this case, the built-in GloSEM probabilistic land-use-allocation module was substituted with an external cropland layer (reference year 2019)––the CGLS-LC100 Collection 3, developed by the ESA and the CGLS31. The CGLS-LC100 Collection 3 comprises a set of annual global land-cover maps (2015–2019), with an overall 80% accuracy, obtained using PROBA-V and Sentinel-2 satellite imagery, and more than 150,000 crowd-sourced training points . The decision to use this new ESA product to represent the global cropland in this new version of GloSEM 1.3 rested on: (i) the high mapping accuracy of the new ESA product; (ii) the high level of agreement with the FAOSTAT cropland information at the global and continental scale; and (iii) its capacity to describe cropland-cover fractions (%) and, in turn, define spatial locations where cropland is mixed with pasture and natural vegetation. The original approach of GloSEM to describe crop rotations was not changed, and it still relies on the 12 years (2001–2012) of crop yields obtained from the FAOSTAT database (https://www.fao.org/faostat/en/# data) for each considered country. The cropland C-factor (CCROP) values ​​were computed at the sub-national level for each of 3,252 administrative regions, as follows:

$${{rm{C}}}_{{rm{CROP}}}=mathop{sum }limits_{{rm{n}}={rm{1}}}^{{ rm{13}}}{{rm{C}}}_{{rm{CROPn}}}cdot left[ % right]{{rm{Region}}}_{{rm{CROPn}}}$$

(two)

Where, CCROP represents the overall C-factor estimated for each sub-country administrative unit layers (FAO GAUL), CCROPn is the individual C-factor for each of the 170 crops considered (FAOSTAT), and [%] RegionCROPn represents the share of each crop in the arable land of the given FAO GAUL administrative unit. the [%] RegionCROPn values ​​were assessed through a statistical downscaling of the national crop statistics using the spatially explicit harvested areas proposed by Monfreda et al.33combining national-, state- and county-level census statistics with remote-sensing data (for 15,990 administrative units)3. 4. The global cropland obtained from the global land-cover map for the reference year 2019 was also applied to the 2070 scenarios. Accordingly, the extent of the global cropland in both temporal scenarios overlapped precisely and consistently, and was equal to ~1.4 billion ha. The regional CCROP values ​​were also kept the same for both temporary scenarios. However, in the Excel spreadsheet we provide further data concerning the country-scale extension of cropland, and estimates computed according to the future land-use/cover scenarios from the integrated assessment model given in the Land-Use Harmonization (LUH2) database.35—that is, SSP1–RCP2.6 derived from IMAGE, SSP2–RCP4.5 derived from MESSAGE-GLOBIOM and SSP5–RCP8.5 derived from REMIND-MAGPIE. While in the present study future land-use change is not considered, readers can use the Excel spreadsheet (Appendix A) to perform country-scale estimates of soil displacement under alternative scenarios (Table S1). In the future, we plan to release refined GloSEM input data with the purpose of allowing readers to perform their own spatially explicit estimates.

### Rainfall-runoff erosivity

The average annual rainfall-runoff erosivity (R-factor, MJ mm h−1 he has−1 yr−1) is spatially defined using a Gaussian process regression (GPR) geostatistical model for both present36 and future9 climatic conditions. The GPR model established a statistical relationship between the R-factor point values ​​(from 3,625 meteorological stations, associated with the Global Rainfall Erosivity Database [GloREDa]in 63 countries that provide sub-hourly [61%] and hourly [39%] pluviographic data) and a set of WorldClim37 climatic data (version 2.1, 30 seconds spatial resolution), acting as spatially exhaustive covariates. The R-factor values ​​for every erosive rainstorm36 were calculated adopting the methodology suggested in USDA Handbook No. 70310, while the USDA’s Rainfall Intensity Summarization Tool was used to facilitate the computation phase of the individual erosive rainstorms. The R-factor values ​​(MJ mm ha−1 h−1 yr−1) from each of the 3,625 meteorological stations resulted from the following equation:

$$R=frac{1}{n}mathop{sum }limits_{j=1}^{n},mathop{sum }limits_{k=1}^{mj}left (E{I}_{30}right)k$$

(3)

Where, EI30 is the rainfall-runoff erosivity of a single event, k; n expresses the number of years observed; and mj expresses the erosive events during a given year, j. The WorldClim climatic data, acting as spatially exhaustive covariates in both the present and future climate scenarios, were: (i) average monthly precipitation; (ii) average minimum and maximum monthly precipitation; (iii) average monthly temperature; (iv) precipitation in the wettest month; (v) precipitation in the driest month; and (vi) precipitation seasonality. It is worth mentioning that, for the future (2070) scenarios, the R-factor values ​​were estimated using the climate projections from 14 general circulation models (GCMs) and considering three different SSPs and RCPs (ie SSP1–RCP2.6, SSP1– RCP4.5 and SSP5–RCP8.5). This yielded a total of 43 possible future scenarios.

### Soil erodability

The K-factor (Mg h MJ−1 hmm−1) spatially defines the susceptibility of the soil to be eroded. Here, it was algebraically defined (Eq. [4]), based on certain intrinsic soil properties, according to the methodology proposed by Wischmeier et al.38. Some of the required soil properties (ie texture, organic matter and coarse fragmentation) were obtained from the International Soil Reference and Information Center SoilGrids database at a 1-km spatial resolution39. Further soil properties, such as soil structure and permeability, were derived according to the methodology proposed for the GloSEM soil erodability map.32which combines textural characteristics (M––percentage of the silt-plus-fine-sand fraction, multiplied by 100, minus the clay fraction), organic matter (OM, in %), soil structure (s) and permeability class (p ):

$$K=frac{left(2.1times 1{0}^{-4};{M}^{1.14};left(12-OMright)+3.25left(s-2 right)+2.5left(p-3right)right)}{100}cdot 0.137$$

(4)

### Slope length and steepness factor

The (dimensionless) LS-factor represents the influence of the terrain on the surface runoff and sediment transport capacity. It was computed using the two-dimensional GIS-based approach proposed by Desmet and Govers40. The slope and upslope contributing area were calculated using hole-filled Shuttle Radar Topography Mission (SRTM) and Terra Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Digital Elevation Model (GDEM) v2 data with a 3-arc-seconds spatial resolution ( about 90 m). Due to the large amount of processed data, the computational operations were carried out using the Saginaw Area (SAGA) GIS interface in R statistical computing software. We created 870 tiles of ca. 500 × 500 km.

### Conservation support-practice factor

The (dimensionless) P-factor expresses the ratio of soil displacement using a conservation support practice, such as contouring, strip-cropping, terracing or subsurface drainage. Values ​​for the support-practice P-factor are difficult to report above the field-scale due to the lack of spatial information, and they are not taken into account in the vast majority of regional- and global-scale assessments. In GloSEM9.32, the possible mitigation effects of soil conservation rely on the CA data provided by 54 countries to the UN Food and Agriculture Organization’s (FAO’s) AQUASTAT database. These countries regularly communicate the proportion of their cropland managed in accordance with the three FAO CA standards––minimum soil disturbance, organic soil cover and crop rotation/association. Overall, these countries cover 73% of the global cropland. For the remaining 27%, the continental-level CA was considered. For areas under CA, we assumed a reduction in soil erosion ranging between 45% and 70% compared to conventional agriculture. Customized conservation scenarios can be simulated using the provided Excel spreadsheet.

### Heterogeneity of input data

GloSEM versions 1.1 and 1.2 came with a spatial resolution of ca. 250m at the equator. This resolution was set to be consistent with the satellite data of the Moderate Resolution Imaging Spectroradiometer (MODIS), which constituted a key input for the computation of the Land-cover and management factor (C), the spatial interpolation of the rainfall-runoff erosivity. factor (R), and the spatial interpolation of the soil erodability factor (K). Accordingly, all these GloSEM input data were spatially optimized and consistently multiples of the base unit of 250 m defined by the MODIS data. The slope length and steepness factor (LS) was computed at 90 m native resolution (SRTM 3 arc-seconds data) and resampled to 250 m. The resampling caused a loss of detail that we consider within the tolerance threshold.

In the new GloSEM version 1.3, the adoption of the 100 m land-use data provided by ESA caused (i) a slight misalignment of the gridded structure and (ii) the loss of the perfect multiples structure guaranteed from the former 250 m MODIS- based spatial resolution. Despite not being optimal, in our view the increase in heterogeneity of input data due to the change in land-use product is overcompensated for, by the relevant increase in geographical representativeness provided by the ESA fractional (%) cropland layer. A future refinement of the other input data of GloSEM to 100 m spatial resolution is to be considered desirable.

### Error estimate

A summary of error propagation in the present and future predictions was calculated considering: (i) the uncertainty of the spatial predictions estimated using a Markov Chain Monte Carlo approach; (ii) the uncertainty of estimating the area under CA (considered only for the 2015 scenario); (iii) the uncertainty related to the effectiveness of the CA practices using a Monte Carlo method; and (iv) the uncertainty in regional-rainfall-intensity–kinetic-energy relationships. A further uncertainty (v), related to the variation found in the 14 different GCMs, was also considered. The error propagation is expressed as the square root of the sum of squares of the different uncertainties.