A hybrid machine learning model to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA
David C. Wheeler, Jonathan A. Traum, Gregory E. Schwarz, Celia Z. Rosecrans, Katherine M. Ransom, Bernard T. Nolan, George Kourakos, Bryant C. Jurgens, Thomas Harter, Jo Ann M. Gronberg, Claudia C. Faunt, Sandra M. Eberts, Andrew Bell, Kenneth Belitz | June 9th, 2017
Intense demand for water in the Central Valley of California and related increases in groundwater nitrate concentration threaten the sustainability of the groundwater resource. To assess contamination risk in the region, we developed a hybrid, non-linear, machine learning model within a statistical learning framework to predict nitrate contamination of groundwater to depths of approximately 500m below ground surface.
A database of 145 predictor variables representing well characteristics, historical and current field and landscape-scale nitrogen mass balances, historical and current land use, oxidation/reduction conditions, groundwater flow, climate, soil characteristics, depth to groundwater, and groundwater age were assigned to over 6000 private supply and public supply wells measured previously for nitrate and located throughout the study area. The boosted regression tree (BRT) method was used to screen and rank variables to predict nitrate concentration at the depths of domestic and public well supplies.
The novel approach included as predictor variables outputs from existing physically based models of the Central Valley. The top five most important predictor variables included two oxidation/reduction variables (probability of manganese concentration to exceed 50 ppb and probability of dissolved oxygen concentration to be below 0.5 ppm), field-scale adjusted unsaturated zone nitrogen input for the 1975 time period, average difference between precipitation and evapotranspiration during the years 1971–2000, and 1992 total landscape nitrogen input.
Twenty-five variables were selected for the final model for log-transformed nitrate. In general, increasing probability of anoxic conditions and increasing precipitation relative to potential evapotranspiration had a corresponding decrease in nitrate concentration predictions.
Conversely, increasing 1975 unsaturated zone nitrogen leaching flux and 1992 total landscape nitrogen input had an increasing relative impact on nitrate predictions.
Three-dimensional visualization indicates that nitrate predictions depend on the probability of anoxic conditions and other factors, and that nitrate predictions generally decreased with increasing groundwater age.