Graduate Thesis Or Dissertation

Development of a Statistical Emulator of US Crop Yields via a Deep Neural Network Approach


Abstract
  • Agricultural systems are inherently complex; understanding them requires knowledge of climatology, plant physiology, soil physics, economics, and the psychology of the farmers themselves. Decision support tools leverage existing data to guide stakeholders towards the best policies and practices for their situation. Quantitative crop simulation models are one such tool; they predict crop yield from available data. Crop models can broadly be sorted into two categories: process-based models and statistical models. Process-based models simulate the underlying physiological mechanisms of crop growth and can potentially be applied in out-of-sample scenarios; however, running them is computationally intensive. Statistical models are less computationally intensive but do not provide the same level of causal understanding and can be difficult to apply in out-of-sample scenarios. Statistical emulators retain some of the benefits of process-based models while achieving the low computational cost of statistical models. This is accomplished by treating the results of a process-based model as “true” and training a statistical model on the corresponding inputs and outputs. Their reduced computational requirements make statistical emulators well suited for integration with large-scale integrated assessment models. In this thesis, a statistical emulator for crop yield in the contiguous United States is developed using a deep neural network (DNN) approach. Data from the Agricultural Model Intercomparison Project (AgMIP) Global Gridded Crop Model Intercomparison (GGCMI) datasets are used to train yield emulators for three crop models and three crops.
The DNN model architecture was developed by combining two forms of neural networks, a fully connected neural network and a long short-term memory (LSTM) recurrent neural network, in an attempt to capture the different time scales of the relevant inputs. Data on soil characteristics, growing season, and weather aggregates are used as inputs to the DNN. Root mean square error (RMSE) was calculated for each model and crop combination by comparing the simulated yield reported in the GGCMI dataset with the emulated yield from the DNN statistical emulator. RMSE values for all but two emulators were below 10% of the simulated yield range. These normalized RMSE values are comparable with RMSE values from statistical emulators of crop yield currently available in the literature. The RMSE values reported in this thesis can likely be improved with further optimization of the DNN model architecture and tuning of the relevant hyperparameters. Given the growing volume of agricultural data and the growing complexity of process-based crop models, DNN approaches may be valuable for the development of future statistical emulators of crop yield, as they allow for greater flexibility than current methods.
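The evaluation metric described in the abstract, RMSE expressed as a fraction of the simulated yield range, can be illustrated with a minimal sketch. The function names and the yield values below are hypothetical and are not taken from the thesis or the GGCMI data. The sketch also assumes that "simulated yield range" means the maximum minus the minimum simulated yield, which the abstract does not state explicitly.

```python
import math


def rmse(simulated, emulated):
    """Root mean square error between simulated and emulated yields."""
    if not simulated or len(simulated) != len(emulated):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(s - e) ** 2 for s, e in zip(simulated, emulated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def range_normalized_rmse(simulated, emulated):
    """RMSE divided by the range (max - min) of the simulated yields.

    Assumption: the normalizing range is taken over the simulated
    (process-based model) yields, which are treated as the reference.
    """
    yield_range = max(simulated) - min(simulated)
    if yield_range == 0:
        raise ValueError("simulated yields have zero range")
    return rmse(simulated, emulated) / yield_range


# Made-up yields (e.g. t/ha) purely for illustration.
simulated = [2.0, 4.0, 6.0, 8.0, 10.0]
emulated = [2.5, 3.5, 6.5, 7.5, 10.5]

nrmse = range_normalized_rmse(simulated, emulated)
print(f"RMSE = {rmse(simulated, emulated):.3f}")
print(f"Normalized RMSE = {nrmse:.2%} of the simulated yield range")
```

Under this convention, an emulator meets the 10% threshold mentioned above when `range_normalized_rmse` returns a value below 0.10.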
Embargo reason
  • Pending Publication
Embargo date range
  • 2021-03-05 to 2021-10-05


