This thesis presents a case study of applying machine learning tools to build a predictive
model of annual infestations of grasshoppers in Eastern Oregon. The purpose of the
study was two-fold. First, we wanted to develop a predictive model. Second, we wanted
to explore the capabilities of existing machine learning tools and identify areas where
further research was needed.
The study succeeded in constructing a model with modest ability to predict future
grasshopper infestations and provide advice to farmers about whether to treat their
croplands with pesticides. Our analysis of the learned model shows that it should be able
to provide useful advice if the ratio of treatment cost to crop damage cost is 1 to 1.67 or
more. However, there is some evidence that the model is not able to make good
predictions in years with extremely high levels of grasshopper infestation.
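
The cost ratio quoted above can be read as a break-even condition. The following is only a minimal sketch under an assumed expected-cost rule, not the analysis carried out in the thesis: it assumes that treating a field fully prevents the damage, that the model outputs a probability of infestation, and that a farmer treats whenever the expected damage exceeds the treatment cost. The function names are hypothetical.

```python
def break_even_probability(treatment_cost: float, damage_cost: float) -> float:
    """Infestation probability at which treating and not treating cost the same.

    Assumption (not from the thesis): treatment fully prevents the damage.
    """
    if treatment_cost < 0 or damage_cost <= 0:
        raise ValueError("treatment_cost must be >= 0 and damage_cost must be > 0")
    return treatment_cost / damage_cost


def should_treat(p_infestation: float, treatment_cost: float, damage_cost: float) -> bool:
    """Treat only when the expected damage exceeds the treatment cost."""
    if not 0.0 <= p_infestation <= 1.0:
        raise ValueError("p_infestation must be a probability in [0, 1]")
    return p_infestation * damage_cost > treatment_cost


# With a treatment-to-damage cost ratio of 1 to 1.67, the break-even
# probability under this rule is 1 / 1.67, a little under 0.6.
threshold = break_even_probability(1.0, 1.67)
advice_high = should_treat(0.7, 1.0, 1.67)  # predicted probability above the threshold
advice_low = should_treat(0.5, 1.0, 1.67)   # predicted probability below the threshold
```

Under these assumptions, the larger the damage cost is relative to the treatment cost, the lower the predicted probability at which treatment becomes worthwhile.
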
To arrive at this successful model, three critical steps had to be taken. First, we had to
properly formulate the prediction task both in terms of exactly what we were trying to
predict (i.e., the probability of infestation) and the spatial area over which we could make
predictions (i.e., areas within a 6.77 km radius of a weather station). Second, we had to
define and extract a set of features that incorporated knowledge of the grasshopper life
cycle. Third, we had to employ evaluation metrics that were able to measure small
improvements in the quality of predictions.
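
As an illustration of the spatial restriction in the first step, the sketch below checks whether a site lies within 6.77 km of a weather station. It is a hedged example only: the thesis does not say how distances were computed, so the great-circle (haversine) formula, the mean Earth radius of 6371 km, and the coordinates used are all assumptions introduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius; an assumed constant
STATION_RADIUS_KM = 6.77   # prediction radius stated in the text


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_station_radius(site, station, radius_km: float = STATION_RADIUS_KM) -> bool:
    """True if the (lat, lon) site lies within radius_km of the (lat, lon) station."""
    return haversine_km(site[0], site[1], station[0], station[1]) <= radius_km


# Hypothetical coordinates, chosen only to exercise the check.
station = (44.00, -118.00)
near_site = (44.05, -118.00)  # about 5.6 km north of the station
far_site = (44.10, -118.00)   # about 11.1 km north of the station

near_ok = within_station_radius(near_site, station)
far_ok = within_station_radius(far_site, station)
```
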
The study identified important directions for future research. In the area of grasshopper
ecology, there is a need for improved data-gathering tools, including a much denser and
more widespread network of weather stations. These stations should also measure
subsoil temperatures. Recording the hatching dates of grasshopper nymphs would
also be very valuable. In machine learning, methods are needed for automating the
definition and extraction of features guided by qualitative knowledge of the domain, such
as our qualitative knowledge of the grasshopper life cycle.