Grasshopper infestation prediction: an application of data mining to ecological modeling


Abstract or Summary
  • This thesis presents a case study of applying machine learning tools to build a predictive model of annual grasshopper infestations in Eastern Oregon. The purpose of the study was twofold. First, we wanted to develop a predictive model. Second, we wanted to explore the capabilities of existing machine learning tools and identify areas where further research was needed. The study succeeded in constructing a model with a modest ability to predict future grasshopper infestations and to advise farmers on whether to treat their croplands with pesticides. Our analysis of the learned model shows that it should be able to provide useful advice if the ratio of treatment cost to crop damage cost is 1 to 1.67 or more. However, there is some evidence that the model is not able to make good predictions in years with extremely high levels of grasshopper infestation. To arrive at this successful model, three critical steps had to be taken. First, we had to properly formulate the prediction task, both in terms of exactly what we were trying to predict (i.e., the probability of infestation) and the spatial area over which we could make predictions (i.e., areas within a 6.77 km radius of a weather station). Second, we had to define and extract a set of features that incorporated knowledge of the grasshopper life cycle. Third, we had to employ evaluation metrics that were able to measure small improvements in the quality of predictions. The study identified important directions for future research. In the area of grasshopper ecology, there is a need for improved data gathering tools, including a much denser and more widespread network of weather stations. These stations should also measure subsoil temperatures. Recording the hatching dates of grasshopper nymphs would also be very valuable. In machine learning, methods are needed for automating the definition and extraction of features guided by qualitative knowledge of the domain, such as our qualitative knowledge of the grasshopper life cycle.
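The cost-ratio statement in the abstract can be read through a standard expected-cost decision rule. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, treatment is assumed to prevent all crop damage, and the abstract does not say that the thesis uses exactly this rule. Under these assumptions, treating pays off when the predicted probability of infestation times the damage cost exceeds the treatment cost.

```python
def break_even_probability(treatment_cost: float, damage_cost: float) -> float:
    """Probability of infestation at which treating and not treating cost the same.

    Assumption (not from the thesis): treatment fully prevents the damage.
    """
    if damage_cost <= 0:
        raise ValueError("damage_cost must be positive")
    return treatment_cost / damage_cost


def should_treat(p_infestation: float, treatment_cost: float, damage_cost: float) -> bool:
    """Return True when the expected damage avoided exceeds the treatment cost."""
    return p_infestation * damage_cost > treatment_cost


# Illustrative costs at the 1 : 1.67 ratio mentioned in the abstract.
threshold = break_even_probability(1.0, 1.67)
print(f"break-even probability: {threshold:.3f}")
print(should_treat(0.7, 1.0, 1.67))  # predicted probability above the threshold
print(should_treat(0.5, 1.0, 1.67))  # predicted probability below the threshold
```

With these illustrative costs, the break-even probability is about 0.6, so the simplified rule only recommends treatment when the predicted probability of infestation is above that level.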
Table of Contents
  • Abstract
    Table of Contents
    Chapter 1 Introduction
      1.1 The Grasshopper Infestation Prediction Problem
      1.2 The Data Mining Process
    Chapter 2 Data
      2.1 Sources of Data
        2.1.1 DBMS
        2.1.2 Text file
        2.1.3 Binary file
      2.2 Imperfections in the data
        2.2.1 Missing values
        2.2.2 Partially Missing Values
        2.2.3 No Value
        2.2.4 Noise and Uncertainty
        2.2.5 Missing attributes
        2.2.6 Dynamic data
    Chapter 3 Feature Definition and Extraction
      3.1 Issues in Feature Extraction
        3.1.1 Reducing Dimensionality
        3.1.2 Defining Useful Features
        3.1.3 Incorporating Background Knowledge
      3.2 Methods for Feature Extraction
        3.2.1 Semi-automated Feature Extraction
        3.2.2 Manual Feature Extraction
        3.2.3 Time and Spatial Feature Extraction
        3.2.4 Types of feature extraction
        3.2.5 Tuning up the feature
    Chapter 4 Learning Algorithms
      4.1 The Learning Problem
        4.1.1 Training Examples, Classes, etc.
      4.2 The Fundamental Tradeoff
        4.2.1 Tradeoff between Number of Examples, Size of Hypothesis Space, Accuracy of Result
        4.2.2 Overfitting
      4.3 Decision Tree Algorithms
        4.3.1 Growing
        4.3.2 Pruning
      4.4 Regression Tree Algorithms
    Chapter 5 Grasshopper Infestation Prediction
      5.1 Linear Regression and State-wide Models
        5.1.1 Modeling and Methods
        5.1.2 Results
      5.2 Decision Tree and Grid Site Model
        5.2.1 Models and Methods
        5.2.2 Results
      5.3 Regression Tree and Region Site Model
        5.3.1 Models and Methods
        5.3.2 Results
      5.4 Decision Tree and Weather Station Site Model
        5.4.1 Models and Methods
        5.4.2 Results
      5.5 Probabilistic Decision Trees and the Weather Station Site Model
        5.5.1 Models and Methods
        5.5.2 Results
    Chapter 6 Conclusions and Future Improvements
    Appendix A: Pictures of Grasshopper Infestation in Eastern Oregon Problem
    Appendix B: Summary of Study, Processing and Modeling in the Grasshopper Infestation Project
      B.1 Data Sources
      B.2 Imperfect Data Treatments
      B.3 Data Manipulations and Productions
      B.4 Feature Extractions
      B.6 Prediction and Evaluation
    Appendix C: Feature Set in Weather Station Site Model
      C.1 List and Definitions
      C.2 Information Gain Performance
    References
Additional Information
  • description.provenance : Approved for entry into archive by Linda Kathman on 2009-06-03T22:55:31Z (GMT). No. of bitstreams: 1. Waranun_Bunjongsat.pdf: 605775 bytes, checksum: 2286b1357a4d3c83a93b85a2c765d1ed (MD5)
  • description.provenance : Submitted by Philip Vue on 2009-06-03T18:33:46Z. No. of bitstreams: 1. Waranun_Bunjongsat.pdf: 605775 bytes, checksum: 2286b1357a4d3c83a93b85a2c765d1ed (MD5)
  • description.provenance : Made available in DSpace on 2009-06-03T22:55:31Z (GMT). No. of bitstreams: 1. Waranun_Bunjongsat.pdf: 605775 bytes, checksum: 2286b1357a4d3c83a93b85a2c765d1ed (MD5)


