Crop type prediction based on farmers declarations

Jong, S. de

View/Open

Thesis - crop type prediction based on farmers declarations - Lineke de Jong - 13 augustus 2018.pdf (4.161Mb)

Publication date

2018

Author

Jong, S. de

Summary

Every year, Dutch agricultural businesses submit a range of agricultural data, such as crop type and parcel geometry, to RVO.nl in accordance with the Common Agricultural Policy (CAP). This annual declaration is an enormous burden for the agricultural entrepreneur. To lessen this burden, RVO.nl looked for ways to simplify the declaration. One of these efforts was predicting the crop types cultivated in the next year from historical declaration information. Crop type is very detailed information: in 2017, 364 individual crop types could be declared in the Netherlands. Earlier research by RVO.nl in 2016 tried to predict crop types using crop rotation schemas extracted from historical declaration information, but only very few rotation schemas were found. One reason for this was the change in crop type codification during the research period. This research therefore focused on identifying the length of crop rotation schemas in the declaration information and assessed whether this information improved crop type prediction for the declaration year 2017. All declared crop type information was gathered for the research period 2008 to 2017 and combined with additional variables: farm type, soil type, groundwater level, climate data, user identifier and organic farming. Sixty crop types were adjusted for the change in crop type codification by using the corresponding crop type codes from 2017 instead. Furthermore, a study area with a variety of crop rotation schemas was selected. Search patterns were then used to find repetitive patterns in the crop sequences derived from the historical declaration information, in order to identify the length of crop rotation schemas. This length was stored as a new variable for predicting crop types. All input variables contained nominal values.
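The search for repetitive patterns in a crop sequence could be sketched as follows. This is a minimal illustration and not the method used in the thesis: the function name `rotation_length`, the crop names, and the rule itself (the smallest period that repeats over the whole sequence for at least `min_cycles` full cycles) are assumptions made for the example.

```python
from typing import Optional, Sequence


def rotation_length(crops: Sequence[str], min_cycles: int = 2) -> Optional[int]:
    """Return the smallest period p with crops[i] == crops[i - p] for all i >= p.

    Only periods that fit at least `min_cycles` times into the sequence are
    considered. Returns None when no such period exists.
    Hypothetical helper: the thesis does not specify its search patterns.
    """
    if min_cycles < 2:
        raise ValueError("min_cycles must be at least 2")
    n = len(crops)
    for period in range(1, n // min_cycles + 1):
        # The sequence is periodic if every year matches the year `period` earlier.
        if all(crops[i] == crops[i - period] for i in range(period, n)):
            return period
    return None


# Ten declaration years (2008 to 2017) for one hypothetical sub parcel.
sequence = ["potato", "wheat", "beet"] * 3 + ["potato"]
length = rotation_length(sequence)  # a three-year rotation
```

A period of 1 means the same crop was declared every year. A real declaration history would likely need a more tolerant rule, for example one that allows a single deviating year, since this sketch rejects any sequence that is not perfectly periodic.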
As a target variable was present (crop types of 2016), supervised modeling was applied. In order to predict crop types in the next year (crop types of 2017), several classification trees and neural network models were built. The property settings were fine-tuned to find the models that produced the lowest misclassification rates. A crop rotation schema was identified for 29,622 of the 185,108 sub parcels. Multi-way split classification trees produced the lowest misclassification rates. The prediction accuracy was highest for sub parcels where a crop rotation schema was identified (45.8%) and much lower for sub parcels where no crop rotation schema was found (31.3%). The average prediction accuracy of the declared parcels was 42.7%. The length of crop rotation schemas was relatively important in predicting crop types: it slightly increased the share of correctly classified sub parcels, by 0.9 percentage points, and it was the variable used most often in the splitting process during modeling. Class confusion was present in the prediction results. Crop types that frequently formed a crop rotation schema were often mixed up with each other. It was found that aggregating crop types would lead to a higher prediction accuracy. Furthermore, the use of class weights or priors in the modeling process might increase the prediction accuracy of minority classes. Additionally, it was assumed that the random forest modeling technique produces a higher prediction accuracy than a classification tree because it is less prone to overfitting the training data. It was also assumed that stratifying the research area might increase the prediction accuracy, based on the presumption that similar crop types grow within a small area. An area with similar crop types is assumed to be easier to predict.
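The evaluation measures mentioned above (prediction accuracy per group of sub parcels and class confusion between crop types) can be illustrated with a short sketch. The function names, the `has_rotation` flag and the toy data are assumptions for the example and do not reproduce the thesis's results or tooling.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Share of sub parcels whose predicted crop equals the declared crop."""
    if not actual:
        raise ValueError("no sub parcels to score")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def accuracy_by_group(actual, predicted, has_rotation):
    """Accuracy computed separately for sub parcels with and without a rotation."""
    groups = {True: ([], []), False: ([], [])}
    for a, p, flag in zip(actual, predicted, has_rotation):
        groups[bool(flag)][0].append(a)
        groups[bool(flag)][1].append(p)
    # Skip a group that has no sub parcels at all.
    return {flag: accuracy(a, p) for flag, (a, p) in groups.items() if a}


def confused_pairs(actual, predicted):
    """Count (declared, predicted) pairs for misclassified sub parcels."""
    return Counter((a, p) for a, p in zip(actual, predicted) if a != p)


# Toy data: six hypothetical sub parcels.
actual = ["potato", "wheat", "beet", "grass", "maize", "grass"]
predicted = ["potato", "beet", "beet", "grass", "grass", "maize"]
has_rotation = [True, True, True, False, False, False]

overall = accuracy(actual, predicted)
misclassification_rate = 1 - overall
per_group = accuracy_by_group(actual, predicted, has_rotation)
most_confused = confused_pairs(actual, predicted).most_common(3)
```

Listing the most frequent (declared, predicted) pairs is one simple way to see which crop types are mixed up with each other, and therefore which ones might be candidates for aggregation.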
The main conclusion is that the length of crop rotation schemas can be identified from historical declaration data and that the variable containing this length was relatively important in the modeling process. Using this variable in the prediction process improved the prediction accuracy by 0.9 percentage points. It is therefore concluded that this variable aids in predicting crop types in the next year. Suggestions are given to improve the current prediction accuracy of 42.7%.

URI

https://studenttheses.uu.nl/handle/20.500.12932/39932

Collections

Theses