Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Oosterom, P.J.M.
dc.contributor.advisor: Balado, J.
dc.contributor.author: Molleman, D.M.
dc.date.accessioned: 2021-08-23T18:00:24Z
dc.date.available: 2021-08-23T18:00:24Z
dc.date.issued: 2021
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/41034
dc.description.abstract: For military terrain analysis, a detailed soil map is needed to assess terrain accessibility during a mission. Predicting soil classes where no data is available is a difficult task. For this reason, the Random Forest algorithm was applied to predict the individual soil properties sand, silt, clay, coarse fragments, organic content, and cation exchange capacity, which can be combined into a Unified Soil Classification System (USCS) class. Using open-source data points in combination with explanatory variables that are available for all of Europe, the model is trained to predict soil property values at a spatial resolution of 30 meters in areas where no soil samples are present. The predictors include satellite imagery, spectral indices, hydrological data, and digital elevation models and their derivatives. The liquid limit and the plasticity index are both needed for the USCS but are not included in European soil samples, so they must be calculated from the other soil properties. A linear regression model based on clay content and cation exchange capacity was fitted on US data so that the two properties could be predicted in Europe. This model reached an R-squared of 0.842 for the liquid limit and 0.895 for the plasticity index. The study uses an innovative model validation method that ensures consistent validation statistics by generating subsets whose points are well distributed over the entire range of values and are also geographically dispersed. Hydrological predictors scored high in importance when predicting sand, silt, clay, and cation exchange capacity. Variation in coarse fragments was mostly explained by the digital elevation model and its derivatives, and the nitrogen content of a soil had the highest importance in predicting organic content.
Among the individual soil properties, the most variation was explained for clay content, at 68.5%, followed by silt and sand content (57.9% and 48.4%, respectively). For cation exchange capacity and organic content, explained variations of 40% and 44.5% were attained. The lowest R-squared was reached for coarse fragments, where only 38.1% of the variation was explained. The Random Forest algorithm proved effective in predicting soil properties from a limited number of samples while maintaining a spatial resolution of 30 meters. Additionally, an improved method for determining the Atterberg limits was developed for use in areas where no data on the limits is available. Furthermore, a validation method was constructed that provided consistent statistics describing explained variation.
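The regression step described in the abstract (estimating the liquid limit from clay content and cation exchange capacity) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the training rows, and the coefficients in TRUE_COEFS are all hypothetical, and the data is synthetic and noise-free, so the resulting R-squared says nothing about the values reported above.

```python
# Sketch only: the data and coefficients below are synthetic placeholders,
# not values from the thesis or from any soil database.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations.

    rows: list of (clay, cec) pairs. Returns [intercept, b_clay, b_cec].
    """
    design = [[1.0, clay, cec] for clay, cec in rows]
    k = 3
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, clay, cec):
    """Predicted liquid limit for one sample."""
    return coefs[0] + coefs[1] * clay + coefs[2] * cec

def r_squared(coefs, rows, targets):
    """Share of variation in the targets explained by the fitted model."""
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - predict(coefs, *row)) ** 2 for row, y in zip(rows, targets))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot

# Hypothetical coefficients used only to generate synthetic training targets.
TRUE_COEFS = [10.0, 0.5, 0.8]
# Hypothetical (clay %, cation exchange capacity) samples.
training_rows = [(5.0, 4.0), (12.0, 9.0), (20.0, 11.0),
                 (28.0, 20.0), (35.0, 18.0), (45.0, 30.0)]
training_ll = [predict(TRUE_COEFS, clay, cec) for clay, cec in training_rows]

coefs = fit_ols(training_rows, training_ll)
r2 = r_squared(coefs, training_rows, training_ll)
```

In this sketch the fitted coefficients can then be applied to samples that lack a measured liquid limit, for example predict(coefs, 20.0, 10.0). The same structure would apply to the plasticity index with a second set of targets.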
dc.description.sponsorship: Utrecht University
dc.format.extent: 11374228
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.title: Who can command the Random Forest and make the trees pull Data out of the earth?: Predicting soil types through Random Forest machine learning using open-source data
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: predicting soil properties, USCS, liquid limit, plasticity index, Random Forest, machine learning, remote sensing, open-source data.
dc.subject.courseuu: Geographical Information Management and Applications (GIMA)

