Floran Baumgartner will present his M.Sc. thesis “The potential of Sentinel-2 time series for yield estimation of a perennial wild plant mix-ture using machine learning” on Friday 29th of October at 10am.
From the abstract: “Monocultures are generally accompanied by losses of small-scale structures, ecosystem services and biodiversity (Tamms et al., 2020). One strategy to combat these effects is seen in perennial wild plant mixtures (WPMs), like ‘Hanfmix’ (Vollrath & Marzini, 2016), which promote biodiversity and provide food and habitat for the local fauna. WPMs consist of around 30 different annual and perennial plant species (Table 1). They regrow after each harvest for about five years (Degenbeck, 2020). External and internal factors, like climate conditions and interactions between species, influence the composition of the mixture. After the first year of growth annual plants are replaced by perennial species and the composition further changes (Figure 1). This concept contrasts from the monocultural farming methods and introduces new aspects and uncertainties into the analysis by remote sensing (RS) methods, like the perennial growth times and a constantly changing species composition. The WPM is a relatively new study subject and RS methods provide the tools for studying ‘Hanfmix’ on a large scale and over long study periods. This study explores the potential of RS, in the form of Sentinel-2 (S2) NDVI time series, to predict the crop yield of ‘Hanfmix’. It was assessed for a study period of four years between 2017 and 2020 in the region Rhön-Grabfeld in the North of Bavaria, Germany. Weekly NDVI values were derived from S2 time series and information on the climatic conditions, as well as the topography, among other variables, was obtained for each field. Three well established regression models were tested: a multiple linear regression model, as well as the machine learning methods random forest (RF) and cforest (CF). The performance of the models trained on datasets with, and without, NDVI time series was compared to determine the influence of RS on the prediction accuracy. The results show that it is possible to create a better model using only NDVI time series, then using a large dataset containing site-related data and climate time series which does not include RS information. The best modeling and prediction accuracies were reached when training on the dataset containing all available variables. However, a suitable model is required which is not prone to overfitting when confronted with highly dimensional data. The linear regression model produced very high R²adj values of 0.72 but showed signs of overfitting. In that regard, the machine learning approaches are preferrable. Relatively acceptable model accuracies were achieved with the machine approaches trained on the combined database: mean R² values of 0.52 for the RF and 0.51 for the CF, as well as mean RMSE values of 1.78 t ha-1 for the RF and 1.83 t ha-1 for the CF model were reached. These results were obtained despite restrictions of the model setup which introduced uncertainties for several variables, like the cutting height of the crop, as well as varying numbers of fields sowed per year. A key feature of the RF and CF models is that they provide variable importance measures, which quantify of the impact that the different groups of variables have on the model accuracy. The results reinforced the findings of the model performance comparisons: overall, the highest variable importance ranks are occupied by RS time series variables. Especially the dynamic NDVImax, representing the highest NDVI value of each field at a flexible time step, reached the highest variable importance ranks.”
1st Supervisor and Mentor: Dr. Thorsten Dahms
2nd Supervisor: Dr. Sarah Schönbrodt-Stitt