---
title: "Classification in R"
author: "Stef Lhermitte"
date: "2023-03-14"
output:
  pdf_document: default
  html_document: default
editor_options:
  chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Classification in R
## Introduction
In this practical session we will learn how to:
- download (remote sensing) data from Google Earth Engine
- visualize and preliminarily analyse the data
- apply different machine learning techniques
In order to run this notebook you will need a Google Earth Engine (GEE) account. If you do not have one yet, create one at the [GEE webpage](https://signup.earthengine.google.com/). GEE combines a multi-petabyte catalog of satellite imagery and geospatial data sets with planetary-scale analysis capabilities and makes it available for scientists, researchers, and developers to detect changes, map trends, and quantify differences on the Earth's surface. The [public data archive](https://developers.google.com/earth-engine/datasets) includes more than thirty years of historical imagery and scientific data sets, updated and expanded daily. It contains over twenty petabytes of geospatial data instantly available for analysis; if you are looking for a specific data set, I recommend checking the [data catalog](https://developers.google.com/earth-engine/datasets).
In this notebook, we will limit the use of GEE to downloading different data sources and subsequently work with the data on your local computer. GEE offers many more opportunities and functions to process, analyze and visualize data in the cloud, but covering these is beyond the scope of this session.
Once the data are downloaded, we will apply different types of classifiers/regressions to the data so that you understand the different concepts of machine-learning-based classification in practice.
## Before you start
Before starting this notebook, make sure that you install the necessary packages. In the next chunk, we load the necessary packages. If you encounter an error message when running the `library()` function in R, it may be because the required package is not installed on your computer. To install the package, you can use the `install.packages()` function. For example, if you receive an error when running `library(raster)`, you can install the package by typing `install.packages("raster")` in the console. After installation, you can load the package by running `library(raster)` again. Ensuring that all necessary packages are installed before running your R code avoids errors and keeps your code running smoothly.
Additionally, you can also install packages using the R GUI by selecting the `Packages` menu and clicking on `Install`. From there, you can search for and install the desired package.
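If you prefer to automate this, the following minimal sketch installs whatever is still missing; the package list simply mirrors the `library()` calls in the next chunk.
```{r install-missing, eval=FALSE}
# Install any required packages that are not yet available on this machine
pkgs <- c("raster", "terra", "dplyr", "reshape2", "data.table", "leafsync",
          "glcm", "RStoolbox", "mapview", "tmap", "ggplot2", "ggridges",
          "plotly", "caret")
missing <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing) > 0) install.packages(missing)
```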
```{r load-packages, message=FALSE, warning=FALSE}
# Packages for spatial data processing
library(raster)
library(terra)
library(dplyr)
library(reshape2)
library(data.table)
library(leafsync)
library(glcm)
library(RStoolbox)
# Packages for visualization
library(mapview)
library(tmap)
library(ggplot2)
library(ggridges)
library(plotly)
# Machine learning packages
library(caret)
# set the temporary folder for raster package operations
rasterOptions(tmpdir = "./cache/temp")
```
Finally, before starting our notebook we want to run it from the correct location on our computer by changing the working directory with `setwd`. The notebook will work optimally if you run it from the location of the unzipped zip file of this practical session.
```{r}
# Change the working directory with setwd
# setwd(location/of/your/unzipped/labZipfile/)
setwd('C:/Users/u0132684/Documents/Lab/Lab')
# Check the working directory by printing it
getwd()
```
## Step 1: Pre-process remote sensing data on GEE
We will do Step 1 and Step 2 in the Google Earth Engine (GEE) code editor (<https://code.earthengine.google.com/>). First, we explore GEE and then export an image to your Google Drive. We can then download the image and import it into R to use it from Step 3 onwards in this notebook.
Link to GEE script: <https://code.earthengine.google.com/2b95fd1bcbe1fe53b0844661c7921bc3>
## Step 2: Download data from GEE
→ GEE code editor
## Step 3: Read in downloaded data
Now we can import the .tif file.
```{r load-raster}
s2_rl = raster::brick('s2_r.tif')
s2_rl
```
## Step 4: Visualize the data
#### Interactive visualization
Subsequently, you can visualize the data on an interactive map using the [mapview](https://r-spatial.github.io/mapview/) package
```{r plot-interactive}
# use viewRGB to view RGB images interactively
mapview::viewRGB(s2_rl, b=2, g=3, r=4, maxpixels=1242110)
# use mapview to view single-band images interactively
mapview::mapview(s2_rl[[1]])
# Combine maps by using the + symbol
mapview::viewRGB(s2_rl, b=2, g=3, r=4, maxpixels=1242110) +
mapview(s2_rl[[8]], maxpixels=1242110)
```
#### Static visualization
Or a static map using the `tmap` [package](https://r.geocompx.org/adv-map.html#static-maps):
```{r plot-static-tm}
# Create a tmap object (map_1) with a raster layer (s2_rl) colored using a custom RGB color scheme based on the layer's values
map_1 = tmap::tm_shape((s2_rl/5500*255)) +
# Set the raster layer as the main layer and scale its colors according to the values in the layer divided by 5500 and multiplied by 255
tmap::tm_rgb(r = 4, g = 3, b = 2) +
# Use band 4 as red, band 3 as green and band 2 as blue
tmap::tm_scale_bar(position = c("left", "bottom"))
# Add a scale bar to the map in the bottom-left corner
# Create another tmap object (map_2) with a raster layer (s2_rl[[4]]) overlaid
map_2 = tmap::tm_shape(s2_rl[[4]]) +
# Set the raster layer as the main layer
tmap::tm_raster(alpha = 0.7, # Set the opacity of the raster layer to 0.7
palette = "YlGn", # Use a color palette called "YlGn"
legend.show = TRUE) # Display a legend for the raster layer
# Combine the two tmap objects (map_1 and map_2) into a single layout
tmap_arrange(map_1, map_2)
```
Or alternatively use the functions of the raster package such as [`plotRGB`](https://www.rdocumentation.org/packages/raster/versions/3.6-14/topics/plotRGB)
```{r plot-static-raster}
raster::plotRGB(s2_rl,2 , 3, 4, stretch="lin", scale=5000)
```
Now you know how to download, read in and visualize data. In the next part of the notebook, we will see how to further explore this data and finally classify this data into information.
## Step 5: Explore the data
When using the `raster::click` function we can explore some of the pixel values. However, you have to make sure the chunk output is plotted in the console and not inline in the notebook. This implies clicking on the preferences wheel above and setting the Chunk Output in Console:
*(screenshot of the RStudio preferences menu with "Chunk Output in Console" selected)*
Once this is done you can visualize the image and use an interactive clicking function to identify the pixels that we want to extract spectral signatures for.
```{r}
# Plot the data
dev.new() # In RStudio, the raster::click()/terra::click() functions may result in an offset between the clicked point and the recorded one.
raster::plotRGB(s2_rl, 2, 3, 4, stretch="lin", scale=5000)
# Change plotting parameters to better see the points and numbers generated from clicking
par(col="red", cex=3)
# Use the 'click' function
ct = raster::click(s2_rl, id=T, xy=T, cell=T, type="p", pch=16, col="magenta", col.lab="red")
```
Once you have clicked your five points, press the `ESC` key to save your clicked points and close the function before moving on to the next step. If you make a mistake in this step, run the chunk again to start over.
Here we see the output of our clicking:
```{r show-spectra-data}
head(ct)
```
and subsequently we can subset it to get only the pixel values. From here on it might be better to switch back to `Chunk Output Inline`.
```{r}
# Select the spectral data (columns 3 to 15) from the data frame called ct and assign it to a new data frame called dat
dat = ct[,3:15]
# Create a vector of wavelengths that correspond to the columns of dat and include the id column
wavelengths=c("id",443,490,560,665,705,740,783,835,865,945,1610,2190)
# Rename the columns of dat with the values of the wavelengths vector
colnames(dat)=wavelengths
# Print the structure of dat, which includes information on the data types and column names
str(dat)
# Reshape the data frame from wide to long format using melt()
# Use "id" column as the identifier variable and create a new column called "Reflectance" for the values
Pixel.melt <- reshape2::melt(dat, id.vars = "id", value.name ="Reflectance")
# Convert the "variable" column (which contains the wavelengths) to numeric and assign it to a new column called "wl"
Pixel.melt$wl <- as.numeric(as.character(Pixel.melt$variable))
# Convert the "id" column to a factor and assign it to a new column called "idf"
Pixel.melt$idf <- as.factor(Pixel.melt$id)
# Print the structure of Pixel.melt, which includes information on the data types and column names
str(Pixel.melt)
# Create a ggplot object with no data or mappings specified
ggplot() +
# Add a line layer using the data from Pixel.melt, mapping wl to x, Reflectance to y, and idf to color
# Set the line width to 1
geom_line(data = Pixel.melt,
mapping = aes(x=wl, y=Reflectance, color=idf),
lwd=1)+
# Add a point layer using the data from Pixel.melt, mapping wl to x, Reflectance to y, and idf to color
# Set the point shape to a filled circle (pch=20) and the size to 5
geom_point(data = Pixel.melt,
mapping = aes(x=wl, y=Reflectance, color=idf),
pch=20,
size=5)+
# Add a legend for the color mapping with the label "Pixel id"
labs(color = "Pixel id")+
# Add a plot title "Land cover spectral signatures"
ggtitle("Land cover spectral signatures")+
# Center the plot title and set the font size to 20
theme(plot.title = element_text(hjust = 0.5, size=20))+
# Add an x-axis label "Wavelength"
xlab("Wavelength")
```
Alternatively, we can translate the entire Sentinel-2 image, which is a `raster` data set, into a `data.table` to plot the different band histograms.
```{r}
# Convert the Sentinel-2 raster image s2_rl to a data table with columns: x, y, and the pixel values (one column for each band)
dat_all <- data.table::as.data.table(rasterToPoints(s2_rl))
# Add a column with the pixel id
dat_all <- dat_all %>% mutate(id = row_number())
# Select the spectral data (columns 3 to 14) and the id (column 15) from dat_all
dat_all = dat_all[,c(3:15)]
# Create a vector of wavelengths that correspond to the columns of dat_all and include the id column
wavelengths=c(443,490,560,665,705,740,783,835,865,945,1610,2190,"id")
# Rename the columns of dat_all with the values of the wavelengths vector
colnames(dat_all)=wavelengths
# Remove any rows with missing values from dat_all
dat_all <- na.omit(dat_all)
```
Now we are going to make a ridgeline plot, which visualizes the density of the different band distributions along a common axis using a series of vertically stacked curves. This allows easy comparison of the distributions as well as of their shapes. Compared to histograms, ridgeline plots provide a more detailed view of each variable's distribution by showing the density of the values in addition to their frequency. This can be especially useful when comparing multiple distributions, as ridgeline plots can help identify differences in distributional shape that might not be apparent in histograms. Additionally, the stacked nature of ridgeline plots makes it easy to compare the distribution of a variable across different groups, such as the different bands here.
```{r}
# Convert the wide format of 'dat_all' into a long format using the 'melt()' function
# 'id.vars = "id"' specifies that the 'id' column should be used as the identifier variable,
# 'value.name = "Reflectance"' specifies that the melted column should be named 'Reflectance',
dat_mlt = reshape2::melt(dat_all, id.vars = "id", value.name ="Reflectance")
# Create a ridgeline plot of the 'Reflectance' values for each wavelength in the 'dat_mlt' data frame
# using the 'geom_density_ridges()' function from the 'ggridges' package.
# The 'Reflectance' values are plotted on the x-axis, and the wavelength groups are plotted on the y-axis.
# The resulting plot is displayed in the output.
ggplot(dat_mlt, aes(x=Reflectance, y=variable)) + geom_density_ridges()
```
## Step 6: Prepare data for classification
### Download training data
→ GEE code editor
Now save it to our hard disk and re-read it.
```{r}
# Read raster from computer
lc_r = raster::raster('lc_r.tif')
```
### Sample training data
So now, we have both our training data `X` (i.e., spectral values per pixel) and `Y` (i.e., class per pixel) downloaded to our computer and from that we need to create a training and testing set for our supervised classifier. To do so, we are first going to sample our `Y` into a random set of samples and subsequently get the corresponding `X` for each pixel.
```{r}
set.seed(321) # set the seed for reproducibility
# Convert the raster object 'lc_r' into a spatial points data frame using the 'rasterToPoints()' function,
# and save the resulting data table to the variable 'poly_dt' as a data table using 'as.data.table()'.
poly_dt <- data.table::as.data.table(rasterToPoints(lc_r))
# Select a random sample of 10,000 points from the 'poly_dt' data table using the 'sample()' function,
# and save the resulting data table back to the variable 'poly_dt'.
poly_dt = poly_dt[sample(1:nrow(poly_dt), 10000), ]
# Rename the column 'X2021_Map' (the WorldCover band name) to 'id_cls' in the 'poly_dt' data table using the 'setnames()' function.
data.table::setnames(poly_dt, old = "X2021_Map", new = "id_cls")
# Create a new spatial points data frame 'points' using the 'SpatialPointsDataFrame()' function,
# with the x and y coordinates taken from the 'poly_dt' data table and the attribute data and projection taken from the same table.
points <- sp::SpatialPointsDataFrame(coords = poly_dt[, .(x, y)],
data = poly_dt,
proj4string = lc_r@crs)
# Print the first few rows of the 'points' data frame to the console using the 'head()' function.
# The output actually shows for every point the x,y coordinates and corresponding class.
head(points)
```
Once we have the sample of `Y` points, we can start to sample the data from `X` at the location of the `Y` points
```{r}
# Extract the attribute data from the spatial points data frame 'points' in the raster object 's2_rl' using the 'extract()' function,
# and save the resulting data as a data table using the 'as.data.table()' function.
dt <- s2_rl %>%
extract(y = points) %>%
as.data.table()
# Add a new column to the 'dt' data table called 'class', which is based on the 'id_cls' attribute of the original spatial points data frame.
# The 'make.names()' function is used to ensure that the factor levels are valid column names.
dt[, class := make.names(factor(points@data$id_cls))]
# Print the first few rows of the 'dt' data table to the console using the 'head()' function.
head(dt)
```
So this resulting table `dt` gives us for every sample the spectral reflectance and corresponding class value. Since some pixels might still contain `NA` values that might mess up our classifiers, we need to remove the rows that contain `NA`'s.
```{r}
# Print the dimension before removing NA
dim(dt)
# Remove NA rows
dt <- na.omit(dt)
# Print the dimension after removing NA
print(dim(dt))
table(dt$class)
```
We can now calculate the mean reflectance per class
```{r}
# calculate the mean for each group and each variable
df_mean <- dt %>%
group_by(class) %>%
summarize_all(mean)
# view the resulting data frame
df_mean
```
#### Q1: Reflectance per class
Your task now is to plot the reflectance as a function of the wavelength for every class and to compare the spectral profiles. A possible starting point is sketched below.
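A minimal sketch of one possible approach, assuming the band columns of `df_mean` appear in the same wavelength order as used earlier (verify this for your own export before trusting the plot):
```{r q1-sketch, eval=FALSE}
# Reshape the class means to long format and attach the wavelengths
wl <- c(443, 490, 560, 665, 705, 740, 783, 835, 865, 945, 1610, 2190)
mean_mlt <- reshape2::melt(df_mean, id.vars = "class", value.name = "Reflectance")
# melt() stacks band by band, so each wavelength repeats once per class
mean_mlt$wl <- rep(wl, each = nrow(df_mean))
ggplot(mean_mlt, aes(x = wl, y = Reflectance, color = class)) +
  geom_line(lwd = 1) +
  geom_point(pch = 20, size = 3) +
  labs(color = "Class") +
  ggtitle("Mean spectral signature per class") +
  xlab("Wavelength")
```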
## Step 7: Classification introduction
After downloading and preparing the data we can start with the actual classification. Therefore, we are first performing a stratified random split of the data into training and testing sets. Splitting the data into training and testing sets is a crucial step in building and evaluating machine learning models. The purpose of splitting the data is to evaluate the performance of the model on unseen data. The model is trained and fine-tuned on the training set and then evaluated on the testing set. This test set provides an estimate of how well the model can generalize to new, unseen data. If the model performs well on the testing set, it suggests that it has learned meaningful patterns in the data and can generalize to new data. On the other hand, if the model performs poorly on the testing set, it suggests that it has overfit to the training data and is not able to generalize to new data.
```{r}
set.seed(321) # set the seed for reproducibility
# create a stratified random split of the data into training and testing sets
idx_train <- caret::createDataPartition(dt$class, # variable to stratify by
p = 0.7, # proportion of data to use for training
list = FALSE) # output as vector, not list
# subset the original data to create the training and testing sets
dt_train <- dt[idx_train] # subset the training set
dt_test <- dt[-idx_train] # subset the testing set
# create a frequency table of the class variable in the training set
print(table(dt_train$class))
print(table(dt_test$class))
# remove classes with too few samples (only needed when the Random Forest model gives a lot of warnings, see below)
# e.g. dt_train <- dt_train[!dt_train$class == 'X80', ]
```
The training dataset is used for carrying out cross-validation (CV) and a grid search for model tuning. Once the optimal parameters are found, a final model is fit to the entire training dataset using those settings. We can then check how these final models behave on unseen data (the testing dataset). For a fair comparison between multiple models, the CV indices need to match, so the same folds will be passed to the trainControl argument for each type of model.
```{r}
# create cross-validation folds (splits the data into n random groups)
n_folds <- 10 # number of folds
folds <- caret::createFolds(1:nrow(dt_train), k = n_folds) # create n_folds folds of the data
```
The `n_folds` variable determines the number of folds to use in the cross-validation process. The `folds <- createFolds(1:nrow(dt_train), k = n_folds)` line creates `n_folds` random groups (or "folds") of the data for cross-validation purposes.
Subsequently, we can set the seed for each iteration of the folds in a `seeds` variable that will store the random number seeds for each resampling iteration. This ensures that each resampling iteration has a unique random seed.
```{r}
# Set the seed at each resampling iteration. Useful when running CV in parallel.
seeds <- vector(mode = "list", length = n_folds + 1) # create empty list for seeds, +1 for the final model
for(i in 1:n_folds) seeds[[i]] <- sample.int(1000, n_folds) # set seeds for each fold
seeds[n_folds + 1] <- sample.int(1000, 1) # set seed for final model
```
### Control objects
Subsequently, we are going to use the `caret` package to train our classifiers. The `caret` (Classification And REgression Training) package in R is a powerful tool for building and evaluating machine learning models. Some advantages of using the caret package are:
- Streamlined workflow: The caret package provides a consistent, streamlined workflow for building and evaluating machine learning models, with a consistent interface for many different modeling algorithms. As such, it allows you to test completely different classification algorithms (each originally designed with its own syntax) using a very similar syntax.
- Cross-validation: The caret package makes it easy to perform cross-validation, a key step in model evaluation and hyper-parameter tuning, with support for different types of cross-validation such as k-fold, repeated k-fold, and leave-one-out cross-validation.
- Hyperparameter tuning: The caret package provides tools for tuning hyper-parameters of machine learning models, which can greatly improve model performance. The train() function in caret can automatically search through a grid of hyper-parameter values and select the best-performing combination.
Here we will start by setting up a control object for a machine learning model using cross-validation. Such a control object is important for being able to compare the different types of models. The `trainControl()` function is used to create the control object, and it includes several arguments:
- `summaryFunction`: specifies a summary function to be used in evaluating the model's performance. Here, multiClassSummary is used for multi-class classification problems.
- `method`: specifies the resampling method to be used. Here, cv indicates that k-fold cross-validation will be used.
- `number`: specifies the number of folds to be used in cross-validation. The value of n_folds is passed to this argument.
- `search`: specifies the search method to be used for hyper-parameter tuning. Here, grid indicates that a grid search will be used.
- `classProbs`: specifies whether to include class probabilities in the model output. It is set to TRUE, but a warning will be issued because it is not implemented for SVM models.
- `savePredictions`: specifies whether to save the predicted values for each fold. It is set to TRUE.
- `index`: specifies the resampling indexes to be used. The folds variable is passed to this argument.
- `seeds`: specifies the random seed values to be used. The seeds variable is passed to this argument.
```{r set-trainControl}
ctrl <- trainControl(summaryFunction = multiClassSummary,
method = "cv",
number = n_folds,
search = "grid",
classProbs = TRUE, # not implemented for SVM; will just get a warning
savePredictions = TRUE,
index = folds,
seeds = seeds)
```
Once we have set up this control object for cross-validation we can start training the model.
## Step 8 : Random Forest classifier
We will start by training a Random Forest classifier using the `caret` package.
```{r}
# train a random forest model using caret::train()
model_rf <- caret::train(class ~ . , # class ~ .: specifies the formula for the model, where class is the target variable and . indicates that all other variables in the dt_train dataset should be used as predictors.
method = "rf", # method: specifies the machine learning method to be used. Here, rf indicates that a random forest model will be used.
data = dt_train, # data: specifies the data used for training the model.
importance = TRUE, # specifies whether to calculate variable importance.
tuneGrid = data.frame(mtry = c(2, 3, 4, 5, 8)), # specifies the hyperparameters to be tuned during cross-validation. Here, mtry (the number of variables to consider at each split) is varied across a range of values.
trControl = ctrl) # specifies the control object to be used for cross-validation.
# save the trained model as an RDS file
saveRDS(model_rf, file = "./cache/model_rf.rds")
# Load the model again
# model_rf = readRDS('./cache/model_rf.rds')
# Getting a lot of warnings or NA values? Remove classes with too few samples:
# dt_train <- dt_train[!dt_train$class == 'X80', ]
# dt_test <- dt_test[!dt_test$class == 'X80', ]
```
Tuning here was done via the `mtry` argument, which can vary from 2 up to the total number of predictors (bands) used (here, 12). So the optimization was done via cross-validation and grid search (by grid I refer to `tuneGrid = data.frame(mtry = c(2, 3, 4, 5, 8))`).
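The selected value and the full grid of cross-validated results are stored in the fitted `train` object:
```{r}
model_rf$bestTune  # the mtry value selected via cross-validation
model_rf$results   # cross-validated performance for every candidate mtry
```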
```{r}
model_rf$times$everything # Print the total computation time
```
```{r}
plot(model_rf) # Plot the tuning results
```
#### Confusion matrix
Now we can compute the confusion matrix and associated statistics using the test data. A confusion matrix indicates how "confused" the model is between the given classes and highlights instances in which one class is confused for another. The main (first) diagonal of the matrix shows the cases when the model is "correct". The next cell creates a confusion matrix `cm_rf` using the `confusionMatrix()` function from the caret package. The function takes as input two arguments:
- `data`: a vector of predicted class labels, obtained by applying the trained `model_rf` on the test dataset `dt_test` using the `predict()` function.
- `factor(dt_test$class,levels=...)`: the true class labels from the `dt_test` dataset, converted to a factor variable, where we can either hardcode the potential levels (e.g., `c('X10','X30','X40','X50','X60','X80')`) or get them from the training data (e.g., `levels(as.factor(dt_train$class))`)
```{r}
# Option 1: hardcode the potential class levels
cm_rf <- caret::confusionMatrix(data = predict(model_rf, newdata = dt_test),
                                factor(dt_test$class, levels = c('X10','X30','X40','X50','X60','X80')))
# Option 2 (equivalent here): take the levels from the training data
cm_rf <- caret::confusionMatrix(data = predict(model_rf, newdata = dt_test),
                                factor(dt_test$class, levels = levels(as.factor(dt_train$class))))
cm_rf
```
The output is a confusion matrix object that summarizes the number of true positives, true negatives, false positives, and false negatives for each class label. Moreover it gives:
- Accuracy: the overall proportion of correctly classified samples;
- Sensitivity (Recall): the true positive rate (the model correctly detects the class);
- Specificity: the true negative rate (the model correctly rejects the class).
See also [Wikipedia](https://en.wikipedia.org/wiki/Confusion_matrix) or `help(confusionMatrix)` for more details on confusion matrix terminology.
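As a quick sanity check, the overall accuracy can be recomputed by hand from the counts stored in the confusion matrix object (the diagonal divided by the total number of test samples):
```{r}
# Overall accuracy recomputed from the confusion matrix counts
sum(diag(cm_rf$table)) / sum(cm_rf$table)
# Per-class statistics (sensitivity, specificity, ...) are stored in:
cm_rf$byClass
```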
#### Predictor importance
Finally, we can determine the predictor importance of each `X` feature on the classification. Determining the importance of predictors when fitting a machine learning model provides valuable insights into the factors that are most relevant for predicting the target variable. This information can be used to improve the accuracy of the model, reduce the number of features needed for prediction, and gain a better understanding of the underlying data.
There are several ways to get importance values but in this notebook we will use the `caret::varImp()` method as it is a generic method, so will work also for other models. The simple rule with importance values is that higher values mean the variables are more important.
```{r}
# Compute variable importance scores for model_rf using varImp() function
caret::varImp(model_rf)$importance %>%
# Convert data frame to matrix
as.matrix() %>%
# Create heatmap visualization using plot_ly() function
plotly::plot_ly(x = colnames(.), y = rownames(.), z = ., type = "heatmap", width = 350, height = 300)
```
The predictor importance shows that some bands are more important than other variables for the classification.
#### Q2: Predictor importance
Compare the predictor importance with your earlier plot of the spectral profiles per class. Can you visually explain why these predictors are important for separating the different classes? A sketch to get you started follows below.
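One option among many for this comparison is to reshape the per-class importance matrix into long format and plot it against wavelength; this sketch assumes the rows of the importance matrix follow the same band order as the wavelength vector used earlier.
```{r q2-sketch, eval=FALSE}
# Reshape the per-class importance matrix to long format and plot it by wavelength
imp <- as.data.frame(caret::varImp(model_rf)$importance)
imp$wl <- c(443, 490, 560, 665, 705, 740, 783, 835, 865, 945, 1610, 2190)
imp_mlt <- reshape2::melt(imp, id.vars = "wl",
                          variable.name = "class", value.name = "importance")
ggplot(imp_mlt, aes(x = wl, y = importance, color = class)) +
  geom_line(lwd = 1) +
  xlab("Wavelength") +
  ggtitle("RF predictor importance per band and class")
```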
## Step 9: SVM
OK, so we have trained one RF model, but of course we can also train some completely different models. On the [caret help page](https://topepo.github.io/caret/available-models.html) you can find a complete list of model types that can be used in caret, together with their relevant characteristics (e.g., their tuning parameters).
For example, we can train a [support vector machine model](https://en.wikipedia.org/wiki/Support_vector_machine) (SVM). In this case we are going to fine tune two different SVM models with different values of the cost parameter and the Loss function to be used in the SVM model. The cost parameter in SVM determines the penalty for misclassification of training samples. It controls the balance between achieving a low training error and a low testing error. The loss function in SVM determines the type of error measure that the SVM model will optimize during training.
```{r}
# Grid of tuning parameters for SVM model
svm_grid <- expand.grid(cost = c(0.2, 0.5, 1), # Different values of cost parameter
Loss = c("L1", "L2")) # Different values of loss function
# Train an SVM model using caret package
model_svm <- caret::train(class ~ . , method = "svmLinear3", data = dt_train,
importance = TRUE, # specifies whether to calculate variable importance.
tuneGrid = svm_grid,
trControl = ctrl)
# Note that we did not create a separate ctrl object for SVM; recycling the one used for the
# random forest model works if you ignore the warning: "Class probabilities were requested
# for a model that does not implement them."
# Save the trained SVM model object to a file using saveRDS() function
saveRDS(model_svm, file = "./cache/model_svm.rds")
# Load the model again
# model_svm = readRDS('./cache/model_svm.rds')
```
Now we can again see which model performs best.
```{r}
plot(model_svm) # Plot the tuning results
```
Or get the confusion matrix
```{r}
# The confusion matrix using the test dataset
cm_svm <- confusionMatrix(data = predict(model_svm, newdata = dt_test), factor(dt_test$class,levels=levels(as.factor(dt_train$class))))
cm_svm
```
Here you notice that the SVM model performs less well than the RF model.
## Step 10: Neural Network
Finally, we can test a third type of ML classifier, namely a neural network with one hidden layer, where we will test which number of hidden units and which weight-decay value fit best.
```{r}
# define a grid of tuning parameters for the nnet model
nnet_grid <- expand.grid(size = c(5, 10, 15),
decay = c(0.001, 0.01, 0.1))
# train the nnet model using the defined grid of tuning parameters
model_nnet <- caret::train(class ~ ., method = 'nnet', data = dt_train,
maxit = 1000, # set high enough to be sure that the optimization converges
tuneGrid = nnet_grid,
trControl = ctrl)
# save the trained nnet model as an RDS file
saveRDS(model_nnet, file = "./cache/model_nnet.rds")
# Load the model again
# model_nnet = readRDS('./cache/model_nnet.rds')
```
Now we can again see which model performs best.
```{r}
plot(model_nnet) # Plot the tuning results
```
Or get the confusion matrix
```{r}
# The confusion matrix using the test dataset
cm_nnet <- confusionMatrix(data = predict(model_nnet, newdata = dt_test), factor(dt_test$class,levels=levels(as.factor(dt_train$class))))
cm_nnet
```
Here you notice that the NN model performs slightly worse than the RF model, but better than the SVM model.
## Step 11: Compare models performance
Once we have trained and fine-tuned the different models, we can compare them using the `resamples()` function. We can only do this as long as the resampling indices of the observations match (which we ensured by passing the same folds and seeds to each model). Here we compare the results obtained via cross-validation on the training data set when we tuned the models.
```{r}
# Create model_list
model_list <- list(rf = model_rf, svm = model_svm, nnet = model_nnet)
# Pass model_list to resamples()
resamples <- caret::resamples(model_list)
```
In general, the model with the higher median accuracy is the "winner"; a smaller range between minimum and maximum accuracy is also preferable.
```{r}
# All metrics with boxplots
bwplot(resamples)
```
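Besides the boxplots, you can summarize the resampling distributions numerically and, since the folds match across models, test the pairwise differences for statistical significance:
```{r}
summary(resamples)           # numeric summary of each metric per model
diff_res <- diff(resamples)  # paired differences between models on matched folds
summary(diff_res)            # includes p-values for the pairwise comparisons
```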
Here we see that in general the RF model performs best for our data set and tuning, so we will preferably use that model for later purposes if we want to continue with this. Remember however that there is typically not one best method for all data sets. So when you are faced with a new data set, it might be important to again compare different models and different hyper-parameter settings.
Finally, we can also plot the output of the different models. To do so, we first need to apply the trained models to every pixel in the image.
```{r}
predict_rf <- raster::predict(object = s2_rl, model = model_rf, type = 'raw')
predict_svm <- raster::predict(object = s2_rl, model = model_svm, type = 'raw')
predict_nnet <- raster::predict(object = s2_rl, model = model_nnet, type = 'raw')
```
These lines use the trained machine learning models to predict the class of each pixel in a raster object. Specifically, the `raster::predict()` function is called for each model, passing the trained model and the raster object `s2_rl` as input. The type = 'raw' argument indicates that we want to get the raw predicted values (not the class probabilities). The predicted values for each model are stored in separate variables `predict_rf`, `predict_svm`, and `predict_nnet`.
```{r}
leafsync::sync(viewRGB(s2_rl, r = 4, g = 3, b = 2)+mapView(lc_r, maxpixels = 1242110),
mapView(predict_rf, maxpixels = 1242110),
mapView(predict_svm, maxpixels = 1242110),
mapView(predict_nnet, maxpixels = 1242110))
```
Now you can compare the original WorldCover map with our own predictions.
#### Q3: Model comparison
Compare the different models and discuss their performance in terms of overall accuracy and how well they spatially capture the land cover patterns.
## Step 12: GLCM
Up to now we have been classifying our data based on per-pixel values only without taking the spatial context into account.
In the next example we are going to calculate the Gray Level Co-occurrence Matrix (GLCM; Haralick et al. 1973) to quantify the texture of the image. GLCM is a powerful image feature for image analysis. The `glcm` package provides an easy-to-use function to calculate such textural features for RasterLayer objects in R. More information on GLCM can be found [here](https://prism.ucalgary.ca/server/api/core/bitstreams/8f9de234-cc94-401d-b701-f08ceee6cfdf/content).
We can calculate the GLCM textures in one direction:
```{r}
# Calculate gray level co-occurrence matrix (GLCM) for the fourth band of the image
# represented by the variable "s2_rl"
rglcm <- glcm(s2_rl[[4]],
window = c(9,9), # The GLCM is calculated using a 9x9 moving window
shift = c(1,1), # The window is shifted one pixel in both x and y directions
statistics = c("mean", "variance", "homogeneity", "contrast",
"dissimilarity", "entropy", "second_moment")) # The GLCM is calculated for several statistical measures including mean, variance, homogeneity, contrast, dissimilarity, entropy, and second moment.
# Plot the GLCM
plot(rglcm)
```
or we can calculate rotation-invariant texture features. This means that the textural features are calculated in all 4 directions (0°, 45°, 90° and 135°) and then combined into one rotation-invariant texture. The key for this is the shift parameter.
```{r}
# Calculate gray level co-occurrence matrix (GLCM) for the fourth band of the image
# represented by the variable "s2_rl". This GLCM is calculated using a 9x9 moving window
# with a shift of 1 pixel in four different directions (right, diagonal-upright,
# upward, diagonal-upleft).
rglcm1 <- glcm(s2_rl[[4]],
window = c(9,9),
shift = list(c(0,1), c(1,1), c(1,0), c(1,-1)),
statistics = c("mean", "variance", "homogeneity", "contrast",
"dissimilarity", "entropy", "second_moment")
)
# Plot the GLCM
plot(rglcm1)
```
#### Q4: GLCM in model
Add the GLCM output as additional input to the classifier to check if the spatial context improves the classification performance.
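A possible starting point (a sketch, not a full solution; it relies on `rglcm1` sharing the grid of `s2_rl`, which it does since it was derived from band 4 above):
```{r q4-sketch, eval=FALSE}
# Stack the spectral bands and the rotation-invariant texture layers
s2_tex <- raster::stack(s2_rl, rglcm1)
# Re-extract the training samples at the same points as before
dt_tex <- s2_tex %>%
  extract(y = points) %>%
  as.data.table()
dt_tex[, class := make.names(factor(points@data$id_cls))]
dt_tex <- na.omit(dt_tex)
# From here, repeat the train/test split (Step 7) and caret::train() (Step 8)
# on dt_tex and compare the confusion matrices
```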
## Step 13: Unmixing
In the final step we are also going to implement an unmixing example.
The "mesma" unmixing algorithm is a spectral unmixing method that decomposes a multispectral image into its constituent spectral endmembers and their corresponding fractional abundances in each pixel. This algorithm is implemented in the R package `RStoolbox`.
The "mesma" algorithm is based on the assumption that the spectral response of a pixel can be modeled as a linear combination of the spectral responses of the endmembers present in the pixel. In other words, the algorithm assumes that each pixel can be represented as a weighted sum of the spectra of the endmembers, where the weights correspond to the fractional abundances of the endmembers in the pixel.
The "mesma" algorithm uses a constrained least squares approach to estimate the fractional abundances of the endmembers in each pixel.
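In standard linear-mixing notation (a generic formulation, not tied to the exact `mesma` implementation), the model for a pixel spectrum $\mathbf{x}$ reads
$$\mathbf{x} = \sum_{i=1}^{n} f_i\,\mathbf{e}_i + \boldsymbol{\varepsilon}, \qquad f_i \ge 0, \qquad \sum_{i=1}^{n} f_i = 1,$$
where $\mathbf{e}_i$ are the $n$ endmember spectra, $f_i$ their fractional abundances, and $\boldsymbol{\varepsilon}$ the residual; the NNLS option used below minimizes $\lVert \mathbf{x} - \sum_i f_i \mathbf{e}_i \rVert^2$ subject to the non-negativity constraint.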
Therefore, we first have to estimate the endmember spectra by selecting a subset of the pixels in the image that have pure spectra (i.e., spectra that are representative of a single endmember). In this example, we are going to do that by selecting three pure pixels in the image using the click function explained earlier.
```{r}
# Plot the data
dev.new() # To avoid offsets between clicked and recorded points
plotRGB(s2_rl,2,3,4,stretch="lin",scale=5000)
# change plotting parameters to better see the points and numbers generated from clicking
par(col="red", cex=3)
# use the 'click' function
ct = click(s2_rl, id=T, xy=T, cell=T, type="p", pch=16, col="magenta", col.lab="red")
```
In this case I selected a `forest`, `grass` and `soil` pixel. The algorithm then uses these endmember spectra to estimate the fractional abundances of the endmembers in each pixel, subject to the constraint that the fractional abundances are non-negative and sum to one.
```{r}
# Create the endmember spectra
em = ct[4:15]
rownames(em) = c('forest','soil','grass')
# Do the unmixing
probs <- RStoolbox::mesma(s2_rl, em, method = "NNLS", verbose=TRUE)
```
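As a quick sanity check (assuming, as the chunks above and below do, that `probs` contains one fraction layer per endmember plus an `RMSE` layer), the estimated fractions should sum to approximately one in well-modeled pixels:
```{r}
# Sum the per-endmember fraction layers; values near 1 indicate a good fit
frac_sum <- probs$forest + probs$soil + probs$grass
summary(raster::values(frac_sum))
```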
Subsequently, we can visualize the output
```{r}
sync(viewRGB(s2_rl, r = 4, g = 3, b = 2) +
mapView(lc_r, maxpixels = 1242110), mapView(probs$forest, maxpixels = 1242110) +
mapView(probs$soil, maxpixels = 1242110) +
mapView(probs$grass, maxpixels = 1242110), mapView(probs$RMSE, maxpixels = 1242110))
```
## Step 14: Do it yourself
- Download a Sentinel-2 image of roughly 5×5 km centered on your home location
- Download the corresponding WorldCover map over your home location
- Visualize both the satellite image and land cover map
- Optimize three different land cover classifiers over your home location and assess their accuracy
- Assess the feature importance of the input data for accurate classification
- Repeat the classification but by using only the red, green and blue band
- Compare the full models with the simplified RGB model
- Perform an unmixing over your home area and compare that with the land cover classes
## Done