The interpretation of conventional residuals for generalized linear (mixed) and other hierarchical statistical models is often problematic. As an example, here are the results of conventional deviance, Pearson, and raw residuals for two Poisson GLMMs, one that lacks a quadratic effect and one that fits the data perfectly. Could you tell which is the correct model?
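For reproducibility, a test dataset like the one used below can be generated with DHARMa's own data simulator (a minimal sketch; `createData()` is documented in the DHARMa help, and the exact arguments used for this example are an assumption):

```{r}
library(DHARMa)
library(lme4)

# simulate Poisson-distributed data with a random intercept per group;
# see ?createData for the full set of arguments
testData <- createData(sampleSize = 250, family = poisson())
```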
```{r}
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group), family = "poisson", data = testData)
```
Most functions in DHARMa can be calculated directly on the fitted model. So, for example, if you are only interested in testing dispersion, you could calculate

```{r, fig.show='hide'}
testDispersion(fittedModel)
```
In this case, the randomized quantile residuals are calculated on the fly. However, residual calculation can take a while and would have to be repeated by every other test you call. It is therefore highly recommended to first calculate the residuals once, using the simulateResiduals() function.
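The call itself is a one-liner (a sketch; `simulateResiduals()` and its `fittedModel`, `n`, and `plot` arguments are documented in the DHARMa help):

```{r}
# n sets the number of simulations; plot = T displays the standard
# DHARMa residual plot directly
simulationOutput <- simulateResiduals(fittedModel = fittedModel, n = 250, plot = T)
```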
The function implements the algorithm discussed above, i.e. it a) creates n new synthetic datasets by simulating from the fitted model, b) calculates the cumulative distribution of simulated values for each observed value, and c) calculates the quantile (residual) value that corresponds to the observed value. These quantiles are called "scaled residuals" in DHARMa. For example, a scaled residual value of 0.5 means that half of the simulated data are higher than the observed value, and half of them lower. A value of 0.99 would mean that nearly all simulated data are lower than the observed value. The minimum/maximum values for the residuals are 0 and 1. The function returns an object of class DHARMa, containing the simulations and the scaled residuals, which can then be passed on to all other plot and test functions. When specifying the optional argument plot = T, the standard DHARMa residual plot is displayed directly, which will be discussed below. The calculated residuals can be accessed via
```{r, results = "hide"}
residuals(simulationOutput)
```
This is what **overdispersion** looks like in the DHARMa residuals. Here, we get too many residuals around 0.5, which means that we are not getting as many residuals in the tails of the distribution as expected under the fitted model.
The exact p-values for the quantile lines in the plot can be displayed via
```{r, eval = F}
testQuantiles(simulationOutput)
```
Adding a simple overdispersion correction will try to find a compromise between the different levels of dispersion in the model. The QQ plot looks better now, but there is still a pattern in the residuals.
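One common way to implement such a correction (a sketch, not necessarily the exact model used here) is an observation-level random effect, which adds an extra variance term per data point; the `ID` variable and `fittedModelOLRE` name are created here for illustration:

```{r}
# observation-level random effect (OLRE) as a simple overdispersion correction
testData$ID <- 1:nrow(testData)
fittedModelOLRE <- glmer(observedResponse ~ Environment1 + (1|group) + (1|ID),
                         family = "poisson", data = testData)

# re-check the residuals of the corrected model
plot(simulateResiduals(fittedModel = fittedModelOLRE))
```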
The function testTemporalAutocorrelation performs a Durbin-Watson test (from the package lmtest) on the uniform residuals to test for temporal autocorrelation, and additionally plots the residuals against time.
```{r}
testTemporalAutocorrelation(simulationOutput = simulationOutput, time = testData$time)
```

If no time variable is provided, the function uses a randomized time (H0). Apart from testing, the sense of this is to be able to run simulations to check whether the test has correct error rates in the respective situation, i.e. is not oversensitive (too high sensitivity has sometimes been reported for the Durbin-Watson test).
Note the general caveats about the DW test mentioned in the help of testTemporalAutocorrelation(). In general, as for spatial autocorrelation, it is difficult to specify a single test, because temporal and spatial autocorrelation can appear in many flavors: short-scale and long-scale, homogeneous or not, and so on. The pre-defined functions in DHARMa are a starting point, but they are not something you should rely on blindly.
## Spatial autocorrelation
An additional test against randomized space (H0) can be performed, for the same reasons as in the temporal case.
```{r, fig.width=4.5, fig.height=4.5}
testSpatialAutocorrelation(simulationOutput = simulationOutput, x = testData$x, y= testData$y)
# testSpatialAutocorrelation(simulationOutput = simulationOutput) # again, this uses random x,y
```
The usual caveats for Moran's I apply, in particular that it may miss non-local and heterogeneous (non-stationary) spatial autocorrelation. The former should be easier to detect visually in the spatial plot, or via regressions on the pattern.