Skip to content

Commit 849c354

Browse files
committed
vignette updates
1 parent c8682ee commit 849c354

File tree

1 file changed

+25
-39
lines changed

1 file changed

+25
-39
lines changed

DHARMa/vignettes/DHARMa.Rmd

+25-39
Original file line numberDiff line numberDiff line change
@@ -14,16 +14,15 @@ editor_options:
1414
chunk_output_type: console
1515
---
1616

17+
```{r global_options, include=FALSE}
18+
knitr::opts_chunk$set(fig.width=8.5, fig.height=5.5, fig.align='center', warning=FALSE, message=FALSE, cache = T)
19+
```
20+
1721
```{r, echo = F, message = F}
1822
library(DHARMa)
1923
set.seed(123)
2024
```
2125

22-
23-
```{r global_options, include=FALSE}
24-
knitr::opts_chunk$set(fig.width=8.5, fig.height=5.5, fig.align='center', warning=FALSE, message=FALSE, cache = F)
25-
```
26-
2726
# Motivation
2827

2928
The interpretation of conventional residuals for generalized linear (mixed) and other hierarchical statistical models is often problematic. As an example, here the result of conventional Deviance, Pearson and raw residuals for two Poisson GLMMs, one that is lacking a quadratic effect, and one that fits the data perfectly. Could you tell which is the correct model?
@@ -110,21 +109,21 @@ testData = createData(sampleSize = 250)
110109
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group) , family = "poisson", data = testData)
111110
```
112111

113-
Most functions in DHARMa could be calculated directly on the fitted model. So, for example, if you are only interested in testing dispersion, you could calculate
112+
Most functions in DHARMa could be calculated directly on the fitted model. So, for example, if you are only interested in testing dispersion, you could calculate
114113

115-
```{r}
114+
```{r, fig.show='hide'}
116115
testDispersion(fittedModel)
117116
```
118117

119118
In this case, the randomized quantile residuals are calculated on the fly. However, residual calculation can take a while, and would have to be repeated by every other test you call. It is therefore highly recommended to first calculate the residuals once, using the simulateResiduals() function.
120119

121-
```{r}
120+
```{r, fig.show='hide'}
122121
simulationOutput <- simulateResiduals(fittedModel = fittedModel, plot = T)
123122
```
124123

125124
The function implements the algorithm discussed above, i.e. it a) creates n new synthetic datasets by simulating from the fitted model, b) calculates the cumulative distribution of simulated values for each observed value, and c) calculates the quantile (residual) value that corresponds to the observed value. Those quantiles are called "scaled residuals" in DHARMa. For example, a scaled residual value of 0.5 means that half of the simulated data are higher than the observed value, and half of them lower. A value of 0.99 would mean that nearly all simulated data are lower than the observed value. The minimum/maximum values for the residuals are 0 and 1. The function returns an object of class DHARMa, containing the simulations and the scaled residuals, which can then be passed on to all other plots and test functions. When specifying the optional argument plot = T, the standard DHARMa residual plot is displayed directly, which will be discussed below. The calculated residuals can be accesed via
126125

127-
```{r, eval = F}
126+
```{r, results = "hide"}
128127
residuals(simulationOutput)
129128
```
130129

@@ -290,8 +289,8 @@ A few general rules of thumb
290289
This this is how **overdispersion** looks like in the DHARMa residuals
291290

292291
```{r}
293-
testData = createData(sampleSize = 500, overdispersion = 2, family = poisson())
294-
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group) , family = "poisson", data = testData)
292+
testData = createData(sampleSize = 200, overdispersion = 1.5, family = poisson())
293+
fittedModel <- glm(observedResponse ~ Environment1 , family = "poisson", data = testData)
295294
296295
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
297296
plot(simulationOutput)
@@ -307,19 +306,10 @@ This is an example of underdispersion
307306
testData = createData(sampleSize = 500, intercept=0, fixedEffects = 2, overdispersion = 0, family = poisson(), roundPoissonVariance = 0.001, randomEffectVariance = 0)
308307
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group) , family = "poisson", data = testData)
309308
310-
summary(fittedModel)
311-
312-
# plotConventionalResiduals(fittedModel)
313-
314309
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
315310
plot(simulationOutput)
316311
```
317312

318-
```{r fig.width=4.5, fig.height=4.5}
319-
testUniformity(simulationOutput = simulationOutput)
320-
```
321-
322-
323313
Here, we get too many residuals around 0.5, which means that we are not getting as many residuals as we would expect in the tail of the distribution than expected from the fitted model.
324314

325315

@@ -439,14 +429,12 @@ simulationOutput <- simulateResiduals(fittedModel = fittedModel)
439429
plot(simulationOutput)
440430
```
441431

442-
The
432+
The exact p-values for the quantile lines in the plot can be displayed via
443433

444-
```{r}
434+
```{r, eval = F}
445435
testQuantiles(simulationOutput)
446-
plotResiduals(simulationOutput)
447436
```
448437

449-
450438
Adding a simple overdispersion correction will try to find a compromise between the different levels of dispersion in the model. The qq plot looks better now, but there is still a pattern in the residuals
451439

452440
```{r}
@@ -479,7 +467,7 @@ fittedModel <- glmer(observedResponse ~ Environment1 + Environment2 + (1|group)
479467
simulationOutput <- simulateResiduals(fittedModel = fittedModel)
480468
# plotConventionalResiduals(fittedModel)
481469
plot(simulationOutput, quantreg = T)
482-
testUniformity(simulationOutput = simulationOutput)
470+
# testUniformity(simulationOutput = simulationOutput)
483471
```
484472

485473
It is difficult to see that there is a problem at all in the general plot, but it becomes clear if we plot against the environment
@@ -506,13 +494,12 @@ simulationOutput <- simulateResiduals(fittedModel = fittedModel)
506494

507495
The function testTemporalAutocorrelation performs a Durbin-Watson test from the package lmtest on the uniform residuals to test for temporal autocorrelation in the residuals, and additionally plots the residuals against time.
508496

509-
The function also has an option to perform the test against randomized time (H0) - the sense of this is to be able to run simulations for testing if the test has correct error rates in the respective situation, i.e. is not oversensitive (too high sensitivity has sometimes been reported for Durbin-Watson).
510-
511-
```{r, fig.width=4, fig.height=4}
497+
```{r}
512498
testTemporalAutocorrelation(simulationOutput = simulationOutput, time = testData$time)
513-
testTemporalAutocorrelation(simulationOutput = simulationOutput)
514499
```
515500

501+
If no time varialbe is provided, the function uses a random time (H0) - apart from testing, the sense of this is to be able to run simulations for testing if the test has correct error rates in the respective situation, i.e. is not oversensitive (too high sensitivity has sometimes been reported for Durbin-Watson).
502+
516503
Note general caveats mentioned about the DW test in the help of testTemporalAutocorrelation(). In general, as for spatial autocorrelation, it is difficult to specify one test, because temporal and spatial autocorrelation can appear in many flavors, short-scale and long scale, homogenous or not, and so on. The pre-defined functions in DHARMa are a starting point, but they are not something you should rely on blindly.
517504

518505
## Spatial autocorrelation
@@ -536,7 +523,7 @@ An additional test against randomized space (H0) can be performed, for the same
536523

537524
```{r, fig.width=4.5, fig.height=4.5}
538525
testSpatialAutocorrelation(simulationOutput = simulationOutput, x = testData$x, y= testData$y)
539-
testSpatialAutocorrelation(simulationOutput = simulationOutput)
526+
# testSpatialAutocorrelation(simulationOutput = simulationOutput) # again, this uses random x,y
540527
```
541528

542529
The usual caveats for Moran.I apply, in particular that it may miss non-local and heterogeneous (non-stationary) spatial autocorrelation. The former should be better detectable visually in the spatial plot, or via regressions on the pattern.
@@ -577,8 +564,9 @@ data$logDensity = log10(data$density.attack+1)
577564
```
578565

579566

580-
```{r, fig.width=4, fig.height=4}
581-
plot(N_parasitized / (N_adult + N_parasitized ) ~ logDensity, xlab = "Density", ylab = "Proportion infected", data = data)
567+
```{r, fig.width=5, fig.height=5}
568+
plot(N_parasitized / (N_adult + N_parasitized ) ~ logDensity,
569+
xlab = "Density", ylab = "Proportion infected", data = data)
582570
```
583571

584572
Let's fit the data with a regular binomial n/k glm
@@ -645,9 +633,9 @@ Somewhat better, but not good. Move to neg binom, to adjust dispersion
645633

646634
```{r}
647635
m3 <- glmmTMB(SiblingNegotiation ~ FoodTreatment*SexParent + offset(log(BroodSize)) + (1|Nest), data=Owls , family = nbinom1)
648-
summary(m3)
649636
650637
res <- simulateResiduals(m3, plot = T)
638+
par(mfrow = c(1,3))
651639
plotResiduals(res, Owls$FoodTreatment)
652640
testDispersion(res)
653641
testZeroInflation(res)
@@ -657,10 +645,9 @@ We see underdispersion now. In a model with variable dispersion, this is often t
657645

658646
```{r}
659647
m4 <- glmmTMB(SiblingNegotiation ~ FoodTreatment*SexParent + offset(log(BroodSize)) + (1|Nest), ziformula = ~ FoodTreatment + SexParent, data=Owls , family = nbinom1)
660-
summary(m4)
661648
662-
res <- simulateResiduals(m4)
663-
plot(res)
649+
res <- simulateResiduals(m4, plot = T)
650+
par(mfrow = c(1,3))
664651
plotResiduals(res, Owls$FoodTreatment)
665652
testDispersion(res)
666653
testZeroInflation(res)
@@ -670,9 +657,9 @@ This looks a lot better. Trying a slightly different model specification
670657

671658
```{r}
672659
m5 <- glmmTMB(SiblingNegotiation ~ FoodTreatment*SexParent + offset(log(BroodSize)) + (1|Nest), dispformula = ~ FoodTreatment , ziformula = ~ FoodTreatment + SexParent, data=Owls , family = nbinom1)
673-
summary(m5)
674660
675-
simulateResiduals(m5, plot = T)
661+
res <- simulateResiduals(m4, plot = T)
662+
par(mfrow = c(1,3))
676663
plotResiduals(res, Owls$FoodTreatment)
677664
testDispersion(res)
678665
testZeroInflation(res)
@@ -727,7 +714,6 @@ testDispersion(simulationOutput)
727714
728715
simulationOutput = recalculateResiduals(simulationOutput , group = testData$group)
729716
testDispersion(simulationOutput)
730-
731717
```
732718

733719

0 commit comments

Comments
 (0)