Commit 6a01a7d

Merge pull request #65 from florianhartig/bugfixes-0.1.6.5
Bugfixes 0.1.6.5
2 parents bfc6297 + 79c892b commit 6a01a7d

26 files changed, +427 -162 lines changed

DHARMa/DESCRIPTION

+4 -4
@@ -1,14 +1,14 @@
 Package: DHARMa
 Title: Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models
-Version: 0.1.6.4
+Version: 0.1.6.5
 Date: 2018-03-14
 Authors@R: c(person("Florian", "Hartig", email = "florian.hartig@biologie.uni-regensburg.de", role = c("aut", "cre"), comment = "Theoretical Ecology, University of Regensburg, Regensburg, Germany"))
 Description: The 'DHARMa' package uses a simulation-based approach to create
-    readily interpretable scaled (quantile) residuals for fitted generalized linear mixed
-    models. Currently supported are generalized linear mixed models from 'lme4'
+    readily interpretable scaled (quantile) residuals for fitted (generalized) linear mixed
+    models. Currently supported are (generalized) linear mixed models from 'lme4'
     (classes 'lmerMod', 'glmerMod') and 'glmmTMB', generalized additive models ('gam' from 'mgcv'),
     'glm' (including 'negbin' from 'MASS', but excluding quasi-distributions) and 'lm' model
-    classes. Alternatively, externally created simulations, e.g. posterior predictive simulations
+    classes. Moreover, externally created simulations, e.g. posterior predictive simulations
     from Bayesian software such as 'JAGS', 'STAN', or 'BUGS' can be processed as well.
     The resulting residuals are standardized to values between 0 and 1 and can be interpreted
     as intuitively as residuals from a linear regression. The package also provides a number of

DHARMa/NAMESPACE

+1
@@ -12,6 +12,7 @@ export(plotConventionalResiduals)
 export(plotQQunif)
 export(plotResiduals)
 export(plotSimulatedResiduals)
+export(recalculateResiduals)
 export(runBenchmarks)
 export(simulateResiduals)
 export(testDispersion)

DHARMa/NEWS

+4 -2
@@ -6,11 +6,13 @@ DHARMa 0.2.0
 New features

 - support for glmmTMB https://github.com/florianhartig/DHARMa/issues/16, implemented since https://github.com/florianhartig/DHARMa/releases/tag/v0.1.6.2
+- support for grouping of residuals, see https://github.com/florianhartig/DHARMa/issues/22
+

 Major changes

 - remodeled benchmarks functions in https://github.com/florianhartig/DHARMa/releases/tag/v0.1.6.3
-- remodeled dispersion tests, adresses https://github.com/florianhartig/DHARMa/issues/62
+- remodeled dispersion testsin https://github.com/florianhartig/DHARMa/releases/tag/v0.1.6.4, adresses https://github.com/florianhartig/DHARMa/issues/62

 Minor changes

@@ -19,7 +21,7 @@ Minor changes
 Bugfixes

 - fixed bug with zeroinflation test for k/n binomial data https://github.com/florianhartig/DHARMa/issues/55
-
+- fixed bug with p-value calculation via ecdf https://github.com/florianhartig/DHARMa/issues/55

 DHARMa 0.1.6

DHARMa/R/DHARMa.R

-37
@@ -9,43 +9,6 @@
 NULL


-#' DHARMa standard residual plots
-#'
-#' This function creates standard plots for the simulated residuals
-#' @param x an object with simualted residuals created by \code{\link{simulateResiduals}}
-#' @param rank if T (default), the values of pred will be rank transformed. This will usually make patterns easier to spot visually, especially if the distribution of the predictor is skewed.
-#' @param ... further options for \code{\link{plotResiduals}}. Consider in particular parameters quantreg, rank and asFactor. xlab, ylab and main cannot be changed when using plotSimulatedResiduals, but can be changed when using plotResiduals.
-#' @details The function creates two plots. To the left, a qq-uniform plot to detect deviations from overall uniformity of the residuals (calling \code{\link{plotQQunif}}), and to the right, a plot of residuals against predicted values (calling \code{\link{plotResiduals}}). For a correctly specified model, we would expect
-#'
-#' a) a straight 1-1 line in the uniform qq-plot -> evidence for an overal uniform (flat) distribution of the residuals
-#'
-#' b) uniformity of residuals in the vertical direction in the res against predictor plot
-#'
-#' Deviations of this can be interpreted as for a liner regression. See the vignette for detailed examples.
-#'
-#' To provide a visual aid in detecting deviations from uniformity in y-direction, the plot of the residuals against the predited values also performs an (optional) quantile regression, which provides 0.25, 0.5 and 0.75 quantile lines across the plots. These lines should be straight, horizontal, and at y-values of 0.25, 0.5 and 0.75. Note, however, that some deviations from this are to be expected by chance, even for a perfect model, especially if the sample size is small. See further comments on this plot, and options, in \code{\link{plotResiduals}}
-#'
-#' The quantile regression can take some time to calculate, especially for larger datasets. For that reason, quantreg = F can be set to produce a smooth spline instead. This is default for n > 2000.
-#'
-#' @seealso \code{\link{plotResiduals}}, \code{\link{plotQQunif}}
-#' @example inst/examples/plotsHelp.R
-#' @import graphics
-#' @import utils
-#' @export
-plot.DHARMa <- function(x, rank = TRUE, ...){
-
-  oldpar <- par(mfrow = c(1,2), oma = c(0,1,2,1))
-
-  plotQQunif(x)
-
-  plotResiduals(pred = x, residuals = NULL, xlab = "Predicted value (rank transformed)", ylab = "Standardized residual", main = "Residual vs. predicted\n lines should match", rank = T, ...)
-
-  mtext("DHARMa scaled residual plots", outer = T)
-
-  par(oldpar)
-}
-
-
 #' Print simulated residuals
 #'
 #' @param x an object with simulated residuals created by \code{\link{simulateResiduals}}

DHARMa/R/plotResiduals.R

+42 -4
@@ -1,3 +1,41 @@
+#' DHARMa standard residual plots
+#'
+#' This function creates standard plots for the simulated residuals
+#' @param x an object with simualted residuals created by \code{\link{simulateResiduals}}
+#' @param rank if T (default), the values of pred will be rank transformed. This will usually make patterns easier to spot visually, especially if the distribution of the predictor is skewed.
+#' @param ... further options for \code{\link{plotResiduals}}. Consider in particular parameters quantreg, rank and asFactor. xlab, ylab and main cannot be changed when using plotSimulatedResiduals, but can be changed when using plotResiduals.
+#' @details The function creates two plots. To the left, a qq-uniform plot to detect deviations from overall uniformity of the residuals (calling \code{\link{plotQQunif}}), and to the right, a plot of residuals against predicted values (calling \code{\link{plotResiduals}}). For a correctly specified model, we would expect
+#'
+#' a) a straight 1-1 line in the uniform qq-plot -> evidence for an overal uniform (flat) distribution of the residuals
+#'
+#' b) uniformity of residuals in the vertical direction in the res against predictor plot
+#'
+#' Deviations of this can be interpreted as for a liner regression. See the vignette for detailed examples.
+#'
+#' To provide a visual aid in detecting deviations from uniformity in y-direction, the plot of the residuals against the predited values also performs an (optional) quantile regression, which provides 0.25, 0.5 and 0.75 quantile lines across the plots. These lines should be straight, horizontal, and at y-values of 0.25, 0.5 and 0.75. Note, however, that some deviations from this are to be expected by chance, even for a perfect model, especially if the sample size is small. See further comments on this plot, and options, in \code{\link{plotResiduals}}
+#'
+#' The quantile regression can take some time to calculate, especially for larger datasets. For that reason, quantreg = F can be set to produce a smooth spline instead. This is default for n > 2000.
+#'
+#' @seealso \code{\link{plotResiduals}}, \code{\link{plotQQunif}}
+#' @example inst/examples/plotsHelp.R
+#' @import graphics
+#' @import utils
+#' @export
+plot.DHARMa <- function(x, rank = TRUE, ...){
+
+  oldpar <- par(mfrow = c(1,2), oma = c(0,1,2,1))
+
+  plotQQunif(x)
+
+  plotResiduals(pred = x, residuals = NULL, xlab = "Predicted value (rank transformed)", ylab = "Standardized residual", main = "Residual vs. predicted\n lines should match", rank = T, ...)
+
+  mtext("DHARMa scaled residual plots", outer = T)
+
+  par(oldpar)
+}
+
+
+
 #' DHARMa standard residual plots
 #'
 #' DEPRECATED, use plot() instead
@@ -31,7 +69,7 @@ plotQQunif <- function(simulationOutput, testUniformity = T){

   gap::qqunif(simulationOutput$scaledResiduals,pch=2,bty="n", logscale = F, col = "black", cex = 0.6, main = "QQ plot residuals", cex.main = 1)
   if(testUniformity == TRUE){
-    temp = testUniformity(simulationOutput)
+    temp = testUniformity(simulationOutput, plot = F)
     legend("topleft", c(paste("KS test: p=", round(temp$p.value, digits = 5)), paste("Deviation ", ifelse(temp$p.value < 0.05, "significant", "n.s."))), text.col = ifelse(temp$p.value < 0.05, "red", "black" ), bty="n")
   }
 }
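The legend built in this hunk reports a KS test p-value for the scaled residuals. As a rough illustration only, and assuming the test compares the residuals against a standard uniform distribution via `stats::ks.test` (the hunk itself does not show the body of `testUniformity`), the label logic can be sketched in base R. The data and the helper `legend_label` are hypothetical.

```r
# Sketch under assumptions: uniformity check with a KS test against U(0, 1).
# 'legend_label' is a hypothetical helper, not part of DHARMa.
set.seed(1)
res_uniform <- runif(200)        # stands in for residuals of a well-specified model
res_peaked  <- rbeta(200, 5, 5)  # residuals concentrated around 0.5

p_uniform <- ks.test(res_uniform, "punif")$p.value
p_peaked  <- ks.test(res_peaked,  "punif")$p.value

# Same text pieces as the legend() call in the hunk above
legend_label <- function(p) {
  c(paste("KS test: p=", round(p, digits = 5)),
    paste("Deviation ", ifelse(p < 0.05, "significant", "n.s.")))
}

legend_label(p_uniform)
legend_label(p_peaked)
```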
@@ -44,7 +82,7 @@ plotQQunif <- function(simulationOutput, testUniformity = T){
 #' @param pred either the predictor variable against which the residuals should be plotted, or a DHARMa object
 #' @param residuals residuals values. Leave empty if pred is a DHARMa object
 #' @param quantreg whether to perform a quantile regression on 0.25, 0.5, 0.75 on the residuals. If F, a spline will be created instead. Default NULL chooses T for nObs < 2000, and F otherwise.
-#' @param rank if T, the values of pred will be rank transformed. This will usually make patterns easier to spot visually, especially if the distribution of the predictor is skewed.
+#' @param rank if T, the values of pred will be rank transformed. This will usually make patterns easier to spot visually, especially if the distribution of the predictor is skewed. If pred is a factor, this has no effect.
 #' @param asFactor should the predictor variable converted into a factor
 #' @param ... additional arguments to plot
 #' @details For a correctly specified model, we would expect uniformity in y direction when plotting against any predictor.
@@ -81,7 +119,7 @@ plotResiduals <- function(pred, residuals = NULL, quantreg = NULL, rank = FALSE,

   if (asFactor) pred = as.factor(pred)

-  if(! is.factor(pred)){
+  if(!is.factor(pred)){
     nuniq = length(unique(pred))
     ndata = length(pred)
     if(nuniq < 10 & ndata / nuniq > 10) message("DHARMa::plotResiduals - low number of unique predictor values, consider setting asFactor = T")
@@ -90,7 +128,7 @@ plotResiduals <- function(pred, residuals = NULL, quantreg = NULL, rank = FALSE,
     if (rank == T) pred = rank(pred, ties.method = "average")
     pred = pred / max(pred)
   } else {
-    if (rank == T) warning("DHARMa::plotResiduals - predictor is a factor, rank = T has no effect")
+    # if (rank == T) warning("DHARMa::plotResiduals - predictor is a factor, rank = T has no effect")
   }

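The rank transformation touched by this hunk can be tried in isolation. The following is a minimal sketch with made-up data; `scale_predictor` is a hypothetical helper that only mirrors the two lines shown in the hunk and skips factors, as the updated parameter documentation describes.

```r
# Hypothetical helper mirroring the rank / rescale lines of the hunk above.
scale_predictor <- function(pred, rank = FALSE) {
  if (is.factor(pred)) return(pred)   # rank has no effect on factors
  if (rank) pred <- rank(pred, ties.method = "average")
  pred / max(pred)
}

skewed <- c(0.1, 0.2, 0.2, 5, 100)    # made-up, strongly skewed predictor

scale_predictor(skewed)               # raw values are squeezed towards 0
scale_predictor(skewed, rank = TRUE)  # ranks spread the points evenly
scale_predictor(factor(c("a", "b", "a")), rank = TRUE)  # returned unchanged
```

With `rank = TRUE` the tied values share the average rank 2.5, so the result is 0.2, 0.5, 0.5, 0.8, 1.0, whereas the unranked version places four of the five points below 0.06.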

DHARMa/R/random.R

+8 -4
@@ -10,7 +10,7 @@
 #'
 #' @return a list with various infos about the random state that after function execution, as well as a function to restore the previous state before the function execution
 #'
-#' @param seed seed argument to set.seed()
+#' @param seed seed argument to set.seed(). NULL = no seed, but random state will be restored. F = random state will not be restored
 #' @export
 #' @example inst/examples/getRandomStateHelp.R
 #' @author Florian Hartig
@@ -22,10 +22,14 @@ getRandomState <- function(seed = NULL){

   current = mget(".Random.seed", envir = .GlobalEnv, ifnotfound = list(NULL))[[1]]

-  restoreCurrent <- function(){
-    if(is.null(current)) rm(".Random.seed", envir = .GlobalEnv) else assign(".Random.seed", current , envir = .GlobalEnv)
+  if(is.logical(seed) & seed == F){
+    restoreCurrent <- function(){}
+  }else{
+    restoreCurrent <- function(){
+      if(is.null(current)) rm(".Random.seed", envir = .GlobalEnv) else assign(".Random.seed", current , envir = .GlobalEnv)
+    }
   }
-
+
   # setting seed
   if(is.numeric(seed)) set.seed(seed)
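The three seed modes documented in this hunk (numeric seed, NULL, F) can be illustrated with a small base-R sketch. `with_seed_mode` is a hypothetical helper, not the package function; it only mimics the save and restore logic shown above. One deliberate difference: the sketch tests `identical(seed, FALSE)`, because in R the expression `is.logical(NULL) & NULL == F` evaluates to `logical(0)`, which `if` rejects with an "argument is of length zero" error.

```r
# Hypothetical helper (not DHARMa API) mimicking the seed handling above.
with_seed_mode <- function(seed = NULL, fun) {
  # Save the current RNG state; NULL if no random number has been drawn yet
  current <- mget(".Random.seed", envir = .GlobalEnv, ifnotfound = list(NULL))[[1]]
  restore <- !identical(seed, FALSE)   # seed = FALSE: leave the state as it ends up

  if (is.numeric(seed)) set.seed(seed)
  result <- fun()

  if (restore) {
    if (is.null(current)) {
      if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
        rm(".Random.seed", envir = .GlobalEnv)
      }
    } else {
      assign(".Random.seed", current, envir = .GlobalEnv)
    }
  }
  result
}

set.seed(1)
before <- get(".Random.seed", envir = .GlobalEnv)

a <- with_seed_mode(123, function() runif(3))
b <- with_seed_mode(123, function() runif(3))
after_numeric <- get(".Random.seed", envir = .GlobalEnv)   # restored to 'before'

d <- with_seed_mode(NULL, function() runif(3))
after_null <- get(".Random.seed", envir = .GlobalEnv)      # also restored

e <- with_seed_mode(FALSE, function() runif(3))
after_false <- get(".Random.seed", envir = .GlobalEnv)     # state has advanced
```

In this sketch `a` and `b` are identical because the same seed is set each time, and the global state only moves on in the `FALSE` case.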

DHARMa/R/simulateResiduals.R

+62 -4
@@ -7,8 +7,8 @@
 #' @param integerResponse if T, noise will be added at to the residuals to maintain a uniform expectations for integer responses (such as Poisson or Binomial). Usually, the model will automatically detect the appropriate setting, so there is no need to adjust this setting.
 #' @param plot if T, \code{\link{plotSimulatedResiduals}} will be directly run after the simulations have terminated
 #' @param ... parameters to pass to the simulate function of the model object. An important use of this is to specify whether simulations should be conditional on the current random effect estimates. See details.
-#' @param seed the random seed. The default setting, recommended for any type of data analysis, is to reset the random number generator each time the function is run, meaning that you will always get the same result when running the same code. Setting seed = NA avoids the reset. This is only recommended for simulation experiments. See vignette for details.
-#' @return A list with various objects. The most important are scaledResiduals, which contain the scaled residuals, and scaledResidualsNormal, which are the the scaled residuals transformed to a normal distribution
+#' @param seed the random seed. The default setting, recommended for any type of data analysis, is to reset the random number generator each time the function is run, meaning that you will always get the same result when running the same code. NULL = no new seed is set, but previous random state will be restored after simulation. F = no seed is set, and random state will not be restored. The latter two options are only recommended for simulation experiments. See vignette for details.
+#' @return A list with various objects. The most important are scaledResiduals, which contain the scaled residuals, and scaledResidualsNormal, which are the the scaled residuals transformed to a normal distribution.
 #' @details There are a number of important considerations when simulating from a more complex (hierarchical) model.
 #'
 #' \strong{Re-simulating random effects / hierarchical structure}: the first is that in a hierarchical model, several layers of stochasticity are aligned on top of each other. Specifically, in a GLMM, we have a lower level stochastic process (random effect), whose result enters into a higher level (e.g. Poisson distribution). For other hierarchical models such as state-space models, similar considerations apply. When simulating, we have to decide if we want to re-simulate all stochastic levels, or only a subset of those. For example, in a GLMM, it is common to only simulate the last stochastic level (e.g. Poisson) conditional on the fitted random effects.
@@ -29,7 +29,7 @@
 #'
 #' #' \strong{How many simulations}: about the choice of n: my simulations didn't show major problems with a small n (if you get down to the order of a few 10, you will start seeing discretization artifacts from the empirical cummulative density estimates though). The default of 250 seems safe to me. If you want to be on the safe side, choose a high value (e.g. 1000) for producing your definite results.
 #'
-#' @seealso \code{\link{testSimulatedResiduals}}, \code{\link{plotSimulatedResiduals}}
+#' @seealso \code{\link{testResiduals}}, \code{\link{plot.DHARMa}}, \code{\link{print.DHARMa}}, \code{\link{recalculateResiduals}}
 #' @example inst/examples/simulateResidualsHelp.R
 #' @import stats
 #' @export
@@ -194,7 +194,7 @@ simulateResiduals <- function(fittedModel, n = 250, refit = F, integerResponse =
     }

   }
-
+
   ########### Wrapup ############

   out$scaledResidualsNormal = qnorm(out$scaledResiduals + 0.00 )
@@ -299,6 +299,64 @@ securityAssertion <- function(context = "Not provided", stop = F){
 }


+#' Recalculate residuals with grouping
+#'
+#' The purpose of this function is to recalculate scaled residuals per group, based on the simulations done by \code{\link{simulateResiduals}}
+#'
+#' @param simulationOutput an object with simualted residuals created by \code{\link{simulateResiduals}}
+#' @param group group of each data point
+#' @param aggregateBy function for the aggregation. Default is sum. This should only be changed if you know what you are doing. Note in particular that the expected residual distribution might not be flat any more if you choose general functions, such as sd etc.
+#'
+#' @return an object of class DHARMa, similar to what is returned by \code{\link{simulateResiduals}}, but with additional outputs for the new grouped calculations. Note that the relevant outputs are 2x in the object, the first is the grouped calculations (which is returned by $name access), and later another time, under identical name, the original output. Moreover, there is a function 'aggregateByGroup', which can be used to aggregate predictor variables in the same way as the variables calculated here
+#'
+#' @example inst/examples/simulateResidualsHelp.R
+#' @export
+recalculateResiduals <- function(simulationOutput, group = NULL, aggregateBy = sum){
+
+  if(!is.null(simulationOutput$original)) simulationOutput = simulationOutput$original
+
+  out = list()
+
+  if(is.null(group)) return(simulationOutput)
+  else group =as.factor(group)
+  out$nGroups = nlevels(group)
+
+  aggregateByGroup <- function(x) aggregate(x, by=list(group), FUN=aggregateBy)[,2]
+
+  out$observedResponse = aggregateByGroup(simulationOutput$observedResponse)
+  out$fittedPredictedResponse = aggregateByGroup(simulationOutput$fittedPredictedResponse)
+  out$simulatedResponse = apply(simulationOutput$simulatedResponse, 2, aggregateByGroup)
+  out$scaledResiduals = rep(NA, out$nGroups)
+
+  if (simulationOutput$refit == F){
+    if(simulationOutput$integerResponse == T){
+      for (i in 1:out$nGroups) out$scaledResiduals[i] <- ecdf(out$simulatedResponse[i,] + runif(out$nGroups, -0.5, 0.5))(out$observedResponse[i] + runif(1, -0.5, 0.5))
+    } else {
+      for (i in 1:out$nGroups) out$scaledResiduals[i] <- ecdf(out$simulatedResponse[i,])(out$observedResponse[i])
+    }
+  ######## refit = T ##################
+  } else {
+
+    out$refittedPredictedResponse <- apply(simulationOutput$refittedPredictedResponse, 2, aggregateByGroup)
+    out$fittedResiduals = aggregateByGroup(simulationOutput$fittedResiduals)
+    out$refittedResiduals = apply(simulationOutput$refittedResiduals, 2, aggregateByGroup)
+    out$refittedPearsonResiduals = apply(simulationOutput$refittedPearsonResiduals, 2, aggregateByGroup)
+
+    if(simulationOutput$integerResponse == T){
+      for (i in 1:out$nGroups) out$scaledResiduals[i] <- ecdf(out$refittedResiduals[i,] + runif(out$nGroups, -0.5, 0.5))(out$fittedResiduals[i] + runif(1, -0.5, 0.5))
+    } else {
+      for (i in 1:out$nGroups) out$scaledResiduals[i] <- ecdf(out$refittedResiduals[i,])(out$fittedResiduals[i])
+    }
+  }
+  # hack - the c here will result in both old and new outputs to be present resulting output, but a named access should refer to the new, grouped calculations
+  out$aggregateByGroup = aggregateByGroup
+  out = c(out, simulationOutput)
+  out$original = simulationOutput
+  class(out) = "DHARMa"
+  return(out)
+}
+
+



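The grouping step added in `recalculateResiduals` (aggregate observed and simulated responses per group, then evaluate the empirical CDF of the aggregated simulations at the aggregated observation) can be sketched with base R alone. Everything below is illustrative: the data are simulated, the variable names are hypothetical, and no DHARMa object is involved. One assumption differs from the hunk: the sketch draws one jitter value per simulation (`n_sim` values), so that the jitter vector has the same length as the row it is added to.

```r
# Illustrative sketch with simulated data; not the package implementation.
set.seed(42)
n_obs <- 12
n_sim <- 200
group <- factor(rep(c("a", "b", "c"), each = 4))

observed  <- rpois(n_obs, lambda = 5)                              # one value per data point
simulated <- matrix(rpois(n_obs * n_sim, lambda = 5), nrow = n_obs) # rows: data points, cols: simulations

# Same aggregation idiom as in the hunk, with sum as the aggregation function
aggregate_by_group <- function(x) aggregate(x, by = list(group), FUN = sum)[, 2]

observed_grouped  <- aggregate_by_group(observed)            # length: number of groups
simulated_grouped <- apply(simulated, 2, aggregate_by_group) # groups x simulations

# Scaled residual per group: ECDF of the simulations, evaluated at the observation.
# Uniform jitter in (-0.5, 0.5) is added because the response is integer-valued.
scaled_grouped <- vapply(seq_along(observed_grouped), function(i) {
  sims <- simulated_grouped[i, ] + runif(n_sim, -0.5, 0.5)
  ecdf(sims)(observed_grouped[i] + runif(1, -0.5, 0.5))
}, numeric(1))

scaled_grouped
```

Each entry of `scaled_grouped` lies between 0 and 1, one per level of `group`, which is the shape the grouped `scaledResiduals` in the hunk are meant to have.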
