cyclomort
: A tool for
estimating seasonal mortality patternsThe cyclomort
package
presents a selection of tools for fitting, exploring, and visualizing
uni- or multi-modal periodic hazard functions to right-censored
mortality data. Its development was motivated by the fairly common
observation in wildlife populations that the risk of mortality is higher
in certain times of year than others. The central function,
fit_cyclomort
, returns estimates of the timing, duration,
and intensity of one or more “mortality seasons”. Other functions
summarize, visualize, and perform some statistical tests comparing
models. The underlying model and its implementation are described in
detail in Gurarie et al. (in review), and users are encouraged
to read the paper for a deeper understanding of the likelihood based
estimation of the parameters. This vignette illustrates the basic use of
the most important functions.
To load the package:
The underlying hazard model is based on a mixture of a modified
wrapped Cauchy-like functions. Its simplest iteration is
wc()
, parameterized in terms of a mean mu
, a
concentration parameter rho
which ranges from 0 to 1, and a
periodicity tau
. Its mean value is 1. In the examples
below, there is a single mortality season that peaks on day 100 on a
period of 365 days (i.e. April 10 in a calendar year):
curve(wc(x, mu = 100, rho = .7, tau = 365), xlim = c(0,365), n = 1e4, ylab = "hazard", xlab = "time")
curve(wc(x, mu = 100, rho = .5, tau = 365), add = TRUE, col = 2)
curve(wc(x, mu = 100, rho = .3, tau = 365), add = TRUE, col = 3)
legend("topright", col = 1:3, legend = c(.7,.5,.3), title = "rho", lty = 1)
The mixture version mwc()
takes vectors of the means and
concentration parameters, as well as a vector mean hazard values
gammas
. The sum of gammas
is the overall mean
hazard. Several examples (on a periodicity of 1, with an overall mean
hazard 1) are illustrated below:
mus <- c(.3,.5,.9)
rhos <- c(.5,.9,.7)
gammas <- c(.6,.1,.3)
curve(mwc(x, mus = mus[1], rhos = rhos[1], gammas = 1, tau = 1),
xlim = c(0,1), ylab = "hazard", xlab = "time")
curve(mwc(x, mus = mus[1:2], rhos = rhos[1:2], gammas = gammas[1:2]/sum(gammas[1:2]), tau = 1),
add = TRUE, col = 2)
curve(mwc(x, mus = mus, rhos = rhos, gammas = gammas, tau = 1),
add = TRUE, col = 3)
Note that in the three season model above, the second peak is quite pronounced (i.e. has the highest instantaneous hazard), but it accounts for only 1/10 of the total hazard since it has such a short duration.
Additionally there are functions for the integrals of these functions
: iwc()
and imwc()
respectively. It is
important to remember that these functions are not probability
distributions (or, respectively, cumulative probability distributions)
but positive recurring functions with an infinite support, the long-term
mean of which is the sum of gamma vector.
This set of functions are mainly internal and used for fitting,
though they can be useful for visualizing the shape of an underlying
hazard. In subsequent analyses and applications, the parameter
rho
is replaced with a “season duration” parameter, and an
additional parameter of the mean overall hazard. In turn, gammas are
converted to “weight” parameters for each component, with all weights
from each component summing to 1.
The simulate_cycloSurv()
function generates
time-to-event data from an underlying periodic hazard process. Note that
this function is parameterized via the (spelled-out) parameters:
period
, meanhazard
, peaks
,
durations
and weights
. By default, this
function plots the hazard, cumulative mortality, survival curve, and a
histogram of simulated mortalities:
T.morts.sim <- simulate_cycloSurv(n = 300, period = 365, meanhazard = 1/365,
peaks = c(100, 250), durations = c(25, 40),
weights = c(0.6, 0.4), plotme = TRUE,
max.periods = 5)
The function returns an object of the cyclomort
specific
class cycloSurv
, which is a superclass of the
Surv
data type used widely in survival analysis in R:
The fit_cyclomort()
function estimates the parameters of
the periodic hazard process. Note that we have to specify the numbers of
seasons to fit.
The returned object is of class cmfit
, which has
summary
, print
and plot
methods:
print(T.morts.fit)
> Multi-seasonal hazard function fit with 2 seasons with periodicity 365.
>
> $meanhazard
> estimate CI.low CI.high se
> 1 0.003057976 0.002804095 0.003311857 0.0001269406
>
> $`season 1`
> parameter season estimate CI.low CI.high
> 1 peak 1 101.9432851 99.0665436 104.8200266
> 2 duration 1 25.0426128 18.9551849 32.7292681
> 3 weight 1 0.5680905 0.4605317 0.6756493
>
> $`season 2`
> parameter season estimate CI.low CI.high
> 1 peak 2 242.4753420 230.69456 254.256126
> 2 duration 2 57.8271311 37.87082 82.553070
> 3 weight 2 0.4319095 0.30541 0.558409
>
> Log-likelihood: -104.502; AIC: 221.004
From this summary, we see that the true values are within the 95% confidence intervals of the estimates.
The default plot:
plot(T.morts.fit, breaks = 30)
> Estimating 95% confidence intervals for the hazard function drawing 5000 samples from the parameter estimates.
By default, the function plots a histogram of the data as well. It can be useful to plot only the fit and the confidence interval, for example to compare the 2 season fit with a 1 season and null fit, while suppressing the histogram, for example:
T.morts.1season.fit <- fit_cyclomort(T.morts.sim, n.seasons = 1)
> Warning in fit_cyclomort(T.morts.sim, n.seasons = 1): Upper confidence limit
> for weights manually coerced to 1
T.morts.null.fit <- fit_cyclomort(T.morts.sim, n.seasons = 0)
plot(T.morts.fit, histogram = FALSE, monthlabs = TRUE)
plot(T.morts.1season.fit, histogram = FALSE, add = TRUE, hazcolor = "red")
plot(T.morts.null.fit, histogram = FALSE, add = TRUE, hazcolor = "blue")
The plot of the fitted objects includes confidence intervals around
the predictions. These are obtained numerically by generating a hazard
curve many (by default: 5000) times using parameter value sampled from
the MLE estimates using the Hessian to generate a variance covariance
matrix across the parameters. This procedure is performed with the
predict
method (predict.cmfit
function), which
returns a set of predictions for a given fit at a time or set of times
with a specific parameter sets.
Users can also call predict.cmfit
to estimate the
expected time to death for any individual at any given time. Note,
this procedure takes a very (in fact,
embarrassingly) long time even with relatively few samples (e.g. 100).
We are working on this!
## timetoeventprediction <- predict(T.morts.fit, t = 1:365, type = "timetoevent", CI = TRUE, nreps = 100)
The figure below illustrates the extent to which the expected time to an event depends on when an individual “enters” the process.
with(timetoeventprediction, {
plot(t, fit, type = "l", ylab = "Time to event", ylim = range(CI), lwd = 2)
lines(t, CI[1,])
lines(t, CI[2,])
})
An important aspect of survival modelling is the ability to
definitively characterize specific traits of a data set or population.
Identifying the probable number of mortality seasons in a system is a
powerful and useful claim that one can make using the
select_seasons
function. This function calls
fit_cyclomort
for a number of different seasons (ranging
from the null model of 0 seasons to the max.season
parameter), identifying the best model using Akaike Information
Criterion (AIC).
The select_seasons
function returns a
cmfitlist
object which is equipped with unique print and
summary functions. While cmfitlist
objects are simply lists
of cmfit
objects (each element of the list being the cmfit
object representing a different seasonal assumption), the unique
summary.cmfitlist
function allows for easier comparison of
the different fits.
T.select <- select_seasons(T.morts.sim, max.season = 4)
> Fitting model with 0 seasons...
> Fitting model with 1 seasons...
> Warning in fit_cyclomort(x, n.seasons = n.seasons): Upper confidence limit for
> weights manually coerced to 1
> Fitting model with 2 seasons...
> Fitting model with 3 seasons...
> Fitting model with 4 seasons...
> Warning in sqrt(diag(solve(fits$hessian))): NaNs produced
> Warning in fit_cyclomort(x, n.seasons = n.seasons): Could not produce confidence intervals for mean hazard parameter. Interpret results with skepticism.
> Warning in fit_cyclomort(x, n.seasons = n.seasons): Could not produce accurate confidence intervals for weight parameter. Interpret results with skepticism.
> Warning in fit_cyclomort(x, n.seasons = n.seasons): Could not produce confidence intervals for peak parameter. Interpret results with skepticism.
> Warning in fit_cyclomort(x, n.seasons = n.seasons): Could not produce confidence intervals for duration parameter. Interpret results with skepticism.
>
> Delta AIC table of fitted models:
> n.seasons logLik d.f. AIC dAIC
> 0 -214.3846 1 430.7691 209.764713
> 1 -164.4372 3 334.8744 113.869963
> 2 -104.5022 6 221.0044 0.000000
> 3 -103.1173 9 224.2347 3.230234
> 4 -102.3955 12 228.7909 7.786521
As expected, the select_seasons
function identifies the
two season model as the one with the lowest AIC value. Note, the
likelihood maximization algorithm had some difficulty obtaining the
Hessian matrix for the overly complex 4 season model, but (in our
experience) the inference for model selection is still generally
stable.
The factorfit_cyclomort
function allows users to test
whether or not a categorical variable has an impact on seasonal
mortality patterns, using R’s formula notation. Input data for this
function can be structured as a data frame with a categorical variable
and a Surv
(or cycloSurv
) object in columns.
There is a simulated dataset in the package called
seasonalsex
that illustrates a hypothetical example where
factorial analysis may be useful:
data(seasonalsex)
str(seasonalsex)
> 'data.frame': 200 obs. of 2 variables:
> $ sex : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
> $ event: 'cycloSurv' num [1:200, 1:3] (0,6.134+] (0,1.822] (0,5.347+] (0,1.121] ...
> ..- attr(*, "dimnames")=List of 2
> .. ..$ : NULL
> .. ..$ : chr [1:3] "start" "stop" "status"
> ..- attr(*, "type")= chr "counting"
> ..- attr(*, "period")= num 1
> ..- attr(*, "t0")= num 0
The implementation is straightforward:
Survival.factorfit <- factorfit_cyclomort(event ~ sex, data = seasonalsex, n.seasons = 1)
> Warning in fit_cyclomort(t, n.seasons = n.seasons, ...): Upper confidence limit
> for weights manually coerced to 1
> Warning in fit_cyclomort(t.subset, n.seasons = n.seasons, ...): Upper
> confidence limit for weights manually coerced to 1
> Warning in fit_cyclomort(t.subset, n.seasons = n.seasons, ...): Upper
> confidence limit for weights manually coerced to 1
summary(Survival.factorfit)
> Summary table comparing factorial seasonal survival model with 1 seasons.
>
> Formula: event ~ sex
> logLike k AIC LRT p.value
> null -339.2927 3 684.59 44.15 0
> sex -317.2170 6 646.43
The analyses reports a p-value based on a likelihood ratio test of a model that includes the factor against a null model that does not, as well as reporting AIC values. A plotting method provides a visual comparison of the fits.
plot(Survival.factorfit, ymax = 1.2)
> Estimating 95% confidence intervals for the hazard function drawing 5000 samples from the parameter estimates.
>
> Estimating 95% confidence intervals for the hazard function drawing 5000 samples from the parameter estimates.
>
> Estimating 95% confidence intervals for the hazard function drawing 5000 samples from the parameter estimates.
Aside from the simulated sex-difference process, the package includes two anonymized and randomized caribou mortality data sets. The code below replicates the analyses presented in Gurarie et al. (in review).
The first is from woodland caribou (Rangifer tarandus caribou) collected throughout the Northwest Territories in Canada, loaded and plotted as follows:
data(nwt_morts)
require(ggplot2); require(magrittr); require(plyr)
ggplot(nwt_morts %>% arrange(start) %>% mutate(id = factor(id, levels = id)),
aes(x = start, y = id, col = status)) +
geom_errorbarh(aes(xmin = start, xmax = end))
Note these data have been randomized by year, such that the seasonal patterns are retained.
These data need to be converted to a cycloSurv
data
type, which inherits from the Surv
class used in
survival
and other survival modeling packages:
nwt_surv = with(nwt_morts,
create_cycloSurv(start, end, event = status == "Mort",
period = 365, timeunit = "days"))
Note, it is important to specify the length of the period and time-unit.
Once these data are converted to the right class, they can be analyzed as in the examples above. For example, the Northwest Territories woodland caribou show three distinct seasons of higher mortality:
nwt_fit <- fit_cyclomort(nwt_surv, n.seasons = 3)
plot(nwt_fit, breaks = 40)
> Estimating 95% confidence intervals for the hazard function drawing 5000 samples from the parameter estimates.
summary(nwt_fit)
> $model
> [1] "Multi-seasonal hazard function fit with 3 seasons with periodicity 365.\n\n"
>
> $estimates
> $estimates$meanhazard
> estimate CI.low CI.high se
> 1 0.0004687024 0.000426814 0.0005105908 2.09442e-05
>
> $estimates$`season 1`
> parameter season estimate CI.low CI.high
> 1 peak 1 111.3346293 107.3627962 115.3064624
> 2 duration 1 16.6833805 9.4696085 28.6363512
> 3 weight 1 0.2935465 0.1834352 0.4036578
>
> $estimates$`season 2`
> parameter season estimate CI.low CI.high
> 1 peak 2 193.2151937 188.2755275 198.1548600
> 2 duration 2 24.3743563 12.8503759 43.8717275
> 3 weight 2 0.4126835 0.2422596 0.5831074
>
> $estimates$`season 3`
> parameter season estimate CI.low CI.high
> 1 peak 3 314.34156 288.3487125 340.3344057
> 2 duration 3 67.88710 24.5513232 127.1531969
> 3 weight 3 0.29377 0.1185108 0.4690292
>
>
> $analysis
> [1] "Log-likelihood: -324.6288; AIC: 667.2577"
The second data set is from the western Arctic herd (WAH) of migratory tundra caribou in western Alaska, U.S.:
data(wah_morts)
ggplot(wah_morts %>% arrange(start),
aes(x = start, y = id, col = fate)) +
geom_errorbarh(aes(xmin = start, xmax = end))
The western Arctic herd, with two pronounced seasons of mortality,
showed an increase in mortality in the 2017-2018 winter season, which we
analyze by converting to a cycloSurv
objects and censoring
and trimming the data in September 2017 using two convenient helper
functions, censor_cycloSurv
and
trim_cycloSurv
:
wah <- with(wah_morts, create_cycloSurv(start = start, end = end,
event = fate == "dead", period = 365))
cutoff <- "2016-09-01"
wah_pre = censor_cycloSurv(wah, censor.time = cutoff)
wah_post = trim_cycloSurv(wah, trim.time = cutoff)
wah_fit_pre <- fit_cyclomort(wah_pre, n.seasons = 1)
> Warning in fit_cyclomort(wah_pre, n.seasons = 1): Upper confidence limit for
> weights manually coerced to 1
wah_fit_post <- fit_cyclomort(wah_post, n.seasons = 1)
> Warning in fit_cyclomort(wah_post, n.seasons = 1): Upper confidence limit for
> weights manually coerced to 1
summary(wah_fit_pre)
> $model
> [1] "Multi-seasonal hazard function fit with 1 seasons with periodicity 365.\n\n"
>
> $estimates
> $estimates$meanhazard
> estimate CI.low CI.high se
> 1 0.0004578555 0.0002812696 0.0006344414 8.829294e-05
>
> $estimates$`season 1`
> parameter season estimate CI.low CI.high
> 1 peak 1 163.23033 136.5797 189.8810
> 2 duration 1 82.46871 44.8649 123.5056
>
>
> $analysis
> [1] "Log-likelihood: -70.3358; AIC: 146.6715"
summary(wah_fit_post)
> $model
> [1] "Multi-seasonal hazard function fit with 1 seasons with periodicity 365.\n\n"
>
> $estimates
> $estimates$meanhazard
> estimate CI.low CI.high se
> 1 0.0007732174 0.000406981 0.001139454 0.0001831182
>
> $estimates$`season 1`
> parameter season estimate CI.low CI.high
> 1 peak 1 73.91225 -42.13717 189.9617
> 2 duration 1 143.25942 48.44100 177.3495
>
>
> $analysis
> [1] "Log-likelihood: -40.0186; AIC: 86.0372"
plot(wah_fit_pre, hist= FALSE, ymax = 0.003, monthlabs = TRUE)
> Estimating 95% confidence intervals for the hazard function drawing 5000 samples from the parameter estimates.
plot(wah_fit_post, hist= FALSE, add = TRUE, hazcolor = "red")
> Estimating 95% confidence intervals for the hazard function drawing 5000 samples from the parameter estimates.
Note that this result differs somewhat from that in Gurarie et al. (2020) since the data provided in the package are a subset of the complete data set, and too sparse to fit a more appropriate two season model.
E. Gurarie, P. Thompson, A. Kelly, N. Larter, W. Fagan and K. Joly
(2020) For Everything There is a Season: Estimating periodic hazard
functions with the cyclomort
R package. Methods in
Ecology and Evolution. 11(1): 129-138. https://doi.org/10.1111/2041-210X.13305.