Estimation of secondary regression models after the estimation of a primary latent class model
This function fits regression models to relate a latent class structure (stemming from a latent class model estimated within the lcmm package) to either an external outcome or external class predictors.
Two inference techniques are implemented to account for the classification error:
- a 2-stage estimation of the joint likelihood of the primary latent class model and the secondary (external) regression;
- a regression between the posterior latent class assignment and the external variable which internally corrects for the assignment misclassification.
It returns an object from one of the lcmm package classes.
Usage
externVar(
  model,
  fixed,
  mixture,
  random,
  subject,
  classmb,
  survival,
  hazard = "Weibull",
  hazardtype = "Specific",
  hazardnodes = NULL,
  TimeDepVar = NULL,
  logscale = FALSE,
  idiag = FALSE,
  nwg = FALSE,
  randomY = NULL,
  link = NULL,
  intnodes = NULL,
  epsY = NULL,
  cor = NULL,
  nsim = NULL,
  range = NULL,
  data,
  longitudinal,
  method,
  varest,
  M = 200,
  B,
  convB = 1e-04,
  convL = 1e-04,
  convG = 1e-04,
  maxiter = 100,
  posfix,
  partialH = FALSE,
  verbose = FALSE,
  nproc = 1
)
Arguments
- model
an object inheriting from class hlme, lcmm, Jointlcmm, multlcmm or mpjlcmm giving the primary latent class model.
- fixed
optional two-sided linear formula object for specifying the fixed-effects in the secondary model with an external outcome variable. The response outcome is on the left of ~ and the covariates are separated by + on the right of the ~. By default, an intercept is included.
- mixture
optional one-sided formula object for the class-specific fixed effects in the model for the external outcome. Among the list of covariates included in fixed, the covariates with class-specific regression parameters are entered in mixture separated by +. By default, an intercept is included. If no intercept, -1 should be the first term included.
- random
optional one-sided linear formula object for specifying the random-effects on the external outcome in the secondary model, if appropriate. By default, no random effect is included.
- subject
name of the covariate representing the grouping structure, to be given even in the absence of a hierarchical structure.
- classmb
optional one-sided formula specifying the external predictors of latent class membership to be modelled in the secondary class-membership multinomial logistic model. Covariates are separated by + on the right of the ~.
- survival
optional two-sided formula specifying the external survival part of the model.
- hazard
optional family of hazard function assumed for the survival model (Weibull, piecewise or splines).
- hazardtype
optional indicator for the type of baseline risk function (Specific, PH or Common).
- hazardnodes
optional vector containing interior nodes if splines or piecewise is specified for the baseline hazard function in hazard.
- TimeDepVar
optional vector specifying the name of the time-dependent covariate in the survival model.
- logscale
optional boolean indicating whether an exponential (logscale=TRUE) or a square (logscale=FALSE, the default) transformation is used to ensure positivity of parameters in the baseline risk functions.
- idiag
if appropriate, optional logical for the structure of the variance-covariance matrix of the random-effects in the secondary model. If FALSE, an unstructured variance-covariance matrix is considered (by default). If TRUE, a diagonal variance-covariance matrix is considered.
- nwg
if appropriate, optional logical indicating if the variance-covariance of the random-effects in the secondary model is class-specific. If FALSE, the variance-covariance matrix is common over latent classes (by default). If TRUE, a class-specific proportional parameter multiplies the variance-covariance matrix in each class (the proportional parameter in the last latent class equals 1 to ensure identifiability).
- randomY
optional logical for including an outcome-specific random intercept. If FALSE, no outcome-specific random intercept is added (default). If TRUE, independent outcome-specific random intercepts with parameterized variance are included.
- link
optional family of parameterized link functions for the external outcome if appropriate. Defaults to NULL, corresponding to a continuous Gaussian distribution (hlme function).
- intnodes
optional vector of interior nodes. This argument is only required for an I-splines link function with nodes entered manually.
- epsY
optional positive real number used to rescale the marker in (0,1) when the beta link function is used. By default, epsY=0.5.
- cor
optional indicator for inclusion of an autocorrelated Gaussian process in the (latent process) linear mixed model. Option "BM" indicates a Brownian motion with parameterized variance. Option "AR" specifies an autoregressive process of order 1 with parameterized variance and correlation intensity. Each option should be followed by the time variable in brackets, as in cor=BM(time). By default, no autocorrelated Gaussian process is added.
- nsim
number of points to be used in the estimated link function. By default, nsim=100.
- range
optional vector indicating the range of the outcomes (that is, the minimum and maximum). By default, the range is defined according to the minimum and maximum observed values of the outcome. The option should be used only for Beta and Splines transformations.
- data
data frame containing the variables named in fixed, mixture, random, classmb and subject, for both the current function arguments and the primary model arguments. Check the Details section for information on the data structure, especially with external outcomes.
- longitudinal
only with mpjlcmm primary models and the "twoStageJoint" method: mandatory list containing the longitudinal submodels used in the primary latent class model.
- method
character indicating the inference technique to be used: "twoStageJoint" corresponds to 2-stage estimation. "conditional" corresponds to the method based on the distribution of Y conditional on the true latent class membership.
- varest
optional character indicating the method to be used to compute the variance of the regression estimates. "none" does not account for the uncertainty in the primary latent class model, "paramBoot" computes the total variance using a parametric bootstrap technique, "Hessian" computes the total Hessian of the joint likelihood (implemented for the "twoStageJoint" method only). Defaults to "Hessian" for the "twoStageJoint" method and "paramBoot" for the "conditional" method.
- M
optional integer indicating the number of draws for the parametric bootstrap when varest="paramBoot". Defaults to 200.
- B
optional vector of initial parameter values for the secondary model. If external outcome, the vector has the same structure as a latent class model estimated in the other functions of the lcmm package for the same type of outcome. If external class predictors (of size p), the vector is of length (ng-1)*(1+p). If B=NULL (by default), internal initial values are selected.
- convB
optional threshold for the convergence criterion based on the parameter stability. By default, convB=0.0001.
- convL
optional threshold for the convergence criterion based on the log-likelihood stability. By default, convL=0.0001.
- convG
optional threshold for the convergence criterion based on the derivatives. By default, convG=0.0001.
- maxiter
optional maximum number of iterations for the secondary model estimation using the Marquardt iterative algorithm. Defaults to 100.
- posfix
optional vector specifying the indices in the parameter vector B of the secondary model that should not be estimated. Defaults to NULL, in which case all the parameters of the secondary regression are estimated.
- partialH
optional logical for Piecewise and Splines baseline risk functions and Splines link functions only. Indicates whether the parameters of the baseline risk or link functions can be dropped from the Hessian matrix to define convergence criteria (can solve non convergence due to estimates at the boundary of the parameter space - usually 0).
- verbose
logical indicating whether information about computation should be reported. Default to FALSE.
- nproc
the number of cores for parallel computation. Defaults to 1 (sequential mode).
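As a small worked illustration of the length rule given for B with external class predictors, the sketch below computes (ng-1)*(1+p) in base R. The helper name n_classmb_par is hypothetical and is not part of the lcmm package; it only reproduces the arithmetic stated in the argument description.

```r
# Hypothetical helper (not part of lcmm): length of the initial value vector B
# for a secondary class-membership model with ng classes and p external predictors.
n_classmb_par <- function(ng, p) {
  stopifnot(ng >= 2, p >= 0)
  # one intercept plus p slopes for each of the (ng - 1) non-reference classes
  (ng - 1) * (1 + p)
}

# Two latent classes and four predictors (as with classmb = ~X1 + X2 + X3 + X4)
len_2_classes <- n_classmb_par(ng = 2, p = 4)
# Three latent classes and the same four predictors
len_3_classes <- n_classmb_par(ng = 3, p = 4)
c(len_2_classes, len_3_classes)
```

With two classes and four predictors, a user-supplied B would therefore need 5 values, and 10 values with three classes.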
Value
an object of class externVar and externSurv for external survival outcomes, externX for external class predictors, and hlme, lcmm, or multlcmm for external longitudinal or cross-sectional outcomes.
Details
A. DATA STRUCTURE
The data argument must follow a specific structure for individual variables, i.e. variables with a unique constant value for each subject. For an individual variable given as external outcome, the data value must be present only once per subject, independently of any time variable used in the primary latent class model. For an individual variable given as external class predictor, data values must be given for every row of every individual (as usual).
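The layout described above can be sketched in base R on a toy long-format data frame. All variable names here (Ymarker, Xind, Yind) are hypothetical. The sketch also assumes that "present only once per subject" means the external outcome is kept on a single row per subject and set to NA on that subject's other rows, which is an interpretation rather than something stated explicitly here.

```r
# Toy long-format data: 3 subjects, 4 visits each (all names are hypothetical)
set.seed(42)
long <- data.frame(
  ID   = rep(1:3, each = 4),
  Time = rep(0:3, times = 3)
)
long$Ymarker <- rnorm(nrow(long))  # repeated marker used by the primary model

# Subject-level (individual) variables: one value per subject
subj <- data.frame(ID = 1:3, Xind = c(0, 1, 0), Yind = c(5.2, 3.9, 4.4))

# Individual predictor and outcome are first repeated on every row of each subject
long <- merge(long, subj, by = "ID")
long <- long[order(long$ID, long$Time), ]

# Individual external outcome: keep the value on the first row of each subject
# only (assumption: the other rows are set to NA)
long$Yind[duplicated(long$ID)] <- NA

# Number of non-missing external outcome values per subject
n_outcome <- tapply(!is.na(long$Yind), long$ID, sum)
n_outcome
```

In this sketch Xind stays filled in on every row, as required for an external class predictor, while Yind appears exactly once per subject.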
B. VARIANCE ESTIMATION
Not accounting for the first-stage variance by specifying "none" may lead to underestimation of the final variance. When possible, method "Hessian", which relies on the combination of Hessians from the primary and secondary models, is recommended. However, it may become numerically intensive when the primary latent class model has a very large number of parameters. As an alternative, especially in situations with a complex primary model but a rather parsimonious secondary model, method "paramBoot", which implements a parametric bootstrap, can be used.
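To illustrate why ignoring the first-stage uncertainty underestimates the variance, here is a toy base R sketch of a generic parametric bootstrap for a two-stage estimator. It is not the internal code of externVar, and the way the variance is combined (average second-stage variance plus between-draw variance of the estimates) is an assumption made for illustration. The models, the helper secondary_fit and all object names are hypothetical.

```r
set.seed(2024)
n <- 200
x <- rnorm(n, mean = 2)  # data used by the toy "primary" model
y <- rnorm(n, mean = 5)  # data used by the toy "secondary" model

# Stage 1: primary estimate and its estimated variance
theta_hat <- mean(x)
var_theta <- var(x) / n

# Stage 2: secondary estimate computed with the primary parameter held fixed.
# Returns the estimate and its variance conditional on theta.
secondary_fit <- function(theta) {
  list(est = mean(y) - theta, var = var(y) / n)
}

# Analogue of varest = "none": first-stage uncertainty is ignored
naive <- secondary_fit(theta_hat)

# Parametric bootstrap: draw M primary parameters from their estimated
# asymptotic distribution and refit the secondary model for each draw
M <- 200
theta_draws <- rnorm(M, mean = theta_hat, sd = sqrt(var_theta))
fits <- lapply(theta_draws, secondary_fit)
est_draws <- vapply(fits, function(f) f$est, numeric(1))
var_draws <- vapply(fits, function(f) f$var, numeric(1))

# Assumed combination: within-draw variance plus between-draw variance
total_var <- mean(var_draws) + var(est_draws)
c(naive = naive$var, total = total_var)
```

The between-draw term is what the "none" option leaves out, so the total variance in this sketch is always at least as large as the naive one.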
Examples
# \dontrun{
library(lcmm)
###### Estimation of the primary latent class model ######
set.seed(1234)
# one-class model, used to generate initial values for the two-class model
PrimMod <- hlme(Ydep1~Time,random=~Time,subject='ID',ng=1,data=data_lcmm)
PrimMod2 <- hlme(Ydep1~Time,mixture=~Time,random=~Time,subject='ID',
                 ng=2,data=data_lcmm,B=random(PrimMod))
###### Example 1: Relationship between a latent class structure and #
# external class predictors ######
# estimation of the secondary multinomial logistic model with total variance
# computed with the Hessian
XextHess <- externVar(PrimMod2,
                      classmb = ~X1 + X2 + X3 + X4,
                      subject = "ID",
                      data = data_lcmm,
                      method = "twoStageJoint")
summary(XextHess)
# estimation of a secondary multinomial logistic model with total variance
# computed with parametric Bootstrap (much longer). When using the bootstrap
# estimator, we recommend running first the analysis with option varest = "none"
# which is faster but which underestimates the variance. And then use these values
# as initial values when running the model with varest = "paramBoot" to obtain
# a valid variance of the parameters.
XextNone <- externVar(PrimMod2,
                      classmb = ~X1 + X2 + X3 + X4,
                      subject = "ID",
                      data = data_lcmm,
                      varest = "none",
                      method = "twoStageJoint")
XextBoot <- externVar(PrimMod2,
                      classmb = ~X1 + X2 + X3 + X4,
                      subject = "ID",
                      data = data_lcmm,
                      varest = "paramBoot",
                      method = "twoStageJoint",
                      B = XextNone$best)
summary(XextBoot)
###### Example 2: Relationship between a latent class structure and #
# external outcome (repeatedly measured over time) ######
# estimation of the secondary linear mixed model with total variance
# computed with the Hessian
YextHess <- externVar(PrimMod2,                # primary model
                      fixed = Ydep2 ~ Time*X1, # secondary model
                      random = ~Time,          # secondary model
                      mixture = ~Time,         # secondary model
                      subject = "ID",
                      data = data_lcmm,
                      method = "twoStageJoint")
# estimation of a secondary linear mixed model with total variance
# computed with parametric Bootstrap (much longer). When using the bootstrap
# estimator, we recommend running first the analysis with option varest = "none"
# which is faster but which underestimates the variance. And then use these values
# as initial values when running the model with varest = "paramBoot" to obtain
# a valid variance of the parameters.
YextNone <- externVar(PrimMod2,                # primary model
                      fixed = Ydep2 ~ Time*X1, # secondary model
                      random = ~Time,          # secondary model
                      mixture = ~Time,         # secondary model
                      subject = "ID",
                      data = data_lcmm,
                      varest = "none",
                      method = "twoStageJoint")
YextBoot <- externVar(PrimMod2,                # primary model
                      fixed = Ydep2 ~ Time*X1, # secondary model
                      random = ~Time,          # secondary model
                      mixture = ~Time,         # secondary model
                      subject = "ID",
                      data = data_lcmm,
                      method = "twoStageJoint",
                      B = YextNone$best,
                      varest = "paramBoot")
summary(YextBoot)
###### Example 3: Relationship between a latent class structure and #
# external outcome (survival) ######
# estimation of the secondary survival model with total variance
# computed with the Hessian
YextHess <- externVar(PrimMod2,                                       # primary model
                      survival = Surv(Tevent,Event) ~ X1+mixture(X2), # secondary model
                      hazard = "3-quant-splines",                     # secondary model
                      hazardtype = "PH",                              # secondary model
                      subject = "ID",
                      data = data_lcmm,
                      method = "twoStageJoint")
summary(YextHess)
# estimation of a secondary survival model with total variance
# computed with parametric Bootstrap (much longer). When using the bootstrap
# estimator, we recommend running first the analysis with option varest = "none"
# which is faster but which underestimates the variance. And then use these values
# as initial values when running the model with varest = "paramBoot" to obtain
# a valid variance of the parameters.
YextNone <- externVar(PrimMod2,                                       # primary model
                      survival = Surv(Tevent,Event) ~ X1+mixture(X2), # secondary model
                      hazard = "3-quant-splines",                     # secondary model
                      hazardtype = "PH",                              # secondary model
                      subject = "ID",
                      data = data_lcmm,
                      varest = "none",
                      method = "twoStageJoint")
YextBoot <- externVar(PrimMod2,                                       # primary model
                      survival = Surv(Tevent,Event) ~ X1+mixture(X2), # secondary model
                      hazard = "3-quant-splines",                     # secondary model
                      hazardtype = "PH",                              # secondary model
                      subject = "ID",
                      data = data_lcmm,
                      method = "twoStageJoint",
                      B = YextNone$best,
                      varest = "paramBoot")
summary(YextBoot)
# }