Monday, November 25, 2024

Development and validation of a prognostic model to predict relapse in adults with remitted depression in primary care: secondary analysis of pooled individual participant data from multiple studies


WHAT IS ALREADY KNOWN ON THIS TOPIC

WHAT THIS STUDY ADDS

  • We found that it is not possible to accurately predict individualised risk of relapse using prognostic factors that are routinely collected and available in primary care. We found evidence to suggest that relationship status (not being in a relationship) is associated with increased risk of relapse and warrants confirmatory prognostic factor research.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • Future prognosis research in this area should focus on exploring the feasibility of routinely measuring and documenting additional prognostic factors in primary care (eg, adverse childhood events, relationship status and social support) and including these in prognostic models. Until we can more accurately identify individuals at increased risk of relapse, commonly used acute-phase treatments could be optimised to better prepare for and mitigate the risk of relapse and there is a need for brief, scalable relapse prevention interventions that could be provided more widely.

Background

Depression is the leading cause of disability worldwide1; the vast majority of adults seeking treatment for depression are managed in primary care.2 Relapse is common, with around half of people experiencing a relapse within 1 year of reaching remission.3 This high relapse rate contributes to the overall morbidity and burden associated with depression.4

The ability to predict an individual patient’s risk of relapse after an episode of depression might assist clinicians in targeting relapse prevention interventions towards those at greatest risk. Well-established prognostic factors associated with increased risk of relapse are: residual depressive symptoms, previous depressive episodes, childhood maltreatment, comorbid anxiety, neuroticism, younger age of first onset and rumination.5 While the presence or absence of these prognostic factors can help refine estimates of overall prognosis to particular subgroups, they do not effectively aid risk stratification at the individual level. Subgrouping methods have been used to predict average risk of relapse for groups of people with different combinations (or profiles) of prognostic factors.6 However, individualised outcome prediction is best shaped using multiple prognostic factors in combination, in the form of multivariable prognostic models.7

Our systematic review of prognostic models identified 12 studies of relapse prediction models.8 The majority were at high overall risk of bias (the most significant limitations being inadequate sample size, inappropriate handling of missing data and calibration or discrimination not reported). The developed models either demonstrated insufficient predictive performance on reported validation by the study authors or they could not be feasibly implemented in a primary care setting due to the large number and type of included predictors. We concluded that we currently lack evidence-based tools to assist clinicians with risk prediction of depressive relapse in any clinical setting and that new models are required to give accurate risk predictions in primary care settings.

Objective

The objective is to develop and validate a prognostic model, for use in clinical primary care settings, to predict risk of relapse in adults with remitted depression.9

Methods

The methods align with PROGnosis RESearch Strategy recommendations,7 and the study is reported according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis guidance.10 A patient advisory group (PAG) contributed to this study, including selecting predictors, definition of outcome, target patient population and clinical application. The study was registered prospectively (ClinicalTrials.gov: NCT04666662). Further methodological details are available in our protocol paper.9

Source of data, participants and setting

We formed the ‘PREDICTR’ dataset from combined individual participant data (IPD) from UK primary care-based studies,9 identified through a literature search and review of the National Institute for Health and Care Research trials registry. Authors were asked to share data if studies included adult patients (18 years and over) with depression, measured using the Patient Health Questionnaire (PHQ-9) at a minimum of three time-points (to identify depression, remission, relapse/no relapse). We excluded studies in patient groups with significant psychiatric comorbidity and feasibility studies. The PREDICTR dataset is derived from all arms (control and intervention) of six randomised controlled trials (RCTs) of primary care-based interventions for depression (CADET, CASPER Plus, COBRA, Healthlines Depression, REEACT and REEACT-2) and one observational cohort study (the West Yorkshire Low Intensity Outcome Watch (WYLOW) study) (online supplemental table 1.1 and online supplemental figure 1.1).

Starting point (remission)

Participants were in remission at the point of prediction. Participants must have had case-level depression at baseline (PHQ-9 Score of 10 or more11) and at 4 months after trial baseline: (1) a post-treatment PHQ-9 Score below the established cut-off of 10 (consistent with clinical recovery3 12) and (2) an improvement of ≥5 points on the PHQ-9 since depression diagnosis (which aligns with an established reliable change index to identify those with ‘reliable improvement’13).

End point/outcome (relapse)

We coded participants as relapsed if they fulfilled the following criteria within 6–8 months post remission: (1) PHQ-9 Score at or above the diagnostic cut-off (10 or more) and (2) ≥5 points greater than their symptom score at the time of remission. This is consistent with established criteria for reliable and clinically significant deterioration.13
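The remission and relapse definitions above reduce to two simple rules on PHQ-9 scores. The following is a minimal Python sketch of those rules; the function and argument names are hypothetical and are not taken from the study's analysis code.

```python
PHQ9_CUTOFF = 10     # case-level depression: PHQ-9 score of 10 or more
RELIABLE_CHANGE = 5  # a change of at least 5 points counts as reliable


def in_remission(phq9_baseline: int, phq9_4_months: int) -> bool:
    """Case-level at baseline, below cut-off at 4 months, improved by >= 5 points."""
    was_case = phq9_baseline >= PHQ9_CUTOFF
    below_cutoff = phq9_4_months < PHQ9_CUTOFF
    reliable_improvement = (phq9_baseline - phq9_4_months) >= RELIABLE_CHANGE
    return was_case and below_cutoff and reliable_improvement


def relapsed(phq9_remission: int, phq9_follow_up: int) -> bool:
    """Score of 10 or more at follow-up and >= 5 points worse than at remission."""
    at_or_above_cutoff = phq9_follow_up >= PHQ9_CUTOFF
    reliable_deterioration = (phq9_follow_up - phq9_remission) >= RELIABLE_CHANGE
    return at_or_above_cutoff and reliable_deterioration
```

Both conditions must hold in each case: a participant moving from 8 to 11, for example, crosses the cut-off but has not deteriorated by 5 points, so would not be coded as relapsed.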

Predictors

We identified predictors a priori, following a literature review and consensus within the multidisciplinary research team and the PAG. We included predictors that would currently be routinely available in primary care settings at the intended moment of prediction.

Predictors in primary analysis

The following variables have robust evidence for their role as relapse predictors5 14 and were included in the model:

  1. Residual depressive symptoms (PHQ-9 Score at remission (0–9); continuous variable).

  2. Previous episodes of depression (dichotomous predictor: 0=no previous episodes, 1=one or more previous episodes).

  3. Comorbid anxiety (measured using the Generalized Anxiety Disorder Assessment (GAD-7)15 in six of the seven studies and the Clinical Interview Schedule-Revised (CIS-R)16 in REEACT; see online supplemental table 1.2). These measures were combined to create a composite score (z-score), modelled as a continuous predictor.

  4. Baseline severity of depressive symptoms (continuous predictor; PHQ-9 Score at baseline (pre treatment)).

  5. RCT intervention: to control for the presence of interventions within the RCTs, we coded the presence or absence of an effective intervention (based on the results of the RCT) as a dichotomous variable. This predictor was intended to control for the intervention as part of the model building process only; when making predictions in real-world primary care, this predictor would always be set to zero (ie, no experimental intervention present).
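The composite anxiety score in item 3 can be illustrated as follows. This is only a sketch: it assumes that each instrument's raw scores are standardised separately (to mean 0 and SD 1) before being placed on a common scale, which the text does not spell out. The scores shown are made up for illustration and are not study data.

```python
from statistics import mean, stdev


def z_scores(raw_scores):
    """Standardise raw scores from ONE instrument to mean 0 and SD 1."""
    m = mean(raw_scores)
    s = stdev(raw_scores)  # sample standard deviation
    return [(x - m) / s for x in raw_scores]


# Hypothetical raw scores on two different anxiety instruments.
gad7_scores = [4, 9, 12, 15, 7]   # GAD-7 (total score ranges from 0 to 21)
cisr_scores = [1, 2, 3, 4, 0]     # CIS-R anxiety score (different scale)

# After standardisation both sets of scores share a common scale and can be
# entered into the model as a single continuous predictor.
composite_anxiety = z_scores(gad7_scores) + z_scores(cisr_scores)
```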

Exploratory predictors

These less well-evidenced predictors of relapse were included as part of an exploratory secondary analysis: age, gender, ethnicity, relationship status, multimorbidity (two or more long-term physical or mental health conditions, excluding depression and comorbid anxiety), employment status (unemployment being those of working age who do not have a job and are actively seeking one) and current antidepressant use.5 14 17

Sample size

We used the pmsampsize package18 to calculate the required minimum sample size of 722, with 145 events (see protocol for details9); our actual sample size (n=1244; 261 events) exceeded this.

Statistical analysis methods

Data integrity checks (risk of bias)

Data were summarised and checked against publications for key features, such as number of participants (total and in each study arm), demographics, primary outcomes of the study, relapse rates and missing data. Validity of data values was checked on data inspection and irregularities clarified through communication with the original authors. Risk-of-bias assessment was undertaken using the participants, predictors and outcome domains of PROBAST.19

Missing data

Missing data were handled using multiple imputation with chained equations, under a missing at random assumption.20 Missing values were imputed based on the values of other predictors and the outcome, using linear models for continuous predictors (residual symptoms, severity, comorbid anxiety) and logistic models for binary predictors (number of previous episodes, RCT intervention, outcome (relapse/no relapse)). Imputation was undertaken for each study separately, preserving the clustering of participants within studies and any between-study heterogeneity in predictor effects and outcome prevalence. Each imputed dataset was then analysed separately using the same statistical methods, and the estimates were combined using Rubin’s rules, to produce an overall estimate and measure of uncertainty of each regression coefficient and model performance measures.10 We used 30 imputations, based on the maximum percentage of participants with one or more missing values across all individual studies.20
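Rubin's rules, as used above to combine estimates across imputed datasets, can be sketched in a few lines of Python. The function name is hypothetical, and the sketch pools a single parameter on its original scale; the text does not say whether any transformation was applied to performance measures before pooling.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)
```

The pooled standard error is larger than the average within-imputation standard error whenever the estimates differ between imputed datasets, which is how the extra uncertainty due to missing data is carried through to the final model.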

Model development (primary analysis)

Multilevel multivariable logistic regression models were built to model the relationship of the predictors with the binary outcome (relapse/no relapse), forcing in all predictors. Model parameters were estimated via unpenalised maximum likelihood estimation, and then penalised post estimation using a uniform shrinkage factor. The modelling preserved the clustering of participants within studies, with a random effect on the intercept, a random intervention effect and allowing for between-study correlation in these effects. We explored non-linear relationships in the continuous variables using multivariable fractional polynomials.7 Predictive performance statistics (C-statistic for discrimination, calibration slope and calibration-in-the-large) were calculated for the final developed model, first within each cluster in turn and then pooled using random effects meta-analysis to summarise the model’s performance across clusters with estimates of the pooled average and 95% CIs. Prediction intervals were constructed to estimate the model’s likely performance in new but similar settings.10 Calibration was also assessed visually by producing calibration plots with smooth calibration curves.
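The three performance statistics named above can be computed within a single study as in the following sketch. It assumes NumPy is available and uses a small hand-written Newton-Raphson fit; the function names are hypothetical, and the pooling of within-study statistics by random effects meta-analysis is not shown.

```python
import numpy as np


def c_statistic(y, p):
    """Probability that a random relapser has a higher predicted risk than a random non-relapser."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    events, non_events = p[y == 1], p[y == 0]
    diff = events[:, None] - non_events[None, :]
    # Concordant pairs count 1, tied pairs count 0.5.
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def _fit_logistic(X, y, offset, n_iter=50):
    """Unpenalised logistic regression by Newton-Raphson, with a fixed offset."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta + offset)))
        gradient = X.T @ (y - mu)
        hessian = X.T @ (X * (mu * (1.0 - mu))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def calibration_slope(y, lp):
    """Slope from regressing the outcome on the linear predictor (ideal value 1)."""
    y, lp = np.asarray(y, dtype=float), np.asarray(lp, dtype=float)
    X = np.column_stack([np.ones_like(lp), lp])
    return float(_fit_logistic(X, y, np.zeros_like(lp))[1])


def calibration_in_the_large(y, lp):
    """Intercept estimated with the linear predictor as an offset (ideal value 0)."""
    y, lp = np.asarray(y, dtype=float), np.asarray(lp, dtype=float)
    X = np.ones((len(lp), 1))
    return float(_fit_logistic(X, y, lp)[0])
```

Here `lp` is the linear predictor, that is, the log-odds of the predicted risk. A slope below 1 suggests predictions that are too extreme, and a non-zero calibration-in-the-large suggests systematic over- or under-prediction.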

Model validation

The optimism of the developed model was measured using non-parametric bootstrapping. One hundred bootstrap samples (each stratified by study) were produced from the original dataset. Within each bootstrap sample, the same modelling procedures were used as for model development. The model estimated using each bootstrap sample was then applied in both the same bootstrap sample (‘apparent performance’) and in the original (imputed) dataset (‘test performance’). Each time, average performance measures were calculated by pooling within-study statistics using meta-analysis, as above.

Optimism was calculated as the difference between apparent and test performance; this process was repeated 100 times and the average difference between the bootstrap (apparent) and test performance for each performance statistic provided the estimate of overall optimism for that statistic. Optimism-adjusted performance statistics (C-statistic, calibration slope and calibration-in-the-large) were subsequently derived. The uniform shrinkage factor (in this study, the optimism-adjusted calibration slope) was applied to all of the original estimated beta coefficients (to shrink them towards zero to address overfitting) to produce a penalised logistic regression model. Finally, the intercept was re-estimated (while constraining the penalised predictor effects at their shrunken value) to maintain overall calibration. This formed the final model.
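The optimism calculation and the shrinkage step described above can be sketched as follows. This is a simplified, single-level illustration that assumes NumPy and ignores the random effects used in the study; the function names are hypothetical.

```python
import numpy as np


def mean_optimism(apparent, test):
    """Average (apparent - test) performance over the bootstrap samples."""
    return float(np.mean(np.asarray(apparent) - np.asarray(test)))


def shrink_and_recalibrate(betas, X, y, shrinkage, n_iter=50):
    """Apply a uniform shrinkage factor, then re-estimate the intercept.

    betas: unpenalised predictor coefficients (intercept excluded)
    X: predictor matrix (n rows, k columns); y: binary outcome
    Returns (re-estimated intercept, shrunken coefficients).
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    shrunk = shrinkage * np.asarray(betas, dtype=float)
    offset = X @ shrunk                 # shrunken predictor effects are held fixed
    intercept = 0.0
    for _ in range(n_iter):             # Newton-Raphson for the intercept only
        mu = 1.0 / (1.0 + np.exp(-(intercept + offset)))
        intercept += np.sum(y - mu) / np.sum(mu * (1.0 - mu))
    return intercept, shrunk
```

Re-estimating the intercept with the shrunken effects held fixed makes the mean predicted risk equal the observed outcome proportion in the development data, which is what maintains overall calibration after shrinkage.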

Generalisability of the model and between-study heterogeneity in model performance were assessed using internal–external cross-validation (IECV).21
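IECV holds out each study in turn, develops the model in the remaining studies and evaluates it in the held-out study. A minimal sketch of the splitting logic is shown below; the function name is hypothetical and the model fitting and evaluation steps are left out.

```python
def iecv_splits(study_labels):
    """Yield (held_out_study, development_rows, validation_rows) for each study in turn."""
    for held_out in sorted(set(study_labels)):
        development = [i for i, s in enumerate(study_labels) if s != held_out]
        validation = [i for i, s in enumerate(study_labels) if s == held_out]
        yield held_out, development, validation


# Illustrative labels only: one study label per participant row.
labels = ["CADET", "CADET", "COBRA", "WYLOW"]
for held_out, development, validation in iecv_splits(labels):
    # A model would be fitted on the development rows and its performance
    # measured on the validation rows of the held-out study.
    print(held_out, development, validation)
```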

Sensitivity analysis

To understand the impact of including a composite measure of comorbid anxiety calculated from both GAD-7 and CIS-R, a sensitivity analysis was performed measuring predictive performance statistics when omitting REEACT and using only GAD-7 as the measure of comorbid anxiety, rather than a z-score.

Secondary (exploratory) analyses

Univariable analyses were performed to evaluate the unadjusted association between each predictor variable and the outcome variable. Where univariable analysis found statistically significant associations (after accounting for multiple significance testing), the model was refit using all of the original included predictors plus the additional exploratory predictor, to explore the impact on model predictive performance (using only studies in which the exploratory predictor was available).

Results

Univariable analysis

Table 2 presents the results from univariable multilevel models. Residual symptoms (OR: 1.13 (1.07–1.20)) and severity (OR: 1.07 (1.04–1.11)) were statistically significantly associated with relapse; number of previous episodes and comorbid anxiety were not.

Table 2

Univariable and multivariable associations between outcome and predictors (primary and secondary analysis)

Model development and apparent predictive performance

Table 2 presents the results of multivariable, multilevel logistic regression analysis for the primary analysis. The developed model, prior to shrinkage, had a pooled apparent performance of: C-statistic 0.62 (95% CI: 0.57 to 0.67), calibration slope of 0.95 (95% CI: 0.54 to 1.36) and calibration-in-the-large of 0.03 (95% CI: −0.49 to 0.54). See online supplemental material 3 for within-study performance statistics.

Internal validation, shrinkage and final equation

Optimism-adjusted performance statistics, after bootstrapping, were: C-statistic 0.60, calibration slope 0.81 and calibration-in-the-large 0.03. The final model (table 3) was produced by multiplying the original beta regression coefficients (from table 2) by 0.81 (the optimism-adjusted calibration slope) and re-estimating the intercept to ensure calibration-in-the-large.

Table 3

Summary of model’s predictive performance for primary, sensitivity and secondary analyses

Internal–external cross-validation (IECV)

Generalisability of the model was assessed using IECV (online supplemental material 4).21 Calibration plots were compared for each validation in each of the different studies (figure 1). These demonstrate inadequate calibration in most studies and significant heterogeneity in predictive performance across clusters. For example, the WYLOW study shows severe miscalibration, with estimated risks generally too low, whereas in the COBRA study estimated risks are generally too high. In some studies, calibration was generally excellent (eg, Healthlines Depression).

Figure 1

Calibration plots for internal–external cross-validation within each study.

Sensitivity analysis

We removed REEACT and repeated the modelling process on the remaining six studies, to assess the impact of using z-scores to model comorbid anxiety. This did not change the study conclusions (see online supplemental material 5 for analysis).

Secondary analysis

On univariable multilevel logistic analysis, relationship status was a highly statistically significant predictor (after adjusting the significance level to account for multiple significance testing using the Bonferroni correction). To further explore relationship status as a predictor of relapse, we repeated the model development procedures used in the primary analysis for the studies that included relationship status (CADET, COBRA, REEACT and REEACT-2). We conducted these analyses both with and without the relationship status variable to provide a direct comparison (see online supplemental material 6). Relationship status remained a statistically significant relapse predictor after adjusting for other prognostic factors (previous episodes, residual symptoms, severity and comorbid anxiety).
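The Bonferroni correction mentioned above divides the significance level by the number of tests. The sketch below assumes a conventional significance level of 0.05 and one test for each of the seven exploratory predictors listed in the Methods; neither figure is stated explicitly in the text, so both are assumptions.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Exploratory predictors as listed in the Methods.
EXPLORATORY_PREDICTORS = [
    "age", "gender", "ethnicity", "relationship status",
    "multimorbidity", "employment status", "current antidepressant use",
]

# Assumed overall significance level of 0.05 (not stated in the text).
threshold = bonferroni_threshold(0.05, len(EXPLORATORY_PREDICTORS))


def is_significant(p_value: float) -> bool:
    """True if a univariable p value survives the corrected threshold."""
    return p_value < threshold
```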

Discussion

We developed a model for predicting depression relapse in adults with remitted depression in primary care. Generally, the model had suboptimal predictive performance, with heterogeneous calibration across clusters on IECV and C-statistic below that required for acceptable discrimination. We would not recommend implementation in its current form, though calibration was promising in a subset of studies. Secondary analysis found a statistically significant association between relationship status and relapse.

Findings in the context of the literature

The performance of this model was similar to performance measures for previous relapse prediction models.8 Residual symptoms were associated with relapse, which is consistent with the existing literature.5 14 Residual symptoms are also associated with a more chronic depression course and poorer psychosocial functioning22 and, as such, are an established treatment target in depression. The pre-existing evidence for severity as a prognostic factor for relapse is more equivocal than for residual symptoms.5 Residual symptoms are more likely in people with more severe initial depressive illness,23 and so the presence of residual symptoms may be a mediator of the relationship between baseline severity and relapse.

The lack of association between previous episodes and relapse in our study is not consistent with the consensus view.5 14 A possible explanation is that previous episodes are most strongly associated with recurrence (which occurs over a longer time period than relapse) and therefore our follow-up of 6–8 months was not sufficient to detect this association.5 Comorbid anxiety is a recognised predictor of relapse5; in particular, higher anxiety levels at baseline have been found to predict a shorter time to relapse after treatment.24 In this study, we modelled anxiety at baseline (when depressed); it may be that an isolated measure of anxiety symptom severity at a single time-point is a crude measure and less important than knowing an individual’s history of comorbid anxiety.

Marital status (being single) is a risk factor for developing depression.17 A recent study also identified being single or no longer married as being associated with a worse prognosis (more depressive symptoms) at 3–4 months (but not beyond 3–4 months).25 While marital status is not an established predictor of relapse,5 17 our systematic review of prognostic models identified low-quality evidence of an association between relapse and marital status.8 A potential mechanism by which being in a relationship may be protective against relapse is through providing increased social support, although this is likely to be mediated by other factors (eg, relationship quality). The lack of association between relapse and other exploratory predictors (age, gender, ethnicity, employment status and multimorbidity) is consistent with the literature.5 14

The prognostic factor research cited here is based on longitudinal studies, which examine the predictive value of specific variables based on sample-level trends. As our systematic review8 and the current study demonstrate, combining these well-evidenced relapse predictors to produce individualised predictions of relapse risk remains challenging and calibration remains suboptimal for the purpose of personalised decision-making.7

Strengths and limitations

This study was conducted according to best practice recommendations for methodology and reporting, with a sufficient sample size to produce precise risk estimates.26 We preselected predictors with a robust, pre-existing evidence base, to mitigate the risk of overfitting associated with data-driven approaches for predictor selection.

While cohort studies and RCTs are recommended sources of data for prognostic model development,7 participants may differ from the general population in important ways and results should be interpreted with this in mind. For example, the majority of participants in this study were white, limiting our ability to explore ethnicity as a predictor. Furthermore, our IPD were drawn from a subsample of participants in the identified studies and therefore a potentially limited representation of the wider patient population. Some potentially useful predictors were not included (eg, neuroticism, childhood maltreatment and rumination5), as they were not coded for in our cohorts and are not routinely measured in general practice settings. There was a risk of selection bias given the way studies were selected for inclusion.

There was heterogeneity in: IPD study populations (eg, CASPER Plus included older adults), relapse prevalence, antidepressant use at baseline and settings and treatment dose. WYLOW followed up patients after low-intensity cognitive behavioural therapy, whereas interventions in other studies were more intensive and delivered over a greater number of sessions. While interventions were controlled for in the analysis, this heterogeneity could explain some of the observed miscalibration. Finally, outcomes (remission and relapse) were defined according to PHQ-9 (less optimal than diagnostic interview) and over a time-period (6–8 months) necessitated by the IPD collection points (although one that is aligned with established definitions of relapse5).

Implications for future research

The strong statistically significant association between relationship status and relapse in our study warrants further confirmatory prognostic factor research. Further research is needed to better understand whether other relapse predictors can be captured and recorded in an acceptable and valid way by primary care health professionals. For example, we know that routinely asking people about childhood maltreatment is not harmful,27 so this could feasibly form part of routine relapse risk assessment in primary care. There have been some efforts to develop clinically useful and valid brief instruments to measure rumination,28 which could be explored in a primary care setting. If robust evidence supported the clinical utility of measuring and documenting additional relapse predictors, health professionals might then adopt this as routine practice.

The existing prognostic factor research in this area is not conclusive.5 There is likely value in further exploratory prognostic factor research to examine the role of other variables (eg, those associated with depression onset and poor prognosis) as relapse predictors. Improved risk prediction may theoretically be possible by incorporating a wider range of predictors (eg, biomarkers, genetics and brain imaging), although such data are unlikely to be routinely or widely available in primary care settings. Researchers are encouraged to develop and implement a set of core predictors based on a standard set of measurements to be integrated in future studies. Better data linkage and systems integration across health and well-being services may also be beneficial. A prospective, naturalistic cohort study would allow for the inclusion of a wider range of participants and measurement of predefined predictor and outcome information. This would potentially allow for more useful predictive models to be developed than secondary analysis of pre-existing data, although it would be more costly and time-consuming.

In this study, we modelled the outcome of relapse as a binary outcome; it would be informative as part of future work to model the outcome on its continuous scale. We focused on outcome occurrence by 6–8 months, but other time-points may be of interest. As our IECV (which used the mean intercept and predictor–outcome associations to estimate performance) demonstrated, there was heterogeneity in the external performance and generalisability of the model was not guaranteed. An alternative approach to IECV that could be considered in future is local recalibration or intercept selection, where similarities in the outcome frequency or baseline characteristics (eg, mean age or proportion female) of a new population of interest are used to guide the intercept when applying the model in a different context.29 If the predictive performance of relapse prediction models can be improved in the future through recalibration or updating, clinical usefulness (using net benefit analysis) must be considered prior to implementation.
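Net benefit, as mentioned above, weighs true positives against false positives at a chosen risk threshold. The sketch below uses the standard decision-curve definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The function name and the example values are hypothetical, and no net benefit analysis was reported in this study.

```python
def net_benefit(y, p, threshold):
    """Net benefit of treating everyone whose predicted risk is at or above the threshold.

    y: observed outcomes (1 = relapse, 0 = no relapse)
    p: predicted risks
    threshold: risk threshold, strictly between 0 and 1
    """
    n = len(y)
    true_pos = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    false_pos = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)
```

A model is only clinically useful at a given threshold if its net benefit exceeds that of the default strategies of offering relapse prevention to everyone or to no one.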

Clinical implications

Existing relapse risk prediction models are currently insufficiently accurate, and unlikely to be suitable to guide the provision of relapse prevention in primary care. There are different approaches to prevention: universal approaches, which target whole populations; selective approaches, which target higher-risk groups; and indicated approaches, directed at individuals. In the absence of sufficiently accurate relapse risk prediction tools, we argue that a universal approach to relapse prevention of depression in primary care is currently warranted. This is likely to require a systems approach to mitigating the risk and improving the management of relapse for all patients. This could mean targeting treatment at known prognostic factors (eg, focussing on reducing residual symptoms) or providing interventions during the acute phase of depression treatment that target mechanisms of relapse. Clinicians should ensure they consider relapse risk, discuss this with patients and prioritise relapse prevention planning where appropriate.30 Longer term, brief, inexpensive and scalable relapse prevention interventions are likely to be required for use in primary care.

Data availability statement

Data are available on reasonable request.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants. The University of York’s Health Sciences Research Governance Committee confirmed that this study was exempt from full ethical approval, as it entailed the secondary analysis of anonymised data from studies that had already received ethical approval. Participants gave informed consent to participate in the study before taking part.

Acknowledgments

We would like to thank the patient advisory group without whose contributions this study would not have been possible: Greg Ball, Joanne Castleton, Penney Mayall, Gillian Payne, Sue Penn and Emma Williams. Further, we would like to thank Trevor Sheldon, Paul Tiffin and Joanne Reeve for their role as thesis advisory panel members.


