Multiple treatment comparison in narcolepsy: a network meta-analysis – methodological concerns

Are the Applied Methods Appropriate for the Aim?

PRISMA guidelines recommend that authors specify study characteristics (e.g. participants, interventions, comparisons, outcomes, study designs, and follow-up length) [2]. In this NMA, however, baseline population data (e.g. narcolepsy with or without cataplexy, severity of daytime sleepiness and cataplexy, and concomitant or prior narcolepsy-specific medication) were not reported to confirm that the RCTs were sufficiently comparable for meta-analysis. The studies differed with respect to baseline cataplexy symptoms, daytime sleepiness severity, and receipt of concomitant stimulants. Additional anticataplectic agents were used in the pitolisant study but not in the sodium oxybate (SXB) studies [3, 4]. Thus, it is unclear whether the authors investigated the similarity of the included RCTs beyond their selection criteria. Results from crossover and parallel-group trials were included, albeit with corrections for carryover effects, but it was not clear how the authors’ methodology accounted for the outcome differences introduced by combining these two designs.

The nodes of the network meta-analysis are SXB 6 and 9 g/day, pitolisant 20 and 40 mg/day, modafinil (with armodafinil) 200–400 mg/day, and placebo. The authors justify pooling modafinil and armodafinil data on the grounds that these are “close compounds” with identical pharmacological properties, but they do not justify pooling across doses.

Secondary efficacy outcomes for this NMA include scores on the Epworth Sleepiness Scale (ESS) and Maintenance of Wakefulness Test (MWT), and the weekly rate of cataplexy (WRC). The authors combined 20- and 40-minute MWT results, despite evidence that 40-minute results are more reliable and better detect difficulty sustaining wakefulness [5]. Disturbed nighttime sleep (DNS) was not included as an efficacy endpoint because it was rarely documented in RCTs; an analysis of available DNS data would nonetheless be valuable, given the clinical significance of this symptom [6].

The primary efficacy outcome, the narcolepsy score (NS), is a composite endpoint derived by combining the ESS and MWT scores into an excessive daytime sleepiness (EDS) mean Z score, which is then combined with the WRC mean Z score. Combining ESS and MWT scores may be problematic, as they measure different sleepiness parameters; correlations between ESS and MWT values are often weak [7]. Using Z scores assumes that the outcomes are normally distributed, which may not hold for the outcomes investigated. Furthermore, this composite endpoint may underestimate the impact of one symptom or overestimate that of another. The authors’ stated reason for using a composite endpoint was to reduce the risk of type 1 error arising from multiple comparisons. However, NMAs statistically aggregate published data to estimate the comparative efficacy of multiple treatments simultaneously and therefore do not suffer from the multiplicity inherent in standard sequential pairwise comparisons. We feel that reporting the aggregated data (after appropriate assessment of study and outcome similarity) rather than a single constructed composite endpoint would give a more straightforward and clinically relevant assessment of comparative effectiveness.
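To make this concern concrete, the following is a minimal sketch of how such a Z-score composite is typically constructed; the exact weights, sign conventions, and standardization used in the NMA are not reported, so the equal weighting and direction of improvement shown here are assumptions for illustration only:

$$ Z_{\mathrm{ESS}} = \frac{\Delta \mathrm{ESS} - \mu_{\Delta \mathrm{ESS}}}{\sigma_{\Delta \mathrm{ESS}}}, \qquad Z_{\mathrm{MWT}} = \frac{\Delta \mathrm{MWT} - \mu_{\Delta \mathrm{MWT}}}{\sigma_{\Delta \mathrm{MWT}}}, \qquad Z_{\mathrm{WRC}} = \frac{\Delta \mathrm{WRC} - \mu_{\Delta \mathrm{WRC}}}{\sigma_{\Delta \mathrm{WRC}}} $$

$$ \mathrm{EDS} = \tfrac{1}{2}\left(Z_{\mathrm{ESS}} + Z_{\mathrm{MWT}}\right), \qquad \mathrm{NS} = \tfrac{1}{2}\left(\mathrm{EDS} + Z_{\mathrm{WRC}}\right) $$

Written this way, the implicit weighting becomes apparent: a weakly correlated pair of sleepiness measures (ESS and MWT) is averaged first and then given the same weight as the WRC, irrespective of the clinical importance or measurement reliability of each outcome.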

It is unclear whether the timing of the data was adequately considered in the model calculations. If data from different follow-up times were combined, this could introduce bias, since treatment efficacy may increase over time while treatment-emergent adverse events (TEAEs) decrease. For example, a post hoc analysis of two RCTs evaluating time to response with SXB for treatment of EDS and cataplexy illustrates the inaccuracy introduced by not specifying time points [8]: several months were required for optimal efficacy on sleepiness and cataplexy with SXB. Concern about timing also pertains to combining the NS with the overall safety score (OSS), which is a function of the unreported duration of treatment exposure.

The authors’ analysis provides an incomplete profile of the interventions’ relative safety. The main safety endpoint, the OSS, is defined as the incidence of TEAEs during the exposure period, but duration-of-exposure data were not reported. Although the authors note that the SXB safety profile may be penalized by the use of untitrated SXB doses, this may be an understatement given that the highest TEAE rates occurred at the untitrated 9 g dose [4], which is not consistent with recommended administration [9]. Because the overall medical benefit (benefit-risk [BR] ratio) is defined as the ratio of NS to OSS, we question the value of this measure given the limitations of the OSS data provided. We recommend that future narcolepsy RCTs use the Narcolepsy Severity Scale [10] to quantify baseline narcolepsy symptoms so that global changes over time can be better assessed and an artificial composite score becomes unnecessary.
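For transparency, the benefit-risk measure as described reduces to a simple ratio; the precise operationalization of the OSS (e.g. incidence proportion versus exposure-adjusted rate) is not reported, so the form below is an assumption for illustration:

$$ \mathrm{OSS} \approx \frac{\text{patients with} \geq 1\ \text{TEAE during exposure}}{\text{patients exposed}}, \qquad \mathrm{BR} = \frac{\mathrm{NS}}{\mathrm{OSS}} $$

Because the numerator of the OSS depends on how long patients were exposed, and exposure duration was not reported, any ranking of treatments based on the BR ratio inherits this uncertainty.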

Interpretation of Head-to-Head Comparisons

The authors present results of comparisons relative to placebo in the body of the text. The conclusion that pitolisant 40 mg/day is “best”, however, implies a value judgment, seemingly using the composite measures as the benchmark. This conclusion is misleading, as it obscures the magnitude and uncertainty of the clinical impact. Although pitolisant 40 mg/day yielded numerically higher values for many outcomes, the authors acknowledge that comparisons across treatments were nonsignificant. Moreover, neither our clinical experience nor published head-to-head data versus modafinil (in which pitolisant did not demonstrate noninferiority to modafinil) [3] suggest that pitolisant 40 mg/day is the most effective intervention for either cataplexy or EDS.

Conclusion

In conclusion, the lack of a study and outcome similarity assessment and the combination of distinct clinical outcomes into composite scores raise questions about the validity of the NMA output and the conclusion that pitolisant has the highest BR ratio of the interventions studied. Comparative effectiveness assessment of these treatments is a worthwhile and necessary investigation. However, the results and conclusions would have greater clinical and face validity if the untransformed outcomes had been evaluated and compared rather than constructed composite measures.

Snedecor SJ, Mayer G, Thorpy MJ, Dauvilliers Y. Sleep. 2019.