The Challenges of Modeling Rare Diseases and Making the Best Use of Limited Data

Written by Martijn Simons, Craig Bennison and Claire Simons on Thursday 24th August 2023

In general, statistical modeling is used to explore factors such as patient mortality over time, quality of life and cost-effectiveness of a treatment to help various stakeholders make decisions.

Modeling the long-term effects of treatments for rare diseases presents some unique challenges. In the United States, a rare disease is defined as a health condition that affects fewer than 200,000 individuals. To start with, researchers work with small sample sizes and with follow-up periods that are short relative to patients’ remaining lifetimes. This tends to yield too little data to support robust long-term extrapolations and assumptions.

Another issue is that there is often no “normal approach” on which to base any analysis. Rare-disease literature in the public domain is often sparse, with no previous health technology assessment (HTA) or prior cost-effectiveness models from which researchers can draw assumptions or support their observations. As a result, we often start from scratch, building up the disease pathway as we go. Additionally, traditional methods such as estimating transition probabilities to build up a cost-effectiveness model tend to be ineffective with smaller samples, because each patient carries too much individual weight as they move through their disease progression. A few outliers can therefore skew the results and increase uncertainty.
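To illustrate how much weight a single patient carries in a small sample, the sketch below compares a transition probability estimated from a small cohort with one estimated from a large cohort. The cohort sizes and event counts are hypothetical, and the Wilson score interval is just one reasonable way to express the uncertainty, not a method prescribed here.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% when z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def one_patient_shift(n):
    """Change in the estimated probability if one more patient progresses."""
    return 1 / n

# Hypothetical cohorts with the same observed progression rate (25%).
cohorts = [("rare disease", 3, 12), ("common disease", 300, 1200)]
for label, events, n in cohorts:
    low, high = wilson_interval(events, n)
    print(f"{label}: p={events / n:.2f}, 95% CI=({low:.2f}, {high:.2f}), "
          f"one patient moves p by {one_patient_shift(n):.3f}")
```

Both cohorts give the same point estimate, but the interval for the small cohort is far wider, and one additional progression there shifts the estimate by several percentage points.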

Finally, the significant heterogeneity among patients, from baseline through disease progression, can make it extremely difficult to ascertain whether the clinical variation observed is due to unknown factors, the disease itself, differences in how the treatment works in different groups of patients, or simply chance.

How to make best use of limited data

Given this challenging research environment, a key question is, how do we make best use of the limited data we have?

The key is to start with early modeling, using the often limited trial data that is available, backed up with assumptions.

It is important to involve experts (both clinicians and HTA experts) early in the process to assess and validate the plausibility of your assumptions. At this stage, a value-of-information analysis can also help by exploring which parameters are most likely to sway decisions on cost-effectiveness. This, in turn, helps identify knowledge gaps and prioritize future research.
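As a rough illustration of the value-of-information idea, the sketch below estimates the expected value of perfect information (EVPI) by Monte Carlo simulation for a two-option decision. All parameter values, distributions and the willingness-to-pay threshold are hypothetical placeholders. A full analysis would typically go further and estimate the value of information for each parameter separately.

```python
import random

def evpi(n_draws=20000, wtp=30000.0, qaly_sd=0.4, seed=1):
    """Monte Carlo EVPI for standard care versus a new treatment.

    Net benefit is expressed relative to standard care, so standard care
    always has net benefit 0. All inputs are illustrative assumptions.
    """
    rng = random.Random(seed)
    total_new = 0.0   # running sum of net benefit for the new treatment
    total_best = 0.0  # running sum of the best net benefit in each draw
    for _ in range(n_draws):
        qaly_gain = rng.gauss(0.5, qaly_sd)       # assumed QALY gain
        extra_cost = rng.gauss(12000.0, 3000.0)   # assumed incremental cost
        nb_new = wtp * qaly_gain - extra_cost
        total_new += nb_new
        total_best += max(0.0, nb_new)
    # Best option chosen once, using expected values (current information).
    value_current_info = max(0.0, total_new / n_draws)
    # Best option chosen in every draw (perfect information).
    value_perfect_info = total_best / n_draws
    return value_perfect_info - value_current_info

print(f"EVPI, uncertain QALY gain: {evpi(qaly_sd=0.4):.0f}")
print(f"EVPI, well-known QALY gain: {evpi(qaly_sd=0.05):.0f}")
```

Comparing the two calls gives a crude sense of how much decision uncertainty is driven by the QALY gain: when that one parameter is known more precisely, the value of further information drops.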

It can also be prudent to think through what you want to estimate and which components need to be included in a model. Since rare diseases often manifest in various phenotypes, one approach is to identify a single key aspect of the disease that needs to be followed over the patient’s remaining lifetime. Another is to ask whether there is one component that can be modeled and that is strongly correlated with the others, so that the model takes a more holistic view covering the whole spectrum of the disease.

One might also consider new approaches to the data. For example, when modeling mortality for a more common oncology condition with a large trial and a long follow-up period, it is possible to fit models directly to the overall survival observed in the trial. In a rare oncology condition, where few deaths are observed, an alternative is to go one layer down and model a more fundamental element that has more observations, such as repeated measurements of tumor growth. In that case, you can model the tumor growth, then model the link between tumor growth and survival, and use that lower-level data, which is less variable and available in much larger quantities, to estimate the primary endpoint you are interested in. While this may entail a few extra steps, it is a way to leverage your available data and improve the predictive power of the model.
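The two-step idea can be sketched as follows. In the first step, each patient’s tumor growth rate is estimated by least squares on the log of repeated tumor measurements. In the second step, that growth rate is mapped to survival through a simple exponential model. The patient data, the baseline hazard and the link coefficient are all hypothetical placeholders. In practice the link would be estimated from data or informed by external evidence, and a joint model may be preferable to two separate steps.

```python
import math

def growth_rate(times, sizes):
    """Least-squares slope of log(tumor size) against time."""
    logs = [math.log(s) for s in sizes]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    return sxy / sxx

def survival_probability(rate, t, base_hazard=0.02, log_hr_per_unit=8.0):
    """Exponential survival with hazard = base_hazard * exp(log_hr_per_unit * rate).

    base_hazard and log_hr_per_unit are assumed values, not estimates.
    """
    hazard = base_hazard * math.exp(log_hr_per_unit * rate)
    return math.exp(-hazard * t)

# Hypothetical repeated scans: (months, tumor diameter in mm).
patients = {
    "A": ([0, 2, 4, 6], [20.0, 21.0, 22.5, 24.0]),  # growing tumor
    "B": ([0, 2, 4, 6], [20.0, 19.0, 18.5, 18.0]),  # shrinking tumor
}
for pid, (times, sizes) in patients.items():
    rate = growth_rate(times, sizes)
    surv = survival_probability(rate, t=24)
    print(f"patient {pid}: growth rate {rate:.4f}/month, "
          f"predicted 24-month survival {surv:.2f}")
```

Each patient contributes several tumor measurements but at most one death, which is why the first step can be estimated with more confidence than a survival model fitted directly to a handful of events.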

The tendency to ask too much from the data, such as including too many variables that affect the outcome you are trying to estimate or studying the effect on subgroups, should be avoided. When you start with a small sample, each extra question simply shrinks that sample even further.

With artificial intelligence (AI) already playing a role in personalized treatment for rare diseases, can it also play a role in modeling them? The answer is yes and no. The challenge is that AI brings most value in identifying relationships in huge amounts of data, such as genomic data, and huge amounts of data are exactly what is lacking in rare diseases. However, AI tools that have recently come to the forefront could help identify literature that would previously have been missed with standard search techniques.

In summary, whether the disease is oncologic, neurological, degenerative or genetic, a bespoke modeling approach, along with some creativity, can help researchers overcome some of the data limitations that would otherwise impede their work.
