

Leveraging social media data to study disease and treatment characteristics of Hodgkin’s lymphoma using natural language processing methods

Background

The use of social media platforms in health research is increasing, yet their application in studying rare diseases is limited. Hodgkin’s lymphoma (HL) is a rare malignancy with a high incidence in young adults. This study evaluates the feasibility of using social media data to study the disease and treatment characteristics of HL.

Methods

We used the X (formerly Twitter) API v2, accessed through the developer portal, to download posts (formerly tweets) published from January 2010 to October 2022. Annotation guidelines were developed from the literature, and a limited number of posts were manually reviewed to identify the classes and attributes (characteristics) of HL discussed on X and to create a gold standard dataset. This dataset was then used to train, test, and validate a Named Entity Recognition (NER) Natural Language Processing (NLP) application.
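To illustrate how a gold standard dataset can be used to score an NER application, the following is a minimal sketch of entity-level precision, recall, and F1. It is not the authors' evaluation code: the exact-match scoring rule, the function name `precision_recall_f1`, and the toy annotations are all assumptions made for illustration. Only the class names are taken from the abstract.

```python
# Minimal sketch (assumption: an entity counts as correct only when post,
# span boundaries, and class all match the gold annotation exactly).

def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for two collections of entity tuples."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (post_id, start_char, end_char, class).
gold = [
    (1, 0, 5, "treatment"),
    (1, 20, 27, "stages and progression"),
    (2, 3, 11, "treatment"),
    (3, 0, 9, "etiopathology"),
]
predicted = [
    (1, 0, 5, "treatment"),               # exact match
    (2, 3, 11, "treatment"),              # exact match
    (3, 0, 9, "stages and progression"),  # right span, wrong class
    (3, 15, 20, "treatment"),             # spurious entity
]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is the strictest common choice; a relaxed scheme that credits partially overlapping spans would give higher scores on the same predictions.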

Results

After data preparation, 80,811 posts were collected: 500 for annotation guideline development, 2,000 for NLP application development, and the remaining 78,311 for deploying the application. We identified nine classes related to HL, such as HL classification, etiopathology, stages and progression, and treatment. Treatment and stages and progression were the most frequently discussed classes, with 20,013 posts (25.56%) mentioning HL treatments and 17,177 (21.93%) mentioning HL stages and progression. The model performed well overall, achieving 86% accuracy and an 87% F1 score. The etiopathology class performed particularly well, with 93% accuracy and a 95% F1 score.
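The arithmetic behind these figures can be checked with a short sketch. The counts come from the text above; the variable and function names are illustrative, and the use of the 78,311 deployment posts as the denominator is an assumption (the abstract does not name the denominator, but this choice reproduces the reported percentages).

```python
# Counts reported in the Results section.
TOTAL_POSTS = 80_811
GUIDELINE_POSTS = 500       # annotation guideline development
DEVELOPMENT_POSTS = 2_000   # NLP application development

# The remainder is the set the application was deployed on.
DEPLOYMENT_POSTS = TOTAL_POSTS - GUIDELINE_POSTS - DEVELOPMENT_POSTS


def share(count, denominator):
    """Percentage of `denominator` represented by `count`, to 2 decimals."""
    return round(100 * count / denominator, 2)


# Assumption: percentages are taken over the deployment posts.
treatment_share = share(20_013, DEPLOYMENT_POSTS)
stage_share = share(17_177, DEPLOYMENT_POSTS)

print(DEPLOYMENT_POSTS, treatment_share, stage_share)
```

The two percentages do not need to sum to 100, since a single post can mention more than one class and seven further classes are not itemized here.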

Discussion

The NLP application displayed high efficacy in extracting and characterizing HL-related information from social media posts, as evidenced by the high F1 score. Nonetheless, the data presented limitations in distinguishing between patients, providers, and caregivers and in establishing the temporal relationships between classes and attributes. Further research is necessary to bridge these gaps.

Conclusion

Our study demonstrated the potential of social media as a valuable preliminary research source for understanding the characteristics of rare diseases such as Hodgkin’s lymphoma.

Authors Z A Siddiqui, M Pathan, S Nduaguba, T LeMasters, V G Scott, U Sambamoorthi, J S Patel
Journal PLOS Digital Health
Therapeutic Area Oncology
Center of Excellence Real-world Evidence & Data Analytics
Year 2025

© Copyright OPEN Health 2025. All rights reserved. OPEN Health is a registered trademark.
