Natural Language Processing for Clinical Entity Extraction from Medical Reports

Authors

  • Preeti Sandhu
  • Kiran Das
  • Nidhi Yadav
  • Mahesh Choudhary

DOI:

https://doi.org/10.5281/ijurd.v1i2.56

Keywords:

Natural Language Processing, Clinical Text, BioBERT, Named Entity Recognition, Medical Reports, Text Mining

Abstract

Unstructured medical reports contain valuable clinical information that is difficult to process manually. This study proposes a transformer-based NLP framework that uses BioBERT to extract clinical entities from report text. The system identifies diseases, medications, procedures, and anatomical terms, and evaluation on benchmark datasets shows an F1-score of 0.91. The extracted structured data improves downstream applications such as clinical decision support and automated summarization, which reduces manual workload and strengthens healthcare analytics.
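
The abstract reports an F1-score for entity extraction but does not say how it was computed. A common convention for named entity recognition is exact-match, entity-level F1 over BIO-tagged tokens, and the sketch below illustrates that convention in plain Python. The label names (DISEASE, MEDICATION), the helper names `bio_to_spans` and `entity_f1`, and the strict handling of stray `I-` tags are assumptions made for illustration, not details taken from the study.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode one BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the currently open entity
        if label is not None:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # Simplifying assumption: an I- tag with no matching open span is ignored.
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1, micro-averaged over all sentences."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g_spans & p_spans)   # predicted spans that match gold exactly
        fp += len(p_spans - g_spans)   # predicted spans with no gold match
        fn += len(g_spans - p_spans)   # gold spans the prediction missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: the prediction finds the disease mention but misses the medication.
gold = [["B-DISEASE", "I-DISEASE", "O", "B-MEDICATION"]]
pred = [["B-DISEASE", "I-DISEASE", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under this convention a predicted entity counts as correct only when both its type and its token boundaries match the gold annotation, so a partially overlapping span is scored as one false positive and one false negative.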

Author Biographies

Preeti Sandhu

Artificial Intelligence and Machine Learning, Ajay Kumar Garg Engineering College, Ghaziabad

Kiran Das

Biomedical Engineering, Bhagat Phool Singh Mahila Vishwavidyalaya, Khanpur Kalan

Nidhi Yadav

Electronics and Communication Engineering, Bhagat Phool Singh Mahila Vishwavidyalaya, Khanpur Kalan

Mahesh Choudhary

Biomedical Engineering, Northern India Engineering College, Delhi

Published

2025-10-27

How to Cite

Sandhu, P., Das, K., Yadav, N., & Choudhary, M. (2025). Natural Language Processing for Clinical Entity Extraction from Medical Reports. International Journal of Unified Research & Development (IJURD), 1(2). https://doi.org/10.5281/ijurd.v1i2.56
