Natural Language Processing for Clinical Entity Extraction from Medical Reports
DOI: https://doi.org/10.5281/ijurd.v1i2.56
Keywords: Natural Language Processing, Clinical Text, BioBERT, Named Entity Recognition, Medical Reports, Text Mining
Abstract
Unstructured medical reports contain valuable clinical information that is difficult to process manually. This study proposes a transformer-based NLP framework that uses BioBERT to extract clinical entities from report text. The system identifies four entity types: diseases, medications, procedures, and anatomical terms. Evaluation on benchmark datasets shows an F1-score of 0.91. The extracted structured data supports downstream applications such as clinical decision support and automated summarization, reducing manual workload and enhancing healthcare analytics capabilities.
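The abstract reports a single F1-score but does not say how it is computed. As a minimal sketch, the block below shows one common convention for named entity recognition: exact-match, entity-level F1 over BIO-tagged tokens. The BIO scheme, the label names (DISEASE, MEDICATION, PROCEDURE, ANATOMY), the example sentence, and the helper names `bio_to_spans` and `entity_f1` are all illustrative assumptions, not details taken from the study.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collapse a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel flushes a span that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current entity
        if label is not None:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag is treated leniently as the start of a new span.
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example sentence (not from the study's data):
# "Patient with type 2 diabetes started on metformin after knee arthroscopy"
gold = ["O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O", "O",
        "B-MEDICATION", "O", "B-ANATOMY", "B-PROCEDURE"]
pred = ["O", "O", "O", "B-DISEASE", "I-DISEASE", "O", "O",
        "B-MEDICATION", "O", "B-ANATOMY", "B-PROCEDURE"]

precision, recall, f1 = entity_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this example the predicted disease span starts one token late, so under exact matching it counts as both a false positive and a false negative, and three of four entities match. A relaxed (overlap-based) or token-level metric would score the same prediction higher, which is why the matching convention matters when comparing reported F1 values.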
License
Copyright (c) 2025 Preeti Sandhu, Kiran Das, Nidhi Yadav, Mahesh Choudhary

This work is licensed under a Creative Commons Attribution 4.0 International License.