By Dr. Mahalakshmi Lakshmi Nathan
Doctor of Health Informatics, Department of Health Informatics, Rutgers, The State University of New Jersey, School of Health Professions
Published: Feb 20, 2026 | pg. no: 1-33
Abstract: Healthcare data today is vast but fragmented, inconsistent, and frequently incomplete, limiting the effectiveness of artificial intelligence (AI) models built for clinical decision-making. The central problem addressed in this project is the persistent gap between the potential of AI in healthcare and the poor quality, semantic inconsistency, and lack of interoperability of the datasets on which such models depend. The overall objective was to design an adaptive framework capable of refining, standardizing, and harmonizing heterogeneous healthcare data to ensure reliability, interpretability, and compliance for predictive and diagnostic applications. The research evolved through five foundational studies and one integrated system development. The first study examined inconsistencies in the reporting of alternative medicine treatments for diabetes and showed how the lack of structured representation distorts meta-analytical outcomes. The second focused on diabetic readmission prediction using machine learning, revealing that model accuracy collapses when data are incomplete or biased. The third study, on AI applications in orthopedics, identified the dependence of clinical models on data annotation quality. The fourth and fifth studies explored medical imaging and multimodal AI integration, demonstrating that transformer-driven harmonization and feature alignment significantly improve interpretability and robustness across diverse modalities. Building on these findings, the final stage introduced the Multi-Stage AI Data Refinement Network (MADR-Net), an adaptive pipeline that combines deep generative and sequential models for missing data imputation, duplicate detection, outlier correction, data augmentation, and semantic standardization using FHIR and SNOMED mappings.
Evaluations on large-scale healthcare admission datasets confirmed substantial gains, with macro-precision, recall, and F1-scores exceeding 0.96, alongside measurable reductions in class imbalance and structural bias. The study concludes that the primary barrier to dependable healthcare AI is data quality, not algorithmic sophistication. By embedding intelligence throughout the preprocessing pipeline, MADR-Net redefines data preparation as a foundational, auditable stage in healthcare analytics, transforming fragmented clinical information into standardized,