COMPUTATIONAL MATHEMATICAL MODELING FOR LUNG CANCER DISEASE PREDICTION USING MULTIPLE LINEAR REGRESSION
DOI: https://doi.org/10.47701/twy84j24
Keywords: Lung cancer, multiple linear regression, passive smoking, genetic predisposition
Abstract
Lung cancer remains one of the most prevalent and deadly types of cancer worldwide, especially in developing countries with high smoking rates and limited resources for early detection. This study aims to develop a computational mathematical model for predicting lung cancer risk using multiple linear regression. The model focuses on two primary factors, genetic predisposition and exposure to passive smoking, which are among the most significant determinants of lung cancer. An observational analytic design was employed using secondary data obtained from cancer registries, hospital records, and national health survey datasets. Computational preprocessing, including data cleaning, missing-value imputation, and variable normalization, was applied to ensure model accuracy and reliability. The regression analysis showed that both genetic predisposition and passive smoking significantly increased the lung cancer risk score, with regression coefficients of 0.24 and 0.48, respectively. The findings indicate that passive smoking has a greater impact on lung cancer risk than genetic factors. The final model achieved a coefficient of determination (R²) of 0.72, indicating that 72% of the variation in the risk score is explained by the combination of these two variables. This suggests that environmental factors have a more dominant influence than genetic factors on increasing lung cancer risk. The model provides a practical tool for early detection and risk stratification, supporting public health policies aimed at tobacco control and targeted screening programs to reduce lung cancer incidence and mortality.
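The abstract describes the pipeline only at a high level: preprocessing (cleaning, missing-value imputation, normalization) followed by a two-predictor model of the form risk score = b0 + b1·(genetic predisposition) + b2·(passive smoking), with reported b1 = 0.24, b2 = 0.48 and R² = 0.72 (the intercept b0 is not reported). The following is a minimal sketch of such a pipeline in plain Python, not the authors' code. Mean imputation, z-score normalization, and ordinary least squares via centred normal equations are assumptions chosen for illustration, since the abstract does not name the specific methods; all function names are hypothetical, and the toy data are synthetic, not the study's data.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Hypothetical helper: replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    m = mean(observed)  # raises StatisticsError if every entry is missing
    return [m if v is None else v for v in values]


def zscore(values):
    """Hypothetical helper: rescale to zero mean and unit population standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("constant variable cannot be standardized")
    return [(v - m) / s for v in values]


def fit_two_predictor_ols(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2.

    Centring each variable removes the intercept, leaving a 2x2 system of
    normal equations that is solved with Cramer's rule.
    """
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2  # intercept recovered from the means
    return b0, b1, b2


def r_squared(x1, x2, y, coefs):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    b0, b1, b2 = coefs
    my = mean(y)
    pred = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    ss_res = sum((c - p) ** 2 for c, p in zip(y, pred))
    ss_tot = sum((c - my) ** 2 for c in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic toy data (NOT the study's data). y is generated exactly from
    # y = 1 + 0.24*g + 0.48*p, so the fit should recover those values and R^2 = 1.
    g = [0, 1, 0, 1, 2, 2]  # hypothetical genetic-predisposition scores
    p = [0, 0, 1, 1, 1, 2]  # hypothetical passive-smoking exposure levels
    y = [1 + 0.24 * a + 0.48 * b for a, b in zip(g, p)]
    coefs = fit_two_predictor_ols(g, p, y)
    print("b0, b1, b2 =", coefs, "R^2 =", r_squared(g, p, y, coefs))
    print("imputed:", impute_mean([1.0, None, 3.0]))
```

Comparing b1 and b2 directly, as the abstract does when it concludes that passive smoking has the larger effect, is only meaningful when both predictors are on a comparable scale, for example after z-score normalization of each predictor. The abstract does not state whether the reported coefficients are on a standardized scale.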