COMPUTATIONAL MATHEMATICAL MODELING FOR LUNG CANCER DISEASE PREDICTION USING MULTIPLE LINEAR REGRESSION

Authors

  • Anisatul Farida, Universitas Duta Bangsa Surakarta
  • Ratna Puspita Indah, Universitas Duta Bangsa Surakarta
  • Dwi Hartanti, Universitas Duta Bangsa Surakarta
  • Adão Manuel da Silva, Instituto Superior Crista

DOI:

https://doi.org/10.47701/twy84j24

Keywords:

Lung cancer, multiple linear regression, passive smoking, genetic predisposition

Abstract

Lung cancer remains one of the most prevalent and deadliest cancers worldwide, especially in developing countries with high smoking rates and limited resources for early detection. This study develops a computational mathematical model that predicts lung cancer risk using multiple linear regression. The model focuses on two primary factors, genetic predisposition and exposure to passive smoking, which are among the most significant determinants of lung cancer. An observational analytic design was employed using secondary data obtained from cancer registries, hospital records, and national health survey datasets. Computational preprocessing, including data cleaning, missing-value imputation, and variable normalization, was applied to improve model accuracy and reliability. The regression analysis showed that both genetic predisposition and passive smoking significantly increased the lung cancer risk score, with regression coefficients of 0.24 and 0.48, respectively, indicating that passive smoking has a greater impact on lung cancer risk than genetic factors. The final model achieved a coefficient of determination (R²) of 0.72, meaning that 72% of the variation in the risk score is explained by these two variables combined. This finding suggests that environmental factors exert a stronger influence than genetic factors on lung cancer risk. The model provides a practical tool for early detection and risk stratification, supporting public health policies for tobacco control and targeted screening programs aimed at reducing lung cancer incidence and mortality.
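The pipeline summarized above (imputation, normalization, then a two-predictor regression evaluated by R²) can be illustrated with a minimal, stdlib-only Python sketch. It assumes the model has the form risk = β0 + β1·genetic + β2·passive + ε fitted by ordinary least squares; the abstract does not say which imputation or normalization method was used, so mean imputation and z-score standardization are assumptions here, as are all function names (`impute_mean`, `zscore`, `fit_ols`, `r_squared`). The data are synthetic: the reported coefficients 0.24 and 0.48 serve only as "true" slopes for the simulation, and the sketch does not reproduce the study's data, intercept, or R².

```python
"""Hypothetical sketch: mean imputation, z-score normalization, and a
two-predictor ordinary-least-squares fit with R^2, on synthetic data."""
import random
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values (assumed method)."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def zscore(values):
    """Standardize to mean 0 and population standard deviation 1 (assumed method)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 via the normal equations (X^T X) b = X^T y."""
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve3(xtx, xty)


def r_squared(x1, x2, y, coef):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    b0, b1, b2 = coef
    pred = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    y_bar = mean(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    rng = random.Random(0)
    n = 500
    # Synthetic, unit-variance predictors standing in for the real variables.
    genetic = [rng.gauss(0, 1) for _ in range(n)]
    passive = [rng.gauss(0, 1) for _ in range(n)]
    # Synthetic risk score; 0.24 and 0.48 are borrowed from the abstract as true slopes.
    risk = [0.24 * g + 0.48 * p + rng.gauss(0, 0.3) for g, p in zip(genetic, passive)]
    # Blank out every 50th passive-smoking value to exercise the imputation step.
    for i in range(0, n, 50):
        passive[i] = None
    g_z = zscore(impute_mean(genetic))
    p_z = zscore(impute_mean(passive))
    coef = fit_ols(g_z, p_z, risk)
    print("b0=%.3f b1=%.3f b2=%.3f" % tuple(coef))
    print("R^2=%.3f" % r_squared(g_z, p_z, risk, coef))
```

Standardizing both predictors before fitting puts them on a common scale, which is what makes a direct comparison of slope magnitudes (such as 0.48 versus 0.24) meaningful; whether the study's coefficients are on such a scale is not stated in the abstract. In practice a library routine such as `numpy.linalg.lstsq` or statsmodels `OLS` would replace the hand-written solver.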

Published

2025-09-25