Lung Cancer Disease Prediction Model with Multiple Linear Regression
DOI: https://doi.org/10.47701/qpmszf57
Keywords: Lung cancer, multiple linear regression, genetic risk, passive smoking, risk prediction
Abstract
Lung cancer remains one of the leading causes of cancer-related mortality worldwide, particularly in developing countries where smoking prevalence is high. This study aims to develop a predictive model for lung cancer risk using multiple linear regression with two predictors: genetic predisposition and exposure to passive smoking. The research used an observational analytic design with secondary data drawn from cancer registries, hospital medical records, and national health surveys. Data processing included cleaning, imputation of missing values, and standardization of exposure variables. In the regression analysis, genetic risk and passive smoking were both significantly associated with a higher lung cancer risk score, with coefficients of 0.24 and 0.48, respectively. The coefficient for passive smoking was twice that for genetic predisposition, suggesting that passive smoking is the more dominant determinant of lung cancer risk in this model. The model explained 20.5% of the variation in the risk score (R² = 0.205); the remaining variation is likely attributable to other factors such as air pollution, occupational exposure, and lifestyle. These findings highlight the importance of strengthening public health policies, particularly tobacco control in public spaces, and of implementing targeted, risk-based screening strategies. The predictive model could serve as a practical tool for early detection, efficient allocation of health resources, and cancer prevention.
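The abstract describes a pipeline of imputation, standardization, and a two-predictor linear regression summarized by its coefficients and R². The following is a minimal sketch of such a pipeline in Python, written under stated assumptions. Mean imputation is assumed, because the abstract does not name the imputation method. Standardization is assumed to mean z-scores. The estimator is ordinary least squares. The function names and the synthetic data are hypothetical and are not the study's code or data. The values 0.24 and 0.48 are reused only to generate illustrative scores.

```python
import random
import statistics


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries.

    Assumption: mean imputation; the abstract does not say which method was used.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def standardize(values):
    """Z-score a variable: subtract its mean, divide by its population SD.

    Assumption: 'standardization' is taken to mean z-scoring.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def fit_two_predictor_ols(x1, x2, y):
    """OLS fit of y = b0 + b1*x1 + b2*x2 via the centered normal equations."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    # Centered sums of squares and cross-products.
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    # Solve the 2x2 system for the slopes, then recover the intercept.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def r_squared(x1, x2, y, coefs):
    """Share of the variance in y explained by the fitted model."""
    b0, b1, b2 = coefs
    my = statistics.fmean(y)
    ss_res = sum((c - (b0 + b1 * a + b2 * b)) ** 2 for a, b, c in zip(x1, x2, y))
    ss_tot = sum((c - my) ** 2 for c in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical synthetic data, not the study's registry or survey records.
    rng = random.Random(0)
    n = 500
    genetic = standardize([rng.gauss(0, 1) for _ in range(n)])
    # About 5% of exposure values are missing, to exercise the imputation step.
    passive_raw = [rng.gauss(0, 1) if rng.random() > 0.05 else None for _ in range(n)]
    passive = standardize(impute_mean(passive_raw))
    # Risk score generated with illustrative slopes plus unexplained noise.
    risk = [0.24 * g + 0.48 * p + rng.gauss(0, 1) for g, p in zip(genetic, passive)]
    coefs = fit_two_predictor_ols(genetic, passive, risk)
    print("b0, b1, b2 =", coefs)
    print("R^2 =", r_squared(genetic, passive, risk, coefs))
```

The closed-form centered solution keeps the sketch free of third-party dependencies for the two-predictor case. With more predictors, a general least-squares solver such as `numpy.linalg.lstsq` or a statistics package would be the usual choice. The noise term in the synthetic scores plays the role of the unexplained variation that the abstract attributes to other factors. That is why R² falls well below 1 even when the model is correctly specified.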