Development of a Used Car Price Prediction System for the Indian Market Using the XGBoost Algorithm and NLP

Authors

  • Rizky Eka Adinagoro Universitas Duta Bangsa Surakarta
  • Irfan Dwi Rangga Premana Universitas Duta Bangsa Surakarta
  • Rahadyan Bintang Pamungkas univer
  • Pradipta Aryasetya Universitas Duta Bangsa Surakarta
  • Dhian Joedhistiro univer

DOI:

https://doi.org/10.47701/w6511j19

Keywords:

Price Prediction, Used Cars, Indian Market, XGBoost, NLP, Machine Learning

Abstract

Advances in artificial intelligence have opened new opportunities for data-driven decision-making, including the complex task of pricing used cars. This study aims to develop a used car price prediction system for the Indian market using the Extreme Gradient Boosting (XGBoost) algorithm combined with Natural Language Processing (NLP) to analyze the textual features of vehicle details. The model was built in Python and tested in a local environment. Evaluation on previously unseen data shows that the best model achieved an R-squared (R2) of 75.18%, meaning it explains most of the variation in price. In practical terms, the Mean Absolute Error (MAE) was 87.071 Indian Rupees, which supports the effectiveness of the proposed approach. The system could be integrated into automotive e-commerce platforms as an automated pricing feature.
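As a minimal sketch of how the two reported metrics are defined, the Python snippet below computes MAE and R-squared from scratch. The function names and the price values are hypothetical and invented purely for illustration; they are not the study's data, code, or results.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted prices."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of price variance explained: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of actual prices around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical prices in Indian Rupees, made up for this example only.
actual = [400_000, 600_000, 800_000, 1_000_000]
predicted = [450_000, 550_000, 900_000, 950_000]

mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(f"MAE: {mae:.0f} INR, R2: {r2:.2%}")
```

An R-squared reported as a percentage, as in the abstract, is this ratio multiplied by 100. MAE stays in the currency unit of the target, which makes it the easier of the two to interpret in practice.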

Published

2025-07-26