Building a machine learning model for predicting fraudulent transactions
A.F. Konstantinov, L.P. Dyakonova
Abstract: The article presents the development of a machine learning model for predicting fraudulent transactions from a bank's transactional data. It discusses how to encode categorical variables when the transactional data contain a time component, so as to avoid information leakage. In addition, experiments were conducted on applying bagging and on creating additional variables based on the variables' contribution to the final prediction, as estimated with Shapley values. The quality metrics of the resulting model are examined and analyzed.
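The leakage issue in encoding categorical variables can be illustrated with ordered target statistics, the idea CatBoost applies to categorical features. Each transaction's category is encoded using only the labels of transactions that precede it in time, so neither its own label nor any later label enters its encoding. This is a minimal sketch, not the authors' implementation; the function name `ordered_target_encode`, the `(timestamp, category, target)` row layout, and the smoothing parameters `prior` and `a` are all hypothetical.

```python
from collections import defaultdict


def ordered_target_encode(rows, prior, a=1.0):
    """Encode categories using only labels from earlier transactions.

    rows:  list of (timestamp, category, target) tuples, target in {0, 1}
           (hypothetical layout chosen for this sketch).
    prior: assumed fraud rate used before a category has any history,
           e.g. estimated on the training period only.
    a:     hypothetical smoothing weight given to the prior.
    Returns one encoded value per row, in chronological order.
    """
    ordered = sorted(rows, key=lambda r: r[0])  # process transactions in time order
    sums = defaultdict(float)  # running sum of earlier targets per category
    counts = defaultdict(int)  # running number of earlier rows per category
    encoded = []
    for _, category, target in ordered:
        # Encode first, using only rows that precede this one in sorted order
        encoded.append((sums[category] + a * prior) / (counts[category] + a))
        # Then update, so a row's own label never enters its own encoding
        sums[category] += target
        counts[category] += 1
    return encoded


rows = [(3, "A", 1), (1, "A", 0), (2, "B", 1), (4, "A", 1)]
print(ordered_target_encode(rows, prior=0.5))  # [0.5, 0.5, 0.25, 0.5]
```

A plain target encoder fitted on all rows at once would put each training row's own label into its encoding, which is exactly the leakage this ordering avoids. In CatBoost itself, the `has_time=True` training parameter makes the library use the given object order instead of random permutations, which corresponds to sorting by timestamp as above.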
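Shapley values attribute a single prediction to the input variables by averaging each variable's marginal contribution over all coalitions of the other variables. The brute-force sketch below computes them exactly for a small model. It assumes a simple baseline value function (variables outside a coalition take a reference value) and is exponential in the number of variables, so it only illustrates the definition; tree-specific algorithms such as TreeSHAP compute a related quantity for boosted trees far more efficiently. The names `shapley_values`, `model`, `x`, and `baseline` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction by enumerating all coalitions.

    Variables outside a coalition are set to their baseline value
    (an assumed value function chosen for this sketch).
    """
    n = len(x)

    def value(coalition):
        # Model output with coalition members at x and the rest at baseline
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


linear = lambda z: 2 * z[0] + 3 * z[1] + 1
print(shapley_values(linear, [1, 2], [0, 0]))  # [2.0, 6.0]
```

The values always sum to the difference between the model output at `x` and at `baseline` (here 9 - 1 = 8). For a trained CatBoost model, per-transaction contributions are typically obtained with `shap.TreeExplainer(model).shap_values(X)` from the SHAP library; the abstract does not specify how these contributions were turned into new variables.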
Keywords: fraudulent transactions, catboost, encoding categorical variables, catboost_encoder, target_encoder, bagging, variable creation, Shapley values
For citation. Konstantinov A.F., Dyakonova L.P. Building a machine learning model for predicting fraudulent transactions. News of the Kabardino-Balkarian Scientific Center of RAS. 2025. Vol. 27. No. 2. Pp. 11–22. DOI: 10.35330/1991-6639-2025-27-2-11-22
Information about the authors
Alexey F. Konstantinov, Post-graduate Student, Department of Informatics, Plekhanov Russian University of Economics;
115054, Russia, Moscow, 36 Stremyannyy lane;
konstantinovaf@gmail.com, ORCID: https://orcid.org/0009-0000-9591-3301, SPIN-code: 3088-3121
Lyudmila P. Dyakonova, Candidate of Physical and Mathematical Sciences, Associate Professor, Department of Informatics, Plekhanov Russian University of Economics;
115054, Russia, Moscow, 36 Stremyannyy lane;
Dyakonova.LP@rea.ru, ORCID: https://orcid.org/0000-0001-5229-8070, SPIN-code: 2513-8831