Using Machine Learning to Identify Incorrect Value-Added Tax Reports
- Author
- G. Van Bree
- Master's thesis
- MT2407 (November, 2024)
- Supervised by
- Assoz. Univ.-Prof. Mag. Dr. Christoph Schütz
- Advised by
- Simon Staudinger, MSc
- Carried out at
- Universität Linz, Institut für Wirtschaftsinformatik - Data & Knowledge Engineering
Abstract
Many companies and organisations worldwide bear the legal responsibility to collect and report value-added tax (VAT) on their sales. Typically, tax reporting is done manually by accountants. The complexities of tax legislation make VAT reporting error-prone and time-consuming. As a consequence, tax is occasionally reported incorrectly. A potential way to improve compliance with tax regulations is a tax compliance system, i.e., a system that automatically identifies tax statements whose tax conditions are potentially reported incorrectly. The objective of this thesis is to design and implement a data-driven tax compliance system using machine learning (ML) techniques, specifically for VAT on both incoming and outgoing invoices. The purpose of the tax compliance system is to support accountants in identifying and thereby preventing incorrect VAT reporting. Using classification, the tax conditions of invoices can be predicted. The thesis is conducted in cooperation with BDO, a consulting company specialised in tax consultancy services, including VAT reporting. Factors that determine the VAT payable are identified by a VAT domain expert. Real-world enterprise resource planning (ERP) data is used to predict the combination of tax conditions, i.e., the tax code. A framework is presented in which tax code deviations are considered anomalies, that is, potentially incorrect tax codes. A tax code deviation is an invoice whose predicted tax code deviates from the tax code assigned by the accountant. The data is pre-processed and a classification model is built for outgoing invoices, which predicts tax codes with an accuracy of 98.9%. The classifier can learn from an ongoing stream of data and from human-generated feedback. This framework is implemented in a system in which the classification model actively attempts to identify incorrect VAT reporting by predicting the tax codes of a stream of invoice data. Tax code deviations are highlighted in the system so that users can interactively verify the correct tax code and prevent incorrect VAT reporting. The definitive tax codes, as entered by the user, are continuously used to retrain the classifier. The resulting design artifact is this ML-based system for increasing VAT compliance.
Keywords: data mining, machine learning, tax compliance, value-added tax
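The deviation framework described in the abstract (predict a tax code, compare it with the accountant's code, ask the user to verify deviations, and retrain on the confirmed code) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the thesis does not state which classifier or features are used, so a small incremental categorical naive Bayes stands in for the model, and the feature names (`country`, `type`), the tax codes (`DOMESTIC_20`, `EU_SUPPLY_0`), and the `confirm` callback are hypothetical.

```python
import math
from collections import Counter, defaultdict


class OnlineTaxCodeClassifier:
    """Incremental categorical naive Bayes (a stand-in, not the thesis model)."""

    def __init__(self):
        self.class_counts = Counter()
        # feature_counts[tax_code][feature_name][value] -> number of invoices
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # feature_values[feature_name] -> set of values seen so far
        self.feature_values = defaultdict(set)

    def learn(self, invoice, tax_code):
        """Update the counts with one labelled invoice."""
        self.class_counts[tax_code] += 1
        for name, value in invoice.items():
            self.feature_counts[tax_code][name][value] += 1
            self.feature_values[name].add(value)

    def predict(self, invoice):
        """Return the most probable tax code, or None before any training."""
        if not self.class_counts:
            return None
        total = sum(self.class_counts.values())
        best_code, best_score = None, -math.inf
        for code, count in self.class_counts.items():
            score = math.log(count / total)
            for name, value in invoice.items():
                seen = self.feature_counts[code][name][value]
                n_values = len(self.feature_values[name] | {value})
                # Laplace smoothing keeps unseen values from zeroing the score.
                score += math.log((seen + 1) / (count + n_values))
            if score > best_score:
                best_code, best_score = code, score
        return best_code


def review_invoice(model, invoice, assigned_code, confirm):
    """Flag a tax code deviation, obtain the confirmed code, and retrain.

    `confirm` is a hypothetical callback standing in for the interactive
    user check; it receives (invoice, assigned_code, predicted_code).
    """
    predicted = model.predict(invoice)
    is_deviation = predicted is not None and predicted != assigned_code
    if is_deviation:
        final_code = confirm(invoice, assigned_code, predicted)
    else:
        final_code = assigned_code
    model.learn(invoice, final_code)  # feedback flows back into the model
    return is_deviation, final_code


if __name__ == "__main__":
    model = OnlineTaxCodeClassifier()
    for _ in range(3):  # hypothetical historical invoices
        model.learn({"country": "AT", "type": "goods"}, "DOMESTIC_20")
        model.learn({"country": "DE", "type": "goods"}, "EU_SUPPLY_0")

    # Here the simulated user always accepts the model's suggestion.
    accept_prediction = lambda invoice, assigned, predicted: predicted
    print(review_invoice(model, {"country": "DE", "type": "goods"},
                         "DOMESTIC_20", accept_prediction))
```

In this sketch, an invoice whose assigned code matches the prediction is learned as-is, while a deviation is only learned after the (simulated) user has confirmed a code, which mirrors the feedback loop the abstract describes.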