Generative AI in Data Engineering: Use Cases for Synthetic Fraud Scenarios
Abstract
The growing complexity of financial fraud has outpaced standard data engineering approaches to detection. As illegal behaviors evolve and diversify, obtaining representative, labeled data for training fraud detection models becomes increasingly difficult. Generative Artificial Intelligence (Generative AI), including models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), offers a way to generate realistic synthetic data that is representative of real fraudulent behavior. This paper examines how generative AI can enhance the data engineering process by creating synthetic fraud scenarios that address data scarcity, class imbalance, and privacy considerations.
We begin by describing the specific nature of data engineering for fraud detection pipelines, as well as the shortcomings of existing methods for obtaining data. We then present a literature review of generative models, their mathematical underpinnings, and their well-established applications in various domains. The Methodology section outlines a framework for incorporating GANs into real-world data pipelines, combining labeled synthetic fraud data, integration with modern ETL architectures, and detailed feature engineering.
Empirical results on synthesized datasets from the financial domain show that the proposed method improves model robustness and reduces false positive rates. The paper also addresses the ethical, regulatory, and performance-related issues around creating synthetic data. Our results support the hypothesis that generative AI can significantly improve the completeness and diversity of training datasets, in particular for rare fraud scenarios, while generally adhering to data privacy requirements.
This study highlights the substantial impact that generative AI can have on contemporary data engineering, presenting it as a pivotal technology for building fraud detection systems that are more robust to real-world adversarial attacks. The paper also offers best practices for real-world applications and discusses the significant trade-offs and practical considerations of scalable deployments.
How to Cite This Article
Ravi Kiran Alluri (2025). Generative AI in Data Engineering: Use Cases for Synthetic Fraud Scenarios. Journal of Frontiers in Multidisciplinary Research (JFMR), 6(2), 171-176. DOI: https://doi.org/10.54660/.JFMR.2025.6.2.171-176