Generative AI in Data Engineering: Use Cases for Synthetic Fraud Scenarios

Ravi Kiran Alluri

doi:10.54660/.JFMR.2025.6.2.171-176

Author(s): Ravi Kiran Alluri

Published: 2025

Volume: 6 | Issue: 2 | Pages: 171-176

Subject: Engineering

Country: United States

DOI: https://doi.org/10.54660/.JFMR.2025.6.2.171-176

License: CC BY 4.0

Abstract

The growing complexity of financial fraud has outpaced standard data engineering approaches to detection. As illegal behaviors evolve and diversify, obtaining representative, labeled data for training fraud detection models becomes increasingly difficult. Generative Artificial Intelligence (Generative AI), including Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), offers a way to generate realistic synthetic data that is representative of real fraudulent behavior. This paper examines how generative AI can enhance the data engineering process by creating synthetic fraud scenarios that address data scarcity, class imbalance, and privacy considerations.
We begin by describing the specific nature of data engineering for fraud detection pipelines and the shortcomings of existing methods for obtaining data. We then present a literature review of generative models, their mathematical underpinnings, and well-established applications from various domains. The Methodology section outlines a framework for incorporating GANs into real-world data pipelines, combining the generation of labeled synthetic fraud, integration with modern ETL architecture, and detailed feature engineering.
Empirical results on synthesized datasets from the financial domain indicate that the proposed method improves model robustness and reduces false positive rates. The paper also addresses the ethical, regulatory, and performance-related issues around creating synthetic data. Our results support the hypothesis that generative AI can significantly improve the completeness and diversity of training datasets, in particular for rare fraud scenarios, while generally adhering to data privacy requirements.
This study highlights the substantial impact that generative AI can have on contemporary data engineering, presenting it as a pivotal technology for building fraud detection systems that are more robust to real-world adversarial attacks. The paper offers best practices to guide real-world applications and discusses the significant trade-offs and practical considerations of scalable deployments.
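
As a rough illustration of the class-imbalance use case mentioned in the abstract, the sketch below tops up a scarce fraud class with labeled synthetic rows. It is not the paper's GAN framework: a per-feature Gaussian sampler stands in for a trained GAN or VAE generator, and the function names, the `target_fraud_ratio` parameter, and the `synthetic` lineage flag are all hypothetical choices made for this example.

```python
import math
import random
import statistics


def fit_fraud_generator(fraud_rows):
    """Estimate per-feature mean and standard deviation from real fraud rows.

    Stand-in for training a GAN/VAE generator; needs at least two rows.
    """
    columns = list(zip(*fraud_rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def sample_synthetic_fraud(params, n, rng):
    """Draw n synthetic rows, treating features as independent Gaussians."""
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]


def rebalance(legit_rows, fraud_rows, target_fraud_ratio, rng):
    """Return labeled records with enough synthetic fraud to reach the target ratio."""
    n_legit, n_fraud = len(legit_rows), len(fraud_rows)
    total = n_legit + n_fraud
    # Solve (n_fraud + k) / (total + k) >= target_fraud_ratio for k.
    needed = (target_fraud_ratio * total - n_fraud) / (1 - target_fraud_ratio)
    k = max(0, math.ceil(needed))

    params = fit_fraud_generator(fraud_rows)
    synthetic_rows = sample_synthetic_fraud(params, k, rng)

    # Keep a lineage flag so downstream steps can separate real from synthetic rows.
    records = [{"features": r, "label": 0, "synthetic": False} for r in legit_rows]
    records += [{"features": r, "label": 1, "synthetic": False} for r in fraud_rows]
    records += [{"features": r, "label": 1, "synthetic": True} for r in synthetic_rows]
    return records


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy features: [transaction amount, hour of day]; the values are invented.
    legit = [[rng.uniform(5, 200), rng.uniform(8, 22)] for _ in range(95)]
    fraud = [[rng.uniform(800, 2000), rng.uniform(0, 5)] for _ in range(5)]
    data = rebalance(legit, fraud, target_fraud_ratio=0.3, rng=rng)
    share = sum(r["label"] for r in data) / len(data)
    print(f"{len(data)} records, fraud share {share:.2f}")
```

Keeping the `synthetic` flag on every record is one way to let evaluation run on real transactions only, so that a model is not scored on data drawn from the same generator it was trained on.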

How to Cite This Article

Ravi Kiran Alluri (2025). Generative AI in Data Engineering: Use Cases for Synthetic Fraud Scenarios. Journal of Frontiers in Multidisciplinary Research (JFMR), 6(2), 171-176. DOI: https://doi.org/10.54660/.JFMR.2025.6.2.171-176

Publication Information

Journal: Journal of Frontiers in Multidisciplinary Research (JFMR)

Publisher: Anfo Publication House

ISSN: 3050-9718 (Print), 3050-9726 (Online)

Frequency: Half Yearly

Language: English

Open Access: Yes - This article is distributed under the terms of the Creative Commons Attribution 4.0 International License
