Skip to content

This repository presents a credit card fraud detection system utilizing a Logistic Regression model trained on a dataset of 284,807 transactions with significant class imbalance. After employing under-sampling for balance, the model achieves a test accuracy of around 93.40%, showcasing the effectiveness of ML in identifying fraudulent transactions.

Notifications You must be signed in to change notification settings

jigyasaG18/Credit-Card-Fraud-Detection-using-Machine-Learning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Credit Card Fraud Detection Using Machine Learning

Introduction

Background

Credit card fraud has become a prevalent issue in today's digital economy, where financial transactions are increasingly conducted online. The rapid increase in digital transactions has unfortunately been paralleled by a rise in fraudulent activities. This not only leads to significant financial losses for consumers and financial institutions but also undermines consumer trust in electronic payment systems. As a result, effective fraud detection systems are crucial for safeguarding financial transactions.

Dataset Overview

Source of Data

The dataset used in this project is a record of credit card transactions comprising 284,807 entries. It includes numerous features derived from financial transactions while anonymizing user information to protect privacy.

Features Description

Each transaction is characterized by several features:

  • Time: The time elapsed since the first transaction.
  • V1 to V28: Anonymized features generated through Principal Component Analysis (PCA).
  • Amount: The monetary value of the transaction.
  • Class: A binary label where 0 indicates a legitimate transaction and 1 denotes a fraudulent transaction.

Class Distribution

A notable characteristic of the dataset is the class imbalance:

  • Legitimate transactions (Class 0): 284,315
  • Fraudulent transactions (Class 1): 492

This imbalance can complicate model training, as traditional classification models may become biased towards the majority class.

Data Preprocessing

Exploratory Data Analysis

Initial exploratory data analysis is performed to obtain insights from the dataset, including:

  1. Statistical Summary: Describes the central tendency and variability of transaction amounts for both legitimate and fraudulent transactions.
  2. Class Distribution Visualization: Helps visualize the imbalance between the two classes.

Handling Class Imbalance

To address the class imbalance, an under-sampling technique is employed:

  • Under-Sampling: A random sample of 492 legitimate transactions is taken to match the quantity of fraudulent transactions, creating a balanced dataset that minimizes bias during model training.

Model Training and Evaluation

Algorithm Choice

The project employs a Logistic Regression model, a straightforward yet effective algorithm for binary classification tasks. Logistic Regression computes the probability of a transaction being fraudulent based on the linear combination of the input features.

Data Splitting

The balanced dataset is divided into training and testing sets, ensuring that both classes are proportionally represented in each subset through stratified sampling. This helps evaluate the model effectively.

Training the Model

The Logistic Regression model is then trained on the training data. The model learns the relationship between the features and their respective class labels.

Model Evaluation Metrics

The model's performance is assessed using accuracy as the primary metric:

  • Accuracy: This is the ratio of correctly predicted instances (both legitimate and fraudulent) to the total predictions made.

Results and Accuracy

The Logistic Regression model achieved:

  • Training Accuracy: Approximately 94.79% – suggesting that the model effectively recognized patterns in the training data.
  • Test Accuracy: Approximately 93.40% – indicating the model's capacity to generalize well to unseen data.

While these accuracy scores are commendable, it is essential to treat accuracy cautiously, especially in datasets with imbalanced classes. For this reason, additional metrics such as precision, recall, and F1 score should also be considered in a real-world setting to ensure that the model effectively identifies fraudulent transactions without generating excessive false positives.

Challenges and Considerations

Class Imbalance Issues

The skewed distribution of classes raises concerns about the model's effectiveness in detecting fraud. High accuracy in a heavily imbalanced dataset may be misleading; thus, it's vital to assess model performance with additional metrics that better reflect the model’s capability in distinguishing the minority class.

Data Privacy and Security

Given the sensitivity of credit card transaction data, the anonymization of features is fundamental. Continuous data protection measures should be in place, especially in production environments.

Conclusion

The project clearly demonstrates the potential of machine learning in revolutionizing fraud detection mechanisms in financial transactions. The Logistic Regression model performed adeptly, achieving high accuracy in both training and testing phases. However, as fraud techniques continue to evolve, ongoing refinement and testing of the model against new data sets will be essential for maintaining its accuracy and effectiveness. Future work could explore advanced algorithms such as Random Forests or Neural Networks to potentially boost predictive performance and reliability.

About

This repository presents a credit card fraud detection system utilizing a Logistic Regression model trained on a dataset of 284,807 transactions with significant class imbalance. After employing under-sampling for balance, the model achieves a test accuracy of around 93.40%, showcasing the effectiveness of ML in identifying fraudulent transactions.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published