Welcome to the CV Parsing project! This project leverages the BEiT model for layout detection on CVs. The model consists of three key components:
- Backbone: We employ BEiT (BERT Pre-training of Image Transformers), a self-supervised pretrained image transformer.
- Neck: Our model incorporates a Feature Pyramid Network (FPN) to fuse multi-scale features from the backbone.
- Head: We use Faster R-CNN for object detection and recognition.
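The three components above map directly onto an MMDetection-style model config. The sketch below is illustrative only: the registered backbone name `BEiT` and all channel values are assumptions, not the project's actual settings.

```python
# Illustrative MMDetection-style model config (a sketch, not the
# project's real config; backbone name and channel counts are assumed).
model = dict(
    type='FasterRCNN',
    backbone=dict(type='BEiT'),              # pretrained image transformer
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],  # per-stage channels (example values)
        out_channels=256,
        num_outs=5),
    rpn_head=dict(type='RPNHead'),           # region proposals
    roi_head=dict(type='StandardRoIHead'))   # box classification/regression
```

In MMDetection, each `type` string is looked up in the model registry, which is why the BEiT backbone must be registered before a config can reference it (see the integration step below).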
To achieve accurate layout detection, our backbone (BEIT) is pretrained on a self-supervised task based on masked image modeling. For detailed information on the pretraining process, please refer to the pretraining readme.
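As a rough illustration of that objective (a toy sketch, not the actual BEiT pipeline; the patch size and mask ratio below are arbitrary), masked image modeling hides a random subset of image patches, and the model must predict the content of the hidden patches from the visible ones:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(image, patch=16, mask_ratio=0.4):
    """Split an image into non-overlapping patches and zero out a random subset.

    Toy illustration of the masked-image-modeling objective: the model sees
    only the unmasked patches and is trained to predict the masked ones.
    """
    h, w = image.shape[:2]
    n_rows, n_cols = h // patch, w // patch
    n_patches = n_rows * n_cols
    n_masked = int(mask_ratio * n_patches)

    # Boolean mask over the flattened patch grid.
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.choice(n_patches, size=n_masked, replace=False)] = True

    # Zero out the pixels of every masked patch.
    corrupted = image.copy()
    for idx in np.flatnonzero(mask):
        r, c = divmod(idx, n_cols)
        corrupted[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0
    return corrupted, mask

img = rng.random((224, 224, 3))
corrupted, mask = mask_patches(img)
print(mask.sum(), mask.size)  # → 78 masked out of 196 patches
```

The pretraining loss is computed only on the masked positions, which forces the backbone to learn contextual representations of document layout rather than copying pixels.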
Before you start working with this project, ensure you have the following prerequisites in place:
- MMDetection 3.1.0: Make sure you have MMDetection version 3.1.0 installed.
- BEiT Backbone Integration: Move the file `layout detection/backbone/beit.py` to `mmdetection/mmdet/models/backbones` within your MMDetection installation. Additionally, import BEiT in `mmdetection/mmdet/models/backbones/__init__.py`.
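The `__init__.py` edit can look like the fragment below (a sketch: the class name `BEiT` is an assumption and must match whatever `beit.py` actually defines):

```python
# mmdetection/mmdet/models/backbones/__init__.py (excerpt)
from .beit import BEiT  # class name assumed; match the definition in beit.py

__all__ = [
    # ...existing backbones...
    'BEiT',
]
```

Adding the class to `__all__` alongside the import makes the backbone discoverable by name from MMDetection configs.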
To fine-tune the model, you can use the following command (note that MMDetection 3.x uses `--resume` rather than the older `--resume-from`):

```shell
python tools/train.py <config> --resume <last_checkpoint>
```
To test the model, you can use the following command:

```shell
python tools/test.py <config> <checkpoint> --show-dir <directory_results>
```
Here are some results on PubLayNet:
For user interface examples and inference, please refer to the Gradio UI notebook included in this repository. You'll find examples of how to interact with the model through the Gradio user interface.
If you need to extract small pieces of information, such as names and company details, we provide a notebook that applies Pix2Struct to this task.