This repository also helps us work with point clouds and build a custom object detector that can later be scaled to detect objects, especially for autonomous-vehicle applications.
Live demo here.
- General Info
- Technologies Used
- Features
- Screenshots
- Setup
- Dataset Utilized
- Usage
- Project Status
- Room for Improvement
- References
- Acknowledgements
- Contact
- The aim of this repository is to implement Frustum PointNets on the readily available 3D KITTI and Lyft datasets.
- Understand the PointNet architecture and develop a POC around the research papers from the main authors of Frustum PointNets.
- Implement Frustum PointNets to develop a 3D object detector.
This repository is a code release for the CVPR 2018 paper (arXiv report here). The work studies 3D object detection from RGB-D data. The authors propose a novel detection pipeline that combines mature 2D object detectors with state-of-the-art 3D deep learning techniques. In this pipeline, object proposals are first built with a 2D detector running on RGB images, where each 2D bounding box defines a 3D frustum region. Then, based on the 3D point clouds in those frustum regions, 3D instance segmentation and amodal 3D bounding box estimation are performed using PointNet/PointNet++ networks (see references at the bottom).
By leveraging 2D object detectors, the 3D search space for object localization is greatly reduced. The high resolution and rich texture information in images also enable high recall for smaller objects like pedestrians or cyclists, which are harder to localize from point clouds alone. By adopting PointNet architectures, the pipeline works directly on 3D point clouds, without the need to voxelize them into grids or project them onto image planes. Since it operates directly on point clouds, it can fully respect and exploit the 3D geometry -- one example is the series of coordinate normalizations the authors apply, which help canonicalize the learning problem. The approach was evaluated on the KITTI and SUN RGB-D benchmarks, where it significantly outperforms the previous state of the art and still holds leading positions on the current KITTI leaderboard.
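As a rough illustration of how a 2D box defines a frustum (a minimal sketch, not the authors' release code), the snippet below assumes the point cloud is already expressed in the rectified camera frame and uses a KITTI-style `P2` projection matrix to keep only the points whose image projection falls inside a 2D detection box:

```python
import numpy as np

def extract_frustum_points(points_cam, P2, box2d):
    """Keep points whose image projection falls inside a 2D box.

    points_cam : (N, 3) points in the rectified camera frame (x right, y down, z forward)
    P2         : (3, 4) camera 2 projection matrix from the KITTI calibration file
    box2d      : (xmin, ymin, xmax, ymax) pixel coordinates of a 2D detection
    """
    # Homogeneous coordinates, then project onto the image plane
    pts_h = np.hstack([points_cam, np.ones((points_cam.shape[0], 1))])  # (N, 4)
    proj = pts_h @ P2.T                                                 # (N, 3)
    u = proj[:, 0] / proj[:, 2]
    v = proj[:, 1] / proj[:, 2]

    xmin, ymin, xmax, ymax = box2d
    in_box = (u >= xmin) & (u <= xmax) & (v >= ymin) & (v <= ymax)
    in_front = points_cam[:, 2] > 0        # keep only points in front of the camera
    return points_cam[in_box & in_front]
```

The resulting frustum point set is what the PointNet-based segmentation and box-estimation networks consume in the paper's pipeline.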
- Tensorflow2
- Keras
Current feature status:
- Preprocess KITTI Dataset - In Progress
- Train Frustum Pointnet on KITTI Dataset - Not Yet Started
- Inference Script on KITTI Dataset - Not Yet Started
- Preprocess Lyft Dataset - In Progress
- Train Frustum Pointnet on Lyft Dataset - Not Yet Started
- Inference Script on Lyft Dataset - Not Yet Started
- git clone https://github.com/ManashJKonwar/IP-Pointnet-ObjectDetection.git (Clone the repository)
- python3 -m venv IPPointnetODVenv (Create virtual environment from existing python3)
- Activate the "IPPointnetODVenv" virtual environment (e.g. source IPPointnetODVenv/bin/activate on Linux/macOS or IPPointnetODVenv\Scripts\activate on Windows)
- pip install -r requirements.txt (Install all required python modules)
- KITTI 3d Object Detector
Credit goes to the KITTI dataset.
The 3D object detection benchmark consists of 7481 training images and 7518 test images as well as the corresponding point clouds, comprising a total of 80,256 labeled objects.
One of the best ways to understand this data is to understand the sensor layout on the data-collection vehicle used by KITTI.
Both the training and testing datasets are broken into 4 parts:
- camera_2_image(.png) - This is a png image file captured by the camera
- camera_2_label(.txt) - This is a text file containing the label and coordinate configuration for objects within the camera image, following the same naming convention as camera_2_image. Each row of this file is an object and contains 15 values, including the object tag (Car, Pedestrian, Cyclist, etc.). The 2D bounding boxes are in terms of pixels in the camera image. The 3D bounding boxes are given in two coordinate systems: the size (height, width, and length) is in the object coordinate system, and the center of the bounding box is in the camera coordinate system. (A small parsing sketch follows the file descriptions below.)
Please see the table below for details on these 15 parameters:
Parameter Nos | Parameter Name | Parameter Description |
---|---|---|
1 | type | Describes the type of object: 'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram', 'Misc' or 'DontCare' |
1 | truncated | Float from 0 (non-truncated) to 1 (truncated), where truncated refers to the object leaving image boundaries |
1 | occluded | Integer (0,1,2,3) indicating occlusion state: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown |
1 | alpha | Observation angle of object, ranging [-pi..pi] |
4 | bbox | 2D bounding box of object in the image (0-based index): contains left, top, right, bottom pixel coordinates |
3 | dimensions | 3D object dimensions: height, width, length (in meters) |
3 | location | 3D object location x,y,z in camera coordinates (in meters) |
1 | rotation_y | Rotation ry around Y-axis in camera coordinates [-pi..pi] |
1 | score | Only for results: Float, indicating confidence in detection, needed for p/r curves, higher is better. |
- calibration(.txt) - There are in total 7 parameters in this file: P0, P1, P2, P3, R0_rect, Tr_velo_to_cam and Tr_imu_to_velo.
- velodyne_point_cloud(.bin) - This is a binary file containing the raw LiDAR point cloud for the frame, with each point stored as x, y, z and reflectance values.
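As a small illustration of the label format described in the table above (a hypothetical helper, not necessarily the parser used in this repository), each row of a camera_2_label file can be split into the documented fields like this:

```python
def parse_kitti_label_line(line):
    """Parse one row of a camera_2 label file into the fields listed in the table above."""
    vals = line.strip().split()
    obj = {
        'type': vals[0],
        'truncated': float(vals[1]),
        'occluded': int(vals[2]),
        'alpha': float(vals[3]),
        'bbox': [float(v) for v in vals[4:8]],         # left, top, right, bottom (pixels)
        'dimensions': [float(v) for v in vals[8:11]],  # height, width, length (meters)
        'location': [float(v) for v in vals[11:14]],   # x, y, z in camera coordinates (meters)
        'rotation_y': float(vals[14]),
    }
    if len(vals) > 15:          # 'score' is present only in result files
        obj['score'] = float(vals[15])
    return obj
```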
- Lyft Dataset XXXXX
- python train_OD.py "kitti"
- python train_OD.py "lyft"
Step 1: Visualizing 2D box based object detection
Step 2: Point cloud projection on the RGB image
Step 3: LiDAR 3D point cloud projection
Step 4: LiDAR 3D point cloud box based object detection (see the box-corner sketch after these steps)
Step 5: LiDAR 3D frustum extraction for the object
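For step 4, the 8 corners of a labeled 3D box can be recovered from the `dimensions`, `location`, and `rotation_y` fields using the standard KITTI convention that `location` is the bottom-center of the box in camera coordinates; this is a minimal sketch, not the repository's visualization code:

```python
import numpy as np

def compute_box3d_corners(dimensions, location, rotation_y):
    """Return the 8 corners (8, 3) of a KITTI 3D box in camera coordinates.

    dimensions : (h, w, l) in meters
    location   : (x, y, z) of the box bottom-center in camera coordinates
    rotation_y : rotation around the camera Y-axis in radians
    """
    h, w, l = dimensions
    # Corners in the object frame (origin at bottom-center, y pointing down)
    x_c = [ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2]
    y_c = [   0,    0,    0,    0,   -h,   -h,   -h,   -h]
    z_c = [ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2]
    corners = np.array([x_c, y_c, z_c])            # (3, 8)

    # Rotate around Y, then translate to the object location
    c, s = np.cos(rotation_y), np.sin(rotation_y)
    R = np.array([[ c, 0, s],
                  [ 0, 1, 0],
                  [-s, 0, c]])
    return (R @ corners).T + np.array(location)    # (8, 3)
```

The corners returned here can be projected into the image with the calibration matrix P2 (as in the frustum sketch earlier) to draw the 3D boxes on the RGB frame.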
Project is: _in progress_
Room for improvement:
- Develop a frontend to visualize different frames from the datasets
- Provide support for CPUs, GPUs and TPUs as well
To do:
- Finish KITTI dataset operations.
[1] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation; Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas; CVPR 2017; https://arxiv.org/abs/1612.00593
[2] Frustum PointNets for 3D Object Detection from RGB-D Data; Charles R. Qi, Wei Liu, Chenxia Wu, Hao Su, Leonidas J. Guibas; CVPR 2018; https://arxiv.org/abs/1711.08488
Created by @ManashJKonwar - feel free to contact me!