Access patterns matter a lot when choosing the most appropriate library for video loading. In this repository, we implement different video loaders (see this list) and different access patterns, specifically: (1) loading the whole video, (2) loading random segments (clips) of the video, and (3) loading random frames.
We also contemplate other axes of variation, such as the sampling rate, different augmentations, and parameters related to these (crop size, number of frames in a segment, etc.). All of these may influence which video loaders are more efficient. Specifically, we provide options to center crop, resize, and normalize according to pre-defined means and variances. While many other transformations are possible, these standard ones are enough to benchmark the video loaders.
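As a rough illustration of what these access patterns and transformations amount to, the sketch below selects frame indices for each pattern and applies a center crop, resize, and normalization with torchvision. The sizes, normalization statistics, and function names are illustrative assumptions, not the values or code used in this repository.

```python
import random

import torch
from torchvision import transforms

def select_frame_indices(num_frames, pattern, clip_len=16):
    """Illustrative frame index selection for the three access patterns."""
    if pattern == "whole_video":      # (1) the whole video
        return list(range(num_frames))
    if pattern == "random_clip":      # (2) a contiguous random segment (clip)
        start = random.randint(0, max(0, num_frames - clip_len))
        return list(range(start, min(start + clip_len, num_frames)))
    if pattern == "random_frames":    # (3) individual random frames
        return sorted(random.sample(range(num_frames), min(clip_len, num_frames)))
    raise ValueError(f"Unknown access pattern: {pattern}")

# Standard per-frame transforms: center crop, resize, and normalization.
# Sizes and statistics are placeholders, not the repository's defaults.
frame_transform = transforms.Compose([
    transforms.CenterCrop(224),
    transforms.Resize(112),
    transforms.Normalize(mean=[0.45, 0.45, 0.45], std=[0.225, 0.225, 0.225]),
])

frame = torch.rand(3, 256, 340)   # dummy decoded frame as a float tensor in [0, 1]
out = frame_transform(frame)      # shape (3, 112, 112), normalized
```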
Additionally, we also implement audio loading for those video loaders that support it.
Other than being a tool for comparing loading times, this repository is also convenient to compare other aspects of the video loaders (such as the different ways they deal with different frame rates), as well as to have a working implementation of multiple video loaders that use the same data structure and parameters.
See other Considerations later in this document.
Contributions are welcome!
First, prepare the data by running `preprocess.py`. Follow the instructions in that file to store the data in the correct format.
Then, modify `parameters.py` and run `main.py` without any arguments (the arguments are loaded from `parameters.py`). Consider modifying the reporting/visualization code in `main.py` to report the numbers you are interested in.
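To give a sense of the knobs involved, here is a hypothetical sketch of what such a parameters module might contain. The names and values below are invented for illustration; the actual ones live in `parameters.py` and will differ.

```python
# Hypothetical benchmark parameters; see parameters.py for the real names and values.
ACCESS_PATTERN = "random_clip"   # "whole_video", "random_clip", or "random_frames"
NUM_FRAMES_PER_CLIP = 16         # frames sampled per segment
SAMPLING_RATE = 2                # temporal stride between sampled frames
CROP_SIZE = 224                  # center-crop size in pixels
RESIZE_SIZE = 112                # resize applied after cropping
NORMALIZE = True                 # normalize with pre-defined mean/variance
LOAD_AUDIO = False               # only supported by some loaders
NUM_REPETITIONS = 10             # timed runs per loader
```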
Installing all of the implemented libraries at the same time is not possible, as they have conflicting requirements. This is why we do not provide a single `requirements.txt` file. Install the libraries that you are most interested in using. Python 3.9 is the most compatible Python version for most of the libraries.
Make sure FFmpeg is installed on your machine.
The libraries that can be installed without incompatibilities are:
pip install git+https://github.com/mondeja/moviepy.git # The latest pip package is not up to date; install from git directly. It claims incompatibility with the latest numpy, but it works (do not downgrade numpy)
pip3 install torch torchvision torchaudio # Default version. See later for alternatives.
git clone https://github.com/pytorch/vision.git
cd vision
python setup.py develop
pip install -U openmim
mim install mmcv # Requires previous installation of torch
pip install git+https://github.com/facebookresearch/pytorchvideo.git # The latest pip package is not up to date with the latest torchvision release, so install from git directly
# pip install av # This should get installed with pytorchvideo
pip install numpy
pip install absl-py
pip install pandas
pip install tensorflow
pip install ffmpeg-python
pip install pims
conda install -c conda-forge gulpio
pip install git+https://github.com/willprice/torchvideo.git@master
pip install nvidia-dali-cuda110
conda install -c conda-forge lintel
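As a quick sanity check after installation, something like the following prints which of the packages above import correctly. The module names are the usual import names for each package and are assumptions where we have not verified them.

```python
import importlib

# Usual import names for the packages listed above (assumed; adjust if needed).
modules = [
    "moviepy", "torch", "torchvision", "torchaudio", "mmcv", "pytorchvideo",
    "av", "numpy", "absl", "pandas", "tensorflow", "ffmpeg", "pims",
    "gulpio", "torchvideo", "nvidia.dali", "lintel",
]

for name in modules:
    try:
        importlib.import_module(name)
        print(f"OK       {name}")
    except ImportError as exc:
        print(f"MISSING  {name}: {exc}")
```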
The rest of the libraries can present some incompatibilities with each other. Consider the following when installing them:
- To install other versions of PyTorch (for different CUDA versions, for example), see this link.
- The decord library can be installed with `pip install decord` if CPU support is enough. For GPU support, install from source following these instructions. The CPU and GPU builds correspond to the `DecordVideo` and `DecordVideoGPU` loaders; a minimal usage sketch for both follows this list.
- Pillow-SIMD is a fast replacement for Pillow. Install either one or the other. For the standard Pillow, just run `pip install pillow`, and for Pillow-SIMD, run `CC="cc -mavx2" pip install -U --force-reinstall pillow-simd`, following these instructions.
- The `TorchVideoVideo` video loader requires the `lintel` library, which only works on Python up to 3.9. If you want to use Python 3.10+, do not use `TorchVideoVideo`.
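As a minimal sketch of how the CPU and GPU builds of decord differ in use (this relies on decord's public API; the file path and frame indices are placeholders):

```python
from decord import VideoReader, cpu, gpu

# CPU decoding (pip-installable build), as used by the DecordVideo loader.
vr_cpu = VideoReader("video.mp4", ctx=cpu(0))

# GPU decoding requires building decord from source with NVDEC support,
# as used by the DecordVideoGPU loader.
vr_gpu = VideoReader("video.mp4", ctx=gpu(0))

# Either reader can fetch a batch of frames by index.
frames = vr_cpu.get_batch([0, 10, 20]).asnumpy()   # shape (3, H, W, 3), uint8
```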
- Some of these video-loading libraries provide other functionality that may be very convenient depending on the use-case, which we may not be benchmarking in this project. This project is not meant to provide a comprehensive comparison between the video loaders. Similarly, some libraries may not benefit from the specific steps and standardization measures followed in this repository. We do not claim this is the intended or ideal loading procedure for all loaders.
- NVVL (PyTorch wrapper in this link) is no longer maintained and is now part of DALI. Because of this, we only implement DALI, not NVVL.
- We explicitly separated `Pillow` and `PillowSIMD` as two different video loaders, although the implementation is the same and only one of them can be installed at a time. This is to make the choice between them explicit.
- Other loaders that use PIL may also benefit from Pillow-SIMD instead of Pillow, and this is not explicitly contemplated in this repository. Take this into account in case you compare the two versions.
- If the video file is slightly corrupt or there was a bad conversion, the loaders may think the video is longer than it actually is, which may cause some problems in the code. We controlled for this in some places, but not all.
- We only implement video loaders that work with PyTorch. Therefore, JAX- or TensorFlow-specific loaders (such as DMVR) are not implemented.
- `lintel` fails for some videos. `lintel` is used in the `TorchVideoVideo` loader. The code does not raise an error; it simply crashes with `Segmentation fault (core dumped)`. It is a similar issue to the one raised here, but also for .mp4 videos, not only .webm videos. The (not ideal) fix in that case is to re-encode the video using `ffmpeg` (see the sketch after this list).
- GulpIO, used in the `TorchVideoGULP` loader, does not always work properly. We noticed some intercalation of frames in some of the gulp files created during pre-processing.
- The resulting frames obtained from the different video loaders would ideally be exactly the same. However, this will not be the case. There are factors that make them return slightly different results. Some of these factors are:
- Different resize interpolation algorithms. This could also be standardized, but we chose to keep the default ones.
- Different ways of sampling temporally.
- Some loaders specify start/end of the clips with seconds, and others use frame ids.
- Small differences in seeking the starting position in a video.
- Loaders not working properly in some scenarios.
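The re-encoding workaround mentioned above can be done from Python with the ffmpeg-python package listed in the installation section. The file names and codec settings below are placeholders showing one plausible re-encode, not necessarily the exact command that resolves every case.

```python
import ffmpeg

# Re-encode a problematic video so that lintel can decode it.
# Input/output paths and codec options are placeholders.
(
    ffmpeg
    .input("broken_video.mp4")
    .output("reencoded_video.mp4", vcodec="libx264", pix_fmt="yuv420p")
    .overwrite_output()
    .run(quiet=True)
)
```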
The implemented video loaders are the following:
- Decord
  - CPU version, in the loader called `DecordVideo`.
  - GPU version, in the loader called `DecordVideoGPU`.
- Pillow
  - Standard Pillow library, in the loader called `Pillow`.
  - Optimized Pillow-SIMD library, in the loader called `PillowSIMD`.
- OpenCV
  - Using the image loader, in `OpenCVImage`.
  - Using the VideoCapture video loader, in `OpenCVVideo`.
- MMCV
  - Using the image loader, in `MMCVImage`.
  - Using the VideoReader video loader, in `MMCVVideo`.
- PIMS
  - With PyAV backend, in `PIMSPyAV`.
  - With ImageIO backend, in `PIMSImageIO`.
  - With MoviePy backend, in `PIMSMoviePy`.
- FFmpeg, in the `FFmpeg` loader. Some other libraries use FFmpeg under the hood.
- MoviePy, in the `MoviePy` loader.
- PyTorchVideo
  - Frame-level decoder, in `PyTorchVideoFrames`.
  - PyAV decoder, in `PyTorchVideoPyAV`.
  - Torchvision decoder, in `PyTorchVideoTorchvision`.
  - Decord decoder, in `PyTorchVideoDecord`.
- TorchVision VideoReader
  - Video backend, in `TorchVisionVideoReader`.
  - PyAV backend, in `TorchVisionVideoReaderPyAV`.
  - CUDA backend, in `TorchVisionVideoReaderCUDA`.
- TorchVision read_video, in the `TorchVisionReadVideo` video loader.
- TorchVideo
  - Using GULP, in the `TorchVideoGULP` loader.
  - Using PIL, in the `TorchVideoPIL` loader.
  - Using TorchVideo internal readers, in `TorchVideoVideo`.
- DALI, in the `DALI` loader.
See `loaders.py` for details.
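To illustrate what "the same data structure and parameters" means in practice, here is a purely hypothetical sketch of a common loader interface. The class and method names are invented for illustration; the real interface and its instructions are at the top of `loaders.py` and may look quite different.

```python
from abc import ABC, abstractmethod

class BaseVideoLoader(ABC):
    """Hypothetical common interface; see the top of loaders.py for the real one."""

    def __init__(self, crop_size=224, resize_size=112, normalize=True):
        self.crop_size = crop_size
        self.resize_size = resize_size
        self.normalize = normalize

    @abstractmethod
    def load_frames(self, video_path, frame_indices):
        """Return a float tensor of shape (num_frames, C, H, W)."""
        ...
```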
Pull requests (corrections of bugs, more efficient loading, new features, or better documentation) are welcome! 😄
Some possible additions are:
- Implement new loaders. Follow the loaders that are already implemented, and specifically follow the instructions at the top of the `loaders.py` file.
- Monitor I/O required
- Monitor memory (RAM or GPU memory) requirements
- Multiprocessing option, simulating a multiple-worker PyTorch loader (a sketch of this idea follows below)
- Improve reporting of results
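For the multiprocessing item above, one plausible direction is to wrap a loader in a standard PyTorch Dataset and let DataLoader handle the worker processes. The class and parameter names below are hypothetical and not part of this repository.

```python
from torch.utils.data import Dataset, DataLoader

class BenchmarkDataset(Dataset):
    """Hypothetical wrapper: each item is one video decoded by the chosen loader."""

    def __init__(self, video_paths, loader_fn):
        self.video_paths = video_paths
        self.loader_fn = loader_fn      # e.g. a function wrapping one of the loaders

    def __len__(self):
        return len(self.video_paths)

    def __getitem__(self, idx):
        return self.loader_fn(self.video_paths[idx])

# With num_workers > 0, DataLoader decodes videos in parallel worker processes,
# which is how a multiple-worker PyTorch training loop would exercise a loader.
# dataset = BenchmarkDataset(paths, loader_fn)
# loader = DataLoader(dataset, batch_size=1, num_workers=4)
```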