ORC vs Parquet vs Avro
In the simplest terms, these are all file formats.
Big-data storage and processing ecosystems like Hadoop need data formats optimized for read and write performance.
1) Avro
It is a row-major format.
Its primary design goal was schema evolution.
The schema is defined in JSON and is typically maintained as a separate schema file (.avsc); Avro data files also embed the writer's schema in their header, so each file remains self-describing.
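For illustration, a minimal Avro schema might look like the following (a hypothetical example; the record and field names are made up). The union-with-null type and the default on the last field are what make schema evolution work: a reader using this schema can still decode older records that were written without the email field.

    {
      "type": "record",
      "name": "User",
      "namespace": "com.example",
      "fields": [
        {"name": "id",    "type": "long"},
        {"name": "name",  "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": null}
      ]
    }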
2) ORC
Column-oriented storage format.
It originated as Hive's RCFile (Record Columnar File) and was later improved into the Optimized Row Columnar (ORC) format.
The schema is stored with the data, as part of the file footer.
Data is organized into stripes, which are subdivided into row groups.
Each stripe maintains indexes and statistics (min/max, counts) about the data it stores.
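A minimal Spark sketch for writing and reading ORC (assuming a SparkSession named spark and an existing DataFrame df; the path and column name are hypothetical):

    import org.apache.spark.sql.functions.col

    // Write ORC with ZLIB compression; stripe-level indexes and
    // min/max statistics are created automatically by the ORC writer.
    df.write.option("compression", "zlib").orc("/tmp/events_orc")

    // On read, a pushed-down filter can use those stripe statistics
    // to skip stripes whose min/max range excludes the predicate.
    spark.read.orc("/tmp/events_orc")
      .filter(col("event_date") === "2020-01-01")
      .show()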
3) Parquet
Similar to ORC; based on Google's Dremel paper.
The schema is stored in the file footer.
Column-oriented storage format.
Has integrated compression, plus column- and page-level statistics that serve as indexes.
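A similar sketch for Parquet (same assumptions: a DataFrame df and hypothetical paths). The codec is chosen per write via the compression option:

    // Write Parquet with GZIP compression; column-chunk and page-level
    // min/max statistics are embedded alongside the data.
    df.write.option("compression", "gzip").parquet("/tmp/events_parquet")

    // SNAPPY gives a larger file but faster compression/decompression.
    df.write.option("compression", "snappy").parquet("/tmp/events_parquet_snappy")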
Space-wise (compression), I found them pretty close to each other.
Around 10 GB of CSV data compressed to 1.1 GB with ORC (ZLIB) and to 1.2 GB with Parquet (GZIP); with SNAPPY compression, both formats used around 1.6 GB.
Conversion-speed-wise, ORC was a little better: it took 9 minutes, whereas Parquet took 10+ minutes.
The following link should be useful for further comparison:
File Format Benchmark - Avro, JSON, ORC & Parquet
https://www.slideshare.net/HadoopSummit/file-format-benchmark-avro-json-orc-parquet