readme updates

mwlon · Oct 29, 2023 · bbac876
1 parent 4ce1182

Showing 5 changed files with 15 additions and 26 deletions.
diff --git a/README.md b/README.md
@@ -16,7 +16,6 @@ with high compression ratio and moderately fast speed.
 * lossless; preserves ordering and exact bit representation
 * nth-order delta encoding
 * compresses faster or slower depending on compression level from 0 to 12
-* fully streaming decompression
 
 **Data types:**
 `u32`, `u64`, `i32`, `i64`, `f32`, `f64`
@@ -56,25 +55,26 @@ multiple chunks per file.
 | page     | interleaving w/ wrapping format | \>1k numbers        |
 | batch    | decompression                   | 256 numbers (fixed) |
 
-The standalone format is essentially a minimal implementation of a wrapped format.
-It supports batched decompression and seeking, but not nullability, multiple
-columns, random access, or other niceties.
+The standalone format is a minimal implementation of a wrapped format.
+It supports batched decompression only; no nullability, multiple
+columns, random access, seeking, or other niceties.
+It is mainly useful for quick proofs of concept (sometimes by the CLI).
 
 <img alt="pco compression and decompression steps" title="compression and decompression steps" src="./images/processing.svg" />
 
 ## Etymology
 
 The names pcodec and pco were chosen for these reasons:
 * "Pico" suggests that it makes very small things.
-* Pco is reminiscent of qco, its preceding format.
+* Pco is reminiscent of qco, its predecessor.
 * Pco is reminiscent of PancakeDB (Pancake COmpressed). Though PancakeDB is now
   history, it had a good name.
 * Pcodec is short, provides some semantic meaning, and should be easy to
   search for.
 
 The names are used for these purposes:
 * pco => the library and data format
-* pco_cli => the binary crate name
+* pco\_cli => the binary crate name
 * pcodec => the binary CLI and the repo
 
 ## Extra

diff --git a/pco/README.md b/pco/README.md
@@ -3,7 +3,7 @@
 **⚠️ Both the API and the data format are unstable for the 0.0.0-alpha.\*
 releases. Do not depend on pco for long-term storage yet. ⚠️**
 
-## Usage as a Standalone Format
+## Quick Start
 
 ```rust
 use pco::standalone::{auto_compress, auto_decompress};
@@ -32,7 +32,7 @@ To run something right away, try
 [the benchmarks](../bench/README.md).
 
 For a lower-level standalone API that allows writing one chunk at a time /
-streaming reads, see [the docs.rs documentation](https://docs.rs/pco/latest/pco/).
+batched reads, see [the docs.rs documentation](https://docs.rs/pco/latest/pco/).
 
 ## Usage as a Wrapped Format
 
@@ -58,15 +58,4 @@ implementations are insufficient)
 `pco::data_types::UnsignedLike` and
 `pco::data_types::FloatLike`.
 
-### Seeking and Statistics
-
-Each chunk has a metadata section containing
-* the total count of numbers in the chunk,
-* the bins for the chunk and relative frequency of each bin,
-* and the size in bytes of the compressed body.
-
-Using the compressed body size, it is easy to seek through the whole file
-and collect a list of all the chunk metadatas.
-One can aggregate them to obtain the total count of numbers in the whole file
-and even an approximate histogram.
-This is typically about 100x faster than decompressing all the numbers.
+The maximum legal precision of a custom data type is currently 128 bits.
diff --git a/pco/src/chunk_config.rs b/pco/src/chunk_config.rs
@@ -113,8 +113,7 @@ impl ChunkConfig {
   }
 }
 
-/// `PagingSpec` specifies how a chunk is split into pages
-/// (default: equal pages up to 1,000,000 numbers each).
+/// `PagingSpec` specifies how a chunk is split into pages.
 #[derive(Clone, Debug)]
 #[non_exhaustive]
 pub enum PagingSpec {
@@ -130,6 +129,7 @@ pub enum PagingSpec {
   ExactPageSizes(Vec<usize>),
 }
 
+/// Default: equal pages up to 1,000,000 numbers each.
 impl Default for PagingSpec {
   fn default() -> Self {
     Self::EqualPagesUpTo(DEFAULT_MAX_PAGE_SIZE)

diff --git a/pco/src/data_types/mod.rs b/pco/src/data_types/mod.rs
@@ -125,9 +125,9 @@ pub trait UnsignedLike:
 /// wouldn't preserve ordering and would cause pco to fail. In this example,
 /// one needs to flip the sign bit and, if negative, the rest of the bits.
 pub trait NumberLike: Copy + Debug + Display + Default + PartialEq + 'static {
-  /// A number from 0-255 that corresponds to the number's data type.
+  /// A number from 1-255 that corresponds to the number's data type.
   ///
-  /// Each `NumberLike` implementation should have a different `HEADER_BYTE`.
+  /// Each `NumberLike` implementation should have a different `DTYPE_BYTE`.
   /// This byte gets written into the file's header during compression, and
   /// if the wrong header byte shows up during decompression, the decompressor
   /// will return an error.
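
Note on the `NumberLike` doc comment above: it says an order-preserving conversion for floats needs to "flip the sign bit and, if negative, the rest of the bits". A minimal sketch of that idea for `f32` follows. The function names `f32_to_ordered_u32` and `ordered_u32_to_f32` are hypothetical and chosen for illustration; this is not pco's actual implementation, only an example of the technique the comment describes.

```rust
/// Maps an `f32` to a `u32` so that the unsigned ordering matches the
/// numeric ordering of the floats (NaN handling is not considered here).
/// Hypothetical helper for illustration; not part of the pco API.
fn f32_to_ordered_u32(x: f32) -> u32 {
    let bits = x.to_bits();
    if bits & 0x8000_0000 != 0 {
        // Negative: flip every bit, so more-negative values map lower.
        !bits
    } else {
        // Non-negative: flip only the sign bit, placing these values
        // above all negatives.
        bits ^ 0x8000_0000
    }
}

/// Inverse of `f32_to_ordered_u32`; recovers the exact bit pattern.
/// Hypothetical helper for illustration; not part of the pco API.
fn ordered_u32_to_f32(u: u32) -> f32 {
    let bits = if u & 0x8000_0000 != 0 {
        // High bit set: the input was non-negative, so undo the sign flip.
        u ^ 0x8000_0000
    } else {
        // High bit clear: the input was negative, so undo the full flip.
        !u
    };
    f32::from_bits(bits)
}
```

Because the mapping is a bijection on bit patterns, the round trip is lossless, which is consistent with the README's "exact bit representation" claim.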

diff --git a/pco_cli/README.md b/pco_cli/README.md
@@ -2,7 +2,7 @@
 
 ## Setup
 
-You can compress, decompress, and inspect .pco files using our simple CLI.
+You can compress, decompress, and inspect standalone .pco files using our simple CLI.
 Follow this setup:
 
 1. Install Rust: https://www.rust-lang.org/tools/install
@@ -52,7 +52,7 @@ This command prints numbers in a .pco file to stdout.
 Examples:
 
 ```shell
-pcodec decompress --limit 10 in.pco
+pcodec decompress --limit 256 in.pco
 ```
 
 ### Inspect