Just another minhash (jam) implementation. A high performance minhash variant to screen extremely large (metagenomic) datasets in a very short timeframe.
Implements parts of the ScaledMinHash / FracMinHash algorithm described in [sourmash](https://joss.theoj.org/papers/10.21105/joss.00027).

-Unlike traditional implementations like [sourmash](https://joss.theoj.org/papers/10.21105/joss.00027) or [mash](https://doi.org/10.1186/s13059-016-0997-x) this version tries to specialise more on estimating the containment of small sequences in large sets. This is intended to be used to screen terabytes of data in just a few seconds / minutes.
+Unlike traditional implementations like [sourmash](https://joss.theoj.org/papers/10.21105/joss.00027) or [mash](https://doi.org/10.1186/s13059-016-0997-x) this version tries to focus on estimating the containment of small sequences in large sets by (optionally) introducing an intentional bias towards smaller sequences. This is intended to be used to screen terabytes of data in just a few seconds / minutes.
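
To make the containment idea concrete, here is a minimal Rust sketch of a FracMinHash-style containment estimate. It is not jam-rs code: the function names (`fnv1a`, `frac_minhash`, `containment`), the FNV-1a hash, and the `scale` parameter are assumptions chosen for illustration, and the sketch skips canonical k-mers as well as the optional bias towards smaller sequences.

```rust
use std::collections::HashSet;

/// FNV-1a, used here only as a simple deterministic stand-in hash
/// (assumption: the real tool may use a different hash function).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical FracMinHash sketch: hash every k-mer and keep only the
/// hashes that fall at or below `u64::MAX / scale`, i.e. roughly 1/scale
/// of all distinct k-mers.
fn frac_minhash(seq: &[u8], k: usize, scale: u64) -> HashSet<u64> {
    assert!(k > 0 && scale > 0, "k and scale must be positive");
    let threshold = u64::MAX / scale;
    seq.windows(k)
        .map(fnv1a)
        .filter(|&h| h <= threshold)
        .collect()
}

/// Estimated containment of `query` in `database`: the fraction of the
/// query's retained hashes that also occur in the database sketch.
fn containment(query: &HashSet<u64>, database: &HashSet<u64>) -> f64 {
    if query.is_empty() {
        return 0.0;
    }
    let shared = query.intersection(database).count();
    shared as f64 / query.len() as f64
}
```

In this sketch, a call such as `containment(&frac_minhash(query, 21, 1000), &frac_minhash(reference, 21, 1000))` would estimate how much of `query` is contained in `reference`; both sketches need the same `k` and `scale` for the comparison to be meaningful.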
### Installation
@@ -19,17 +20,17 @@ A pre-release is published via [crates.io](https://crates.io/) to install it use
cargo install jam-rs
```

-If you want the bleeding edge development release you can install via git:
+If you want the bleeding edge development release you can install it via git:
--singleton Create a separate sketch for each sequence record
@@ -95,9 +96,9 @@ Calculate the distance for one or more inputs vs. a large set of database sketch
```console
$ jam dist
-Calculate distance of a (small) sketch against one or more sketches as database. Requires all sketches to have the same kmer size
+Estimate containment of a (small) sketch against a subset of one or more sketches as database. Requires all sketches to have the same kmer size

-Usage: jam dist [OPTIONS] --input <INPUT> --database <DATABASE>
+Usage: jam dist [OPTIONS] --input <INPUT>
Options:
-i, --input <INPUT> Input sketch or raw file
@@ -106,12 +107,13 @@ Options:
-c, --cutoff <CUTOFF> Cut-off value for similarity [default: 0.0]
-t, --threads <THREADS> Number of threads to use [default: 1]
-f, --force Overwrite output files
+--stats Use the Stats params for restricting results
+--gc-lower <GC_LOWER> Use GC stats with an upper bound of x% (gc_lower and gc_upper must be set)
+--gc-upper <GC_UPPER> Use GC stats with an lower bound of y% (gc_lower and gc_upper must be set)
-h, --help Print help
```
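
The GC options added above restrict results by GC content. As a rough illustration of what such a filter could look like, the following Rust sketch computes the GC percentage of a sequence and checks it against a lower and an upper bound. The helper names `gc_percent` and `within_gc_bounds` are hypothetical, and the sketch assumes that `gc_lower` is the lower bound and `gc_upper` the upper bound, both inclusive and given in percent; how jam-rs applies these values internally is not shown in this diff.

```rust
/// GC content of a sequence as a percentage between 0.0 and 100.0.
/// Upper- and lower-case bases are both counted; an empty sequence
/// is reported as 0.0 (an arbitrary choice for this sketch).
fn gc_percent(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let gc = seq
        .iter()
        .filter(|b| matches!(b.to_ascii_uppercase(), b'G' | b'C'))
        .count();
    100.0 * gc as f64 / seq.len() as f64
}

/// Hypothetical filter: true if the GC percentage of `seq` lies within
/// [gc_lower, gc_upper] (assumed semantics, both bounds inclusive).
fn within_gc_bounds(seq: &[u8], gc_lower: f64, gc_upper: f64) -> bool {
    let gc = gc_percent(seq);
    gc_lower <= gc && gc <= gc_upper
}
```

Under these assumptions, `within_gc_bounds(b"ACGT", 40.0, 60.0)` accepts a sequence with 50% GC, while an all-`A` sequence would be rejected.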

-
-
#### Merge
Merge multiple sketches into one large one.
@@ -138,7 +140,7 @@ This project is licensed under the MIT license. See the [LICENSE](LICENSE) file
### Disclaimer

-jam-rs is still in early active development and not ready for production use. Use at your own risk. Once a stable version is released additional information and installation guidelines will be added.
+jam-rs is still in active development and not ready for production use. Use at your own risk.