# Non-Software Packages {#non-software}

Most packages contributed by users are software packages. However, other
package types are sometimes submitted. The following sections describe what we
look for in each type of non-software package.

## Annotation and Experiment data packages

Annotation packages are database-like packages that provide information linking
identifiers (e.g., Entrez gene names or Affymetrix probe IDs) to other
information (e.g., chromosomal location, Gene Ontology category).

Experiment data packages provide data sets that are used, often by software
packages, to illustrate particular analyses. These packages contain curated
data from an experiment, teaching course, or publication and in most cases
contain a single data set.

The requirements are similar to those for software packages; the most important
is proper documentation of the data included in the package. Traditional
Annotation and Experiment data packages are not ideal; packages that use the
AnnotationHub and ExperimentHub interfaces are preferred.

## Annotation/Experiment Hub packages

These lightweight packages are related to resources added to
`r BiocStyle::Biocpkg("AnnotationHub")` or
`r BiocStyle::Biocpkg("ExperimentHub")`. The package should minimally contain
the resource metadata, man pages describing the resources, and a vignette. It
may also contain supporting R functions the author wants to provide. These
packages are similar to the above Annotation or Experiment data packages except
that the data is stored in _Bioconductor_-provided Microsoft [Azure Data Lakes][]
or on publicly accessible sites (Ensembl, AWS S3 buckets, etc.) instead of in the
package itself.

There is more information about creating a hub package, as well as the contents
of one, in the [Create A Hub Package][] vignette within the
`r BiocStyle::Biocpkg("HubPub")` Bioconductor package.

## Workflow packages

Workflow packages contain vignettes that describe a bioinformatics workflow
that involves multiple Bioconductor packages. These vignettes are usually more
extensive than the vignettes that accompany software packages. These packages do
not need `man/`, `R/`, or `data/` directories, as ideally
workflows make use of existing data in a Bioconductor package.

[Existing Workflows][workflow-views]

Workflow vignettes may deal with larger data sets and/or be more computationally
intensive than typical Bioconductor package vignettes. For this reason, the
automated builder that produces these vignettes does not have a time limit (in
contrast to the Bioconductor package building system, which will time out if
package building takes too long). It is expected that the majority of vignette
code chunks are evaluated.

### How do I write and submit a workflow vignette? {#submit-workflow}

* Write a package with the same name as the workflow. The workflow vignette,
  written in Markdown using the [rmarkdown][] package, should be included in
  the vignette directory. You may include more than one vignette, but please
  give each a useful, identifying name.
* The package does not need `man/`, `R/`, or `data/` directories, as
  ideally workflows make use of existing data in a Bioconductor repository or on
  the web; the workflow package itself should not contain large data files.
* In the DESCRIPTION file, include the line `BiocType: Workflow`. Please also
  include a detailed Description field in the DESCRIPTION file. The DESCRIPTION
  file should contain biocViews terms, which should be from the [Workflow
  branch][workflow-views]. If you think a new term is relevant, please reach out to
  <lori.shepherd@roswellpark.org>.
* Submit the package to the [GitHub submission tracker][tracker] for a formal
  review. Please also indicate in the tracker issue that this package is a
  workflow.
* Workflows are git version controlled. Once the package is accepted, it will be
  added to our git repository at git@git.bioconductor.org and instructions will
  be sent for gaining access for maintenance.
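
As a rough illustration of the DESCRIPTION requirements listed above, a workflow package's file might look something like the following sketch. The package name, title, author, license, and description text are hypothetical placeholders. Only the `BiocType: Workflow` line and the use of terms from the Workflow branch of biocViews come from the guidance above. The two biocViews terms shown are assumed to be current, so check the [Workflow branch][workflow-views] before using them.

```
Package: exampleWorkflow
Title: A Hypothetical Example Workflow (Placeholder Title)
Version: 0.99.0
Authors@R: person("First", "Last", email = "first.last@example.org",
    role = c("aut", "cre"))
Description: Placeholder text. A real workflow should describe in detail
    what the workflow demonstrates, which Bioconductor packages it
    combines, and which existing data resources it uses.
License: Artistic-2.0
Suggests: knitr, rmarkdown, BiocStyle
VignetteBuilder: knitr
BiocType: Workflow
biocViews: Workflow, GeneExpressionWorkflow
```

The `Package` field here matches the intended workflow name, and the `Suggests` and `VignetteBuilder` fields reflect the usual setup for an rmarkdown vignette styled with BiocStyle.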

### Consistent formatting

* In an effort to standardize the workflow vignette format, it is strongly
  encouraged to use either `r BiocStyle::Biocpkg("BiocStyle")` for formatting or
  utilize `r BiocStyle::Biocpkg("BiocWorkflowTools")`. The following header
  shows how to use `r BiocStyle::Biocpkg("BiocStyle")` in the vignette:

  ```
  output:
    BiocStyle::html_document
  ```

* The following should also be included:
    - author affiliations
    - a date representing when the workflow vignette was last modified

* The first section should have some versioning information. The R version,
  Bioconductor version, and package version should be visible.
  The following is an example of how this could be achieved:

<pre>
<code>
<p>
**R version**: `r R.version.string`
<br />
**Bioconductor version**: `r BiocManager::version()`
<br />
**Package version**: `r packageVersion("annotation")`
</p>
</code>
</pre>
* An example start to a workflow vignette, taken from the header of the
  variants workflow package:
<pre>
<code>
---
title: Annotating Genomic Variants
author:
- name: Valerie Obenchain
  affiliation: Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., P.O. Box 19024, Seattle, WA, USA 98109-1024
date: 11 April 2018
vignette: >
  %\VignetteIndexEntry{Annotating Genomic Variants}
  %\VignetteEngine{knitr::rmarkdown}
output:
  BiocStyle::html_document
---
# Version Info
```{r, echo=FALSE, results="hide", warning=FALSE}
suppressPackageStartupMessages({library('variants')})
```
<p>
**R version**: `r R.version.string`
<br />
**Bioconductor version**: `r BiocManager::version()`
<br />
**Package version**: `r packageVersion("variants")`
</p>
</code>
</pre>

### Tidying package loading output

Most workflows load a number of packages, and you do not want
the output of loading those packages to clutter your workflow
document. Here's how you would solve this in Markdown; you can
do something similar in LaTeX.

First, set up a code chunk that is evaluated but not echoed, and whose
results are hidden. We also set `warning=FALSE` to be sure that
no output from this chunk ends up in the document:

`r ''````{r, echo=FALSE, results="hide", warning=FALSE}
suppressPackageStartupMessages({
    library(GenomicRanges)
    library(GenomicAlignments)
    library(Biostrings)
    library(Rsamtools)
    library(ShortRead)
    library(BiocParallel)
    library(rtracklayer)
    library(VariantAnnotation)
    library(AnnotationHub)
    library(BSgenome.Hsapiens.UCSC.hg19)
    library(RNAseqData.HNRNPC.bam.chr14)
})
```
Then you can set up another code chunk that *is* echoed,
which has almost the same contents. The second invocation of `library()`
will not produce any output since the package has already been loaded:
`r ''````{r}
library(GenomicRanges)
library(GenomicAlignments)
library(Biostrings)
library(Rsamtools)
library(ShortRead)
library(BiocParallel)
library(rtracklayer)
library(VariantAnnotation)
library(AnnotationHub)
library(BSgenome.Hsapiens.UCSC.hg19)
library(RNAseqData.HNRNPC.bam.chr14)
```

### Citations

To manage citations in your workflow document,
specify the bibliography file in the document metadata header:

    bibliography: references.bib

You can then use citation keys in the form of `[@label]` to cite an entry with the identifier "label".
Normally, you will want to end your document with a section header "References" or similar, after which the bibliography will be appended.
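
As a minimal sketch of how these pieces fit together, a `references.bib` file stored next to the vignette might contain an entry like the one below. The entry is a made-up placeholder, not a real publication; its key, `label`, is what the `[@label]` citation refers to.

```
% Hypothetical placeholder entry, for illustration only
@article{label,
  author  = {Author, A. and Author, B.},
  title   = {A placeholder title for illustration},
  journal = {Placeholder Journal},
  year    = {2018}
}
```

A vignette using it could then look roughly like this. The title and the sentence containing the citation are likewise illustrative assumptions.

```
---
title: Example Workflow            # hypothetical title
bibliography: references.bib       # file containing the entry above
output:
  BiocStyle::html_document
---

The variants were annotated as described previously [@label].

# References
```

Leaving the final "References" section empty lets the generated bibliography be appended beneath it.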
For more details see the [rmarkdown documentation][].