Merge pull request InsightSoftwareConsortium#4336 from thewtex/data-archive-5.4

Data archive docs for 5.4
thewtex authored Nov 20, 2023
2 parents 273acdc + 2387097 commit b687781
Showing 2 changed files with 111 additions and 39 deletions.
136 changes: 108 additions & 28 deletions Documentation/Maintenance/Release.md
@@ -202,55 +202,125 @@ Commit the result:
Archive ExternalData
--------------------

More background on the testing data can be found in the
[Contributing Upload Binary Data](../docs/contributing/upload_binary_data.md) documentation.

The following steps archive data for release on various resources. [datalad],
the [@web3-storage/w3] client, and the [kubo] `ipfs` CLI should all be
installed locally. It is recommended to install and run [ipfs-desktop] and to
symlink the `ipfs` CLI it ships into your `PATH`.
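A minimal install sketch, with assumptions: datalad is installed from pip (it
also requires git-annex), the w3 CLI comes from npm, and the symlink source
path is a placeholder that varies by platform and ipfs-desktop version.

```sh
# Install the datalad CLI (git-annex must also be available).
python -m pip install datalad
# Install the w3 CLI globally with npm.
npm install --global @web3-storage/w3
# Symlink the ipfs CLI bundled with ipfs-desktop into your PATH.
# The source path below is an example; locate the binary your install ships.
ln -s /path/to/ipfs-desktop/ipfs ~/.local/bin/ipfs
```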

### Fetch the latest ITKData datalad repository

Clone the ITKData datalad repository, if not already available.

```sh
cd ~/data/
datalad clone https://gin.g-node.org/InsightSoftwareConsortium/ITKData.git
cd ITKData
```

Make sure the datalad repository is up-to-date.

```sh
datalad update -r .
datalad get .
```
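Optionally, sanity-check the annexed content after fetching; a sketch (this
can take a while on the full dataset):

```sh
# --fast only verifies that annexed content is present and correctly
# linked; it skips full checksum verification.
git annex fsck --fast
```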

### Fetch new data locally

Check out the tag which we are archiving.

```sh
cd ~/src/ITK
git checkout <version>
```
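To double-check that the working tree is exactly at the release tag, something
like the following can be used:

```sh
# Prints the tag name if HEAD is exactly at a tag; errors otherwise.
git describe --tags --exact-match
```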

Then fetch new data into the datalad repository.

```sh
cd ~/data/ITKData
./ContentLinkSynchronization.sh --create ~/src/ITK
```
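Before uploading, it is worth reviewing what the synchronization brought in; a
quick sketch, assuming the `Objects/CID` archive layout used by the rsync
steps below:

```sh
# New or modified files tracked by datalad.
datalad status
# Sample the content-addressed objects that will be published.
ls Objects/CID | head
```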

Upload the tree to archival storage with:

```sh
w3 put . --no-wrap -n ITKData-pre-verify -H
```
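`w3 put` prints the root CID of the uploaded tree, which the next step needs.
A sketch that wraps the same upload command to capture it, assuming the CID is
the last line of output (verify against your w3 version before relying on
this):

```sh
ROOT_CID=$(w3 put . --no-wrap -n ITKData-pre-verify -H | tail -n1)
echo "$ROOT_CID"
```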

Verify, and if needed update, the CIDs in the ITK repository with the root CID
output from the previous step.

```sh
./ContentLinkSynchronization.sh --root-cid bafy<rest-of-the-cid> ~/src/ITK
datalad status
```
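If the script updated any content links, the changes appear as edits to `.cid`
files in the ITK source tree and can be reviewed there:

```sh
# In the ITK source tree, summarize content link updates.
cd ~/src/ITK && git diff --stat -- '*.cid'
```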

If there is new content, commit it with:

```sh
datalad save -m "ENH: Updates for ITK-v<itk-release-version>"
```

Upload the repository update to web3.storage:

```sh
w3 put . --no-wrap -n ITKData-v<itk-release-version> -H
```

Edit the *README.md* file with the new root CID, then commit and push:

```sh
datalad save -m "DOC: Update root CID for ITK-v<itk-release-version>"
datalad push
```

### Pin the CID locally and on Pinata

If the [pinata] pinning service is not already configured, add it:

```sh
ipfs pin remote service add pinata https://api.pinata.cloud/psa/ PINATA_JWT
```
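To confirm the service was registered (`PINATA_JWT` above stands in for the
secret API token from your Pinata account):

```sh
# Lists configured remote pinning services; pinata should appear.
ipfs pin remote service ls
```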

Then pin the root CID locally and on Pinata:

```sh
ipfs pin add /ipfs/bafy<rest-of-cid>
ipfs pin remote add --service=pinata --name=ITKData-ITK-v<itk-release-version> /ipfs/bafy<rest-of-cid>
```
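Remote pinning is asynchronous, so the pin may take a while to complete; its
progress can be checked with:

```sh
# Show pins on the pinata service in any state.
ipfs pin remote ls --service=pinata --status=queued,pinning,pinned,failed
```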

### Pin the CID on Kitware's ipfs server

Optionally, pin to Kitware's ipfs server:

```sh
ssh ipfs
export IPFS_PATH=/data/ipfs
ipfs pin add --progress /ipfs/bafy<rest-of-cid>
```

### Rsync the data to Kitware's Apache Server

Optionally, rsync the objects to Kitware's Apache server:

```sh
rsync -vrtL ./Objects/CID kitware@web:ITKExternalData/
```
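A dry run first shows what would be transferred without copying anything; the
`-L` flag dereferences the annex symlinks so real file content is sent:

```sh
# Preview the transfer; drop --dry-run to perform it.
rsync -vrtL --dry-run ./Objects/CID kitware@web:ITKExternalData/
```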

### Push the data to GitHub Pages

Push the data to the [ITKTestingData] `gh-pages` branch. GitHub restricts the
size of individual files, so the sync below skips objects larger than 45 MB.

```sh
rsync -vrtL --max-size=45m ./Objects/CID ~/data/ITKTestingData/
cd ~/data/ITKTestingData
git add .
git commit -m "ENH: Updates for ITK <version>"
git push
```

Tag the ITK repository
----------------------
@@ -358,6 +428,9 @@ endings.

The `InsightData` tarballs are generated along with the source code tarballs.

Data is fetched from [IPFS]. An IPFS daemon must be running to fetch the data;
[ipfs-desktop] is recommended.
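A quick way to check that a daemon is reachable before generating the
tarballs; a sketch:

```sh
# Fails if no local IPFS daemon is running.
ipfs swarm peers > /dev/null && echo "IPFS daemon is running"
```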

Once the repository has been tagged, we use the following script in the
repository to create the tarballs:

@@ -869,21 +942,28 @@ excellent packaging.
[Kitware blog]: https://blog.kitware.com/
[blog post]: https://blog.kitware.com/itk-packages-in-linux-distributions/
[Dashboard]: https://open.cdash.org/index.php?project=Insight
[datalad]: https://www.datalad.org/
[community]: https://discourse.itk.org/
[documentation page]: https://www.itk.org/ITK/help/documentation.html
[download page]: https://itk.org/ITK/resources/software.html
[GitHub]: https://github.com/InsightSoftwareConsortium/ITK
[IPFS]: https://ipfs.tech/
[ipfs-desktop]: https://github.com/ipfs/ipfs-desktop/releases
[ITKPythonPackage]: https://itkpythonpackage.readthedocs.io/en/latest/index.html
[ITKTestingData]: https://github.com/InsightSoftwareConsortium/ITKTestingData
[ITK discussion]: https://discourse.itk.org/
[Image.sc Forum]: https://image.sc
[ITK Open Collective page]: https://opencollective.org/itk
[ITK issue tracking]: https://issues.itk.org/
[ITK Software Guide]: https://itk.org/ItkSoftwareGuide.pdf
[ITK wiki]: https://itk.org/Wiki/ITK
[ITK Sphinx examples]: https://itk.org/ITKExamples/
[kubo]: https://github.com/ipfs/kubo
[pinata]: https://pinata.cloud
[releases page]: https://itk.org/Wiki/ITK/Releases
[release schedule]: https://itk.org/Wiki/ITK/Release_Schedule
[Software Guide]: https://itk.org/ItkSoftwareGuide.pdf
[@web3-storage/w3]: https://www.npmjs.com/package/@web3-storage/w3

[kitware]: https://www.kitware.com/
[public.kitware.com]: https://public.kitware.com
14 changes: 3 additions & 11 deletions Utilities/Maintenance/SourceTarball.bash
@@ -36,27 +36,20 @@ return_pipe_status() {

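# List the unique CID content links (.cid files) referenced by the given tree.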
find_data_objects() {
  git ls-tree --full-tree -r "$1" |
  egrep '\.(cid)$' |
  while read mode type obj path; do
    case "$path" in
      *.cid) echo CID/$(git cat-file blob $obj) ;;
      *)     die "Unknown ExternalData content link: $path" ;;
    esac
  done | sort | uniq
  return_pipe_status
}

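# Fetch a single data object from the local IPFS gateway by its CID.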
download_object() {
  algo="$1" ; hash="$2" ; path="$3"
  mkdir -p $(dirname "$path") &&
  if curl -L "http://127.0.0.1:8080/ipfs/$hash" -o "$path.tmp$$" 1>&2; then
    mv "$path.tmp$$" "$path"
  else
    rm -f "$path.tmp$$"
@@ -78,7 +71,6 @@ index_data_objects() {
      download_object "$algo" "$hash" "$path" &&
      file="$path"
    fi &&
    obj=$(git hash-object -t blob -w "$file") &&
    echo "100644 blob $obj $path" ||
    return
