- python
- Creating a Virtual Environment with Python
- IDE
- Git version control
- nbconvert jupyter core package
- jq much faster
- To display error when filter not configured
- 1. Create a check-nb-filter.sh script in root that checks if a notebook filter exists, here nbstrip_full
- 2. Execute this in a notebook cell to add an additional check-nb-filter filter to git config which invokes above script whenever a notebook is staged
- 3. Set .gitattributes to run check-nb-filter for notebook files
# create virtual env
python -m venv .venv # name can be the folder name
# install dep
pip install <dep>
# store requirements
pip freeze > requirements.txt
# store python version
python --version > python_version.txt
# git ignore the virtual env folder and notebook artifacts
curl -o .gitignore https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore
# install deps to reproduce env
pip install -r requirements.txt
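python_version.txt only helps if it is compared against the interpreter when the env is rebuilt. A minimal sketch of such a check, assuming the file holds the output of python --version (for example "Python 3.11.4"). The helper names parse_recorded_version and same_minor_version are hypothetical, and treating a matching major.minor as "compatible" is an assumption, not a rule from these notes:

```python
import platform


def parse_recorded_version(text: str) -> tuple:
    """Parse 'Python 3.11.4' (or a bare '3.11.4') into a tuple of ints."""
    version = text.strip().split()[-1]
    # Keep only purely numeric parts, so suffixes like '0rc1' are dropped.
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def same_minor_version(recorded: str, running: str = "") -> bool:
    """Return True if recorded and running versions share major.minor."""
    # Assumption: patch-level differences are acceptable for these notes.
    running = running or platform.python_version()
    return parse_recorded_version(recorded)[:2] == parse_recorded_version(running)[:2]
```

Typical use would be to read python_version.txt with pathlib.Path.read_text() and pass the text to same_minor_version before running pip install -r requirements.txt.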
JupyterLab is a newer notebook IDE that improves on the classic Jupyter Notebook. Like VS Code, it supports extensions that add features such as:
- Vim
- Git version control
pip install jupyterlab
jupyter lab
Cons:
- Must be pip installed again in every virtual env
- Alternatively, it can be installed at user level or in a conda env, but a pip install is still needed for the Python kernel
VS Code is as good as JupyterLab. Advantages:
- No IDE-specific pip install
- Notebook diffs hide metadata and cell outputs with the following settings:
"notebook.diff.ignoreMetadata": true,
"notebook.diff.ignoreOutputs": true
- This clears the cell outputs before a notebook is staged
- Useful to ensure diffs show only changes in cells
See https://timstaley.co.uk/posts/making-git-and-jupyter-notebooks-play-nice/
This local git filter depends on nbconvert, which comes with the Jupyter core packages (run jupyter --version to check).
git config filter.jupyternotebook.clean "jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR"
git config filter.jupyternotebook.required true
echo "*.ipynb filter=jupyternotebook" >> .gitattributes
nbconvert is slow, so an alternative is to edit the notebook JSON directly with jq (or sed). One caveat: the result is not guaranteed to conform to the Jupyter notebook spec, but the filter is much faster. jq can be installed in a conda env.
git config filter.nbstrip_full.clean "jq --indent 1 \
'(.cells[] | select(has(\"outputs\")) | .outputs) = [] \
| (.cells[] | select(has(\"execution_count\")) | .execution_count) = null \
| .metadata = {\"language_info\": {\"name\": \"python\", \"pygments_lexer\": \"ipython3\"}} \
| .cells[].metadata = {} \
'"
git config filter.nbstrip_full.smudge cat
git config filter.nbstrip_full.required true
echo "*.ipynb filter=nbstrip_full" >> .gitattributes
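For readers who do not know jq, the filter above can be read as the following Python sketch. It is an illustration of what the jq expression does, not a replacement used in these notes; the function names strip_notebook and clean_text are hypothetical. One difference is labeled in the code: jq fails if .cells is missing, while this sketch tolerates it.

```python
import json


def strip_notebook(nb: dict) -> dict:
    """Mirror the jq filter: empty outputs, null counts, reset metadata."""
    # Assumption: a notebook without "cells" is passed through (jq would error).
    for cell in nb.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None
        cell["metadata"] = {}
    # Replace notebook-level metadata with the same minimal block as the jq filter.
    nb["metadata"] = {
        "language_info": {"name": "python", "pygments_lexer": "ipython3"}
    }
    return nb


def clean_text(text: str) -> str:
    """Take notebook JSON as text and return the stripped JSON, indented by 1."""
    cleaned = strip_notebook(json.loads(text))
    return json.dumps(cleaned, indent=1, ensure_ascii=False) + "\n"
```

A clean filter reads the file on stdin and writes the result to stdout, so a script built on this would pass sys.stdin.read() to clean_text and write the return value to sys.stdout. Whitespace in the output may differ slightly from what jq --indent 1 produces.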
git config filter.nbstrip_full.clean 'jq --indent 1 \
"(.cells[] | select(has(\"outputs\")) | .outputs) = [] \
| (.cells[] | select(has(\"execution_count\")) | .execution_count) = null \
| .metadata = {\"language_info\": {\"name\": \"python\", \"pygments_lexer\": \"ipython3\"}} \
| .cells[].metadata = {} \
"'
1. Create a check-nb-filter.sh script in root that checks if a notebook filter exists, here nbstrip_full
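#!/bin/bash
# check-nb-filter.sh
# Run the clean command of the filter named in $1, or fail if it is not configured.
clean_cmd=$(git config --get "filter.$1.clean")
if [ -z "$clean_cmd" ]; then
    echo "Error: Filter '$1' is not defined" >&2
    exit 1
fi
# Hand stdin/stdout to the real filter so the notebook content passes through it.
exec sh -c "$clean_cmd"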
2. Execute this in a notebook cell to add an additional check-nb-filter filter to git config which invokes above script whenever a notebook is staged
!git config filter.check-nb-filter.clean "./check-nb-filter.sh nbstrip_full"
# smudge must pass the file content through unchanged, like filter.nbstrip_full.smudge
!git config filter.check-nb-filter.smudge cat
Note: If the notebook filter does not exist, the check-nb-filter filter can only log the error; it cannot prevent the changes from being staged.
3. Set .gitattributes to run check-nb-filter for notebook files
!echo "*.ipynb filter=check-nb-filter" > .gitattributes
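Several commands in these notes write a "*.ipynb filter=..." line to .gitattributes, and when more than one line sets the same attribute for a matching path, git uses the last one. A small sketch to see which filter ends up in effect; effective_filter is a hypothetical helper, and it compares the pattern as an exact string rather than doing git's real glob matching, which is a deliberate simplification:

```python
def effective_filter(gitattributes_text: str, pattern: str = "*.ipynb"):
    """Return the filter name the last line for `pattern` assigns, or None."""
    result = None
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        # Simplification: exact pattern comparison, no glob matching.
        if parts[0] != pattern:
            continue
        for attr in parts[1:]:
            if attr.startswith("filter="):
                result = attr[len("filter="):]
            elif attr in ("-filter", "!filter"):
                result = None  # filter explicitly unset or unspecified
    return result
```

Passing the contents of .gitattributes after running the commands for both jupyternotebook and nbstrip_full would return nbstrip_full, since its line is appended later.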