- Set up a lot of calculations (perovskite.py)
- Run calculations (run.py)
  a. Fix a few problem calculations by hand (nawo3.py)
- Analyze calculations (analysis.py, plot*.py,…)
- …(scripts4-8, miscellaneous command line work)
- Try to teach a student steps 1-4
- Or, try to repeat steps 1-4 myself…
- Integrating derivation of methods with illustrative code examples
- Writing math in comments is tedious
- Pasting code and results into text is tedious
- Tedious = error-prone
- Or in my case: not likely to happen
- How did I make this figure?
- Where is the script?
- Where is the data?
- How did I make the data?
- Or how do I include the data in this figure from another paper in my current paper?
Data and methods tend to get lost over time as students leave, old computers die, …
- Minimal use of new tools (corollary: maximal use of existing tools)
- If a new tool is needed, it needs to be a long term benefit
- If I have to build a tool, it needs to help my overall skills
- Must be deeply integrated into and improve my workflow
- I like to work in one environment
- I am not likely to break out of workflow to do something
- I dislike switching tools (muscle memory)
- These are reflections of MWODT (my way of doing things)
- Problem 1 (documenting computational workflow)
- Solved if we can do all that work in a single document and keep a record of the results of each step
- Problem 2 (integrating mathematics with code)
- Solved if text, data and code can be easily interspersed, and code can be run and output readily captured
- Problem 3 (how did I make the figure)
- Solved if we can keep everything together and export what we want in the form we need
- The solution is an editor that knows code, data and text and can interact with the system, a markup language that separates code, data and text, and a convenient programming language
- I will present the combination of these that works best for me:
Emacs + org-mode + Python
- Emacs is an extensible editor
- Extensible in Emacs-Lisp, a full programming language
- Users can customize every aspect of the editor
- You can add any functionality you want
- Like a “browser” for text
- Operates in “modes” that provide features
- Every major language has a mode: Python, C/C++, Fortran, Shell, LaTeX, Markdown, reStructuredText, etc…
- Each mode provides editing functions, syntax highlighting, etc…
- Provides complete integration with the operating system
- This enables system commands to be run, and the output captured
Org-mode (http://orgmode.org/)
“Org mode is for keeping notes, maintaining TODO lists, planning projects, and authoring documents with a fast and effective plain-text system.” - founded in 2003. Very active community.
- Org-mode is written in Emacs-Lisp
- Outline mode that enables document organization
- Amazing task management capability
- Lightweight markup language that differentiates text, data and code
- You can embed arbitrary LaTeX, HTML, tables, etc… in it
- Code is executable in the editor, and the results are captured in the editor
- Enables navigable “links” to files, commands, locations, URLs, …
- Export engine that converts selected content to PDF, LaTeX, HTML, ASCII, etc… (e.g. this presentation!)
\attachfile{kitchin-emacs-orgmode-python.org}
ls | sort
import os
files = os.listdir('.')
files.sort()
for f in files:
    print(f)
(mapcar (lambda (arg)
          (princ (format "%s\n" arg)))
        (directory-files "."))
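The three snippets above produce the same sorted directory listing in shell, Python and Emacs-Lisp. As a rough illustration of the "run a command and capture its output" idea, here is a minimal Python sketch that runs a child process and collects what it prints. The function name `capture_listing` is hypothetical, and a child Python interpreter stands in for `ls | sort` so that the sketch does not depend on a particular shell; it is not how Emacs or org-mode does this internally.

```python
import subprocess
import sys

# Program run in the child process: print the sorted entries of a directory.
CHILD_PROGRAM = """
import os, sys
for name in sorted(os.listdir(sys.argv[1])):
    print(name)
"""


def capture_listing(directory="."):
    """Run a child process that lists `directory` and return its output lines.

    Hypothetical helper for illustration only.
    """
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_PROGRAM, directory],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise if the child exits with an error
    )
    return completed.stdout.splitlines()


if __name__ == "__main__":
    for name in capture_listing("."):
        print(name)
```

The captured lines are ordinary Python strings, so they can be stored next to the command that produced them, which is the record-keeping the slides argue for.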
- PYCSE - http://jkitchin.github.io/pycse
- E-book on Python calculations in science and engineering (~300 pages)
- Python blog - http://jkitchin.github.io
- 169 posts, mostly on Python, created and published using org-mode and blogofile
- dft-book - http://jkitchin.github.io/dft-book
- E-book on using Python to drive quantum chemistry to compute material properties (~300 pages)
- Two scientific manuscripts submitted
- “Simulating temperature programmed desorption of oxygen on Pt(111) using DFT derived coverage dependent desorption barriers” to Topics in Catalysis
- “Effects of O_2 and SO_2 on the capture capacity of a primary-amine based polymeric CO_2 sorbent” to Industrial & Engineering Chemistry Research
- Manuscripts and supporting information were generated in Emacs + org-mode, and exported to LaTeX for submission
PYCSE - http://jkitchin.github.io/pycse
- Code is written and executed in the editor. Output captured.
- Exported to blog, HTML and PDF. Mobi and ePub are also possible.
dft-book - http://jkitchin.github.io/dft-book
- 300+ pages on using Python to run quantum chemical calculations
- might be 50+% code!
- Every example written and run in the book
- no cut and paste code/results
- It ran correctly at least once
- Separation of data generation and analysis promotes data reuse
- Easier to read scripts
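One way to picture the separation of data generation and analysis is the minimal Python sketch below. It assumes a JSON file as the hand-off between the two steps; the function names (`generate_data`, `analyze_data`), the file name and the numbers are all hypothetical placeholders, not taken from the books or manuscripts mentioned in this talk.

```python
import json
import os
import tempfile


def generate_data(path):
    """Stand-in for an expensive calculation: store results in a JSON file."""
    # Hypothetical data: energies on a parabola E = (V - 10)^2.
    volumes = [8.0, 9.0, 10.0, 11.0, 12.0]
    data = {"volume": volumes, "energy": [(v - 10.0) ** 2 for v in volumes]}
    with open(path, "w") as f:
        json.dump(data, f)


def analyze_data(path):
    """Read the stored results and return (volume, energy) at the minimum energy."""
    with open(path) as f:
        data = json.load(f)
    energy, volume = min(zip(data["energy"], data["volume"]))
    return volume, energy


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "results.json")
    generate_data(path)        # run once; this is the slow step in real work
    print(analyze_data(path))  # can be re-run or replaced without regenerating
```

Because the analysis only reads the stored file, a different analysis or a new figure can reuse the same data without repeating the calculation.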
- Org-mode is deeply integrated with Emacs
- pro - You get all the power of Emacs
- con - You have to learn Emacs and Emacs-Lisp
- Other editors can mimic the capabilities
- Org-mode is markup and functionality
- reStructuredText + Sphinx is the closest in spirit
- has extensibility (in Python!)
- currently lacks editor integration even in Emacs
- Getting exported format perfect can be challenging
- This is a general problem with converting formats
- I actually prefer reading content in org-mode now
- My students prefer to read HTML/PDF
- Reproducible research needs new tools, new workflows
- Users will probably need to customize tools for their needs
- Emacs + org-mode was a game changer in reproducible research for me. It enabled:
- Authoring two books on using Python in science and engineering
- A Python-based blog
- Scientific manuscripts with thorough documentation of data, methods, etc…
- Documenting computational work
- Managing the work-life of an engineering professor
- The key features that enabled this are
- Extensible editor
- Extensible markup language
- Scripting (Python + others)
Thanks for your attention!
https://github.com/jkitchin/scipy2013
import sys
print(sys.version)
# where the platform independent Python files are installed
print(sys.prefix)
Outline folding, latex rendering, blog post ../../../pycse/pycse.org::6531
Rendered pdf file:../../../pycse/pycse.pdf
275 lines of Emacs-Lisp create blogofile (Python-based static blog framework) posts ../../../.emacs.d/blogofile.el
clickable links ../../manuscripts/01-resubmitted-IER-SO2/IER-SO2.txt::20
Embed data files into document file:~/Dropbox/CMU/manuscripts/01-resubmitted-IER-SO2/supporting-information.org::20
Embed data files, read data from scripts file:~/Dropbox/CMU/manuscripts/01-resubmitted-IER-SO2/supporting-information.org::33
Tables of data inline. Use the data to make a figure. file:~/Dropbox/CMU/manuscripts/01-resubmitted-IER-SO2/supporting-information.org::175
Build the output pdf file:~/Dropbox/CMU/manuscripts/01-resubmitted-IER-SO2/supporting-information.org::455
Resulting pdf ../../manuscripts/01-resubmitted-IER-SO2/re-submitted/supporting-information.pdf
Example of integrated prose/code. Why you want deep integration with editor (menu TODO) file:../../classes/06-640-Molecular-Simulations-Fall-2012/dft-book/dft.org::*Simple estimate of the adsorption energy
file:../../classes/06-640-Molecular-Simulations-Fall-2012/dft-book/dft.pdf
elisp:(org-beamer-export-to-pdf)
file:kitchin-emacs-orgmode-python.pdf