Skip to content

Files

Latest commit

 

History

History
347 lines (297 loc) · 10.6 KB

kitchin-emacs-orgmode-python.org

File metadata and controls

347 lines (297 loc) · 10.6 KB

Emacs + org-mode + python in reproducible research

Problem to solve #1

Computational research workflow

  1. Setup a lot of calculations (perovskite.py)
  2. Run calculations (run.py) a. Fix a few problem calculations by hand (nawo3.py)
  3. Analyze calculations (analysis.py, plot*.py,…)
  4. …(scripts4-8, miscellaneous command line work)
  5. Try to teach student steps 1-4
  6. Or, try to repeat steps 1-4 myself…

Lots of scripts

./ls.png

Problem to solve #2

  • Integrating derivation of methods with illustrative code examples
    • Writing math in comments is tedious
    • Pasting code and results into text is tedious
    • Tedious = error-prone
    • Or in my case: not likely to happen

./blog.png

Problem to solve #3

The issue

  • How did I make this figure?
    • Where is the script?
    • Where is the data?
    • How did I make the data?
  • Or how do I include the data in this figure from another paper in my current paper?

Data and methods tend to get lost over time as students leave, old computers die, …

A figure from a manuscript

./fig8.png

Desired features in a solution

  1. Minimal use of new tools (corollary: maximal use of existing tools)
    • If a new tool is needed, it needs to be a long term benefit
    • If I have to build a tool, it needs to help my overall skills
  2. Must be deeply integrated into and improve my workflow
    • I like to work in one environment
    • I am not likely to break out of workflow to do something
    • I dislike switching tools (muscle memory)
  3. These are reflections of MWODT (my way of doing things)

These problems have related solutions

  • Problem 1 (documenting computational workflow)
    • Solved if we can do all that work in single document and keep a record of the results of each step
  • Problem 2 (integrating mathematics with code)
    • Solved if text, data and code can be easily interspersed, and code can be run and output readily captured
  • Problem 3 (how did I make the figure)
    • Solved if we can keep everything together and export what we want in the form we need
  • The solution is an editor that can knows code, data and text and can interact with the system, a markup language that separates code, data and text, and a convenient programming language
  • I will present the combination of these that works best for me:

Emacs + org-mode + Python

Emacs in a nutshell

  • Emacs is an extensible editor
    • Extensible in Emacs-Lisp, a full programming language
      • Users can customize every aspect of the editor
      • You can add any functionality you want
      • Like a “browser” for text
    • Operates in “modes” that provide features
      • Every major language has a mode: Python, C/C++, Fortran, Shell, LaTeX, markdown, restructured Text, etc…
      • provides editing functions, syntax highlighting, etc…
    • Provides complete integration with the operating system
      • This enables system commands to be run, and the output captured

Org mode is for keeping notes, maintaining TODO lists, planning projects, and authoring documents with a fast and effective plain-text system.” - founded in 2003. Very active community.

  • Org-mode is written in Emacs-Lisp
  • Outline mode that enables document organization
  • Amazing task management capability
  • Lightweight markup language that differentiates text, data and code
  • You can embed arbitrary LaTeX, HTML, tables, etc… in it
  • Code is executable in the editor, and the results are captured in the editor
  • Enables navigatable “links” to files, commands, locations, urls, ....
  • Export engine that converts selected content to PDF, LaTeX, HTML, ascii, etc… (e.g. this presentation!) \attachfile{kitchin-emacs-orgmode-python.org}

Example - shell scripts

ls | sort

Example with python code

import os
files =  os.listdir('.')
files.sort()
for f in files: print f

Example with emacs-lisp

(mapcar (lambda (arg) 
	  (princ (format "%s\n" arg)))
	(directory-files "."))

Emacs + org-mode projects

  • PYCSE - http://jkitchin.github.io/pycse
    • E-book on python calculations in science and engineering (~300 pages)
  • Python blog - http://jkitchin.github.io
    • 169 posts on mostly python, created and published using org-mode and blogofile
  • dft-book - http://jkitchin.github.io/dft-book
    • E-book on using python to drive quantum chemistry to compute material properties (~300 pages)
  • Two scientific manuscripts submitted
    • “Simulating temperature programmed desorption of oxygen on Pt(111) using DFT derived coverage dependent desorption barriers” to Topics in Catalysis
    • “Effects of O_2 and SO_2 on the capture capacity of a primary-amine based polymeric CO_2 sorbent” to Industrial & Engineering Chemistry Research
    • Manuscripts and supporting information were generated in Emacs + org-mode, and exported to LaTeX for submission

Document overview

./pycse-1.png

  • Code is written and executed in the editor. Output captured.
  • Exported to blog, HTML and PDF. Mobi and ePub are also possible.

A subsection of the document

./pycse-2.png

Embedded text, math, code and output.

  • 300+ pages of using python to run quantum chemical calculations
  • might be 50+% code!
  • Every example written and run in the book
    • no cut and paste code/results
    • It ran correctly at least once

./dft-book-1.png

Org-mode in documenting computational/research workflow

  • Separation of data generation and analysis promotes data reuse
  • Easier to read scripts

./fe-ni-al.png

Do some demos

Challenges

  • Org-mode is deeply integrated with Emacs
    • pro - You get all the power of Emacs
    • on the other hand - You have to learn Emacs and Emacs-Lisp
    • Other editors can mimic the capabilities
  • Org-mode is markup and functionality
    • restructured text + Sphinx is the closest in spirit
    • has extensibility (in Python!)
    • currently lacks editor integration even in Emacs
  • Getting exported format perfect can be challenging
    • This is a general problem with converting formats
    • I actually prefer reading content in org-mode now
    • My students prefer to read HTML/pdf

Conclusions

  • Reproducible research needs new tools, new workflows
    • Users will probably need to customize tools for their needs
  • Emacs + org-mode was a game changer in reproducible research for me. It enabled:
    • Authoring two books on using python in science and engineering
    • A python based blog
    • Scientific manuscripts with thorough documentation of data, methods, etc…
    • Documenting computational work
    • Managing the work-life of an engineering professor
  • The key features that enabled this are
    • Extensible editor
    • Extensible markup language
    • Scripting (Python + others)

Thanks for your attention!

https://github.com/jkitchin/scipy2013

Links to examples

import sys

print sys.version

# where the platform independent Python files are installed
print sys.prefix

pycse

Outline folding, latex rendering, blog post ../../../pycse/pycse.org::6531

Rendered pdf file:../../../pycse/pycse.pdf

Blog lisp

275 lines of emacs-lisp creates blogofile (python-based static blog framework) posts ../../../.emacs.d/blogofile.el

http://jkitchin.github.io

Manuscript example

clickable links ../../manuscripts/01-resubmitted-IER-SO2/IER-SO2.txt::20

Embed data files into document file:~/Dropbox/CMU/manuscripts/01-resubmitted-IER-SO2/supporting-information.org::20

Embed data files, read data from scripts file:~/Dropbox/CMU/manuscripts/01-resubmitted-IER-SO2/supporting-information.org::33

Tables of data inline. Use the data to make a figure. file:~/Dropbox/CMU/manuscripts/01-resubmitted-IER-SO2/supporting-information.org::175

Build the output pdf file:~/Dropbox/CMU/manuscripts/01-resubmitted-IER-SO2/supporting-information.org::455

Resulting pdf ../../manuscripts/01-resubmitted-IER-SO2/re-submitted/supporting-information.pdf

dft-book

Example of integrated prose/code. Why you want deep integration with editor (menu TODO) file:../../classes/06-640-Molecular-Simulations-Fall-2012/dft-book/dft.org::*Simple estimate of the adsorption energy

file:../../classes/06-640-Molecular-Simulations-Fall-2012/dft-book/dft.pdf

build

elisp:(org-beamer-export-to-pdf)

file:kitchin-emacs-orgmode-python.pdf