http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=16252
- JV
- Motivation
- Software needed everywhere
- Goals
- How to engineer academic software (AS) of quality?
- Roadmap
- Recognition
- How to get software recognized as an academic contribution
- Best practices
- Balance promises with skepticism
- Output
- Manifesto
- Call to arms!
- How to organize prizes
- How to write proposals, letters of recommendation
- Rules, best practices
- Mythbusters
- SWOT Analysis
- JH
- Drafts of things to share?
- E.g., resources, version control
- Claude Kirchner
- Design & implementation of robust, reliable systems
- Using Clu, Lisp, C ...
- Scientific director INRIA 2010-2014
- Rob van Nieuwpoort
- Cloud computing, compilers, ...
- Making software part of the infrastructure
- How to measure the impact of software?
- Papers
- Producing other software
- Mike Croucher
- Understanding importance of software
- Emergency repair
- Getting software in shape for the masses
- How to engineer organizations to value software
- Carole Goble
- Infrastructure for life sciences
- Multi-institution, multi community software development
- Want software to get credit!
- How to get reputation
- Alice Allen
- Editor of the Astrophysics Source Code Library (ASCL)
- Register all available software
- Support reproducibility
- Curate the repo
- Want to help contributors get support
- Katy Huff
- Physics software (?)
- Produces a lot of code
- Software Carpentry ...
- Oscar Nierstrasz
- Software Evolution
- How to get industry involved in academic software?
- Cecilia Aragon
- Visual analytics
- Human centered data science
- How to get software development effort recognized?
- Contribute to career path
- Christoph Becker
- Management of digital resources for future use
- Software sustainability
- Software quality and curation
- Long-term effects often not considered when software is built
- Kevin Crowston
- How technologies change the way people work
- E.g., open source development
- Citizen science
- How to build sustainable communities around open source software?
- Andrei Chis
- Improving how developers create software
- Moldable development tools
- Domain aware
- Change tools to become moldable and extensible
- Software and applications are essential to research
- Many community challenges!
- Make sure work gets recognized and rewarded
- How to get academic tools adopted in industry
- Having users is good and bad!
- JV: old Chinese curse: "I wish you many users"
- Daniel Garijo
- Scientific workflow
- Benoit Combemale
- Domain specific languages
- Model driven development ...
- Develop lots of software for real world case studies
- Software to assess software impact
- James Howison
- Focus on collaboration
- Development of software in science
- Dan Katz
- Computational scientist
- Looking at how scientific software is developed
- Matt Vaughn
- Cyber infrastructure
- Democratizing access
- Creating environments for distributed, heterogeneous development
- Industry standards, ...
- Sharing digital objects
- How to engage participation?
- Matt Turk
- Ensuring (astronomy etc) software is available to community at large
- Katy Kuksenok
- Cognitive resources
- How are these resources shared and managed?
- Jeff Carver
- SE for Science
- Empirical SE
- Focus on human side
- How do people develop software?
- Ralf Laemmel
- Software chrestomathy
- Megamodels
- Models about models
- Documenting software development
- Rob Haines
- Software sustainability
- Support researchers in developing academic software
- Complete gamut of researchers and skills
- Caroline Jay
- Human computer interaction
- Needs lots of software, even to gather data
- Much of the software is simply not available
- Jurgen Vinju
- I'm a programmer that somebody made into a professor.
- Dan Katz — Sustainable Software for Science
- WSSSPE
http://wssspe.researchcomputing.org.uk/
- Three workshops so far
- Reports online
- Hot topics
- White paper/journal paper about best practices in developing sustainable software
- Funding Research Programmer Expertise
- Software citation — Software Credit Working Group
- Mike Croucher — Supporting Research Software Engineering
- Emergency services for software
- Faster code
- E.g., avoiding expensive operations in loops
- Using vectors instead of loops
- First use profilers and then look for anti patterns
- Often order of magnitude speedup
- Migrating to clusters
- GPU Computing
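
The "avoid expensive operations in loops" point can be illustrated with a minimal sketch. It is not code from the talk: the function names and the normalization task are hypothetical, chosen only to show a loop-invariant computation being hoisted out of a loop. A profiler would typically flag the repeated `sum` as the hot spot, and the same idea carries over to replacing loops with vector operations in an array library.

```python
import math
import timeit


def normalize_slow(values):
    """Anti-pattern: the norm does not change, yet it is recomputed per element."""
    result = []
    for v in values:
        norm = math.sqrt(sum(x * x for x in values))  # loop-invariant, O(n) each time
        result.append(v / norm)
    return result


def normalize_fast(values):
    """Same result, with the loop-invariant norm computed once up front."""
    norm = math.sqrt(sum(x * x for x in values))
    return [v / norm for v in values]


if __name__ == "__main__":
    data = [float(i) for i in range(1, 501)]
    # Both versions perform identical arithmetic per element, so results match.
    assert normalize_slow(data) == normalize_fast(data)
    for fn in (normalize_slow, normalize_fast):
        seconds = timeit.timeit(lambda: fn(data), number=20)
        print(f"{fn.__name__}: {seconds:.4f} s")
```

The slow version does quadratic work and the fast one linear work, so the gap widens with input size; the actual speedup depends on the machine and the data.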
- Users "don't care" about clean engineering
- How do you know users can still work with the code?
- Small steps, back and forth
- CA: we get data scientists to sit with users several times a week
- Users are afraid I'm going to "do computer science" to them.
- Problem
- Software is not valued in academia
- CA: We have to stop calling software "infrastructure".
- The future
- Core funded software support staff
- Faculty tenure based on software output
- The first RSE professor?
- Please help
- First contact
- Demonstrating impact
- New technology
- Good practice
- Changing culture
- CK: INRIA has ~50 tenured engineers
- Can be assigned to a project for shorter or longer periods
- Christoph Becker
- How to design for sustainability?
http://sustainabilitydesign.org
- Sustainability of what?
- Sustainability debt
- Effects of software systems
- Immediate effect
- Enabling effect
- Structural effects
- Five dimensions
- Economic
- Technical
- Social
- Environmental
- Energy efficiency
- Individual
- Need examples of each
- Recognizing software as a primary research output
- Recording it
- Assessing it
- Measuring it
- Rewarding people for doing it
- Participants: Jeffrey Carver, James Howison, Robert Haines, Caroline Jay, Kevin Crowston, Oscar Nierstrasz
- Venue-specific empirical survey of academic software
- Select key/peak conferences
- Goal: provide data set that can be used to answer a variety of questions
- What are software practices in this domain?
- What software is cited? How is it cited?
- Research questions
- Which software ends up being mentioned in papers?
- What's the status of the developers?
- PhD students?
- Engineers?
- Where is the software now?
- What practices used?
- E.g., version control, testing etc.
- How was the technology chosen?
- Cf technology acceptance models
https://en.m.wikipedia.org/wiki/Technology_acceptance_model
- Does this field adopt a dominant technology, or is it fractured?
- Who paid for the software?
- What is the return on investment?
- What problematic issues commonly arise?
- What is difficult?
- Pain points?
- What recommendations would improve the quality of software in the fields?
- Technology
- Practices
- ...
- Procedural questions
- How to achieve variance?
- What do you code for?
- Research questions?
- Venues?
- Grounded approach?
- Or predefined hypotheses?
- Would machine learning help to classify software in papers?
- How to start?
- Need a conference/domain with requirement for reproducibility
- Interviews first to generate hypotheses
- ...
- How to structure?
- Start with just 3 or 4 research questions
- Academic software project typology
- What is an academic sw project?
- No consensus
- Dimensions
- Intentions to write software
- For theory building and validation
- As part of empirical research method
- For SE itself
- The sw is the output
- Fix your own problem
- Automate tasks
- Demonstrators
- Hobby projects
- Exploratory programming
- Teaching
- Benchmarking
- More dimensions
- Characterizing audience
- Maturity level
- Examining sustainability for a particular project
- Spider graphic for various dimensions
- Sustaining software vs sustaining the team
- Making the impact of software more visible?
- Some fields like bioinformatics do better
- Less valued in fields like physics
- DOIs for code with and without review
- Institutional change
- Through fear
- Through threat
- Need venue for software
- Emphasize software more in recommendation letters
- Provide templates
- Workshops
- Software awards
- Jeffrey Carver — SE practices in Science
- Lessons learned
- V&V is hard
- Agile not useful
- High-level languages are rare
- SE-CSE Workshop series
http://SE4Science.org
- Facilitate interaction between SE and computational scientists
- Very different pressures on software
- Testability etc not considered as important
- Dan Katz: half life of business sw is 6 years; for scientific sw it's 6 months
- Challenge: eliminate stigma associated with SE
- How do you introduce SE practices?
- Demonstrate, don't tell
- Solve an actual problem
- Typically speed or bugs
- Stealth git
- Small steps; big value
- Word of mouth
- NB: scientific productivity ~= sw productivity
- Matt Turk — Engineering yt
- Astronomy simulation software
- Many different systems and formats
- NIH (not-invented-here) disease
- yt project
http://yt-project.org/
- Python based
- Cython C code all generated from Python
- Bespoke C routines all removed
- Over time, community grew from users to user-developers
- Eventually hobbyist developers that did not use their own code
- Big adoption spike when fully automated installation became available
- Installs an isolated Python stack for your platform
- Practices
- yt enhancement proposals
- Fido and Code reviews
- Handle pull requests
- All requests are reviewed
- Continuous integration testing based on Jenkins
- Communication
- Based on Slack
https://slack.com/
- Governance
- Membership list
- Contributors are voted in
- Recognizes contributions
- Code of conduct
- No top-down coordinated team
- Tasks are self-assigned
- Kanban interface
- Failure modes
- Level of engagement
- Get a life!
- "Why was I not informed?"
- "I own this code. Why was this change introduced without me in the loop?"
- Innovation vs stability
- Changes that break everything
- Infrastructure dependencies
- Sweeping changes
- People involved have limited time available
- Narrative documentation
- Developers don't have experience
- Overpromise, underdeliver
- Underpromise, overdeliver
- Notes
- Most don't use IDEs
- Most develop "pragmatically"
- Lower diversity than field as a whole
- Mostly white males
- Little FLOSS experience
- Innovation diffused and through yt
- Citation is hard
- 2011 paper had 6 authors
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=QTmv2p0AAAAJ&cstart=20&pagesize=80&sortby=pubdate&citation_for_view=QTmv2p0AAAAJ:9yKSN-GCB0IC
- 2 authors did not contribute code
- Need evolving author lists
- Caroline Jay, Robert Haines — Software as Academic Output
- When does sw count as scientific output?
- Software is hidden
- Role of sw in research
- Enables research
- Enabling research in a new way
- Or to a new group
- Software is the research
- Top 23 CHI 2016 papers
- 18 concerned software
- Only 4 described software
- Pseudocode
- Full analysis results
- Data + source code
- Tool + source code
- Only 2 provided source code
- Case study: crowdsourcing cataloguing fossil database
- Web app
- RQ: Should people be required to register before they can contribute to a study?
- Answer: no, but this should be an option
- Zenodo is a platform for publishing data sets
http://zenodo.org
- Docker automates deployment within software containers
https://en.m.wikipedia.org/wiki/Docker_(software)
- Academic software should be
- Findable
- Accessible
- Reusable
- Extensible
- Claude Kirchner — Software Heritage
- Ghezzi 2009 TOSEM paper
- Only 20% of tools from papers 2001-2006 are installable
- Software Heritage
http://www.softwareheritage.org
- Collect all software and preserve it
- Index and organize it
- Unique identifiers
- Web site to be launched next week
- Daniel Garijo — Software Metadata
- "Dark software" in geosciences
- PhD-ware
- "Don't worry, you don't have to start your code from scratch"
- Counterpart of "dark data" [Heidorn 2008]
- Bourne and Gil, quantifying value of software through "reproducibility maps"
- 2 months of effort to reproduce a study
- OntoSoft ontology for scientific software metadata [Gil et al 2015]
http://ontosoft.org/ontology/software/
- Six dimensions
- Jurgen Vinju — Organising research team around the research software
- Use the source
https://github.com/usethesource
- Lessons
- A research team is not a software team
- Fewer resources
- More investment in efficiency
- Seniors responsible for the long term
- Maintenance, documentation ...
- Dan Katz — Software Citation
- FORCE11 Software Citation working group
https://www.force11.org/group/software-citation-working-group
- Working document
https://www.force11.org/software-citation-principles
- 6 principles
- Cite the software if it impacts the results
- Don't cite office software
- Software papers
- Can cite software or a paper about the software
- Katy Kuksenok — Best Practices by Any Other Name
- "I don’t want to use version control because I don’t want the world to see my terrible code."
- Alice Allen — ASCL: restoring reproducibility
- Making astrophysics software accessible
- Robert Haines — A short history of research software engineers in the UK
- Several experiences of contributing software to papers without acknowledgment
- SSI Collaborations Workshop 2012
http://software.ac.uk/cw12
- Call to arms
- New name: research software engineers
- Ralf Lämmel — 101companies
- Software chrestomathy
http://softlang.uni-koblenz.de/chrestomathy/
- 101companies
http://101companies.org/
- Cecilia Aragon — UW eScience Institute Initiatives
http://escience.washington.edu/
- What's big data?
- It's data two orders of magnitude larger than you're used to dealing with
- MSc in data science at UW
http://www.datasciencemasters.uw.edu/
- Incubator program
- Tools, environments, support
- Human-centered data science lab
https://depts.washington.edu/hdsl/
- Rob van Nieuwpoort — eScience in NL
- Bridge gap between scientists and CS
- Netherlands eScience Center
https://www.esciencecenter.nl/
- Provide services across all of NL
- Three tracks
- Management
- Technology
- Research
- Project kickoffs are important
- Establish rules for co-authorship
- Research engineers
- Also publish in eScience venues
- Domain scientists
- Agree on software licenses
- eStep
- Common repositories for knowledge resulting from projects
- Software
- Best practices
- Citations
- Reviewing FORCE11 software citation principles
https://www.force11.org/software-citation-principles
- Each person adopts a persona
- Reader
- Reviewer
- Funding agency
- Research directions
- How much academic sw is "hidden"?
- Built by impactstory
https://impactstory.org/
- Only Python and R
- Automatically analyze open source projects
- Examples
- Science Code Manifesto
http://sciencecodemanifesto.org/
- Empirical/theoretical question
- Anti-dogma
- Recommendations
- Duties as citizens
- How to cite software?
- Sentence per breakout group
- First person pledges
- "I will ..."
- Topics
Try to come up with pithy statements on each. Narrow down to ~5 items.
- Teaching attitudes toward appreciation of software
- Teaching how to design
- Ontology
- Citations
- Attribution
- Recognition of sw as an output
- Intellectual contributions
- Project sustainability
- Reproducibility
- Past
- Program
- Materials
- Citations
Dan
- Reviewing FORCE11 etc
- Taxonomy of software contributor roles
- How to provide credit to all those involved in software
- Call for ...
- Guidance for tenure committees
- Templates for recommendation letters
James
- INRIA Evaluation grid
- Taxonomy
- Spreadsheet of example projects
- Typology paper
- Handbook TOC
- Study design
- State of art
- How much sw is open?
- Award proposal
- Proposal for RCN Workshops
- Sustainability Debt analysis example case