csvconf2023

Slides and resources from my CSV Conf 2023 keynote

https://github.com/karthik/csvconf2023

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: sciencedirect.com, wiley.com, acm.org, joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.1%) to scientific vocabulary

Keywords

csvconf csvconf2023 keynote
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 16
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
csvconf csvconf2023 keynote
Created almost 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme Citation

README.md

How to enable and sustain thriving Open Source Ecosystems (OSE)

Slides from a keynote presentation at CSV Conf 2023 in Buenos Aires, Argentina, on April 19th, 2023.

Abstract

Software impacts virtually all areas of research but has been a heavily undervalued contribution. Over the past decade alone, the research software landscape has changed dramatically. It is now substantially easier to start new software projects, find technical resources, and join a friendly community of practice. The research software engineer career track has also taken off and made it easier for many individuals to build careers in this field. However, several key challenges remain. Despite the growing recognition of research software, it is still challenging to demonstrate impact or find support for the maintenance of existing software. In this talk, I describe some ideas on how to uncover software that is driving research and construct knowledge graphs to ask questions about software use and sustainability. I also describe the various conditions necessary to turn nascent software projects into sustainable ecosystems.

General takeaways

  • Since research software is poorly cited, it’s hard to get a good picture of the software used in research. While software bills of materials are technically easy to generate and will provide a lot of value, they are not the norm in research or publishing.
  • One workaround is to extract scientific software entities from PDFs using tools like Grobid and the software mentions extractor. If carried out on a substantial collection of articles from a field where open source is widely used, it would be possible to ask all kinds of interesting questions, such as which software is driving research in a certain area, where the opportunities and challenges are, and how the use of tools is changing over time.
  • Many researchers write last-mile analysis code that never goes any further. Some of this code, especially the implementation of new methods, may see the light of day as prototype software. These are minimum viable prototypes, with a small test suite and documentation but not designed for speed or stability. A subset of prototypes that find product-market fit are the ones that enter the research software infrastructure space and need to be sustained.
  • One way for software projects to raise their visibility is to align roadmaps with adjacent tools (adjacent in the sense of hard dependencies or usage-based dependencies). This would reduce friction, allow for resource sharing, and raise visibility as a collection of tools (e.g. spatial data science, Tidyverse).
  • Besides solving technical challenges by aligning with the local ecosystem, projects also need to be in alignment with the larger ecosystem (actors and institutions that enable the work).
  • The definitions of software sustainability are clear, but a broader definition I provide is that “Software is sustainable as long as the people behind it have the resources to continue fulfilling its mission”.
  • There are examples of widely used software that have run out of resources while dealing with an outdated stack. Rather than sustain those tools, the community can choose to replace them with something more modern and aligned with the needs of users (see the IRAF → Astropy example below). In other words, not everything needs to be sustained forever.
  • At POSE training, we have identified 5 core areas that are necessary to sustain an OSE. These are org structure (the managing org that can guide future growth), governance (robust decision-making and collaboration management), business perspectives (managing hidden infrastructure costs and resources, which includes funding), security (technical and non-technical threats), and community.
  • Using Nadia Eghbal’s taxonomy (toy, club, federation & stadium), it would be a good exercise to categorize your project to see how best to engage your audience in meaningful ways.
  • Once projects have found product-market fit, there is little in the way of long-term support (funding or otherwise). Communities of practice (COPs) and ecosystem-level entities can use tools like CHAOSS to surface certain types of issues (low maintainer growth, time to PR close as a way to engage new contributors) and address them before it is too late. Maintainer burnout is another growing problem that needs attention.
  • Security issues are important. While we have not seen major security issues in scientific open source (compared to the larger OSS community), it is still important to stay on top of CVEs and use CI/CD more extensively. It would also be a good idea to keep an eye on non-technical threats, like bad actors and poor governance.
  • If all else fails and a project needs to end, it must be done responsibly. This includes notifying all stakeholders (downstream dependencies, users, trainers), providing pointers to comparable alternatives, archiving all code to support reproducibility efforts, and offering enough lead time (see the r-spatial example below).
  • The calls to action are:
    • If you are a developer, find ways to align with your local ecosystem to coordinate roadmaps and resources, and raise your visibility.
    • If you’re a COP, find ways to address maintainer burnout, provide governance templates, document managing org options, etc.
    • Lastly, folks operating at the level of an ecosystem (funders, foundations, training partners, infrastructure providers) can also pick and address one or more of these issues at scale.
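
One of the health signals mentioned above, time to PR close, can be approximated with very little code. The following is a minimal sketch, not the CHAOSS implementation: the record layout, the function names, and the sample timestamps are all hypothetical and only illustrate the idea of summarizing how long pull requests stay open.

```python
from datetime import datetime
from statistics import median

# Hypothetical pull request records with ISO 8601 timestamps.
# These values are illustrative only and do not come from any real project.
pull_requests = [
    {"opened": "2023-01-02T09:00:00", "closed": "2023-01-03T09:00:00"},
    {"opened": "2023-01-05T12:00:00", "closed": "2023-01-12T12:00:00"},
    {"opened": "2023-02-01T08:00:00", "closed": "2023-02-04T08:00:00"},
    {"opened": "2023-02-10T10:00:00", "closed": None},  # still open
]


def days_to_close(pr):
    """Return the number of days a pull request was open, or None if it is still open."""
    if pr["closed"] is None:
        return None
    opened = datetime.fromisoformat(pr["opened"])
    closed = datetime.fromisoformat(pr["closed"])
    return (closed - opened).total_seconds() / 86400


def median_days_to_close(prs):
    """Median time to close across closed pull requests; None if none are closed."""
    durations = [d for d in map(days_to_close, prs) if d is not None]
    return median(durations) if durations else None


print(median_days_to_close(pull_requests))
```

A median is used here rather than a mean so that a single long-lived pull request does not dominate the summary; a project tracking this over time might compute it per quarter and watch for an upward trend.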

Resources

How to cite this talk

Ram, Karthik. (2023, April 12). How to enable and sustain thriving Open Source Ecosystems (OSE). Zenodo. https://doi.org/10.5281/zenodo.7822917

Acknowledgements

This talk was greatly improved by discussions with Arfon Smith, James Howison, Sean Goggins, Patrice Lopez, and Abby Cabunoc Mayes.

Questions

Questions or comments are welcome at karthik dot ram at gmail.

Owner

  • Name: Karthik Ram
  • Login: karthik
  • Kind: user
  • Location: Berkeley, CA
  • Company: @ucberkeley

Research associate professor at UC Berkeley

Citation (CITATION.cff)

cff-version: 1.2.0
title: >-
  How to enable and sustain thriving Open Source Ecosystems
  (OSE)
message: >-
  If you reference this talk, please cite it using the metadata
  from this file.
type: talk
authors:
  - given-names: Karthik
    family-names: Ram
    email: karthik.ram@berkeley.edu
    affiliation: 'University of California, Berkeley'
    orcid: 'https://orcid.org/0000-0002-0233-1757'
identifiers:
  - type: doi
    value: 10.5281/zenodo.7822917
    description: Archived copy of my keynote talk from CSV Conf 2023 in Buenos Aires
keywords:
  - software sustainability
  - research software
  - open-source
license: CC-BY-4.0
date-released: '2023-04-12'

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 61
  • Total Committers: 1
  • Avg Commits per committer: 61.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Karthik Ram k****m@g****m 61
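
The page does not say how the Development Distribution Score is computed. A commonly used definition, assumed here, is one minus the share of commits made by the most active committer. Under that assumption, a repository with a single committer scores 0.0, which matches the figures above. The function name and input format below are hypothetical.

```python
def development_distribution_score(commits_by_committer):
    """Assumed definition: 1 - (commits by top committer / total commits).

    `commits_by_committer` maps a committer name to a commit count.
    Returns 0.0 when there are no commits at all.
    """
    total = sum(commits_by_committer.values())
    if total == 0:
        return 0.0
    top = max(commits_by_committer.values())
    return 1 - top / total


# Single committer with all 61 commits, as listed above.
print(development_distribution_score({"Karthik Ram": 61}))

# Hypothetical two-committer project for comparison.
print(development_distribution_score({"a": 3, "b": 1}))
```

Values closer to 1 would indicate that commits are spread across many people, while values near 0 indicate that one person does nearly all of the work.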

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0