softwareimpacthackathon2023_tracing_dependencies
Tracing the dependencies of open source software mentioned in the biomedical literature
https://github.com/borisveytsman/softwareimpacthackathon2023_tracing_dependencies
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.7%) to scientific vocabulary
Repository
Tracing the dependencies of open source software mentioned in the biomedical literature
Basic Info
- Host: GitHub
- Owner: borisveytsman
- Language: Jupyter Notebook
- Default Branch: main
- Size: 50.1 MB
Statistics
- Stars: 8
- Watchers: 11
- Forks: 2
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
Steps to Reproduce
- Install dependencies using "pip install -e ." or, with Just, "just install".
- Download the following additional data files from https://zenodo.org/records/10045415:
  - bioconductorwithmention_counts.ndjson.tar.gz
  - cranwithmention_counts.ndjson.tar.gz
  - pypiwithmentions.ndjson.tar.gz
  - commdisambiguatedcvis_count.json.tar.gz
- Decompress each of these files and place them in the data/ directory.
- Run the preprocess.ipynb notebook in the notebooks/ directory. This creates the "dois-graph/full-graph.gexf" file in the data/ directory.
- Run the directed-core-analysis.ipynb notebook in the notebooks/ directory. It reads "dois-graph/full-graph.gexf" from the data/ directory and creates the "pruned-network.csv" and "pruned-network.gexf" files in the data/ directory.
- Run the visualization.ipynb notebook in the notebooks/ directory. It reads "pruned-network.csv" from the data/ directory and creates the final visualization of mentions against Katz centrality.
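The decompression step can be scripted. The sketch below assumes the archives were saved into a single download folder. The function name extract_archives and the folder layout are hypothetical and are not part of the repository. It extracts every .tar.gz archive into data/ and applies a basic check that rejects member paths escaping that directory.

```python
import tarfile
from pathlib import Path


def extract_archives(download_dir: str, data_dir: str = "data") -> list[str]:
    """Extract every .tar.gz archive found in download_dir into data_dir.

    Hypothetical helper: assumes the Zenodo archives were downloaded into
    download_dir. Returns the names of the regular files extracted.
    """
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    root = target.resolve()
    extracted = []
    for archive in sorted(Path(download_dir).glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            members = tar.getmembers()
            # Basic guard: refuse member names that resolve outside data_dir.
            for member in members:
                dest = (target / member.name).resolve()
                if not dest.is_relative_to(root):
                    raise ValueError(f"unsafe path in {archive}: {member.name}")
            tar.extractall(target)
            extracted.extend(m.name for m in members if m.isfile())
    return extracted
```

For example, extract_archives("downloads") unpacks everything before you run preprocess.ipynb. The path check covers only member names, not symlink targets, so treat this as a convenience sketch rather than a hardened extractor.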
All other visualizations in the original paper were created directly in Gephi from the "pruned-network.gexf" file. In Gephi, node size represents Katz centrality, with a factor of 4 between the minimum and maximum values. Nodes are then arranged with the Yifan Hu layout, using an optimal distance of 100.0 and a relative strength of 0.2 on edges.
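The node sizing can be approximated outside Gephi. This sketch assumes a linear interpolation and reads "a factor of 4" as meaning the largest node is four times the size of the smallest. The minimum size of 10.0 is a hypothetical value, not one taken from the project.

```python
def node_sizes(katz: dict[str, float], min_size: float = 10.0,
               ratio: float = 4.0) -> dict[str, float]:
    """Linearly map Katz centrality onto [min_size, ratio * min_size].

    Assumption: linear interpolation between the smallest and largest
    scores; min_size is an illustrative value.
    """
    lo, hi = min(katz.values()), max(katz.values())
    max_size = ratio * min_size
    span = hi - lo
    sizes = {}
    for node, score in katz.items():
        # If every score is identical, all nodes get the minimum size.
        t = (score - lo) / span if span > 0 else 0.0
        sizes[node] = min_size + t * (max_size - min_size)
    return sizes
```

The layout step (Yifan Hu) is not reproduced here. Gephi computes it interactively from the parameters given above.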
Exploring the dependencies of the CZI mentions dataset
Exploring the Dependencies of Mentioned Software in the CZI Software Mentions Dataset

We construct a graph of dependencies between software packages mentioned in the CZI Software Mentions Dataset. We then use the Katz centrality score to rank the importance of each software package. The data is available as Brown, E. M. (2023). A Dependency Graph for 460,000 Papers and Their Software Mentions from the CZI Software Mentions Dataset (1.0.0) [Data set]. CZI Research Software Hackathon. Zenodo. https://doi.org/10.5281/zenodo.10048132.
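The Katz centrality ranking can be illustrated in pure Python. This is a minimal sketch under several assumptions:

- Edges point from a package to its dependency, so a package gains score from everything that depends on it, directly or transitively.
- Each additional hop is attenuated by alpha.
- The values alpha=0.1 and beta=1.0 are illustrative, not the settings used in the project notebooks.

```python
def katz_centrality(edges, alpha=0.1, beta=1.0, tol=1e-10, max_iter=1000):
    """Katz centrality by fixed-point iteration on a directed graph.

    edges: iterable of (dependent, dependency) pairs (assumed direction).
    Each node's score satisfies
        x[v] = beta + alpha * sum(x[u] for every u with an edge u -> v).
    Converges when alpha < 1 / (largest eigenvalue of the adjacency matrix);
    on an acyclic graph it converges after (longest path + 1) passes.
    """
    nodes, preds = set(), {}
    for u, v in edges:
        nodes.update((u, v))
        preds.setdefault(v, []).append(u)
    if not nodes:
        return {}
    x = {n: beta for n in nodes}
    for _ in range(max_iter):
        new = {n: beta + alpha * sum(x[u] for u in preds.get(n, ()))
               for n in nodes}
        if max(abs(new[n] - x[n]) for n in nodes) < tol:
            return new
        x = new
    raise RuntimeError("did not converge; alpha may be too large")
```

For the chain a -> b -> c (a depends on b, b depends on c), c scores highest because both b and, one hop further, a rely on it. If networkx is available, networkx.katz_centrality computes the same quantity on in-edges, but it normalizes the result by default.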
We find some interesting examples of "most important" software (bearing in mind that some of the ecosystems are incorrectly labelled):
- PACE is the most mentioned software but may not be the most "critical" or connected.
- VELVET is seemingly the most "critical" or connected software, yet it has very few mentions.
- PERMANOVA is seemingly a "true" example (mentioned and identified in the correct ecosystem): it is incredibly important, correctly identified, and has a number of mentions.
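One way to surface cases like PACE and VELVET is to compare each package's rank by mentions with its rank by Katz centrality. This is a sketch only. The dictionaries below are hypothetical toy data, not values from the dataset, and the helper names are illustrative.

```python
def rank(scores):
    """Return {name: rank}, where rank 1 is the highest score."""
    ordered = sorted(scores, key=lambda n: scores[n], reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def rank_gaps(mentions, katz):
    """Sort packages by (mention rank - Katz rank), largest first.

    A positive gap means a package is more central than its mention count
    suggests; a negative gap means it is mentioned more than it is central.
    """
    m_rank, k_rank = rank(mentions), rank(katz)
    common = mentions.keys() & katz.keys()
    return sorted(((m_rank[n] - k_rank[n], n) for n in common), reverse=True)


# Hypothetical toy data: A is heavily mentioned, B is highly central.
print(rank_gaps({"A": 1000, "B": 5, "C": 50}, {"A": 1.0, "B": 3.0, "C": 2.0}))
```

Ties are broken by input order, so real data with many equal mention counts may need a more careful tie-breaking rule.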
Software That Is Important but Has No Mentions in the Literature
None of the following packages is ever mentioned:
- PyPI: six, Python 2 and 3 compatibility utilities.
- Bioconductor: BiocIO, a package for basic file handling and some file formats.
- CRAN: isoband, an R package to generate contour lines and polygons.
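A per-ecosystem list like this can be pulled from a table of packages with mention counts and Katz scores. The sketch assumes CSV columns named name, ecosystem, mentions and katz. These column names are hypothetical and may not match "pruned-network.csv". The sample rows are toy data.

```python
import csv
import io


def top_unmentioned_per_ecosystem(rows):
    """For each ecosystem, return the never-mentioned package with the
    highest Katz centrality.

    rows: iterable of dicts with hypothetical keys
    'name', 'ecosystem', 'mentions', 'katz'.
    """
    best = {}
    for r in rows:
        if float(r["mentions"]) != 0:
            continue  # keep only packages with zero mentions
        eco, score = r["ecosystem"], float(r["katz"])
        if eco not in best or score > best[eco][1]:
            best[eco] = (r["name"], score)
    return {eco: name for eco, (name, _) in best.items()}


# Hypothetical toy table; in practice pass csv.DictReader(open(path)).
sample = """name,ecosystem,mentions,katz
pkg_a,PyPI,0,2.5
pkg_b,PyPI,12,9.0
pkg_c,PyPI,0,1.0
pkg_d,CRAN,0,0.7
"""
print(top_unmentioned_per_ecosystem(csv.DictReader(io.StringIO(sample))))
```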

Exploring the Dependencies of Imported Software within Notebooks from the Combined CZI Software Mentions Dataset and
About this project
This repository was developed as part of the Mapping the Impact of Research Software in Science hackathon hosted by the Chan Zuckerberg Initiative (CZI). By participating in this hackathon, owners of this repository acknowledge the following:
1. The code for this project is hosted by the project contributors in a repository created from a template generated by CZI. The purpose of this template is to help ensure that repositories adhere to the hackathon's project naming conventions and licensing recommendations. CZI does not claim any ownership or intellectual property on the outputs of the hackathon. This repository allows the contributing teams to maintain ownership of code after the project, and indicates that the code produced is not a CZI product and that CZI does not assume responsibility for assuring the legality, usability, safety, or security of the code produced.
2. This project is published under an MIT license.
Code of Conduct
Contributions to this project are subject to CZI's Contributor Covenant code of conduct. By participating, contributors are expected to uphold this code of conduct.
Reporting Security Issues
If you believe you have found a security issue, please responsibly disclose by contacting the repository owner via the security tab above.
Licenses
Licenses are annotated according to the REUSE Specification v3.0.
Please see the single files or respective .license files for the actual licenses.
Generally,
- code is licensed under the MIT license
- documents are licensed under CC-BY-4.0
- some data files and other files are licensed under CC0-1.0
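To find which license applies to a given file, you can check its SPDX header or an adjacent .license file. This is a minimal sketch with several assumptions:

- An adjacent <file>.license file takes precedence over the file's own header.
- Only the first SPDX-License-Identifier line is read.
- Bulk declarations (for example a .reuse/dep5 file) are ignored.

The helper name license_of is hypothetical. For authoritative results, use the REUSE tooling.

```python
import re
from pathlib import Path
from typing import Optional

SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def license_of(path) -> Optional[str]:
    """Return the first SPDX license expression declared for a file, or None.

    Checks <file>.license first (assumed to take precedence), then the
    first 4 KiB of the file itself. Bulk declarations are not consulted.
    """
    p = Path(path)
    companion = p.with_name(p.name + ".license")
    for candidate in (companion, p):
        if not candidate.is_file():
            continue
        head = candidate.read_text(encoding="utf-8", errors="ignore")[:4096]
        m = SPDX_RE.search(head)
        if m:
            value = m.group(1).strip()
            # Drop common trailing comment closers, e.g. "*/" or "-->".
            for closer in ("*/", "-->"):
                value = value.removesuffix(closer).strip()
            return value
    return None
```

For example, a code file with the header "# SPDX-License-Identifier: MIT" yields "MIT". A data file licensed through data.csv.license yields whatever identifier that companion file declares.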
Cite this project
To cite this project, please use the metadata in CITATION.cff.
You can also copy and paste an APA-formatted string, or a BibTeX entry directly from the "Cite this repository" widget on GitHub.
Owner
- Name: Boris Veytsman
- Login: borisveytsman
- Kind: user
- Location: Bay Area, CA, USA
- Company: Chan Zuckerberg Initiative and George Mason University
- Website: http://borisv.lk.net
- Repositories: 4
- Profile: https://github.com/borisveytsman
GitHub Events
Total
- Issue comment event: 6
- Push event: 14
- Pull request event: 1
- Pull request review event: 2
- Fork event: 1
- Create event: 1
Last Year
- Issue comment event: 6
- Push event: 14
- Pull request event: 1
- Pull request review event: 2
- Fork event: 1
- Create event: 1
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 2
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 15 minutes
- Total issue authors: 1
- Total pull request authors: 3
- Average comments per issue: 3.0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Daniel-Mietchen (2)
Pull Request Authors
- sdruskat (2)
- LaurentHebert (1)