researchfish-investigation-across-councils

ResearchFish data on Software Outputs. Contact: @marioa

https://github.com/softwaresaved/researchfish-investigation-across-councils

Science Score: 18.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary

Keywords

policy

Repository

Basic Info
  • Host: GitHub
  • Owner: softwaresaved
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 7.38 MB
Statistics
  • Stars: 0
  • Watchers: 9
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
policy
Created over 9 years ago · Last pushed almost 9 years ago
Metadata Files
Readme License Citation

README.md

ResearchFish investigation

Purpose

This code analyses research outcomes provided by the EPSRC (also available on Gateway to Research). It investigates how many software outcomes there are, which universities register them, where the software is stored, whether it is stored under an open licence, and whether the URL linked to the storage is live.

The results of the investigation will be published on the Software Sustainability Institute's website under the title: "Researchfish®: what can it tell us about software in research?".
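As an illustration of the "where the software is stored" question, the sketch below groups outcome URLs by host name. It is not taken from the analysis code: the function names `hosting_site` and `count_hosts` are hypothetical, and it assumes the URLs have already been read from the CSV file into a list of strings.

```python
from collections import Counter
from urllib.parse import urlparse


def hosting_site(url):
    """Return the lower-cased host of a URL without a leading 'www.', or 'unknown'."""
    host = urlparse(url.strip()).hostname
    if not host:
        # No scheme or no host part, so the storage location cannot be identified.
        return "unknown"
    return host[4:] if host.startswith("www.") else host


def count_hosts(urls):
    """Count how many URLs point at each host."""
    return Counter(hosting_site(u) for u in urls)
```

Calling `count_hosts(urls).most_common(10)` would then list the ten most frequently used hosts.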

Files

The main directory contains:

  1. research_fish_analysis.py: the analysis code itself
  2. requirements.txt: a list of the libraries used by the code
  3. What questions do we want to answer with this data?: a list of questions I want to answer with the data

The data directory contains:

  1. softwareandtechnicalproductsearch-1490799160221.csv: a CSV file downloaded from the outcomes page of Gateway to Research on 29 March 2017. It includes all outcomes registered on the site within the "software" categories: E-business platform, Grid application, Software and Webtool/application.
  2. researchfish_results.xlsx: results from the analysis
  3. impact.txt: all of the outputs' impact statements merged into a single text file for easy loading into a word frequency counter
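For readers without a word frequency counter to hand, a minimal one can be sketched with the standard library. The function name `word_frequencies` and the simple tokenising rule (runs of letters and apostrophes) are assumptions made for illustration, not part of the analysis code.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=10):
    """Return the top_n most common words in text as (word, count) pairs."""
    # Lower-case the text so "Software" and "software" are counted together.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)
```

To apply it to the merged impact statements, read the file first, for example `word_frequencies(open("data/impact.txt").read())`; the path here assumes the command is run from the main directory.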

The charts directory contains PNG images of the charts produced by the analysis. The venv directory is used by the virtual environment.

Requirements

The code runs on Python 3.5.

The code runs in a virtual environment which can be installed following this guide.

Inside the virtual environment, install the libraries by running the command:

pip install -r requirements.txt

Once they're installed, run the code with the command:

python research_fish_analysis.py

About the data

Following a discussion about "software outputs" with Louise Tillman, I was offered all of the software outputs for EPSRC grants.

Louise describes the content as:

"These are self-reported outputs that PIs have submitted. We had a good submission rate overall (I think the 2016 report that is due to be published on our website in the next couple of weeks says that we had 95% of PIs completed with 100% completion on current grants). However this does not tell you much about the quality of the data (software is not one of the mandatory boxes that must be filled in to comply). It is not the start date but the end date of the grants that controls whether PIs are asked to submit:

  • Normally mandatory if current or <5 yrs since grant end date
  • Either optional or not possible after that (most EP grants >6 yrs old ‘closed’, but could be re-opened on request).

So I think the earliest start date of the grants in the data set is 2006. The newest grants (i.e. those that have only just been funded / a year or so in) are not asked to submit outcomes immediately as that would not be realistic."
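The end-date rule Louise describes can be written down as a small function. This is only a sketch of my reading of the quote: the names `add_years` and `submission_status` are hypothetical, the five-year boundary is treated as exact even though the quote says "normally", and the grace period for the newest grants is ignored.

```python
from datetime import date


def add_years(d, years):
    """Return d shifted by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def submission_status(grant_end, today):
    """Classify a grant as 'mandatory' or 'optional or closed' on the given day."""
    if today < add_years(grant_end, 5):
        # Covers current grants (end date in the future) and grants that
        # ended less than five years ago.
        return "mandatory"
    return "optional or closed"
```

For example, `submission_status(date(2014, 6, 30), date(2017, 3, 29))` falls inside the five-year window, while a grant that ended on 30 June 2010 does not.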

Troubleshooting

Sometimes the URL checker hangs on a URL. Press Ctrl+C to skip that URL.
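One way to reduce hangs is to give every request a timeout. The sketch below is not the checker in the analysis code: it uses only the standard library rather than the requests or httplib2 libraries listed in requirements.txt, and the name `is_live` and its `opener` parameter (there so the function can be exercised without network access) are hypothetical.

```python
import socket
import urllib.error
import urllib.request


def is_live(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if url answers with a non-error HTTP status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, socket.timeout, ValueError):
        # URLError covers HTTP error statuses, DNS failures and refused connections;
        # socket.timeout covers a server that accepts the connection but never answers;
        # ValueError covers malformed URLs.
        return False
```

The timeout applies to each blocking socket operation rather than to the whole request, so a server that sends data very slowly could still take longer than `timeout` seconds in total.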

Owner

  • Name: The Software Sustainability Institute
  • Login: softwaresaved
  • Kind: organization
  • Email: info@software.ac.uk
  • Location: United Kingdom

The Software Sustainability Institute supports better research by helping researchers to build and use better software.

Citation (CITATION)

To cite this ResearchFish analysis tool in publications, please use:

"We used the ResearchFish analysis tool developed by The University of Southampton on behalf of the Software Sustainability Institute."

Dependencies

requirements.txt pypi
  • Counter ==1.0.0
  • DateTime ==4.1.1
  • cycler ==0.10.0
  • et-xmlfile ==1.0.1
  • httplib2 ==0.9.2
  • jdcal ==1.3
  • matplotlib ==1.5.3
  • numpy ==1.11.2
  • openpyxl ==2.4.1
  • pandas ==0.19.1
  • pyparsing ==2.1.10
  • python-dateutil ==2.6.0
  • pytz ==2016.7
  • requests ==2.12.3
  • scipy ==0.18.1
  • six ==1.10.0
  • xlrd ==1.0.0
  • zope.interface ==4.3.2