Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.4%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: edbennett
  • License: other
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 2.15 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed over 3 years ago

README.md

Survey of reproducibility in hep-lat submissions in 2021

This repository makes available the results of a survey of all submissions and cross-lists to the hep-lat category of the arXiv in 2021.

Methodology

Obtaining data

Papers were downloaded in PDF format using the arXiv bulk downloader, and filtered with a listing of papers from the arXiv API.

Information on which papers were conference talks was obtained via the Inspire API.

Both of these steps are documented in the file data_acquisition.ipynb.

Submissions with an identifier starting with 21 (i.e. those that first appeared on the arXiv between 1 January and 31 December 2021) and categorised in hep-lat (either as primary category or as a cross-list) are included.
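The inclusion rule above can be sketched in a few lines of Python. This is an illustration only, not the code in data_acquisition.ipynb: the function name `in_survey_scope`, the record layout, and the sample identifiers are all hypothetical, and the sketch assumes new-style arXiv identifiers of the form YYMM.NNNNN.

```python
import re

# New-style arXiv identifiers look like YYMM.NNNNN, optionally followed by vN.
NEW_STYLE_ID = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})(v\d+)?$")


def in_survey_scope(arxiv_id, categories, year_prefix="21", category="hep-lat"):
    """Return True if the identifier starts with the year prefix and the
    category appears among the primary or cross-listed categories.

    Hypothetical helper: the name and signature are assumptions.
    """
    match = NEW_STYLE_ID.match(arxiv_id)
    if match is None:
        # Old-style or malformed identifiers are out of scope for this sketch.
        return False
    return match.group(1) == year_prefix and category in categories


# Made-up records for illustration; these are not entries from the survey.
records = [
    {"id": "2101.00001", "categories": ["hep-lat"]},
    {"id": "2112.99999", "categories": ["hep-ph", "hep-lat"]},
    {"id": "2012.12345", "categories": ["hep-lat"]},
    {"id": "2105.54321", "categories": ["hep-th"]},
]

selected = [r["id"] for r in records if in_survey_scope(r["id"], r["categories"])]
print(selected)
```

Only the first two made-up records pass: the third has the wrong year prefix and the fourth is not listed in hep-lat.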

Surveying data

Each paper was skim-read at a very high level, and search tools (including pdfgrep) were used to search for relevant keywords.

The availability of any data and workflows cited was verified. Beyond this, all information was taken from what was reported in the submissions (and, in a few cases, in articles explicitly cited). No effort was made to seek out tools or data that were mentioned but not cited, even though these may be publicly available.

Data were input into a Microsoft Excel spreadsheet. Some features of Microsoft Excel were used to ensure consistency, including conditional formatting and data validation.

Data structure

The survey results are in the file survey_2021.csv. This is in comma-separated format, with the columns as documented below.

Fields

  • arXiv ID: The arXiv identifier of the submission.
  • Primary category: The primary arXiv category of the submission.
  • Journal: The journal in which the submission was published, if known. (From the Inspire API.)
  • Is proceedings: Whether or not the work is from conference proceedings. Values are Y/N.
  • Presents new numerical results: Whether or not the work includes any new numerical results. This includes a table containing numbers with error bars, or a plot with numbers on the axes, where these are not directly quoted from other work. Values are Y/N, with a ? indicating that this is ambiguous.
  • UK authors: Whether or not at least one author has at least one affiliation to an institution in the United Kingdom. Values are Y/N.
  • Generates field configurations: Whether the work presented involved generating new field configurations by some Monte Carlo-adjacent algorithm. Values are Y/N, with a ? indicating that this is ambiguous.
  • Specifies software used for configuration generation: Whether the work specifies what software application or applications were used for generating configurations. Values are as in "Citation formats" below.
  • Software used for configuration generation: A comma-separated list of any software mentioned being used for generating field configurations.
  • Repository/hosting service for configuration generation code: A comma-separated list of any data repositories or hosting services used for any software mentioned in the previous column. Personal website indicates any web page controlled by an individual, including personal home pages on institutional web servers.
  • Performs measurements: Whether the work involves computing observables from gauge configurations. Values are Y/N, with a ? indicating that this is ambiguous.
  • Specifies software used for measurement: Where work performed measurements, whether the work specifies what software application or applications were used for this. Values are as in "Citation formats" below.
  • Software used for measurements: A comma-separated list of any software mentioned being used for performing measurements.
  • Repository/hosting service for measurement code: A comma-separated list of any data repositories or hosting services used for any software mentioned in the previous column. Personal website indicates any web page controlled by an individual, including personal home pages on institutional web servers.
  • Uses existing configurations: Whether the work makes use of field configurations generated as part of earlier work, either by the authors or by others. Values are Y/N, with a ? indicating that this is ambiguous.
  • Configuration hosting infrastructure acknowledged: If the work acknowledges the infrastructure used to host and distribute field configurations used (e.g. ILDG or a Regional Grid), a comma-separated list of the services acknowledged.
  • Cites existing configurations: Whether the work provides a citation for any existing configurations used. Values are as in "Citation formats" below.
  • Configurations generated by: A comma-separated list of collaborations whose field configurations were used.
  • Reanalyses other existing data: Whether non-field configuration data from other work is incorporated into an analysis. Values are Y/N, with a ? indicating that this is ambiguous.
  • Cites other existing data: Where the work reanalyses other existing data, whether and how those data are cited. Values are as in "Citation formats" below.
  • Publishes data: Whether the work makes the data generated available in machine-readable format (i.e. beyond plots and numbers in tables in the PDF). Values are Y/N, with a ? indicating that this is ambiguous.
  • Data available on request?: Whether the work does not make data available, but claims that data would be available if requested. Values are Y/N, with a ? indicating that this is ambiguous.
  • Repository used for data: A comma-separated list of any data repositories or hosting services used for any data released. Personal website indicates any web page controlled by an individual, including personal home pages on institutional web servers.
  • Specifies software used for analysis: Where the work generated numerical results, whether and how the work specifies any software used for analysing these results. This is any software used outside of generating configurations and performing measurements that led to the presented results. Values are as in "Citation formats" below.
  • Software used for analysis: A comma-separated list of any software mentioned being used for data analysis.
  • Repository/hosting service for analysis code: A comma-separated list of any data repositories or hosting services used for any software mentioned in the previous column. Personal website indicates any web page controlled by an individual, including personal home pages on institutional web servers.
  • Publishes parts of analysis: Whether any aspect of the bespoke analysis workflow for this work was made available. Values are Y/N, with a ? indicating that this is ambiguous.
  • Publishes full analysis: Whether software that will reproduce the full analysis was made available. Values are Y/N, with a ? indicating that this is ambiguous.
  • Acknowledges an HPC centre: Whether computational time on shared computing facilities was acknowledged. Values are Y/N, with a ? indicating that this is ambiguous.
  • Acknowledges Supercomputing Wales: Whether the specific HPC service Supercomputing Wales was acknowledged.
  • Acknowledges DiRAC: Whether the specific HPC service DiRAC was acknowledged.
  • Review paper: Whether the paper is a review or review-like (i.e. re-presents results previously presented elsewhere with attribution). Values are Y/N, with a ? indicating that this is ambiguous.
  • Cites any other software: Whether any other software not directly leading to the presented results is attributed. Values are as in "Citation formats" below.
  • Comments: Any other comments not fitting into the above categories.
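As a rough illustration of how the fields above might be consumed, the following sketch reads rows with Python's standard `csv` module, tallies a Y/N/? column, and splits a comma-separated list cell. It assumes that the CSV header names match the field names listed above exactly; the helper names `tally_flag` and `split_list_cell`, and the in-memory sample rows (including the software names ToolA and ToolB), are made up for illustration and are not taken from survey_2021.csv.

```python
import csv
import io
from collections import Counter


def tally_flag(rows, column):
    """Count the values in a Y/N/? column, ignoring blank cells."""
    return Counter(row[column].strip() for row in rows if row[column].strip())


def split_list_cell(cell):
    """Split a comma-separated list cell into a list of trimmed names."""
    return [item.strip() for item in cell.split(",") if item.strip()]


# Made-up sample standing in for the real file. To read the real data,
# something like the following should work (assuming the header names match):
#     with open("survey_2021.csv", newline="") as f:
#         rows = list(csv.DictReader(f))
sample = io.StringIO(
    "arXiv ID,Is proceedings,Presents new numerical results,"
    "Software used for measurements\n"
    '2101.00001,N,Y,"ToolA, ToolB"\n'
    "2101.00002,Y,?,\n"
    "2101.00003,N,Y,ToolA\n"
)
rows = list(csv.DictReader(sample))

new_results = tally_flag(rows, "Presents new numerical results")

# Count how often each (hypothetical) software name is mentioned.
software_counts = Counter(
    name
    for row in rows
    for name in split_list_cell(row["Software used for measurements"])
)

print(dict(new_results))
print(dict(software_counts))
```

Treating ? as its own value, rather than folding it into Y or N, keeps the ambiguous cases visible in any summary.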

Citation formats

  • No: No information is given
  • Mentioned by name: a piece of software, data, or collaboration is mentioned but no other information is provided
  • Data repository citation: where a citation is made to a data repository such as Zenodo
  • Paper citation: a citation is made to a paper, with no other information on how to access the data or software
  • URL citation: a URL is included in the references section/bibliography
  • Included: code is included as part of the publication
  • Footnote/inline URL: a URL to the resource is included either inline in the text or in a footnote, but not in the references section/bibliography
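A consistency check in the spirit of the data validation described earlier could look like the sketch below. It assumes that each non-blank cell in a citation-format column holds exactly one of the seven values listed above, spelled exactly as shown; whether the real file follows this strictly is not stated here, so treat that as an assumption. The function name `check_citation_values` and the sample values are hypothetical.

```python
from collections import Counter

# The seven citation formats documented in the list above.
CITATION_FORMATS = {
    "No",
    "Mentioned by name",
    "Data repository citation",
    "Paper citation",
    "URL citation",
    "Included",
    "Footnote/inline URL",
}


def check_citation_values(values):
    """Return (counts of recognised values, sorted list of unexpected values).

    Blank cells are skipped; matching is case-sensitive (an assumption).
    """
    cleaned = [v.strip() for v in values if v.strip()]
    counts = Counter(v for v in cleaned if v in CITATION_FORMATS)
    unexpected = sorted({v for v in cleaned if v not in CITATION_FORMATS})
    return counts, unexpected


# Made-up column contents, including one deliberately mis-cased entry.
column = ["Paper citation", "No", "paper citation", "", "URL citation", "Paper citation"]
counts, unexpected = check_citation_values(column)
print(dict(counts))
print(unexpected)
```

Reporting unexpected values separately, instead of silently dropping them, makes typos in the spreadsheet easy to spot before any analysis is run.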

Analysis

A summary initial analysis of the data is presented in the Jupyter notebook analysis.ipynb. Results from this analysis were presented at the UKLFT Annual Meeting in Liverpool in May 2022 and at the 39th annual symposium on Lattice Field Theory (LATTICE 2022).

Version history

  • 1.1.2: Fix error in previous release; minor plot tidying
  • 1.1.1: Allow plotting style to be switched for Lattice 2022 proceedings
  • 1.1.0: Updated analysis notebook as presented at Lattice 2022, removing split between UK and non-UK. Field "Lattice data grid acknowledged" renamed to "Configuration hosting infrastructure acknowledged" to avoid confusion with the specific ILDG Regional Grid called the "Lattice Data Grid".
  • 1.0.1: minor update to cropping in analysis notebook
  • 1.0.0: Initial release.

Owner

  • Name: Ed Bennett
  • Login: edbennett
  • Kind: user
  • Location: Swansea, UK
  • Company: Swansea Academy of Advanced Computing, Swansea University

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use these data or the attached analysis code, please cite it as below."
authors:
  - family-names: Bennett
    given-names: Ed
    orcid: https://orcid.org/0000-0002-1678-6701
title: "Survey of reproducibility in hep-lat publications in 2021"
version: v1.1.0
date-released: 2022-08-09

Dependencies

environment.yml conda
  • bzip2 1.0.8.*
  • ca-certificates 2022.4.26.*
  • certifi 2022.5.18.1.*
  • libcxx 12.0.0.*
  • libffi 3.3.*
  • ncurses 6.3.*
  • openssl 1.1.1o.*
  • pip 21.2.4.*
  • python 3.10.4.*
  • readline 8.1.2.*
  • setuptools 61.2.0.*
  • sqlite 3.38.3.*
  • tk 8.6.11.*
  • tzdata 2022a.*
  • wheel 0.37.1.*
  • xz 5.2.5.*
  • zlib 1.2.12.*