only-time-will-tell
Only Time Will Tell: Replication package
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 10 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.5%) to scientific vocabulary
Keywords
Repository
Only Time Will Tell: Replication package
Basic Info
Statistics
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 2
Topics
Metadata Files
README.md
Only Time Will Tell: Replication package
Simulation code for the publication "Only Time Will Tell: Modelling Information Diffusion in Code Review With Time-Varying Hypergraphs"
🏆🏆🏆 ESEM Best Paper Award 2022 🏆🏆🏆
Data
The results of the simulation can be found on Zenodo.
Prerequisites
The simulation requires
- at least 50 GB storage,
- 16 GB RAM,
- a powerful CPU for running the entire simulation (for options, see next section), and
- Python 3.8 or higher with the dependencies specified in `requirements.txt` (installed via `pip3 install -r requirements.txt`).
However, we highly recommend significantly more resources and Python 3.9 or later.
If you want to create or change the plots, please install and use Jupyter.
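Before committing to a long run, a quick pre-flight check of the prerequisites above can be scripted. Below is a minimal sketch using only the Python standard library; it is not part of the replication package, and the thresholds simply mirror the stated minimums (Python 3.8+, at least 50 GB of free storage):

```python
import shutil
import sys

# Hypothetical pre-flight check mirroring the stated prerequisites.
MIN_PYTHON = (3, 8)
MIN_FREE_BYTES = 50 * 10**9  # at least 50 GB of free storage

if sys.version_info < MIN_PYTHON:
    sys.exit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
             f"found {sys.version.split()[0]}")

free = shutil.disk_usage(".").free
if free < MIN_FREE_BYTES:
    sys.exit(f"Need ~50 GB free storage, only {free / 10**9:.1f} GB available")

print("Prerequisites look OK")
```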
Run simulation
- Download or pull this repository
- `cd only-time-will-tell` (or the directory it is located in)
- run `pip3 install -r requirements.txt`
- run `python3 -m simulation` to run the simulation. With the optional flags `--time_ignoring_only` and `--time_respecting_only` you can run the simulation with the time-ignoring model or the time-respecting model, respectively.
Although highly hardware-dependent, we recommend planning for the simulation run to take at least 10-20 minutes with `--time_ignoring_only` and 2-4 hours with `--time_respecting_only`. The option `--skip_storing_reachables` saves you about 50 GB of data on your local drive and about 10-20 minutes of runtime, but also does not allow you to check those intermediate results.
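The three flags above are the package's documented interface. Purely as an illustration of how such a command-line interface might be wired up, here is a hedged `argparse` sketch; the mutually exclusive grouping, help strings, and variable names are assumptions, not the package's actual code:

```python
import argparse

# Hypothetical reconstruction of the CLI described above; only the flag
# names --time_ignoring_only, --time_respecting_only, and
# --skip_storing_reachables come from the README.
parser = argparse.ArgumentParser(prog="python3 -m simulation")
group = parser.add_mutually_exclusive_group()
group.add_argument("--time_ignoring_only", action="store_true",
                   help="run only the time-ignoring model")
group.add_argument("--time_respecting_only", action="store_true",
                   help="run only the time-respecting model")
parser.add_argument("--skip_storing_reachables", action="store_true",
                    help="skip writing ~50 GB of intermediate reachables")

args = parser.parse_args()
run_time_ignoring = not args.time_respecting_only
run_time_respecting = not args.time_ignoring_only
```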
Tests and verification
`python3 -m unittest discover` runs all tests.
The outputs are reproducible and hashable: verify the files using hashes such as SHA-256. The plots can be reproduced with the Jupyter notebook in the folder `notebooks`.
To verify the results, run

```
shasum -a 256 results/*.json
```

and compare the hash values of our results:

```
d455f1e37237014830fa9aaca76232594c92c193c241b71ca28ec69969163daf  results/time_ignoring_reachables.json
4a6b2e596f24a3f00784851789cc0244f3688c1a01316e0aef96b0b7add233a3  results/time_ignoring_upper_bound.json
f8e6472e74819e6ac74c4bb7ae16aac3d75728da0b795448d849951eb4dd3bd6  results/time_respecting_reachables.json
a07356ef7bf8b8af152e95857c39cbb8138c63ad110f455a309083545b94cbb5  results/time_respecting_upper_bound.json
```
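If `shasum` is unavailable (e.g., on Windows), the same check can be done with Python's standard library. A minimal sketch using `hashlib` and the digests listed above:

```python
import hashlib

# Expected SHA-256 digests, copied from the list above.
EXPECTED = {
    "results/time_ignoring_reachables.json":
        "d455f1e37237014830fa9aaca76232594c92c193c241b71ca28ec69969163daf",
    "results/time_ignoring_upper_bound.json":
        "4a6b2e596f24a3f00784851789cc0244f3688c1a01316e0aef96b0b7add233a3",
    "results/time_respecting_reachables.json":
        "f8e6472e74819e6ac74c4bb7ae16aac3d75728da0b795448d849951eb4dd3bd6",
    "results/time_respecting_upper_bound.json":
        "a07356ef7bf8b8af152e95857c39cbb8138c63ad110f455a309083545b94cbb5",
}

for path, expected in EXPECTED.items():
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks; the result files are large.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    status = "OK" if digest.hexdigest() == expected else "MISMATCH"
    print(f"{status}  {path}")
```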
Design decisions
All computations and simulations are packed into an executable Python module, which allows testing the code thoroughly and running it quickly via the command line. Only the visualization is a Jupyter notebook and is not covered by our test setup.
We use JSON to store our simulation results despite its limitations (i.e., no native time or set type) because it is widely adopted and supports dictionary-like data (in contrast to table-like data formats such as HDF5 or Apache Arrow). We decided against Python's internal serialization module `pickle` due to its inherent security issues and poor performance. Since JSON does not support sets, we use sorted arrays for the reachables. Writing an adjacency matrix as CSV is an order of magnitude slower than our approach.
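As a small illustration of the sorted-array encoding of sets, here is a sketch using `orjson` (which appears under Dependencies below); the payload structure and field names are invented for the example, not the package's actual schema:

```python
import orjson

# Hypothetical example: a set of reachable vertex IDs is not directly
# JSON-serializable, so it is stored as a sorted array instead.
reachables = {42, 7, 19}

payload = {"vertex": 7, "reachables": sorted(reachables)}
blob = orjson.dumps(payload)  # b'{"vertex":7,"reachables":[7,19,42]}'

# Round-trip: the array converts back into the original set.
restored = orjson.loads(blob)
assert set(restored["reachables"]) == reachables
```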
At the current state, we do not support multiple cores since the whole graph is kept in memory (about 100 GB peak memory footprint), which causes performance issues on Windows and macOS due to their restrictions on copy-on-write (COW) and process forking. Please find more information here and here.
We used an explicit caching approach, i.e., we precomputed each needed relation as a dict to improve performance. The built-in caching via `functools.cache` is not intended to work properly with instance methods (such as `hypergraph.vertices()`). For custom-made solutions such as the one suggested here, I am not smart enough.
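A minimal sketch of the precomputed-dict idea; the hypergraph representation and names here are illustrative assumptions, not the package's actual API:

```python
from collections import defaultdict

# Hypothetical hypergraph: each hyperedge is a set of vertices.
hyperedges = [{1, 2}, {2, 3, 4}, {4, 5}]

# Explicit caching: precompute the vertex -> incident hyperedges relation
# once as a plain dict, instead of recomputing it per query or relying on
# functools.cache, which also keeps the instance alive when applied to
# methods.
edges_of = defaultdict(list)
for edge in hyperedges:
    for vertex in edge:
        edges_of[vertex].append(edge)

# Every later lookup is a constant-time dict access.
print(edges_of[2])  # [{1, 2}, {2, 3, 4}]
print(edges_of[4])  # [{2, 3, 4}, {4, 5}]
```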
Owner
- Name: Michael Dorner
- Login: michaeldorner
- Kind: user
- Location: Sweden
- Company: Blekinge Institute of Technology
- Website: www.michaeldorner.de
- Repositories: 33
- Profile: https://github.com/michaeldorner
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Michael Dorner | m****l@m****e | 142 |
| Andreas Bauer | a****r | 2 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: about 11 hours
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.33
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- andreas-bauer (2)
- codacy-badger (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- matplotlib >=3.4.2
- networkx >=2.6.2
- orjson >=3.6.3
- pandas >=1.3.2
- tqdm >=4.61.2
- actions/checkout master composite
- actions/setup-python master composite
- codecov/codecov-action v2 composite