https://github.com/alan-turing-institute/arc-tigers

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: alan-turing-institute
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 927 KB
Statistics
  • Stars: 0
  • Watchers: 4
  • Forks: 0
  • Open Issues: 7
  • Releases: 0
Created 11 months ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Code of conduct

README.md

ARC-TIGERS

Actions Status

Testing Imbalanced cateGory classifiERS

Installation

```bash
git clone https://github.com/alan-turing-institute/ARC-TIGERS
cd ARC-TIGERS
python -m pip install .
```

Usage

scripts/dataset_download.py

This script downloads a Reddit dataset and saves it in an appropriate place in the parent directory. It takes the following arguments:
  • dataset_name: the name of the dataset to load from Hugging Face, for example bit0/reddit_dataset_12.
  • target_subreddits: a list of the subreddits used for the experiment; this should be in .json format.
  • max_rows: the maximum number of rows in the resulting dataset; this should be an integer.
It saves a .json file called filtered_rows containing the data in a subdirectory named after the dataset name and the maximum number of rows.
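The filtering step can be sketched in plain Python. This is an illustrative sketch only, not the script's actual implementation: the `subreddit` field name and the `filter_rows` helper are assumptions about the data layout.

```python
import json

def filter_rows(rows, target_subreddits, max_rows):
    """Keep rows whose subreddit is in the target set, capped at max_rows."""
    filtered = [r for r in rows if r["subreddit"] in target_subreddits]
    return filtered[:max_rows]

rows = [
    {"subreddit": "python", "text": "snake post"},
    {"subreddit": "cats", "text": "cat post"},
    {"subreddit": "python", "text": "another snake post"},
]

# Keep at most 2 rows from the target subreddit, then save as JSON,
# mirroring the filtered_rows output described above.
filtered = filter_rows(rows, {"python"}, max_rows=2)
print(json.dumps(filtered, indent=2))
```

In the real script the rows would come from the Hugging Face dataset named by dataset_name rather than an in-memory list.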

scripts/dataset_generation.py

This script generates train and test splits from the downloaded Reddit dataset(s). It currently takes the following arguments:
  • data_dir: the path to the dataset used to form the splits.
  • split: the specific subreddit split to generate, as defined in the DATASET_COMBINATIONS dictionary in /data/utils.
  • r: the ratio of target subreddits to non-target subreddits.

The script saves two CSV files, train.csv and test.csv, in a splits subdirectory; these contain the train and evaluation splits and are roughly equal in size.
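The role of the ratio r can be sketched as follows. This is a hypothetical illustration of how a target:non-target ratio might be applied, not the script's actual logic; the `make_split` helper and its arguments are assumptions.

```python
import random

def make_split(target_rows, other_rows, r, seed=0):
    """Combine all target rows with enough non-target rows to give a
    target:non-target ratio of r, then shuffle the result."""
    rng = random.Random(seed)
    n_other = int(len(target_rows) / r)  # r = n_target / n_other
    sampled = rng.sample(other_rows, min(n_other, len(other_rows)))
    combined = list(target_rows) + sampled
    rng.shuffle(combined)
    return combined

# 10 target rows at r = 0.5 require 20 non-target rows, giving 30 in total.
split = make_split(["target"] * 10, ["other"] * 100, r=0.5)
print(len(split), split.count("target"))
```

The real script additionally divides the combined data into the roughly equal train.csv and test.csv files described above.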

Contributing

See CONTRIBUTING.md for instructions on how to contribute.

License

Distributed under the terms of the MIT license.

Owner

  • Name: The Alan Turing Institute
  • Login: alan-turing-institute
  • Kind: organization
  • Email: info@turing.ac.uk

The UK's national institute for data science and artificial intelligence.

GitHub Events

Total
  • Create event: 13
  • Issues event: 20
  • Delete event: 10
  • Issue comment event: 7
  • Member event: 1
  • Public event: 1
  • Push event: 155
  • Pull request review event: 41
  • Pull request review comment event: 47
  • Pull request event: 21
Last Year
  • Create event: 13
  • Issues event: 20
  • Delete event: 10
  • Issue comment event: 7
  • Member event: 1
  • Public event: 1
  • Push event: 155
  • Pull request review event: 41
  • Pull request review comment event: 47
  • Pull request event: 21

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 6
  • Total pull requests: 1
  • Average time to close issues: 9 days
  • Average time to close pull requests: 7 days
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 1
  • Average time to close issues: 9 days
  • Average time to close pull requests: 7 days
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • J-Dymond (7)
  • klh5 (7)
  • jack89roberts (1)
Pull Request Authors
  • J-Dymond (10)
  • jack89roberts (5)

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3.1.4 composite
  • pre-commit/action v3.0.0 composite
pyproject.toml pypi
uv.lock pypi
  • aiohappyeyeballs 2.6.1
  • aiohttp 3.11.18
  • aiosignal 1.3.2
  • arc-tigers 0.1.0
  • async-timeout 5.0.1
  • attrs 25.3.0
  • certifi 2025.4.26
  • cfgv 3.4.0
  • charset-normalizer 3.4.1
  • colorama 0.4.6
  • coverage 7.8.0
  • datasets 3.5.1
  • dill 0.3.8
  • distlib 0.3.9
  • exceptiongroup 1.2.2
  • filelock 3.18.0
  • frozenlist 1.6.0
  • fsspec 2025.3.0
  • huggingface-hub 0.30.2
  • identify 2.6.10
  • idna 3.10
  • iniconfig 2.1.0
  • jinja2 3.1.6
  • markupsafe 3.0.2
  • mpmath 1.3.0
  • multidict 6.4.3
  • multiprocess 0.70.16
  • networkx 3.4.2
  • nodeenv 1.9.1
  • numpy 2.2.5
  • nvidia-cublas-cu12 12.6.4.1
  • nvidia-cuda-cupti-cu12 12.6.80
  • nvidia-cuda-nvrtc-cu12 12.6.77
  • nvidia-cuda-runtime-cu12 12.6.77
  • nvidia-cudnn-cu12 9.5.1.17
  • nvidia-cufft-cu12 11.3.0.4
  • nvidia-cufile-cu12 1.11.1.6
  • nvidia-curand-cu12 10.3.7.77
  • nvidia-cusolver-cu12 11.7.1.2
  • nvidia-cusparse-cu12 12.5.4.2
  • nvidia-cusparselt-cu12 0.6.3
  • nvidia-nccl-cu12 2.26.2
  • nvidia-nvjitlink-cu12 12.6.85
  • nvidia-nvtx-cu12 12.6.77
  • packaging 25.0
  • pandas 2.2.3
  • platformdirs 4.3.7
  • pluggy 1.5.0
  • pre-commit 4.2.0
  • propcache 0.3.1
  • pyarrow 20.0.0
  • pytest 8.3.5
  • pytest-cov 6.1.1
  • python-dateutil 2.9.0.post0
  • pytz 2025.2
  • pyyaml 6.0.2
  • regex 2024.11.6
  • requests 2.32.3
  • safetensors 0.5.3
  • setuptools 80.0.1
  • six 1.17.0
  • sympy 1.14.0
  • tokenizers 0.21.1
  • tomli 2.2.1
  • torch 2.7.0
  • tqdm 4.67.1
  • transformers 4.51.3
  • triton 3.3.0
  • typing-extensions 4.13.2
  • tzdata 2025.2
  • urllib3 2.4.0
  • virtualenv 20.30.0
  • xxhash 3.5.0
  • yarl 1.20.0