https://github.com/alan-turing-institute/arc-tigers

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: alan-turing-institute
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 927 KB
Statistics
  • Stars: 0
  • Watchers: 4
  • Forks: 0
  • Open Issues: 7
  • Releases: 0
Created 11 months ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Code of conduct

README.md

ARC-TIGERS

Actions Status

Testing Imbalanced cateGory classifiERS

Installation

```bash
git clone https://github.com/alan-turing-institute/ARC-TIGERS
cd ARC-TIGERS
python -m pip install .
```

Usage

scripts/dataset_download.py

This script downloads a Reddit dataset and saves it in an appropriate place in the parent directory. It takes the following arguments:
  • dataset_name: the name of the dataset to load from Hugging Face, for example bit0/reddit_dataset_12.
  • target_subreddits: a list of the subreddits used for the experiment; this should be in .json format.
  • max_rows: the maximum number of rows in the resulting dataset; this should be an integer.
It saves a .json file called filtered_rows containing the data in a subdirectory named after the dataset name and the maximum number of rows.
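The filtering step can be sketched in plain Python. This is an illustrative sketch only, not the script's actual implementation: the `subreddit` field name and the `filter_rows` helper are assumptions about the data layout.

```python
import json

def filter_rows(rows, target_subreddits, max_rows):
    """Keep rows whose subreddit is in the target set, capped at max_rows."""
    filtered = [r for r in rows if r["subreddit"] in target_subreddits]
    return filtered[:max_rows]

rows = [
    {"subreddit": "python", "text": "snake post"},
    {"subreddit": "cats", "text": "cat post"},
    {"subreddit": "python", "text": "another snake post"},
]

# Keep at most 2 rows from the target subreddit, then save as JSON,
# mirroring the filtered_rows output described above.
filtered = filter_rows(rows, {"python"}, max_rows=2)
print(json.dumps(filtered, indent=2))
```

In the real script the rows would come from the Hugging Face dataset named by dataset_name rather than an in-memory list.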

scripts/dataset_generation.py

This script generates train and test splits from the downloaded Reddit dataset(s). It currently takes the following arguments:
  • data_dir: the path to the dataset used to form the splits.
  • split: the specific subreddit split to generate, as defined in the DATASET_COMBINATIONS dictionary in /data/utils.
  • r: the ratio of target subreddits to non-target subreddits.

The script saves two CSV files, train.csv and test.csv, in a splits subdirectory; these contain the train and evaluation splits and are roughly equal in size.
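The role of the ratio r can be sketched as follows. This is a hypothetical illustration of how a target:non-target ratio might be applied, not the script's actual logic; the `make_split` helper and its arguments are assumptions.

```python
import random

def make_split(target_rows, other_rows, r, seed=0):
    """Combine all target rows with enough non-target rows to give a
    target:non-target ratio of r, then shuffle the result."""
    rng = random.Random(seed)
    n_other = int(len(target_rows) / r)  # r = n_target / n_other
    sampled = rng.sample(other_rows, min(n_other, len(other_rows)))
    combined = list(target_rows) + sampled
    rng.shuffle(combined)
    return combined

# 10 target rows at r = 0.5 require 20 non-target rows, giving 30 in total.
split = make_split(["target"] * 10, ["other"] * 100, r=0.5)
print(len(split), split.count("target"))
```

The real script additionally divides the combined data into the roughly equal train.csv and test.csv files described above.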

Contributing

See CONTRIBUTING.md for instructions on how to contribute.

License

Distributed under the terms of the MIT license.

Owner

  • Name: The Alan Turing Institute
  • Login: alan-turing-institute
  • Kind: organization
  • Email: info@turing.ac.uk

The UK's national institute for data science and artificial intelligence.

GitHub Events

Total
  • Create event: 13
  • Issues event: 20
  • Delete event: 10
  • Issue comment event: 7
  • Member event: 1
  • Public event: 1
  • Push event: 155
  • Pull request review event: 41
  • Pull request review comment event: 47
  • Pull request event: 21
Last Year
  • Create event: 13
  • Issues event: 20
  • Delete event: 10
  • Issue comment event: 7
  • Member event: 1
  • Public event: 1
  • Push event: 155
  • Pull request review event: 41
  • Pull request review comment event: 47
  • Pull request event: 21

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 6
  • Total pull requests: 1
  • Average time to close issues: 9 days
  • Average time to close pull requests: 7 days
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 1
  • Average time to close issues: 9 days
  • Average time to close pull requests: 7 days
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • J-Dymond (7)
  • klh5 (7)
  • jack89roberts (1)
Pull Request Authors
  • J-Dymond (10)
  • jack89roberts (5)

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3.1.4 composite
  • pre-commit/action v3.0.0 composite
pyproject.toml pypi
uv.lock pypi
  • aiohappyeyeballs 2.6.1
  • aiohttp 3.11.18
  • aiosignal 1.3.2
  • arc-tigers 0.1.0
  • async-timeout 5.0.1
  • attrs 25.3.0
  • certifi 2025.4.26
  • cfgv 3.4.0
  • charset-normalizer 3.4.1
  • colorama 0.4.6
  • coverage 7.8.0
  • datasets 3.5.1
  • dill 0.3.8
  • distlib 0.3.9
  • exceptiongroup 1.2.2
  • filelock 3.18.0
  • frozenlist 1.6.0
  • fsspec 2025.3.0
  • huggingface-hub 0.30.2
  • identify 2.6.10
  • idna 3.10
  • iniconfig 2.1.0
  • jinja2 3.1.6
  • markupsafe 3.0.2
  • mpmath 1.3.0
  • multidict 6.4.3
  • multiprocess 0.70.16
  • networkx 3.4.2
  • nodeenv 1.9.1
  • numpy 2.2.5
  • nvidia-cublas-cu12 12.6.4.1
  • nvidia-cuda-cupti-cu12 12.6.80
  • nvidia-cuda-nvrtc-cu12 12.6.77
  • nvidia-cuda-runtime-cu12 12.6.77
  • nvidia-cudnn-cu12 9.5.1.17
  • nvidia-cufft-cu12 11.3.0.4
  • nvidia-cufile-cu12 1.11.1.6
  • nvidia-curand-cu12 10.3.7.77
  • nvidia-cusolver-cu12 11.7.1.2
  • nvidia-cusparse-cu12 12.5.4.2
  • nvidia-cusparselt-cu12 0.6.3
  • nvidia-nccl-cu12 2.26.2
  • nvidia-nvjitlink-cu12 12.6.85
  • nvidia-nvtx-cu12 12.6.77
  • packaging 25.0
  • pandas 2.2.3
  • platformdirs 4.3.7
  • pluggy 1.5.0
  • pre-commit 4.2.0
  • propcache 0.3.1
  • pyarrow 20.0.0
  • pytest 8.3.5
  • pytest-cov 6.1.1
  • python-dateutil 2.9.0.post0
  • pytz 2025.2
  • pyyaml 6.0.2
  • regex 2024.11.6
  • requests 2.32.3
  • safetensors 0.5.3
  • setuptools 80.0.1
  • six 1.17.0
  • sympy 1.14.0
  • tokenizers 0.21.1
  • tomli 2.2.1
  • torch 2.7.0
  • tqdm 4.67.1
  • transformers 4.51.3
  • triton 3.3.0
  • typing-extensions 4.13.2
  • tzdata 2025.2
  • urllib3 2.4.0
  • virtualenv 20.30.0
  • xxhash 3.5.0
  • yarl 1.20.0