https://github.com/climatecompatiblegrowth/topic_classification

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.1%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ClimateCompatibleGrowth
  • Language: Python
  • Default Branch: main
  • Size: 26.4 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 3
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme

README.md

Topic Classification Service

An API service for academic topic classification based on OpenAlex's predictor model.

Prerequisites

  • Python 3.10+
  • curl or wget for downloading model artifacts

Model and Artifacts

  1. Download the trained model and artifacts:

```bash
wget https://zenodo.org/records/10568402/files/topic_classifier_v1_artifacts.tar.gz
```

  2. Create the model directory and extract the artifacts into it:

```bash
mkdir -p model
tar -xzf topic_classifier_v1_artifacts.tar.gz -C model
```
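Where wget and tar are not available, the same two steps can be sketched with Python's standard library. This is a minimal sketch rather than part of the project: the function names `extract_artifacts` and `download_and_extract` are hypothetical, and the archive is assumed to be an ordinary gzip-compressed tarball.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the download step above.
ARTIFACTS_URL = (
    "https://zenodo.org/records/10568402/files/"
    "topic_classifier_v1_artifacts.tar.gz"
)


def extract_artifacts(archive_path: Path, target_dir: Path) -> list[str]:
    """Extract a .tar.gz archive into target_dir and return its member names."""
    target_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p model`
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(target_dir)  # like `tar -xzf ... -C model`
    return names


def download_and_extract(url: str = ARTIFACTS_URL,
                         target_dir: Path = Path("model")) -> list[str]:
    """Download the archive next to the working directory, then extract it."""
    archive_path = Path(url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, archive_path)
    return extract_artifacts(archive_path, target_dir)
```

Calling `download_and_extract()` with no arguments would fetch the archive and unpack it into `model`, matching the shell commands.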

Development Setup

  1. Install the uv package manager:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

  2. Create and activate a virtual environment:

```bash
uv venv
source .venv/bin/activate
```

  3. Install dependencies:

```bash
uv pip install -r requirements.txt --no-cache-dir
```

  4. Start the development server, replacing <PORT> with the port to listen on:

```bash
uvicorn main:app --reload --port <PORT>
```

API Endpoints

Health Check

  • Endpoint: /health_check
  • Method: GET
  • Description: Checks the service and model health.
  • Example Request:

```bash
curl http://localhost:<PORT>/health_check
```

  • Example Response:

```json
{
  "status": "healthy",
  "model": "loaded"
}
```
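A client script may want to wait until the service reports a loaded model before sending predictions. The following is a minimal sketch using only the standard library; `is_healthy` and `wait_until_healthy` are hypothetical helper names, and the retry count and delay are arbitrary assumptions. It relies only on the response shape shown above.

```python
import json
import time
import urllib.request


def is_healthy(payload) -> bool:
    """Return True when the payload matches the healthy response shown above."""
    return (
        isinstance(payload, dict)
        and payload.get("status") == "healthy"
        and payload.get("model") == "loaded"
    )


def wait_until_healthy(base_url: str, attempts: int = 10,
                       delay_seconds: float = 2.0) -> bool:
    """Poll /health_check until it reports healthy or attempts run out."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health_check",
                                        timeout=5) as response:
                if is_healthy(json.load(response)):
                    return True
        except (OSError, ValueError):
            pass  # server not reachable yet, or the body was not valid JSON
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

For example, `wait_until_healthy("http://localhost:8000")` would poll a server started on port 8000; the port number here is only an illustration.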

Single Paper Prediction

  • Endpoint: /single
  • Method: POST
  • Description: Predicts topics for a single academic paper.
  • Input Data Format:

```json
[
  {
    "title": "Multiplication of matrices of arbitrary shape on a data parallel computer",
    "abstract_inverted_index": {
      "Some": [0], "level-2": [1], "and": [2], "level-3": [3],
      "Distributed": [4], "Basic": [5], "Linear": [6], "Algebra": [7],
      "Subroutines": [8], "(DBLAS)": [9], "that": [10], "have": [11],
      "been": [12], "implemented": [13], "on": [14, 26], "the": [15, 27],
      "Connection": [16], "Machine": [17], "system": [18], "CM-200": [19],
      "are": [20], "described.": [21], "No": [22], "assumption": [23],
      "is": [24], "made": [25], "shape": [28], "or": [29], "...": [30]
    },
    "journal_display_name": "Fire Safety Science",
    "referenced_works": [
      "https://openalex.org/W183327403",
      "https://openalex.org/W1851212222",
      "https://openalex.org/W1967958850",
      "https://openalex.org/W1988425770",
      "https://openalex.org/W1991286031",
      "https://openalex.org/W2029342163",
      "https://openalex.org/W2045381439",
      "https://openalex.org/W2053280233",
      "https://openalex.org/W2071782145",
      "https://openalex.org/W2083202979",
      "https://openalex.org/W2104487100",
      "https://openalex.org/W4234919994"
    ],
    "inverted": true
  }
]
```

  • Example Request:

```python
import requests
import json

# Replace <PORT> with the port passed to uvicorn.
url = "http://localhost:<PORT>/single"
headers = {"Content-Type": "application/json"}
with open("test_samples/test_json_single.json", "r") as f:
    data = json.load(f)
response = requests.post(url, headers=headers, json=data)
print(response.json())
```

  • Example Response:

```json
[
  [
    {
      "topic_id": 10829,
      "topic_label": "829: Networks on Chip in System-on-Chip Design",
      "topic_score": 0.9978
    },
    {
      "topic_id": 10054,
      "topic_label": "54: Parallel Computing and Performance Optimization",
      "topic_score": 0.9963
    },
    {
      "topic_id": 11522,
      "topic_label": "1522: Design and Optimization of Field-Programmable Gate Arrays and Application-Specific Integrated Circuits",
      "topic_score": 0.991
    }
  ]
]
```
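The inverted abstract in the request above maps each word to the list of positions at which it occurs. A minimal sketch of converting between plain text and that structure follows. It assumes simple whitespace tokenization, which may differ from how OpenAlex builds its own indexes, and the function names are hypothetical.

```python
def build_inverted_index(abstract: str) -> dict[str, list[int]]:
    """Map each whitespace-separated word to the positions where it occurs."""
    index: dict[str, list[int]] = {}
    for position, word in enumerate(abstract.split()):
        index.setdefault(word, []).append(position)
    return index


def restore_abstract(index: dict[str, list[int]]) -> str:
    """Rebuild the text by placing every word back at its positions."""
    placed = [
        (position, word)
        for word, positions in index.items()
        for position in positions
    ]
    return " ".join(word for _, word in sorted(placed))


text = "No assumption is made on the shape or the size"
index = build_inverted_index(text)
print(index["the"])             # positions of the repeated word "the"
print(restore_abstract(index))  # round-trips back to the original text
```

Words that repeat, such as "on" and "the" in the request above, carry more than one position, which is why the values are lists.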

Non-inverted Abstract Prediction

  • Endpoint: /single
  • Method: POST
  • Description: Predicts topics for a single academic paper with a non-inverted (plain-text) abstract.
  • Input Data Format:

```json
[
  {
    "title": "The renewable energy role in the global energy Transformations",
    "abstract": "In a comprehensive analysis of the global transition towards renewable energy, the study revealed...",
    "abstract_inverted_index": {},
    "journal_display_name": "Renewable energy focus",
    "referenced_works": [
      "https://openalex.org/W2275853436",
      "https://openalex.org/W2412247133",
      "https://openalex.org/W2545730423",
      ...,
      "https://openalex.org/W2601431494"
    ],
    "inverted": false
  }
]
```

  • Example Request:

```python
import requests
import json

# Replace <PORT> with the port passed to uvicorn.
url = "http://localhost:<PORT>/single"
headers = {"Content-Type": "application/json"}
with open("test_samples/test_json_single_not_inverted.json", "r") as f:
    data = json.load(f)
response = requests.post(url, headers=headers, json=data)
print(response.json())
```

  • Example Response:

```json
[
  [
    {
      "topic_id": 12639,
      "topic_label": "2639: Global Energy Transition and Fossil Fuel Depletion",
      "topic_score": 0.9951
    },
    {
      "topic_id": 11185,
      "topic_label": "1185: Integration of Renewable Energy Systems in Power Grids",
      "topic_score": 0.9747
    },
    {
      "topic_id": 12129,
      "topic_label": "2129: Energy Supply and Security Issues for Developed Economies",
      "topic_score": 0.9722
    }
  ]
]
```
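When the paper metadata lives in program variables rather than a JSON file, the request body can be assembled directly. The sketch below mirrors the field names from the input format above; the helper name `make_plain_abstract_payload` is hypothetical, and whether the service accepts other field combinations is not something this sketch assumes.

```python
import json


def make_plain_abstract_payload(title, abstract, journal, referenced_works):
    """Build a request body for one paper whose abstract is plain text."""
    paper = {
        "title": title,
        "abstract": abstract,
        "abstract_inverted_index": {},  # left empty, as in the example above
        "journal_display_name": journal,
        "referenced_works": list(referenced_works),
        "inverted": False,  # the example sets this to false for plain text
    }
    return [paper]  # the example wraps the single paper in a list


payload = make_plain_abstract_payload(
    title="The renewable energy role in the global energy Transformations",
    abstract="In a comprehensive analysis of the global transition...",
    journal="Renewable energy focus",
    referenced_works=["https://openalex.org/W2275853436"],
)
print(json.dumps(payload, indent=2))
```

The resulting `payload` can be passed as the `json=` argument of `requests.post` in place of the data loaded from a file.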

Batch Paper Prediction

  • Endpoint: /batch
  • Method: POST
  • Description: Predicts topics for a batch of academic papers.
  • Example Request:

```python
import requests
import json

# Replace <PORT> with the port passed to uvicorn.
url = "http://localhost:<PORT>/batch"
headers = {"Content-Type": "application/json"}
with open("test_samples/test_json_batch.json", "r") as f:
    data = json.load(f)
response = requests.post(url, headers=headers, json=data)
print(response.json())
```

  • Example Response:

```json
[
  [
    {
      "topic_id": 10829,
      "topic_label": "829: Networks on Chip in System-on-Chip Design",
      "topic_score": 0.9978
    },
    {
      "topic_id": 10054,
      "topic_label": "54: Parallel Computing and Performance Optimization",
      "topic_score": 0.9962
    },
    {
      "topic_id": 11522,
      "topic_label": "1522: Design and Optimization of Field-Programmable Gate Arrays and Application-Specific Integrated Circuits",
      "topic_score": 0.9909
    }
  ],
  [
    {
      "topic_id": 10110,
      "topic_label": "110: Seismicity and Tectonic Plate Interactions",
      "topic_score": 0.9995
    },
    {
      "topic_id": 12157,
      "topic_label": "2157: Machine Learning for Mineral Prospectivity Mapping",
      "topic_score": 0.9933
    },
    {
      "topic_id": 10399,
      "topic_label": "399: Characterization of Shale Gas Pore Structure",
      "topic_score": 0.991
    }
  ]
]
```
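The batch response is a list with one inner list of topics per paper. A minimal sketch of turning it into a per-paper summary is shown below. It assumes the inner lists come back in the same order as the submitted papers, which the README does not state explicitly; the helper name `summarize_batch` and the `min_score` threshold are hypothetical.

```python
def summarize_batch(titles, predictions, min_score=0.0):
    """Pair each title with its topic labels, highest score first."""
    if len(titles) != len(predictions):
        raise ValueError("expected one prediction list per title")
    summary = {}
    for title, topics in zip(titles, predictions):
        # Keep topics at or above the threshold, sorted by descending score.
        kept = [t for t in topics if t["topic_score"] >= min_score]
        kept.sort(key=lambda t: t["topic_score"], reverse=True)
        summary[title] = [t["topic_label"] for t in kept]
    return summary


# Shortened copy of the example response above; titles are placeholders.
predictions = [
    [
        {"topic_id": 10829,
         "topic_label": "829: Networks on Chip in System-on-Chip Design",
         "topic_score": 0.9978},
        {"topic_id": 10054,
         "topic_label": "54: Parallel Computing and Performance Optimization",
         "topic_score": 0.9962},
    ],
    [
        {"topic_id": 10110,
         "topic_label": "110: Seismicity and Tectonic Plate Interactions",
         "topic_score": 0.9995},
        {"topic_id": 10399,
         "topic_label": "399: Characterization of Shale Gas Pore Structure",
         "topic_score": 0.991},
    ],
]
print(summarize_batch(["paper A", "paper B"], predictions, min_score=0.997))
```

Raising `min_score` trims lower-confidence topics; leaving it at the default keeps every topic the service returned.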

License

This project uses OpenAlex's topic classification model. Please refer to their license for terms of use.

Owner

  • Name: Climate Compatible Growth
  • Login: ClimateCompatibleGrowth
  • Kind: organization
  • Location: United Kingdom

GitHub Events

Total
  • Issues event: 5
  • Delete event: 1
  • Member event: 1
  • Push event: 5
  • Pull request review event: 1
  • Pull request event: 2
  • Create event: 3
Last Year
  • Issues event: 5
  • Delete event: 1
  • Member event: 1
  • Push event: 5
  • Pull request review event: 1
  • Pull request event: 2
  • Create event: 3

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 4
  • Total pull requests: 1
  • Average time to close issues: 9 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 1
  • Average time to close issues: 9 days
  • Average time to close pull requests: 1 day
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • FrancisTembo (4)
Pull Request Authors
  • FrancisTembo (1)
Top Labels
Issue Labels
bug (1)
Pull Request Labels

Dependencies

requirements.txt pypi
  • absl-py ==2.0.0
  • aiohttp ==3.8.6
  • aiosignal ==1.3.1
  • amqp ==5.1.1
  • anyio ==4.0.0
  • argon2-cffi ==23.1.0
  • argon2-cffi-bindings ==21.2.0
  • arrow ==1.3.0
  • asttokens ==2.4.1
  • astunparse ==1.6.3
  • async-lru ==2.0.4
  • async-timeout ==4.0.3
  • attrs ==23.1.0
  • automat ==22.10.0
  • babel ==2.13.1
  • beautifulsoup4 ==4.12.2
  • billiard ==4.1.0
  • bleach ==6.1.0
  • blinker ==1.6.3
  • bokeh ==3.3.0
  • cachetools ==5.3.2
  • celery ==5.3.4
  • certifi ==2023.7.22
  • cffi ==1.16.0
  • charset-normalizer ==3.3.1
  • click ==8.1.7
  • click-didyoumean ==0.3.0
  • click-plugins ==1.1.1
  • click-repl ==0.3.0
  • cloudpickle ==3.0.0
  • colorama ==0.4.4
  • comm ==0.1.4
  • constantly ==23.10.4
  • contourpy ==1.1.1
  • cryptography ==41.0.5
  • cssselect ==1.2.0
  • cycler ==0.12.1
  • dask ==2023.10.1
  • datasets ==2.15.0
  • debugpy ==1.8.0
  • decorator ==5.1.1
  • defusedxml ==0.7.1
  • dill ==0.3.7
  • docutils ==0.16
  • dparse ==0.6.3
  • evaluate ==0.4.1
  • exceptiongroup ==1.1.3
  • executing ==2.0.1
  • fastapi ==0.99.1
  • fastjsonschema ==2.18.1
  • filelock ==3.13.0
  • flask ==3.0.0
  • flatbuffers ==23.5.26
  • fonttools ==4.43.1
  • fqdn ==1.5.1
  • frozenlist ==1.4.0
  • fsspec ==2023.10.0
  • gast ==0.4.0
  • google-auth ==2.38.0
  • google-auth-oauthlib ==1.0.0
  • google-pasta ==0.2.0
  • greenlet ==3.0.1
  • grpcio ==1.59.0
  • h11 ==0.14.0
  • h5py ==3.10.0
  • httpie ==3.2.2
  • httplib2 ==0.22.0
  • huggingface-hub ==0.19.4
  • hyperlink ==21.0.0
  • idna ==3.4
  • imageio ==2.31.6
  • importlib-metadata ==6.8.0
  • incremental ==22.10.0
  • iniconfig ==2.0.0
  • isoduration ==20.11.0
  • itemadapter ==0.8.0
  • itemloaders ==1.1.0
  • itsdangerous ==2.1.2
  • jedi ==0.19.1
  • jinja2 ==3.1.2
  • jmespath ==1.0.1
  • joblib ==1.3.2
  • json5 ==0.9.14
  • jsonpointer ==2.4
  • jsonschema ==4.19.1
  • jsonschema-specifications ==2023.7.1
  • keras ==2.13.1
  • kiwisolver ==1.4.5
  • kombu ==5.3.2
  • libclang ==16.0.6
  • llvmlite ==0.41.1
  • locket ==1.0.0
  • lxml ==4.9.3
  • markdown ==3.5
  • markdown-it-py ==3.0.0
  • markupsafe ==2.1.3
  • matplotlib ==3.8.0
  • matplotlib-inline ==0.1.6
  • mdurl ==0.1.2
  • mistune ==3.0.2
  • mpmath ==1.3.0
  • multidict ==6.0.4
  • multiprocess ==0.70.15
  • nest-asyncio ==1.5.8
  • networkx ==3.4.2
  • nltk ==3.9.1
  • numba ==0.58.1
  • numpy ==1.24.3
  • nvidia-cublas-cu12 ==12.1.3.1
  • nvidia-cuda-cupti-cu12 ==12.1.105
  • nvidia-cuda-nvrtc-cu12 ==12.1.105
  • nvidia-cuda-runtime-cu12 ==12.1.105
  • nvidia-cudnn-cu12 ==8.9.2.26
  • nvidia-cufft-cu12 ==11.0.2.54
  • nvidia-curand-cu12 ==10.3.2.106
  • nvidia-cusolver-cu12 ==11.4.5.107
  • nvidia-cusparse-cu12 ==12.1.0.106
  • nvidia-nccl-cu12 ==2.18.1
  • nvidia-nvjitlink-cu12 ==12.8.61
  • nvidia-nvtx-cu12 ==12.1.105
  • oauthlib ==3.2.2
  • opt-einsum ==3.3.0
  • overrides ==7.4.0
  • packaging ==21.3
  • pandas ==2.1.2
  • pandocfilters ==1.5.0
  • parsel ==1.8.1
  • parso ==0.8.3
  • partd ==1.4.1
  • pbr ==5.11.1
  • pexpect ==4.8.0
  • pillow ==10.0.1
  • pip ==25.0.1
  • pipdeptree ==2.13.0
  • platformdirs ==3.11.0
  • plotly ==5.18.0
  • pluggy ==1.3.0
  • prometheus-client ==0.17.1
  • prompt-toolkit ==3.0.39
  • protego ==0.3.0
  • protobuf ==4.24.4
  • psutil ==5.9.6
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.2
  • pyarrow ==14.0.1
  • pyarrow-hotfix ==0.6
  • pyasn1 ==0.5.0
  • pyasn1-modules ==0.3.0
  • pycparser ==2.21
  • pydantic ==1.10.13
  • pydispatcher ==2.0.7
  • pygments ==2.16.1
  • pyjwt ==2.8.0
  • pyopenssl ==23.3.0
  • pyparsing ==3.1.1
  • pysocks ==1.7.1
  • pytest ==7.4.3
  • python-dateutil ==2.8.2
  • python-json-logger ==2.0.7
  • pytz ==2023.3.post1
  • pyyaml ==6.0.1
  • pyzmq ==25.1.1
  • queuelib ==1.6.2
  • redis ==5.0.1
  • referencing ==0.30.2
  • regex ==2023.10.3
  • requests ==2.31.0
  • requests-file ==1.5.1
  • requests-oauthlib ==1.3.1
  • requests-toolbelt ==1.0.0
  • responses ==0.18.0
  • rfc3339-validator ==0.1.4
  • rfc3986-validator ==0.1.1
  • rich ==13.6.0
  • rpds-py ==0.10.6
  • rsa ==4.7.2
  • ruamel-yaml ==0.18.3
  • ruamel-yaml-clib ==0.2.8
  • safetensors ==0.4.1
  • scikit-learn ==1.3.2
  • scipy ==1.11.3
  • scrapy ==2.11.0
  • seaborn ==0.13.0
  • send2trash ==1.8.2
  • sentence-transformers ==2.2.2
  • sentencepiece ==0.2.0
  • service-identity ==23.1.0
  • setuptools ==75.8.0
  • six ==1.16.0
  • slicer ==0.0.7
  • sniffio ==1.3.0
  • soupsieve ==2.5
  • sqlalchemy ==2.0.22
  • stack-data ==0.6.3
  • starlette ==0.27.0
  • sympy ==1.13.3
  • tenacity ==8.2.3
  • tensorboard ==2.13.0
  • tensorboard-data-server ==0.7.2
  • tensorflow ==2.13.0
  • tensorflow-estimator ==2.13.0
  • tensorflow-io-gcs-filesystem ==0.34.0
  • termcolor ==2.3.0
  • terminado ==0.17.1
  • testresources ==2.0.1
  • threadpoolctl ==3.2.0
  • tinycss2 ==1.2.1
  • tldextract ==5.0.1
  • tokenizers ==0.15.0
  • tomli ==2.0.1
  • toolz ==0.12.0
  • torch ==2.1.2
  • torchvision ==0.16.2
  • tornado ==6.3.3
  • tqdm ==4.66.1
  • traitlets ==5.12.0
  • transformers ==4.35.2
  • triton ==2.1.0
  • twisted ==22.10.0
  • types-python-dateutil ==2.8.19.14
  • typing-extensions ==4.5.0
  • tzdata ==2023.3
  • uri-template ==1.3.0
  • urllib3 ==2.0.7
  • uvicorn ==0.34.0
  • vine ==5.0.0
  • w3lib ==2.1.2
  • wcwidth ==0.2.8
  • webcolors ==1.13
  • webencodings ==0.5.1
  • websocket-client ==1.6.4
  • werkzeug ==3.0.1
  • wheel ==0.45.1
  • widgetsnbextension ==4.0.9
  • wrapt ==1.15.0
  • xxhash ==3.4.1
  • xyzservices ==2023.10.1
  • yarl ==1.9.2
  • zipp ==3.17.0
  • zope-interface ==6.1