https://github.com/compnet/splice

The Role of Information Extraction Tasks in Automatic Literary Character Network Construction

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.8%) to scientific vocabulary

Keywords

alias-resolution character-networks literary-texts ner nlp
Last synced: 5 months ago

Repository

The Role of Information Extraction Tasks in Automatic Literary Character Network Construction

Basic Info
  • Host: GitHub
  • Owner: CompNet
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 150 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
alias-resolution character-networks literary-texts ner nlp
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme

README.md

Splice

The Role of Information Extraction Tasks in Automatic Literary Character Network Construction

Reproducing Results

Before running any experiment:

  • install the dependencies: run `poetry install` if you have poetry, or `pip install -r requirements.txt` otherwise.
  • get the LitBank dataset.
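
The choice between the two install commands can also be scripted. The following is a minimal sketch and not part of the repository: the helper `install_command` is hypothetical, and it assumes the script runs from the repository root, where `requirements.txt` lives.

```python
import shlex
import shutil


def install_command(poetry_path):
    """Return the install command as an argument list.

    poetry_path is the result of shutil.which("poetry"):
    a path when poetry is installed, None otherwise.
    """
    if poetry_path:
        return [poetry_path, "install"]
    # Fallback named in the README: plain pip with the pinned requirements.
    return ["pip", "install", "-r", "requirements.txt"]


# Show which command would be used on this machine (nothing is installed here).
print(shlex.join(install_command(shutil.which("poetry"))))
```

Passing the returned list to `subprocess.run(..., check=True)` would perform the actual installation.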

The main experiment can be run with `xp.py`:

```sh
python xp.py with \
    min_graph_nodes=10 \
    co_occurrences_dist=32 \
    litbank.root="/path/to/litbank"
```
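
The parameter name `co_occurrences_dist` suggests that two characters are linked when their mentions appear within a fixed distance of each other. The sketch below illustrates that general co-occurrence technique, assuming the distance is counted in tokens. The function `co_occurrence_edges` and the sample mentions are hypothetical and do not reproduce the repository's implementation.

```python
from collections import Counter


def co_occurrence_edges(mentions, max_dist):
    """Count character pairs whose mentions lie within max_dist tokens.

    mentions: iterable of (token_index, character_name) pairs.
    Returns a Counter mapping a sorted (name_a, name_b) tuple to the
    number of co-occurrences (usable as an edge weight).
    """
    ordered = sorted(mentions)  # sort by token position
    edges = Counter()
    for i, (pos_a, name_a) in enumerate(ordered):
        for pos_b, name_b in ordered[i + 1:]:
            if pos_b - pos_a > max_dist:
                break  # later mentions are even farther away
            if name_a != name_b:
                edges[tuple(sorted((name_a, name_b)))] += 1
    return edges


# Hypothetical mentions: (token index, character name).
sample = [(0, "Alice"), (10, "Bob"), (40, "Carol"), (100, "Bob")]
print(dict(co_occurrence_edges(sample, max_dist=32)))
```

With the sample data, only Alice/Bob (10 tokens apart) and Bob/Carol (30 tokens apart) fall within the limit, so the resulting graph has two edges.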

Degradation Experiments

The following script will run all of the degradation experiments:

```sh
MAINXPRUN="/path/to/main/xp/run"

python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=NER degradationname=addwrongentity degradationsteps=1000 degradationreportfrequency=0.05
python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=NER degradationname=removecorrectentity degradationsteps=200 degradationreportfrequency=0.5
python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=coref degradationname=addwrongmention degradationsteps=200 degradationreportfrequency=0.05
python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=coref degradationname=removecorrectmention degradationsteps=1000 degradationreportfrequency=0.05
python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=coref degradationname=addwronglink degradationsteps=500 degradationreportfrequency=0.05
python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=coref degradationname=removecorrectlink degradationsteps=1000 degradationreportfrequency=0.05
python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=coref degradationname=corefall degradationsteps=1000 degradationreportfrequency=0.05
```
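
The seven runs differ only in four values, so they can be generated from a small table. The following is a sketch under that assumption: the script name, parameter names and values are copied as they appear in the commands above, while `DEGRADATION_RUNS` and `degradation_commands` are hypothetical helpers that do not exist in the repository.

```python
import shlex

# (taskname, degradationname, degradationsteps, degradationreportfrequency),
# copied from the commands listed in this README.
DEGRADATION_RUNS = [
    ("NER", "addwrongentity", 1000, 0.05),
    ("NER", "removecorrectentity", 200, 0.5),
    ("coref", "addwrongmention", 200, 0.05),
    ("coref", "removecorrectmention", 1000, 0.05),
    ("coref", "addwronglink", 500, 0.05),
    ("coref", "removecorrectlink", 1000, 0.05),
    ("coref", "corefall", 1000, 0.05),
]


def degradation_commands(main_xp_run):
    """Build one argument list per degradation experiment."""
    commands = []
    for task, name, steps, frequency in DEGRADATION_RUNS:
        commands.append([
            "python", "xpmetricsoverdegradation.py", "with",
            f"inputdir={main_xp_run}",
            f"taskname={task}",
            f"degradationname={name}",
            f"degradationsteps={steps}",
            f"degradationreportfrequency={frequency}",
        ])
    return commands


# Print the commands instead of running them.
for cmd in degradation_commands("/path/to/main/xp/run"):
    print(shlex.join(cmd))
```

Replacing the `print` call with `subprocess.run(cmd, check=True)` would execute the runs one after another and stop at the first failure.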

End-to-end LLM-based Pipelines

The E2E-Coref experiment can be reproduced with the xp_e2e_llm_coref.py script:

```sh
MAINXPRUN="/path/to/main/xp/run"
LITBANK_PATH="/path/to/litbank"

python xp_e2e_llm_coref.py with \
    inputdir="${MAINXPRUN}" \
    model="gpt3.5" \
    openAIAPIkey="insert your openAI key" \
    litbank.root="${LITBANK_PATH}"

python xp_e2e_llm_coref.py with \
    inputdir="${MAINXPRUN}" \
    model="gpt40" \
    openAIAPIkey="insert your openAI key" \
    litbank.root="${LITBANK_PATH}"

python xp_e2e_llm_coref.py with \
    inputdir="${MAINXPRUN}" \
    model="llama3-8b-instruct" \
    hgaccesstoken="insert your Huggingface access token" \
    device="cuda" \
    litbank.root="${LITBANK_PATH}"
```
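
To avoid pasting credentials into the shell history, the commands can be assembled from environment variables. This is a minimal sketch, not the repository's tooling: the helper `e2e_coref_command` and the variable names `OPENAI_API_KEY` and `HF_TOKEN` are assumptions, and the parameter names are copied as they appear in the commands above.

```python
def e2e_coref_command(model, main_xp_run, litbank_path, env):
    """Build the argument list for one E2E-Coref run.

    env is any mapping holding the credentials, for example os.environ.
    Assumed keys: OPENAI_API_KEY for the GPT models, HF_TOKEN otherwise.
    """
    cmd = [
        "python", "xp_e2e_llm_coref.py", "with",
        f"inputdir={main_xp_run}",
        f"model={model}",
    ]
    if model.startswith("gpt"):
        cmd.append(f"openAIAPIkey={env['OPENAI_API_KEY']}")
    else:
        # The README runs the Llama model locally on a CUDA device.
        cmd.append(f"hgaccesstoken={env['HF_TOKEN']}")
        cmd.append("device=cuda")
    cmd.append(f"litbank.root={litbank_path}")
    return cmd
```

A run would then look like `subprocess.run(e2e_coref_command("gpt3.5", run_dir, litbank_dir, os.environ), check=True)`, where `run_dir` and `litbank_dir` stand for the two paths used above. A missing credential raises `KeyError` before anything is launched.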

Similarly, the E2E-Graphml experiment can be reproduced with the `xp_e2e_llm_graphml.py` script:

```sh
MAINXPRUN="/path/to/main/xp/run"
LITBANK_PATH="/path/to/litbank"

python xp_e2e_llm_graphml.py with \
    inputdir="${MAINXPRUN}" \
    model="gpt3.5" \
    openAIAPIkey="insert your openAI key" \
    litbank.root="${LITBANK_PATH}"

python xp_e2e_llm_graphml.py with \
    inputdir="${MAINXPRUN}" \
    model="gpt40" \
    openAIAPIkey="insert your openAI key" \
    litbank.root="${LITBANK_PATH}"

python xp_e2e_llm_graphml.py with \
    inputdir="${MAINXPRUN}" \
    model="llama3-8b-instruct" \
    hgaccesstoken="insert your Huggingface access token" \
    device="cuda" \
    litbank.root="${LITBANK_PATH}"
```

Printing / Plotting Results

| Table / Figure | Corresponding Script              |
|----------------|-----------------------------------|
| Table 1        | print_main_task_results.py        |
| Table 2        | print_main_graph_results.py       |
| Table 3        |                                   |
| Figure 1       | plot_degradation_metrics.py       |
| Figure 2       | plot_ner_degradation_metrics.py   |
| Figure 3       | plot_coref_degradation_metrics.py |
| Table 4        | print_e2e_graph_results.py        |

Owner

  • Name: Complex Networks
  • Login: CompNet
  • Kind: organization
  • Location: Avignon, France

GitHub Events

Total
  • Push event: 1
  • Public event: 1
Last Year
  • Push event: 1
  • Public event: 1

Dependencies

poetry.lock pypi
  • 109 dependencies
pyproject.toml pypi
requirements.txt pypi
  • accelerate ==0.22.0
  • aiohttp ==3.9.3
  • aiosignal ==1.3.1
  • annotated-types ==0.7.0
  • anyio ==4.4.0
  • async-timeout ==4.0.3
  • attrs ==23.2.0
  • certifi ==2024.2.2
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • colorama ==0.4.6
  • contourpy ==1.1.1
  • cycler ==0.12.1
  • datasets ==2.18.0
  • dill ==0.3.8
  • distro ==1.9.0
  • docopt ==0.6.2
  • exceptiongroup ==1.2.0
  • filelock ==3.13.1
  • fonttools ==4.50.0
  • frozenlist ==1.4.1
  • fsspec ==2024.2.0
  • gitdb ==4.0.11
  • gitpython ==3.1.42
  • grimbert ==0.1.1
  • h11 ==0.14.0
  • httpcore ==1.0.5
  • httpx ==0.27.0
  • huggingface-hub ==0.21.4
  • idna ==3.6
  • importlib-resources ==6.3.2
  • iniconfig ==2.0.0
  • jinja2 ==3.1.3
  • joblib ==1.3.2
  • jsonpickle ==3.0.3
  • kiwisolver ==1.4.5
  • markdown-it-py ==3.0.0
  • markupsafe ==2.1.5
  • matplotlib ==3.7.5
  • mdurl ==0.1.2
  • more-itertools ==10.2.0
  • mpmath ==1.3.0
  • multidict ==6.0.5
  • multiprocess ==0.70.16
  • munch ==4.0.0
  • nameparser ==1.1.3
  • neleval ==3.1.1
  • networkx ==2.8.8
  • nltk ==3.8.1
  • numpy ==1.24.4
  • nvidia-cublas-cu12 ==12.1.3.1
  • nvidia-cuda-cupti-cu12 ==12.1.105
  • nvidia-cuda-nvrtc-cu12 ==12.1.105
  • nvidia-cuda-runtime-cu12 ==12.1.105
  • nvidia-cudnn-cu12 ==8.9.2.26
  • nvidia-cufft-cu12 ==11.0.2.54
  • nvidia-curand-cu12 ==10.3.2.106
  • nvidia-cusolver-cu12 ==11.4.5.107
  • nvidia-cusparse-cu12 ==12.1.0.106
  • nvidia-nccl-cu12 ==2.19.3
  • nvidia-nvjitlink-cu12 ==12.4.99
  • nvidia-nvtx-cu12 ==12.1.105
  • openai ==1.31.0
  • packaging ==24.0
  • pandas ==2.0.0
  • pillow ==10.2.0
  • pluggy ==1.4.0
  • psutil ==5.9.8
  • py-cpuinfo ==9.0.0
  • pyarrow ==15.0.2
  • pyarrow-hotfix ==0.6
  • pydantic ==2.7.3
  • pydantic-core ==2.18.4
  • pygments ==2.17.2
  • pyparsing ==3.1.2
  • pytest ==7.4.4
  • python-dateutil ==2.9.0.post0
  • pytz ==2024.1
  • pyyaml ==6.0.1
  • regex ==2023.12.25
  • requests ==2.31.0
  • rich ==13.7.1
  • sacred ==0.8.5
  • sacremoses ==0.0.53
  • safetensors ==0.4.2
  • scienceplots ==2.1.1
  • scikit-learn ==1.3.2
  • scipy ==1.10.1
  • seqeval ==1.2.2
  • six ==1.16.0
  • smmap ==5.0.1
  • sniffio ==1.3.1
  • sympy ==1.12
  • threadpoolctl ==3.3.0
  • tokenizers ==0.15.2
  • tomli ==2.0.1
  • torch ==2.2.1
  • tqdm ==4.66.2
  • transformers ==4.38.2
  • triton ==2.2.0
  • typing-extensions ==4.10.0
  • tzdata ==2024.1
  • urllib3 ==2.2.1
  • wrapt ==1.16.0
  • xxhash ==3.4.1
  • yarl ==1.9.4
  • zipp ==3.18.1