https://github.com/compnet/splice
The Role of Information Extraction Tasks in Automatic Literary Character Network Construction
Repository
The Role of Information Extraction Tasks in Automatic Literary Character Network Construction
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Splice
The Role of Information Extraction Tasks in Automatic Literary Character Network Construction
Reproducing Results
First, you should:
- install dependencies. Either use `poetry install` if you have poetry, or `pip install -r requirements.txt` otherwise.
- get the litbank dataset
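The two setup steps above could be scripted roughly as follows. This is only a sketch under assumptions: it is meant to be run from the repository root, `LITBANK_DIR`, `LITBANK_URL`, `DRY_RUN` and the `run` helper are hypothetical names, and the LitBank location (the public `dbamman/litbank` GitHub repository) is not stated in this README. By default the commands are printed rather than executed.

```sh
#!/bin/sh
# Sketch of the setup steps. Assumptions (not from this README):
# LITBANK_DIR is a hypothetical target directory, and LitBank is fetched
# from its public GitHub repository.
LITBANK_DIR="${LITBANK_DIR:-./litbank}"
LITBANK_URL="https://github.com/dbamman/litbank"

# Pick the installer: poetry if it is available, pip otherwise.
if command -v poetry >/dev/null 2>&1; then
    INSTALL_CMD="poetry install"
else
    INSTALL_CMD="pip install -r requirements.txt"
fi

# Print commands by default; set DRY_RUN=0 to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# INSTALL_CMD is left unquoted on purpose so it splits into words.
run ${INSTALL_CMD}

# Fetch LitBank only if the target directory does not exist yet.
if [ ! -d "${LITBANK_DIR}" ]; then
    run git clone "${LITBANK_URL}" "${LITBANK_DIR}"
fi
```

The resulting `LITBANK_DIR` is the path that would then be passed as `litbank.root`.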
The main experiment can be run with xp.py:
```sh
python xp.py with \
    min_graph_nodes=10 \
    co_occurrences_dist=32 \
    litbank.root="/path/to/litbank"
```
Degradation Experiments
The following script will run all of the degradation experiments:
```sh
MAINXPRUN="/path/to/main/xp/run"

python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=NER degradationname=addwrongentity degradationsteps=1000 degradationreportfrequency=0.05
python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=NER degradationname=removecorrectentity degradationsteps=200 degradationreportfrequency=0.5
python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=coref degradationname=addwrongmention degradationsteps=200 degradationreportfrequency=0.05
python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=coref degradationname=removecorrectmention degradationsteps=1000 degradationreportfrequency=0.05
python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=coref degradationname=addwronglink degradationsteps=500 degradationreportfrequency=0.05
python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=coref degradationname=removecorrectlink degradationsteps=1000 degradationreportfrequency=0.05
python xpmetricsoverdegradation.py with inputdir="${MAINXPRUN}" taskname=coref degradationname=corefall degradationsteps=1000 degradationreportfrequency=0.05
```
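Since the seven runs above differ only in task, degradation, step count and report frequency, they could also be driven from a small table. The following is a sketch, not part of the repository: the script and parameter names are copied exactly as they appear in the commands above, while `run_degradation` and `DRY_RUN` are hypothetical helpers. By default it only prints the commands.

```sh
#!/bin/sh
# Sketch: drive the degradation experiments from one table.
# Names are copied as written in the commands above; run_degradation and
# DRY_RUN are hypothetical and not part of the repository.
MAINXPRUN="${MAINXPRUN:-/path/to/main/xp/run}"

run_degradation() {
    # $1=task  $2=degradation  $3=steps  $4=report frequency
    set -- python xpmetricsoverdegradation.py with \
        inputdir="${MAINXPRUN}" \
        taskname="$1" \
        degradationname="$2" \
        degradationsteps="$3" \
        degradationreportfrequency="$4"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        # Detach stdin so the command cannot consume the table below.
        "$@" </dev/null
    fi
}

# Columns: task, degradation, steps, report frequency
while read -r task name steps freq; do
    run_degradation "${task}" "${name}" "${steps}" "${freq}"
done <<EOF
NER addwrongentity 1000 0.05
NER removecorrectentity 200 0.5
coref addwrongmention 200 0.05
coref removecorrectmention 1000 0.05
coref addwronglink 500 0.05
coref removecorrectlink 1000 0.05
coref corefall 1000 0.05
EOF
```

Setting `DRY_RUN=0` runs the experiments one after another instead of printing them.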
End-to-end LLM-based Pipelines
The E2E-Coref experiment can be reproduced with the xp_e2e_llm_coref.py script:
```sh
MAINXPRUN="/path/to/main/xp/run"
LITBANK_PATH="/path/to/litbank"

python xp_e2e_llm_coref.py with \
    inputdir="${MAINXPRUN}" \
    model="gpt3.5" \
    openAIAPIkey="insert your openAI key" \
    litbank.root="${LITBANK_PATH}"

python xp_e2e_llm_coref.py with \
    inputdir="${MAINXPRUN}" \
    model="gpt40" \
    openAIAPIkey="insert your openAI key" \
    litbank.root="${LITBANK_PATH}"

python xp_e2e_llm_coref.py with \
    inputdir="${MAINXPRUN}" \
    model="llama3-8b-instruct" \
    hgaccesstoken="insert your Huggingface access token" \
    device="cuda" \
    litbank.root="${LITBANK_PATH}"
```
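Rather than pasting the key into each command, it can be read from the environment. The sketch below assumes the key has been exported as `OPENAI_API_KEY` beforehand; that variable, `run_e2e_coref` and `DRY_RUN` are assumptions for illustration, and the parameter names are copied as they appear in the commands above. In the default dry-run mode the key is never printed.

```sh
#!/bin/sh
# Sketch: run E2E-Coref for both OpenAI models with the key taken from the
# environment. OPENAI_API_KEY, run_e2e_coref and DRY_RUN are assumptions;
# parameter names are copied as written in the commands above.
MAINXPRUN="${MAINXPRUN:-/path/to/main/xp/run}"
LITBANK_PATH="${LITBANK_PATH:-/path/to/litbank}"

run_e2e_coref() {
    # $1 = model name
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Show the command with the key masked.
        echo "python xp_e2e_llm_coref.py with inputdir=${MAINXPRUN} model=$1 openAIAPIkey=<hidden> litbank.root=${LITBANK_PATH}"
    else
        # Abort with a message if the key has not been exported.
        python xp_e2e_llm_coref.py with \
            inputdir="${MAINXPRUN}" \
            model="$1" \
            openAIAPIkey="${OPENAI_API_KEY:?export OPENAI_API_KEY first}" \
            litbank.root="${LITBANK_PATH}"
    fi
}

for model in gpt3.5 gpt40; do
    run_e2e_coref "${model}"
done
```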
Similarly, the E2E-Graphml experiment can be reproduced with the xp_e2e_llm_graphml.py script:
```sh
MAINXPRUN="/path/to/main/xp/run"
LITBANK_PATH="/path/to/litbank"

python xp_e2e_llm_graphml.py with \
    inputdir="${MAINXPRUN}" \
    model="gpt3.5" \
    openAIAPIkey="insert your openAI key" \
    litbank.root="${LITBANK_PATH}"

python xp_e2e_llm_graphml.py with \
    inputdir="${MAINXPRUN}" \
    model="gpt40" \
    openAIAPIkey="insert your openAI key" \
    litbank.root="${LITBANK_PATH}"

python xp_e2e_llm_graphml.py with \
    inputdir="${MAINXPRUN}" \
    model="llama3-8b-instruct" \
    hgaccesstoken="insert your Huggingface access token" \
    device="cuda" \
    litbank.root="${LITBANK_PATH}"
```
Printing / Plotting Results
| Table / Figure | Corresponding Script              |
|----------------|-----------------------------------|
| Table 1        | print_main_task_results.py        |
| Table 2        | print_main_graph_results.py       |
| Table 3        |                                   |
| Figure 1       | plot_degradation_metrics.py       |
| Figure 2       | plot_ner_degradation_metrics.py   |
| Figure 3       | plot_coref_degradation_metrics.py |
| Table 4        | print_e2e_graph_results.py        |
Owner
- Name: Complex Networks
- Login: CompNet
- Kind: organization
- Location: Avignon, France
- Website: http://lia.univ-avignon.fr
- Repositories: 44
- Profile: https://github.com/CompNet
GitHub Events
Total
- Push event: 1
- Public event: 1
Last Year
- Push event: 1
- Public event: 1
Dependencies
- 109 dependencies
- accelerate ==0.22.0
- aiohttp ==3.9.3
- aiosignal ==1.3.1
- annotated-types ==0.7.0
- anyio ==4.4.0
- async-timeout ==4.0.3
- attrs ==23.2.0
- certifi ==2024.2.2
- charset-normalizer ==3.3.2
- click ==8.1.7
- colorama ==0.4.6
- contourpy ==1.1.1
- cycler ==0.12.1
- datasets ==2.18.0
- dill ==0.3.8
- distro ==1.9.0
- docopt ==0.6.2
- exceptiongroup ==1.2.0
- filelock ==3.13.1
- fonttools ==4.50.0
- frozenlist ==1.4.1
- fsspec ==2024.2.0
- gitdb ==4.0.11
- gitpython ==3.1.42
- grimbert ==0.1.1
- h11 ==0.14.0
- httpcore ==1.0.5
- httpx ==0.27.0
- huggingface-hub ==0.21.4
- idna ==3.6
- importlib-resources ==6.3.2
- iniconfig ==2.0.0
- jinja2 ==3.1.3
- joblib ==1.3.2
- jsonpickle ==3.0.3
- kiwisolver ==1.4.5
- markdown-it-py ==3.0.0
- markupsafe ==2.1.5
- matplotlib ==3.7.5
- mdurl ==0.1.2
- more-itertools ==10.2.0
- mpmath ==1.3.0
- multidict ==6.0.5
- multiprocess ==0.70.16
- munch ==4.0.0
- nameparser ==1.1.3
- neleval ==3.1.1
- networkx ==2.8.8
- nltk ==3.8.1
- numpy ==1.24.4
- nvidia-cublas-cu12 ==12.1.3.1
- nvidia-cuda-cupti-cu12 ==12.1.105
- nvidia-cuda-nvrtc-cu12 ==12.1.105
- nvidia-cuda-runtime-cu12 ==12.1.105
- nvidia-cudnn-cu12 ==8.9.2.26
- nvidia-cufft-cu12 ==11.0.2.54
- nvidia-curand-cu12 ==10.3.2.106
- nvidia-cusolver-cu12 ==11.4.5.107
- nvidia-cusparse-cu12 ==12.1.0.106
- nvidia-nccl-cu12 ==2.19.3
- nvidia-nvjitlink-cu12 ==12.4.99
- nvidia-nvtx-cu12 ==12.1.105
- openai ==1.31.0
- packaging ==24.0
- pandas ==2.0.0
- pillow ==10.2.0
- pluggy ==1.4.0
- psutil ==5.9.8
- py-cpuinfo ==9.0.0
- pyarrow ==15.0.2
- pyarrow-hotfix ==0.6
- pydantic ==2.7.3
- pydantic-core ==2.18.4
- pygments ==2.17.2
- pyparsing ==3.1.2
- pytest ==7.4.4
- python-dateutil ==2.9.0.post0
- pytz ==2024.1
- pyyaml ==6.0.1
- regex ==2023.12.25
- requests ==2.31.0
- rich ==13.7.1
- sacred ==0.8.5
- sacremoses ==0.0.53
- safetensors ==0.4.2
- scienceplots ==2.1.1
- scikit-learn ==1.3.2
- scipy ==1.10.1
- seqeval ==1.2.2
- six ==1.16.0
- smmap ==5.0.1
- sniffio ==1.3.1
- sympy ==1.12
- threadpoolctl ==3.3.0
- tokenizers ==0.15.2
- tomli ==2.0.1
- torch ==2.2.1
- tqdm ==4.66.2
- transformers ==4.38.2
- triton ==2.2.0
- typing-extensions ==4.10.0
- tzdata ==2024.1
- urllib3 ==2.2.1
- wrapt ==1.16.0
- xxhash ==3.4.1
- yarl ==1.9.4
- zipp ==3.18.1