knowurenvironment
Official release of KnowUREnvironment, a knowledge graph on climate change and related environmental issues. Paper link: https://www.climatechange.ai/papers/aaaifss2022/3
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.5%) to scientific vocabulary
Keywords
Repository
Basic Info
Statistics
- Stars: 11
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
KnowUREnvironment
Despite climate change being one of the greatest threats to humanity, many people are still in denial or lack motivation for appropriate action. A structured source of knowledge can help increase public awareness while also helping crucial natural language understanding tasks such as information retrieval, question answering, and recommendation systems. We introduce KnowUREnvironment -- a knowledge graph for climate change and related environmental issues, extracted from the scientific literature. We automatically identify 210,230 domain-specific entities/concepts and encode how these concepts are interrelated with 411,860 RDF triples backed up with evidence from the literature, without using any supervision or human intervention. Human evaluation shows our extracted triples are syntactically and factually correct (81.69% syntactic correctness and 75.85% precision). The proposed framework can be easily extended to any domain that can benefit from such a knowledge graph.
Climate Change Abstracts
We extracted 228,860 abstracts related to climate change and environmental issues by mining articles published before 2020. To find the relevant articles, the following keywords were matched against the author-provided keywords of each article:
- climate change
- sustainability
- pollution
- global warming
- sea-level rise
- climate
- water stress
- coastal flooding
Download the extracted abstracts with author-provided keywords
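The keyword filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used to build the dataset: the record layout (a `keywords` list per article), the function name `is_relevant`, and the choice of case-insensitive exact matching are all assumptions, since the text does not say whether matching was exact or by substring.

```python
# Hypothetical sketch of keyword-based article filtering.
# The seed keywords are the eight listed above; everything else is assumed.
SEED_KEYWORDS = {
    "climate change", "sustainability", "pollution", "global warming",
    "sea-level rise", "climate", "water stress", "coastal flooding",
}


def is_relevant(author_keywords):
    """Return True if any author-provided keyword equals a seed keyword.

    Matching is case-insensitive and ignores surrounding whitespace
    (an assumption made for this sketch).
    """
    normalized = {kw.strip().lower() for kw in author_keywords}
    return not normalized.isdisjoint(SEED_KEYWORDS)


# Toy records with an assumed layout, for illustration only.
articles = [
    {"title": "A", "keywords": ["Global Warming", "policy"]},
    {"title": "B", "keywords": ["graph theory"]},
    {"title": "C", "keywords": [" Pollution "]},
]

relevant = [a["title"] for a in articles if is_relevant(a["keywords"])]
print(relevant)
```

With exact matching, a keyword such as "climate policy" would not match the seed "climate"; a substring-based filter would behave differently.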
About the Knowledge Graph
KnowUREnvironment includes facts about climate change and related environmental issues in the form of Resource Description Framework (RDF) triples. An RDF triple consists of three components: (i) subject, (ii) predicate, and (iii) object. For example, the fact "automobile emits CO2" can be expressed as ("automobile", "emit", "CO2"), where "automobile" is the subject, "CO2" is the object, and "emit" is a directional relationship from the subject to the object. The entire set of RDF triples can be captured by a directed multigraph (there may be multiple relationships between the same subject-object pair), where the subjects and objects are represented as nodes and each predicate is represented as a directed, labeled edge.
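To make the multigraph view concrete, here is a small sketch using only the Python standard library. The class name `TripleGraph` and its methods are hypothetical and are not part of this repository; the second triple is invented purely to show a parallel edge.

```python
from collections import defaultdict


class TripleGraph:
    """Directed multigraph: nodes are subjects/objects, labeled edges are predicates."""

    def __init__(self):
        # subject -> list of (predicate, object); a list keeps parallel edges
        self._out = defaultdict(list)

    def add(self, subject, predicate, obj):
        """Add one directed, labeled edge subject --predicate--> obj."""
        self._out[subject].append((predicate, obj))

    def relations(self, subject, obj):
        """Return all predicates on edges pointing from subject to obj."""
        return [p for p, o in self._out.get(subject, []) if o == obj]


graph = TripleGraph()
graph.add("automobile", "emit", "CO2")     # example triple from the text
graph.add("automobile", "produce", "CO2")  # made-up parallel edge, illustration only

print(graph.relations("automobile", "CO2"))
print(graph.relations("CO2", "automobile"))
```

The first call returns both predicates, because the same subject-object pair carries two edges; the second returns an empty list, because edges are directed.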
Accessing the RDF Triples
The RDF triplets extracted by mining academic literature are made available in this repository.
The CSV file contains 6 columns:
- subject of the RDF triple
- object of the RDF triple
- relation is the predicate of the RDF triple, a directed relationship from the subject to the object
- paper_id traces the paper(s) from which the triple was extracted. Please use the uploaded JSON file in the Climate Change Abstracts section to link this paper_id to the corresponding paper. The JSON file contains a list of 228,860 papers; the first paper in the list is assigned paper_id = 0, and the last paper is assigned paper_id = 228,859.
- sentence_no traces the exact sentence in the corresponding paper abstract from which the triple was extracted. The sentence_no column is index-matched with the paper_id column. For example, if a triple is extracted from multiple papers, the paper_id column can be [17218, 36262] and the sentence_no column can be [3,4]: the evidence sentences are then the 3rd sentence of the abstract of paper 17218 and the 4th sentence of the abstract of paper 36262.
- num_evidence is the number of different sources the triple was obtained from. More sources typically means the triple can be trusted more.
Each triple can be traced back to a sentence in the abstract of a published paper by using the "paper_id" and "sentence_no" columns. Please note that the knowledge graph is still under development and is currently made available for informational purposes only.
A second version of the knowledge graph, which is less trusted but has broader coverage, is also available for download.
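As a rough sketch of how the index-matched columns can be paired up, the snippet below parses one row and filters by num_evidence. It assumes the list-valued columns are stored as bracketed strings such as "[17218, 36262]"; the sample row, the helper name `evidence_pointers`, and the threshold of 2 are illustrative and not taken from the released file.

```python
import csv
import io
import json

# Toy stand-in for the triples CSV. Column names follow the list above;
# the row values are made up for illustration.
SAMPLE_CSV = '''subject,object,relation,paper_id,sentence_no,num_evidence
automobile,CO2,emit,"[17218, 36262]","[3, 4]",2
'''


def evidence_pointers(row):
    """Pair each paper_id with its index-matched sentence_no."""
    paper_ids = json.loads(row["paper_id"])        # "[17218, 36262]" -> [17218, 36262]
    sentence_nos = json.loads(row["sentence_no"])  # "[3, 4]" -> [3, 4]
    if len(paper_ids) != len(sentence_nos):
        raise ValueError("paper_id and sentence_no must have the same length")
    return list(zip(paper_ids, sentence_nos))


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
pointers = evidence_pointers(rows[0])
print(pointers)

# Keep only triples backed by at least two sources (threshold is arbitrary).
well_supported = [r for r in rows if int(r["num_evidence"]) >= 2]
print(len(well_supported))
```

Each resulting pair identifies one paper in the abstracts JSON list and one sentence within that paper's abstract.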
Sample code for showing evidence for the triples:
```python
import json

import nltk
import numpy as np
import pandas as pd
from nltk.tokenize import sent_tokenize

# sent_tokenize needs the NLTK "punkt" tokenizer data, e.g. nltk.download("punkt").

# File name as given in this README; adjust it if the downloaded file is named differently.
with open("allabstractswithkeywords.json", encoding="utf-8") as f:
    all_abstracts = json.loads(f.read())

df = pd.read_csv("final_tuples.csv")

for i, r in df.iterrows():
    triple = (r["subject"], r["relation"], r["object"])
    evidence = []

    # paper_id and sentence_no are stored as strings such as "[17218, 36262]".
    papers = np.asarray(r["paper_id"][1:-1].split(","), dtype=int)
    sentence_nos = np.asarray(r["sentence_no"][1:-1].split(","), dtype=int)

    # The two arrays are index-matched: papers[j] goes with sentence_nos[j].
    for j in range(len(papers)):
        abstract = sent_tokenize(all_abstracts[papers[j]]["abstract"])
        evidence.append(abstract[sentence_nos[j]])

    print("+" * 20)
    print("Triple: ", end="")
    print(triple)
    print("Evidence sentence: ", end="\n")
    for j in range(len(evidence)):
        print("%d. %s" % ((j + 1), evidence[j]))
    print("-" * 20)
    break  # show only the first triple
```
Code for Triplet Extraction
i. Install AMR (amrlib) by following its installation instructions: https://amrlib.readthedocs.io/en/latest/install/ The requirements.txt file for the AMR installation is here: https://github.com/bjascob/amrlib/blob/master/requirements.txt
ii. Install additional packages (e.g., matplotlib, nltk, rapidfuzz).
Alternatively, the conda environment I worked with is provided as an environment.yml file. You may create an environment with the yml file and then download and place the AMR models as instructed here: https://amrlib.readthedocs.io/en/latest/install/
ClimateKB.ipynb documents the methodology in more detail and can be used for extracting the triplets.
Visualization.ipynb visualizes a part of the knowledge graph (hand-picked) as a directed graph.
Citation
If you are using any of the materials from this repository, please make sure to cite the following article:
@article{islam2022know,
title={KnowUREnvironment: An Automated Knowledge Graph for Climate Change and Environmental Issues},
author={Islam, Md Saiful and Proma, Adiba and Zhou, Yilin and Akter, Syeda Nahida and Wohn, Caleb and Hoque, Ehsan},
booktitle={2022 AAAI Fall Symposium Series},
year={2022}
}
Owner
- Name: Md. Saiful Islam
- Login: saiful1105020
- Kind: user
- Location: Rochester, New York. United States
- Company: Graduate Fellow, University of Rochester
- Repositories: 8
- Profile: https://github.com/saiful1105020
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use any of the materials from this repository, please cite it as below."
authors:
  - family-names: "Islam"
    given-names: "Md Saiful"
    orcid: "https://orcid.org/0000-0003-3725-3493"
  - family-names: "Proma"
    given-names: "Adiba"
  - family-names: "Zhou"
    given-names: "Yilin"
  - family-names: "Akter"
    given-names: "Syeda Nahida"
  - family-names: "Wohn"
    given-names: "Caleb"
  - family-names: "Hoque"
    given-names: "Ehsan"
title: "KnowUREnvironment: An Automated Knowledge Graph for Climate Change and Environmental Issues"
doi: # date-released: 2022-11-18
url: "https://github.com/saiful1105020/KnowUREnvironment"
booktitle: "2022 AAAI Fall Symposium Series"
GitHub Events
Total
- Watch event: 4
Last Year
- Watch event: 4
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Md. Saiful Islam | s****2@g****m | 36 |
Issues and Pull Requests
Last synced: about 2 years ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- _libgcc_mutex 0.1
- blas 1.0
- brotlipy 0.7.0
- bzip2 1.0.8
- ca-certificates 2022.3.29
- certifi 2021.5.30
- cffi 1.15.0
- charset-normalizer 2.0.4
- cryptography 36.0.0
- cudatoolkit 10.2.89
- ffmpeg 4.3
- freetype 2.11.0
- giflib 5.2.1
- gmp 6.2.1
- gnutls 3.6.15
- idna 3.3
- intel-openmp 2021.4.0
- jpeg 9e
- lame 3.100
- lcms2 2.12
- ld_impl_linux-64 2.35.1
- libffi 3.3
- libgcc-ng 9.1.0
- libiconv 1.16
- libidn2 2.3.2
- libpng 1.6.37
- libstdcxx-ng 9.1.0
- libtasn1 4.16.0
- libtiff 4.2.0
- libunistring 0.9.10
- libuuid 1.0.3
- libuv 1.40.0
- libwebp 1.2.2
- libwebp-base 1.2.2
- lz4-c 1.9.3
- mkl 2021.4.0
- mkl-service 2.4.0
- mkl_fft 1.3.1
- mkl_random 1.2.2
- ncurses 6.3
- nettle 3.7.3
- numpy 1.21.5
- numpy-base 1.21.5
- openh264 2.1.1
- openssl 1.1.1n
- pillow 9.0.1
- pip 21.2.4
- pycparser 2.21
- pyopenssl 22.0.0
- pysocks 1.7.1
- python 3.10.4
- pytorch 1.11.0
- pytorch-mutex 1.0
- readline 8.1.2
- requests 2.27.1
- setuptools 61.2.0
- six 1.16.0
- sqlite 3.38.2
- tk 8.6.11
- torchaudio 0.11.0
- torchvision 0.12.0
- typing_extensions 4.1.1
- tzdata 2022a
- urllib3 1.26.9
- wheel 0.37.1
- xz 5.2.5
- zlib 1.2.12
- zstd 1.4.9