knowurenvironment

Official release of KnowUREnvironment, a knowledge graph on climate change and related environmental issues. Paper link: https://www.climatechange.ai/papers/aaaifss2022/3

https://github.com/saiful1105020/knowurenvironment

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.5%) to scientific vocabulary

Keywords

climate-change knowledge-graph natural-language-processing

Repository

Basic Info
  • Host: GitHub
  • Owner: saiful1105020
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 10.3 MB
Statistics
  • Stars: 11
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed about 2 years ago
Metadata Files
Readme Citation

README.md

KnowUREnvironment

Despite climate change being one of the greatest threats to humanity, many people are still in denial or lack motivation for appropriate action. A structured source of knowledge can help increase public awareness while also helping crucial natural language understanding tasks such as information retrieval, question answering, and recommendation systems. We introduce KnowUREnvironment -- a knowledge graph for climate change and related environmental issues, extracted from the scientific literature. We automatically identify 210,230 domain-specific entities/concepts and encode how these concepts are interrelated with 411,860 RDF triples backed up with evidence from the literature, without using any supervision or human intervention. Human evaluation shows our extracted triples are syntactically and factually correct (81.69% syntactic correctness and 75.85% precision). The proposed framework can be easily extended to any domain that can benefit from such a knowledge graph.

Climate Change Abstracts

We extracted 228,860 abstracts related to climate change and environmental issues by mining articles published before 2020. The following keywords were matched against author-provided keywords to find the relevant articles:

  • climate change
  • sustainability
  • pollution
  • global warming
  • sea-level rise
  • climate
  • water stress
  • coastal flooding
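
The keyword filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the matching rule (case-insensitive exact match), the `is_relevant` helper, the `keywords` field name, and the sample records are all assumptions made for the example.

```python
# Target keywords from the list above.
KEYWORDS = {
    "climate change", "sustainability", "pollution", "global warming",
    "sea-level rise", "climate", "water stress", "coastal flooding",
}


def is_relevant(author_keywords):
    """Return True if any author keyword equals a target keyword.

    Assumption: matching is exact after trimming and lowercasing.
    """
    normalized = {kw.strip().lower() for kw in author_keywords}
    return not KEYWORDS.isdisjoint(normalized)


# Hypothetical article records; the "keywords" field name is assumed.
articles = [
    {"title": "A", "keywords": ["Global Warming", "crop yield"]},
    {"title": "B", "keywords": ["graph theory"]},
    {"title": "C", "keywords": [" pollution ", "urban health"]},
]

relevant = [a["title"] for a in articles if is_relevant(a["keywords"])]
print(relevant)
```

A substring or fuzzy match would retrieve more articles at the cost of precision; the exact-match rule here is only one plausible reading of "matched against".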

Download the extracted abstracts with author-provided keywords

About the Knowledge Graph

KnowUREnvironment includes facts about climate change and related environmental issues in the form of Resource Description Framework (RDF) triples. An RDF triple consists of three components: (i) subject, (ii) predicate, and (iii) object. For example, the fact "automobile emits CO2" can be expressed as ("automobile", "emit", "CO2"), where "automobile" is the subject, "CO2" is the object, and "emit" is a directional relationship from the subject to the object. The entire set of RDF triples can be captured by a directed multigraph (there may be multiple relationships between the same subject-object pair), where subjects and objects are represented as nodes and each predicate is represented as a directed, labeled edge.
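
To make the multigraph view concrete, here is a small sketch using only the standard library. Apart from ("automobile", "emit", "CO2"), the triples are invented for illustration and are not taken from the released graph; the nested-dictionary layout is just one possible representation.

```python
from collections import defaultdict

# Illustrative triples (subject, predicate, object). Only the first one
# appears in the text above; the others are made up for this example.
triples = [
    ("automobile", "emit", "CO2"),
    ("automobile", "produce", "CO2"),
    ("CO2", "cause", "global warming"),
]

# graph[subject][object] holds the list of predicates, so parallel edges
# between the same subject-object pair are preserved.
graph = defaultdict(lambda: defaultdict(list))
for subject, predicate, obj in triples:
    graph[subject][obj].append(predicate)

# Two labeled edges run from "automobile" to "CO2".
parallel_edges = graph["automobile"]["CO2"]
print(parallel_edges)

# Edges are directed: there is no edge from "CO2" back to "automobile".
has_reverse_edge = "automobile" in graph["CO2"]
print(has_reverse_edge)
```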

A snapshot of KnowUREnvironment depicts how "automobile" could be related to climate change and related issues. Within a few hops, the graph can connect concepts from diverse yet relevant fields such as environment, agriculture, and public health, demonstrating how powerful yet compact a knowledge graph can be.
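
The "within a few hops" idea corresponds to a bounded breadth-first search over the directed edges. The sketch below assumes a hypothetical helper named `within_hops` and uses invented edges; it is not code from the repository.

```python
from collections import deque


def within_hops(edges, start, max_hops):
    """Return {node: hop distance} for nodes reachable from `start`
    by following at most `max_hops` directed edges."""
    adjacency = {}
    for subject, _predicate, obj in edges:
        adjacency.setdefault(subject, set()).add(obj)

    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Invented edges for illustration only.
edges = [
    ("automobile", "emit", "CO2"),
    ("CO2", "cause", "global warming"),
    ("global warming", "reduce", "crop yield"),
    ("crop yield", "affect", "food security"),
]

reachable = within_hops(edges, "automobile", 2)
print(reachable)
```

Raising `max_hops` widens the neighborhood, which is how a concept such as "automobile" can end up linked to agriculture or public health concepts.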

Accessing the RDF Triples

The RDF triples extracted by mining academic literature are available in this repository.

The CSV file contains 6 columns:

  • subject of the RDF triple
  • object of the RDF triple
  • relation is the predicate of the RDF triple, a directed relationship from the subject to the object
  • paper_id traces the paper(s) from which the triple was extracted. Use the JSON file from the Climate Change Abstracts section to link this paper_id to the corresponding paper. The JSON file contains a list of 228,860 papers; the first paper in the list is assigned paper_id = 0, and the last paper is assigned paper_id = 228,859.
  • sentence_no traces the exact sentence in the corresponding paper abstract from which the triple was extracted. The entries of the paper_id and sentence_no columns are index-matched. For example, if a triple is extracted from multiple papers, the paper_id column can be [17218, 36262] and the sentence_no column can be [3,4] -- the evidence sentences can be found in the 3rd sentence of the abstract of the 17218th paper, and the 4th sentence of the abstract of the 36262nd paper.
  • num_evidence is the number of different sources the triple was obtained from. More sources typically means the triple can be trusted more.
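
The column layout above can be read with the standard library alone. The sketch below is hedged: the two sample rows are invented, `load_triples` and `min_evidence` are hypothetical names, and it assumes the list-valued columns are stored as bracketed, comma-separated integers such as `[17218, 36262]`.

```python
import csv
import io
import json

# Invented two-row sample using the six column names described above.
sample_csv = '''subject,object,relation,paper_id,sentence_no,num_evidence
automobile,CO2,emit,"[17218, 36262]","[3, 4]",2
forest,carbon,store,[120],[0],1
'''


def load_triples(handle, min_evidence=1):
    """Parse rows and keep triples backed by at least `min_evidence` sources."""
    rows = []
    for row in csv.DictReader(handle):
        if int(row["num_evidence"]) < min_evidence:
            continue
        # Assumed format: bracketed integer lists, which also parse as JSON.
        paper_ids = json.loads(row["paper_id"])
        sentence_nos = json.loads(row["sentence_no"])
        rows.append({
            "triple": (row["subject"], row["relation"], row["object"]),
            # Index-matched (paper_id, sentence_no) pairs.
            "evidence": list(zip(paper_ids, sentence_nos)),
        })
    return rows


trusted = load_triples(io.StringIO(sample_csv), min_evidence=2)
print(trusted)
```

Filtering on `num_evidence` trades coverage for trust: a higher threshold keeps fewer triples, each supported by more sources.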

Each triple can be traced back to a sentence in the abstract of a published paper by using the "paper_id" and "sentence_no" columns. Please note that the knowledge graph is still under development and is currently made available for informational purposes only.

Another version of the knowledge graph that is less trusted, but more extensive in terms of coverage is also available to download.

Sample code for showing evidence for the triples:

```python
import json

import numpy as np
import pandas as pd
from nltk.tokenize import sent_tokenize  # needs NLTK's sentence tokenizer data

# Load the abstracts; a paper's position in this list is its paper_id.
with open("allabstractswithkeywords.json", encoding='utf-8') as f:
    all_abstracts = json.load(f)

df = pd.read_csv("final_tuples.csv")
for i, r in df.iterrows():
    triple = (r['subject'], r['relation'], r['object'])
    evidence = []

    # Both columns are stored as strings such as "[17218, 36262]".
    papers = np.asarray(r['paper_id'][1:-1].split(","), dtype=int)
    sentence_nos = np.asarray(r['sentence_no'][1:-1].split(","), dtype=int)

    # The two lists are index-matched: one sentence number per paper.
    for paper, sentence_no in zip(papers, sentence_nos):
        abstract = sent_tokenize(all_abstracts[paper]['abstract'])
        evidence.append(abstract[sentence_no])

    print("+" * 20)
    print("Triple:", triple)
    print("Evidence sentence:")
    for j, sentence in enumerate(evidence, start=1):
        print("%d. %s" % (j, sentence))
    print("-" * 20)

    break  # show only the first triple; remove this line to print them all
```

Code for Triplet Extraction

i. Install amrlib (used for AMR parsing): follow the installation instructions at https://amrlib.readthedocs.io/en/latest/install/. The requirements.txt file for the amrlib installation is here: https://github.com/bjascob/amrlib/blob/master/requirements.txt

ii. Install additional packages (e.g., matplotlib, nltk, rapidfuzz).

Alternatively, the conda environment I worked with is provided as an environment.yml file. You may create an environment with the yml file and then download and place the AMR models as instructed here: https://amrlib.readthedocs.io/en/latest/install/

ClimateKB.ipynb provides more detailed documentation of the methodology and can be used for extracting the triples.

Visualization.ipynb visualizes a part of the knowledge graph (hand-picked) as a directed graph.

Citation

If you are using any of the materials from this repository, please make sure to cite the following article:

    @article{islam2022know,
      title={KnowUREnvironment: An Automated Knowledge Graph for Climate Change and Environmental Issues},
      author={Islam, Md Saiful and Proma, Adiba and Zhou, Yilin and Akter, Syeda Nahida and Wohn, Caleb and Hoque, Ehsan},
      booktitle={2022 AAAI Fall Symposium Series},
      year={2022}
    }

Owner

  • Name: Md. Saiful Islam
  • Login: saiful1105020
  • Kind: user
  • Location: Rochester, New York. United States
  • Company: Graduate Fellow, University of Rochester

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use any of the materials from this repository, please cite it as below."
authors:
- family-names: "Islam"
  given-names: "Md Saiful"
  orcid: "https://orcid.org/0000-0003-3725-3493"
- family-names: "Proma"
  given-names: "Adiba"
- family-names: "Zhou"
  given-names: "Yilin"
- family-names: "Akter"
  given-names: "Syeda Nahida"
- family-names: "Wohn"
  given-names: "Caleb"
- family-names: "Hoque"
  given-names: "Ehsan"
title: "KnowUREnvironment: An Automated Knowledge Graph for Climate Change and Environmental Issues"
doi: #
date-released: 2022-11-18
url: "https://github.com/saiful1105020/KnowUREnvironment"
booktitle: "2022 AAAI Fall Symposium Series"

GitHub Events

Total
  • Watch event: 4
Last Year
  • Watch event: 4

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 36
  • Total Committers: 1
  • Avg Commits per committer: 36.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 6
  • Committers: 1
  • Avg Commits per committer: 6.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Md. Saiful Islam s****2@g****m 36

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

environment.yml conda
  • _libgcc_mutex 0.1
  • blas 1.0
  • brotlipy 0.7.0
  • bzip2 1.0.8
  • ca-certificates 2022.3.29
  • certifi 2021.5.30
  • cffi 1.15.0
  • charset-normalizer 2.0.4
  • cryptography 36.0.0
  • cudatoolkit 10.2.89
  • ffmpeg 4.3
  • freetype 2.11.0
  • giflib 5.2.1
  • gmp 6.2.1
  • gnutls 3.6.15
  • idna 3.3
  • intel-openmp 2021.4.0
  • jpeg 9e
  • lame 3.100
  • lcms2 2.12
  • ld_impl_linux-64 2.35.1
  • libffi 3.3
  • libgcc-ng 9.1.0
  • libiconv 1.16
  • libidn2 2.3.2
  • libpng 1.6.37
  • libstdcxx-ng 9.1.0
  • libtasn1 4.16.0
  • libtiff 4.2.0
  • libunistring 0.9.10
  • libuuid 1.0.3
  • libuv 1.40.0
  • libwebp 1.2.2
  • libwebp-base 1.2.2
  • lz4-c 1.9.3
  • mkl 2021.4.0
  • mkl-service 2.4.0
  • mkl_fft 1.3.1
  • mkl_random 1.2.2
  • ncurses 6.3
  • nettle 3.7.3
  • numpy 1.21.5
  • numpy-base 1.21.5
  • openh264 2.1.1
  • openssl 1.1.1n
  • pillow 9.0.1
  • pip 21.2.4
  • pycparser 2.21
  • pyopenssl 22.0.0
  • pysocks 1.7.1
  • python 3.10.4
  • pytorch 1.11.0
  • pytorch-mutex 1.0
  • readline 8.1.2
  • requests 2.27.1
  • setuptools 61.2.0
  • six 1.16.0
  • sqlite 3.38.2
  • tk 8.6.11
  • torchaudio 0.11.0
  • torchvision 0.12.0
  • typing_extensions 4.1.1
  • tzdata 2022a
  • urllib3 1.26.9
  • wheel 0.37.1
  • xz 5.2.5
  • zlib 1.2.12
  • zstd 1.4.9