https://github.com/blueobelisk/chemicaltagger

ChemicalTagger is a tool for semantic text-mining in chemistry.


Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.1%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: BlueObelisk
  • License: apache-2.0
  • Language: Java
  • Default Branch: master
  • Homepage:
  • Size: 21.9 MB
Statistics
  • Stars: 43
  • Watchers: 6
  • Forks: 10
  • Open Issues: 1
  • Releases: 0
Created over 6 years ago · Last pushed 11 months ago
Metadata Files
Readme License

README.md

ChemicalTagger Overview


ChemicalTagger is a tool for semantic text-mining in chemistry; it is described in an associated publication.

A. Components:

This package marks up the experimental sections of chemistry papers. It has three main classes:

I. ChemistryPOSTagger:

By default, this class runs the input text through three taggers:

  • OSCAR4 (for chemical entities)
  • Regex (for recognising keywords)
  • OpenNLP (for English parts of speech)
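How the outputs of several taggers might be reconciled into a single tag per token can be sketched as follows. This is a hypothetical illustration, not ChemistryPOSTagger's code: the precedence order (chemical-entity tags first, then regex keyword tags, then English part-of-speech tags), the method name `mergeTags`, the fallback tag and the example tag names are all assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: reconciling per-token tags from several taggers.
// Not ChemistryPOSTagger's actual code; the precedence order is an assumption.
public class TagMergeSketch {

    /**
     * Returns one tag per token. Each map in taggersByPriority holds the tags
     * a single tagger assigned, keyed by token index; earlier maps take
     * precedence over later ones. Tokens that no tagger tagged get fallbackTag.
     */
    static List<String> mergeTags(int tokenCount,
                                  List<Map<Integer, String>> taggersByPriority,
                                  String fallbackTag) {
        List<String> merged = new ArrayList<>();
        for (int i = 0; i < tokenCount; i++) {
            String chosen = fallbackTag;
            for (Map<Integer, String> tagger : taggersByPriority) {
                String tag = tagger.get(i);
                if (tag != null) {
                    chosen = tag; // first (highest-priority) tagger with a tag wins
                    break;
                }
            }
            merged.add(chosen);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("dried", "over", "Na2SO4");
        Map<Integer, String> chemicalEntities = Map.of(2, "OSCAR-CM");          // stand-in for OSCAR4
        Map<Integer, String> regexKeywords = Map.of(0, "VB-DRY");               // stand-in for the regex tagger
        Map<Integer, String> englishPos = Map.of(0, "VBN", 1, "IN", 2, "NNP"); // stand-in for OpenNLP
        List<String> tags = mergeTags(tokens.size(),
                List.of(chemicalEntities, regexKeywords, englishPos), "NN");
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(tags.get(i) + " " + tokens.get(i));
        }
    }
}
```

For the tokens `dried over Na2SO4` this yields `VB-DRY`, `IN` and `OSCAR-CM`: the domain-specific taggers override the generic English tags wherever they recognise a token.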

II. ChemistrySentenceParser:

This class converts a tagged sentence into a parse tree, using a lexer and parser generated from an ANTLR grammar.
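As a toy picture of what turning a tagged sentence into a parse tree means, the sketch below groups tokens into noun phrases with a single hand-written rule. The rule `NounPhrase := DT? JJ* NN+`, the class names and the bracketed print format are assumptions for illustration only; ChemistrySentenceParser relies on an ANTLR-generated lexer and parser with a much richer chemistry grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: groups a tagged sentence into a shallow parse tree.
// The single rule (NounPhrase := DT? JJ* NN+) is hypothetical and far
// simpler than ChemicalTagger's ANTLR grammar.
public class ShallowParseSketch {

    /** A tree node: either a leaf (tag + token) or an inner node with children. */
    static final class Node {
        final String label; // phrase name or POS tag
        final String token; // null for inner nodes
        final List<Node> children = new ArrayList<>();

        Node(String label, String token) {
            this.label = label;
            this.token = token;
        }

        @Override
        public String toString() {
            if (token != null) return "(" + label + " " + token + ")";
            StringBuilder sb = new StringBuilder("(" + label);
            for (Node child : children) sb.append(' ').append(child);
            return sb.append(')').toString();
        }
    }

    /** tags[i] is the tag of tokens[i]. */
    static Node parse(String[] tags, String[] tokens) {
        Node sentence = new Node("Sentence", null);
        int i = 0;
        while (i < tags.length) {
            int start = i;
            if (tags[i].equals("DT")) i++;
            while (i < tags.length && tags[i].equals("JJ")) i++;
            int nounStart = i;
            while (i < tags.length && tags[i].equals("NN")) i++;
            if (i > nounStart) {
                // At least one noun: wrap DT? JJ* NN+ into a NounPhrase node.
                Node np = new Node("NounPhrase", null);
                for (int k = start; k < i; k++) np.children.add(new Node(tags[k], tokens[k]));
                sentence.children.add(np);
            } else {
                // No noun followed: attach the first token directly and move on.
                sentence.children.add(new Node(tags[start], tokens[start]));
                i = start + 1;
            }
        }
        return sentence;
    }

    public static void main(String[] args) {
        String[] tags = {"DT", "NN", "VB", "IN", "DT", "NN"};
        String[] tokens = {"The", "cat", "sat", "on", "the", "mat"};
        // prints (Sentence (NounPhrase (DT The) (NN cat)) (VB sat) (IN on) (NounPhrase (DT the) (NN mat)))
        System.out.println(parse(tags, tokens));
    }
}
```

A generated parser scales to many interacting and nested rules far more easily than a hand-written loop like this one, which is a natural motivation for driving the parsing step from a grammar.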

III. ASTtoXML:

This class converts a parse tree into an XML document.
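The parse-tree-to-XML step can be sketched with the JDK's built-in DOM and `javax.xml.transform` APIs: each inner node becomes an element named after its label, and each leaf becomes an element holding its token text. This is an illustration under stated assumptions, not ASTtoXML's implementation; the README's example returns a XOM `nu.xom.Document`, whereas this sketch uses `org.w3c.dom`, and the element names (`Sentence`, `ActionPhrase`, `VB-DRY`, `OSCAR-CM`) are chosen only for the example.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative sketch of a tree-to-XML conversion using the JDK's DOM API.
// Not ASTtoXML's implementation (which produces a XOM document).
public class TreeToXmlSketch {

    /** Minimal tree node: inner nodes have children, leaves carry a token. */
    record Node(String label, String token, List<Node> children) {
        static Node leaf(String tag, String token) {
            return new Node(tag, token, List.of());
        }

        static Node inner(String label, Node... kids) {
            return new Node(label, null, List.of(kids));
        }
    }

    /** Recursively maps a node to an element named after its label. */
    static Element toElement(Document doc, Node node) {
        Element element = doc.createElement(node.label());
        if (node.token() != null) {
            element.setTextContent(node.token());
        }
        for (Node child : node.children()) {
            element.appendChild(toElement(doc, child));
        }
        return element;
    }

    /** Serialises the tree to an XML string without an XML declaration. */
    static String toXml(Node root) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(toElement(doc, root));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("XML serialisation failed", e);
        }
    }

    public static void main(String[] args) {
        Node tree = Node.inner("Sentence",
                Node.inner("ActionPhrase",
                        Node.leaf("VB-DRY", "dried"),
                        Node.leaf("IN", "over"),
                        Node.leaf("OSCAR-CM", "Na2SO4")));
        System.out.println(toXml(tree));
    }
}
```

Mapping labels directly to element names only works when every label is a legal XML name; a part-of-speech tag such as `PRP$` would make `createElement` throw, so a real converter needs to rename or escape such tags (or store them in attributes).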

B. Running ChemicalTagger:

```java
import uk.ac.cam.ch.wwmm.chemicaltagger.POSContainer;
import uk.ac.cam.ch.wwmm.chemicaltagger.ChemistryPOSTagger;
import uk.ac.cam.ch.wwmm.chemicaltagger.ChemistrySentenceParser;
import uk.ac.cam.ch.wwmm.chemicaltagger.Utils;
import nu.xom.Document;

public class ChemicalTaggerTest {

    public static void main(String[] args) {
        String text = "A solution of 124C (7.0 g, 32.4 mmol) in concentrate H2SO4 "
                + "(9.5 mL) was added to a solution of concentrate H2SO4 (9.5 mL) "
                + "and fuming HNO3 (13 mL) and the mixture was heated at 60°C for "
                + "30 min. After cooling to room temperature, the reaction mixture "
                + "was added to iced 6M solution of NaOH (150 mL) and neutralized "
                + "to pH 6 with 1N NaOH solution. The reaction mixture was extracted "
                + "with dichloromethane (4x100 mL). The combined organic phases were "
                + "dried over Na2SO4, filtered and concentrated to give 124D as a solid.";

        // Run the default taggers (OSCAR4, regex, OpenNLP) over the text
        POSContainer posContainer = ChemistryPOSTagger.getDefaultInstance().runTaggers(text);

        // The tagged output can be expressed in TAG TOKEN format
        // (e.g. DT The NN cat VB sat IN on DT the NN mat).
        // ChemistrySentenceParser accepts either the POSContainer or an InputStream.
        ChemistrySentenceParser chemistrySentenceParser = new ChemistrySentenceParser(posContainer);

        // Build a parse tree from the tagged input
        chemistrySentenceParser.parseTags();

        // Convert the parse tree into an XML document
        Document doc = chemistrySentenceParser.makeXMLDocument();

        // Write the XML document to disk
        Utils.writeXMLToFile(doc, "target/file1.xml");
    }
}
```
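The TAG TOKEN format mentioned in the example's comments is simple to consume outside ChemicalTagger. The helper below is hypothetical (`parsePairs` and `TaggedToken` are not part of the library) and splits such a string into tag/token pairs, assuming that tokens never contain whitespace and that the string strictly alternates tag and token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ChemicalTagger: splits a
// "TAG TOKEN TAG TOKEN ..." string into (tag, token) pairs.
// Assumes tokens contain no whitespace.
public class TagTokenPairsSketch {

    record TaggedToken(String tag, String token) { }

    static List<TaggedToken> parsePairs(String tagTokenString) {
        String trimmed = tagTokenString.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        String[] parts = trimmed.split("\\s+");
        if (parts.length % 2 != 0) {
            // An odd count means a tag without a token (or vice versa).
            throw new IllegalArgumentException(
                    "Expected TAG TOKEN pairs, got " + parts.length + " items");
        }
        List<TaggedToken> pairs = new ArrayList<>();
        for (int i = 0; i < parts.length; i += 2) {
            pairs.add(new TaggedToken(parts[i], parts[i + 1]));
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (TaggedToken t : parsePairs("DT The NN cat VB sat IN on DT the NN mat")) {
            System.out.println(t.token() + "/" + t.tag());
        }
    }
}
```

Rejecting an odd number of items, rather than silently dropping the last one, makes a misaligned tag sequence fail loudly instead of shifting every later tag onto the wrong token.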

Owner

  • Name: Blue Obelisk
  • Login: BlueObelisk
  • Kind: organization

GitHub Events

Total
  • Watch event: 5
  • Push event: 1
  • Fork event: 2
Last Year
  • Watch event: 5
  • Push event: 1
  • Fork event: 2

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 8
  • Total pull requests: 4
  • Average time to close issues: 17 days
  • Average time to close pull requests: about 8 hours
  • Total issue authors: 7
  • Total pull request authors: 3
  • Average comments per issue: 2.63
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: 2 months
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • biotech7 (2)
  • abhibha1807 (1)
  • CreamyLong (1)
  • lo2aayy (1)
  • mjw99 (1)
  • tongyey (1)
  • sathiyabalu89 (1)
Pull Request Authors
  • dependabot[bot] (3)
  • lo2aayy (1)
  • dan2097 (1)
Top Labels
Issue Labels
  • question (1)
Pull Request Labels
  • dependencies (3)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 1
  • Total dependent repositories: 4
  • Total versions: 6
repo1.maven.org: uk.ac.cam.ch.wwmm:chemicalTagger

ChemicalTagger

  • Versions: 6
  • Dependent Packages: 1
  • Dependent Repositories: 4
Rankings
  • Dependent repos count: 12.1%
  • Average: 30.2%
  • Dependent packages count: 33.0%
  • Stargazers count: 35.8%
  • Forks count: 39.8%
Last synced: 11 months ago

Dependencies

pom.xml maven
  • com.tunnelvisionlabs:antlr4 4.5.3
  • com.tunnelvisionlabs:antlr4-annotations 4.5.3
  • com.tunnelvisionlabs:antlr4-runtime 4.5.3
  • commons-io:commons-io 2.11.0
  • commons-lang:commons-lang 2.6
  • org.apache.logging.log4j:log4j-1.2-api 2.18.0
  • org.apache.opennlp:opennlp-tools 1.9.4
  • org.jsoup:jsoup 1.15.2
  • uk.ac.cam.ch.wwmm.oscar:oscar4-api 5.2.0
  • uk.ac.cam.ch.wwmm.oscar:oscar4-data 5.2.0
  • xom:xom 1.3.7
  • junit:junit test
  • org.xml-cml:jumbo-testutil 1.0.1 test
.github/workflows/maven.yml actions
  • actions/checkout v3 composite
  • actions/setup-java v3 composite