https://github.com/blueobelisk/iupac-names
Project that collects all IUPAC names used in the literature into a CCZero database.
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: BlueObelisk
- License: cc0-1.0
- Language: Groovy
- Default Branch: main
- Size: 20.6 MB
Statistics
- Stars: 6
- Watchers: 7
- Forks: 2
- Open Issues: 10
- Releases: 4
Metadata Files
README.md
iupac-names
Project that collects all IUPAC names used in the literature into a CCZero database. Every contributor guarantees that the contribution is CCZero and void of any legal claim otherwise. Autogenerated IUPAC names are forbidden: each IUPAC name must be found in the literature. This includes IUPAC names that occur as part of a larger name, as long as the part is a valid IUPAC name by itself. No metadata on the origin of an IUPAC name is recorded; the copyright-free fact we record here is only that the IUPAC name exists.
Our ambition is to have 1M IUPAC names within the first year.
This repository is very simple: it consists of a single, sorted list of IUPAC names in the iupac-names.txt file.
Each line in that file is a valid IUPAC name, where valid means that OPSIN is able to generate a SMILES string from it.
Adding new names
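The list is sorted and contains only unique names. On GNU/Linux, the reference algorithm for this process is given by the two commands that follow: the first appends the third column of a tab-separated file (skipping its header row) to the list, and the second re-sorts the list ignoring case and drops case-insensitive duplicates.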
```shell
csvtool -u TAB -t TAB col 3 ~/Downloads/chemicals.tsv | tail -n +2 | sort | uniq >> iupac-names.txt
sort -f iupac-names.txt | uniq -i | tee tmp.txt | wc -l ; mv tmp.txt iupac-names.txt
```
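After running the commands above, the two invariants of the list (sorted, unique) can be verified before committing. The following is a minimal sketch and not part of the repository: the function name `check_names_file` is hypothetical, and it assumes GNU coreutils and the same locale that was used for the `sort -f` step.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): succeed only if FILE is
# sorted ignoring case and has no case-insensitive duplicate lines.
check_names_file() {
    file="$1"
    # sort -c only checks the order; -f folds lower case to upper case.
    if ! sort -f -c "$file" 2>/dev/null; then
        echo "not sorted: $file"
        return 1
    fi
    # uniq -d prints one copy of each repeated line; -i ignores case.
    dupes=$(uniq -d -i "$file")
    if [ -n "$dupes" ]; then
        echo "duplicate names in $file:"
        echo "$dupes"
        return 1
    fi
    echo "ok: $file"
}

# Demo on a throwaway file so the real list is left untouched.
demo=$(mktemp)
printf '%s\n' '1,1-dimethylhydrazine' 'benzene' 'ethanol' > "$demo"
check_names_file "$demo"
rm -f "$demo"
```

Calling `check_names_file iupac-names.txt` returns a non-zero exit status when either invariant is broken, so it could also be used in a pre-commit hook.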
There are small name variants, like 1,1 Dimethylhydrazine and 1,1-dimethylhydrazine.
It is not clear whether these are typos in the articles or artifacts of the text mining,
but we do know they parse into a SMILES. By removing all spaces and all hyphens, we can count
the number of unique lower-case names with:
```shell
cat iupac-names.txt | sed 's/[-‐]//g' | sed 's/ //g' | tr '[:upper:]' '[:lower:]' | sort | uniq | tee iupac-names-flat.txt | wc -l
```
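To see what this flattening does to the two variants mentioned above, it can be applied to just those two names. This is a small sketch under assumptions: the function name `flatten_names` is hypothetical, and it uses separate `sed` expressions for the ASCII hyphen and the U+2010 hyphen so that it does not depend on how the locale handles bracket expressions.

```shell
#!/bin/sh
# Hypothetical helper: flatten names read from stdin by deleting ASCII
# hyphens, U+2010 hyphens and spaces, then lower-casing the result.
flatten_names() {
    sed -e 's/-//g' -e 's/‐//g' -e 's/ //g' | tr '[:upper:]' '[:lower:]'
}

# Both spellings from the text collapse to a single flattened name.
printf '%s\n' '1,1 Dimethylhydrazine' '1,1-dimethylhydrazine' \
    | flatten_names | sort -u
# prints: 1,1dimethylhydrazine
```

The flattened strings are only useful for counting variants. They are no longer valid names, because hyphens and spaces carry meaning in IUPAC nomenclature.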
Calculating unique InChIKeys
To get an idea of the chemical space covered, we can count the number of unique InChIKeys (mind the tautomerism normalization):
```shell
groovy extractInChIKeys.groovy | sort | uniq | tee inchikeys.txt | wc -l
```
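A coarser view of the same data is possible, because the first 14 characters of an InChIKey hash only the connectivity, so stereoisomers share that block. The sketch below is not part of the repository: the function name `count_skeletons` is hypothetical, and the three sample keys (ethanol and two alanine entries) were typed in by hand for illustration rather than taken from inchikeys.txt.

```shell
#!/bin/sh
# Hypothetical helper: count distinct first blocks (first 14 characters)
# among InChIKeys read from stdin, one key per line.
count_skeletons() {
    # tr strips the padding that some wc implementations add.
    cut -c1-14 | sort -u | wc -l | tr -d ' '
}

# Three sample keys; the last two differ only in their second block.
printf '%s\n' \
    'LFQSCWFLJHTTHZ-UHFFFAOYSA-N' \
    'QNAYBMKLOCPYGJ-REOHZZGFSA-N' \
    'QNAYBMKLOCPYGJ-UHFFFAOYSA-N' \
    | count_skeletons
# prints: 2
```

On the real data this would be run as `count_skeletons < inchikeys.txt`, assuming that file holds one InChIKey per line as produced by the command above.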
Owner
- Name: Blue Obelisk
- Login: BlueObelisk
- Kind: organization
- Website: http://www.blueobelisk.org/
- Repositories: 59
- Profile: https://github.com/BlueObelisk
GitHub Events
Total
- Create event: 4
- Release event: 4
- Issues event: 14
- Watch event: 6
- Issue comment event: 21
- Push event: 133
- Pull request event: 1
- Fork event: 2
Last Year
- Create event: 4
- Release event: 4
- Issues event: 14
- Watch event: 6
- Issue comment event: 21
- Push event: 133
- Pull request event: 1
- Fork event: 2
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 12
- Total pull requests: 1
- Average time to close issues: 26 days
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 0.25
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 12
- Pull requests: 1
- Average time to close issues: 26 days
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 0.25
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- egonw (11)
- haydn-jones (1)
Pull Request Authors
- dehaenw (1)