scribl
scribl: A system for the semantic capture of relationships in biological literature - Published in JOSS (2024)
Science Score: 98.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 10 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to joss.theoj.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Semantic Capture of Relationships In Biological Literature (scribl)
Basic Info
- Host: GitHub
- Owner: amberbiology
- License: agpl-3.0
- Language: Python
- Default Branch: main
- Homepage: https://amberbiology.com/
- Size: 5.86 MB
Statistics
- Stars: 4
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 5
Metadata Files
README.md
scribl
A system for the semantic capture of relationships in biological literature
The scribl language was designed for curating, from scientific articles, the relationships between the various biological agents and processes that they describe, with a view to generating a graph database that captures these relationships as a connected network.
scribl statements are added as tags to articles in a literature database (the scribl codebase currently supports the free, open-source Zotero database). These tags are parsed and used to create a graph data structure that can then be exported for use in a graph database platform such as Neo4j, or in Python's NetworkX.
For example:
::agent c9orf72 :gene :protein :url https://www.uniprot.org/uniprot/Q96LT7
::agent gtp :tag nucleoside, purine, nucleoside triphosphate
::process exportin releases cargo into cytoplasm @ exportin-1
::process smcr8 mutation > ulk1 phosphorylation < autophagy = smcr8 expression
Full details of the language are available in the full documentation contained in scribl.pdf.
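The language itself is defined in scribl.pdf and parsed by scribl's own code. Purely to illustrate how an ::agent statement like the examples above breaks into parts, here is a toy Python sketch; it is not scribl's parser. The function name parse_agent_tag is hypothetical, and it assumes that keywords are introduced by whitespace followed by ":" and that a keyword with no value (such as :gene) acts as a flag.

```python
import re


def parse_agent_tag(tag: str) -> dict:
    """Toy parser for one ::agent statement (illustrative only, not scribl's parser)."""
    prefix = "::agent "
    if not tag.startswith(prefix):
        raise ValueError("not an ::agent statement")
    body = tag[len(prefix):].strip()
    # Split before each ':keyword'; the ':' in 'https://' is not preceded by whitespace
    parts = re.split(r"\s+:(?=\w)", body)
    attributes = {}
    for part in parts[1:]:
        key, _, value = part.partition(" ")
        # Assumption: a keyword with no value (e.g. ':gene') is recorded as a flag
        attributes[key] = value.strip() or True
    return {"name": parts[0], "attributes": attributes}


print(parse_agent_tag(
    "::agent c9orf72 :gene :protein :url https://www.uniprot.org/uniprot/Q96LT7"
))
```

::process statements use operators (>, <, =, @) whose meaning is given in scribl.pdf, so they are deliberately left out of this sketch.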
How to cite scribl
If you write a paper that uses scribl in your analysis, please cite
both:
- our 2024 article published in the Journal of Open Source Software:
Webster GD, Lancaster AK. (2024) "scribl: A system for the semantic capture of relationships in biological literature." Journal of Open Source Software 9(99):6645. doi:10.21105/joss.06645
- and the Zenodo record for the software. To cite the correct version, follow these steps:
First visit the DOI for the overall Zenodo record: 10.5281/zenodo.12728362. This DOI represents all versions, and will always resolve to the latest one.
When you are viewing the record, look for the Versions box in the right sidebar, which lists all versions (including older ones).
Select and click the version-specific DOI that matches the version of scribl that you used for your analysis.
Once you are viewing the Zenodo record for that specific version, select the citation format you wish to use in the Citation box in the right sidebar, and click to copy the citation. It will contain a link to the version-specific DOI, and be of the form:
Webster GD, Lancaster, AK. (YYYY) "scribl: A system for the semantic capture of relationships in biological literature" (Version X.Y.Z) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.XXXXX
Note that citation metadata for the current Zenodo record is also stored in CITATION.cff.
Quickstart install and test
We recommend installing in a virtual environment.
Install from PyPI
shell
pip install scribl
Install via GitHub repo
shell
git clone https://github.com/amberbiology/scribl.git
cd scribl
pip install .
This will pull in all relevant Python dependencies, including pyparsing, pyzotero and others, within your virtual environment.
Test your installation
When scribl is installed, it provides a command-line program, scribl,
that can perform some common API tasks and help the user jumpstart
exploring the system. After installation, here are two example
commands you can cut-and-paste and run from the terminal to check the
installation:
Check version
shell
scribl --version
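The same check can be made from Python with the standard library's importlib.metadata, which reads the version pip recorded for the distribution named scribl (the PyPI name used above). A minimal sketch; the helper name installed_scribl_version is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_scribl_version():
    """Return the installed scribl version string, or None if it is not installed."""
    try:
        return version("scribl")
    except PackageNotFoundError:
        return None


print(installed_scribl_version() or "scribl is not installed in this environment")
```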
Check outputs
First create a new, empty directory and change into it, e.g.:
shell
mkdir /tmp/scribl-test
cd /tmp/scribl-test
Then run:
shell
scribl -g new_graphdb --zotero-library 5251557:group --networkx-fig graphdb-visual.png --graphmlfile graphdb.xml --cyphertextfile graphdb.cypher
This will read from a public Zotero group collection we have created
(ID 5251557) for testing purposes, that includes citations with
scribl tags. This example run generates the following outputs:
- new_graphdb - directory containing the new scribl database
- graphdb.cypher - a plain text file in the Cypher query language
- graphdb.xml - an XML file in GraphML format
- graphdb-visual.png - a visualization of the GraphML graph in PNG format

See below for a more detailed description of visualization options.
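After the command above finishes, a quick way to confirm that every output listed above was written is to check for each name in the test directory. A minimal sketch, assuming the output names from the example command (edit EXPECTED_OUTPUTS if you changed them); the helper name missing_outputs is hypothetical:

```python
from pathlib import Path

# Output names taken from the example command above
EXPECTED_OUTPUTS = ["new_graphdb", "graphdb.cypher", "graphdb.xml", "graphdb-visual.png"]


def missing_outputs(directory="."):
    """Return expected output names that are absent, or present but empty files."""
    base = Path(directory)
    missing = []
    for name in EXPECTED_OUTPUTS:
        path = base / name
        if not path.exists() or (path.is_file() and path.stat().st_size == 0):
            missing.append(name)
    return missing
```

Running missing_outputs() from /tmp/scribl-test should return an empty list when all four outputs are present.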
Visualization of the scribl database
As mentioned previously, a scribl database can be exported for use
in a graph database platform for further interactive exploration. One
of the most developed graph visualization platforms is
Neo4j, which reads the output in the Cypher query
language. Unfortunately, Neo4j isn't
directly available via the Python packaging system. The instructions for
installing and configuring it are beyond the scope of this
documentation, but they are available for most
platforms.
The main scribl documentation describes how to take the
exported Cypher text generated by a command like the above into Neo4j
and dynamically update the Neo4j database.
In the interest of getting up and running quickly with visualizations,
while staying purely within the Python ecosystem, scribl also supports
outputs other than Cypher. The GraphML backend mentioned above generates
visualizations using a combination of
NetworkX and matplotlib. This feature enables
generation of static visualizations of the current scribl database.
Visualizations generated via the GraphML/NetworkX backend are best suited to
small networks and to getting a feel for the scribl database structure
with a minimum of fuss. For "production" level visualization as well
as interactive analyses, however, you are likely to want to try Neo4j.
Examples of scribl command-line program
Here are some common scribl tasks you can do using the scribl
command-line:
Create a new database from Zotero library
shell
scribl -g new_graphdb --zotero-library <LIBRARY_ID>:<TYPE> --zotero-api-key <API_KEY>
A scribl database named new_graphdb is created in the current directory,
using the Zotero LIBRARY_ID with the specified TYPE
(either user or group). An API_KEY is only needed when
accessing private libraries or non-public group libraries. The
pyzotero library has
documentation on
finding your library ID and creating an API key.
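scribl performs the download for you, but if you want to inspect the raw tags in a library first, pyzotero (already a scribl dependency) can fetch them. A minimal sketch, assuming scribl statements are the tags that begin with "::", as in the examples at the top of this README; the helper names scribl_statements and fetch_library_items are hypothetical:

```python
def scribl_statements(items):
    """Return tag strings that look like scribl statements (assumed to start with '::')."""
    statements = []
    for item in items:
        # pyzotero items store tags as a list of {"tag": ...} dicts under item["data"]
        for tag in item.get("data", {}).get("tags", []):
            text = tag.get("tag", "")
            if text.startswith("::"):
                statements.append(text)
    return statements


def fetch_library_items(library_id, library_type, api_key=None):
    """Download all top-level items from a Zotero library (requires network access)."""
    from pyzotero import zotero  # imported here so scribl_statements works without pyzotero

    zot = zotero.Zotero(library_id, library_type, api_key)
    # top() skips child items (notes, attachments); everything() follows pagination
    return zot.everything(zot.top())
```

For example, scribl_statements(fetch_library_items(5251557, "group")) would list the statements in the public test library used in the quickstart.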
Create a new database from a local Zotero CSV file
shell
scribl -g new_graphdb --zoterofile <ZOTERO_CSV>
This populates the database from a local CSV file, ZOTERO_CSV, exported
from a Zotero library. Note that the two commands above only create (or
read from an existing) database; they generate no output. The
new_graphdb database is not overwritten if it already exists, unless the
--overwrite flag is also supplied.
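To preview which scribl statements a CSV export contains before importing it, the standard csv module is enough. A minimal sketch, assuming Zotero's usual CSV export layout, where manually added tags sit in a "Manual Tags" column separated by semicolons, and that scribl statements start with "::"; the helper name is hypothetical and this is not how scribl reads the file:

```python
import csv


def scribl_statements_from_csv(csv_file):
    """Collect '::'-prefixed tags from the 'Manual Tags' column of a Zotero CSV export."""
    statements = []
    for row in csv.DictReader(csv_file):
        # Assumption: multiple tags in one cell are separated by ';'
        for tag in (row.get("Manual Tags") or "").split(";"):
            tag = tag.strip()
            if tag.startswith("::"):
                statements.append(tag)
    return statements
```

Open the export with open(path, newline="", encoding="utf-8-sig") so that a leading byte-order mark, if present, does not end up in the first column name.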
To generate outputs, there are a number of options. All assume that at least one Zotero database has been imported, either from a local CSV file or via the Zotero library API (i.e. at least one of the import steps above has been run at least once). Outputs are skipped if the database is empty.
Output representation of graph in GraphML format
shell
scribl -g new_graphdb --graphmlfile <OUTPUT_XML>
This generates output in GraphML XML format (OUTPUT_XML should
have the appropriate extension, e.g. graphdb.xml).
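Because GraphML is plain XML, the exported file can be sanity-checked with nothing but the standard library. A minimal sketch, assuming the file uses the standard GraphML namespace (as NetworkX-written GraphML does); if the file declared no namespace, these searches would find nothing. The helper name graphml_summary is hypothetical:

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def graphml_summary(source):
    """Count nodes and edges in a GraphML document (a path or binary file object)."""
    root = ET.parse(source).getroot()
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    return {"nodes": len(nodes), "edges": len(edges)}
```

For example, graphml_summary("graphdb.xml") gives a quick size check before deciding whether a static NetworkX figure will be readable.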
Output a NetworkX visualization
shell
scribl -g new_graphdb --networkx-fig <OUTPUT_IMAGE>
This generates the visualization from the current scribl database and saves it
as OUTPUT_IMAGE, in any format supported by matplotlib (for example,
graphdb-visual.pdf would generate it in PDF format). The visualization is
produced by rendering the GraphML XML output with Python's NetworkX
library.
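If you want more control over the figure than --networkx-fig offers, the same NetworkX and matplotlib combination can be driven directly from a GraphML file such as graphdb.xml. This is an independent sketch, not scribl's own plotting code; the function name draw_graphml, the spring layout, the seed and the figure size are arbitrary assumptions:

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so this also works without a display
import matplotlib.pyplot as plt
import networkx as nx


def draw_graphml(graphml_path, image_path, seed=42):
    """Draw a GraphML file with a spring layout; the image format follows the extension."""
    graph = nx.read_graphml(graphml_path)
    pos = nx.spring_layout(graph, seed=seed)  # fixed seed gives a reproducible layout
    fig, ax = plt.subplots(figsize=(10, 8))
    nx.draw_networkx(graph, pos, ax=ax, node_size=300, font_size=8)
    ax.set_axis_off()
    fig.savefig(image_path, bbox_inches="tight")
    plt.close(fig)
    return graph.number_of_nodes(), graph.number_of_edges()
```

For example, draw_graphml("graphdb.xml", "graphdb-custom.pdf") writes a PDF. A spring layout is a reasonable default for the small networks this route is suited to.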
Output representation of graph as a Cypher file
shell
scribl -g new_graphdb --cyphertextfile <OUTPUT_CYPHER>
The above outputs a Cypher text representation of the database in the
file OUTPUT_CYPHER, a plain text file in the previously mentioned
Cypher query language, suitable for import
into Neo4j.
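The main scribl documentation describes the supported way to bring this file into Neo4j. As a hedged alternative for scripting, the official neo4j Python driver can run the statements one by one. This sketch assumes statements in the file are separated by semicolons and contain no comments; the helper names are hypothetical, and the URI and credentials are placeholders for your own server:

```python
def split_cypher_statements(text):
    """Split Cypher text on ';' outside quoted strings; return non-empty statements."""
    statements, current = [], []
    quote, escaped = None, False
    for ch in text:
        current.append(ch)
        if quote:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
        elif ch == ";":
            current.pop()  # drop the terminating semicolon
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def load_into_neo4j(cypher_path, uri, user, password):
    """Run each statement from a Cypher file against a running Neo4j server."""
    from neo4j import GraphDatabase  # official Neo4j Python driver, imported lazily

    with open(cypher_path, encoding="utf-8") as f:
        statements = split_cypher_statements(f.read())
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for statement in statements:
                session.run(statement).consume()
```

For example, load_into_neo4j("graphdb.cypher", "bolt://localhost:7687", "neo4j", "<PASSWORD>"). The quote tracking keeps semicolons inside string literals, such as a :tag list, from being treated as statement ends.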
For a full description of all command-line arguments, run:
shell
scribl --help
Using the scribl API directly in your programs
You can write your own Python program using the API directly. For
example, the code below generates the same outputs as the check
outputs example above. It creates a new database,
populates it from the same remote Zotero library, and then generates
graphdb.xml (GraphML), graphdb.cypher (Cypher) and
graphdb-visual.png (PNG).
```Python
from scribl.manage_graphdb import GraphDBInstance

# create the database in ./new_graphdb and record its metadata
gdb = GraphDBInstance("new_graphdb")
gdb.set_metadata("Test Scribl DB", "User name", "Small test")

# import the public test group library (ID 5251557), then load it
gdb.import_zotero_library(5251557, "group")
gdb.load_zotero_csv()

# write the GraphML, figure and Cypher outputs
gdb.export_graphml_text(filepath="graphdb.xml")
gdb.export_graphml_figure(filepath="graphdb-visual.png")
gdb.export_cypher_text(filepath="graphdb.cypher")
```
The code for the scribl command contains more
examples of use of the API. In addition, more details can be found
in the main documentation.
Running unit tests with pytest
- clone the repo:
shell
git clone https://github.com/amberbiology/scribl.git
- install with pip, including the optional test dependencies (in zsh, quote the argument as '.[test]'):
shell
pip install .[test]
- run:
shell
pytest
Note that pytest creates output sandbox test files
in the system temporary directory,
e.g. on Linux pytest creates a directory structure
/tmp/pytest-of-USER/pytest-NUM/scribl_sandboxNUM where USER is
the current user and NUM is incremented on each run. Only the
last three directories are retained.
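To find the retained sandbox directories without guessing NUM, you can list them from Python. A minimal sketch, assuming the pytest-of-USER/pytest-NUM layout described above, located under tempfile.gettempdir(), with USER taken from getpass.getuser(); the helper name pytest_run_dirs is hypothetical:

```python
import getpass
import re
import tempfile
from pathlib import Path


def pytest_run_dirs(user=None):
    """List pytest-of-<user>/pytest-<NUM> run directories, oldest (lowest NUM) first."""
    user = user or getpass.getuser()
    base = Path(tempfile.gettempdir()) / f"pytest-of-{user}"
    if not base.is_dir():
        return []
    runs = []
    for path in base.iterdir():
        # Skip entries such as 'pytest-current' that are not numbered runs
        match = re.fullmatch(r"pytest-(\d+)", path.name)
        if match and path.is_dir():
            runs.append((int(match.group(1)), path))
    return [path for _, path in sorted(runs)]
```

The last entry of pytest_run_dirs() is the most recent run; its scribl_sandboxNUM subdirectories hold the test outputs.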
Development and contributing to scribl
scribl is completely open-source and is being developed by Amber Biology LLC (@amberbiology).
See CHANGELOG.md for a history of changes.
If you're interested in contributing, please read our CONTRIBUTING guide.
Copyright and license
scribl is Copyright (C) 2023, 2024. Amber Biology LLC
scribl is distributed under the terms of the AGPL-3.0 license.
Acknowledgements
The development of the scribl platform was made possible by the funding and expertise provided by the Association for Frontotemporal Degeneration (AFTD - https://www.theaftd.org/) whose mission is to improve the quality of life of people affected by Frontotemporal Degeneration and drive research to a cure. We are particularly grateful to AFTD leadership team members Debra Niehoff and Penny Dacks for their invaluable direction and guidance, and for getting us access to leading researchers in the field of neurodegenerative disease - all of whose insights and advice were pivotal in the development of this free, open-source research tool.
Owner
- Name: Amber Biology
- Login: amberbiology
- Kind: organization
- Email: info@amberbiology.com
- Location: Cambridge MA
- Website: http://amberbiology.com
- Twitter: amberbiology
- Repositories: 1
- Profile: https://github.com/amberbiology
Amber Biology's Code Repository
JOSS Publication
scribl: A system for the semantic capture of relationships in biological literature
Tags
systems biology, network biology, literature mining, bioinformatics, computational biology, Zotero, graph database
Citation (CITATION.cff)
cff-version: 1.2.0
abstract: >-
  scribl (Semantic Capture of Relationships in Biological Literature) is a
  Python software pipeline and API that transforms a Zotero literature database
  with entries annotated with tags in the **scribl** language to create a graph
  data structure, suitable for import into a graph database platform, such as
  Neo4j or Python's NetworkX. The scribl language was designed for the curation
  from scientific articles, to capture the relationships between the various
  biological agents and the processes that they describe in a connected network.
authors:
  - family-names: Webster
    given-names: Gordon
    affiliation: Amber Biology LLC
    orcid: https://orcid.org/0009-0009-2862-0467
  - family-names: Lancaster
    given-names: Alexander K.
    orcid: https://orcid.org/0000-0002-0002-9263
    affiliation: >-
      Amber Biology LLC and Institute for Globally Distributed Open Research and
      Education (IGDORE)
title: >-
  scribl: A system for the semantic capture of relationships in biological
  literature
url: https://amberbiology.com
repository-code: https://github.com/amberbiology/scribl
type: software
keywords:
  - systems biology
  - network biology
  - literature mining
  - bioinformatics
  - computational biology
license: AGPL-3.0-or-later
version: v0.8.3
doi: 10.5281/zenodo.16892066
message: >-
  If you use this software, please cite our article in the Journal of Open
  Source Software.
preferred-citation:
  authors:
    - family-names: Webster
      given-names: Gordon D.
      orcid: https://orcid.org/0009-0009-2862-0467
    - family-names: Lancaster
      given-names: Alexander K.
      orcid: https://orcid.org/0000-0002-0002-9263
  date-published: 2024-07-17
  doi: 10.21105/joss.06645
  issn: 2475-9066
  issue: 99
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6645
  title: >-
    scribl: A system for the semantic capture of relationships in biological
    literature
  type: article
  url: https://joss.theoj.org/papers/10.21105/joss.06645
  volume: 9
GitHub Events
Total
- Release event: 6
- Watch event: 1
- Delete event: 42
- Issue comment event: 5
- Push event: 93
- Pull request event: 74
- Pull request review event: 31
- Create event: 45
Last Year
- Release event: 6
- Watch event: 1
- Delete event: 42
- Issue comment event: 5
- Push event: 93
- Pull request event: 74
- Pull request review event: 31
- Create event: 45
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Alex Lancaster | a****r | 38 |
| dependabot[bot] | 4****] | 35 |
| pre-commit-ci[bot] | 6****] | 19 |
| ci-scribl | u****e | 4 |
| zenodraft/action | | 3 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 123
- Average time to close issues: N/A
- Average time to close pull requests: 1 day
- Total issue authors: 1
- Total pull request authors: 4
- Average comments per issue: 0.0
- Average comments per pull request: 0.13
- Merged pull requests: 100
- Bot issues: 0
- Bot pull requests: 99
Past Year
- Issues: 0
- Pull requests: 78
- Average time to close issues: N/A
- Average time to close pull requests: 1 day
- Issue authors: 0
- Pull request authors: 3
- Average comments per issue: 0
- Average comments per pull request: 0.1
- Merged pull requests: 65
- Bot issues: 0
- Bot pull requests: 72
Top Authors
Issue Authors
- benlansdell (1)
Pull Request Authors
- dependabot[bot] (70)
- pre-commit-ci[bot] (39)
- alexlancaster (25)
- atrisovic (2)
Packages
- Total packages: 1
- Total downloads: pypi 147 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 6
- Total maintainers: 1
pypi.org: scribl
A system for the semantic capture of relationships in biological literature
- Homepage: http://amberbiology.com
- Documentation: https://scribl.readthedocs.io/
- License: GNU AGPLv3+
- Latest release: 0.8.3 (published 6 months ago)
Dependencies
- matplotlib <= 3.8.2
- networkx <= 3.2.1
- pandas <= 2.1.4
- pyparsing <= 3.1.1
- pyzotero <= 1.5.18
