https://github.com/acdh-oeaw/acdh-cidoc-pyutils
Helper functions for the generation of CIDOC CRMish RDF
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 1 committers (100.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: acdh-oeaw
- License: mit
- Language: Python
- Default Branch: main
- Size: 130 KB
Statistics
- Stars: 0
- Watchers: 4
- Forks: 1
- Open Issues: 0
- Releases: 30
Metadata Files
README.md
acdh-cidoc-pyutils
Helper functions for the generation of CIDOC CRMish RDF (from XML/TEI data)
Installation
- install via `pip install acdh-cidoc-pyutils`
Examples
- For real-world examples, see e.g. the semantic-kraus project
- also take a look at `test_cidoc_pyutils.py`
extract `cidoc:P14i_performed FRBROO:F51_ Pursuit` triples from `tei:person/tei:occupation` nodes
```python
import lxml.etree as ET
from rdflib import URIRef
from acdh_cidoc_pyutils import make_occupations, NSMAP

sample = """..."""
```
returns
```ttl
@prefix ns1: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://foo/bar/DWpers0091> ns1:P14i_performed <https://foo/bar/DWpers0091/occupation/3>,
        <https://foo/bar/DWpers0091/occupation/franzi>,
        <https://foo/bar/DWpers0091/occupation/hansi>,
        <https://foo/bar/DWpers0091/occupation/sumsi> .

<https://foo/bar/DWpers0091/occupation/3> a <http://iflastandards.info/ns/fr/frbr/frbroo#F51> ;
    rdfs:label "Bäckerin"@de .

<https://foo/bar/DWpers0091/occupation/franzi> a <http://iflastandards.info/ns/fr/frbr/frbroo#F51> ;
    rdfs:label "Sängerin"@de .

<https://foo/bar/DWpers0091/occupation/hansi> a <http://iflastandards.info/ns/fr/frbr/frbroo#F51> ;
    rdfs:label "Bürgermeister"@it ;
    ns1:P4_has_time-span <https://foo/bar/DWpers0091/occupation/hansi/time-span> .

<https://foo/bar/DWpers0091/occupation/hansi/time-span> a ns1:E52_Time-Span ;
    rdfs:label "1900-12 - 2000"^^xsd:string ;
    ns1:P82a_begin_of_the_begin "1900-12"^^xsd:gYearMonth ;
    ns1:P82b_end_of_the_end "2000"^^xsd:gYear .

<https://foo/bar/DWpers0091/occupation/sumsi> a <http://iflastandards.info/ns/fr/frbr/frbroo#F51> ;
    rdfs:label "Tischlermeister/Fleischhauer"@de ;
    ns1:P4_has_time-span <https://foo/bar/DWpers0091/occupation/sumsi/time-span> .

<https://foo/bar/DWpers0091/occupation/sumsi/time-span> a ns1:E52_Time-Span ;
    rdfs:label "1233-02-03 - 1233-02-03"^^xsd:string ;
    ns1:P82a_begin_of_the_begin "1233-02-03"^^xsd:date ;
    ns1:P82b_end_of_the_end "1233-02-03"^^xsd:date .
```
extract birth/death triples from `tei:person`
```python
import lxml.etree as ET
from rdflib import URIRef
from acdh_cidoc_pyutils import make_birth_death_entities, NSMAP

sample = """..."""

doc = ET.fromstring(sample)
x = doc.xpath(".//tei:person[1]", namespaces=NSMAP)[0]
xml_id = x.attrib["{http://www.w3.org/XML/1998/namespace}id"].lower()
item_id = f"https://foo/bar/{xml_id}"
subj = URIRef(item_id)
# birth event (the default event type)
event_graph, birth_uri, birth_timestamp = make_birth_death_entities(
    subj, x, place_id_xpath="//tei:placeName[1]/@key"
)
# death event, with custom XPaths for the date and place nodes
event_graph, birth_uri, birth_timestamp = make_birth_death_entities(
    subj,
    x,
    event_type="death",
    verbose=True,
    date_node_xpath="/tei:date[1]",
    place_id_xpath="//tei:settlement[1]/@key",
)
event_graph.serialize(format="turtle")
```
returns
```ttl
@prefix ns1: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# birth example
<https://foo/bar/dwpers0091/birth> a ns1:E67_Birth ;
    rdfs:label "Geburt von Gulbransson, Olaf Leonhard"@fr ;
    ns1:P4_has_time-span <https://foo/bar/dwpers0091/birth/time-span> ;
    ns1:P7_took_place_at <https://foo/bar/DWplace00139> ;
    ns1:P98_brought_into_life <https://foo/bar/dwpers0091> .

<https://foo/bar/dwpers0091/birth/time-span> a ns1:E52_Time-Span ;
    rdfs:label "1873-05-26 - 1873-05-26"^^xsd:string ;
    ns1:P82a_begin_of_the_begin "1873-05-26"^^xsd:date ;
    ns1:P82b_end_of_the_end "1873-05-26"^^xsd:date .

# death example
<https://foo/bar/dwpers0091/death> a ns1:E69_Death ;
    rdfs:label "Geburt von Gulbransson, Olaf Leonhard"@fr ;
    ns1:P100_was_death_of <https://foo/bar/dwpers0091> ;
    ns1:P7_took_place_at <https://foo/bar/pmb50> ;
    ns1:P4_has_time-span <https://foo/bar/dwpers0091/death/time-span> .

<https://foo/bar/dwpers0091/death/time-span> a ns1:E52_Time-Span ;
    rdfs:label "1905-07-04 - 2000"^^xsd:string ;
    ns1:P82a_begin_of_the_begin "1905-07-04"^^xsd:date ;
    ns1:P82b_end_of_the_end "2000"^^xsd:gYear .
```
create `ns1:P168_place_is_defined_by "Point(456 123)"^^<geo:wktLiteral> .` from `tei:coords`
```python
import lxml.etree as ET
from rdflib import Graph, URIRef, RDF
from acdh_cidoc_pyutils import coordinates_to_p168, NSMAP, CIDOC

sample = """..."""

doc = ET.fromstring(sample)
g = Graph()
for x in doc.xpath(".//tei:place", namespaces=NSMAP):
    xml_id = x.attrib["{http://www.w3.org/XML/1998/namespace}id"].lower()
    item_id = f"https://foo/bar/{xml_id}"
    subj = URIRef(item_id)
    g.add((subj, RDF.type, CIDOC["E53_Place"]))
    g += coordinates_to_p168(subj, x)
print(g.serialize())
```
returns
```ttl
...
    ns1:P168_place_is_defined_by "Point(456 123)"^^<geo:wktLiteral> .
...
```
* Function parameter `verbose` prints information in case the given XPath does not return the expected result, which is a text node with two numbers separated by a given separator (default value is `separator=" "`)
* Function parameter `inverse` (default: `inverse=False`) changes the order of the coordinates.
cast a date-like string to a typed `rdflib.Literal`
```python
from acdh_cidoc_pyutils import date_to_literal

dates = [
    "1900",
    "1900-01",
    "1901-01-01",
    "foo",
]
for x in dates:
    date_literal = date_to_literal(x)
    print((date_literal.datatype))

# returns
# http://www.w3.org/2001/XMLSchema#gYear
# http://www.w3.org/2001/XMLSchema#gYearMonth
# http://www.w3.org/2001/XMLSchema#date
# http://www.w3.org/2001/XMLSchema#string
```
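The datatype choice shown above can be pictured with a standard-library-only sketch. This is not the library's implementation: `guess_xsd_datatype` and its regular expressions are assumptions made for illustration, and they check only the shape of the string, not whether it is a valid calendar date.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical shape checks, ordered from coarse to fine granularity.
PATTERNS = [
    (re.compile(r"^-?\d{4}$"), "gYear"),
    (re.compile(r"^-?\d{4}-\d{2}$"), "gYearMonth"),
    (re.compile(r"^-?\d{4}-\d{2}-\d{2}$"), "date"),
]


def guess_xsd_datatype(value: str) -> str:
    """Return the XSD datatype URI whose lexical shape matches `value`."""
    for pattern, local_name in PATTERNS:
        if pattern.match(value):
            return XSD + local_name
    # Anything that does not look like a date falls back to a plain string.
    return XSD + "string"


for x in ["1900", "1900-01", "1901-01-01", "foo"]:
    print(guess_xsd_datatype(x))
```

Keeping the patterns in an ordered list makes it easy to see which granularity maps to which datatype.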
make some random URI
```python
from acdh_cidoc_pyutils import make_uri

domain = "https://hansi4ever.com/"
version = "1"
prefix = "sumsi"
uri = make_uri(domain=domain, version=version, prefix=prefix)
print(uri)
# https://hansi4ever.com/1/sumsi/6ead32b8-9713-11ed-8065-65787314013c

uri = make_uri(domain=domain)
print(uri)
# https://hansi4ever.com/8b912e66-9713-11ed-8065-65787314013c
```
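The URI layout in the output above (domain, optional version, optional prefix, then a UUID) can be reproduced with the standard library alone. The sketch below is an assumption about that layout, not the library's code: `sketch_make_uri` is a hypothetical name, and the use of `uuid.uuid1` is a guess based on the shape of the sample identifiers.

```python
import uuid


def sketch_make_uri(domain="https://hansi4ever.com/", version="", prefix=""):
    """Join the non-empty parts and a fresh UUID into a URI (illustrative only)."""
    parts = [domain.rstrip("/")]
    # Skip empty segments so that no double slashes end up in the URI.
    parts += [p.strip("/") for p in (version, prefix) if p]
    parts.append(str(uuid.uuid1()))
    return "/".join(parts)


print(sketch_make_uri(domain="https://hansi4ever.com/", version="1", prefix="sumsi"))
print(sketch_make_uri(domain="https://hansi4ever.com/"))
```

Each call yields a different URI, so the printed values will not match the ones shown in the README.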
create an E52_Time-Span graph
```python
from acdh_cidoc_pyutils import create_e52, make_uri

uri = make_uri()
e52 = create_e52(uri, begin_of_begin="1800-12-12", end_of_end="1900-01")
print(e52.serialize())
```
returns
```ttl
@prefix ns1: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://hansi4ever.com/387fb457-971b-11ed-8065-65787314013c> a ns1:E52_Time-Span ;
    rdfs:label "1800-12-12 - 1900-01"^^xsd:string ;
    ns1:P82a_begin_of_the_begin "1800-12-12"^^xsd:date ;
    ns1:P82b_end_of_the_end "1900-01"^^xsd:gYearMonth .
```
creates E42 from tei:org|place|person
takes a `tei:person|place|org` node, extracts its `@xml:id` and all `tei:idno` elements, derives `cidoc:E42_Identifier` triples and relates them to a passed-in subject via `cidoc:P1_is_identified_by`
```python
import lxml.etree as ET
from rdflib import Graph, URIRef, RDF
from acdh_cidoc_pyutils import make_e42_identifiers, NSMAP, CIDOC

sample = """..."""

doc = ET.fromstring(sample)
g = Graph()
for x in doc.xpath(".//tei:place|tei:org|tei:person|tei:bibl", namespaces=NSMAP):
    xml_id = x.attrib["{http://www.w3.org/XML/1998/namespace}id"].lower()
    item_id = f"https://foo/bar/{xml_id}"
    subj = URIRef(item_id)
    g.add((subj, RDF.type, CIDOC["E53_Place"]))
    g += make_e42_identifiers(
        subj,
        x,
        type_domain="http://hansi/4/ever",
        default_lang="it",
    )
print(g.serialize(format="turtle"))
```
returns
```ttl
@prefix ns1: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<https://foo/bar/dwplace00092> a ns1:E53_Place ;
    ns1:P1_is_identified_by <https://foo/bar/dwplace00092/identifier/DWplace00092>,
        <https://foo/bar/dwplace00092/identifier/idno/0>,
        <https://foo/bar/dwplace00092/identifier/idno/1>,
        <https://foo/bar/dwplace00092/identifier/idno/2> ;
    owl:sameAs <https://pmb.acdh.oeaw.ac.at/entity/42085/>,
        <https://www.geonames.org/588409> .

<http://hansi/4/ever/idno/URI/geonames> a ns1:E55_Type .

<http://hansi/4/ever/idno/foobarid> a ns1:E55_Type .

<http://hansi/4/ever/idno/pmb> a ns1:E55_Type .

<http://hansi/4/ever/xml-id> a ns1:E55_Type .

<https://foo/bar/dwplace00092/identifier/DWplace00092> a ns1:E42_Identifier ;
    rdfs:label "Identifier: DWplace00092"@it ;
    rdf:value "DWplace00092" ;
    ns1:P2_has_type <http://hansi/4/ever/xml-id> .

<https://foo/bar/dwplace00092/identifier/idno/0> a ns1:E42_Identifier ;
    rdfs:label "Identifier: https://pmb.acdh.oeaw.ac.at/entity/42085/"@it ;
    rdf:value "https://pmb.acdh.oeaw.ac.at/entity/42085/" ;
    ns1:P2_has_type <http://hansi/4/ever/idno/pmb> .

<https://foo/bar/dwplace00092/identifier/idno/1> a ns1:E42_Identifier ;
    rdfs:label "Identifier: https://www.geonames.org/588409"@it ;
    rdf:value "https://www.geonames.org/588409" ;
    ns1:P2_has_type <http://hansi/4/ever/idno/URI/geonames> .

<https://foo/bar/dwplace00092/identifier/idno/2> a ns1:E42_Identifier ;
    rdfs:label "Identifier: 12345"@it ;
    rdf:value "12345" ;
    ns1:P2_has_type <http://hansi/4/ever/idno/foobarid> .
```
creates appellations from tei:org|place|person
takes a `tei:person|place|org` node, extracts `persName, placeName and orgName` texts, `@xml:lang` and custom type values and returns `cidoc:E33_41` and `cidoc:E55` nodes linked via `cidoc:P1_is_identified_by` and `cidoc:P2_has_type`
```python
import lxml.etree as ET
from rdflib import Graph, URIRef, RDF
from acdh_cidoc_pyutils import make_appellations, NSMAP, CIDOC

sample = """..."""

doc = ET.fromstring(sample)
g = Graph()
for x in doc.xpath(".//tei:place|tei:org|tei:person|tei:bibl", namespaces=NSMAP):
    xml_id = x.attrib["{http://www.w3.org/XML/1998/namespace}id"].lower()
    item_id = f"https://foo/bar/{xml_id}"
    subj = URIRef(item_id)
    g.add((subj, RDF.type, CIDOC["E53_Place"]))
    g += make_appellations(
        subj, x, type_domain="http://hansi/4/ever", default_lang="it"
    )

g.serialize(format="ttl")
```
returns
```ttl
@prefix ns1: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<https://foo/bar/dwplace00092> a ns1:E53_Place ;
    ns1:P1_is_identified_by <https://foo/bar/dwplace00092/appellation/0>,
        <https://foo/bar/dwplace00092/appellation/1>,
        <https://foo/bar/dwplace00092/appellation/2> .

<http://hansi/4/ever/alt-label> a ns1:E55_Type ;
    rdfs:label "alt_label" .

<http://hansi/4/ever/orig-name> a ns1:E55_Type ;
    rdfs:label "orig_name" .

<http://hansi/4/ever/simple-name> a ns1:E55_Type ;
    rdfs:label "simple_name" .

<https://foo/bar/dwplace00092/appellation/0> a ns1:E33_E41_Linguistic_Appellation ;
    rdfs:label "Reval (Tallinn)"@it ;
    ns1:P2_has_type <http://hansi/4/ever/orig-name> .

<https://foo/bar/dwplace00092/appellation/1> a ns1:E33_E41_Linguistic_Appellation ;
    rdfs:label "Reval"@de ;
    ns1:P2_has_type <http://hansi/4/ever/simple-name> .

<https://foo/bar/dwplace00092/appellation/2> a ns1:E33_E41_Linguistic_Appellation ;
    rdfs:label "Tallinn"@und ;
    ns1:P2_has_type <http://hansi/4/ever/alt-label> .
```
connects to places (E53_Place) with P89_falls_within
```python
import lxml.etree as ET
from rdflib import URIRef
from acdh_cidoc_pyutils import p89_falls_within, NSMAP

domain = "https://foo/bar/"
subj = URIRef(f"{domain}place__237979")
sample = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <place xml:id="place__237979">
        <placeName>Lerchenfelder Gürtel 48</placeName>
        <desc type="entity_type">Wohngebäude (K.WHS)</desc>
        <desc type="entity_type_id">36</desc>
        <location type="coords">
            <geo>48,209035 16,339257</geo>
        </location>
        <location>
            <placeName ref="place__50">Wien</placeName>
            <geo>48,208333 16,373056</geo>
        </location>
    </place>
</TEI>"""
doc = ET.fromstring(sample)
node = doc.xpath(".//tei:place[1]", namespaces=NSMAP)[0]
# the XPath selects the @ref of the enclosing place
g = p89_falls_within(
    subj, node, domain, location_id_xpath="./tei:location/tei:placeName/@ref"
)
result = g.serialize(format="ttl")
```
returns
```ttl
@prefix ns1: <http://www.cidoc-crm.org/cidoc-crm/> .

<https://foo/bar/place__237979> ns1:P89_falls_within <https://foo/bar/place__50> .
```
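To see which value the `location_id_xpath` above picks up, the lookup can be replayed with the standard library's `xml.etree.ElementTree`. This is only a sketch of the selection step under the assumption that the object URI is the domain joined with the `@ref` value, as the output suggests. It does not build an RDF graph, and `TEI_NS`, `XML_ID` and the tuple layout are names chosen here for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
domain = "https://foo/bar/"
sample = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <place xml:id="place__237979">
    <placeName>Lerchenfelder Gürtel 48</placeName>
    <location type="coords">
      <geo>48,209035 16,339257</geo>
    </location>
    <location>
      <placeName ref="place__50">Wien</placeName>
    </location>
  </place>
</TEI>"""

doc = ET.fromstring(sample)
place = doc.find(".//tei:place", TEI_NS)
subj = domain + place.get(XML_ID)
# ElementTree paths cannot end in an attribute step, so @ref is read in Python.
refs = [
    node.get("ref")
    for node in place.findall("./tei:location/tei:placeName", TEI_NS)
    if node.get("ref")
]
# Plain tuples stand in for RDF triples in this sketch.
triples = [(subj, "P89_falls_within", domain + ref) for ref in refs]
print(triples)
```

The first `tei:location` has no `tei:placeName` child, so only the `place__50` reference is found.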
creates E66_Formation and E68_Dissolution events
```python
from acdh_cidoc_pyutils import p95i_was_formed_by
from rdflib import Graph, URIRef

g = Graph()
subj = URIRef("https://wienerschnitzler.org")
label = "Wiener Moderne Verein"
g += p95i_was_formed_by(
    subj,
    start_date="2023-10-14",
    end_date="2025-12-31",
    label=f"{label} wurde gegründet",
    label_lang="de",
)
result = g.serialize(format="ttl")
```
returns
```ttl
@prefix ns1: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://wienerschnitzler.org> ns1:P95i_was_formed_by <https://wienerschnitzler.org/formation-event> .

<https://wienerschnitzler.org/dissolution-event> a ns1:E68_Dissolution ;
    rdfs:label "Institution wurde aufgelöst"@de ;
    ns1:P4_has_time-span <https://wienerschnitzler.org/dissolution-event/dissolution-time-span> .

<https://wienerschnitzler.org/dissolution-event/dissolution-time-span> a ns1:E52_Time-Span ;
    rdfs:label "2025-12-31"^^xsd:string ;
    ns1:P82a_begin_of_the_begin "2025-12-31"^^xsd:date ;
    ns1:P82b_end_of_the_end "2025-12-31"^^xsd:date .

<https://wienerschnitzler.org/formation-event> a ns1:E66_Formation ;
    rdfs:label "Wiener Moderne Verein wurde gegründet"@de ;
    ns1:P4_has_time-span <https://wienerschnitzler.org/formation-event/formation-time-span> .

<https://wienerschnitzler.org/formation-event/formation-time-span> a ns1:E52_Time-Span ;
    rdfs:label "2023-10-14"^^xsd:string ;
    ns1:P82a_begin_of_the_begin "2023-10-14"^^xsd:date ;
    ns1:P82b_end_of_the_end "2023-10-14"^^xsd:date .
```
normalize_string
```python
from acdh_cidoc_pyutils import normalize_string

string = """\n\nhallo mein schatz ich liebe dich du bist die einzige für mich """
print(normalize_string(string))
# returns
# hallo mein schatz ich liebe dich du bist die einzige für mich
```
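The effect shown above, where newlines and repeated blanks collapse into single spaces, matches a common standard-library idiom. The sketch below illustrates that idiom. `sketch_normalize` is a hypothetical name, and whether the library does exactly this internally is an assumption.

```python
def sketch_normalize(text: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    # str.split() without arguments splits on any whitespace run,
    # including newlines and tabs, and drops empty leading/trailing pieces.
    return " ".join(text.split())


string = """\n\nhallo
mein schatz ich       liebe    dich
    du bist         die einzige für mich
        """
print(sketch_normalize(string))
```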
extract date attributes (begin, end)
expects typical TEI date attributes like `@when, @when-iso, @notBefore, @notAfter, @from, @to, ...` and returns a tuple containing start and end date values. If only `@when` or `@when-iso`, or only `@notBefore` or `@notAfter`, is provided, both returned values are the same, unless the parameter `fill_missing` is set to `False`.
```python
from lxml.etree import Element
from acdh_cidoc_pyutils import extract_begin_end

date_string = "1900-12-12"
date_object = Element("{http://www.tei-c.org/ns/1.0}tei")
date_object.attrib["when-iso"] = date_string
print(extract_begin_end(date_object))
# returns
# ('1900-12-12', '1900-12-12')

date_string = "1900-12-12"
date_object = Element("{http://www.tei-c.org/ns/1.0}tei")
date_object.attrib["when-iso"] = date_string
print(extract_begin_end(date_object, fill_missing=False))
# returns
# ('1900-12-12', None)

date_object = Element("{http://www.tei-c.org/ns/1.0}tei")
date_object.attrib["notAfter"] = "1900-12-12"
date_object.attrib["notBefore"] = "1800"
print(extract_begin_end(date_object))
# returns
# ('1800', '1900-12-12')
```
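The fill behaviour described above can be sketched with the standard library's `xml.etree.ElementTree`. This is a simplified illustration, not the library's implementation: `sketch_begin_end` is a hypothetical name, and the order in which the attributes are consulted is an assumption chosen to reproduce the three outputs shown.

```python
from xml.etree.ElementTree import Element


def sketch_begin_end(node, fill_missing=True):
    """Return (begin, end) from TEI-style date attributes (illustrative only)."""
    # A single point in time counts as the begin value.
    point = node.get("when-iso") or node.get("when")
    begin = node.get("notBefore") or node.get("from") or point
    end = node.get("notAfter") or node.get("to")
    if fill_missing:
        # Mirror whichever side is present onto the missing one.
        begin, end = begin or end, end or begin
    return begin, end


date_object = Element("{http://www.tei-c.org/ns/1.0}tei")
date_object.attrib["when-iso"] = "1900-12-12"
print(sketch_begin_end(date_object))
print(sketch_begin_end(date_object, fill_missing=False))
```

Leaving the end value as `None` when `fill_missing=False` lets callers distinguish a known single date from a genuinely open-ended range.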
Convert a TEI document into an RDF graph representing a CIDOC CRM F24 Publication Expression.
```python
from acdh_cidoc_pyutils import tei_doc_as_f24_publication_expression

file_path = "L02643.xml"
domain = "https://schnitzler-briefe.acdh.oeaw.ac.at"

uri, g, mentions = tei_doc_as_f24_publication_expression(
    file_path, domain, ".//tei:titleStmt/tei:title[@level='a']"
)
# write the graph next to the source file, e.g. L02643.ttl
g.serialize(file_path.replace(".xml", ".ttl"))
```
returns
```ttl
@prefix ns1: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<https://schnitzler-briefe.acdh.oeaw.ac.at/L02643.xml> a <http://iflastandards.info/ns/fr/frbr/frbroo/F24_Publication_Expression> ;
    rdfs:label "Paul Goldmann an Arthur Schnitzler, 6. 8. 1889"@de ;
    ns1:P1_is_identified_by <https://schnitzler-briefe.acdh.oeaw.ac.at/L02643.xml/appellation> ;
    ns1:P67_refers_to <https://schnitzler-briefe.acdh.oeaw.ac.at/#pmb11485>,
        <https://schnitzler-briefe.acdh.oeaw.ac.at/#pmb12698>,
        <https://schnitzler-briefe.acdh.oeaw.ac.at/#pmb169237>,
        <https://schnitzler-briefe.acdh.oeaw.ac.at/#pmb2121>,
        <https://schnitzler-briefe.acdh.oeaw.ac.at/#pmb213>,
        <https://schnitzler-briefe.acdh.oeaw.ac.at/#pmb29698>,
        <https://schnitzler-briefe.acdh.oeaw.ac.at/#pmb50>,
        <https://schnitzler-briefe.acdh.oeaw.ac.at/#pmb52510>,
        <https://schnitzler-briefe.acdh.oeaw.ac.at/#pmb53101>,
        <https://schnitzler-briefe.acdh.oeaw.ac.at/#pmb53104>,
        <https://schnitzler-briefe.acdh.oeaw.ac.at/#pmb88392> .

<https://pfp-schema.acdh.oeaw.ac.at/types/tei-document> a ns1:E55_Type ;
    rdfs:label "A TEI/XML encoded text"@en .

<https://schnitzler-briefe.acdh.oeaw.ac.at/L02643.xml/appellation> a ns1:E33_E41_Linguistic_Appellation ;
    rdfs:label "Paul Goldmann an Arthur Schnitzler, 6. 8. 1889"@de ;
    ns1:P2_has_type <https://pfp-schema.acdh.oeaw.ac.at/types/tei-document> .
```
development
- `pip install -r requirements_dev.txt`
- `flake8` -> linting
- `coverage run -m pytest` -> runs tests and creates coverage stats
Owner
- Name: Austrian Centre for Digital Humanities & Cultural Heritage
- Login: acdh-oeaw
- Kind: organization
- Email: acdh@oeaw.ac.at
- Location: Vienna, Austria
- Website: https://www.oeaw.ac.at/acdh
- Repositories: 476
- Profile: https://github.com/acdh-oeaw
GitHub Events
Total
- Create event: 24
- Issues event: 25
- Release event: 12
- Delete event: 11
- Issue comment event: 1
- Push event: 47
- Pull request event: 20
Last Year
- Create event: 24
- Issues event: 25
- Release event: 12
- Delete event: 11
- Issue comment event: 1
- Push event: 47
- Pull request event: 20
Committers
Last synced: over 3 years ago
All Time
- Total Commits: 77
- Total Committers: 1
- Avg Commits per committer: 77.0
- Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| csae8092 | p****r@o****t | 77 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 53
- Total pull requests: 57
- Average time to close issues: 1 day
- Average time to close pull requests: 17 minutes
- Total issue authors: 4
- Total pull request authors: 3
- Average comments per issue: 0.17
- Average comments per pull request: 0.09
- Merged pull requests: 54
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 15
- Pull requests: 21
- Average time to close issues: 35 minutes
- Average time to close pull requests: 5 minutes
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.07
- Average comments per pull request: 0.0
- Merged pull requests: 19
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- csae8092 (45)
- linxOD (4)
- BOberreither (3)
- cfhaak (1)
Pull Request Authors
- csae8092 (49)
- linxOD (8)
- cfhaak (1)
Packages
- Total packages: 1
- Total downloads: pypi 430 last-month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 43
- Total maintainers: 2
pypi.org: acdh-cidoc-pyutils
Helper functions for the generation of CIDOC CRMish RDF (from XML/TEI data)
- Homepage: https://github.com/acdh-oeaw/acdh-cidoc-pyutils
- Documentation: https://acdh-cidoc-pyutils.readthedocs.io/
- License: MIT license
- Latest release: 1.8.1, published over 1 year ago
Rankings
Maintainers (2)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- py-actions/flake8 v2 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v3 composite
- black * development
- coverage >=6.4.4,<7 development
- flake8 >=5.0.4,<6 development
- pytest >=7.1.3,<8 development
- wheel * development