org.corpus-tools
A Pepper module providing import functionality for SIL Fieldworks (FLEx) XML
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: sdruskat
- License: apache-2.0
- Language: Java
- Default Branch: master
- Homepage: http://corpus-tools.org
- Size: 412 KB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
FLExText Modules for the Pepper conversion framework for linguistic data
How to cite
If you publish research for which this software has been used, you are required to cite the software. The respective metadata can be found in the file CITATION.cff.
General information
Pepper is a conversion framework for linguistic data. pepperModules-FLExModules is a plugin for Pepper and provides an importer for FLEx XML, i.e., the XML export format of SIL Fieldworks Language Explorer. The format is frequently used to store language documentation data.
With the pepperModules-FLExModules importer, the data stored in FLEx XML interlinear text files can be transferred to another format. This way, the data can be re-used for other purposes (such as adding different annotation types), or visualized and analyzed, e.g., in ANNIS, a search and visualization platform for linguistic data. For a list of available format converters for Pepper, see the list of known Pepper modules.
Context
The development of pepperModules-FLExModules was initiated in the MelaTAMP research project.
Requirements
Pepper >= 3.2.7
Usage
- Create a Pepper workflow file for the conversion, with the importer set to FLExImporter. Configure the importer properties (see Properties) as needed.
- Download Pepper and run it with the workflow file.
Importer
Requirements, assumptions, behaviour
Annotation mapping
Some features of FLEx XML require specific importer behaviour with regard to annotation namespaces and names.
In Salt, the data model onto which data is mapped during import, an annotation
can have a namespace and a name. In FLEx XML, one and the same annotation
name, i.e., the 'type' of an <item>, can be used on different levels, e.g.,
<phrase>, <word> or <morph>. Additionally, an <item> also has a
'lang', so 3 attributes in FLEx XML (level, 'lang', 'type') must be
mapped onto 2 attributes in Salt annotations.
To preserve the level information of annotations during conversion, the
FLExImporter adds the container (node/edge) of each annotation
to a layer named after the level, i.e., phrase, word, or morph.
Annotations on the document (FLEx level interlinear-text) are attached
to the Salt document (SDocument) itself, which cannot be added to a layer,
since a layer is itself an element of an SDocument's graph. Instead, all
annotations on the document itself can be assumed to belong to the
interlinear-text level.
At the same time, the 'lang' information is recorded in the namespace of the Salt annotation.
Therefore, clients such as exporters that need to recombine this information
must retrieve:
- the language from the namespace of the annotation,
- the type from the name of the annotation, and
- the level from the name of the level layer that the annotation's container
belongs to, or, for document-level annotations, from the fact that the
annotation is attached to an SDocument.
The importer creates exactly one layer for each level, named phrase, word, and morph
(according to the XML schema (XSD) file supplied by SIL, paragraphs cannot have annotations).
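The recombination described above can be sketched as follows. The classes AnnotationView, FlexItem and FlexRecombiner are hypothetical stand-ins written for illustration only; they are not the Salt API and not part of the importer. They simply model where the importer stores each piece of FLEx information: 'lang' in the namespace, 'type' in the name, and the level in the layer name (or interlinear-text for annotations on the SDocument).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified view of an imported annotation. NOT the Salt API;
 * it only models where the importer stores each piece of FLEx information.
 */
final class AnnotationView {
    final String namespace;       // FLEx 'lang', e.g. "en"
    final String name;            // FLEx item 'type', e.g. "gls"
    final Set<String> layerNames; // names of the layers containing the annotated node/edge
    final boolean onDocument;     // true if the annotation is attached to the SDocument

    AnnotationView(String namespace, String name, Set<String> layerNames, boolean onDocument) {
        this.namespace = namespace;
        this.name = name;
        this.layerNames = layerNames;
        this.onDocument = onDocument;
    }
}

/** The three FLEx attributes (level, 'lang', 'type') recombined from an AnnotationView. */
final class FlexItem {
    final String level;
    final String lang;
    final String type;

    FlexItem(String level, String lang, String type) {
        this.level = level;
        this.lang = lang;
        this.type = type;
    }
}

final class FlexRecombiner {
    // Layer names the importer creates, one per annotatable level.
    private static final List<String> LEVEL_LAYERS = Arrays.asList("phrase", "word", "morph");

    static FlexItem recombine(AnnotationView a) {
        if (a.onDocument) {
            // Document annotations cannot sit on a layer; they belong to interlinear-text.
            return new FlexItem("interlinear-text", a.namespace, a.name);
        }
        // The container may also be on other layers, so look for a level layer by name.
        for (String level : LEVEL_LAYERS) {
            if (a.layerNames.contains(level)) {
                return new FlexItem(level, a.namespace, a.name);
            }
        }
        throw new IllegalArgumentException("No FLEx level layer found for annotation " + a.name);
    }
}
```

For example, an annotation with namespace en and name gls whose node is on the layer word recombines to level word, 'lang' en, 'type' gls.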
Properties
| Property | Description | Example |
|-----------------|-------------|---------|
| languageMap | A map with original 'lang' strings and the target strings the original should be changed to during conversion. | <property key="languageMap">ENGLISH=en,NORTH-AMBRYM=mmg</property> |
| typeMap | A map with original 'type' strings and the target strings the original should be changed to during conversion. | <property key="typeMap">txt=tx,gls=ge</property> |
| dropAnnotations | A list of annotations that should be ignored during conversion. Annotations are defined as {phrase\|word\|morph}::{language}:name, of which the layer (the first) and the language (the second) element are optional. languages is a reserved name and will drop all language meta annotations from the child elements of <languages/>. | <property key="dropAnnotations">languages,morph::en:hn,fr:gls,morph::dro,xxx</property> |
| annotationMap | A map whose keys are FLEx annotations, written as for dropAnnotations, and whose values are the annotations they should be mapped to. | <property key="annotationMap">word::en:gls=ge,morph::en:gls=ps</property> |
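The property values are plain strings. A minimal sketch of how they could be parsed, assuming that entries are separated by commas, that map keys and values are split at the first '=', and that layer and language in an annotation definition are delimited by '::' and ':' as in the table above. The class FlexPropertyParser and its methods are hypothetical and are not the importer's actual parsing code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FlexPropertyParser {

    /** Parses e.g. "ENGLISH=en,NORTH-AMBRYM=mmg" into an insertion-ordered map. */
    static Map<String, String> parseMap(String value) {
        Map<String, String> map = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return map;
        }
        for (String pair : value.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed map entry: " + pair);
            }
            map.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return map;
    }

    /** One annotation definition {layer::}{language:}name; absent parts are null. */
    static final class AnnotationSpec {
        final String layer;
        final String language;
        final String name;

        AnnotationSpec(String layer, String language, String name) {
            this.layer = layer;
            this.language = language;
            this.name = name;
        }
    }

    /** Parses e.g. "morph::en:hn", "fr:gls", "morph::dro" or "xxx". */
    static AnnotationSpec parseSpec(String spec) {
        String rest = spec.trim();
        String layer = null;
        int layerSep = rest.indexOf("::");
        if (layerSep >= 0) {
            layer = rest.substring(0, layerSep);
            rest = rest.substring(layerSep + 2);
        }
        String language = null;
        int langSep = rest.indexOf(':');
        if (langSep >= 0) {
            language = rest.substring(0, langSep);
            rest = rest.substring(langSep + 1);
        }
        return new AnnotationSpec(layer, language, rest);
    }

    /** Parses a comma-separated dropAnnotations value, skipping empty entries. */
    static List<AnnotationSpec> parseList(String value) {
        List<AnnotationSpec> specs = new ArrayList<>();
        for (String s : value.split(",")) {
            if (!s.trim().isEmpty()) {
                specs.add(parseSpec(s));
            }
        }
        return specs;
    }
}
```

With this sketch, the reserved entry languages simply parses to a name without layer or language, so a caller would have to treat it specially. The keys of an annotationMap value can be passed to parseSpec after parseMap has split off the target annotation.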
One document per file
FLExText files can contain n documents, each corresponding to an interlinear-text XML element.
However, files with more than one interlinear-text element cannot currently
be processed by the FLExImporter.
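Because of this limitation, it may be useful to check input files before a conversion. A minimal sketch using the JDK's StAX API (javax.xml.stream) that counts interlinear-text elements; the class InterlinearTextCounter is hypothetical and not part of this module.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class InterlinearTextCounter {

    /** Counts interlinear-text start elements; the caller remains responsible for closing the Reader. */
    static int count(Reader in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTDs are not needed for counting elements, so don't process them.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                int n = 0;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "interlinear-text".equals(reader.getLocalName())) {
                        n++;
                    }
                }
                return n;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    /** Convenience overload for XML held in a String. */
    static int count(String xml) {
        return count(new StringReader(xml));
    }

    /** True if the input contains at most one interlinear-text element. */
    static boolean isSupported(Reader in) {
        return count(in) <= 1;
    }
}
```

A file for which count returns more than 1 would have to be split into one file per interlinear-text element before it can be imported.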
Development workflow
The development workflow for this project uses
Gitflow and the
JGit-Flow Maven plugin, which
avoids many of the headaches caused by the
Maven Release Plugin,
e.g., SNAPSHOT versions in the master branch.
Features
Features are developed as usual in feature branches and merged back into
develop once they are finished.
Releases
Releases are tagged as such on GitHub and must be released to Maven Central.
This is done by running mvn jgitflow:release-start and
mvn jgitflow:release-finish on the develop branch. The JGit-Flow plugin takes
care of following the Gitflow workflow while deploying the release to
Maven Central at the same time.
Note that the staged release will still have to be released manually through https://oss.sonatype.org/.
Then add anything needed to the GitHub release, update the DOI in the README (pre-reserve it on Zenodo), publish the GitHub release, and update the Zenodo release.
Javadoc Documentation
The Javadoc documentation can be found at https://sdruskat.github.io/pepperModules-FLExModules.
Owner
- Name: Stephan Druskat
- Login: sdruskat
- Kind: user
- Location: Berlin
- Company: German Aerospace Center (DLR)
- Website: http://sdruskat.net
- Twitter: stdruskat
- Repositories: 12
- Profile: https://github.com/sdruskat
Software Engineering PhD candidate @DLR-SC, Research Software Engineer (https://hexatomic.github.io)
Citation (CITATION.cff)
cff-version: 1.0.3
message: If you use FLExModules, please cite it as below.
authors:
- family-names: Druskat
given-names: Stephan
orcid: https://orcid.org/0000-0003-4925-7248
title: FLExModules
version: 1.0.0
doi: 10.5281/zenodo.1492292
date-released: 2018-11-20
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Stephan Druskat | m****l@s****t | 128 |
| dependabot[bot] | 4****] | 2 |
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 5
- Total pull requests: 2
- Average time to close issues: 6 days
- Average time to close pull requests: 1 minute
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.4
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 2
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- sdruskat (5)
Pull Request Authors
- dependabot[bot] (2)
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 8
repo1.maven.org: org.corpus-tools:pepperModules-FLExModules
A Pepper module providing an importer for FLEx XML.
- Homepage: https://github.com/sdruskat/pepperModules-FLExModules
- Documentation: https://appdoc.app/artifact/org.corpus-tools/pepperModules-FLExModules/
- License: Apache License, Version 2.0
- Latest release: 1.0.8 (published about 7 years ago)
Dependencies
- ch.qos.logback:logback-classic 1.2.13
- org.hamcrest:hamcrest-all 1.3 test