https://github.com/cdcgov/hl7-pet
Library to help with HL7 parsing, validation, transformation
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Committers with academic emails
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (15.2%) to scientific vocabulary
Keywords
Repository
Library to help with HL7 parsing, validation, transformation
Statistics
- Stars: 1
- Watchers: 4
- Forks: 4
- Open Issues: 0
- Releases: 2
Topics
Metadata Files
README.md
HL7-PET
Introduction
This library is built in scala and the main purpose is to (P)arse, (E)xtract, (T)ransform and Validate HL7 v2 messages.
HL7 profiles can be defined in JSON and loaded to the appropriate classes to be used for better understanding.
By default, if not specified, the current code loads the PhinGuideProfile.json with the DefaultFieldsProfile.json
The validation is still a work in puprogress. But prime time, you can debatch messages and use the extractor to retrieve values from specific fields of a given HL7 message natively, i.e., without having to transform the raw pipe-delimited message into another object oriented format. Because its using scala under the hood, this approach seems to be working very effectively and can return values very fast.
Video tutorials
Several video tutorials are available: - Querying HL7-PET - Using Extraction 101 - Redacting Messages
Build Instructions
Prerequisites
- Java Development Kit (JDK) 11
- Scala (Version) 2.13.10
- SBT (Scala Build Tool)
- GitHub Personal Access Token
Installation
- Clone the repository: https://github.com/CDCgov/hl7-pet.git
- Create GITHUB_TOKEN from your personal GithHub Settings>Developer Settings->Personal Access Tokens>Generate New Token(with any scope), set it as environment variable and update the build.sbt file accordingly
- Build the project
- from sbt terminal run 'package' command for jar file to be generated
Extracting Values from a HL7 message.
For a code example on how to initialize the parser, look intot he ExampleApp under test/scala. that code loads a demonstration file and allows the user to type paths to retrieve values on the command line.
The path follows this schema: SEG[index]-FIELD[index].COMPONENT.SUBCOMPONENT
where:
- SEG is the three-letter semgent name. SEG is the only required piece of a valid path.
- Ex.: MSH, PID, OBR, OBX, PV1, etc...
- SEG[index] specifies a specific segment we are interested in, in case the segment is a repeating one.
Ex.:- OBR[1], retrieves the first occurrence of an OBR
- OBR[2], retrieves the second occurrence of an OBR.
segment index can have the format of a Filter criteria in the following format:
@Field.Component.subcomponent='VALUE' (Only @Field is required)
Ex.:
* OBR[@4.1='698775-3'], retrieves any OBR segment where the 1st component of the 4th field has a value of
698775-3.
* OBX[@4='2'], retrieves any OBX segment where the 4th field has a value of 2.
- FIELD is the number of the field we are interested in.
FIELD[index] is for repeating fields, where we are interested on a specific field repetition. This index only accepts a integer.
Ex.:- MSH[1]-12, retrieves the value of the 12th field of the MSH segment.
- MSH[1]-21[3], retrieves the value of the 3rd entry for MSH field 21 (Message Profile Identifier)
COMPONENT, specifies a specific component when the field can be divided further into components.
Ex.:- MSH[1]-21[3].1, retrieves the first component of the 3rd entry of MSH 21.
SUBCOMPONENT, specifies a specific subcomponent when the component can be further divided.
Note: The parser is very lenient in the sense that it tries its best to retrieve a value and not crash.
If the path is mispelled or does not follow the appropriate syntax, it usually simply returns an empty result.
Also, it always considers that a field has at least one compoenent and one subcomponent.
Ex.:
* MSH[1]-12[1].1.1 is the same as MSH-12, because there's only one MSH, field 12 is a non repeating
field and you can't break down field 12 into components and subcomponents. The first path is a
very specific path and in case MSH-12 has some weird values, or is repeated, that specification will
avoid retrieving other values.
Validation
HL7-PET can also perform structure validation. See #StructureValidator for more info. StructureValidatorTests runs a simple structure validation with the profiles provided (based on ORU 2.5.1 messages)
Structure validation cvan perform type, cardinality, usage and other types of structure validation.
Restrictions
Currently, HL7 pet has some restrictions:
* It only supports messages where MSH-1 is defined as a pipe ('|') and MSH-2 as ^~&
* The parser does not understand escaped characters as of the current version (1.2.2)
* Segment filters used for index only supports a single equality comparison to a single string.
Ex.: MSH[@21.1='VALUE'].
Other uses
This project holds several utility classes and applications to help with development and data preparation
Here's the list of available tools:
(More to Come, this file should be updated as new features become available.)
HL7FileUtils:
- this class provides several utilities to handle HL7 Messages- A method to split several MSH messages in one file into it's own files.
- A method to generate clean text files (No HL7 markup) from OBX fields.
DeIdentifier:
- This class provides functionality to replace entire lines within a HL7 message with a version that is de-identified.BatchValidator:
- This class can validate BATCH segments (FHS, BHS, BTS, FTS) to make sure they follow very specific rules needed for a specific project. If another project needs different rules for batch validation, This will have to be modified, or enhanced to be configuration driven. (Currently they are hardcoded)- A Batched message must contain all 4 Batching segments - FHS, BHS, BTS, FTS) and they can only be present once.
- The count of FTS-2 must be 1
- The count on BTS-2 must be the exact numbers of Messages on the entire batch
- FHS must be the first segment on the message
- BHS must be the second segment on the message
- BTS must be the penultimate segment on the message
- FTS must be the last segment on the message.
FileUtils:
Simple class to help read and write filesCSVReader:
Simple class to read comma-delimited files (or any delimiter for that matter)ConsoleProgress:
Class that shows dots in the console as feedback for the user that the app still working.
Owner
- Name: Centers for Disease Control and Prevention
- Login: CDCgov
- Kind: organization
- Email: data@cdc.gov
- Location: Atlanta, GA
- Website: http://open.cdc.gov/
- Twitter: CDCgov
- Repositories: 114
- Profile: https://github.com/CDCgov
CDC's collaborative software projects to protect America from health, safety, and security threats, both foreign and in the U.S.
GitHub Events
Total
- Delete event: 1
- Issue comment event: 5
- Push event: 5
- Pull request event: 1
- Pull request review event: 3
- Fork event: 2
- Create event: 1
Last Year
- Delete event: 1
- Issue comment event: 5
- Push event: 5
- Pull request event: 1
- Pull request review event: 3
- Fork event: 2
- Create event: 1
Committers
Last synced: about 1 year ago
Top Committers
| Name | Commits | |
|---|---|---|
| Marcelo Caldas | m****1@c****v | 42 |
| Seun Soyinka | s****a@M****l | 6 |
| Seun0925 | 1****5 | 5 |
| TeaSmith7 | 1****7 | 3 |
| Mounika Pacha | 1****4 | 2 |
| Cosmin Potocean | c****x | 1 |
| Boris Ning | 4****s | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 12
- Average time to close issues: N/A
- Average time to close pull requests: 5 days
- Total issue authors: 0
- Total pull request authors: 5
- Average comments per issue: 0
- Average comments per pull request: 1.0
- Merged pull requests: 11
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 8
- Average time to close issues: N/A
- Average time to close pull requests: 5 days
- Issue authors: 0
- Pull request authors: 3
- Average comments per issue: 0
- Average comments per pull request: 1.13
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- Seun0925 (8)
- TeaSmith7 (6)
- ucf4 (4)
- cosmycx (2)
- boris-ning-usds (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v3.5.0 composite
- actions/setup-java v3.10.0 composite