PDF Entity Annotation Tool (PEAT)
PDF Entity Annotation Tool (PEAT) - Published in JOSS (2025)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to zenodo.org
- ✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Basic Info
- Host: GitHub
- Owner: USEPA
- License: other
- Language: JavaScript
- Default Branch: main
- Size: 57.4 MB
Statistics
- Stars: 4
- Watchers: 6
- Forks: 4
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
PEAT
PEAT (PDF Entity Annotation Tool) is a portable, standalone application built on the Electron software framework. It runs on all major operating systems (Windows, Linux, and macOS) and provides an interface for users to annotate PDFs.
PEAT was designed to take advantage of the latest advancements in PDF text extraction methods while also allowing the user to annotate and label the data directly in PDF format. This approach allows a user to work in a document structure they are familiar with, improving the user experience and facilitating the creation of labeled data for machine consumption and training of future machine learning models.
The application allows users to load PDFs directly from their file system, along with data annotation forms that define standard or customizable annotation types, labels, entities, and other features such as custom color highlighting. The application also lets users edit and import/export data extraction schemas, export annotations with their X and Y coordinates in the PDF (based on the image layer of the PDF), search and manipulate annotations, and save/load progress. Once a user has completed document annotation, the labeled data, full text, and all associated metadata can be exported in JSON format, which can be processed by a variety of NLP model-building tools such as spaCy or PyTorch.
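As an illustration of how the exported JSON might feed an NLP pipeline, the following minimal sketch turns an export into character-offset entity spans. It assumes the field names shown in the Example JSON under Annotation Output (`text`, `highlights`, and `comment.begin` / `comment.end` / `comment.text`) and assumes that `begin` and `end` are character offsets into `text`. The function names and the inline sample are hypothetical and are not part of PEAT.

```python
import json


def highlights_to_spans(export):
    """Collect (begin, end, label) tuples from a PEAT-style export dict.

    Assumes each highlight stores its offsets and annotation ID under
    "comment", as in the Example JSON of this README.
    """
    spans = []
    for highlight in export.get("highlights", []):
        comment = highlight.get("comment", {})
        begin = comment.get("begin")
        end = comment.get("end")
        label = comment.get("text")
        # Skip highlights that lack offsets or a label.
        if begin is None or end is None or not label:
            continue
        spans.append((begin, end, label))
    return sorted(spans)


def to_training_example(export):
    """Pair the document text with its spans: (text, {"entities": [...]})."""
    return (export["text"], {"entities": highlights_to_spans(export)})


# Hypothetical, hand-written export used only for this illustration.
sample = json.loads("""
{
  "text": "PEAT annotates PDF files.",
  "highlights": [
    {
      "content": {"text": "PDF"},
      "comment": {"text": "foo", "relationship": "", "begin": 15, "end": 18}
    }
  ]
}
""")

text, annotations = to_training_example(sample)
for begin, end, label in annotations["entities"]:
    print(label, repr(text[begin:end]))
```

The `(text, {"entities": [...]})` tuple mirrors a layout commonly used for named-entity training data; overlapping spans are not resolved here and would need handling before training.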
Installation
Windows
A compiled Windows binary is created and added to every release on GitHub.
1. Download the latest release from https://github.com/USEPA/peat/releases/. It should be a zip file named `peat-windows.zip`.
2. Unzip the zip file.
3. Double-click the PEAT installer to install the application.
4. After installation, a `PEAT` shortcut should be added to your desktop and Start Menu.
5. Select the PDF and Schema (example files are available: [tags.json](https://github.com/USEPA/peat/blob/main/public/tags.json) and [test.pdf](https://github.com/USEPA/peat/blob/main/public/test.pdf)) and click _Load_.

The application is installed in `%USERPROFILE%\AppData\Local\Programs\peat`. An uninstaller is also available in this location. You can also follow the instructions below to build the application from source.
Linux/Mac
1. Clone the repo: `git clone https://github.com/USEPA/peat.git`
2. Install the following prerequisites:
   * NodeJS: https://nodejs.org/
   * Yarn: https://yarnpkg.com/
3. In the PEAT directory, run yarn to download the dependencies: `yarn`
4. Run the application: `yarn start`
5. To build a standalone application for your system, run: `yarn package`

This will create a release folder providing multiple application versions.
Application Usage
Load PDF
1. Click _File_ in the menu bar and select _Load PDF_.
2. Select the PDF file from your computer and click _Open_.

Annotate PDF
1. Highlight the text you wish to annotate and select _Add Annotation_.
2. Select the annotation type.
3. Click Save.

Save Annotations
1. Click _File_ in the menu bar and select _Save Annotations_.
2. Select a save location on your computer and click _Save Annot File_.

Load Annotations
1. Click _File_ in the menu bar and select _Load Annotations_.
2. Select an annotation file and click _Open_.

Delete Annotations
1. Select the annotation you wish to delete from the table in the sidebar.
2. Click the _Delete selected row_ button.

Edit Schema
1. Click the _Edit Schema_ hyperlink.
   - Change an existing entity
     - Click the text of any entity to edit that entity's type.
     - Click the color selector to change the annotation color.
     - Click the trash can icon to delete that entity.
   - Add a new entity type
     - Click Add Entity Type to add a new entity.
   - Save changes
     - Click the Save button.

Auto Annotation
1. Type the word or phrase to search for in the _Find in document_ search bar.
2. Use the Up or Down arrows to cycle the yellow highlight through the matches found in the document.
3. Select the entity type from the dropdown box.
4. Click Annotate to add an annotation for the current selection.

Annotation Output
Example JSON
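The sample in this section stores each highlight's position as a `boundingRect` plus a list of `rects`, one per line of highlighted text. The sketch below shows how an enclosing box could be recomputed from several rects. It assumes that `boundingRect` is the smallest box containing all `rects`, which holds for the single-rect sample but is not confirmed by the source. The helper name `union_rect` and the second rect are hypothetical.

```python
def union_rect(rects):
    """Return the smallest box (x1, y1, x2, y2) that contains every rect."""
    if not rects:
        raise ValueError("at least one rect is required")
    return {
        "x1": min(r["x1"] for r in rects),
        "y1": min(r["y1"] for r in rects),
        "x2": max(r["x2"] for r in rects),
        "y2": max(r["y2"] for r in rects),
    }


rects = [
    # First rect: coordinates taken from the sample in this section.
    {"x1": 66.8515625, "y1": 250.1328125, "x2": 205.14230346679688, "y2": 263.1328125},
    # Second rect: made-up continuation of the highlight on the next line.
    {"x1": 40.0, "y1": 263.1328125, "x2": 120.5, "y2": 276.1328125},
]

box = union_rect(rects)
print(box)
```

With a single rect, the result equals that rect, matching the sample, where `boundingRect` and the only entry in `rects` share the same coordinates.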
This is an annotated sample of exported annotation data.
```json5
{
  "text": "This is the text of the document", // Text version of the PDF file; contains the full text of the document as a string.
  "relationships": [], // Not yet implemented; experimental feature for creating relational constructs between annotations.
  "schema": { // Schema used to annotate the document
    "annotation_types": [
      {
        "id": "foo", // Unique ID
        "name": "foo", // Text name
        "color": "#ce11dd" // HTML display color
      }
    ],
    "relationship_types": [] // Not yet implemented
  },
  "highlights": [
    {
      "content": {
        "text": "text of the annotation" // Text of the annotation highlight as a string.
      },
      "position": { // Bounding box position of the highlight within the PDF coordinates.
        "boundingRect": {
          "x1": 66.8515625,
          "y1": 250.1328125,
          "x2": 205.14230346679688,
          "y2": 263.1328125,
          "width": 763,
          "height": 1079.0995605399849
        },
        "rects": [ // Can have multiple rects if text spans lines
          {
            "x1": 66.8515625,
            "y1": 250.1328125,
            "x2": 205.14230346679688,
            "y2": 263.1328125,
            "width": 763,
            "height": 1079.0995605399849,
            "background": "#70f07b" // Highlight color
          }
        ],
        "pageNumber": 1
      },
      "comment": {
        "text": "foo", // Annotation ID
        "relationship": "",
        "begin": 267, // Offset coordinates within the document text.
        "end": 298
      },
      "userName": "your_username",
      "timestamp": 1710438139123,
      "id": "34752752411373633" // Highlight ID
    }
  ]
}
```
Contributing
There are many ways you can contribute to PEAT, such as:
- Reporting bugs or suggesting enhancements
- Improving the documentation or the user interface
- Adding new features or functionalities
- Writing tests or fixing issues
- Reviewing or commenting on pull requests or issues
To get started, you will need to fork the PEAT repository and clone it to your local machine. You will also need to install Node.js and Yarn to run and build the application. Please follow the instructions in the README file for more details.
If you encounter a bug or have a suggestion for an enhancement, please open an issue on GitHub. Provide as much information as possible to help us reproduce and resolve the issue, and check for similar existing issues or pull requests before opening a new one.
Disclaimer
The United States Environmental Protection Agency (EPA) GitHub project code and binary files are provided on an "as is" basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
Owner
- Name: U.S. Environmental Protection Agency
- Login: USEPA
- Kind: organization
- Location: United States of America
- Website: https://www.epa.gov
- Twitter: EPA
- Repositories: 449
- Profile: https://github.com/USEPA
JOSS Publication
PDF Entity Annotation Tool (PEAT)
Authors
Tags
Python annotation text extraction pdf
GitHub Events
Total
- Release event: 1
- Watch event: 4
- Delete event: 3
- Member event: 1
- Issue comment event: 7
- Push event: 12
- Pull request event: 12
- Pull request review event: 3
- Fork event: 3
- Create event: 4
Last Year
- Release event: 1
- Watch event: 4
- Delete event: 3
- Member event: 1
- Issue comment event: 7
- Push event: 12
- Pull request event: 12
- Pull request review event: 3
- Fork event: 3
- Create event: 4
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| chrstahl | s****g@o****v | 12 |
| Andy Shapiro | s****y@e****v | 4 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 3
- Total pull requests: 17
- Average time to close issues: 23 days
- Average time to close pull requests: about 1 month
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 2.0
- Average comments per pull request: 0.94
- Merged pull requests: 13
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 12
- Average time to close issues: N/A
- Average time to close pull requests: 4 days
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.67
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- RuneBlaze (3)
Pull Request Authors
- chrstahl (14)
- shapiromatron (6)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- electron ^9.1.2 development
- electron-builder ^22.11.7 development
- foreman ^2.0.0 development
- jest-puppeteer ^4.4.0 development
- puppeteer ^2.0.0 development
- typescript ^3.9.7 development
- @eastdesire/jscolor ^2.4.5
- @popperjs/core ^2.9.2
- @testing-library/jest-dom ^4.2.4
- @testing-library/react ^9.3.2
- @testing-library/user-event ^7.1.2
- bootstrap ^4.6.0
- bootstrap-select ^1.13.18
- colorpicker ^2.0.0
- datatables ^1.10.18
- datatables.net ^1.11.4
- datatables.net-dt ^1.11.4
- http2 ^3.3.7
- jquery ^3.6.0
- jscolor ^0.3.0
- lodash ^4.17.10
- path ^0.12.7
- path-browserify ^1.0.1
- pdfjs-dist 2.2.228
- popper ^1.0.1
- popper.js ^1.16.1
- prop-types ^15.7.2
- react ^16.12.0
- react-dom ^16.12.0
- react-pointable ^1.1.1
- react-rnd ^7.1.5
- react-scripts *
- split.js ^1.6.4
- url-search-params ^1.1.0
- webpack-dev-middleware ^5.3.3
- datatables.net >=1.11.3
- jquery >=1.7
- actions/checkout v4 composite
- actions/setup-node v4 composite
- actions/upload-artifact v4 composite
