Recent Releases of corpus_text_processor
corpus_text_processor - 1.0.14
Summary
This release fixes a bug in the Windows version of the application where the Standardization processor could not process certain UTF-8 characters (for example, Cyrillic).
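A common cause of this class of bug, offered here as an assumption rather than a description of the actual fix, is that Python's `open()` on Windows falls back to the locale encoding (often cp1252) when no encoding is given. The sketch below uses hypothetical helper names (`read_text_utf8`, `write_text_utf8`) to show the explicit-encoding pattern:

```python
import os
import tempfile


def read_text_utf8(path):
    # Hypothetical helper: pass the encoding explicitly so the result
    # does not depend on the operating system's locale settings.
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def write_text_utf8(path, text):
    # Hypothetical helper: write with the same explicit encoding.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Round-trip a Cyrillic sample through a temporary file.
sample = "Привет, мир"
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "cyrillic.txt")
    write_text_utf8(target, sample)
    round_tripped = read_text_utf8(target)
```

With the encoding stated on both the read and the write, the same text comes back on any platform.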
Download
Trouble installing?
- See Mac Installation
- See Windows Installation
- Python
Published by markfullmer about 3 years ago
corpus_text_processor - 1.0.13
Summary
This release adds a new option to the "Convert to plaintext" processor, allowing users to control whether or not duplicate line breaks should be removed. Some PDF creation tools add line breaks within paragraphs, and this option will remove those. However, for other files, preservation of line breaks may be desired.
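As a rough illustration of what such an option might look like, the sketch below collapses runs of consecutive line breaks into one when the flag is set and leaves the text untouched otherwise. The function name and the `remove_duplicate_breaks` parameter are assumptions for illustration, not the application's actual API:

```python
import re


def convert_line_breaks(text, remove_duplicate_breaks=True):
    # Hypothetical option: when disabled, line breaks are preserved as-is.
    if not remove_duplicate_breaks:
        return text
    # Collapse two or more consecutive newlines into a single newline.
    return re.sub(r"\n{2,}", "\n", text)


collapsed = convert_line_breaks("first\n\n\nsecond\nthird")
preserved = convert_line_breaks("first\n\n\nsecond\nthird", remove_duplicate_breaks=False)
```

Keeping the behavior behind a flag lets users who rely on blank lines as paragraph separators opt out.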
- Python
Published by markfullmer over 3 years ago
corpus_text_processor - 1.0.12
This release is currently under review. Do not use this release yet.
This release fixes an issue where linefeed (LF) line breaks in text files created on the Windows operating system are processed with an extra carriage return (CR) during the "Standardization" operation.
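One way stray carriage returns can appear, stated as an assumption about the general mechanism rather than about this codebase, is Python's text-mode newline translation: on Windows, writing `"\n"` in default text mode emits `"\r\n"`. The sketch below normalizes line endings to LF and writes with `newline=""` so no translation occurs. The helper names are hypothetical:

```python
import os
import tempfile


def normalize_newlines(text):
    # Convert CRLF pairs first, then any remaining lone CR, to LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def write_without_translation(path, text):
    # newline="" disables newline translation on write, so "\n"
    # is stored as a single LF byte on every platform.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


normalized = normalize_newlines("one\r\ntwo\rthree\nfour")
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.txt")
    write_without_translation(target, normalized)
    with open(target, "rb") as handle:
        raw_bytes = handle.read()
```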
- Python
Published by markfullmer over 3 years ago
corpus_text_processor - 1.0.11
This release adds a new option to the "Standardization" process, allowing users to choose whether or not to remove non-English characters.
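The release note does not define "non-English characters". The sketch below assumes, purely for illustration, that it means anything outside the ASCII range; the function name and the `remove_non_english` flag are hypothetical:

```python
def standardize(text, remove_non_english=False):
    # Hypothetical option: leave the text unchanged unless removal is requested.
    if not remove_non_english:
        return text
    # Assumption: "non-English" is approximated as "non-ASCII".
    return "".join(ch for ch in text if ch.isascii())


kept = standardize("café Привет ok")
stripped = standardize("café Привет ok", remove_non_english=True)
```

Note that an ASCII-only filter also drops accented Latin letters such as "é", which is one reason to make the behavior optional.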
- Python
Published by markfullmer over 3 years ago
corpus_text_processor - 1.0.10
This release provides additional support for converting some types of PDFs to plaintext. It includes updates to third-party packages pdf2image (1.9.0 -> 1.16.0), pdf2docx (0.5.2 -> 0.5.6), and PyPDF3 (1.0.1 -> 1.0.6).
- Python
Published by markfullmer over 3 years ago
corpus_text_processor - 1.0.9
Release notes
- This makes minor improvements to processing text files in the "Convert to Plaintext" operation.
- Python
Published by markfullmer over 3 years ago
corpus_text_processor - 1.0.8
Release notes
- This updates the version of pdf2docx to 0.5.2 due to issues with 0.5.1.
- Python
Published by markfullmer over 4 years ago
corpus_text_processor - 1.0.4
Release notes
- Following up on the 1.0.3 release, this further improves the handling of line breaks in some contexts.
- Python
Published by markfullmer over 4 years ago
corpus_text_processor - 1.0.3
Release notes
- Fixes issues with linebreaks in some PDFs
- Python
Published by markfullmer almost 5 years ago
corpus_text_processor - 1.0.2
Release notes
This release fixes an issue where applications built on PySimpleGUI do not work on macOS Big Sur, as reported in https://stackoverflow.com/questions/64818879/is-there-any-solution-regarding-to-pyqt-library-doesnt-work-in-mac-os-big-sur
Download
Trouble installing?
- See Mac Installation
- Python
Published by markfullmer almost 5 years ago
corpus_text_processor - 1.0.1
This release adds better output of skipped files.
Previously, files that had filetypes that did not match 'eligible' extensions were simply skipped and not counted in the total files listed in the debugging output.
Now, all files in the specified input directory are delineated and counted in the debugging output.
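The change can be pictured with a small sketch that sorts every file name into "processed" or "skipped" and reports the full total. The set of eligible extensions and the function name are assumptions for illustration, not taken from the application's source:

```python
import os

# Assumed set of eligible extensions, for illustration only.
ELIGIBLE_EXTENSIONS = {".docx", ".pdf", ".html", ".txt"}


def summarize_files(filenames):
    processed, skipped = [], []
    for name in filenames:
        # Compare extensions case-insensitively.
        extension = os.path.splitext(name)[1].lower()
        if extension in ELIGIBLE_EXTENSIONS:
            processed.append(name)
        else:
            skipped.append(name)
    # Skipped files are listed and included in the total, not silently dropped.
    return {"total": len(filenames), "processed": processed, "skipped": skipped}


summary = summarize_files(["essay.pdf", "photo.jpg", "NOTES.TXT"])
```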
- Python
Published by markfullmer almost 6 years ago
corpus_text_processor - v1.0-beta3
Summary
This release tests distribution of the package with a code signing certificate.
- Python
Published by markfullmer almost 6 years ago
corpus_text_processor - v1.0-beta2
Summary
This release adds support for processing .rtf file types, as well as minor help text changes:
- [closed] Add support for RTF file types #9
- [closed] Clarify processor descriptions #5
- Python
Published by markfullmer almost 6 years ago
corpus_text_processor - Beta 1
Summary
This release includes all planned processors for the 1.0 release, namely, a plaintext converter for various filetypes, a UTF-8 encoder, a character standardizer, and a PDF metadata remover. Bugs may still exist in the code, but the functionality represents what will be included in the 1.0 release.
- Python
Published by markfullmer about 6 years ago
corpus_text_processor - Alpha 5 Release
Simplifies the plaintext processor to accommodate Windows idiosyncrasies. This release currently supports the docx, doc, html, and pdf file formats.
- Python
Published by markfullmer about 6 years ago
corpus_text_processor - Alpha 6 Release
Fixes issues with OS-specific limitations. Only Python packages are used as dependencies, which makes the application distributable via PyInstaller. Supported file formats are .docx, .pdf, .html, .pptx, and .txt.
- Python
Published by markfullmer about 6 years ago
corpus_text_processor - Alpha 9 Release : Fix PDF parsing for Windows
Note: If you’re trying to use the MacOS.CorpusTextProcessor.pkg on a Mac & get an error message about an unidentified developer, see the instructions at https://docs.google.com/document/d/1Vl3L1LaaQHuNb5xhfy2ZN2utHGFBLfo7Q3tSPjXetf0/edit
- Python
Published by markfullmer over 6 years ago
corpus_text_processor - Alpha 8 Release : Add PDF Cleaner
- Adds ability to clean metadata from PDFs in bulk
- Python
Published by markfullmer over 6 years ago
corpus_text_processor - Alpha 7 Release
- Compatibility with Windows
- Allow for user input for output directory
- Python
Published by markfullmer over 6 years ago
corpus_text_processor - Alpha 4 Release
- Adds a plaintext processor!
- Python
Published by markfullmer over 6 years ago