Recent Releases of htmldate
htmldate - htmldate-1.9.3
- extraction: add heuristics (#173)
- maintenance: explicitly support Python 3.13 (#172)
- tests: better coverage (#175)
- docs: update images and contributing (#180)
Scientific Software - Peer-reviewed
- Python
Published by adbar 12 months ago
htmldate - htmldate-1.9.2
- maintenance: explicit re-export and code quality (#168)
- setup: remove pytest.ini (#167)
- update dependencies
Published by adbar about 1 year ago
htmldate - htmldate-1.9.1
- fix: more robust copyright parsing (#165)
- cleaning fix: safer element removal (2735620)
Published by adbar about 1 year ago
htmldate - htmldate-1.9.0
- focus on Python 3.8+, use pyproject.toml file and update setup (#150, #153, #160)
- revamp tests and evaluation (#151)
- simplify code parts (#152)
- docs: convert readme to markdown (#147)
Published by adbar over 1 year ago
htmldate - htmldate-1.8.1
- fix: more restrictive YYYYMM pattern to prevent ValueError with @b3n4kh (#145)
- maintenance: add pre-commit with checks with @nadasuhailAyesh12 (#142)
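The YYYYMM fix can be illustrated with a minimal sketch. Both patterns and the helper `parse_yyyymm` are hypothetical and written for this note; they are not the expressions htmldate actually uses. The idea is that a loose pattern accepts any six digits, so an impossible month reaches `datetime()` and raises `ValueError`, while a stricter pattern rejects it before any conversion takes place.

```python
import re
from datetime import datetime

# Hypothetical patterns for illustration only, not htmldate's own expressions.
LOOSE_YYYYMM = re.compile(r"\b(\d{4})(\d{2})\b")
STRICT_YYYYMM = re.compile(r"\b((?:19|20)\d{2})(0[1-9]|1[0-2])\b")


def parse_yyyymm(text, pattern):
    """Return the first YYYYMM match as a YYYY-MM-DD string, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    year, month = int(match.group(1)), int(match.group(2))
    # datetime() raises ValueError when the month is outside 1..12.
    return datetime(year, month, 1).strftime("%Y-%m-%d")


# The strict pattern never lets month "13" reach datetime().
print(parse_yyyymm("archive/202305/post", STRICT_YYYYMM))
print(parse_yyyymm("archive/202313/post", STRICT_YYYYMM))
```

With the loose pattern, the second input would raise `ValueError` instead of returning `None`.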
Published by adbar over 1 year ago
htmldate - htmldate-1.8.0
- change license to Apache 2.0 (#140)
- compile XPath expressions (#136)
- update docs with @EkaterineSheshelidze (#135)
Published by adbar almost 2 years ago
htmldate - htmldate-1.7.0
- fix meta property updated vs. original behavior (#121)
- support for LXML version 5.0+ (#127)
- fix image links in Readme
Published by adbar almost 2 years ago
htmldate - htmldate-1.6.1
- fix for MacOS: pin LXML dependency with @adamh-oai
Published by adbar almost 2 years ago
htmldate - htmldate-1.6.0
- focus on precision, stricter extraction patterns (#103, #105, #106, #112)
- simplified code base (#108, #109)
- replaced lxml.html.Cleaner (#104)
- extended evaluation
Full Changelog: https://github.com/adbar/htmldate/compare/v1.5.2...v1.6.0
Published by adbar about 2 years ago
htmldate - htmldate-1.5.2
- fix for missing months keys in custom extractor (#100)
- fix for None in try_date_expr() (#101)
Published by adbar about 2 years ago
htmldate - htmldate-1.5.1
- fix regression for fast extraction introduced in e8b3538 (#96)
- fix setup by making backports-datetime-fromisoformat optional (#95)
Published by adbar over 2 years ago
htmldate - htmldate-1.5.0
- slightly higher accuracy with revised heuristics
- simplified code structure for better performance
- setup: support for 3.12, fromisoformat backport if applicable
- HTML parsing fixes: more lenient parsing, pinned LXML version for MacOS
Published by adbar over 2 years ago
htmldate - htmldate-1.4.3
- maintenance release: upgrade urllib3 dependency
Published by adbar over 2 years ago
htmldate - htmldate-1.4.2
- support min_date/max_date as datetimes or datetime strings with @kernc (#73)
- add date attributes to HTML extraction with @kernc (#74)
- fix for extraction of updated and original dates in time elements
- code refactoring and maintenance
Full Changelog: https://github.com/adbar/htmldate/compare/v1.4.1...v1.4.2
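Accepting date bounds either as datetime objects or as strings (#73) can be sketched as follows. The helpers `to_datetime` and `within_bounds` are hypothetical names introduced here, and the sketch assumes ISO 8601 strings; it is not the library's implementation.

```python
from datetime import datetime


def to_datetime(bound):
    """Hypothetical helper: pass datetimes through, parse ISO 8601 strings."""
    if bound is None or isinstance(bound, datetime):
        return bound
    return datetime.fromisoformat(bound)


def within_bounds(candidate, min_date=None, max_date=None):
    """Check a candidate date against optional lower and upper bounds."""
    lower, upper = to_datetime(min_date), to_datetime(max_date)
    if lower is not None and candidate < lower:
        return False
    if upper is not None and candidate > upper:
        return False
    return True


# Bounds may be mixed: one given as a string, the other as a datetime.
print(within_bounds(datetime(2021, 5, 1), "2020-01-01", datetime(2022, 1, 1)))
```

Normalizing both bounds once at the start keeps the comparison logic independent of the input type.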
Published by adbar almost 3 years ago
htmldate - htmldate-1.4.1
- better coverage of relevant HTML attributes
- automatically define upper time bound at each function call (#70)
- reviewed and simplified extraction code
- cache validation for format diverging from %Y-%m-%d
- updated dependencies and removed real-world tests from package
Full Changelog: https://github.com/adbar/htmldate/compare/v1.4.0...v1.4.1
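The item on the upper time bound (#70) touches a common Python pitfall: a default argument such as `latest=datetime.now()` is evaluated once, when the function is defined, so a long-running process would keep a stale bound. A minimal sketch of the per-call alternative follows; the function name `is_plausible` and the 1995 lower bound are assumptions made for this example, not htmldate's code.

```python
from datetime import datetime


def is_plausible(candidate, earliest=datetime(1995, 1, 1), latest=None):
    """Hypothetical validator: accept dates between a fixed floor and now."""
    # Computing the upper bound inside the body means it is refreshed
    # at every call rather than frozen at definition time.
    if latest is None:
        latest = datetime.now()
    return earliest <= candidate <= latest


print(is_plausible(datetime(2016, 12, 23)))
print(is_plausible(datetime(9999, 1, 1)))
```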
Published by adbar almost 3 years ago
htmldate - htmldate-1.4.0
- additional search of free text in whole document (#67)
- optional parameter for subdaily precision with @getorca (#66)
- fix for HTML doctype parsing (#44)
- cleaner code for multilingual month expressions
- extended expressions for extraction in HTML meta fields
- update of dependencies and evaluation
Published by adbar about 3 years ago
htmldate - htmldate-1.3.2
- technical release: explicit support for Python 3.11 and logo
Published by adbar about 3 years ago
htmldate - htmldate-1.3.1
- fix for use of min_date & max_date (#62)
- simplified code & updated setup
Published by adbar over 3 years ago
htmldate - htmldate-1.3.0
- Entirely type-checked code base
- New function clear_caches() (#57)
- Slightly more efficient code (about 5% faster)
Full Changelog: https://github.com/adbar/htmldate/compare/v1.2.3...v1.3.0
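A cache-clearing function of this kind can be sketched with the standard library's `functools.lru_cache`. The cached helper `normalize_date` is hypothetical, and the body of `clear_caches` below is an assumption about the general technique, not a copy of the library's function.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def normalize_date(text):
    """Hypothetical cached helper: reorder DD.MM.YYYY into YYYY-MM-DD."""
    day, month, year = text.split(".")
    return f"{year}-{month}-{day}"


def clear_caches():
    """Reset the cached helpers (only one in this sketch).

    The name mirrors the release note; the body is an assumption.
    """
    normalize_date.cache_clear()


normalize_date("23.12.2016")
print(normalize_date.cache_info().currsize)
clear_caches()
print(normalize_date.cache_info().currsize)
```

Exposing one entry point for all caches lets long-running processes release memory without knowing which internal helpers are cached.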
Published by adbar over 3 years ago
htmldate - htmldate-1.2.3
- fix for memory leak (#56)
- docs updated
Full Changelog: https://github.com/adbar/htmldate/compare/v1.2.2...v1.2.3
Published by adbar over 3 years ago
htmldate - htmldate-1.2.2
- slightly higher accuracy & faster extensive extraction
- maintenance: code base simplified, more tests
- bugs addressed: #51, #54
- docs: fix by @MSK1582
Full Changelog: https://github.com/adbar/htmldate/compare/v1.2.1...v1.2.2
Published by adbar over 3 years ago
htmldate - htmldate-1.2.1
- speed and accuracy gains
- better extraction coverage, simpler code
- bug fixed (typo in variable)
Full Changelog: https://github.com/adbar/htmldate/compare/v1.2.0...v1.2.1
Published by adbar almost 4 years ago
htmldate - htmldate-1.2.0
- better performance
- remove unnecessary ciso8601 dependency
- temporary fix for scrapinghub/dateparser#1045 bug
Full Changelog: https://github.com/adbar/htmldate/compare/v1.1.1...v1.2.0
Published by adbar almost 4 years ago
htmldate - htmldate-1.1.1
- bugfix: input encoding
- improved extraction coverage (#47) by @liulinlin90
Full Changelog: https://github.com/adbar/htmldate/compare/v1.1.0...v1.1.1
Published by adbar almost 4 years ago
htmldate - htmldate-1.1.0
- better handling of file encodings
- slight increase in accuracy, more efficient code
Full Changelog: https://github.com/adbar/htmldate/compare/v1.0.1...v1.1.0
Published by adbar almost 4 years ago
htmldate - htmldate-1.0.1
- maintenance release, code base cleaned
- command-line interface: --version added
- file parsing reviewed
Full Changelog: https://github.com/adbar/htmldate/compare/v1.0.0...v1.0.1
Published by adbar almost 4 years ago
htmldate - htmldate-1.0.0
- faster and more accurate encoding detection
- simplified code base
- include support for Python 3.10 and dropped support for Python 3.5
Published by adbar about 4 years ago
htmldate - htmldate-0.9.1
- improved generic date parsing (thanks @RadhiFadlillah)
- specific support for French and Indonesian (thanks @RadhiFadlillah)
- additional evaluation for English news sites (kudos to @coreydockser & @rahulbot)
- bugs fixed
Published by adbar over 4 years ago
htmldate - htmldate-0.9.0
- improved exhaustive search
- simplified code
- bug fixes
- removed support for Python 3.4
Published by adbar over 4 years ago
htmldate - htmldate-0.8.1
- bugfixes
Published by adbar almost 5 years ago
htmldate - htmldate-0.8.0
- dateparser and regex modules fully integrated
- patterns added for coverage
- smarter HTML doc loading
Published by adbar almost 5 years ago
htmldate - htmldate-0.7.3
- dependencies updated and reduced: switch from requests to bare urllib3, make chardet standard and cchardet optional
- fixes: downloads, OverflowError in extraction
Published by adbar almost 5 years ago
htmldate - htmldate-0.7.2
- compatibility with Python 3.9
- better speed and accuracy
Published by adbar about 5 years ago
htmldate - htmldate-0.7.1
- technical release: package requirements and docs wording
Published by adbar over 5 years ago
htmldate - htmldate-0.7.0
- code base and performance improved
- minimum date available as option
- support for Turkish patterns and CMS idiosyncrasies (thanks @evolutionoftheuniverse)
Published by adbar over 5 years ago
htmldate - htmldate-0.6.3
- more efficient code
- additional evaluation data
Published by adbar over 5 years ago
htmldate - htmldate-0.6.2
Published by adbar over 5 years ago
htmldate - htmldate-0.6.1
htmldate finds original and updated publication dates of any web page. All the steps needed from web page download to HTML parsing, scraping and text analysis are included.
In a nutshell, with Python:
>>> from htmldate import find_date
>>> find_date('http://blog.python.org/2016/12/python-360-is-now-available.html')
'2016-12-23'
>>> find_date('https://netzpolitik.org/2016/die-cider-connection-abmahnungen-gegen-nutzer-von-creative-commons-bildern/', original_date=True)
'2016-06-23'
On the command-line:
$ htmldate -u http://blog.python.org/2016/12/python-360-is-now-available.html
'2016-12-23'
Releases used in production and meant to be archived on Zenodo for reproducibility and citability.
For more information see htmldate.readthedocs.io
Published by adbar almost 6 years ago
htmldate - First stable release for Zenodo
First release used in production and meant to be archived on Zenodo for reproducibility and citability.
Published by adbar over 6 years ago