Recent Releases of htmldate

htmldate - htmldate-1.9.3

  • extraction: add heuristics (#173)
  • maintenance: explicitly support Python 3.13 (#172)
  • tests: better coverage (#175)
  • docs: update images and contributing (#180)

Scientific Software - Peer-reviewed - Python
Published by adbar 12 months ago

htmldate - htmldate-1.9.2

  • maintenance: explicit re-export and code quality (#168)
  • setup: remove pytest.ini (#167)
  • update dependencies

Published by adbar about 1 year ago

htmldate - htmldate-1.9.1

  • fix: more robust copyright parsing (#165)
  • cleaning fix: safer element removal (2735620)

Published by adbar about 1 year ago

htmldate - htmldate-1.9.0

  • focus on Python 3.8+, use pyproject.toml file and update setup (#150, #153, #160)
  • revamp tests and evaluation (#151)
  • simplify code parts (#152)
  • docs: convert readme to markdown (#147)

Published by adbar over 1 year ago

htmldate - htmldate-1.8.1

  • fix: more restrictive YYYYMM pattern to prevent ValueError with @b3n4kh (#145)
  • maintenance: add pre-commit with checks with @nadasuhailAyesh12 (#142)
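
The YYYYMM fix above can be illustrated with a minimal sketch. The two patterns below are hypothetical and are not the expressions used in htmldate; they only show why an unrestricted month group can end in a ValueError once the match is handed to datetime, and how a stricter pattern avoids it.

```python
import re
from datetime import datetime

# Hypothetical patterns for illustration; not the expressions used by htmldate.
LOOSE_YYYYMM = re.compile(r"\b(\d{4})(\d{2})\b")
STRICT_YYYYMM = re.compile(r"\b((?:19|20)\d{2})(0[1-9]|1[0-2])\b")


def parse_yyyymm(text, pattern):
    """Return the first day of the matched month, or None if nothing matches."""
    match = pattern.search(text)
    if match is None:
        return None
    year, month = int(match.group(1)), int(match.group(2))
    # datetime() raises ValueError for months outside 1..12
    return datetime(year, month, 1)
```

With the loose pattern, a string such as "201613" matches and then fails inside datetime(); with the strict pattern it is rejected at the regex stage and the function returns None.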

Published by adbar over 1 year ago

htmldate - htmldate-1.8.0

  • change license to Apache 2.0 (#140)
  • compile XPath expressions (#136)
  • update docs with @EkaterineSheshelidze (#135)

Published by adbar almost 2 years ago

htmldate - htmldate-1.7.0

  • fix meta property updated vs. original behavior (#121)
  • support for LXML version 5.0+ (#127)
  • fix image links in Readme

Published by adbar almost 2 years ago

htmldate - htmldate-1.6.1

  • fix for MacOS: pin LXML dependency with @adamh-oai

Published by adbar almost 2 years ago

htmldate - htmldate-1.6.0

  • focus on precision, stricter extraction patterns (#103, #105, #106, #112)
  • simplified code base (#108, #109)
  • replaced lxml.html.Cleaner (#104)
  • extended evaluation

Full Changelog: https://github.com/adbar/htmldate/compare/v1.5.2...v1.6.0

Published by adbar about 2 years ago

htmldate - htmldate-1.5.2

  • fix for missing months keys in custom extractor (#100)
  • fix for None in try_date_expr() (#101)

Published by adbar about 2 years ago

htmldate - htmldate-1.5.1

  • fix regression for fast extraction introduced in e8b3538 (#96)
  • fix setup by making backports-datetime-fromisoformat optional (#95)
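
Making a backport optional usually comes down to a guarded import. The sketch below assumes the module layout of the backports-datetime-fromisoformat package (backports.datetime_fromisoformat with MonkeyPatch.patch_fromisoformat()); it is not taken from htmldate's source, and the helper name parse_iso_date is hypothetical.

```python
from datetime import datetime

try:
    # Optional dependency: only used when it happens to be installed.
    from backports.datetime_fromisoformat import MonkeyPatch
    MonkeyPatch.patch_fromisoformat()
except ImportError:
    # Fall back to the standard library implementation (Python 3.7+).
    pass


def parse_iso_date(value):
    """Hypothetical helper: parse an ISO 8601 date string, or return None."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None
```

The code runs with or without the backport installed, which is the point of declaring the dependency optional.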

Published by adbar over 2 years ago

htmldate - htmldate-1.5.0

  • slightly higher accuracy with revised heuristics
  • simplified code structure for better performance
  • setup: support for 3.12, fromisoformat backport if applicable
  • HTML parsing fixes: more lenient parsing, pinned LXML version for MacOS

Published by adbar over 2 years ago

htmldate - htmldate-1.4.3

  • maintenance release: upgrade urllib3 dependency

Published by adbar over 2 years ago

htmldate - htmldate-1.4.2

  • support mindate/maxdate as datetimes or datetime strings with @kernc (#73)
  • add date attributes to HTML extraction with @kernc (#74)
  • fix for extraction of updated and original dates in time elements
  • code refactoring and maintenance
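
Accepting bounds either as datetime objects or as datetime strings typically requires a normalization step. The helpers below are a hypothetical sketch using only the standard library; the names normalize_bound and is_within, the ISO 8601 assumption for strings, and the fallback lower bound are illustrative and do not reflect htmldate's internal code.

```python
from datetime import datetime


def normalize_bound(bound, default):
    """Return `bound` as a datetime, or `default` if it is missing or invalid."""
    if bound is None:
        return default
    if isinstance(bound, datetime):
        return bound
    if isinstance(bound, str):
        try:
            # Assumption: naive ISO 8601 strings, e.g. "2020-01-31"
            return datetime.fromisoformat(bound)
        except ValueError:
            return default
    return default


def is_within(candidate, min_date=None, max_date=None):
    """Check that a naive candidate date lies inside the normalized bounds."""
    # Assumption: an arbitrary fallback lower bound chosen for this sketch
    lower = normalize_bound(min_date, datetime(1995, 1, 1))
    upper = normalize_bound(max_date, datetime.now())
    return lower <= candidate <= upper
```

A call such as is_within(datetime(2016, 12, 23), min_date="2016-01-01", max_date=datetime(2017, 1, 1)) then mixes both input types without extra work for the caller.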

Full Changelog: https://github.com/adbar/htmldate/compare/v1.4.1...v1.4.2

Published by adbar almost 3 years ago

htmldate - htmldate-1.4.1

  • better coverage of relevant HTML attributes
  • automatically define upper time bound at each function call (#70)
  • reviewed and simplified extraction code
  • cache validation for format diverging from %Y-%m-%d
  • updated dependencies and removed real-world tests from package
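
The upper time bound item in this list relates to a common Python pitfall: a bound computed once at import time becomes stale in a long-running process. The sketch below contrasts both approaches with hypothetical function names; it is not htmldate's code.

```python
from datetime import datetime

# Evaluated once, when the module is imported: stale in long-running processes.
IMPORT_TIME_BOUND = datetime.now()


def is_plausible_static(candidate, max_date=IMPORT_TIME_BOUND):
    """Pitfall: the default upper bound is frozen at import time."""
    return candidate <= max_date


def is_plausible(candidate, max_date=None):
    """Compute the upper bound at each call unless one is passed explicitly."""
    if max_date is None:
        max_date = datetime.now()
    return candidate <= max_date
```

Using None as the default and resolving it inside the function keeps the bound current for every call while still allowing an explicit override.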

Full Changelog: https://github.com/adbar/htmldate/compare/v1.4.0...v1.4.1

Published by adbar almost 3 years ago

htmldate - htmldate-1.4.0

  • additional search of free text in whole document (#67)
  • optional parameter for subdaily precision with @getorca (#66)
  • fix for HTML doctype parsing (#44)
  • cleaner code for multilingual month expressions
  • extended expressions for extraction in HTML meta fields
  • update of dependencies and evaluation

Published by adbar about 3 years ago

htmldate - htmldate-1.3.2

  • technical release: explicit support for Python 3.11 and logo

Published by adbar about 3 years ago

htmldate - htmldate-1.3.1

  • fix for use of min_date & max_date (#62)
  • simplified code & updated setup

Published by adbar over 3 years ago

htmldate - htmldate-1.3.0

  • Entirely type-checked code base
  • New function clear_caches() (#57)
  • Slightly more efficient code (about 5% faster)
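
A cache-clearing function of this kind can be sketched with functools.lru_cache. The cached function below is hypothetical; only the name clear_caches() comes from the release notes, and htmldate's actual implementation may differ.

```python
from datetime import datetime
from functools import lru_cache


@lru_cache(maxsize=1024)
def validate_date_string(date_string, fmt="%Y-%m-%d"):
    """Hypothetical cached check: does the string parse with the given format?"""
    try:
        datetime.strptime(date_string, fmt)
    except ValueError:
        return False
    return True


def clear_caches():
    """Reset the cache, e.g. to release memory in long-running processes."""
    validate_date_string.cache_clear()
```

Calling clear_caches() between large batches keeps memory use bounded at the cost of recomputing results that were previously cached.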

Full Changelog: https://github.com/adbar/htmldate/compare/v1.2.3...v1.3.0

Published by adbar over 3 years ago

htmldate - htmldate-1.2.3

  • fix for memory leak (#56)
  • docs updated

Full Changelog: https://github.com/adbar/htmldate/compare/v1.2.2...v1.2.3

Published by adbar over 3 years ago

htmldate - htmldate-1.2.2

  • slightly higher accuracy & faster extensive extraction
  • maintenance: code base simplified, more tests
  • bugs addressed: #51, #54
  • docs: fix by @MSK1582

Full Changelog: https://github.com/adbar/htmldate/compare/v1.2.1...v1.2.2

Published by adbar over 3 years ago

htmldate - htmldate-1.2.1

  • speed and accuracy gains
  • better extraction coverage, simpler code
  • bug fixed (typo in variable)

Full Changelog: https://github.com/adbar/htmldate/compare/v1.2.0...v1.2.1

Published by adbar almost 4 years ago

htmldate - htmldate-1.2.0

  • better performance
  • remove unnecessary ciso8601 dependency
  • temporary fix for scrapinghub/dateparser#1045 bug

Full Changelog: https://github.com/adbar/htmldate/compare/v1.1.1...v1.2.0

Published by adbar almost 4 years ago

htmldate - htmldate-1.1.1

  • bugfix: input encoding
  • improved extraction coverage (#47) by @liulinlin90

Full Changelog: https://github.com/adbar/htmldate/compare/v1.1.0...v1.1.1

Published by adbar almost 4 years ago

htmldate - htmldate-1.1.0

  • better handling of file encodings
  • slight increase in accuracy, more efficient code

Full Changelog: https://github.com/adbar/htmldate/compare/v1.0.1...v1.1.0

Published by adbar almost 4 years ago

htmldate - htmldate-1.0.1

  • maintenance release, code base cleaned
  • command-line interface: --version added
  • file parsing reviewed

Full Changelog: https://github.com/adbar/htmldate/compare/v1.0.0...v1.0.1

Published by adbar almost 4 years ago

htmldate - htmldate-1.0.0

  • faster and more accurate encoding detection
  • simplified code base
  • added support for Python 3.10 and dropped support for Python 3.5

Published by adbar about 4 years ago

htmldate - htmldate-0.9.1

  • improved generic date parsing (thanks @RadhiFadlillah)
  • specific support for French and Indonesian (thanks @RadhiFadlillah)
  • additional evaluation for English news sites (kudos to @coreydockser & @rahulbot)
  • bugs fixed

Published by adbar over 4 years ago

htmldate - htmldate-0.9.0

  • improved exhaustive search
  • simplified code
  • bug fixes
  • removed support for Python 3.4

Published by adbar over 4 years ago

htmldate - htmldate-0.8.1

  • bugfixes

Published by adbar almost 5 years ago

htmldate - htmldate-0.8.0

  • dateparser and regex modules fully integrated
  • patterns added for coverage
  • smarter HTML doc loading

Published by adbar almost 5 years ago

htmldate - htmldate-0.7.3

  • dependencies updated and reduced: switch from requests to bare urllib3, make chardet standard and cchardet optional
  • fixes: downloads, OverflowError in extraction

Published by adbar almost 5 years ago

htmldate - htmldate-0.7.2

  • compatibility with Python 3.9
  • better speed and accuracy

Published by adbar about 5 years ago

htmldate - htmldate-0.7.1

  • technical release: package requirements and docs wording

Published by adbar over 5 years ago

htmldate - htmldate-0.7.0

  • code base and performance improved
  • minimum date available as option
  • support for Turkish patterns and CMS idiosyncrasies (thanks @evolutionoftheuniverse)

Published by adbar over 5 years ago

htmldate - htmldate-0.6.3

  • more efficient code
  • additional evaluation data

Published by adbar over 5 years ago

htmldate - htmldate-0.6.2

Published by adbar over 5 years ago

htmldate - htmldate-0.6.1

htmldate finds original and updated publication dates of any web page. All the steps needed from web page download to HTML parsing, scraping and text analysis are included.

In a nutshell, with Python:

>>> from htmldate import find_date
>>> find_date('http://blog.python.org/2016/12/python-360-is-now-available.html')
'2016-12-23'
>>> find_date('https://netzpolitik.org/2016/die-cider-connection-abmahnungen-gegen-nutzer-von-creative-commons-bildern/', original_date=True)
'2016-06-23'

On the command-line:

$ htmldate -u http://blog.python.org/2016/12/python-360-is-now-available.html
'2016-12-23'

Releases are used in production and are meant to be archived on Zenodo for reproducibility and citability.

For more information see htmldate.readthedocs.io

Published by adbar almost 6 years ago

htmldate - First stable release for Zenodo

First release used in production and meant to be archived on Zenodo for reproducibility and citability.

Published by adbar over 6 years ago