Recent Releases of pyparsing
pyparsing - Pyparsing 3.2.3
- Fixed a bug, introduced in 3.2.2, in which nested_expr could overwrite parse actions for defined content and could truncate the list of items within a nested list. Fixes Issue #600, reported by hoxbro and luisglft, with helpful diagnostic logs and repro code.
Published by ptmcg 11 months ago
pyparsing - Pyparsing 3.2.2
The upcoming 3.3.0 release will begin emitting DeprecationWarnings for pyparsing methods that have been renamed to PEP8-compliant names (introduced in pyparsing 3.0.0 in August 2021, with the legacy names retained as aliases). In preparation, I have added a utility to pyparsing 3.2.2 for finding and replacing the legacy method names with the new names. This utility is located at pyparsing/tools/cvt_pyparsing_pep8_names.py. The script scans all Python files specified on the command line and, if the -u option is given, replaces all occurrences of the old method names with the new PEP8-compliant names, updating the files in place.
Here is an example that converts all the files in the pyparsing examples directory:
python -m pyparsing.tools.cvt_pyparsing_pep8_names -u examples/*.py
The new names are compatible with pyparsing versions 3.0.0 and later.
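The core of such a conversion is mapping a camelCase name to its snake_case form. The sketch below shows only that mechanical step; camel_to_snake is a hypothetical helper, not the code in cvt_pyparsing_pep8_names.py. It needs a second rule for embedded acronyms, since a naive split would turn makeHTMLTags into "make_htmltags".

```python
import re

# Hypothetical helper: a mechanical camelCase -> snake_case conversion.
# Step 1 splits the end of an acronym from the next word ("HTMLTags" -> "HTML_Tags");
# step 2 splits a lowercase letter or digit from a following uppercase letter.
_ACRONYM_END = re.compile(r"(?<=[A-Z])(?=[A-Z][a-z])")
_LOWER_TO_UPPER = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def camel_to_snake(name: str) -> str:
    name = _ACRONYM_END.sub("_", name)
    name = _LOWER_TO_UPPER.sub("_", name)
    return name.lower()


for legacy in ("parseString", "setResultsName", "cStyleComment", "makeHTMLTags"):
    print(f"{legacy} -> {camel_to_snake(legacy)}")
```

Names that are already snake_case pass through unchanged, so running the conversion twice is harmless.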
- Released the cvt_pyparsing_pep8_names.py conversion utility, to upgrade pyparsing-based programs and libraries that use legacy camelCase names to the new PEP8-compliant snake_case method names. The converter can also be imported into other scripts as:
    from pyparsing.tools.cvt_pyparsing_pep8_names import pep8_converter
- Fixed bug in nested_expr where nested contents were stripped of whitespace when the default whitespace characters were cleared (raised in this StackOverflow question https://stackoverflow.com/questions/79327649 by Ben Alan). Also fixed a bug in resolving PEP8-compliant versus legacy argument names.
- Fixed bug in rest_of_line and the underlying Regex class, in which matching a pattern that could match an empty string (such as ".*" or "[A-Z]*") would not raise a ParseException at or beyond the end of the input string. This could cause an infinite parsing loop when parsing rest_of_line at the end of the input string. Reported by user Kylotan, thanks! (Issue #593)
- Enhancements and extra input validation for pyparsing.util.make_compressed_re - see usage in examples/complex_chemical_formulas.py and the result in the generated railroad diagram examples/complex_chemical_formulas_diagram.html. It now properly escapes characters like "." and "*" that have special meaning in regular expressions.
- Fixed bug in one_of() to properly escape characters that are regular expression markers (such as '*', '+', '?', etc.) before building the internal regex.
- Better exception message for MatchFirst and Or expressions, showing all alternatives rather than just the first one. Fixes Issue #592, reported by Focke, thanks!
- Added return type annotation of "-> None" for all __init__() methods, to satisfy mypy --strict type checking. PR submitted by FeRD, thank you!
- Added optional argument show_hidden to create_diagram, to show elements that are used internally by pyparsing but are not part of the actual parser grammar. For instance, the Tag class can insert values into the parsed results but does not actually parse any input, so by default it is not included in a railroad diagram. Calling create_diagram with show_hidden=True includes these internal elements. (You can see this in the tag_metadata.py script in the examples directory.)
- Fixed bug in the number_words.py example. Also added ebnf_number_words.py to demonstrate using the ebnf.py EBNF parser generator to build a similar parser directly from EBNF.
- Fixed syntax warning raised in bigquery_view_parser.py, invalid escape sequence "\s". Reported by sameer-google, nice catch! (Issue #598)
- Added support for Python 3.14.
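Two of the fixes above come down to classic regex pitfalls. This sketch uses only the standard re module; literal_alternation and match_at are hypothetical helpers, not pyparsing functions. The first shows why a one_of()-style alternation must escape metacharacters and try longer alternatives first. The second is a simplified end-of-input guard that keeps a ".*" pattern from matching an empty string forever, the failure mode behind Issue #593.

```python
import re


# Hypothetical stand-in for one_of(): one alternation regex built from literal strings.
# re.escape() keeps "*", "+", "?" and "." literal; longest-first ordering stops a
# short alternative ("*") from matching before a longer one ("**") is tried.
def literal_alternation(words: list[str]) -> re.Pattern[str]:
    ordered = sorted(set(words), key=lambda w: (-len(w), w))
    return re.compile("|".join(re.escape(w) for w in ordered))


# Hypothetical guard in the spirit of the rest_of_line/Regex fix: ".*" matches an
# empty string even at end of input, so a loop applying it would never finish.
# This simplified version refuses any match at or beyond the end of the text.
def match_at(pattern: re.Pattern[str], text: str, loc: int) -> tuple[str, int]:
    if loc >= len(text):
        raise ValueError(f"no match at or beyond end of input (loc={loc})")
    m = pattern.match(text, loc)
    if m is None:
        raise ValueError(f"no match at loc={loc}")
    return m.group(), m.end()


ops = literal_alternation(["*", "**", "+", "?", "."])
print(ops.match("**2").group())  # ** (longest alternative wins)
print(ops.match("x") is None)  # True: "." is a literal dot, not a wildcard

rest_of_line = re.compile(r".*")
text, loc, lines = "first\nsecond", 0, []
while True:
    try:
        line, loc = match_at(rest_of_line, text, loc)
    except ValueError:
        break  # end of input reached: stop instead of looping forever
    lines.append(line)
    loc += 1  # step over the newline
print(lines)  # ['first', 'second']
```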
Published by ptmcg 11 months ago
pyparsing - Pyparsing 3.2.1
- Updated generated railroad diagrams to make non-terminal elements links to their related sub-diagrams. This greatly improves navigation of the diagram, especially for large, complex parsers.
- Simplified railroad diagrams emitted for parsers using infix_notation, by hiding lookahead terms. Renamed internally generated expressions for clarity, and improved diagramming.
- Improved performance of the cpp_style_comment, c_style_comment, common.fnumber and common.ieee_float Regex expressions. PRs submitted by Gabriel Gerlero, nice work, thanks!
- Added missing type annotations to match_only_at_col, replace_with, remove_quotes, with_attribute, and with_class. Issue #585 reported by rafrafrek.
- Added generated diagrams for many of the examples.
- Replaced old examples/0README.html file with examples/README.md file.
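As background on why comment regexes are worth tuning: one well-known technique is the "unrolled loop" pattern sketched below. This is a hypothetical illustration only; the patterns actually changed in the PRs above are not reproduced here.

```python
import re

# Hypothetical illustration, not the pattern pyparsing ships: an "unrolled loop"
# regex for /* ... */ comments. Each repetition consumes a run of non-"*"
# characters followed by a run of "*" characters, so the engine never needs a
# per-character alternation or a lazy quantifier.
C_COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")

source = "int x = 1; /* a ** b */ int y; /**/ /* multi\n line */"
for comment in C_COMMENT.findall(source):
    print(repr(comment))
```

Negated character classes such as [^*] also match newlines, so multi-line comments need no re.DOTALL flag.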
Published by ptmcg about 1 year ago
pyparsing - pyparsing 3.2.0
Version 3.2.0 - October, 2024
Discontinued support for Python 3.6, 3.7, and 3.8. Adopted new Python features from Python versions 3.7-3.9:
- Updated type annotations to use built-in container types instead of names imported from the typing module (e.g., list[str] vs List[str]).
- Reworked portions of the packrat cache to leverage insertion-order preservation in dicts (including removal of uses of OrderedDict).
- Changed pdb.set_trace() call in ParserElement.set_break() to breakpoint().
- Converted typing.NamedTuple to dataclasses.dataclass in railroad diagramming code.
- Added from __future__ import annotations to clean up some type annotations. (With assistance from ISyncWithFoo, issue #535, thanks for the help!)
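The packrat rework relies on plain dicts preserving insertion order, so the oldest entry is simply the first key. A minimal sketch of a bounded FIFO cache built that way; the FifoCache class is hypothetical and is not pyparsing's internal cache.

```python
from typing import Any, Hashable


class FifoCache:
    """Hypothetical bounded cache that evicts its oldest entry when full.

    Because dicts preserve insertion order (guaranteed since Python 3.7),
    the first key yielded by iter() is always the oldest, so no OrderedDict
    is required.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self._data: dict[Hashable, Any] = {}

    def get(self, key: Hashable, default: Any = None) -> Any:
        return self._data.get(key, default)

    def set(self, key: Hashable, value: Any) -> None:
        if key not in self._data and len(self._data) >= self.size:
            oldest = next(iter(self._data))  # first-inserted key
            del self._data[oldest]
        self._data[key] = value

    def __len__(self) -> int:
        return len(self._data)


cache = FifoCache(size=2)
cache.set("a", 1)
cache.set("b", 2)
cache.set("c", 3)  # cache is full, so the oldest key "a" is evicted
print(cache.get("a"), cache.get("b"), cache.get("c"))  # None 2 3
```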
POSSIBLE BREAKING CHANGES
The following bugfixes may result in subtle changes in the results returned or exceptions raised by pyparsing.
- Fixed code in ParseElementEnhance subclasses that replaced detailed exception messages raised in contained expressions with a less-specific and less-informative generic exception message and location.
  If your code has conditional logic based on the message content in raised ParseExceptions, this bugfix may require changes in your code.
- Fixed bug in transform_string() where whitespace in the input string was not properly preserved in the output string.
  If your code uses transform_string, this bugfix may require changes in your code.
- Fixed bug where an IndexError raised in a parse action was incorrectly handled as an IndexError raised as part of the ParserElement parsing methods, and reraised as a ParseException. Now an IndexError raised inside a parse action properly propagates out as an IndexError. (Issue #573, reported by August Karlstedt, thanks!)
  If your code raises IndexErrors in parse actions, this bugfix may require changes in your code.
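The IndexError fix follows a general rule: guard only the parser's own indexing, and call user code outside that guard. A stdlib-only sketch, where ParseError and parse_char are hypothetical stand-ins rather than pyparsing internals:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ParseError(Exception):
    """Hypothetical stand-in for pyparsing's ParseException."""


def parse_char(text: str, loc: int, action: Callable[[str], T]) -> T:
    # Only the parser's own indexing is guarded: running off the end of the
    # input is a parse failure ...
    try:
        ch = text[loc]
    except IndexError:
        raise ParseError(f"unexpected end of input at loc={loc}") from None
    # ... while the user's action runs outside the try block, so an IndexError
    # caused by a bug in the action propagates unchanged.
    return action(ch)


def buggy_action(ch: str) -> str:
    return [ch][1]  # deliberately raises IndexError


try:
    parse_char("a", 0, buggy_action)
except IndexError as exc:
    print("IndexError from the action:", exc)
```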
FIXES AND NEW FEATURES
- Added type annotations to the remainder of the pyparsing package, and added a mypy run to tox.ini, so that type annotations are now checked as part of pyparsing's CI. Addresses Issue #373, raised by Iwan Aucamp, thanks!
- Exception message format can now be customized by overriding ParseBaseException.formatted_message:
      from pyparsing import ParseBaseException
      def custom_exception_message(exc) -> str:
          found_phrase = f", found {exc.found}" if exc.found else ""
          return f"{exc.lineno}:{exc.column} {exc.msg}{found_phrase}"
      ParseBaseException.formatted_message = custom_exception_message
  (PR #571 submitted by Odysseyas Krystalakos, nice work!)
- run_tests now detects if an exception is raised in a parse action, and reports it with an enhanced error message that includes the exception type, string, and parse action name.
- QuotedString now handles translation of escaped integer, hex, octal, and Unicode sequences to their corresponding characters.
- Fixed the displayed output of Regex terms to deduplicate repeated backslashes, for easier reading in debugging, printing, and railroad diagrams.
- Fixed (or at least reduced) an elusive bug when generating railroad diagrams, where some diagram elements were just empty blocks. Fix submitted by RoDuth, thanks a ton!
- Fixed railroad diagrams generated for a parser containing a Regex element defined using a verbose pattern: the pattern is flattened and comments are removed before creating the corresponding diagram element.
- Defined a more performant regular expression used internally by common_html_entity.
- Regex instances can now be created using a callable that takes no arguments and returns a string or a compiled regular expression, so that building complex regular expression patterns can be deferred until they are first used in the parser.
- Added optional flatten Boolean argument to ParseResults.as_list(), to return the parsed values in a flattened list.
- Added indent and base_1 arguments to pyparsing.testing.with_line_numbers. When using with_line_numbers inside a parse action, set base_1=False, since the reported loc value is 0-based. indent can be a leading string (typically of spaces or tabs) to indent the numbered string passed to with_line_numbers. Added while working on #557, reported by Bernd Wechner.
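Regex can now take a zero-argument callable so that building the pattern is deferred until first use. The hypothetical LazyRegex class below sketches that deferral with the standard re module; it is not pyparsing's Regex code.

```python
import re
from typing import Callable, Optional


class LazyRegex:
    """Hypothetical sketch: wrap a zero-argument callable that returns either a
    pattern string or a compiled regex, and call it only on first use."""

    def __init__(self, factory: Callable[[], object]) -> None:
        self._factory = factory
        self._compiled: Optional[re.Pattern] = None

    @property
    def pattern(self) -> re.Pattern:
        if self._compiled is None:
            source = self._factory()  # deferred until the first match attempt
            if isinstance(source, re.Pattern):
                self._compiled = source
            else:
                self._compiled = re.compile(source)
        return self._compiled

    def match(self, text: str, pos: int = 0) -> Optional[re.Match]:
        return self.pattern.match(text, pos)


calls = []


def build_keyword_pattern() -> str:
    calls.append("built")  # record each time the (possibly expensive) build runs
    return r"\b(?:" + "|".join(["alpha", "beta", "gamma"]) + r")\b"


keyword = LazyRegex(build_keyword_pattern)
print(len(calls))  # 0: nothing built yet
print(keyword.match("beta blocker").group())  # beta
print(keyword.match("gamma ray").group())  # gamma
print(len(calls))  # 1: built exactly once
```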
NEW/ENHANCED EXAMPLES
- Added query syntax to mongodb_query_expression.py with:
  - better support for array fields ("contains", "contains all", "contains any", and "contains none")
  - "like" and "not like" operators to support SQL "%" wildcard matching and "=~" operator to support regex matching
  - text search using "search for"
  - dates and datetimes as query values
  - a[0] style array referencing
- Added lox_parser.py example, a parser for the Lox language used as a tutorial in Robert Nystrom's "Crafting Interpreters" (http://craftinginterpreters.com/). With helpful corrections from RoDuth.
- Added complex_chemical_formulas.py example, to add parsing capability for formulas such as "3(C₆H₅OH)₂".
- Updated tag_emitter.py to use the new Tag class, introduced in pyparsing 3.1.3.
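complex_chemical_formulas.py is built with pyparsing. To show what parsing a formula such as "3(C₆H₅OH)₂" involves (Unicode subscripts, nested groups, a leading coefficient), here is a hypothetical stdlib-only recursive-descent sketch; parse_formula is illustrative and unrelated to the example's actual code.

```python
import re
from collections import Counter

# Map Unicode subscript digits to ASCII so "C₆H₅" reads like "C6H5".
SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
# Element symbols, counts, and parentheses; other characters are silently skipped.
TOKEN = re.compile(r"[A-Z][a-z]?|\d+|\(|\)")


def parse_formula(formula: str) -> Counter:
    """Hypothetical sketch: count the atoms in a formula such as '3(C₆H₅OH)₂'."""
    tokens = TOKEN.findall(formula.translate(SUBSCRIPTS))
    pos = 0

    def read_count() -> int:
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            pos += 1
            return int(tokens[pos - 1])
        return 1  # no explicit count means one

    def read_group() -> Counter:
        nonlocal pos
        counts: Counter = Counter()
        while pos < len(tokens) and tokens[pos] != ")":
            tok = tokens[pos]
            pos += 1
            if tok == "(":
                inner = read_group()
                pos += 1  # consume the matching ")"
            elif tok.isdigit():
                raise ValueError(f"unexpected number {tok!r}")
            else:
                inner = Counter({tok: 1})
            multiplier = read_count()  # subscript after an element or ")"
            for element, n in inner.items():
                counts[element] += n * multiplier
        return counts

    coefficient = read_count()  # optional leading coefficient, e.g. the 3
    counts = read_group()
    if pos != len(tokens):
        raise ValueError("unbalanced parentheses")
    return Counter({element: n * coefficient for element, n in counts.items()})


print(dict(parse_formula("3(C₆H₅OH)₂")))  # {'C': 36, 'H': 36, 'O': 6}
print(dict(parse_formula("H₂O")))  # {'H': 2, 'O': 1}
```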
Published by ptmcg over 1 year ago
pyparsing - pyparsing 3.2.0rc1
Changes since 3.2.0b3:
- Fixed handling of IndexError raised in a parse action.
- QuotedString parser now handles \xnn, \ooo, and \unnnn characters when convert_whitespace_escapes is True.
- Reformatted CHANGES file for final release.
All changes in 3.2.0: as listed in the pyparsing 3.2.0 release notes above.
Published by ptmcg over 1 year ago
pyparsing - pyparsing 3.2.0b3
(This is the final beta release before 3.2.0.)
QuotedString now handles translation of escaped integer, hex, octal, and Unicode sequences to their corresponding characters.
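A stdlib-only sketch of that translation for \xnn, \ooo, and \unnnn sequences. translate_numeric_escapes is a hypothetical helper, not QuotedString's implementation, and it deliberately ignores the case of an escaped backslash that happens to precede such a sequence.

```python
import re

# \xnn (two hex digits), \ooo (three octal digits) or \unnnn (four hex digits)
NUMERIC_ESCAPE = re.compile(r"\\(?:x([0-9a-fA-F]{2})|([0-7]{3})|u([0-9a-fA-F]{4}))")


def translate_numeric_escapes(text: str) -> str:
    """Hypothetical sketch: replace numeric escape sequences with the characters
    they denote; all other text passes through unchanged."""

    def to_char(m: re.Match) -> str:
        hex2, octal3, uni4 = m.groups()
        if hex2 is not None:
            return chr(int(hex2, 16))
        if octal3 is not None:
            return chr(int(octal3, 8))
        return chr(int(uni4, 16))

    return NUMERIC_ESCAPE.sub(to_char, text)


print(translate_numeric_escapes(r"\x41\102\u0043"))  # ABC
print(translate_numeric_escapes(r"caf\u00e9"))  # café
```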
Published by ptmcg over 1 year ago
pyparsing - pyparsing 3.2.0b2
Added type annotations to the remainder of the pyparsing package, and added a mypy run to tox.ini, so that type annotations are now checked as part of pyparsing's CI. Addresses Issue #373, raised by Iwan Aucamp, thanks!
Exception message format can now be customized by overriding ParseBaseException.formatted_message:
from pyparsing import ParseBaseException
def custom_exception_message(exc) -> str:
    found_phrase = f", found {exc.found}" if exc.found else ""
    return f"{exc.lineno}:{exc.column} {exc.msg}{found_phrase}"
ParseBaseException.formatted_message = custom_exception_message
(PR #571 submitted by Odysseyas Krystalakos, nice work!)
- POSSIBLE BREAKING CHANGE: Fixed bug in transform_string() where whitespace in the input string was not properly preserved in the output string.
If your code uses transform_string, this bugfix may require changes in your code.
Fixed railroad diagrams that get generated with a parser containing a Regex element defined using a verbose pattern - the pattern gets flattened and comments removed before creating the corresponding diagram element.
Defined a more performant regular expression used internally by common_html_entity.
Regex instances can now be created using a callable that takes no arguments and returns a string or a compiled regular expression, so that building complex regular expression patterns can be deferred until they are first used in the parser.
Added optional flatten Boolean argument to ParseResults.as_list(), to return the parsed values in a flattened list.
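In pyparsing itself this is just result.as_list(flatten=True). To make "flattened" concrete, here is a hypothetical stdlib helper applying the same idea to plain nested lists, assuming full recursive flattening; it is not pyparsing's code.

```python
from typing import Any, Iterable


def flatten(items: Iterable[Any]) -> list[Any]:
    """Hypothetical sketch: expand nested lists recursively, keeping all other
    values (including strings) as single items."""
    flat: list[Any] = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


nested = ["BEGIN", ["aaa", ["bbb", "ccc"]], "END"]
print(flatten(nested))  # ['BEGIN', 'aaa', 'bbb', 'ccc', 'END']
```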
Published by ptmcg over 1 year ago
pyparsing - Pyparsing 3.2.0b1
Discontinued support for Python 3.6, 3.7, and 3.8. Adopted new Python features from Python versions 3.7-3.9:
- Updated type annotations to use built-in container types instead of names imported from the typing module (e.g., list[str] vs List[str]).
- Reworked portions of the packrat cache to leverage insertion-order preservation in dicts.
- Changed pdb.set_trace() call in ParserElement.set_break() to breakpoint().
- Converted typing.NamedTuple to dataclasses.dataclass in railroad diagramming code.
- Added from __future__ import annotations to clean up some type annotations.
POSSIBLE BREAKING CHANGE: Fixed code in ParseElementEnhance subclasses that replaced detailed exception messages raised in contained expressions with a less-specific and less-informative generic exception message and location.
If your code has conditional logic based on the message content in raised ParseExceptions, this bugfix may require changes in your code.
Fixed the displayed output of Regex terms to deduplicate repeated backslashes, for easier reading in debugging, printing, and railroad diagrams.
Fixed (or at least reduced) an elusive bug when generating railroad diagrams, where some diagram elements were just empty blocks. Fix submitted by RoDuth, thanks a ton!
Added indent and base_1 arguments to pyparsing.testing.with_line_numbers. When using with_line_numbers inside a parse action, set base_1=False, since the reported loc value is 0-based. indent can be a leading string (typically of spaces or tabs) to indent the numbered string passed to with_line_numbers. Added while working on #557, reported by Bernd Wechner.
Added query syntax to mongodb_query_expression.py with better support for array fields ("contains", "contains all", "contains any", and "contains none"), "like" and "not like" operators to support SQL "%" wildcard matching, and an "=~" operator to support regex matching. Also:
- added support for dates and datetimes as query values
- added support for a[0] style array referencing
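One way a "like" operator can map onto MongoDB is a $regex filter. The sketch below is a hypothetical like_to_regex helper (mongodb_query_expression.py may well do this differently). It handles only the "%" wildcard mentioned above and escapes everything else; SQL's "_" single-character wildcard is not handled.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Hypothetical sketch: translate a SQL LIKE pattern using '%' wildcards into
    an anchored regular expression, escaping every other character."""
    return "^" + ".*".join(re.escape(part) for part in pattern.split("%")) + "$"


# A clause such as  name like "J%n"  could then become a MongoDB $regex filter.
query = {"name": {"$regex": like_to_regex("J%n")}}
print(query)  # {'name': {'$regex': '^J.*n$'}}
```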
Added lox_parser.py example, a parser for the Lox language used as a tutorial in Robert Nystrom's "Crafting Interpreters" (http://craftinginterpreters.com/). With helpful corrections from RoDuth.
Added complex_chemical_formulas.py example, to add parsing capability for formulas such as "3(C₆H₅OH)₂".
Published by ptmcg over 1 year ago
pyparsing - Pyparsing 3.1.4
- Fixed a regression introduced in pyparsing 3.1.3: the addition of a type annotation that referenced re.Pattern. Since this type was introduced in Python 3.7, the annotation broke Python 3.6 installs of pyparsing 3.1.3. PR submitted by Felix Fontein, nice work!
Published by ptmcg over 1 year ago
pyparsing - Pyparsing 3.1.3
- Added new Tag ParserElement, for inserting metadata into the parsed results. This allows a parser to add metadata or annotations to the parsed tokens. The Tag element also accepts an optional value parameter, defaulting to True. See the new tag_metadata.py example in the examples directory.
Example:
from pyparsing import Tag, Word, alphas
# add tag indicating mood
end_punc = "." | ("!" + Tag("enthusiastic"))
greeting = "Hello" + Word(alphas) + end_punc
result = greeting.parse_string("Hello World.")
print(result.dump())
result = greeting.parse_string("Hello World!")
print(result.dump())
prints:
['Hello', 'World', '.']
['Hello', 'World', '!']
- enthusiastic: True
- Added example mongodb_query_expression.py, to convert human-readable infix query expressions (such as a==100 and b>=200) into the equivalent query argument for the pymongo package ({'$and': [{'a': 100}, {'b': {'$gte': 200}}]}). Supports many equality and inequality operators - see the docstring for the transform_query function for more examples.
- Fixed issue where PEP8 compatibility names for ParserElement static methods were not themselves defined as staticmethods. When called using a ParserElement instance, this resulted in a TypeError exception. Reported by eylenburg (#548).
- To address a compatibility issue in RDFLib, added a property setter for the ParserElement.name property, to call ParserElement.set_name.
- Modified ParserElement.set_name() to accept a None value, to clear the defined name and corresponding error message for a ParserElement.
- Updated railroad diagram generation for ZeroOrMore and OneOrMore expressions with stop_on expressions, while investigating #558, reported by user Gu_f.
- Added <META> tag to HTML generated for railroad diagrams to force UTF-8 encoding with older browsers, to better display Unicode parser characters.
- Fixed some cosmetics/bugs in railroad diagrams:
  - fixed groups being shown even when show_groups=False
  - show results names as quoted strings when show_results_names=True
  - only use integer loop counter if repetition > 2
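To make the infix-to-pymongo transformation concrete, here is a deliberately tiny, hypothetical stdlib version. It handles only "field OP integer" terms joined by "and", nothing like the full grammar in mongodb_query_expression.py, but it reproduces the example conversion quoted above.

```python
import re

# Hypothetical mapping from infix comparison operators to MongoDB query operators;
# None means plain equality, which MongoDB expresses as {field: value}.
OPS = {"==": None, "!=": "$ne", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}
COMPARISON = re.compile(r"\s*(\w+)\s*(==|!=|>=|<=|>|<)\s*(-?\d+)\s*$")


def transform_simple_query(text: str) -> dict:
    """Hypothetical sketch handling only 'field OP integer' terms joined by 'and'."""
    terms = []
    for clause in re.split(r"\s+and\s+", text.strip()):
        m = COMPARISON.match(clause)
        if m is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        field, op, value = m.group(1), m.group(2), int(m.group(3))
        mongo_op = OPS[op]
        terms.append({field: value} if mongo_op is None else {field: {mongo_op: value}})
    return terms[0] if len(terms) == 1 else {"$and": terms}


print(transform_simple_query("a==100 and b>=200"))
# {'$and': [{'a': 100}, {'b': {'$gte': 200}}]}
```

Listing two-character operators before their one-character prefixes in the regex alternation keeps ">=" from being read as ">" followed by "=".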
- Some type annotations added for parse action related methods, thanks August Karlstedt (#551).
- Added exception type to trace_parse_action exception output, while investigating SO question posted by medihack.
- Added set_name calls to internal expressions generated in infix_notation, for improved railroad diagramming.
- delta_time, lua_parser, decaf_parser, and roman_numerals examples cleaned up to use latest PEP8 names and add minor enhancements.
- Fixed bug (and corresponding test code) in delta_time example that did not handle weekday references in time expressions (like "Monday at 4pm") when the weekday was the same as the current weekday.
- Minor performance speedup in trim_arity, to benefit any parsers using parse actions.
- Added early testing support for Python 3.13 with JIT enabled.
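For context, parse actions may be written as fn(), fn(toks), fn(loc, toks), or fn(s, loc, toks), and trim_arity is what adapts them to a single calling convention. The hypothetical adapt_parse_action below shows the idea by inspecting the signature once up front. It is not pyparsing's trim_arity, and callables without an inspectable signature are not handled.

```python
import inspect
from typing import Any, Callable


def adapt_parse_action(fn: Callable[..., Any]) -> Callable[[str, int, list], Any]:
    """Hypothetical sketch: let a callback accept any trailing subset of
    (s, loc, toks) by counting its positional parameters once."""
    params = inspect.signature(fn).parameters.values()
    if any(p.kind is p.VAR_POSITIONAL for p in params):
        n = 3
    else:
        positional = (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)
        n = min(3, sum(p.kind in positional for p in params))

    def wrapper(s: str, loc: int, toks: list) -> Any:
        # Pass only the last n of (s, loc, toks).
        return fn(*(s, loc, toks)[3 - n:])

    return wrapper


toks = ["12"]
actions = [
    lambda: "no args",
    lambda t: int(t[0]),
    lambda loc, t: loc,
    lambda s, loc, t: s.upper(),
]
for action in actions:
    print(adapt_parse_action(action)("12 apples", 0, toks))
```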
Published by ptmcg over 1 year ago
pyparsing - Pyparsing 3.1.2
Support for Python 3.13.
Added ieee_float expression to pyparsing.common, which parses float values, plus "NaN", "Inf", "Infinity". PR submitted by Bob Peterson (#538).
Updated pep8 synonym wrappers for better type checking compatibility. PR submitted by Ricardo Coccioli (#507).
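A stdlib sketch of what recognizing IEEE-style float text can look like. The pattern and the parse_ieee_float helper are hypothetical, not the regex used by pyparsing.common.ieee_float. Python's float() already accepts "nan", "inf", and "infinity" in any case, so the regex only has to validate the shape first.

```python
import math
import re

# Hypothetical pattern, not the one pyparsing.common.ieee_float uses:
# decimal or exponent notation, or the special values NaN / Inf / Infinity.
IEEE_FLOAT = re.compile(
    r"[+-]?(?:(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?|nan|inf(?:inity)?)",
    re.IGNORECASE,
)


def parse_ieee_float(text: str) -> float:
    if IEEE_FLOAT.fullmatch(text) is None:
        raise ValueError(f"not a float: {text!r}")
    return float(text)  # float() itself understands "nan", "inf" and "infinity"


for sample in ("1.5e3", ".25", "-Infinity", "NaN"):
    print(sample, "->", parse_ieee_float(sample))
```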
Fixed empty error message bug, PR submitted by InSync (#534). This should return pyparsing's exception messages to a former, more helpful form. If you have code that parses the exception messages returned by pyparsing, this may require some code changes.
Added unit tests to test for exception message contents, with enhancement to pyparsing.testing.assertRaisesParseException to accept an expected exception message.
Updated example select_parser.py to use PEP8 names and added Groups for better retrieval of parsed values from multiple SELECT clauses.
Added example email_address_parser.py, as suggested by John Byrd (#539).
Added example directx_x_file_parser.py to parse DirectX template definitions, and generate a Pyparsing parser from a template to parse .x files.
Some code refactoring to reduce code nesting, PRs submitted by InSync.
All internal string expressions using '%' string interpolation and str.format() converted to f-strings.
Published by ptmcg almost 2 years ago
pyparsing - Pyparsing 3.1.1
Fixed regression in Word(min), reported by Ricardo Coccioli, good catch! (Issue #502)
Fixed bad exception messages raised by Forward expressions. PR submitted by Kyle Sunden, thanks for your patience and collaboration on this (#493).
Fixed regression in SkipTo, where ignored expressions were not checked when looking for the target expression. Reported by catcombo, Issue #500.
Fixed type annotation for enable_packrat, PR submitted by Mike Urbach, thanks! (Issue #498)
Some general internal code cleanup. (Instigated by Michal Čihař, Issue #488)
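The SkipTo regression concerned ignored expressions. The idea can be pictured with a hypothetical stdlib scanner, skip_to, that looks for a target pattern but jumps over ignored spans, so a ";" inside a quoted string is not mistaken for the terminator. This is an illustration only, not SkipTo's implementation.

```python
import re


def skip_to(text: str, start: int, target: re.Pattern, ignore: re.Pattern) -> tuple:
    """Hypothetical sketch: return (skipped_text, target_loc), scanning forward from
    start for target while jumping over any span matched by ignore."""
    pos = start
    while pos < len(text):
        ignored = ignore.match(text, pos)
        if ignored and ignored.end() > pos:
            pos = ignored.end()  # skip the whole ignored span, e.g. a quoted string
            continue
        if target.match(text, pos):
            return text[start:pos], pos
        pos += 1
    raise ValueError("target not found")


statement = 'msg = "a;b"; rest'
print(skip_to(statement, 0, re.compile(";"), re.compile(r'"[^"]*"')))
```

Without the ignore check, the scan would stop at the ";" inside "a;b" instead of at offset 11.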
Published by ptmcg over 2 years ago
pyparsing - Pyparsing 3.1.0
NOTE: In the future release 3.2.0, use of many of the pre-PEP8 methods (such as ParserElement.parseString) will start to raise DeprecationWarnings. 3.2.0 should get released some time later in 2023. I currently plan to completely drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release until at least late 2023 if not 2024. So there is plenty of time to convert existing parsers to the new function names before the old functions are completely removed. (Big help from Devin J. Pohly in structuring the code to enable this peaceful transition.)
Version 3.2.0 will also discontinue support for Python versions 3.6 and 3.7.
Version 3.1.0 - June, 2023
API CHANGES
A slight change has been implemented when unquoting a quoted string parsed using the QuotedString class. Formerly, when unquoting and processing whitespace markers such as \t and \n, these substitutions would occur first, and then any additional '\' escaping would be done on the resulting string. This would parse "\\n" as "\<newline>". Now escapes and whitespace markers are all processed in a single pass working left to right, so the quoted string "\\n" would get unquoted to "\n" (a backslash followed by "n"). Fixes issue #474 raised by jakeanq, thanks!
Reworked the delimited_list function into the new DelimitedList class. DelimitedList has the same constructor interface as delimited_list, and in this release, delimited_list changes from a function to a synonym for DelimitedList. delimited_list and the older delimitedList method will be deprecated in a future release, in favor of DelimitedList.
ParserElement.validate() is deprecated. It predates the support for left-recursive parsers, and was prone to false positives (warning that a grammar was invalid when it was in fact valid). It will be removed in a future pyparsing release. In its place, developers should use debugging and analytical tools, such as ParserElement.set_debug() and ParserElement.create_diagram(). (Raised in Issue #444, thanks Andrea Micheli!)
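A single left-to-right pass is easy to express with re.sub, because each match consumes a backslash together with exactly one following character. The sketch below is hypothetical: unquote_body and its escape table are illustrations, not QuotedString's code, and treating an unknown escape such as "\q" as just "q" is an assumption.

```python
import re

# Hypothetical escape table; any other escaped character yields itself.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "f": "\f"}
ESCAPE = re.compile(r"\\(.)", re.DOTALL)


def unquote_body(body: str) -> str:
    """Process backslash escapes in one left-to-right pass: each backslash
    consumes exactly the next character, so an escaped backslash can never
    combine with a following 'n' to form a newline."""
    return ESCAPE.sub(lambda m: ESCAPES.get(m.group(1), m.group(1)), body)


print(repr(unquote_body(r"\\n")))  # '\\n': a backslash followed by "n"
print(repr(unquote_body(r"a\tb")))  # 'a\tb': a real tab character
```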
NEW FEATURES AND ENHANCEMENTS
Optional(expr) may now be written as expr | ""
This will make this code:
"{" + Optional(Literal("A") | Literal("a")) + "}"
writable as:
"{" + (Literal("A") | Literal("a") | "") + "}"
Some related changes implemented as part of this work:
- Literal("") now internally generates an Empty() (and no longer raises an exception)
- Empty is now a subclass of Literal
Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.
- Added new class method ParserElement.using_each, to simplify code that creates a sequence of Literals, Keywords, or other ParserElement subclasses.
For instance, to define suppressible punctuation, you would previously write:
LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")
You can now write:
LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")
using_each will also accept optional keyword args, which it will pass through to the class initializer. Here is an expression for single-letter variable names that might be used in an algebraic expression:
import string
from pyparsing import Char, MatchFirst
algebra_var = MatchFirst(
    Char.using_each(string.ascii_lowercase, as_keyword=True)
)
Added new builtin python_quoted_string, which will match any form of single-line or multiline quoted strings defined in Python. (Inspired by discussion with Andreas Schörgenhumer in Issue #421.)
Extended expr[] notation for repetition of expr to accept a slice, where the slice's stop value indicates a stop_on expression:
from pyparsing import Group, Keyword, Word, alphas
test = "BEGIN aaa bbb ccc END"
BEGIN, END = Keyword.using_each("BEGIN END".split())
body_word = Word(alphas)
expr = BEGIN + Group(body_word[...:END]) + END
# equivalent to
# expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END
print(expr.parse_string(test))
Prints:
['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']
Added named field "url" to pyparsing.common.url, returning the entire parsed URL string.
Added bool embed argument to ParserElement.create_diagram(). When passed as True, the resulting diagram will omit the <DOCTYPE>, <HEAD>, and <BODY> tags so that it can be embedded in other HTML source. (Useful when embedding a call to create_diagram() in a PyScript HTML page.)
Added recurse argument to ParserElement.set_debug to set the debug flag on an expression and all of its sub-expressions. Requested by multimeric in Issue #399.
Added '·' (Unicode MIDDLE DOT) to the set of Latin1.identbodychars.
ParseResults now has a new method deepcopy(), in addition to the current copy() method. copy() only makes a shallow copy: any contained ParseResults are copied as references, so changes in the copy will be seen as changes in the original. In many cases a shallow copy is sufficient, but some applications require a deep copy. deepcopy() makes a deeper copy: any contained ParseResults or other mappings or containers are built with copies from the original, and do not get changed if the original is later changed. Addresses issue #463, reported by Bryn Pickering.
Added new class property identifier to all Unicode set classes in pyparsing.unicode, using the class's values for cls.identchars and cls.identbodychars. Unicode-aware parsers that formerly wrote:
import pyparsing
from pyparsing import Word
ppu = pyparsing.unicode
ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
can now write:
ident = ppu.Greek.identifier
# or
# ident = ppu.Ελληνικά.identifier
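The shallow-versus-deep distinction that ParseResults.copy() and deepcopy() now follow is the same one the standard copy module draws. A short illustration with plain dicts and lists (not ParseResults objects):

```python
import copy

original = {"name": "expr", "children": [["a", "b"], ["c"]]}
shallow = copy.copy(original)
deep = copy.deepcopy(original)

original["children"][0].append("z")  # mutate a nested container

print(shallow["children"][0])  # ['a', 'b', 'z']: shared with the original
print(deep["children"][0])  # ['a', 'b']: an independent copy
```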
- Error messages from MatchFirst and Or expressions will try to give more details if one of the alternatives matches better than the others, but still fails. Question raised in Issue #464 by msdemlei, thanks!
BUG FIXES AND GENERAL CHANGES
Added support for Python 3.12.
Updated ci.yml permissions to limit default access to source - submitted by Joyce Brum of Google. Thanks so much!
Updated create_diagram() code to be compatible with railroad-diagrams package version 3.0. Fixes Issue #477 (railroad diagrams generated with black bars), reported by Sam Morley-Short.
Fixed bug in NotAny, where parse actions on the negated expr were not being run. This could cause NotAny to incorrectly fail if the expr would normally match, but would fail to match if a condition used as a parse action returned False. Fixes Issue #482, raised by byaka, thank you!
Fixed create_diagram() to accept keyword args, to be passed through to the template.render() method to generate the output HTML (PR submitted by Aussie Schnore, good catch!)
Fixed bug in python_quoted_string regex.
Fixed bug where, if a parse action returned an empty string for an expression that had a results name, the results name was not saved. That is:
from pyparsing import Literal
expr = Literal("X").add_parse_action(lambda tokens: "")("value")
result = expr.parse_string("X")
print(result["value"])
would raise a KeyError. Now empty strings will be saved with the associated results name. Raised in Issue #470 by Nicco Kunzmann, thank you.
Fixed bug in SkipTo where ignore expressions were not properly handled while scanning for the target expression. Issue #475, reported by elkniwt, thanks (this bug has been there for a looooong time!).
Fixed bug in Word when max=2. Also added a performance enhancement when specifying the exact argument. Reported in issue #409 by panda-34, nice catch!
Word arguments are now validated: if min and max are both given, min must be <= max, otherwise a ValueError is raised.
Fixed bug in srange, when parsing escaped '/' and '\' inside a range set.
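The min/max validation can be pictured with a hypothetical word_regex helper that turns Word-style length arguments into a regular expression. Treating max_=0 as "no upper limit" and requiring min_ >= 1 are assumptions of this sketch; pyparsing's own Word implementation is not reproduced.

```python
import re


def word_regex(chars: str, min_: int = 1, max_: int = 0, exact: int = 0) -> re.Pattern:
    """Hypothetical sketch of Word-like length rules as a regex: a run of characters
    from `chars`; max_=0 means no upper limit, and exact overrides min_ and max_."""
    if exact:
        min_ = max_ = exact
    if min_ < 1:
        raise ValueError("min_ must be at least 1")
    if max_ and min_ > max_:
        raise ValueError(f"min_ ({min_}) must be <= max_ ({max_})")
    char_class = "[" + "".join(re.escape(c) for c in chars) + "]"
    upper = str(max_) if max_ else ""
    return re.compile(f"{char_class}{{{min_},{upper}}}")


print(word_regex("ab", 2, 3).pattern)  # [ab]{2,3}
print(word_regex("0123456789", exact=4).fullmatch("2024") is not None)  # True
```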
Fixed exception messages for some ParserElements with custom names, which instead showed their contained expression names.
Fixed bug in pyparsing.common.url, when input URL is not alone on an input line. Fixes Issue #459, reported by David Kennedy.
Multiple added and corrected type annotations. With much help from Stephen Rosen, thanks!
Some documentation and error message clarifications on pyparsing's keyword logic, cited by Basil Peace.
General docstring cleanup for Sphinx doc generation, PRs submitted by Devin J. Pohly. A dirty job, but someone has to do it - much appreciated!
EXAMPLE UPDATES
Added tag_emitter.py to examples. This example demonstrates how to insert tags into your parsed results that are not part of the original parsed text.
Added bf.py Brainf*ck parser/executor example. Illustrates using a pyparsing grammar to parse language syntax, and attach executable AST nodes to the parsed results.
invRegex.py example renamed to inv_regex.py and updated to PEP-8 variable and method naming. PR submitted by Ross J. Duff, thanks!
Removed examples sparser.py and pymicko.py, since each included its own GPL license in the header. Since this conflicts with pyparsing's MIT license, they were removed from the distribution to avoid confusion among those making use of them in their own projects.
Updated the lucene_grammar.py example (better support for '*' and '?' wildcards) and corrected the test cases - brought to my attention by Elijah Nicol, good catch!
- Python
Published by ptmcg over 2 years ago
pyparsing - Pyparsing 3.1.0b2
Updated create_diagram() code to be compatible with railroad-diagrams package version 3.0. Fixes Issue #477 (railroad diagrams generated with black bars), reported by Sam Morley-Short.
Fixed bug in NotAny, where parse actions on the negated expr were not being run. This could cause NotAny to incorrectly fail if the expr would normally match, but would fail to match if a condition used as a parse action returned False. Fixes Issue #482, raised by byaka, thank you!
Fixed create_diagram() to accept keyword args, to be passed through to the template.render() method to generate the output HTML (PR submitted by Aussie Schnore, good catch!)
Fixed bug in python_quoted_string regex.
Added examples/bf.py Brainf*ck parser/executor example. Illustrates using a pyparsing grammar to parse language syntax, and attach executable AST nodes to the parsed results.
- Python
Published by ptmcg almost 3 years ago
pyparsing - Pyparsing 3.1.0b1
Added support for Python 3.12.
API CHANGE: A slight change has been implemented when unquoting a quoted string parsed using the QuotedString class. Formerly, when unquoting and processing whitespace markers such as \t and \n, these substitutions would occur first, and then any additional '\' escaping would be done on the resulting string. This would parse "\\n" as "\<newline>". Now escapes and whitespace markers are all processed in a single pass working left to right, so the quoted string "\\n" would get unquoted to "\n" (a backslash followed by "n"). Fixes issue #474 raised by jakeanq, thanks!
Added named field "url" to pyparsing.common.url, returning the entire parsed URL string.
Fixed bug where, if a parse action returned an empty string for an expression that had a results name, the results name was not saved. That is:
from pyparsing import Literal
expr = Literal("X").add_parse_action(lambda tokens: "")("value")
result = expr.parse_string("X")
print(result["value"])
would raise a KeyError. Now empty strings will be saved with the associated results name. Raised in Issue #470 by Nicco Kunzmann, thank you.
Fixed bug in SkipTo where ignore expressions were not properly handled while scanning for the target expression. Issue #475, reported by elkniwt, thanks (this bug has been there for a looooong time!).
Updated ci.yml permissions to limit default access to source - submitted by Joyce Brum of Google. Thanks so much!
Updated the lucene_grammar.py example (better support for '*' and '?' wildcards) and corrected the test cases - brought to my attention by Elijah Nicol, good catch!
- Python
Published by ptmcg almost 3 years ago
pyparsing - Pyparsing 3.1.0a1
NOTE: In the future release 3.2.0, use of many of the pre-PEP8 methods (such as ParserElement.parseString) will start to raise DeprecationWarnings. 3.2.0 should get released some time later in 2023. I currently plan to completely drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release until at least late 2023 if not 2024. So there is plenty of time to convert existing parsers to the new function names before the old functions are completely removed. (Big help from Devin J. Pohly in structuring the code to enable this peaceful transition.)
Version 3.2.0 will also discontinue support for Python versions 3.6 and 3.7.
- API ENHANCEMENT:
Optional(expr) may now be written as expr | ""
This will make this code:
"{" + Optional(Literal("A") | Literal("a")) + "}"
writable as:
"{" + (Literal("A") | Literal("a") | "") + "}"
Some related changes implemented as part of this work:
- Literal("") now internally generates an Empty() (and no longer raises an exception)
- Empty is now a subclass of Literal
Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.
Added new class property identifier to all Unicode set classes in pyparsing.unicode, using the class's values for cls.identchars and cls.identbodychars. Now Unicode-aware parsers that formerly wrote:
ppu = pyparsing.unicode
ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
can now write:
ident = ppu.Greek.identifier
# or
# ident = ppu.Ελληνικά.identifier
Reworked delimited_list function into the new DelimitedList class. DelimitedList has the same constructor interface as delimited_list, and in this release, delimited_list changes from a function to a synonym for DelimitedList. delimited_list and the older delimitedList method will be deprecated in a future release, in favor of DelimitedList.
Added new class method ParserElement.using_each, to simplify code that creates a sequence of Literals, Keywords, or other ParserElement subclasses.
For instance, to define suppressable punctuation, you would previously write:
LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")
You can now write:
LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")
using_each will also accept optional keyword args, which it will pass through to the class initializer. Here is an expression for single-letter variable names that might be used in an algebraic expression:
import string
from pyparsing import Char, MatchFirst
algebra_var = MatchFirst(
    Char.using_each(string.ascii_lowercase, as_keyword=True)
)
Added new builtin python_quoted_string, which will match any form of single-line or multiline quoted strings defined in Python. (Inspired by discussion with Andreas Schörgenhumer in Issue #421.)
Extended expr[] notation for repetition of expr to accept a slice, where the slice's stop value indicates a stop_on expression:
from pyparsing import Group, Keyword, Word, alphas
test = "BEGIN aaa bbb ccc END"
BEGIN, END = Keyword.using_each("BEGIN END".split())
body_word = Word(alphas)
expr = BEGIN + Group(body_word[:END]) + END
# equivalent to
# expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END
print(expr.parse_string(test))
Prints:
['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']
ParserElement.validate() is deprecated. It predates the support for left-recursive parsers, and was prone to false positives (warning that a grammar was invalid when it was in fact valid). It will be removed in a future pyparsing release. In its place, developers should use debugging and analytical tools, such as ParserElement.set_debug() and ParserElement.create_diagram(). (Raised in Issue #444, thanks Andrea Micheli!)
Added bool embed argument to ParserElement.create_diagram(). When passed as True, the resulting diagram will omit the <DOCTYPE>, <HEAD>, and <BODY> tags so that it can be embedded in other HTML source. (Useful when embedding a call to create_diagram() in a PyScript HTML page.)
Added recurse argument to ParserElement.set_debug to set the debug flag on an expression and all of its sub-expressions. Requested by multimeric in Issue #399.
Added '·' (Unicode MIDDLE DOT) to the set of pp.unicode.Latin1.identbodychars.
Fixed bug in Word when max=2. Also added performance enhancement when specifying exact argument. Reported in issue #409 by panda-34, nice catch!
When both min and max are given, Word arguments are now validated so that min <= max; raises ValueError if values are invalid.
Fixed bug in srange, when parsing escaped '/' and '\' inside a range set.
Fixed exception messages for some ParserElements with custom names, which instead showed their contained expression names.
Fixed bug in pyparsing.common.url, when input URL is not alone on an input line. Fixes Issue #459, reported by David Kennedy.
Multiple added and corrected type annotations. With much help from Stephen Rosen, thanks!
Some documentation and error message clarifications on pyparsing's keyword logic, cited by Basil Peace.
General docstring cleanup for Sphinx doc generation, PRs submitted by Devin J. Pohly. A dirty job, but someone has to do it - much appreciated!
invRegex.py example renamed to inv_regex.py and updated to PEP-8 variable and method naming. PR submitted by Ross J. Duff, thanks!
Removed examples sparser.py and pymicko.py, since each included its own GPL license in the header. Since this conflicts with pyparsing's MIT license, they were removed from the distribution to avoid confusion among those making use of them in their own projects.
- Python
Published by ptmcg almost 3 years ago
pyparsing - pyparsing 3.0.9
Added Unicode set BasicMultilingualPlane (may also be referenced as BMP) representing the Basic Multilingual Plane (Unicode characters up to code point 65535). Can be used to parse most language characters, but omits emojis, wingdings, etc. Raised in discussion with Dave Tapley (issue #392).
To address mypy confusion of pyparsing.Optional and typing.Optional resulting in error: "_SpecialForm" not callable message reported in issue #365, fixed the import in exceptions.py. Nice sleuthing by Iwan Aucamp and Dominic Davis-Foster, thank you! (Removed definitions of OptionalType, DictType, and IterableType and replaced them with typing.Optional, typing.Dict, and typing.Iterable throughout.)
Fixed typo in jinja2 template for railroad diagrams, thanks for the catch Nioub (issue #388).
Removed use of deprecated pkg_resources package in railroad diagramming code (issue #391).
Updated bigquery_view_parser.py example to parse examples at https://cloud.google.com/bigquery/docs/reference/legacy-sql
- Python
Published by ptmcg almost 4 years ago
pyparsing - pyparsing 3.0.8
Version 3.0.8 -
API CHANGE: modified pyproject.toml to require Python version 3.6.8 or later for pyparsing 3.x. Earlier minor versions of 3.6 fail in evaluating the version_info class (implemented using typing.NamedTuple). If you are using an earlier version of Python 3.6, you will need to use pyparsing 2.4.7.
Improved pyparsing import time by deferring regex pattern compiles. PR submitted by Anthony Sottile to fix issue #362, thanks!
Updated build to use flit, PR by Michał Górny, added BUILDING.md doc and removed old Windows build scripts - nice cleanup work!
More type-hinting added for all arithmetic and logical operator methods in ParserElement. PR from Kazantcev Andrey, thank you.
Fixed infix_notation's definitions of lpar and rpar, to accept parse expressions such that they do not get suppressed in the parsed results. PR submitted by Philippe Prados, nice work.
Fixed bug in railroad diagramming with expressions containing Combine elements. Reported by Jeremy White, thanks!
Added show_groups argument to create_diagram to highlight grouped elements with an unlabeled bounding box.
Added unicode_denormalizer.py to the examples as a demonstration of how Python's interpreter will accept Unicode characters in identifiers, but normalizes them back to ASCII so that identifiers print and 𝕡𝓻ᵢ𝓃𝘁 and 𝖕𝒓𝗂𝑛ᵗ are all equivalent.
Removed imports of deprecated sre_constants module for catching exceptions when compiling regular expressions. PR submitted by Serhiy Storchaka, thank you.
- Python
Published by ptmcg almost 4 years ago
pyparsing - pyparsing 3.0.7
Fixed bug #345, in which delimitedList changed expressions in place using expr.streamline(). Reported by Kim Gräsman, thanks!
Fixed bug #346, when a string of word characters was passed to WordStart or WordEnd instead of just taking the default value. Originally posted as a question by Parag on StackOverflow, good catch!
Fixed bug #350, in which White expressions could fail to match due to unintended whitespace-skipping. Reported by Fu Hanxi, thank you!
Fixed bug #355, when a QuotedString is defined with characters in its quoteChar string containing regex-significant characters such as ., *, ?, [, ], etc.
Fixed bug in ParserElement.run_tests where comments would be displayed using with_line_numbers.
Added optional "min" and "max" arguments to delimited_list. PR submitted by Marius, thanks!
Added new API change note in whats_new_in_pyparsing_3_0_0, regarding a bug fix in the bool() behavior of ParseResults.
Prior to pyparsing 3.0.x, the ParseResults class implementation of __bool__ would return False if the ParseResults item list was empty, even if it contained named results. In 3.0.0 and later, ParseResults will return True if either the item list is not empty or if the named results dict is not empty.
from pyparsing import Word, alphas

# generate an empty ParseResults by parsing a blank string with
# a ZeroOrMore
result = Word(alphas)[...].parse_string("")
print(result.as_list())
print(result.as_dict())
print(bool(result))
# add a results name to the result
result["name"] = "empty result"
print(result.as_list())
print(result.as_dict())
print(bool(result))
Prints:
[]
{}
False
[]
{'name': 'empty result'}
True
In previous versions, the second call to bool() would return False.
Minor enhancement to Word generation of internal regular expression, to emit consecutive characters in range, such as "ab", as "ab", not "a-b".
Fixed character ranges for search terms using non-Western characters in booleansearchparser, PR submitted by tc-yu, nice work!
Additional type annotations on public methods.
- Python
Published by ptmcg about 4 years ago
pyparsing - pyparsing 3.0.6
Added suppress_warning() method to individually suppress a warning on a specific ParserElement. Used to refactor original_text_for to preserve internal results names, which, while undocumented, had been adopted by some projects.
Fixed bug when delimited_list was called with a str literal instead of a parse expression.
- Python
Published by ptmcg over 4 years ago
pyparsing - pyparsing 3.0.5
Added return type annotations for col, line, and lineno.
Fixed bug where the warn_ungrouped_named_tokens_in_collection warning was raised when assigning a results name to an original_text_for expression. (Issue #110, would raise warning in packaging.)
Fixed internal bug where ParserElement.streamline() would not return self if already streamlined.
Changed run_tests() output to default to not showing line and column numbers. If line numbering is desired, call with with_line_numbers=True. Also fixed minor bug where separating line was not included after a test failure.
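col, line, and lineno convert a parse location into line/column information. A quick illustration with a made-up input string:

```python
import pyparsing as pp

text = "abc\ndef"
loc = text.index("e")   # offset 5, on the second line

print(pp.lineno(loc, text))   # 1-based line number
print(pp.col(loc, text))      # 1-based column number
print(pp.line(loc, text))     # full text of the line containing loc
```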
- Python
Published by ptmcg over 4 years ago
pyparsing - pyparsing 3.0.4
Fixed bug in which Dict classes did not correctly return tokens as nested ParseResults, reported by and fix identified by Bu Sun Kim, many thanks!!!
Documented API-changing side-effect of converting ParseResults to use __slots__ to pre-define instance attributes. This means that code written like this (which was allowed in pyparsing 2.4.7):
result = Word(alphas).parseString("abc")
result.xyz = 100
now raises this Python exception:
AttributeError: 'ParseResults' object has no attribute 'xyz'
To add new attribute values to a ParseResults object in 3.0.0 and later, you must assign them using indexed notation:
result["xyz"] = 100
You will still be able to access this new value as an attribute or as an indexed item.
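A runnable version of the example above, assuming pyparsing 3.0.0 or later:

```python
import pyparsing as pp

result = pp.Word(pp.alphas).parse_string("abc")

try:
    result.xyz = 100          # attribute assignment fails because of __slots__
    attribute_assignment_ok = True
except AttributeError:
    attribute_assignment_ok = False

result["xyz"] = 100           # indexed assignment works
print(attribute_assignment_ok, result["xyz"], result.xyz)
```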
- Fixed bug in railroad diagramming where the vertical limit would count all expressions in a group, not just those that would create visible railroad elements.
- Python
Published by ptmcg over 4 years ago
pyparsing - pyparsing 3.0.3
Fixed regex typo in one_of fix for as_keyword=True.
Fixed a whitespace-skipping bug, Issue #319, introduced as part of the revert of the LineStart changes. Reported by Marc-Alexandre Côté, thanks!
Added header column labeling > 100 in with_line_numbers - some input lines are longer than others.
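A sketch of one_of with as_keyword=True, assuming pyparsing 3.0.3 or later; the keyword list is illustrative. A keyword match requires a word boundary, so a prefix of a longer word is rejected:

```python
import pyparsing as pp

keyword = pp.one_of("if in", as_keyword=True)

print(keyword.parse_string("in range").as_list())

try:
    keyword.parse_string("inside")   # "in" is only a prefix here
    matched_prefix = True
except pp.ParseException:
    matched_prefix = False
print(matched_prefix)
```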
- Python
Published by ptmcg over 4 years ago
pyparsing - pyparsing 3.0.2
- Reverted change in behavior with LineStart and StringStart, which changed the interpretation of when and how LineStart and StringStart should match when a line starts with spaces. In 3.0.0, the xxxStart expressions were not really treated like expressions in their own right, but as modifiers to the following expression when used like LineStart() + expr, so that if there were whitespace on the line before expr (which would match in versions prior to 3.0.0), the match would fail.
3.0.0 implemented this by automatically promoting LineStart() + expr to AtLineStart(expr), which broke existing parsers that did not expect expr to necessarily be right at the start of the line, but only be the first token found on the line. This was reported as a regression in Issue #317.
In 3.0.2, pyparsing reverts to the previous behavior, but will retain the new AtLineStart and AtStringStart expression classes, so that parsers can choose whichever behavior applies in their specific instance. Specifically:
# matches expr if it is the first token on the line (allows for leading whitespace)
LineStart() + expr
# matches only if expr is found in column 1
AtLineStart(expr)
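A runnable sketch of the difference, assuming pyparsing 3.0.2 or later; the input has leading spaces before the word:

```python
import pyparsing as pp

data = "  indented"          # two leading spaces before the word
word = pp.Word(pp.alphas)

# LineStart() + expr: expr only has to be the first token on the line
first_token = (pp.LineStart() + word).parse_string(data).as_list()
print(first_token)

# AtLineStart(expr): expr must begin in column 1
try:
    pp.AtLineStart(word).parse_string(data)
    at_col_1_failed = False
except pp.ParseException:
    at_col_1_failed = True
print(at_col_1_failed)
```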
Performance enhancement to one_of to always generate an internal Regex, even if caseless or as_keyword args are given as True (unless explicitly disabled by passing use_regex=False).
IndentedBlock class now works with recursive flag. By default, the results parsed by an IndentedBlock are grouped. This can be disabled by constructing the IndentedBlock with grouped=False.
- Python
Published by ptmcg over 4 years ago
pyparsing - pyparsing 3.0.1
Fixed bug where Word(max=n) did not match word groups less than length 'n'. Thanks to Joachim Metz for catching this!
Fixed bug where ParseResults accidentally created recursive contents. Joachim Metz on this one also!
Fixed bug where warn_on_multiple_string_args_to_oneof warning is raised even when not enabled.
- Python
Published by ptmcg over 4 years ago
pyparsing - pyparsing 3.0.0
Version 3.0.0 -
- A consolidated list of all the changes in the 3.0.0 release can be found in docs/whats_new_in_3_0_0.rst. (https://github.com/pyparsing/pyparsing/blob/master/docs/whats_new_in_3_0_0.rst)
Version 3.0.0.final -
Added support for python -W warning option to call enable_all_warnings() at startup. Also detects setting of PYPARSINGENABLEALLWARNINGS environment variable to any non-blank value.
Fixed named results returned by url to match fields as they would be parsed using urllib.parse.urlparse.
Early response to with_line_numbers was positive, with some requested enhancements:
. added a trailing "|" at the end of each line (to show presence of trailing spaces); can be customized using eol_mark argument
. added expand_tabs argument, to control calling str.expandtabs (defaults to True to match parseString)
. added mark_spaces argument to support display of a printing character in place of spaces, or Unicode symbols for space and tab characters
. added mark_control argument to support highlighting of control characters using '.' or Unicode symbols, such as "␍" and "␊".
Modified helpers common_html_entity and replace_html_entity() to use the HTML entity definitions from html.entities.html5.
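A sketch of the HTML entity helpers in use, assuming pyparsing 3.0.0 or later. common_html_entity is copied before attaching replace_html_entity as a parse action, so the shared module-level expression is left unchanged:

```python
import pyparsing as pp

# Parse action replaces each matched entity with its character
entity = pp.common_html_entity.copy().set_parse_action(pp.replace_html_entity)

cleaned = entity.transform_string("a &lt; b &amp;&amp; c")
print(cleaned)
```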
Updated the class diagram in the pyparsing docs directory, along with the supporting .puml file (PlantUML markup) used to create the diagram.
Added global method autoname_elements() to call set_name() on all locally defined ParserElements that haven't been explicitly named using set_name(), using their local variable name. Useful for setting names on multiple elements when creating a railroad diagram.
a = pp.Literal("a")
b = pp.Literal("b").set_name("bbb")
pp.autoname_elements()
a will get named "a", while b will keep its name "bbb".
- Python
Published by ptmcg over 4 years ago
pyparsing - pyparsing 3.0.0rc2
- Added url expression to pyparsing_common. (Sample code posted by Wolfgang Fahl, very nice!)
This new expression has been added to the urlExtractorNew.py example, to show how it extracts URL fields into separate results names.
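A quick sketch of the url expression, assuming a current pyparsing release. The scheme, host, and path field names are taken from the current implementation rather than stated in these notes, and the "url" field only exists from 3.1.0 on:

```python
import pyparsing as pp

url_result = pp.common.url.parse_string("https://github.com/pyparsing/pyparsing")
print(url_result.scheme)
print(url_result.host)
print(url_result.path)
print(url_result.url)   # whole URL string (3.1.0+)
```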
Added method to pyparsing_testing to help debugging, with_line_numbers. Returns a string with line and column numbers corresponding to values shown when parsing with expr.set_debug():
import pyparsing as pp
from pyparsing.testing import pyparsing_test as ppt

data = """\
   A
      100"""
expr = pp.Word(pp.alphanums).set_name("word").set_debug()
print(ppt.with_line_numbers(data))
expr[...].parseString(data)
prints:
1
1234567890
1: A
2: 100
Match word at loc 3(1,4)
A
^
Matched word -> ['A']
Match word at loc 11(2,7)
100
^
Matched word -> ['100']
Added new example cuneiform_python.py to demonstrate creating a new Unicode range, and writing a Cuneiform->Python transformer (inspired by zhpy).
Fixed issue #272, reported by PhasecoreX, when LineStart() expressions would match expressions that were not necessarily at the beginning of a line.
As part of this fix, two new classes have been added: AtLineStart and AtStringStart. The following expressions are equivalent:
LineStart() + expr and AtLineStart(expr)
StringStart() + expr and AtStringStart(expr)
Fixed ParseFatalExceptions failing to override normal exceptions or expression matches in MatchFirst expressions. Addresses issue #251, reported by zyp-rgb.
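A sketch of the MatchFirst behavior, assuming pyparsing 3.0.0 or later; the let grammar is hypothetical. Using '-' instead of '+' in an And raises ParseSyntaxException (a ParseFatalException subclass) once the leading keyword has matched, and MatchFirst lets it propagate instead of trying the next alternative:

```python
import pyparsing as pp

name = pp.Word(pp.alphas)
fallback = pp.Word(pp.alphanums)

# '+': failure after "let" is an ordinary ParseException, so MatchFirst
# moves on to the fallback alternative
lenient = (pp.Keyword("let") + name) | fallback
print(lenient.parse_string("let 123").as_list())

# '-': failure after "let" is fatal and is not swallowed by MatchFirst
strict = (pp.Keyword("let") - name) | fallback
try:
    strict.parse_string("let 123")
    strict_failed = False
except pp.ParseFatalException:
    strict_failed = True
print(strict_failed)
```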
Fixed bug in which ParseResults replaces a collection type value with an invalid type annotation (changed behavior in Python 3.9). Addresses issue #276, reported by Rob Shuler, thanks.
Fixed bug in ParseResults when calling __getattr__ for special double-underscored methods. Now raises AttributeError for non-existent results when accessing a name starting with '__'. Addresses issue #208, reported by Joachim Metz.
Modified debug fail messages to include the expression name to make it easier to sync up match vs success/fail debug messages.
- Python
Published by ptmcg over 4 years ago
pyparsing - pyparsing 3.0.0rc1
Railroad diagrams have been reformatted:
. creating diagrams is easier - call expr.create_diagram("diagram_output.html")
. create_diagram() takes 3 arguments:
  . the filename to write the diagram HTML
  . optional 'vertical' argument, to specify the minimum number of items in a path to be shown vertically; default=3
  . optional 'show_results_names' argument, to specify whether results name annotations should be shown; default=False
. every expression that gets a name using setName() gets separated out as a separate subdiagram
. results names can be shown as annotations to diagram items
. Each, FollowedBy, and PrecededBy elements get [ALL], [LOOKAHEAD], and [LOOKBEHIND] annotations
. removed annotations for Suppress elements
. some diagram cleanup when a grammar contains Forward elements
. check out the examples make_diagram.py and railroad_diagram_demo.py
Type annotations have been added to most public API methods and classes.
Better exception messages to show full word where an exception occurred.
Word(alphas)[...].parseString("abc 123", parseAll=True)
Was:
pyparsing.ParseException: Expected end of text, found '1' (at char 4), (line:1, col:5)
Now:
pyparsing.exceptions.ParseException: Expected end of text, found '123' (at char 4), (line:1, col:5)
Suppress can be used to suppress text skipped using "...".
from pyparsing import Keyword, Suppress
source = "lead in START relevant text END trailing text"
start_marker = Keyword("START")
end_marker = Keyword("END")
find_body = Suppress(...) + start_marker + ... + end_marker
print(find_body.parseString(source).dump())
Prints:
['START', 'relevant text ', 'END']
- _skipped: ['relevant text ']
- New string constants identchars and identbodychars to help in defining identifier Word expressions
Two new module-level strings have been added to help when defining identifiers, identchars and identbodychars.
Instead of writing::
import pyparsing as pp
identifier = pp.Word(pp.alphas + "_", pp.alphanums + "_")
you will be able to write::
identifier = pp.Word(pp.identchars, pp.identbodychars)
Those constants have also been added to all the Unicode string classes::
import pyparsing as pp
ppu = pp.pyparsing_unicode
cjk_identifier = pp.Word(ppu.CJK.identchars, ppu.CJK.identbodychars)
greek_identifier = pp.Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
Added a caseless parameter to the CloseMatch class to allow for casing to be ignored when checking for close matches. (Issue #281) (PR by Adrian Edwards, thanks!)
Fixed bug in Located class when used with a results name. (Issue #294)
Fixed bug in QuotedString class when the escaped quote string is not a repeated character. (Issue #263)
parseFile() and create_diagram() methods now will accept pathlib.Path arguments.
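A sketch of the caseless option, assuming pyparsing 3.0.0 or later; the DNA-like target string is illustrative:

```python
import pyparsing as pp

target = "ATCATCGAATGGA"
exact_case = pp.CloseMatch(target, max_mismatches=1)
any_case = pp.CloseMatch(target, max_mismatches=1, caseless=True)

# With caseless=True, differences in letter case are not counted as mismatches
print(any_case.parse_string("atcatcgaatgga").as_list())

# Without it, every lowercase letter is a mismatch, far more than the 1 allowed
try:
    exact_case.parse_string("atcatcgaatgga")
    exact_case_matched = True
except pp.ParseException:
    exact_case_matched = False
print(exact_case_matched)
```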
- Python
Published by ptmcg over 4 years ago
pyparsing -
- PEP-8 compatible names are being introduced in pyparsing version 3.0!
All methods such as parseString have been replaced with the PEP-8 compliant name parse_string. In addition, arguments such as parseAll have been renamed to parse_all. For backward-compatibility, synonyms for all renamed methods and arguments have been added, so that existing pyparsing parsers will not break. These synonyms will be removed in a future release.
In addition, the Optional class has been renamed to Opt, since it clashes with the common typing.Optional type specifier that is used in the Python type annotations. A compatibility synonym is defined for now, but will be removed in a future release.
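A short sketch of the renamed API, assuming a pyparsing 3.x release where the legacy synonyms are still present (later releases may emit DeprecationWarnings for them):

```python
import pyparsing as pp

integer = pp.Word(pp.nums)
signed = pp.Opt(pp.one_of("+ -")) + integer

# PEP-8 method and argument names
print(signed.parse_string("-42", parse_all=True).as_list())

# Legacy names still work as synonyms for now
print(signed.parseString("-42", parseAll=True).as_list())
print(pp.Optional is pp.Opt)
```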
HUGE NEW FEATURE - Support for left-recursive parsers! Following the method used in Python's PEG parser, pyparsing now supports left-recursive parsers when left recursion is enabled.
import pyparsing as pp
pp.ParserElement.enable_left_recursion()

# a common left-recursion definition
# define a list of items as 'list + item | item'
# BNF:
#   item_list := item_list item | item
#   item := word of alphas
item_list = pp.Forward()
item = pp.Word(pp.alphas)
item_list <<= item_list + item | item

item_list.run_tests("""\
    To parse or not to parse that is the question
    """)
Prints:
['To', 'parse', 'or', 'not', 'to', 'parse', 'that', 'is', 'the', 'question']
Great work contributed by Max Fischer!
delimited_list now supports an additional flag allow_trailing_delim, to optionally parse an additional delimiter at the end of the list. Contributed by Kazantcev Andrey, thanks!
Removed internal comparison of results values against b"", which raised a BytesWarning when run with python -bb. Fixes issue #271 reported by Florian Bruhin, thank you!
Fixed STUDENTS table in sql2dot.py example, fixes issue #261 reported by legrandlegrand - much better.
Python 3.5 will not be supported in the pyparsing 3 releases. This will allow for future pyparsing releases to add parameter type annotations, and to take advantage of dict key ordering in internal results name tracking.
- Python
Published by ptmcg over 4 years ago
pyparsing - Pyparsing 3.0.0b2
- API CHANGE: locatedExpr is being replaced by the class Located. Located has the same constructor interface as locatedExpr, but fixes bugs in the returned ParseResults when the searched expression contains multiple tokens, or has internal results names.
locatedExpr is deprecated, and will be removed in a future release.
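A usage sketch for Located, assuming pyparsing 3.0.0 or later. Each match comes back as [start, tokens, end], also available under the results names locn_start, value, and locn_end:

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
matches = pp.Located(word).search_string("ljsdf123lksdjjf123lkkjj1222")
for match in matches:
    print(match.as_list(), match.locn_start, match.locn_end)
```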
- Python
Published by ptmcg about 5 years ago
pyparsing - Pyparsing 3.0.0b1
API CHANGE Diagnostic flags have been moved to an enum, pyparsing.Diagnostics, and they are enabled through module-level methods:
. pyparsing.enable_diag()
. pyparsing.disable_diag()
. pyparsing.enable_all_warnings()
API CHANGE Most previous SyntaxWarnings that were warned when using pyparsing classes incorrectly have been converted to TypeError and ValueError exceptions, consistent with Python calling conventions. All warnings warned by diagnostic flags have been converted from SyntaxWarnings to UserWarnings.
To support parsers that are intended to generate native Python collection types such as lists and dicts, the Group and Dict classes now accept an additional boolean keyword argument aslist and asdict respectively. See the jsonParser.py example in the pyparsing/examples source directory for how to return types as ParseResults and as Python collection types, and the distinctions in working with the different types.
In addition parse actions that must return a value of list type (which would normally be converted internally to a ParseResults) can override this default behavior by returning their list wrapped in the new ParseResults.List class:
from pyparsing import ParseResults

# this parse action tries to return a list, but pyparsing
# will convert to a ParseResults
def return_as_list_but_still_get_parse_results(tokens):
return tokens.asList()
# this parse action returns the tokens as a list, and pyparsing will
# maintain its list type in the final parsing results
def return_as_list(tokens):
return ParseResults.List(tokens.asList())
This is the mechanism used internally by the Group class when defined using aslist=True.
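A sketch contrasting Group with and without aslist=True, assuming pyparsing 3.0.0 or later:

```python
import pyparsing as pp

numbers = pp.Word(pp.nums)[1, ...]

# Default Group: the grouped value is a nested ParseResults
as_results = pp.Group(numbers).parse_string("1 2 3")[0]

# aslist=True: the grouped value is a plain Python list
as_list = pp.Group(numbers, aslist=True).parse_string("1 2 3")[0]

print(isinstance(as_results, pp.ParseResults), as_results.as_list())
print(isinstance(as_list, list), as_list)
```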
A new IndentedBlock class is introduced, to eventually replace the current indentedBlock helper method. The interface is largely the same, however, the new class manages its own internal indentation stack, so it is no longer necessary to maintain an external indentStack variable.
API CHANGE Added cache_hit keyword argument to debug actions. Previously, if packrat parsing was enabled, the debug methods were not called in the event of cache hits. Now these methods will be called, with an added argument cache_hit=True.
If you are using packrat parsing and enable debug on expressions using a custom debug method, you can add the cache_hit=False keyword argument,
and your method will be called on packrat cache hits. If you choose not to add this keyword argument, the debug methods will fail silently, behaving as they did previously.
When using setDebug with packrat parsing enabled, packrat cache hits will now be included in the output, shown with a leading '*'. (Previously, cache hits and responses were not included in debug output.) For those using custom debug actions, see the previous item regarding an optional API change for those methods.
setDebug output will also show more details about what expression is about to be parsed (the current line of text being parsed, and the current parse position):
Match integer at loc 0(1,1)
1 2 3
^
Matched integer -> ['1']
The current debug location will also be indicated after whitespace has been skipped (was previously inconsistent, reported in Issue #244, by Frank Goyens, thanks!).
Modified the repr() output for ParseResults to include the class name as part of the output. This is to clarify for new pyparsing users who misread the repr output as a tuple of a list and a dict. pyparsing results will now read like:
ParseResults(['abc', 'def'], {'qty': 100})
instead of just:
(['abc', 'def'], {'qty': 100})
Fixed bugs in Each when passed OneOrMore or ZeroOrMore expressions:
. first expression match could be enclosed in an extra nesting level
. out-of-order expressions now handled correctly if mixed with required expressions
. results names are maintained correctly for these expressions
Fixed traceback trimming, and added ParserElement.verbose_traceback save/restore to reset_pyparsing_context().
Default strings for Word expressions now also include indications of min and max length specification, if applicable, similar to regex length specifications:
Word(alphas) -> "W:(A-Za-z)"
Word(nums) -> "W:(0-9)"
Word(nums, exact=3) -> "W:(0-9){3}"
Word(nums, min=2) -> "W:(0-9){2,...}"
Word(nums, max=3) -> "W:(0-9){1,3}"
Word(nums, min=2, max=3) -> "W:(0-9){2,3}"
For expressions of the Char class (similar to Word(..., exact=1)), the expression is simply the character range in parentheses:
Char(nums) -> "(0-9)"
Char(alphas) -> "(A-Za-z)"
Removed copy() override in Keyword class which did not preserve definition of ident chars from the original expression. PR #233 submitted by jgrey4296, thanks!
In addition to pyparsing.__version__, there is now also a pyparsing.__version_info__, following the same structure and field names as in sys.version_info.
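A sketch of using __version_info__ as a feature gate, assuming pyparsing 3.0.0 or later; gating on DelimitedList (added in 3.1.0) is only an illustration:

```python
import pyparsing as pp

print(pp.__version__)
info = pp.__version_info__
print(info.major, info.minor, info.micro)

# Hypothetical feature gate: prefer DelimitedList when it is available
if (info.major, info.minor) >= (3, 1):
    int_list = pp.DelimitedList(pp.Word(pp.nums))
else:
    int_list = pp.delimited_list(pp.Word(pp.nums))
print(int_list.parse_string("1,2,3").as_list())
```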
- Python
Published by ptmcg over 5 years ago
pyparsing - Pyparsing 3.0.0a2
Version 3.0.0a2 - June, 2020
Summary of changes for 3.0.0 can be found in "What's New in Pyparsing 3.0.0" documentation.
API CHANGE Changed result returned when parsing using countedArray, the array items are no longer returned in a doubly-nested list.
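A sketch of the flattened result (counted_array is the PEP-8 name for countedArray in 3.0.0 and later). The leading integer gives the number of items that follow, and only that many are consumed:

```python
import pyparsing as pp

items = pp.counted_array(pp.Word(pp.alphas))
print(items.parse_string("2 ab cd ef").as_list())
```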
An excellent new enhancement is the new railroad diagram generator for documenting pyparsing parsers:
    import pyparsing as pp
    from pyparsing.diagram import to_railroad, railroad_to_html
    from pathlib import Path

    # define a simple grammar for parsing street addresses such
    # as "123 Main Street"
    #     number word...
    number = pp.Word(pp.nums).setName("number")
    name = pp.Word(pp.alphas).setName("word")[1, ...]

    parser = number("house_number") + name("street")
    parser.setName("street address")

    # construct railroad track diagram for this parser and
    # save as HTML
    rr = to_railroad(parser)
    Path('parser_rr_diag.html').write_text(railroad_to_html(rr))
Very nice work provided by Michael Milton, thanks a ton!
Enhanced default strings created for Word expressions, now showing string ranges if possible.
Word(alphas) would formerly print as W:(ABCD...), now prints as W:(A-Za-z).
Added ignoreWhitespace(recurse:bool = True) and added a recurse argument to leaveWhitespace, both added to provide finer control over pyparsing's whitespace skipping. Also contributed by Michael Milton.
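A sketch of what the recurse flag changes, assuming pyparsing 3.x where the method is spelled leave_whitespace. The flag is passed positionally so the sketch does not depend on the keyword's exact name (it is "recurse" in these notes, and later versions may name it differently):

```python
import pyparsing as pp

def pair():
    # Build a fresh expression each time, since leave_whitespace() modifies in place
    return pp.Word(pp.alphas) + pp.Word(pp.nums)

strict = pair().leave_whitespace(True)   # the And and its contained Words stop skipping whitespace
loose = pair().leave_whitespace(False)   # only the And itself stops; inner Words still skip

print(strict.parse_string("abc123").as_list())
print(loose.parse_string("abc 123").as_list())
try:
    strict.parse_string("abc 123")
except pp.ParseException as pe:
    print("strict rejects the space:", pe)
```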
The unicode range definitions for the various languages were recalculated by interrogating the unicodedata module by character name, selecting characters that contained that language in their Unicode name. (Issue #227)
Also, pyparsing_unicode.Korean was renamed to Hangul (Korean is also defined as a synonym for compatibility).
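A minimal sketch using the renamed set, assuming pyparsing 3.x; the Korean sample text is only illustrative:

```python
import pyparsing as pp

ppu = pp.pyparsing_unicode

# The old name is kept as an alias of the renamed set
print(ppu.Korean is ppu.Hangul)

korean_words = pp.Word(ppu.Hangul.alphas)[1, ...]
result = korean_words.parse_string("안녕하세요 세계")
print(result.as_list())
```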
Enhanced ParseResults dump() to show both results names and list subitems. Fixes bug where adding a results name would hide lower-level structures in the ParseResults.
Added new __diag__ warnings:
- "warn_on_parse_using_empty_Forward" - warns that a Forward has been included in a grammar, but no expression was attached to it using '<<=' or '<<'
- "warn_on_assignment_to_Forward" - warns that a Forward has been created, but was probably later overwritten by erroneously using '=' instead of '<<=' (this is a common mistake when using Forwards) (currently not working on PyPy)
Added ParserElement.recurse() method to make it simpler for grammar utilities to navigate through the tree of expressions in a pyparsing grammar.
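As a sketch of the kind of grammar utility recurse() enables, the hypothetical walk() helper below visits every sub-expression once (the id() set guards against cycles through Forwards). It assumes pyparsing 3.x names:

```python
import pyparsing as pp

number = pp.Word(pp.nums).set_name("number")
word = pp.Word(pp.alphas).set_name("word")
parser = number + word[1, ...]

def walk(expr, seen=None):
    """Yield expr and every sub-expression reachable through recurse(), once each."""
    seen = set() if seen is None else seen
    if id(expr) in seen:  # already visited, e.g. via a recursive Forward
        return
    seen.add(id(expr))
    yield expr
    for sub in expr.recurse():
        yield from walk(sub, seen)

names = [str(e) for e in walk(parser)]
print(names)
```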
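For the warn_on_parse_using_empty_Forward diagnostic, a minimal sketch assuming pyparsing 3.x, where enable_diag()/disable_diag() and the Diagnostics enum are the switches; the warning is captured with the standard warnings module:

```python
import warnings
import pyparsing as pp

flag = pp.Diagnostics.warn_on_parse_using_empty_Forward
pp.enable_diag(flag)

expr = pp.Forward()  # deliberately never given a body with <<=

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        expr.parse_string("anything")
    except pp.ParseException:
        pass  # an empty Forward still cannot match anything

pp.disable_diag(flag)  # diagnostics are global switches, so turn this one back off

for w in caught:
    print(w.category.__name__, w.message)
```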
Fixed bug in ParseResults repr() which showed all matching entries for a results name, even if listAllMatches was set to False when creating the ParseResults originally. Reported by Nicholas42 on GitHub, good catch! (Issue #205)
Modified refactored modules to use relative imports, as pointed out by setuptools project member jaraco, thank you!
Off-by-one bug found in the roman_numerals.py example, a bug that has been there for about 14 years! PR submitted by Jay Pedersen, nice catch!
A simplified Lua parser has been added to the examples (lua_parser.py).
Added make_diagram.py to the examples directory to demonstrate creation of railroad diagrams for selected pyparsing examples. Also restructured some examples to make their parsers importable without running their embedded tests.
- Python
Published by ptmcg over 5 years ago
pyparsing - Pyparsing 2.4.7
Version 2.4.7 - April, 2020
- Backport of selected fixes from 3.0.0 work: . Each bug with Regex expressions . And expressions not properly constructing with generator . Traceback abbreviation . Bug in deltatime example . Fix regexen in pyparsingcommon.real and .sci_real . Avoid FutureWarning on Python 3.7 or later . Cleanup output in runTests if comments are embedded in test string
- Python
Published by ptmcg almost 6 years ago
pyparsing - Pyparsing 2.4.6
Version 2.4.6 - December, 2019
Fixed typos in White mapping of whitespace characters, to use correct "\u" prefix instead of "u\".
Fix bug in left-associative ternary operators defined using infixNotation. First reported on StackOverflow by user Jeronimo.
Backport of pyparsing_test namespace from 3.0.0, including TestParseResultsAsserts mixin class defining unittest-helper methods:
  . def assertParseResultsEquals(self, result, expected_list=None, expected_dict=None, msg=None)
  . def assertParseAndCheckList(self, expr, test_string, expected_list, msg=None, verbose=True)
  . def assertParseAndCheckDict(self, expr, test_string, expected_dict, msg=None, verbose=True)
  . def assertRunTestResults(self, run_tests_report, expected_parse_results=None, msg=None)
  . def assertRaisesParseException(self, exc_type=ParseException, msg=None)
To use the methods in this mixin class, declare your unittest classes as:
    import unittest
    from pyparsing import pyparsing_test as ppt

    class MyParserTest(ppt.TestParseResultsAsserts, unittest.TestCase):
        ...
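A fuller sketch of the mixin in use, assuming pyparsing 3.x; the test class, its grammars, and the run_suite() helper are hypothetical:

```python
import io
import unittest

import pyparsing as pp
from pyparsing import pyparsing_test as ppt


class WordListTest(ppt.TestParseResultsAsserts, unittest.TestCase):
    def test_word_list(self):
        words = pp.Word(pp.alphas)[1, ...]
        # parses with parse_all=True, then compares result.as_list()
        self.assertParseAndCheckList(words, "abc def", ["abc", "def"], verbose=False)

    def test_named_result(self):
        qty = pp.Word(pp.nums)("qty")
        # compares result.as_dict() against the expected dict
        self.assertParseAndCheckDict(qty, "100", {"qty": "100"}, verbose=False)

    def test_rejects_letters(self):
        # context manager that expects a ParseException inside the block
        with self.assertRaisesParseException():
            pp.Word(pp.nums).parse_string("abc")


def run_suite():
    """Run the test case quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordListTest)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```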
- Python
Published by ptmcg about 6 years ago
pyparsing - Pyparsing 2.4.5
Version 2.4.5 - November, 2019
- Fixed encoding when setup.py reads README.rst to include the project long description when uploading to PyPI. A stray unicode space in README.rst prevented the source install on systems whose default encoding is not 'utf-8'.
- Python
Published by ptmcg over 6 years ago
pyparsing - Pyparsing 2.4.4
Fixed a check-in bug in Pyparsing 2.4.3 that raised UserWarnings; the problem had been masked by stdout buffering in the unit tests.
- Python
Published by ptmcg over 6 years ago
pyparsing - Pyparsing 2.4.3
Version 2.4.3 - November, 2019
(Backport of selected critical items from 3.0.0 development branch.)
Fixed a bug in ParserElement.__eq__ that would for some parsers create a recursion error at parser definition time. Thanks to Michael Clerx for the assist. (Addresses issue #123)
Fixed bug in indentedBlock where a block that ended at the end of the input string could cause pyparsing to loop forever. Raised as part of discussion on StackOverflow with geckos.
Backports from pyparsing 3.0.0:
  . __diag__.enable_all_warnings()
  . Fixed bug in PrecededBy which caused infinite recursion, issue #127
  . support for using regex-compiled RE to construct Regex expressions
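A short sketch of constructing a Regex from a precompiled pattern, assuming pyparsing 3.x. The notes refer to patterns compiled with the third-party regex module; this sketch uses the standard library re module so it runs without extra dependencies:

```python
import re
import pyparsing as pp

# A precompiled pattern is accepted in place of a pattern string
ident_re = re.compile(r"[A-Za-z_]\w*")
identifier = pp.Regex(ident_re)

print(identifier[1, ...].parse_string("foo bar_2 _baz").as_list())
```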
- Python
Published by ptmcg over 6 years ago
pyparsing - Pyparsing 2.4.2
Version 2.4.2 - July, 2019
- Updated the shorthand notation that has been added for repetition expressions: expr[min, max], with '...' valid as a min or max value:
  - expr[...] and expr[0, ...] are equivalent to ZeroOrMore(expr)
  - expr[1, ...] is equivalent to OneOrMore(expr)
  - expr[n, ...] or expr[n,] is equivalent to expr*n + ZeroOrMore(expr) (read as "n or more instances of expr")
  - expr[..., n] is equivalent to expr*(0, n)
  - expr[m, n] is equivalent to expr*(m, n)
  Note that expr[..., n] and expr[m, n] do not raise an exception if more than n exprs exist in the input stream. If this behavior is desired, then write expr[..., n] + ~expr.
Better interpretation of [...] as ZeroOrMore raised by crowsonkb, thanks for keeping me in line!
If upgrading from 2.4.1 or 2.4.1.1 and you have used expr[...] for OneOrMore(expr), it must be updated to expr[1, ...].
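The repetition forms can be checked side by side. A minimal sketch assuming pyparsing 3.x; try_parse() is a hypothetical helper that returns None when the expression does not match:

```python
import pyparsing as pp

integer = pp.Word(pp.nums)

zero_or_more = integer[...]                    # ZeroOrMore(integer)
one_or_more = integer[1, ...]                  # OneOrMore(integer)
up_to_two = integer[..., 2]                    # integer * (0, 2)
up_to_two_strict = integer[..., 2] + ~integer  # also reject a third integer

def try_parse(expr, text):
    """Return the parsed list, or None if expr does not match text."""
    try:
        return expr.parse_string(text).as_list()
    except pp.ParseException:
        return None

print(try_parse(zero_or_more, ""))           # empty input is fine
print(try_parse(one_or_more, ""))            # needs at least one integer
print(try_parse(up_to_two, "1 2 3"))         # stops after two; "3" is left unparsed
print(try_parse(up_to_two_strict, "1 2 3"))  # NotAny rejects the third integer
```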
- The defaults on all the __diag__ switches have been set to False, to avoid getting alarming warnings. To use these diagnostics, set them to True after importing pyparsing.
Example:
import pyparsing as pp
pp.__diag__.warn_multiple_tokens_in_named_alternation = True
- Fixed bug introduced by the use of __getitem__ for repetition, overlooking Python's legacy implementation of iteration by sequentially calling __getitem__ with increasing numbers until getting an IndexError. Found during investigation of problem reported by murlock, merci!
- Python
Published by ptmcg over 6 years ago
pyparsing - Pyparsing 2.4.1.1
This is a re-release of version 2.4.1 to restore the release history in PyPI, since the 2.4.1 release was deleted.
There are 3 known issues in this release, which are fixed in the upcoming 2.4.2:
API change adding support for expr[...] - the original code in 2.4.1 incorrectly implemented this as OneOrMore. Code using this feature under this release should explicitly use expr[0, ...] for ZeroOrMore and expr[1, ...] for OneOrMore. In 2.4.2 you will be able to write expr[...] equivalent to ZeroOrMore(expr).
Bug when composing And, Or, MatchFirst, or Each expressions from a single expression. This only affects code which uses explicit expression construction using the And, Or, etc. classes instead of using overloaded operators '+', '^', and so on. If constructing an And using a single expression, you may get an error that "cannot multiply ParserElement by 0 or (0, 0)" or a Python IndexError. Change code like
    cmd = Or(Word(alphas))
to
    cmd = Or([Word(alphas)])
(Note that this is not the recommended style for constructing Or expressions.)
Some newly-added __diag__ switches are enabled by default, which may give rise to noisy user warnings for existing parsers. You can disable them using:
    import pyparsing as pp
    pp.__diag__.warn_multiple_tokens_in_named_alternation = False
    pp.__diag__.warn_ungrouped_named_tokens_in_collection = False
    pp.__diag__.warn_name_set_on_empty_Forward = False
    pp.__diag__.warn_on_multiple_string_args_to_oneof = False
    pp.__diag__.enable_debug_on_named_expressions = False
In 2.4.2 these will all be set to False by default.
- Python
Published by ptmcg over 6 years ago
pyparsing - Pyparsing 2.4.2a1
Release candidate for 2.4.2:
- Fixes incorrect implementation of expr[…] as OneOrMore, changed to ZeroOrMore
- Fixes __getitem__-induced iterability for ParserElement class
- __diag__ flags are now all False by default
- Python
Published by ptmcg over 6 years ago
pyparsing - Pyparsing 2.4.1
For a minor point release, this release contains many new features!
A new shorthand notation has been added for repetition expressions:
expr[min, max], with ... valid as a min or max value:
- expr[...] is equivalent to OneOrMore(expr)
- expr[0, ...] is equivalent to ZeroOrMore(expr)
- expr[1, ...] is equivalent to OneOrMore(expr)
- expr[n, ...] or expr[n,] is equivalent to expr*n + ZeroOrMore(expr) (read as "n or more instances of expr")
- expr[..., n] is equivalent to expr*(0, n)
- expr[m, n] is equivalent to expr*(m, n)
Note that expr[..., n] and expr[m, n] do not raise an exception if more than n exprs exist in the input stream. If this behavior is desired, then write expr[..., n] + ~expr.
... can also be used as shorthand for SkipTo when used in adding parse expressions to compose an And expression.
    Literal('start') + ... + Literal('end')
    And(['start', ..., 'end'])
are both equivalent to:
    Literal('start') + SkipTo('end')("_skipped*") + Literal('end')
The ... form has the added benefit of not requiring repeating the skip target expression. Note that the skipped text is returned with '_skipped' as a results name, and that the contents of '_skipped' will contain a list of text from all ...s in the expression.
... can also be used as a "skip forward in case of error" expression:
    expr = "start" + (Word(nums).setName("int") | ...) + "end"

    expr.parseString("start 456 end")
    ['start', '456', 'end']

    expr.parseString("start 456 foo 789 end")
    ['start', '456', 'foo 789 ', 'end']
    - _skipped: ['foo 789 ']

    expr.parseString("start foo end")
    ['start', 'foo ', 'end']
    - _skipped: ['foo ']

    expr.parseString("start end")
    ['start', '', 'end']
    - _skipped: ['missing <int>']
Note that in all the error cases, the '_skipped' results name is present, showing a list of the extra or missing items.
This form is only valid when used with the '|' operator.
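A minimal runnable version of the SkipTo shorthand, assuming pyparsing 3.x names (parse_string, dump):

```python
import pyparsing as pp

# "..." between two expressions is shorthand for SkipTo(<next expression>)
expr = pp.Literal("start") + ... + pp.Literal("end")

result = expr.parse_string("start some junk end")
print(result.as_list())
print(result.dump())  # the skipped text is also available under "_skipped"
```

The skipped text can carry trailing whitespace (as in the 'foo 789 ' outputs above), so the check strips it before comparing.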
Improved exception messages to show what was actually found, not just what was expected.
    word = pp.Word(pp.alphas)
    pp.OneOrMore(word).parseString("aaa bbb 123", parseAll=True)
Former exception message:
pyparsing.ParseException: Expected end of text (at char 8), (line:1, col:9)
New exception message:
pyparsing.ParseException: Expected end of text, found '1' (at char 8), (line:1, col:9)
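The same failure can be inspected programmatically. A sketch assuming pyparsing 3.x, where the exact wording of the "found" part may differ slightly between releases:

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

err = None
try:
    word[1, ...].parse_string("aaa bbb 123", parse_all=True)
except pp.ParseException as pe:
    err = pe  # keep a reference; "pe" is unbound after the except block

# The message reports what was found at the failure point, not just what was expected
print(err)
print(err.loc, err.lineno, err.col)
```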
- Added diagnostic switches to help detect and warn about common parser construction mistakes, or enable additional parse debugging. Switches are attached to the pyparsing.__diag__ namespace object:
  . warn_multiple_tokens_in_named_alternation - flag to enable warnings when a results name is defined on a MatchFirst or Or expression with one or more And subexpressions (default=True)
  . warn_ungrouped_named_tokens_in_collection - flag to enable warnings when a results name is defined on a containing expression with ungrouped subexpressions that also have results names (default=True)
  . warn_name_set_on_empty_Forward - flag to enable warnings when a Forward is defined with a results name, but has no contents defined (default=False)
  . warn_on_multiple_string_args_to_oneof - flag to enable warnings when oneOf is incorrectly called with multiple str arguments (default=True)
  . enable_debug_on_named_expressions - flag to auto-enable debug on all subsequent calls to ParserElement.setName() (default=False)
warn_multiple_tokens_in_named_alternation is intended to help those who currently have set __compat__.collect_all_And_tokens to False as a workaround for using the pre-2.3.1 code with named MatchFirst or Or expressions containing an And expression.
Added ParseResults.from_dict classmethod, to simplify creation of a ParseResults with results names using a dict, which may be nested. This makes it easy to add a sub-level of named items to the parsed tokens in a parse action.
Added asKeyword argument (default=False) to oneOf, to force keyword-style matching on the generated expressions.
ParserElement.runTests now accepts an optional 'file' argument to redirect test output to a file-like object (such as a StringIO, or opened file). Default is to write to sys.stdout.
conditionAsParseAction is a helper method for constructing a parse action method from a predicate function that simply returns a boolean result. Useful for those places where a predicate cannot be added using addCondition, but must be converted to a parse action (such as in infixNotation). May be used as a decorator if default message and exception types can be used. See ParserElement.addCondition for more details about the expected signature and behavior for predicate condition methods.
While investigating issue #93, I found that Or and addCondition could interact to select an alternative that is not the longest match. This is because Or first checks all alternatives for matches without running attached parse actions or conditions, orders by longest match, and then rechecks for matches with conditions and parse actions. Some expressions, when checking with conditions, may end up matching on a shorter token list than originally matched, but would be selected because of their original priority. This matching code has been expanded to do more extensive searching for matches when a second-pass check matches a smaller list than in the first pass.
Fixed issue #87, a regression in indented block. Reported by Renz Bagaporo, who submitted a very nice repro example, which makes the bug-fixing process a lot easier, thanks!
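A sketch of ParseResults.from_dict used inside a parse action to attach a nested group of named results, assuming pyparsing 3.x; the "meta", "length", and "upper" names are hypothetical:

```python
import pyparsing as pp

def add_metadata(tokens):
    # Append a nested, named sub-level built from a dict; modifying tokens
    # in place and returning None keeps the updated results
    tokens += pp.ParseResults.from_dict(
        {"meta": {"length": len(tokens[0]), "upper": tokens[0].upper()}}
    )

word = pp.Word(pp.alphas).add_parse_action(add_metadata)

result = word.parse_string("hello")
print(result.dump())
print(result["meta"]["length"], result["meta"]["upper"])
```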
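A minimal sketch of turning a boolean predicate into a parse action, using the pyparsing 3 name condition_as_parse_action (conditionAsParseAction is the legacy alias); the "below 100" rule is only an example:

```python
import pyparsing as pp

# A False result from the predicate raises ParseException with this message
below_100 = pp.condition_as_parse_action(
    lambda t: int(t[0]) < 100, message="value must be < 100"
)

small_int = pp.Word(pp.nums).add_parse_action(below_100)

print(small_int.parse_string("42").as_list())
try:
    small_int.parse_string("420")
except pp.ParseException as pe:
    print(pe)
```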
Fixed MemoryError issue #85 and #91 with str generation for Forwards. Thanks decalage2 and Harmon758 for your patience.
Modified setParseAction to accept None as an argument, indicating that all previously-defined parse actions for the expression should be cleared.
Modified pyparsing_common.real and sci_real to parse reals without leading integer digits before the decimal point, consistent with Python real number formats. Original PR #98 submitted by ansobolev.
Modified runTests to call postParse function before dumping out the parsed results - allows for postParse to add further results, such as indications of additional validation success/failure.
Updated statemachine example: refactored state transitions to use overridden classmethods; added <statename>Mixin class to simplify definition of application classes that "own" the state object and delegate to it to model state-specific properties and behavior.
Added example nested_markup.py, showing a simple wiki markup with nested markup directives, and illustrating the use of ... for skipping over input to match the next expression. (This example uses syntax that is not valid under Python 2.)
Rewrote delta_time.py example (renamed from deltaTime.py) to fix some omitted formats and upgrade to latest pyparsing idioms, beginning with writing an actual BNF.
With the help and encouragement from several contributors, including Matej Cepl and Cengiz Kaygusuz, I've started cleaning up the internal coding styles in core pyparsing, bringing it up to modern coding practices from pyparsing's early development days dating back to 2003. Whitespace has been largely standardized along PEP8 guidelines, removing extra spaces around parentheses, and adding them around arithmetic operators and after colons and commas. I was going to hold off on doing this work until after 2.4.1, but after cleaning up a few trial classes, the difference was so significant that I continued on to the rest of the core code base. This should facilitate future work and submitted PRs, allowing them to focus on substantive code changes, and not get sidetracked by whitespace issues.
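For the runTests 'file' argument and the postParse hook, a combined sketch assuming pyparsing 3.x, where the keywords are spelled file and post_parse; report_sum() is a hypothetical callback:

```python
import io
import pyparsing as pp

integer = pp.Word(pp.nums).set_parse_action(lambda t: int(t[0]))
int_list = integer[1, ...]

def report_sum(test_string, result):
    # post_parse callback: a non-None return value is written in place of the usual dump
    return f"sum = {sum(result)}"

buf = io.StringIO()  # test output goes here instead of sys.stdout
success, report = int_list.run_tests(["1 2 3", "10 20"], post_parse=report_sum, file=buf)
print(buf.getvalue())
```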
NOTE: Deprecated functions and features that will be dropped in pyparsing 2.5.0 (planned next release):
- support for Python 2 - ongoing users running with Python 2 can continue to use pyparsing 2.4.1
- ParseResults.asXML() - if used for debugging, switch to using ParseResults.dump(); if used for data transfer, use ParseResults.asDict() to convert to a nested Python dict, which can then be converted to XML or JSON or other transfer format
- operatorPrecedence synonym for infixNotation - convert to calling infixNotation
- commaSeparatedList - convert to using pyparsing_common.comma_separated_list
- upcaseTokens and downcaseTokens - convert to using pyparsing_common.upcaseTokens and downcaseTokens
- __compat__.collect_all_And_tokens will not be settable to False to revert to pre-2.3.1 results name behavior - review use of names for MatchFirst and Or expressions containing And expressions, as they will return the complete list of parsed tokens, not just the first one. Use __diag__.warn_multiple_tokens_in_named_alternation to help identify those expressions in your parsers that will have changed as a result.
- Python
Published by ptmcg over 6 years ago
pyparsing - Pyparsing 2.4.0
Well, it looks like the API change that was introduced in 2.3.1 was more drastic than expected, so for a friendlier forward upgrade path, this release:
  . Bumps the current version number to 2.4.0, to reflect this incompatible change.
  . Adds a pyparsing.__compat__ object for specifying compatibility with future breaking changes.
  . Conditionalizes the API-breaking behavior, based on the value pyparsing.__compat__.collect_all_And_tokens. By default, this value will be set to True, reflecting the new bugfixed behavior. To set this value to False, add to your code:
        import pyparsing
        pyparsing.__compat__.collect_all_And_tokens = False
  . User code that is dependent on the pre-bugfix behavior can restore it by setting this value to False.
In 2.5 and later versions, the conditional code will be removed and setting the flag to True or False in these later versions will have no effect.
Updated unitTests.py and simple_unit_tests.py to be compatible with python setup.py test. To run tests using setup, do:
    python setup.py test
    python setup.py test -s unitTests.suite
    python setup.py test -s simple_unit_tests.suite
Prompted by issue #83 and PR submitted by bdragon28, thanks.
Fixed bug in ParserElement.runTests handling '\n' literals in quoted strings.
Added tag_body attribute to the start tag expressions generated by makeHTMLTags, so that you can avoid using SkipTo to roll your own tag body expression:
    a, aEnd = pp.makeHTMLTags('a')
    link = a + a.tag_body("displayed_text") + aEnd
    for t in link.searchString(htmlpage):
        print(t.displayed_text, '->', t.startA.href)
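A self-contained variant of the tag_body example, assuming pyparsing 3.x where the function is spelled make_html_tags; the HTML snippet is made up for illustration:

```python
import pyparsing as pp

a, a_end = pp.make_html_tags("a")
# a.tag_body matches everything up to the closing </a>
link = a + a.tag_body("displayed_text") + a_end

html = '<p>See <a href="https://example.com">the example site</a> or <a href="/docs">docs</a>.</p>'
links = [(t.displayed_text, t.startA.href) for t in link.search_string(html)]
for text, href in links:
    print(text, "->", href)
```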
indentedBlock failure handling was improved; PR submitted by TMiguelT, thanks!
Address Py2 incompatibility in simple_unit_tests, plus explain() and Forward str() cleanup; PRs graciously provided by eswald.
Fixed docstring with embedded '\w', which creates SyntaxWarnings in Py3.8, issue #80.
Examples:
- Added example parser for rosettacode.org tutorial compiler.
- Added example to show how an HTML table can be parsed into a collection of Python lists or dicts, one per row.
- Updated SimpleSQL.py example to handle nested selects, reworked 'where' expression to use infixNotation.
- Added include_preprocessor.py, similar to macroExpander.py.
- Examples using makeHTMLTags use new tag_body expression when retrieving a tag's body text.
Updated examples that are runnable as unit tests:
    python setup.py test -s examples.antlr_grammar_tests
    python setup.py test -s examples.test_bibparse
- Python
Published by ptmcg almost 7 years ago
pyparsing - Pyparsing 2.3.1
New features in Pyparsing 2.3.1 -
ParseException.explain() method, to convert a raw Python traceback into a list of the parse expressions leading up to a parse mismatch.
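A minimal sketch of calling explain() on a caught exception, assuming pyparsing 3.x where it is an instance method; the integer-pair grammar is hypothetical:

```python
import pyparsing as pp

integer = pp.Word(pp.nums).set_name("integer")
pair = (integer + integer).set_name("int pair")

explanation = None
try:
    pair.parse_string("12 ab", parse_all=True)
except pp.ParseException as pe:
    # explain() shows the failing line, a caret, the exception, and the
    # parse expressions found in the traceback leading up to the mismatch
    explanation = pe.explain()
print(explanation)
```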
New unicode sets Latin-A and Latin-B, and the ability to define custom sets using multiple inheritance.
    class Turkish_set(pp.pyparsing_unicode.Latin1, pp.pyparsing_unicode.LatinA):
        pass

    turkish_word = pp.Word(Turkish_set.alphas)
State machine examples, showing how to extend Python with your own pyparsing-enabled syntax. The examples implement a 'statemachine' keyword to define a set of classes and transition attribute to implement a State pattern:
    statemachine TrafficLightState:
        Red -> Green
        Green -> Yellow
        Yellow -> Red
Transitions can be named also:
    statemachine LibraryBookState:
        New -(shelve)-> Available
        Available -(reserve)-> OnHold
        OnHold -(release)-> Available
        Available -(checkout)-> CheckedOut
        CheckedOut -(checkin)-> Available
Example parser for decaf language. This language is commonly used in university CS compiler classes.
Fixup of docstrings to Sphinx format, so pyparsing docs are now available on Read the Docs! (https://pyparsing-docs.readthedocs.io/en/latest/)
- Python
Published by ptmcg about 7 years ago
pyparsing - Pyparsing 2.3.0
- NEW SUPPORT FOR UNICODE CHARACTER RANGES
This release introduces the pyparsing_unicode namespace class, defining a series of language character sets to simplify the definition of alphas, nums, alphanums, and printables in the following language sets:
  . Arabic
  . Chinese
  . Cyrillic
  . Devanagari
  . Greek
  . Hebrew
  . Japanese (including Kanji, Katakana, and Hiragana subsets)
  . Korean
  . Latin1 (includes 7 and 8-bit Latin characters)
  . Thai
  . CJK (combination of Chinese, Japanese, and Korean sets)
POSSIBLE API CHANGES:
- IndexErrors raised in parse actions are now wrapped in ParseExceptions
- ParseResults have had several bugfixes which remove erroneous nesting levels
See the CHANGES file for more details.
New classes:
- PrecededBy - lookbehind match
- Char - single character match (similar to Word(exact=1))
- Python
Published by ptmcg over 7 years ago
pyparsing - pyparsing_2.2.2
Version 2.2.2 - September, 2018
Fixed bug in SkipTo, if a SkipTo expression that was skipping to an expression that returned a list (such as an And), and the SkipTo was saved as a named result, the named result could be saved as a ParseResults - should always be saved as a string. Issue #28, reported by seron.
Added simple_unit_tests.py, as a collection of easy-to-follow unit tests for various classes and features of the pyparsing library. Primary intent is more to be instructional than actually rigorous testing. Complex tests can still be added in the unitTests.py file.
New features added to the Regex class:
- optional asGroupList parameter, returns all the capture groups as a list
- optional asMatch parameter, returns the raw re.match result
- new sub(repl) method, which adds a parse action calling re.sub(pattern, repl, parsed_result). Simplifies creating Regex expressions to be used with transformString. Like re.sub, repl may be an ordinary string (similar to using pyparsing's replaceWith), or may contain references to capture groups by group number, or may be a callable that takes an re match group and returns a string.
For instance:
    expr = pp.Regex(r"([Hh]\d):\s*(.*)").sub(r"<\1>\2</\1>")
    expr.transformString("h1: This is the title")
will return
    <h1>This is the title</h1>
Fixed omission of LICENSE file in source tarball, also added CODE_OF_CONDUCT.md per GitHub community standards. Issue #31
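A runnable sketch of the Regex sub() and asMatch features from this release, assuming pyparsing 3.x where the keyword is spelled as_match and transformString is transform_string; the "10-20" range pattern is only illustrative:

```python
import pyparsing as pp

# sub(): attach a parse action that rewrites the matched text with re.sub
header = pp.Regex(r"([Hh]\d):\s*(.*)").sub(r"<\1>\2</\1>")
print(header.transform_string("h1: This is the title"))

# as_match=True: the single token is the raw re.Match object
span = pp.Regex(r"(\d+)-(\d+)", as_match=True)
m = span.parse_string("10-20")[0]
print(m.group(1), m.group(2))
```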
- Python
Published by ptmcg over 7 years ago
pyparsing - pyparsing_2.2.1
- Updates to migrate source repo to GitHub
- Fix deprecation warning in Python 3.7 re: importing collections.abc
- Fix Literal/Keyword bug raising IndexError instead of ParseException
- Python
Published by ptmcg over 7 years ago