rfc3987-syntax

Helper functions to syntactically validate strings according to RFC 3987.

https://github.com/willynilly/rfc3987-syntax

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: willynilly
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 39.1 KB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 1
  • Open Issues: 3
  • Releases: 1
Created 10 months ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

rfc3987-syntax

Helper functions to parse and validate the syntax of terms defined in RFC 3987 — the IETF standard for Internationalized Resource Identifiers (IRIs).

🎯 Purpose

The goal of rfc3987-syntax is to provide a lightweight, permissively licensed Python module for validating that strings conform to the ABNF grammar defined in RFC 3987. These helpers are:

  • ✅ Strictly aligned with the syntax rules of RFC 3987
  • ✅ Built using a permissive MIT license
  • ✅ Designed for both open source and proprietary use
  • ✅ Powered by Lark, a fast, EBNF-based parser

🧠 Note: This project focuses on syntax validation only. RFC 3987 specifies additional semantic rules (e.g., Unicode normalization, BiDi constraints, percent-encoding requirements) that must be enforced separately.

📄 License, Attribution, and Citation

rfc3987-syntax is licensed under the MIT License, which allows reuse in both open source and commercial software.

This project:

  • ❌ Does not depend on the rfc3987 Python package (GPL-licensed)
  • ✅ Uses lark, licensed under MIT
  • ✅ Implements grammar from RFC 3987, using RFC 3986 where RFC 3987 delegates syntax

⚠️ This project is not affiliated with or endorsed by the authors of RFC 3987 or the rfc3987 Python package.

Please cite this software in accordance with the enclosed CITATION.cff file.

⚠️ Limitations

The grammar and parser enforce only the ABNF syntax defined in RFC 3987. The following are not validated and must be handled separately for full compliance:

  • Unicode Normalization Form C (NFC)
  • Bidirectional text (BiDi) constraints (RFC 3987 §4.1)
  • Port number ranges (must be 0–65535)
  • Valid IPv6 compression (only one ::, max segments)
  • Context-aware percent-encoding requirements
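
Some of the checks listed above can be approximated with the Python standard library. The following is a minimal sketch, not part of rfc3987-syntax: the helper names `is_nfc`, `port_in_range`, and `is_valid_ipv6` are hypothetical, and the sketch assumes the caller has already extracted the port text and the bracket-free IPv6 literal from the IRI. BiDi constraints and context-aware percent-encoding are not covered.

```python
import ipaddress
import unicodedata


def is_nfc(value: str) -> bool:
    """Return True if value is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", value) == value


def port_in_range(port_text: str) -> bool:
    """Return True for an empty port or a decimal port within 0-65535."""
    if port_text == "":
        return True  # the RFC 3986 ABNF (port = *DIGIT) allows an empty port
    return port_text.isascii() and port_text.isdigit() and int(port_text) <= 65535


def is_valid_ipv6(address_text: str) -> bool:
    """Return True if address_text (without brackets) is a well-formed IPv6 address."""
    try:
        # IPv6Address rejects more than one "::" and too many segments.
        ipaddress.IPv6Address(address_text)
    except ValueError:
        return False
    return True
```

These helpers would typically run after the syntax check succeeds, so that each one only sees input that already matches the grammar.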

ChatGPT 4o was used during the original development process. Errors may exist due to this assistance. Additional review, testing, and bug fixes by human experts are welcome.

📦 Installation

```bash
pip install rfc3987-syntax
```

🛠 Usage

List all supported "terms" (i.e., non-terminals and terminals within ABNF production rules) used to validate the syntax of an IRI according to RFC 3987

```python
from rfc3987_syntax import RFC3987_SYNTAX_TERMS

# Print every grammar term that the validator accepts
print("Supported terms:")
for term in RFC3987_SYNTAX_TERMS:
    print(term)
```

Syntactically validate a string using the general-purpose validator

```python
from rfc3987_syntax import is_valid_syntax

if is_valid_syntax(term='iri', value='http://github.com'):
    print("✓ Valid IRI syntax")

# 'bob' has no scheme, so it is not a valid IRI
if not is_valid_syntax(term='iri', value='bob'):
    print("✗ Invalid IRI syntax")

# ...but it is a valid relative reference
if is_valid_syntax(term='iri_reference', value='bob'):
    print("✓ Valid IRI-reference syntax")
```

Alternatively, use term-specific helpers to validate RFC 3987 syntax.

```python
from rfc3987_syntax import is_valid_syntax_iri
from rfc3987_syntax import is_valid_syntax_iri_reference

if is_valid_syntax_iri('http://github.com'):
    print("✓ Valid IRI syntax")

if not is_valid_syntax_iri('bob'):
    print("✗ Invalid IRI syntax")

if is_valid_syntax_iri_reference('bob'):
    print("✓ Valid IRI-reference syntax")
```

Get the Lark parse tree for a syntax validation (useful for additional semantic validation)

```python
from lark import ParseTree
from rfc3987_syntax import parse

# Parse the value against the 'iri' rule and keep the resulting tree
ptree: ParseTree = parse(term="iri", value="http://github.com")

print(ptree)
```
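
For additional semantic validation, it is often enough to locate the subtrees for one grammar rule and inspect them. The sketch below is not part of rfc3987-syntax: `find_rule` is a hypothetical helper, written against any object that has `.data` and `.children` attributes (the shape of a Lark tree). It assumes that nodes in the parse tree are labelled with the rule names from the grammar, such as `port`, which should be confirmed by printing a real tree first. Lark's own `Tree.find_data` method offers a similar lookup.

```python
from typing import Any, Iterator


def find_rule(tree: Any, rule_name: str) -> Iterator[Any]:
    """Yield every subtree whose rule name (`.data`) equals rule_name.

    Duck-typed: expects nodes with `.data` and `.children`. Leaves
    (tokens, which are str subclasses in Lark) and None placeholders
    are skipped.
    """
    stack = [tree]
    while stack:
        node = stack.pop()
        if node is None or isinstance(node, str):
            continue
        if str(node.data) == rule_name:
            yield node
        # Reverse so that children are visited left to right.
        stack.extend(reversed(node.children))
```

A caller could then pass each matching subtree to a separate check, for example a port range test, without re-parsing the original string.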

📚 Sources

This grammar was derived from:

  • [RFC 3987 – Internationalized Resource Identifiers (IRIs)]
    → Defines IRI syntax and extensions to URI (e.g. Unicode characters, ucschar)
    → https://datatracker.ietf.org/doc/html/rfc3987

  • [RFC 3986 – Uniform Resource Identifier (URI): Generic Syntax]
    → Provides reusable components like scheme, authority, ipv4address, etc.
    → https://datatracker.ietf.org/doc/html/rfc3986

📝 When RFC 3986 is listed as the source, it is used in accordance with RFC 3987, which explicitly references it for foundational elements.

Rule-to-Source Mapping

| Rule/Component | Source | Notes |
|----------------------|------------|-------|
| iri | RFC 3987 | Top-level IRI rule |
| iri_reference | RFC 3987 | Top-level IRI Reference rule |
| absolute_iri | RFC 3987 | Top-level Absolute IRI rule |
| scheme | RFC 3986 | Referenced by RFC 3987 §2.2 |
| ihier_part | RFC 3987 | IRI-specific hierarchy |
| irelative_ref | RFC 3987 | IRI-specific relative ref |
| irelative_part | RFC 3987 | IRI-specific relative part |
| iauthority | RFC 3986 | Standard URI authority |
| ipath_abempty | RFC 3986 | Path format variant |
| ipath_absolute | RFC 3986 | Absolute path |
| ipath_noscheme | RFC 3986 | Path disallowing scheme prefix |
| ipath_rootless | RFC 3986 | Used in non-scheme contexts |
| iquery | RFC 3987 | Query extension to URI |
| ifragment | RFC 3987 | Fragment extension to URI |
| ipchar, isegment | RFC 3986 | Path characters and segments |
| isegment_nz_nc | RFC 3987 | IRI-specific path constraint |
| iunreserved | RFC 3987 | Includes ucschar |
| ucschar, iprivate | RFC 3987 | Unicode support |
| sub_delims | RFC 3986 | Reserved characters |
| ip_literal | RFC 3986 | IPv6 or IPvFuture in [] |
| ipv6address | RFC 3986 | Expanded forms only |
| ipvfuture | RFC 3986 | Forward-compatible |
| ipv4address | RFC 3986 | Dotted-decimal IPv4 |
| ls32 | RFC 3986 | Final 32 bits of IPv6 |
| h16, dec_octet | RFC 3986 | Hex and decimal chunks |
| port | RFC 3986 | Optional numeric |
| pct_encoded | RFC 3986 | Percent encoding (e.g. %20) |
| alpha, digit, hexdig | RFC 3986 | Character classes |

Owner

  • Name: Will Riley
  • Login: willynilly
  • Kind: user
  • Location: Arnhem, The Netherlands
  • Company: Wageningen University & Research

Ph.D. in Educational Psychology (Applied Cognition and Development) from University of Georgia

Citation (CITATION.cff)

cff-version: 1.2.0
title: rfc3987-syntax
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Will
    family-names: Riley
    email: wanderingwill@gmail.com
    orcid: "https://orcid.org/0000-0003-1822-6756"
  - given-names: Jan
    family-names: Kowalleck
repository-code: >-
  https://github.com/willynilly/rfc3987-syntax
abstract: >-
  Helper functions to syntactically validate strings according to RFC 3987
keywords:
  - RFC 3987
  - RFC3987
  - validator 
  - syntax
  - parser
license: MIT
version: "1.1.0"
date-released: "2025-07-18"
references:
  - title: "abnf-to-regexp"
    type: software
    version: "1.1.3"
    license: MIT
    authors:
      - given-names: Marko
        family-names: Ristin
        email: marko@ristin.ch
        orcid: ""
      - given-names: Oliver Steensen-Bech
        family-names: Haagh
        email: oliver@dmc.international
        orcid: ""
      - given-names: Sebastian
        family-names: Heppner
        email: s.heppner@iat.rwth-aachen.de
        orcid: ""
    repository-code: https://github.com/aas-core-works/abnf-to-regexp
  - title: "lark"
    type: software
    version: 1.2.2
    license: MIT
    authors:
      - family-names: Shinan
        given-names: Erez
        email: erezshin@gmail.com
    repository-code: https://github.com/lark-parser/lark
  - title: "Internationalized Resource Identifiers (IRIs)"
    authors:
      - family-names: Dürst
        given-names: Martin
      - family-names: Suignard
        given-names: Michel
    date-released: 2005-01-01
    doi: "10.17487/RFC3987"
    url: "https://www.rfc-editor.org/info/rfc3987"
    type: standard
  - title: "ChatGPT"
    authors:
      - name: OpenAI
    type: software
    version: "GPT-4o"
    url: "https://chat.openai.com/chat"
    

GitHub Events

Total
  • Create event: 2
  • Issues event: 2
  • Release event: 2
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 3
  • Push event: 7
  • Pull request review event: 1
  • Pull request event: 5
Last Year
  • Create event: 2
  • Issues event: 2
  • Release event: 2
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 3
  • Push event: 7
  • Pull request review event: 1
  • Pull request event: 5

Issues and Pull Requests

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jkowalleck (1)
Pull Request Authors
  • jkowalleck (4)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 20,880,293 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
pypi.org: rfc3987-syntax

Helper functions to syntactically validate strings according to RFC 3987.

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 20,880,293 Last month
Rankings
Dependent packages count: 9.1%
Average: 30.3%
Dependent repos count: 51.4%
Maintainers (1)

Dependencies

.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
pyproject.toml pypi
  • lark >=1.2.2