vs.xml
Special-purpose standalone XML parser, tree builder, and query engine for modern C++
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: lazy-eggplant
- License: lgpl-3.0
- Language: C++
- Default Branch: master
- Homepage: https://lazy-eggplant.github.io/vs.xml/
- Size: 26.7 MB
Statistics
- Stars: 5
- Watchers: 2
- Forks: 2
- Open Issues: 11
- Releases: 11
Metadata Files
README.md
Warning: this is an ongoing project. The base functionality is ready, but more documentation is needed and some advanced features have not been implemented yet.
This library offers a mostly-compliant[^1] XML parser, tree builder and several related utilities.
It is not intended as a general purpose library, which means it might not be a good fit for your project.
Please read the rest of this README to learn more about its objectives and drawbacks.
Features
- Support for a schema-less tree structure that can be fully relocated on disk, in memory, or to offloaded devices without changing its binary representation.
- Linked to the previous point, pointers/iterators into this tree structure are random access: there is no need to navigate the tree to reach them.
- Good memory locality of the tree representation, making many operations on sub-trees a trivial `memcpy`.
- Configurable memory footprint: the size of most fields in the internal representation can be reduced, to run properly on "lesser" systems or to improve cache performance.
- An efficient engine to perform queries on a document, all based on lazy evaluation.
- XML serialization and de-serialization.
- Naive support for namespaces[^2].
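The relocatable, random-access layout and the configurable footprint described above can be illustrated with a minimal sketch. This is not vs.xml's actual node format: `FlatNode`, `load_node`, `node_name`, `build_sample` and `child_names` are hypothetical names, and the layout (children and siblings linked by byte offsets from the start of the buffer) is an assumption chosen to show why such a tree survives a plain `memcpy` and why a narrower offset type shrinks every node.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical layout, not vs.xml's: links are byte offsets from the start of
// the buffer, never raw pointers. Offset_t is a template parameter, so a
// narrower type shrinks every node.
template <typename Offset_t>
struct FlatNode {
    Offset_t first_child;   // 0 = no child (offset 0 always holds the root)
    Offset_t next_sibling;  // 0 = no next sibling
    Offset_t name_offset;   // where this node's name starts in the buffer
    Offset_t name_length;
};

static_assert(sizeof(FlatNode<std::uint16_t>) < sizeof(FlatNode<std::uint32_t>));

// Random access: any node can be read directly from its offset.
template <typename Offset_t>
FlatNode<Offset_t> load_node(const std::byte* base, std::size_t at) {
    FlatNode<Offset_t> n;
    std::memcpy(&n, base + at, sizeof n);  // memcpy sidesteps alignment issues
    return n;
}

template <typename Offset_t>
std::string_view node_name(const std::byte* base, const FlatNode<Offset_t>& n) {
    return {reinterpret_cast<const char*>(base + n.name_offset), n.name_length};
}

// Builds root -> {a, b}. Layout: [root][a][b]["rootab"].
template <typename Offset_t>
std::vector<std::byte> build_sample() {
    using Node = FlatNode<Offset_t>;
    constexpr std::size_t N = sizeof(Node);
    std::vector<std::byte> buf(3 * N + 6);
    std::memcpy(buf.data() + 3 * N, "rootab", 6);
    const Node nodes[3] = {
        {Offset_t(N), 0, Offset_t(3 * N), 4},          // root, first child = a
        {0, Offset_t(2 * N), Offset_t(3 * N + 4), 1},  // a, next sibling = b
        {0, 0, Offset_t(3 * N + 5), 1},                // b
    };
    std::memcpy(buf.data(), nodes, sizeof nodes);
    return buf;
}

// Walks the root's children using only relative links.
template <typename Offset_t>
std::string child_names(const std::byte* base) {
    std::string out;
    const auto root = load_node<Offset_t>(base, 0);
    for (std::size_t at = root.first_child; at != 0;) {
        const auto n = load_node<Offset_t>(base, at);
        out += node_name(base, n);
        at = n.next_sibling;
    }
    return out;
}
```

Because no link stores an absolute address, copying the buffer with a plain `memcpy` to another allocation leaves every link valid. Switching `Offset_t` from `std::uint32_t` to `std::uint16_t` halves the node size (16 to 8 bytes here), at the cost of limiting the buffer to 64 KiB.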
Non-objectives:
- Support for arbitrary editing operations. This library is special-purpose, so only a small number of mutable operations will be supported to keep the rest as fast as possible.
- Extended XML entities, base64, DTD... none of that is needed for the intended target of this library.
- In general, being fully XML compliant.
Quick start
Use it like any other Meson dependency by adding a wrap file for this repository to your project, or install it on your system first and use it as a system dependency.
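Full code is in the examples folder. You can easily build documents:

```cpp
#include <iostream>
#include <print>
#include <ranges>
#include <vs-xml/filters.hpp>
//...plus the vs.xml builder and query headers (see the examples folder for their exact names).

using namespace xml;

int main(){
    DocumentBuilder<{.symbols=xml::builder_config_t::COMPRESS_ALL}> bld;
    bld.xml();
    bld.comment("This is a comment!");
    bld.begin("base-node");
        bld.attr("hello", "world");
        //Children after the attribute block.
        bld.text("This is some text!");
    bld.end(); //Closes "base-node" (assumed call name; see the examples folder).
```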
Serialize them:

```cpp
    auto document = *bld.close(); //Make sure to handle the returned error, if present, in production code.
    document.print(std::cout, {/*serialization configuration*/});
```
Access the tree structure:

```cpp
    //Show comments only
    for(auto& it: document.root().children() | std::views::filter([](const auto& it){return it.type()==xml::type_t::COMMENT;})){
        std::print("{}\n",it.value().value_or("-- Empty node --"));
    }

    //Example of a helper filter (defined in vs-xml/filters.hpp)
    for(auto& it: document.root().children() | filters::name("base-node")){
        std::print("{}\n",it.value().value_or("-- Empty node --"));
    }
```
Perform queries:

```cpp
    auto query_a = xml::query::query_t{}/"base-node"/xml::query::accept();

    for(const auto& t : document.root() & query_a){
        std::print("{} @ {}\n", (int)t.type(), t.addr());
    }

    return 0;
}
```
And more, such as saving documents to binary files and reading them back (usually memory-mapped).
Learn more by checking the examples.
The Doxygen-generated documentation can be found on this project's GitHub Pages site.
Supported platforms
This library is mostly standalone, but it requires support for the C standard library and a modern version of the C++ standard library.
I am working with C++23 for development, and I don't really plan on directly supporting older revisions of the language at the expense of code simplicity.
Other dependencies are only used for the test suite and benchmarks; they are not needed to build and install vs.xml on your system.
Still, many of the standard library features can be replaced by alternatives which are more suitable for embedded systems.
You can track our efforts for embedded support in the dedicated page and tracker.
We also try our best to ensure that this library (or a subset of it) works properly on offloaded targets such as GPUs via OpenMP.
To use vs.xml to its fullest extent, make sure you have a kernel image with unified memory access support, and GPUs capable of it.
There is a dedicated page and tracker for this as well.
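One reason a contiguous, pointer-free layout helps with offloading is that the data can be mapped to a device byte for byte. The following generic sketch does not use vs.xml; `Kind` and `count_kind` are hypothetical names, and it simply counts nodes of one kind with an OpenMP target reduction over a flat array:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical flat array of node kinds (not vs.xml's layout). It holds no
// pointers, so it can be copied to a device unchanged.
enum class Kind : std::uint8_t { Element, Text, Comment };

inline long count_kind(const std::vector<Kind>& kinds, Kind wanted) {
    const Kind* data = kinds.data();
    const long n = static_cast<long>(kinds.size());
    long count = 0;
    // With an offload-capable OpenMP compiler this loop runs on the device.
    // Without OpenMP support the pragma is ignored and it runs on the host.
    #pragma omp target teams distribute parallel for map(to : data[0:n]) map(tofrom : count) reduction(+ : count)
    for (long i = 0; i < n; ++i) {
        if (data[i] == wanted) ++count;
    }
    return count;
}
```

Built without OpenMP, the same function runs serially and returns the same result, which makes it easy to test on the host first.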
Typical applications
Examples of where this library is meant to fit:
Very big XML files
This library allows serializing XML files into a binary format for fast navigation and information linking.
It is very easy to do this once, and then load your terabyte-sized XML as a memory-mapped file.
Since nodes are randomly accessible via fully relocatable addresses, you do not pay a page-miss penalty for every nested layer you need to visit.
Patches and annotations
Annotating the tree, or even adding small patches to a huge tree, can be quite easy[^3]: since addresses are all relative and stable, it is trivial to share your annotations or patches with others.
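A minimal sketch of why relative addresses make annotations shareable, assuming a node's address is its byte offset from the start of the tree buffer. `NodeAddr`, `address_of`, `node_at` and `Annotations` are hypothetical helpers, not vs.xml's API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helpers, not vs.xml's API: a node "address" is its byte
// offset from the start of the tree buffer.
using NodeAddr = std::uint32_t;

inline NodeAddr address_of(const std::byte* base, const std::byte* node) {
    return static_cast<NodeAddr>(node - base);
}

inline const std::byte* node_at(const std::byte* base, NodeAddr addr) {
    return base + addr;
}

// Annotations keyed by address can be shared as-is: they resolve against
// any copy of the same tree, wherever that copy is loaded in memory.
using Annotations = std::map<NodeAddr, std::string>;
```

An annotation recorded against one copy of the tree resolves to the corresponding node in another copy, because only the base pointer differs.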
Efficient tree building
Tree building does not heap-allocate each node individually, and strings are unescaped in place when parsing a source XML file, so no expensive memory allocations are needed for either.
But why?
There is a FAQ page answering some common questions. For anything else, just ask :).
External dependencies
This library is fully standalone (aside from the C/C++ standard libraries).
However, examples, tests, optional utilities and benchmarks have some dependencies:
- mio, a simple way to handle memory-mapped files, and pretty much the intended way to use vs-xml downstream.
- nanobench, to perform benchmarks.
- pugixml, since it is the library I am testing against in benchmarks; these two libraries are very different in scope, so comparative benchmarks are only marginally useful.
Also, parts of the standard library can be replaced to gain some sweet benchmarking numbers (or to gain additional functionality):
- fmt, as an optional replacement for std::format and std::print, since their performance is trash by comparison.
- gtl, as an optional replacement for some containers in the C++ standard library, with a focus on performance and a serializable memory representation.
Licence
This library is released under the LGPL 3.0.
All documentation is under CC 4.0 Attribution-ShareAlike.
Examples are CC0 unless otherwise specified; this does not cover datasets, whose licences you will have to check individually.
[^1]: XML 1.0 is covered on a best-effort basis, but there will be small areas where this implementation is either incompatible with the official XML standard or a superset of it.
For more information on compatibility and supported features, please check the page where they are being tracked.
[^2]: Namespaces are supported in the sense that the namespace prefix, if present, is split from the element or attribute name, but handling and validating it is left to the user.
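A minimal sketch of the "naive" namespace handling described in this footnote: split a qualified name at the first `:` and leave everything else to the caller. `QName` and `split_qname` are hypothetical names, not vs.xml's API:

```cpp
#include <cassert>
#include <string_view>

// Hypothetical result type: both views point into the original name.
struct QName {
    std::string_view ns;    // empty when no prefix is present
    std::string_view name;  // local name
};

// Splits "prefix:local" at the first ':'. Nothing is validated or resolved;
// mapping the prefix to a namespace URI is left to the caller.
inline QName split_qname(std::string_view qualified) {
    const auto colon = qualified.find(':');
    if (colon == std::string_view::npos) return {{}, qualified};
    return {qualified.substr(0, colon), qualified.substr(colon + 1)};
}
```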
[^3]: However, using such patches would require a downstream implementation of wrapper classes.
Owner
- Name: lazy-eggplant
- Login: lazy-eggplant
- Kind: organization
- Repositories: 1
- Profile: https://github.com/lazy-eggplant
Citation (CITATION.cff)
cff-version: 1.2.0
title: vs.xml
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: lazy-eggplant
  - given-names: karurochari
    affiliation: lazy-eggplant
    email: public@karurochari.com
repository-code: 'https://github.com/lazy-eggplant/vs.xml'
url: 'https://lazy-eggplant.github.io/vs.xml/'
abstract: >-
  XML parser and tree builder for mostly static and
  relocatable schema-less documents.
keywords:
- xml
- parser
- tree-builder
license: LGPL-3.0-only
GitHub Events
Total
- Create event: 15
- Release event: 9
- Issues event: 24
- Watch event: 6
- Delete event: 10
- Issue comment event: 15
- Push event: 303
- Pull request event: 13
- Fork event: 1
Last Year
- Create event: 15
- Release event: 9
- Issues event: 24
- Watch event: 6
- Delete event: 10
- Issue comment event: 15
- Push event: 303
- Pull request event: 13
- Fork event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Karuro Chari | k****i | 126 |
| checkroom | p****c@k****m | 16 |
| KaruroChori | K****i | 1 |
| Dwayne Robinson | f****r@h****m | 1 |
| Karuro Chari | s****g@w****t | 1 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 11
- Total pull requests: 1
- Average time to close issues: 3 days
- Average time to close pull requests: 31 minutes
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.82
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 11
- Pull requests: 1
- Average time to close issues: 3 days
- Average time to close pull requests: 31 minutes
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.82
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- KaruroChori (20)
Pull Request Authors
- KaruroChori (7)
- fdwr (2)
Dependencies
- actions/checkout v4 composite
- actions/upload-artifact v4 composite
- actions/checkout v4 composite
- peaceiris/actions-gh-pages v4 composite
- actions/checkout v4 composite
- softprops/action-gh-release v2 composite