func-adl.ast

Construct hierarchical data queries using SQL-like concepts in python

https://github.com/iris-hep/func_adl

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    4 of 6 committers (66.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Construct hierarchical data queries using SQL-like concepts in python

Basic Info
  • Host: GitHub
  • Owner: iris-hep
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 597 KB
Statistics
  • Stars: 8
  • Watchers: 7
  • Forks: 4
  • Open Issues: 18
  • Releases: 71
Created over 6 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

func_adl

Construct hierarchical data queries using SQL-like concepts in python.


func_adl uses an SQL-like language to extract data and computed values from a ROOT file or an ATLAS xAOD file and return them in a columnar format. It is currently used as a central part of two of the ServiceX transformers.

This is the base package that has the backend-agnostic code to query hierarchical data. In all likelihood you will want to install one of the following packages:

  • func_adl_xAOD: for running on an ATLAS & CMS experiment xAOD file hosted in ServiceX
  • func_adl_uproot: for running on flat ROOT files
  • func_adl.xAOD.backend: for running on a local file using Docker

See the documentation for more information on what expressions and capabilities are possible in each of these backends.

Captured Variables

Python supports closures in lambdas and functions, as well as enums and class constants. In all cases, these are captured and injected into the query as their resolved constant values.

This library resolves those captured values at the point where the query method (Where, Select, and so on) is called. For example (where ds is a dataset):

```python
met_cut = 40
good_met_expr = ds.Where(lambda e: e.met > met_cut).Select(lambda e: e.met)
met_cut = 50
good_met = good_met_expr.value()
```

The cut will be applied at 40, because that was the value of met_cut when the Where function was called. This will also work for variables captured inside functions.
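Python itself binds free variables late: a lambda looks up met_cut when it runs, not when it is created. The behavior described above therefore means the captured values are read when Where is called. The sketch below shows one way a library could take such a snapshot, using the standard-library inspect.getclosurevars; the helper snapshot_captured and the demo function are hypothetical and are not func_adl's actual implementation.

```python
import inspect
from typing import Any, Callable, Dict


def snapshot_captured(fn: Callable) -> Dict[str, Any]:
    """Copy the current values of the globals and closure cells that fn refers to."""
    cv = inspect.getclosurevars(fn)
    captured = dict(cv.globals)
    captured.update(cv.nonlocals)
    return captured


def demo():
    met_cut = 40
    where_fn = lambda e: e.met > met_cut
    captured = snapshot_captured(where_fn)  # taken at the point Where would be called
    met_cut = 50                            # later rebinding changes the live closure...
    return where_fn, captured               # ...but not the snapshot
```

Because the snapshot is a plain dictionary of values, later rebinding of met_cut cannot leak into a query that has already been built.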

Captured Functions

It is possible to capture simple functions as well.

```python
def good_jet(jet):
    return jet.pt() > 40.0

good_met_expr = ds_of_jets.Where(lambda jet: good_jet(jet))
```

Simple means that you must restrict the function to a single return statement. No multi-statement functions are currently possible, even if they reduce to a single statement! Also, a recursive function will give you undefined (and likely ugly) behavior.
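To see why only a single return statement can be supported, consider what inlining a captured function involves: the call good_jet(jet) is replaced by the function's returned expression, with each parameter substituted by the argument from the call site. The following is a minimal sketch of that idea using only the standard-library ast module (Python 3.9+ for ast.unparse). The helper names are hypothetical, and functions are passed as source text to keep the example self-contained; func_adl's own mechanism may differ.

```python
import ast
import copy


def parse_simple_function(func_src: str):
    """Return (name, parameter names, returned expression) for a one-statement function."""
    fdef = ast.parse(func_src).body[0]
    if not isinstance(fdef, ast.FunctionDef):
        raise ValueError("expected a function definition")
    if len(fdef.body) != 1 or not isinstance(fdef.body[0], ast.Return):
        raise ValueError(f"{fdef.name} must consist of a single return statement")
    return fdef.name, [a.arg for a in fdef.args.args], fdef.body[0].value


class _Substitute(ast.NodeTransformer):
    """Replace parameter names with the argument expressions from the call site."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        return copy.deepcopy(self.mapping.get(node.id, node))


class _Inline(ast.NodeTransformer):
    """Replace each call to the named function with its substituted return expression."""

    def __init__(self, name, params, body):
        self.name, self.params, self.body = name, params, body

    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == self.name:
            mapping = dict(zip(self.params, node.args))
            return _Substitute(mapping).visit(copy.deepcopy(self.body))
        return node


def inline_simple_function(func_src: str, lambda_src: str) -> str:
    name, params, body = parse_simple_function(func_src)
    tree = _Inline(name, params, body).visit(ast.parse(lambda_src, mode="eval"))
    return ast.unparse(tree.body)
```

A multi-statement body has no single expression to splice in, and a recursive function would never stop expanding, which matches the restrictions above.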

Syntactic Sugar

Several Python expressions and idioms are translated behind your back into func_adl. Note that these must occur inside one of the lambda functions passed to an ObjectStream method, such as Select, SelectMany, or Where.

| Name | Python Expression | func_adl Translation |
| --- | --- | --- |
| List Comprehension | [j.pt() for j in jets] | jets.Select(lambda j: j.pt()) |
| List Comprehension | [j.pt() for j in jets if abs(j.eta()) < 2.4] | jets.Where(lambda j: abs(j.eta()) < 2.4).Select(lambda j: j.pt()) |
| Data Classes (typed) | @dataclass class my_data with field x: ObjectStream[Jets], then Select(lambda e: my_data(x=e.Jets()).x) | Select(lambda e: {'x': e.Jets()}.x) |
| Named Tuple (typed) | class my_data(NamedTuple) with field x: ObjectStream[Jets], then Select(lambda e: my_data(x=e.Jets()).x) | Select(lambda e: {'x': e.Jets()}.x) |
| List Membership | p.absPdgId() in [35, 51] | p.absPdgId() == 35 or p.absPdgId() == 51 |

Note: Everything that applies to a list comprehension also applies to a generator expression.
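The comprehension and membership rows of the table can be sketched as an ast.NodeTransformer. This is a hypothetical, simplified illustration (only a single `for name in seq` clause, and membership tests against literal lists or tuples), not func_adl's own translator; it requires Python 3.9+ for ast.unparse.

```python
import ast


def _lambda(var: str, body: ast.expr) -> ast.Lambda:
    args = ast.arguments(posonlyargs=[], args=[ast.arg(arg=var)], vararg=None,
                         kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[])
    return ast.Lambda(args=args, body=body)


def _method(obj: ast.expr, name: str, arg: ast.expr) -> ast.Call:
    return ast.Call(func=ast.Attribute(value=obj, attr=name, ctx=ast.Load()),
                    args=[arg], keywords=[])


class Desugar(ast.NodeTransformer):
    def visit_ListComp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        gen = node.generators[0]
        if len(node.generators) != 1 or not isinstance(gen.target, ast.Name):
            return node  # only the single `for name in seq` form is sketched
        var, stream = gen.target.id, gen.iter
        for cond in gen.ifs:  # each `if` clause becomes a Where
            stream = _method(stream, "Where", _lambda(var, cond))
        return _method(stream, "Select", _lambda(var, node.elt))

    visit_GeneratorExp = visit_ListComp  # generator expressions translate the same way

    def visit_Compare(self, node):
        self.generic_visit(node)
        if (len(node.ops) == 1 and isinstance(node.ops[0], ast.In)
                and isinstance(node.comparators[0], (ast.List, ast.Tuple))):
            tests = [ast.Compare(left=node.left, ops=[ast.Eq()], comparators=[v])
                     for v in node.comparators[0].elts]
            return ast.BoolOp(op=ast.Or(), values=tests)
        return node


def desugar(expr: str) -> str:
    tree = Desugar().visit(ast.parse(expr, mode="eval"))
    return ast.unparse(tree.body)
```

For example, desugar("p.absPdgId() in [35, 51]") produces the or-chain from the last row of the table.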

Extensibility

There are several extensibility points:

  • EventDataset should be sub-classed to provide an executor.
  • EventDataset can use Python's type hinting system to allow for editors and other intelligent typing systems to type check expressions. The more type data present, the more the system can help.
  • Define a function that can be called inside a LINQ expression
  • Define new stream methods
  • It is possible to insert a call back at a function or method call site that will allow for modification of the ObjectStream or the call site's ast.

EventDataset

An example EventDataset:

```python
import ast
import asyncio
from typing import Optional
from func_adl import EventDataset

class events(EventDataset):
    async def execute_result_async(self, a: ast.AST, title: Optional[str] = None):
        await asyncio.sleep(0.01)
        return a
```

and some func_adl code that uses it:

```python
r = (events()
     .SelectMany(lambda e: e.Jets('jets'))
     .Select(lambda j: j.eta())
     .value())
```

  • When the .value() method is invoked, execute_result_async is called with a complete AST representing the query. This is the point at which one would send the query to the backend to actually be processed.
  • Normally, the constructor of events would take in the name of the dataset to be processed, which could then be used in execute_result_async.
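The deferred-execution pattern described above can be sketched without func_adl at all. In the minimal, hypothetical stand-in below, MiniStream and MiniDataset are illustrative names only: each query method wraps the query-so-far in a new ast.Call, and value() runs the dataset's execute_result_async to completion with asyncio.run. Unlike func_adl, lambdas are passed as source strings here so that the example needs nothing beyond the standard library (Python 3.9+).

```python
import ast
import asyncio
from typing import Any, Optional


class MiniStream:
    """Hypothetical, much-simplified stand-in for func_adl's ObjectStream.

    Each query method wraps the AST built so far in a new ast.Call node;
    nothing is executed until value() hands the finished AST to the dataset.
    """

    def __init__(self, query: ast.expr, dataset: "MiniDataset"):
        self.query = query
        self.dataset = dataset

    def _chain(self, method: str, lambda_src: str) -> "MiniStream":
        # Lambdas arrive as source text only to keep this sketch self-contained.
        fn = ast.parse(lambda_src, mode="eval").body
        call = ast.Call(func=ast.Attribute(value=self.query, attr=method, ctx=ast.Load()),
                        args=[fn], keywords=[])
        return MiniStream(call, self.dataset)

    def SelectMany(self, lambda_src: str) -> "MiniStream":
        return self._chain("SelectMany", lambda_src)

    def Select(self, lambda_src: str) -> "MiniStream":
        return self._chain("Select", lambda_src)

    def value(self) -> Any:
        # Synchronous entry point that drives the async executor to completion.
        return asyncio.run(self.dataset.execute_result_async(self.query))


class MiniDataset(MiniStream):
    """The root of every query; a real subclass would talk to a backend."""

    def __init__(self, name: str):
        super().__init__(ast.Name(id="events", ctx=ast.Load()), self)
        self.name = name  # e.g. the dataset identifier a backend would need

    async def execute_result_async(self, a: ast.AST, title: Optional[str] = None) -> Any:
        # A real backend would send `a` off for processing; this one echoes the query.
        await asyncio.sleep(0.01)
        return ast.unparse(a)
```

Calling MiniDataset("my_dataset").SelectMany("lambda e: e.Jets('jets')").Select("lambda j: j.eta()").value() returns the text of the complete query AST, which is exactly what execute_result_async received.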

Typing EventDataset

A minor change to the declaration above, and no change to the query:

```python
import ast
import asyncio
from typing import Iterable, Optional
from func_adl import EventDataset

class dd_jet:
    def pt(self) -> float: ...
    def eta(self) -> float: ...

class dd_event:
    def Jets(self, bank: str) -> Iterable[dd_jet]: ...
    def EventNumber(self, bank='default') -> int: ...

class events(EventDataset[dd_event]):
    async def execute_result_async(self, a: ast.AST, title: Optional[str] = None):
        await asyncio.sleep(0.01)
        return a
```

This is not required, but when this is done:

  • Editors that use types to give one a list of options/guesses will now light up as long as they have reasonable type-checking built in.
  • If a required argument is missed, an error will be generated
  • If a default argument is missed, it will be automatically filled in.

It should be noted that the type and expression follower is not very sophisticated! While it can follow method calls, it won't follow much else!

The code should work fine in Python 3.11, or if from __future__ import annotations is used.
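The argument checking and default filling described above can be illustrated with the standard-library inspect.signature. The sketch below is a hypothetical helper, check_call, not func_adl's type follower: given the AST of a method call and the class that declares the method, it raises TypeError when a required argument is missing, fills in defaulted arguments, and reports the declared return type that a type follower would carry forward (Python 3.9+).

```python
import ast
import inspect
from typing import Iterable


class dd_jet:
    def pt(self) -> float: ...


class dd_event:
    def Jets(self, bank: str) -> Iterable[dd_jet]: ...
    def EventNumber(self, bank='default') -> int: ...


def check_call(call: ast.Call, cls: type):
    """Validate `obj.method(...)` against cls; return (call with defaults filled, return type)."""
    method = getattr(cls, call.func.attr)
    sig = inspect.signature(method)
    kwargs = {k.arg: k.value for k in call.keywords}
    # None stands in for `self`; bind raises TypeError if a required argument is missing.
    bound = sig.bind(None, *call.args, **kwargs)
    bound.apply_defaults()
    args = [v if isinstance(v, ast.AST) else ast.Constant(v)
            for name, v in bound.arguments.items() if name != "self"]
    return ast.Call(func=call.func, args=args, keywords=[]), sig.return_annotation
```

Here e.EventNumber() comes back as e.EventNumber('default') with return type int, while e.Jets() is rejected because bank has no default.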

Type-based callbacks

By adding a function and a reference in the type system, arbitrary code can be executed while the func_adl query is traversed. Keeping the query and the events definition the same, we can add this information directly to the Python type declarations by using a decorator on a class definition:

```python
import ast
from typing import Iterable, Tuple, TypeVar

from func_adl import ObjectStream, func_adl_callback

# Generic type is required in order to preserve type checkers' ability to see
# changes in the type
T = TypeVar('T')

def add_md_for_type(s: ObjectStream[T], a: ast.Call) -> Tuple[ObjectStream[T], ast.AST]:
    return s.MetaData({'hi': 'there'}), a

@func_adl_callback(add_md_for_type)
class dd_event:
    def Jets(self, bank: str) -> Iterable[dd_jet]:
        ...
```

  • When the .Jets() method is processed, add_md_for_type is called with the current object stream and the AST.
  • add_md_for_type here adds metadata and returns the updated stream and ast.
  • Nothing prevents the function from parsing the AST, removing or adding arguments, adding more complex metadata, or doing any of this depending on the arguments in the call site.
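The following self-contained sketch mimics this flow with hypothetical names: callback_on stands in for func_adl_callback, and Stream is a tiny immutable object that only records metadata. When process_call sees a call to a method declared on a decorated class, it hands the stream and the call's AST to the callback and continues with whatever pair comes back. This illustrates the idea only; the real traversal in func_adl is more involved.

```python
import ast
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Stream:
    """Hypothetical immutable stand-in for an ObjectStream that only records metadata."""
    metadata: Tuple[dict, ...] = ()

    def MetaData(self, md: dict) -> "Stream":
        return Stream(self.metadata + (md,))


Callback = Callable[[Stream, ast.Call], Tuple[Stream, ast.AST]]


def callback_on(cb: Callback):
    """Hypothetical class decorator that attaches a callback to a type."""
    def wrap(cls: type) -> type:
        cls._callback = cb
        return cls
    return wrap


def add_md_for_type(s: Stream, a: ast.Call) -> Tuple[Stream, ast.AST]:
    return s.MetaData({'hi': 'there'}), a


@callback_on(add_md_for_type)
class dd_event:
    def Jets(self, bank: str): ...


def process_call(s: Stream, call: ast.Call, receiver_type: type) -> Tuple[Stream, ast.AST]:
    """Run the receiver type's callback, if any, when one of its methods is called."""
    cb = getattr(receiver_type, "_callback", None)
    if (cb is not None and isinstance(call.func, ast.Attribute)
            and hasattr(receiver_type, call.func.attr)):
        return cb(s, call)
    return s, call
```

Because the callback returns both a stream and an AST, the same hook can add metadata, rewrite the call's arguments, or both.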

Parameterized method calls

These are a very special form of callback, implemented to support things like interop with C++ templates. They allow you to write something like:

```python
result = (ds
          .SelectMany(lambda e: e.Jets())
          .Select(lambda j: j.getAttribute[float]('moment0'))
          .AsAwkward('moment0')
          )
```

Note the [float] in the call to getAttribute. This can only happen if the property getAttribute in the Jet class is marked with the decorator func_adl_parameterized_call:

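```python
import ast
from typing import Tuple, Type, TypeVar

from func_adl import ObjectStream, func_adl_parameterized_call

T = TypeVar('T')

def my_callback(s: ObjectStream[T], a: ast.Call, param_1) -> Tuple[ObjectStream[T], ast.AST, Type]:
    ...

class Jet:
    @func_adl_parameterized_call()
    @property
    def getAttribute(self):
        ...
```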

Here, my_callback will be called with param_1 set to float. This means that, at the time the callback runs, the parameterized values must resolve to actual Python values; they are not converted to C++. In this case, my_callback could inject MetaData to build a templated call to getAttribute. my_callback returns the same kind of tuple as add_md_for_type above, except that it has a third element: the return type of the call.

If more than one parameter is used (for example, j.getAttribute['float', 'int']('moment0')), then param_1 is a tuple with two items.
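Syntactically, j.getAttribute[float]('moment0') is a call whose function is a subscript expression, which is why the bracketed parameters can be separated from the ordinary arguments before the backend sees them. The sketch below (hypothetical helper names, Python 3.9+) pulls the parameters out of such a call, resolves them to actual Python values (a single value, or a tuple when several are given), and returns the plain call with the subscript removed. Names that cannot be resolved raise an error, mirroring the requirement that parameters resolve to real values.

```python
import ast
import builtins
from typing import Any, Dict, Optional, Tuple


def _resolve(node: ast.expr, namespace: Dict[str, Any]) -> Any:
    """Turn a parameter expression into an actual Python value."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id in namespace:
            return namespace[node.id]
        if hasattr(builtins, node.id):
            return getattr(builtins, node.id)
    raise NameError(f"cannot resolve parameter {ast.unparse(node)!r} to a value")


def split_parameterized_call(expr: str, namespace: Optional[Dict[str, Any]] = None
                             ) -> Tuple[str, Any, ast.Call]:
    """Split `obj.method[params](args)` into (method name, param_1, `obj.method(args)`)."""
    call = ast.parse(expr, mode="eval").body
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Subscript)
            and isinstance(call.func.value, ast.Attribute)):
        raise ValueError("expected a call of the form obj.method[params](args)")
    params = call.func.slice  # Python 3.9+: no ast.Index wrapper
    ns = namespace or {}
    if isinstance(params, ast.Tuple):
        param_1 = tuple(_resolve(e, ns) for e in params.elts)
    else:
        param_1 = _resolve(params, ns)
    plain = ast.Call(func=call.func.value, args=call.args, keywords=call.keywords)
    return call.func.value.attr, param_1, plain
```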

Function Definitions

It is useful to have functions that can be called directly in the backend, or to use a function call to artificially insert something (like MetaData) into the func_adl query stream. For example, the C++ backend uses this to insert inline C++ code. The func_adl_callable decorator is used to do this:

```python
import ast
from typing import Tuple, TypeVar

from func_adl import ObjectStream, func_adl_callable

T = TypeVar('T')

def MySqrtProcessor(s: ObjectStream[T], a: ast.Call) -> Tuple[ObjectStream[T], ast.Call]:
    'Can add items to the object stream'
    new_s = s.MetaData({'j': 'func_stuff'})
    return new_s, a

# Declare the typing and name of the function to func_adl
@func_adl_callable(MySqrtProcessor)
def MySqrt(x: float) -> float:
    ...

r = (events()
     .SelectMany(lambda e: e.Jets('jets'))
     .Select(lambda j: MySqrt(j.eta()))
     .value())
```

In the above sample, the call to MySqrt will be passed back to the backend. However, the MetaData will be inserted into the stream before the call. One can use C++ to define the MySqrt function (or similar).

Note that if MySqrt is always defined in the backend and needs no additional data, one can omit MySqrtProcessor from the decorator call.
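The split between a backend function and its optional processor can be sketched with a name-based registry. In this hypothetical example, backend_callable plays the role of func_adl_callable, MyAbs is an invented second function registered without a processor, and a plain list stands in for the object stream's metadata: registered calls are left in the query untouched, and only those registered with a processor contribute metadata (Python 3.9+).

```python
import ast
from typing import Callable, Dict, List, Optional, Tuple

Processor = Callable[[List[dict], ast.Call], Tuple[List[dict], ast.Call]]

# Hypothetical registry: function name -> optional processor.
_BACKEND_FUNCTIONS: Dict[str, Optional[Processor]] = {}


def backend_callable(processor: Optional[Processor] = None):
    """Hypothetical analogue of func_adl_callable: register the name, keep the typed stub."""
    def wrap(fn):
        _BACKEND_FUNCTIONS[fn.__name__] = processor
        return fn
    return wrap


def my_sqrt_processor(metadata: List[dict], call: ast.Call) -> Tuple[List[dict], ast.Call]:
    return metadata + [{'j': 'func_stuff'}], call


@backend_callable(my_sqrt_processor)
def MySqrt(x: float) -> float: ...


@backend_callable()  # no processor: the backend already knows how to evaluate this
def MyAbs(x: float) -> float: ...


def collect_backend_calls(query_src: str) -> Tuple[List[dict], str]:
    """Run processors for registered calls; the calls themselves stay in the query."""
    metadata: List[dict] = []
    tree = ast.parse(query_src, mode="eval")
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _BACKEND_FUNCTIONS):
            processor = _BACKEND_FUNCTIONS[node.func.id]
            if processor is not None:
                metadata, _ = processor(metadata, node)
    return metadata, ast.unparse(tree.body)
```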

Adding new Collection APIs

Functions like First should not be present in ObjectStream, as that is the top-level set of definitions. However, inside the event context, they make a lot of sense. The type-following code needs a way to track these (the type hint system needs no modification; just declare your collections in your Event object appropriately).

For examples, see the test_type_based_replacement file. The class-level decorator is called register_func_adl_os_collection.
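As a rough illustration of what type following for such collection methods involves, the sketch below keeps a hypothetical registry mapping a method name to a rule that computes its result type from the collection's element type. It does not use register_func_adl_os_collection, whose signature is not given here; the element type is read from a hint like Iterable[dd_jet] with typing.get_origin and typing.get_args.

```python
import collections.abc
from typing import Callable, Dict, Iterable, List, get_args, get_origin


class dd_jet:
    def pt(self) -> float: ...


# Hypothetical registry: collection method name -> rule mapping element type to result type.
COLLECTION_METHODS: Dict[str, Callable[[type], object]] = {}


def collection_method(name: str):
    def wrap(rule):
        COLLECTION_METHODS[name] = rule
        return rule
    return wrap


@collection_method("First")
def _first(elem):
    return elem


@collection_method("Count")
def _count(elem):
    return int


@collection_method("Where")
def _where(elem):
    return Iterable[elem]


def result_type(collection_type, method: str):
    """Follow the type of `collection.method(...)` for a registered collection method."""
    origin = get_origin(collection_type)
    if origin is None or not issubclass(origin, collections.abc.Iterable):
        raise TypeError(f"{collection_type!r} is not a collection type")
    (elem,) = get_args(collection_type)
    return COLLECTION_METHODS[method](elem)
```

With this, e.Jets('jets').First() would be followed to dd_jet, so a subsequent .pt() call can be checked against dd_jet.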

Development

After a new release has been built and passes the tests, you can release it by creating a new release on GitHub. An action that runs when a release is "created" will publish it to PyPI.

Citation

The preferred BibTeX entry for citation of func_adl includes both the software repository and the EPJ Web Conf. paper:

```bibtex
@software{func_adl,
  author = {Gordon Watts},
  title = "{func_adl}",
  url = {https://github.com/iris-hep/func_adl}
}

@article{Proffitt:2021wfh,
  author = "Proffitt, Mason and Watts, Gordon",
  title = "{FuncADL: Functional Analysis Description Language}",
  eprint = "2103.02432",
  archivePrefix = "arXiv",
  primaryClass = "physics.data-an",
  doi = "10.1051/epjconf/202125103068",
  journal = "EPJ Web Conf.",
  volume = "251",
  pages = "03068",
  year = "2021"
}
```

Owner

  • Name: IRIS-HEP
  • Login: iris-hep
  • Kind: organization

Institute for Research and Innovation in Software for High Energy Physics

Citation (CITATION.cff)

cff-version: 1.2.0
message: "Please cite the following works when using this software."
type: software
authors:
- family-names: "Watts"
  given-names: "Gordon"
  orcid: "https://orcid.org/0000-0002-0753-7308"
  affiliation: "University of Washington"
title: "func_adl"
repository-code: "https://github.com/iris-hep/func_adl"
url: "https://github.com/iris-hep/func_adl"
license: "MIT"
references:
  - type: article
    authors:
    - family-names: "Proffitt"
      given-names: "Mason"
      orcid: "https://orcid.org/0000-0003-0323-8252"
      affiliation: "University of Washington"
    - family-names: "Watts"
      given-names: "Gordon"
      orcid: "https://orcid.org/0000-0002-0753-7308"
      affiliation: "University of Washington"
    title: "FuncADL: Functional Analysis Description Language"
    doi: 10.1051/epjconf/202125103068
    url: https://arxiv.org/abs/2103.02432
    year: 2021
    volume: 251
    pages: 03068
    journal: "EPJ Web Conf."

GitHub Events

Total
  • Create event: 25
  • Release event: 9
  • Issues event: 49
  • Watch event: 2
  • Delete event: 16
  • Issue comment event: 42
  • Push event: 84
  • Pull request review comment event: 4
  • Pull request review event: 5
  • Pull request event: 45
  • Fork event: 2
Last Year
  • Create event: 25
  • Release event: 9
  • Issues event: 49
  • Watch event: 2
  • Delete event: 16
  • Issue comment event: 42
  • Push event: 84
  • Pull request review comment event: 4
  • Pull request review event: 5
  • Pull request event: 45
  • Fork event: 2

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 162
  • Total Committers: 6
  • Avg Commits per committer: 27.0
  • Development Distribution Score (DDS): 0.444
Top Committers
Name Email Commits
Gordon Watts g****n@g****t 90
Gordon Watts g****s@u****u 60
Matthew Feickert m****t@c****h 8
Mason Proffitt m****t@c****h 2
Baidyanath Kundu k****9@g****m 1
David Liu d****u@u****u 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 92
  • Total pull requests: 107
  • Average time to close issues: 5 months
  • Average time to close pull requests: 18 days
  • Total issue authors: 5
  • Total pull request authors: 6
  • Average comments per issue: 0.7
  • Average comments per pull request: 1.07
  • Merged pull requests: 93
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 26
  • Pull requests: 45
  • Average time to close issues: 16 days
  • Average time to close pull requests: 10 days
  • Issue authors: 3
  • Pull request authors: 3
  • Average comments per issue: 0.23
  • Average comments per pull request: 1.16
  • Merged pull requests: 38
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • gordonwatts (78)
  • masonproffitt (10)
  • alexander-held (2)
  • matthewfeickert (2)
  • BenGalewsky (1)
Pull Request Authors
  • gordonwatts (93)
  • matthewfeickert (11)
  • sudo-panda (2)
  • BenGalewsky (1)
  • RogerJanusiak (1)
  • masonproffitt (1)
Top Labels
Issue Labels
bug (39) enhancement (38) documentation (4) wontfix (1) good first issue (1)
Pull Request Labels
enhancement (41) bug (30) documentation (3) codex (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 409 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 2
  • Total versions: 54
  • Total maintainers: 1
pypi.org: func-adl.ast

Functional Analysis Description Language - Backend AST Manipulation Packages

  • Versions: 54
  • Dependent Packages: 1
  • Dependent Repositories: 2
  • Downloads: 409 Last month
Rankings
Dependent packages count: 4.7%
Dependent repos count: 11.6%
Downloads: 12.6%
Average: 13.0%
Forks count: 16.9%
Stargazers count: 19.3%
Maintainers (1)
Last synced: 7 months ago

Dependencies

setup.py pypi
  • make-it-sync *
.github/workflows/ci.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v1 composite
.github/workflows/pypi.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • pypa/gh-action-pypi-publish v1.3.1 composite
pyproject.toml pypi