https://github.com/alvarocavalcante/airflow-custom-deferrable-dataflow-operator
Start your Dataflow jobs execution directly from the Triggerer without going to the Worker!
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity: 12.5%)
Repository
Start your Dataflow jobs execution directly from the Triggerer without going to the Worker!
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Airflow Custom Deferrable Dataflow Operator
Trigger Different: Cut Your Airflow Costs By Starting From the Triggerer!
Use this simple Airflow operator to start your Dataflow job execution directly from the Triggerer, without going through the Worker!
How It Works
The main idea of this approach is to start the task instance execution directly on the Triggerer component, bypassing the worker entirely.
This strategy is effective because, in this case, the only action the operator performs is making an HTTP request to start the external processing service and waiting for the job to complete.
For this reason, we can leverage the async design of the Triggerer for this execution, significantly reducing resource consumption. With this architecture, the Airflow task execution process looks like the following:

[Diagram: task execution flow starting from the Triggerer instead of the Worker]
To learn more about how the tool works, check out the Medium article.
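As a complement, here is a minimal sketch of the Airflow 2.10+ start-from-trigger mechanism that an operator like this one builds on. The class name and trigger module path below are illustrative assumptions, not this package's actual source:

```python
from airflow.models.baseoperator import BaseOperator
from airflow.triggers.base import StartTriggerArgs


class StartFromTriggerOperator(BaseOperator):
    # Read by the scheduler: hand the task straight to the Triggerer,
    # skipping the worker slot entirely (requires Airflow >= 2.10).
    start_trigger_args = StartTriggerArgs(
        trigger_cls="my_project.triggers.DataflowStartTrigger",  # illustrative path
        trigger_kwargs={},
        next_method="execute_complete",
        next_kwargs=None,
        timeout=None,
    )
    start_from_trigger = True

    def __init__(self, *, trigger_kwargs=None, start_from_trigger=True, **kwargs):
        super().__init__(**kwargs)
        # These values are serialized and passed to the trigger on the Triggerer.
        self.start_trigger_args.trigger_kwargs = trigger_kwargs or {}
        self.start_from_trigger = start_from_trigger

    def execute_complete(self, context, event=None):
        # Only this callback runs on a worker, after the trigger's final event.
        return event
```

The trigger class referenced by `trigger_cls` is what actually runs on the Triggerer: it makes the asynchronous HTTP call that launches the Dataflow job and yields an event once the job reaches a terminal state.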
Installation
The installation process will depend on your cloud provider or how you have set up your environment.
Regarding Google Cloud Composer, for example, the DAGs folder is not synchronized with the Airflow Triggerer, as stated in the documentation.
Consequently, just uploading your code to the DAGs folder will not work, and you'll likely face an error like this: `ImportError: Module "PACKAGE_NAME" does not define a "CLASS_NAME" attribute/class`
In this case, it's necessary to import the missing code from PyPI, meaning that you'll need to install the operator/trigger as a new library.
To do so, you can use the following command:
```bash
pip install custom-deferrable-dataflow-operator
```
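On Cloud Composer specifically, packages are installed through the environment configuration rather than a plain `pip install` on the workers. A sketch using gcloud, where the environment name and location are placeholders you'd replace with your own:

```bash
# my-composer-env and us-central1 are placeholders for your environment.
# Installing via the environment config makes the package available to the
# Triggerer as well, not just the workers.
gcloud composer environments update my-composer-env \
  --location us-central1 \
  --update-pypi-package custom-deferrable-dataflow-operator==1.0.0
```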
Usage
After installing the library, you can successfully import and use the operator in your Airflow DAGs, as shown below:
```python
from deferrable_dataflow_operator import DeferrableDataflowOperator

dataflow_triggerer_job = DeferrableDataflowOperator(
    trigger_kwargs={
        "project_id": GCP_PROJECT_ID,
        "region": GCP_REGION,
        "body": {
            "job_name": MY_JOB_NAME,
            "parameters": {
                "dataflow-parameters": MY_PARAMETERS
            },
            "environment": {**dataflow_env_vars},
            "container_spec_gcs_path": TEMPLATE_GCS_PATH,
        }
    },
    start_from_trigger=True,
    task_id=MY_TASK_ID
)
```
In the `trigger_kwargs` parameter, it's important to specify your GCP project ID and region. The `body` parameter, on the other hand, should contain all the relevant information for your Dataflow job, as stated in the official documentation.
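To tie it together, here is a sketch of how the operator above might sit inside a DAG. The DAG ID, schedule, and all GCP values below are illustrative placeholders:

```python
from datetime import datetime

from airflow import DAG
from deferrable_dataflow_operator import DeferrableDataflowOperator

with DAG(
    dag_id="dataflow_from_triggerer",  # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    launch_job = DeferrableDataflowOperator(
        task_id="launch_dataflow_job",
        start_from_trigger=True,
        trigger_kwargs={
            "project_id": "my-gcp-project",  # placeholder
            "region": "us-central1",  # placeholder
            "body": {
                "job_name": "my-dataflow-job",
                "parameters": {"input": "gs://my-bucket/input"},
                "environment": {},
                "container_spec_gcs_path": "gs://my-bucket/templates/spec.json",
            },
        },
    )
```

Because `start_from_trigger=True`, scheduling this task consumes a Triggerer async slot instead of a worker slot for the entire duration of the Dataflow job.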
Contributing
This project is open to contributions! If you want to help improve the operator, please follow these steps:
- Open a new issue to discuss the feature or bug you want to address.
- Once approved, fork the repository and create a new branch.
- Implement the changes.
- Create a pull request with a detailed description of the changes.
Owner
- Name: Alvaro Leandro Cavalcante Carneiro
- Login: AlvaroCavalcante
- Kind: user
- Location: São Paulo, SP
- Company: Bank Master
- Website: https://alvarocavalcante.github.io
- Repositories: 6
- Profile: https://github.com/AlvaroCavalcante
Master's degree student and Data Engineer at Bank Master.
GitHub Events
Total
- Release event: 1
- Push event: 1
- Create event: 3
Last Year
- Release event: 1
- Push event: 1
- Create event: 3
Packages
- Total packages: 1
- Total downloads: 78 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 10
- Total maintainers: 1
pypi.org: custom-deferrable-dataflow-operator
Start your Dataflow jobs execution directly from the Triggerer without going to the Worker.
- Homepage: https://github.com/AlvaroCavalcante/airflow-custom-deferrable-dataflow-operator
- Documentation: https://custom-deferrable-dataflow-operator.readthedocs.io/
- License: Apache License 2.0
- Latest release: 1.0.0 (published 12 months ago)
Dependencies
- apache-airflow >=2.10.0
- google-cloud >=0.33.0