https://github.com/adamrtalbot/tf-azure-batch-nextflow
A Terraform script for creating an Azure Batch pool compatible with Nextflow.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: adamrtalbot
- Language: HCL
- Default Branch: main
- Size: 82 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Azure Batch Pool for Nextflow
This Terraform configuration creates an Azure Batch pool optimized for running Nextflow workflows. The pool runs Ubuntu nodes with Docker container support and installs the tools needed for Nextflow execution.
Description
This module creates an Azure Batch pool with:
- Docker-compatible nodes (Microsoft DSVM Ubuntu 22.04 LTS image by default)
- Automatic scaling based on pending tasks, evaluated every 5 minutes
- Maximum tasks per node set to the VM's CPU core count
- azcopy installed by the pool's start task for efficient data transfer
- An auto-scaling formula that:
  - Deploys 1 node initially
  - Scales according to the number of pending tasks
  - Scales down to 50% of the current size when idle
  - Respects the maximum pool size limit
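The scaling behaviour described above could be expressed with an auto_scale block inside the azurerm_batch_pool resource. The fragment below is a sketch under assumptions, not the module's actual formula: it follows the style of Nextflow's own Azure Batch autoscale formula, reuses the min_pool_size and max_pool_size inputs listed under Inputs, and leaves out the "1 node initially" rule, which would need the pool creation time.

```terraform
# Sketch only: an auto_scale block for an azurerm_batch_pool resource (azurerm ~> 3.0).
# var.min_pool_size and var.max_pool_size are the module's inputs; the formula
# text is an illustrative assumption, not the module's exact formula.
auto_scale {
  # Re-evaluate the formula every 5 minutes (ISO 8601 duration).
  evaluation_interval = "PT5M"

  formula = <<-EOF
    interval = TimeInterval_Minute * 5;
    // Use the 5-minute average of pending tasks when enough samples exist,
    // otherwise fall back to the most recent sample.
    $samples = $PendingTasks.GetSamplePercent(interval);
    $tasks = $samples < 70 ? max(0, $PendingTasks.GetSample(1)) : max($PendingTasks.GetSample(1), avg($PendingTasks.GetSample(interval)));
    // With pending work, target one node per task; when idle, halve the current target.
    $targetVMs = $tasks > 0 ? $tasks : max(0, $TargetDedicatedNodes / 2);
    // Clamp between the configured minimum and maximum pool sizes.
    $TargetDedicatedNodes = max(${var.min_pool_size}, min($targetVMs, ${var.max_pool_size}));
    // Let running tasks finish before a node is removed.
    $NodeDeallocationOption = taskcompletion;
  EOF
}
```

Because each node can run several tasks at once, a production formula might divide $tasks by the number of task slots per node; that refinement is left out of this sketch.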
Usage
Create a terraform.tfvars file with your variables:
Minimal example
Here is a minimal example of the terraform.tfvars file:
```terraform
# Minimal example
resource_group_name = "mybatchaccountresourcegroup"
batch_account_name  = "mybatchaccount"
batch_pool_name     = "mypool"
```
If you want to add the compute pool to Seqera Platform, you can set the following variables:
```terraform
create_seqera_compute_env = true
seqera_api_endpoint       = "https://cloud.your-seqera.io/api"
seqera_access_token       = "eyJYOURACCESSTOKENHERE="
seqera_workspace_id       = "1234567890"
seqera_work_dir           = "az://azure-blob-container-name"
seqera_credentials_name   = "azure-creds"
```
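Because seqera_access_token is a credential, you may prefer not to store it in terraform.tfvars. Terraform reads any input variable from an environment variable named TF_VAR_ followed by the variable name, so exporting TF_VAR_seqera_access_token before running terraform apply works without a tfvars entry. If you maintain your own copy of the configuration, a declaration along these lines (a hypothetical sketch; the module's variables.tf may differ) also keeps the token out of plan and apply output:

```terraform
# Hypothetical declaration sketch; the module's own variables.tf may differ.
variable "seqera_access_token" {
  description = "Seqera API access token which must be generated from the Seqera Platform UI."
  type        = string
  default     = null

  # Redacts the value in plan and apply output (Terraform >= 0.14).
  sensitive = true
}
```

Values marked sensitive are still written to the Terraform state in plain text, so the state backend needs the same protection as the token itself.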
Full example
Here is a more complete example of the terraform.tfvars file, which also:
- Uses a smaller VM size
- Adds a managed identity to the pool for Entra authentication
- Allows the pool to access a private container registry
- Attaches the compute pool to a specific subnet
- Installs a more recent version of azcopy from Microsoft
- Adds the compute pool to Seqera Platform
- Uses the autopool feature to allow Nextflow to create pools dynamically
- Adds pre-run and post-run scripts to the Seqera compute environment
```terraform
# Required Azure details
resource_group_name = "myresourcegroup"
batch_account_name  = "mybatchaccount"

# Required Batch Pool details
batch_pool_name = "mypool"
vm_size         = "Standard_E2d_v5"
min_pool_size   = 1
max_pool_size   = 2

# Required VM image configuration
vm_image_publisher = "microsoft-dsvm"
vm_image_offer     = "ubuntu-hpc"
vm_image_sku       = "2204"
vm_image_version   = "latest"
node_agent_sku_id  = "batch.node.ubuntu 22.04"

# Start task configuration, used to install the most recent version of azcopy
start_task_command_line = "bash -c \"tar -xzvf azcopy.tar.gz && chmod +x azcopy*/azcopy && mkdir -p $AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy*/azcopy $AZ_BATCH_NODE_SHARED_DIR/bin/\""
start_task_resource_files = [
  {
    url      = "https://github.com/Azure/azure-storage-azcopy/releases/download/v10.28.1/azcopy_linux_amd64_10.28.1.tar.gz"
    filepath = "azcopy.tar.gz"
  }
]
start_task_elevation_level = "NonAdmin"
start_task_scope           = "Pool"

# Optional networking configuration
subnet_id = "/subscriptions/<subscription_id>/resourceGroups/

# Optional managed identity configuration
managed_identity_name           = "managed-identity-name"
managed_identity_resource_group = "managed-identity-resource-group"

# Optional container registries configuration
# Each registry can use one of:
# 1) username AND password
# 2) identity_id
# 3) use_managed_identity = true (the pool's managed identity is used)
container_registries = [
  {
    registry_server = "my-registry-server-1.azurecr.io"
    username        = "my-username"
    password        = "my-password"
  },
  {
    registry_server      = "my-registry-server-2.azurecr.io"
    use_managed_identity = true
  }
]
```
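For orientation, the start task inputs in the full example would typically be translated into a start_task block of the azurerm_batch_pool resource, with one resource_file block per entry in start_task_resource_files. The fragment below is a sketch under assumptions: the variable names match the module's inputs, but the wiring, including wait_for_success, is illustrative and the module's main.tf may differ.

```terraform
# Sketch: start_task block inside an azurerm_batch_pool resource (azurerm ~> 3.0).
# Variable names match the module's inputs; the wiring itself is an assumption.
start_task {
  command_line = var.start_task_command_line

  # Assumed: do not schedule tasks on a node until its start task succeeds.
  wait_for_success = true

  user_identity {
    auto_user {
      elevation_level = var.start_task_elevation_level # "NonAdmin" or "Admin"
      scope           = var.start_task_scope           # "Pool" or "Task"
    }
  }

  # One resource_file block per list entry, e.g. the azcopy archive that is
  # downloaded into the start task's working directory as azcopy.tar.gz.
  dynamic "resource_file" {
    for_each = var.start_task_resource_files
    content {
      http_url  = resource_file.value.url
      file_path = resource_file.value.filepath
    }
  }
}
```

Downloading the archive as a resource file keeps the command line simple: by the time it runs, azcopy.tar.gz already sits in the working directory, so the command only needs to extract it and copy the binary into $AZ_BATCH_NODE_SHARED_DIR/bin/.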
You can configure additional Seqera Platform compute environment settings with these variables:
```terraform
create_seqera_compute_env = true
seqera_api_endpoint       = "https://cloud.your-seqera.io/api"
seqera_access_token       = "eyJYOURACCESSTOKENHERE="
seqera_workspace_id       = "1234567890"
seqera_work_dir           = "az://azure-blob-container-name"
seqera_credentials_name   = "azure-creds"

seqera_pre_run_script = <<-EOT
  echo 'Hello, world!'
EOT

seqera_post_run_script = <<-EOT
  echo 'Goodbye, world!'
EOT

seqera_nextflow_config = <<-EOT
  process.queue = "auto"
  process.machineType = "Standard_D*d_v5,Standard_E*d_v5"
  azure.batch.allowPoolCreation = true
  azure.batch.autoPoolMode = true
  azure.batch.pools.auto.autoScale = true
  azure.batch.pools.auto.vmCount = 0
  azure.batch.pools.auto.maxVmCount = 12
  azure.batch.pools.auto.lowPriority = true
  azure.batch.pools.auto.virtualNetwork = "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Network/virtualNetworks/<vnet_name>/subnets/<subnet_name>"
EOT
```
Run terraform init and then terraform apply to create the Batch pool. Once the apply completes, the pool appears under your Batch account in the Azure portal.
[!NOTE] For multi-line strings such as seqera_pre_run_script, seqera_post_run_script, and seqera_nextflow_config, use heredoc syntax (<<-EOT and EOT) as shown in the example above. See the Terraform documentation on heredoc strings for more information.
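The dash in <<-EOT matters. The hypothetical locals below illustrate the difference: the indented form removes the indentation shared by the content lines, so the script can be indented to match the surrounding code, whereas a plain <<EOT heredoc keeps every leading space, so its content has to start in the first column to produce the same script.

```terraform
# Hypothetical locals, for illustration only.
locals {
  # Indented heredoc: the common leading indentation of the content is removed.
  indented = <<-EOT
    echo 'Hello, world!'
  EOT

  # Plain heredoc: content is taken verbatim, including any leading spaces,
  # so it starts in the first column here.
  plain = <<EOT
echo 'Hello, world!'
EOT
}
```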
Requirements
| Name | Version |
|------|---------|
| terraform | >= 1.0 |
| azurerm | ~> 3.0 |
| restapi | ~> 1.18 |
Providers
| Name | Version |
|------|---------|
| azurerm | 3.117.0 |
| restapi | 1.20.0 |
| terraform | n/a |
Modules
No modules.
Resources
| Name | Type |
|------|------|
| azurerm_batch_pool.pool | resource |
| restapi_object.seqera_compute_env | resource |
| terraform_data.compute_env_name | resource |
| terraform_data.credentials_id | resource |
| terraform_data.managed_identity_id | resource |
| terraform_data.nextflow_config | resource |
| terraform_data.post_run_script | resource |
| terraform_data.pre_run_script | resource |
Inputs
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| batch_account_name | Name of the existing Batch account | string | n/a | yes |
| batch_pool_name | Name of the Batch pool to be created | string | n/a | yes |
| container_registries | List of container registries to be used in the Batch pool's container configuration. For each registry, provide a username and password, an identity_id, or set use_managed_identity to true. When use_managed_identity is true, the pool's managed identity is used. | list(object({ registry_server = string, username = optional(string), password = optional(string), identity_id = optional(string), use_managed_identity = optional(bool, false) })) | [] | no |
| create_seqera_compute_env | Whether to create a Seqera compute environment | bool | false | no |
| managed_identity_name | Name of the managed identity to use with Azure Batch | string | "nextflow-id" | no |
| managed_identity_resource_group | Resource group containing the managed identity | string | null | no |
| max_pool_size | Maximum number of VMs in the pool | number | 8 | no |
| min_pool_size | Minimum number of VMs in the pool | number | 0 | no |
| node_agent_sku_id | SKU of the node agent. Must be compatible with the VM image | string | "batch.node.ubuntu 22.04" | no |
| resource_group_name | Name of the resource group of the Azure Batch account | string | n/a | yes |
| seqera_access_token | Seqera API access token which must be generated from the Seqera Platform UI. | string | null | no |
| seqera_api_endpoint | Seqera API endpoint URL. | string | "https://api.cloud.seqera.io" | no |
| seqera_compute_env_name | Name of the Seqera compute environment. Defaults to batch_pool_name if not specified | string | null | no |
| seqera_credentials_name | Name of the credentials in the workspace | string | null | no |
| seqera_nextflow_config | Optional Nextflow config content to be used in the compute environment. Can be a multi-line string using heredoc syntax. | string | null | no |
| seqera_post_run_script | Optional script to run after each task execution. Can be a multi-line string using heredoc syntax. | string | null | no |
| seqera_pre_run_script | Optional script to run before each task execution. Can be a multi-line string using heredoc syntax. | string | null | no |
| seqera_work_dir | Work directory for the Seqera compute environment, typically an Azure Blob Storage container. Must start with 'az://' | string | null | no |
| seqera_workspace_id | Seqera workspace ID where the compute environment will be created. It can be found in the list of workspaces within an organization on Seqera Platform. | number | null | no |
| start_task_command_line | Command line to run on the start task | string | "bash -c \"tar -xzvf azcopy.tar.gz && chmod +x azcopy*/azcopy && mkdir -p $AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy*/azcopy $AZ_BATCH_NODE_SHARED_DIR/bin/\"" | no |
| start_task_elevation_level | Elevation level for the start task | string | "NonAdmin" | no |
| start_task_resource_files | Resource files to download for the start task (defaults to the azcopy binary) | list(object({ url = string, filepath = string })) | [{ "filepath": "azcopy", "url": "https://nf-xpack.seqera.io/azcopy/linux_amd64_10.8.0/azcopy" }] | no |
| start_task_scope | Scope for the start task | string | "Pool" | no |
| subnet_id | Optional ID of the subnet to connect the pool to | string | null | no |
| vm_image_offer | Offer of the VM image | string | "ubuntu-hpc" | no |
| vm_image_publisher | Publisher of the VM image | string | "microsoft-dsvm" | no |
| vm_image_sku | SKU of the VM image | string | "2204" | no |
| vm_image_version | Version of the VM image | string | "latest" | no |
| vm_size | Size of the VM to use in the Batch pool | string | "Standard_E16d_v5" | no |
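Two of the constraints in the table above, the az:// prefix for seqera_work_dir and the credential options for container_registries, could be enforced at plan time with validation blocks. The declarations below are a hypothetical sketch; the module's variables.tf may not validate these inputs this way. Note that optional() attribute defaults require Terraform 1.3 or later.

```terraform
# Hypothetical validation sketch; the module's own variables.tf may differ.
variable "seqera_work_dir" {
  description = "Work directory for the Seqera compute environment. Must start with 'az://'."
  type        = string
  default     = null

  validation {
    # A null value passes through the first test; can() turns any regex
    # error or non-match into false.
    condition     = var.seqera_work_dir == null || can(regex("^az://", var.seqera_work_dir))
    error_message = "The seqera_work_dir value must start with \"az://\"."
  }
}

variable "container_registries" {
  description = "Container registries for the Batch pool's container configuration."
  type = list(object({
    registry_server      = string
    username             = optional(string)
    password             = optional(string)
    identity_id          = optional(string)
    use_managed_identity = optional(bool, false)
  }))
  default = []

  validation {
    # Each registry needs username+password, an identity_id, or use_managed_identity = true.
    condition = alltrue([
      for r in var.container_registries :
      (r.username != null && r.password != null) || r.identity_id != null || r.use_managed_identity
    ])
    error_message = "Each container registry needs a username and password, an identity_id, or use_managed_identity = true."
  }
}
```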
Outputs
| Name | Description |
|------|-------------|
| batch_pool_id | The ID of the Azure Batch pool |
| batch_pool_name | The name of the Azure Batch pool |
| credentials_id | The ID of the credentials |
| managed_identity_client_id | The client ID of the managed identity |
| seqera_compute_env_id | The ID of the Seqera Platform (formerly Tower) compute environment |
Owner
- Name: Adam Talbot
- Login: adamrtalbot
- Kind: user
- Location: Warwick, UK
- Company: @seqeralabs
- Twitter: adamrtalbot
- Repositories: 48
- Profile: https://github.com/adamrtalbot
Bioinformatics Engineer at @seqeralabs
GitHub Events
Total
- Issues event: 5
- Delete event: 3
- Push event: 16
- Pull request review event: 5
- Pull request review comment event: 2
- Pull request event: 15
- Create event: 9
Last Year
- Issues event: 5
- Delete event: 3
- Push event: 16
- Pull request review event: 5
- Pull request review comment event: 2
- Pull request event: 15
- Create event: 9
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 3
- Total pull requests: 9
- Average time to close issues: 6 days
- Average time to close pull requests: 13 days
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 9
- Average time to close issues: 6 days
- Average time to close pull requests: 13 days
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- adamrtalbot (3)
Pull Request Authors
- adamrtalbot (9)