https://github.com/adamrtalbot/tf-azure-batch-nextflow

A Terraform script for creating an Azure Batch pool compatible with Nextflow.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: adamrtalbot
  • Language: HCL
  • Default Branch: main
  • Size: 82 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme

README.md

Azure Batch Pool for Nextflow

This Terraform configuration creates an Azure Batch pool optimized for running Nextflow workflows. The pool is configured with Ubuntu containers and includes the necessary tools for Nextflow execution.

Description

This module creates an Azure Batch pool with:

  • Docker-compatible nodes (Microsoft DSVM Ubuntu 22.04 LTS by default)
  • Automatic scaling based on pending tasks with a 5-minute evaluation interval
  • Maximum tasks per node set to match the VM's CPU core count
  • Pre-installed azcopy for efficient data transfer using the startTask
  • Auto-scaling formula that:
    • Deploys 1 node initially
    • Scales based on pending tasks
    • Scales down to 50% when idle
    • Respects maximum pool size limit
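
The scaling behaviour listed above can be sketched as an Azure Batch autoscale formula held in a Terraform local. This is an illustrative formula written to match the bullet points, not necessarily the one shipped in this repository. The `pool_created_at` variable, its placeholder default, and the `autoscale_formula` local are hypothetical names introduced for this sketch.

```terraform
# Hypothetical inputs for this sketch only.
variable "max_pool_size" {
  type    = number
  default = 8
}

variable "pool_created_at" {
  type        = string
  description = "Pool creation time as an ISO 8601 timestamp (hypothetical helper variable)"
  default     = "2025-01-01T00:00:00Z" # placeholder value
}

locals {
  # Terraform only interpolates "${...}"; names such as $PendingTasks are
  # passed through literally to the Azure Batch formula.
  autoscale_formula = <<-EOT
    // Time since the pool was created, and the 5-minute evaluation window.
    lifespan = time() - time("${var.pool_created_at}");
    interval = TimeInterval_Minute * 5;

    // Use the latest sample when history is sparse, otherwise the larger of latest and average.
    $samples = $PendingTasks.GetSamplePercent(interval);
    $tasks = $samples < 70 ? max(0, $PendingTasks.GetSample(1)) : max($PendingTasks.GetSample(1), avg($PendingTasks.GetSample(interval)));

    // With no pending tasks, halve the current target; never exceed the maximum pool size.
    $targetVMs = $tasks > 0 ? $tasks : max(0, $TargetDedicatedNodes / 2);
    targetPoolSize = max(0, min($targetVMs, ${var.max_pool_size}));

    // Deploy 1 node during the first interval, then follow the computed target.
    $TargetDedicatedNodes = lifespan < interval ? 1 : targetPoolSize;
    $NodeDeallocationOption = taskcompletion;
  EOT
}

output "autoscale_formula" {
  value = local.autoscale_formula
}
```

A string like this would typically be passed to the `formula` argument of the `auto_scale` block on `azurerm_batch_pool`, alongside `evaluation_interval = "PT5M"` for the 5-minute interval.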

Usage

Create a terraform.tfvars file with your variables:

Minimal example

Here is a minimal example of the terraform.tfvars file:

```terraform
# Minimal example
resource_group_name = "mybatchaccountresourcegroup"
batch_account_name  = "mybatchaccount"
batch_pool_name     = "mypool"
```
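
Each key in `terraform.tfvars` has to match a declared input variable. As a rough sketch, the three required inputs could be declared as follows, using the names, types, and descriptions from the Inputs table; the repository's actual `variables.tf` may be laid out differently.

```terraform
# Sketch of the three required inputs (no default, so a value must be supplied).
variable "resource_group_name" {
  type        = string
  description = "Name of the resource group of the Azure Batch account"
}

variable "batch_account_name" {
  type        = string
  description = "Name of the existing Batch account"
}

variable "batch_pool_name" {
  type        = string
  description = "Name of the Batch pool to be created"
}
```

Because these variables have no default, Terraform prompts for them (or fails in non-interactive runs) if they are missing from `terraform.tfvars`.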

If you want to add the compute pool to Seqera Platform, you can set the following variables:

```terraform
create_seqera_compute_env = true
seqera_api_endpoint       = "https://cloud.your-seqera.io/api"
seqera_access_token       = "eyJYOURACCESSTOKENHERE="
seqera_workspace_id       = "1234567890"
seqera_work_dir           = "az://azure-blob-container-name"
seqera_credentials_name   = "azure-creds"
```

Full example

Here is a more complete example of the terraform.tfvars file, which also:

  • Uses a smaller VM size
  • Adds a managed identity to the pool for Entra authentication
  • Allows the pool to access a private container registry
  • Attaches the compute pool to a specific subnet
  • Installs a more recent version of azcopy from Microsoft
  • Adds the compute pool to Seqera Platform
  • Uses the autopool feature to allow Nextflow to create pools dynamically
  • Adds pre-run and post-run scripts to the compute pool

```terraform
# Required Azure details
resource_group_name = "myresourcegroup"
batch_account_name  = "mybatchaccount"

# Required Batch Pool details
batch_pool_name = "mypool"
vm_size         = "Standard_E2d_v5"
min_pool_size   = 1
max_pool_size   = 2

# Required VM image configuration
vm_image_publisher = "microsoft-dsvm"
vm_image_offer     = "ubuntu-hpc"
vm_image_sku       = "2204"
vm_image_version   = "latest"
node_agent_sku_id  = "batch.node.ubuntu 22.04"

# Start task configuration, use to install the most recent version of azcopy
start_task_command_line = "bash -c \"tar -xzvf azcopy.tar.gz && chmod +x azcopy*/azcopy && mkdir -p $AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy*/azcopy $AZ_BATCH_NODE_SHARED_DIR/bin/\""
start_task_resource_files = [
  {
    url       = "https://github.com/Azure/azure-storage-azcopy/releases/download/v10.28.1/azcopy_linux_amd64_10.28.1.tar.gz"
    file_path = "azcopy.tar.gz"
  }
]
start_task_elevation_level = "NonAdmin"
start_task_scope           = "Pool"

# Optional networking configuration
subnet_id = "/subscriptions/<subscription_id>/resourceGroups//providers/Microsoft.Network/virtualNetworks//subnets/"

# Optional managed identity configuration
managed_identity_name           = "managed-identity-name"
managed_identity_resource_group = "managed-identity-resource-group"

# Optional container registries configuration
# Can use:
# 1) user_name AND password
# 2) identity_id
# 3) use_managed_identity = true (pool's managed identity will be used)
container_registries = [
  {
    registry_server = "my-registry-server-1.azurecr.io"
    user_name       = "my-username"
    password        = "my-password"
  },
  {
    registry_server      = "my-registry-server-2.azurecr.io"
    use_managed_identity = true
  }
]
```
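
The three authentication options for a registry are mutually alternative, which is the kind of rule a variable `validation` block can check at plan time. The following is a minimal sketch under the assumption that the variable has the shape shown in the Inputs table; it is not taken from the repository, and it needs Terraform 1.3 or later because of `optional()` with a default.

```terraform
variable "container_registries" {
  type = list(object({
    registry_server      = string
    user_name            = optional(string)
    password             = optional(string)
    identity_id          = optional(string)
    use_managed_identity = optional(bool, false)
  }))
  default = []

  # Hypothetical check: every registry must supply at least one way to authenticate.
  validation {
    condition = alltrue([
      for r in var.container_registries :
      (r.user_name != null && r.password != null) || r.identity_id != null || r.use_managed_identity
    ])
    error_message = "Each registry needs user_name and password, identity_id, or use_managed_identity = true."
  }
}
```

With this in place, an entry that only sets `registry_server` is rejected during `terraform plan` rather than failing later when the pool tries to pull an image.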

You can configure additional Seqera Platform compute environment settings with these variables:

```terraform
create_seqera_compute_env = true
seqera_api_endpoint       = "https://cloud.your-seqera.io/api"
seqera_access_token       = "eyJYOURACCESSTOKENHERE="
seqera_workspace_id       = "1234567890"
seqera_work_dir           = "az://azure-blob-container-name"
seqera_credentials_name   = "azure-creds"
seqera_pre_run_script     = <<-EOT
  echo 'Hello, world!'
EOT
seqera_post_run_script    = <<-EOT
  echo 'Goodbye, world!'
EOT
seqera_nextflow_config    = <<-EOT
  process.queue = "auto"
  process.machineType = "Standard_D*d_v5,Standard_E*d_v5"
  azure.batch.allowPoolCreation = true
  azure.batch.autoPoolMode = true
  azure.batch.pools.auto.autoScale = true
  azure.batch.pools.auto.vmCount = 0
  azure.batch.pools.auto.maxVmCount = 12
  azure.batch.pools.auto.lowPriority = true
  azure.batch.pools.auto.virtualNetwork = "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Network/virtualNetworks/<vnet_name>/subnets/<subnet_name>"
EOT
```

Run terraform init and terraform apply to create the Batch pool. You should see the pool created in the Azure portal.

> [!NOTE]
> For multi-line strings like seqera_pre_run_script, seqera_post_run_script, and seqera_nextflow_config, you must use heredoc syntax (<<-EOT and EOT) as shown in the example above. See the Terraform documentation on heredoc strings for more information.
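
To illustrate the indented heredoc form mentioned in the note, here is a small standalone sketch. The local name `pre_run_script` and the script contents are made up for the example.

```terraform
locals {
  # "<<-EOT" strips the leading whitespace shared by all lines, so the body
  # can be indented to match the surrounding configuration.
  pre_run_script = <<-EOT
    echo 'Hello, world!'
    echo 'Starting task'
  EOT
}

output "pre_run_script" {
  # The resulting string has no leading spaces on either line.
  value = local.pre_run_script
}
```

With the plain `<<EOT` form, the indentation would instead be kept as part of the string.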

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0 |
| azurerm | ~> 3.0 |
| restapi | ~> 1.18 |
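
Version constraints like these are normally declared in a `terraform` block. A sketch of what that could look like is below. The provider source addresses are assumptions: `hashicorp/azurerm` is the usual source for azurerm, and `Mastercard/restapi` is assumed to be the restapi provider in use.

```terraform
terraform {
  required_version = ">= 1.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # any 3.x release
    }
    restapi = {
      # Assumption: the community provider published as Mastercard/restapi.
      source  = "Mastercard/restapi"
      version = "~> 1.18" # 1.18 or later, below 2.0
    }
  }
}
```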

Providers

| Name | Version |
|------|---------|
| azurerm | 3.117.0 |
| restapi | 1.20.0 |
| terraform | n/a |

Modules

No modules.

Resources

| Name | Type |
|------|------|
| azurerm_batch_pool.pool | resource |
| restapi_object.seqera_compute_env | resource |
| terraform_data.compute_env_name | resource |
| terraform_data.credentials_id | resource |
| terraform_data.managed_identity_id | resource |
| terraform_data.nextflow_config | resource |
| terraform_data.post_run_script | resource |
| terraform_data.pre_run_script | resource |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| batch_account_name | Name of the existing Batch account | `string` | n/a | yes |
| batch_pool_name | Name of the Batch pool to be created | `string` | n/a | yes |
| container_registries | List of container registries to be used in the Batch pool's container configuration. For each registry, provide either user_name and password, OR set use_managed_identity to true. When use_managed_identity is true, the pool's managed identity will be used. | `list(object({ registry_server = string, user_name = optional(string), password = optional(string), identity_id = optional(string), use_managed_identity = optional(bool, false) }))` | `[]` | no |
| create_seqera_compute_env | Whether to create a Seqera compute environment | `bool` | `false` | no |
| managed_identity_name | Name of the managed identity to use with Azure Batch | `string` | `"nextflow-id"` | no |
| managed_identity_resource_group | Resource group containing the managed identity | `string` | `null` | no |
| max_pool_size | Maximum number of VMs in the pool | `number` | `8` | no |
| min_pool_size | Minimum number of VMs in the pool | `number` | `0` | no |
| node_agent_sku_id | SKU of the node agent. Must be compatible with the VM image | `string` | `"batch.node.ubuntu 22.04"` | no |
| resource_group_name | Name of the resource group of the Azure Batch account | `string` | n/a | yes |
| seqera_access_token | Seqera API access token which must be generated from the Seqera Platform UI. | `string` | `null` | no |
| seqera_api_endpoint | Seqera API endpoint URL. | `string` | `"https://api.cloud.seqera.io"` | no |
| seqera_compute_env_name | Name of the Seqera compute environment. Defaults to batch_pool_name if not specified | `string` | `null` | no |
| seqera_credentials_name | Name of the credentials in the workspace | `string` | `null` | no |
| seqera_nextflow_config | Optional Nextflow config content to be used in the compute environment. Can be a multi-line string using heredoc syntax. | `string` | `null` | no |
| seqera_post_run_script | Optional script to run after each task execution. Can be a multi-line string using heredoc syntax. | `string` | `null` | no |
| seqera_pre_run_script | Optional script to run before each task execution. Can be a multi-line string using heredoc syntax. | `string` | `null` | no |
| seqera_work_dir | Work directory for the Seqera compute environment which is typically an Azure Blob Storage container. Must start with 'az://' | `string` | `null` | no |
| seqera_workspace_id | Seqera workspace ID where the compute environment will be created. It can be found by looking at the list of workspaces within an organization on the Seqera Platform. | `number` | `null` | no |
| start_task_command_line | Command line to run on the start task | `string` | `"bash -c \"tar -xzvf azcopy.tar.gz && chmod +x azcopy*/azcopy && mkdir -p $AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy*/azcopy $AZ_BATCH_NODE_SHARED_DIR/bin/\""` | no |
| start_task_elevation_level | Elevation level for the start task | `string` | `"NonAdmin"` | no |
| start_task_resource_files | URL to download azcopy binary | `list(object({ url = string, file_path = string }))` | `[ { "file_path": "azcopy", "url": "https://nf-xpack.seqera.io/azcopy/linux_amd64_10.8.0/azcopy" } ]` | no |
| start_task_scope | Scope for the start task | `string` | `"Pool"` | no |
| subnet_id | Optional ID of the subnet to connect the pool to | `string` | `null` | no |
| vm_image_offer | Offer of the VM image | `string` | `"ubuntu-hpc"` | no |
| vm_image_publisher | Publisher of the VM image | `string` | `"microsoft-dsvm"` | no |
| vm_image_sku | SKU of the VM image | `string` | `"2204"` | no |
| vm_image_version | Version of the VM image | `string` | `"latest"` | no |
| vm_size | Size of the VM to use in the Batch pool | `string` | `"Standard_E16d_v5"` | no |
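
The requirement that seqera_work_dir start with 'az://' could be enforced with a variable `validation` block. The sketch below shows one way to do that while still allowing the `null` default; it is an illustration, not necessarily how the repository implements the check.

```terraform
variable "seqera_work_dir" {
  type        = string
  default     = null
  description = "Work directory for the Seqera compute environment. Must start with 'az://'"

  validation {
    # can() turns a regex error (no match, or a null input) into false,
    # so the null default passes only through the first clause.
    condition     = var.seqera_work_dir == null || can(regex("^az://", var.seqera_work_dir))
    error_message = "The seqera_work_dir value must start with 'az://'."
  }
}
```

A value such as `"az://azure-blob-container-name"` passes, while `"s3://bucket"` is rejected at plan time.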

Outputs

| Name | Description |
|------|-------------|
| batch_pool_id | The ID of the Azure Batch pool |
| batch_pool_name | The name of the Azure Batch pool |
| credentials_id | The ID of the credentials |
| managed_identity_client_id | The client ID of the managed identity |
| seqera_compute_env_id | The ID of the Tower compute environment |

<!-- END_TF_DOCS -->

Owner

  • Name: Adam Talbot
  • Login: adamrtalbot
  • Kind: user
  • Location: Warwick, UK
  • Company: @seqeralabs

Bioinformatics Engineer at @seqeralabs

GitHub Events

Total
  • Issues event: 5
  • Delete event: 3
  • Push event: 16
  • Pull request review event: 5
  • Pull request review comment event: 2
  • Pull request event: 15
  • Create event: 9
Last Year
  • Issues event: 5
  • Delete event: 3
  • Push event: 16
  • Pull request review event: 5
  • Pull request review comment event: 2
  • Pull request event: 15
  • Create event: 9

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 3
  • Total pull requests: 9
  • Average time to close issues: 6 days
  • Average time to close pull requests: 13 days
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 9
  • Average time to close issues: 6 days
  • Average time to close pull requests: 13 days
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • adamrtalbot (3)
Pull Request Authors
  • adamrtalbot (9)
Top Labels
Issue Labels
Pull Request Labels