Configuration Reference
All plugin settings live in conf/<env>/azureml.yml. The file is parsed into KedroAzureMLConfig. For dataset configuration in catalog.yml, see the Datasets reference.
Top-level structure
workspace: # required
compute: # required
execution: # optional
schedules: # optional
jobs: # optional
workspace
Named Azure ML workspace definitions. A __default__ entry is required.
workspace:
  __default__:
    subscription_id: "00000000-0000-0000-0000-000000000000"
    resource_group: "rg-dev"
    name: "aml-dev"
  prod:
    subscription_id: "11111111-1111-1111-1111-111111111111"
    resource_group: "rg-prod"
    name: "aml-prod"
Each workspace entry (WorkspaceConfig) has the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| `subscription_id` | string | yes | Azure subscription ID |
| `resource_group` | string | yes | Azure resource group name |
| `name` | string | yes | Azure ML workspace name |
Jobs reference a workspace by name via their workspace field. The __default__ entry is used when a job does not specify a workspace. See Configure multiple workspaces for a walkthrough.
compute
Named compute cluster definitions. A __default__ entry is required.
Each compute entry (ClusterConfig) has the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| `cluster_name` | string | yes | Name of the Azure ML compute cluster |
Jobs reference a compute entry by name via their compute field.
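As a minimal sketch of the shape described above, a compute section with the required __default__ entry and one additional named entry might look like the following. The entry name highmem and both cluster names are hypothetical placeholders, not values defined by the plugin.

```yaml
compute:
  __default__:
    cluster_name: "cpu-cluster"      # hypothetical cluster name
  highmem:                           # hypothetical entry name
    cluster_name: "highmem-cluster"  # hypothetical cluster name
```

A job would then select the second entry with compute: "highmem", and any job that omits compute would use __default__.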
Tag-based routing
Kedro node tags can route nodes to specific compute clusters. When a node has a tag that matches a named compute entry, that entry is merged with __default__.
For example, if compute defines an entry named gpu whose cluster_name is gpu-cluster, a node tagged gpu in your Kedro pipeline runs on gpu-cluster. Nodes without a matching tag fall back to __default__. Fields from the tagged entry override the corresponding __default__ fields.
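The routing described above could be configured roughly as follows. The gpu entry and the gpu-cluster name follow the prose; the __default__ cluster name is a hypothetical placeholder.

```yaml
compute:
  __default__:
    cluster_name: "cpu-cluster"  # hypothetical; used by nodes without a matching tag
  gpu:
    cluster_name: "gpu-cluster"  # overrides __default__ for nodes tagged "gpu"
```

No extra mapping between tags and entries is shown here because, per the description above, the match is made on the entry name itself.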
execution
Code packaging and container settings. All fields are optional.
| Field | Default | Description |
|---|---|---|
| `environment` | `null` | Azure ML environment name (e.g. `my-env@latest` or `my-env:3`) |
| `code_directory` | `null` | Local directory to upload as a code snapshot; `null` disables code upload |
| `working_directory` | `null` | Working directory inside the compute container. Set this when your Azure ML environment expects code at a specific path (e.g. `/home/kedro`). When `null`, Azure ML uses its default working directory. |
The combination of environment and code_directory determines the deployment flow. When code_directory is set (e.g. "."), the plugin uploads a snapshot of your project and runs it inside the environment (code flow). When code_directory is null, the plugin expects the code to already be baked into the Docker image referenced by environment (image flow). See Deploy from CI/CD for guidance on choosing between the two.
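The two flows can be sketched side by side. The values below reuse the examples from the table above and are illustrative only; the image-flow variant is commented out because a file can contain only one execution block.

```yaml
# Code flow: upload the project directory and run it inside the environment.
execution:
  environment: "my-env@latest"
  code_directory: "."
  working_directory: "/home/kedro"  # optional; only if the environment expects this path

# Image flow (alternative): nothing is uploaded, the code is already in the image.
# execution:
#   environment: "my-env:3"
#   code_directory: null
```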
schedules
Reusable named schedule definitions. Jobs reference them by name.
Each schedule entry has exactly one of cron or recurrence. See Schedule pipelines for end-to-end setup.
cron
| Field | Default | Description |
|---|---|---|
| `expression` | required | Cron expression (e.g. `"0 2 * * *"`) |
| `time_zone` | `"UTC"` | IANA time zone name (e.g. `"Europe/London"`) |
| `start_time` | `null` | ISO 8601 start time |
| `end_time` | `null` | ISO 8601 end time |
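A cron-based schedule entry might look like the sketch below. The entry name nightly is hypothetical, and the nesting of the fields under a cron key is an assumption based on the statement that each entry has exactly one of cron or recurrence.

```yaml
schedules:
  nightly:                      # hypothetical schedule name
    cron:
      expression: "0 2 * * *"   # every day at 02:00
      time_zone: "Europe/London"
```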
recurrence
| Field | Default | Description |
|---|---|---|
| `frequency` | required | Recurrence unit: `"minute"`, `"hour"`, `"day"`, `"week"`, or `"month"` |
| `interval` | required | Number of frequency units between runs |
| `time_zone` | `"UTC"` | IANA time zone name |
| `start_time` | `null` | ISO 8601 start time |
| `end_time` | `null` | ISO 8601 end time |
| `schedule.hours` | `null` | Hours of the day to trigger |
| `schedule.minutes` | `null` | Minutes of the hour to trigger |
| `schedule.week_days` | `null` | Days of the week to trigger (e.g. `["Monday", "Friday"]`) |
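A recurrence-based entry could be sketched as follows. The name business_hours matches the schedule referenced in the jobs example; reading the dotted schedule.* field names as a nested schedule mapping, and passing hours and minutes as lists, are assumptions.

```yaml
schedules:
  business_hours:
    recurrence:
      frequency: "week"
      interval: 1               # every week
      time_zone: "Europe/London"
      schedule:                 # assumed nesting for the schedule.* fields
        hours: [9]
        minutes: [0]
        week_days: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
```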
jobs
Named job definitions. Each job maps a Kedro pipeline to an Azure ML pipeline submission.
jobs:
  training:
    pipeline:
      pipeline_name: "__default__"
      tags: ["training"]
    experiment_name: "training-experiment"
    display_name: "Daily training"
    compute: "gpu"
    workspace: "prod"
    description: "Run the training pipeline on GPU cluster"
    schedule: "business_hours"
    retry:
      max_retries: 3
      timeout: 3600
| Field | Default | Description |
|---|---|---|
| `pipeline` | required | Pipeline selection and filter options (see below) |
| `workspace` | `null` | Named workspace entry; falls back to `__default__` |
| `experiment_name` | `null` | Azure ML experiment name |
| `display_name` | `null` | Display name shown in Azure ML Studio |
| `compute` | `null` | Named compute entry; falls back to `__default__` |
| `schedule` | `null` | Inline ScheduleConfig, the name of an entry in `schedules`, or `null` for ad-hoc runs |
| `retry` | `null` | Retry settings applied to every step (see below) |
| `description` | `null` | Human-readable job description |
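Because schedule also accepts an inline definition, a job can carry its own schedule instead of referencing a named one. The sketch below assumes the inline form has the same shape as a named schedules entry; the job name is hypothetical.

```yaml
jobs:
  nightly_scoring:              # hypothetical job name
    pipeline:
      pipeline_name: "__default__"
    schedule:                   # inline definition instead of a schedule name
      cron:
        expression: "0 2 * * *"
```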
retry
Optional retry settings applied to every command step in the job. Maps to azure.ai.ml.entities.RetrySettings.
| Field | Default | Description |
|---|---|---|
| `max_retries` | required | Maximum number of retry attempts for failed steps (must be >= 1) |
| `timeout` | `null` | Per-attempt timeout in seconds, or `null` for no limit |
pipeline filter options
Apart from pipeline_name, which selects the registered Kedro pipeline, these fields correspond to the parameters of Kedro's Pipeline.filter() method.
| Field | Default | Description |
|---|---|---|
| `pipeline_name` | `"__default__"` | Kedro pipeline name |
| `from_nodes` | `null` | Start from these nodes |
| `to_nodes` | `null` | Run up to these nodes |
| `node_names` | `null` | Run only these specific nodes |
| `from_inputs` | `null` | Start from nodes that consume these datasets |
| `to_outputs` | `null` | Run up to nodes that produce these datasets |
| `node_namespaces` | `null` | Filter by namespace |
| `tags` | `null` | Filter by tag |
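As an illustration of combining filter options, the sketch below runs a slice of the default pipeline between two nodes. The job name and node names are hypothetical, and the list form is assumed by analogy with tags in the jobs example.

```yaml
jobs:
  feature_refresh:                     # hypothetical job name
    pipeline:
      pipeline_name: "__default__"
      from_nodes: ["clean_raw_data"]   # hypothetical node name
      to_nodes: ["build_features"]     # hypothetical node name
```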
Environment variables
The following environment variables are set automatically by the plugin during remote execution. They are reserved and should not be set directly.
| Variable | Set by | Description |
|---|---|---|
| `KEDRO_AZUREML_MLFLOW_ENABLED` | Pipeline generator | Set to `"1"` on each step during remote execution to activate MLflow integration |