Configuration Reference¶

All plugin settings live in conf/<env>/azureml.yml. The file is parsed into KedroAzureMLConfig. For dataset configuration in catalog.yml, see the Datasets reference.

Top-level structure¶

workspace:             # required
compute:               # required
execution:             # optional
schedules:             # optional
jobs:                  # optional

`workspace`¶

Named Azure ML workspace definitions. A __default__ entry is required.

workspace:
  __default__:
    subscription_id: "00000000-0000-0000-0000-000000000000"
    resource_group: "rg-dev"
    name: "aml-dev"
  prod:
    subscription_id: "11111111-1111-1111-1111-111111111111"
    resource_group: "rg-prod"
    name: "aml-prod"

Each workspace entry (WorkspaceConfig) has the following fields:

Field	Type	Required	Description
`subscription_id`	string	yes	Azure subscription ID
`resource_group`	string	yes	Azure resource group name
`name`	string	yes	Azure ML workspace name

Jobs reference a workspace by name via their workspace field. The __default__ entry is used when a job does not specify a workspace. See Configure multiple workspaces for a walkthrough.

`compute`¶

Named compute cluster definitions. A __default__ entry is required.

compute:
  __default__:
    cluster_name: "cpu-cluster"
  gpu:
    cluster_name: "gpu-cluster"

Each compute entry (ClusterConfig) has the following fields:

Field	Type	Required	Description
`cluster_name`	string	yes	Name of the Azure ML compute cluster

Jobs reference a compute entry by name via their compute field.

Tag-based routing¶

Kedro node tags can route nodes to specific compute clusters. When a node has a tag that matches a named compute entry, that entry is merged with __default__:

compute:
  __default__:
    cluster_name: "cpu-cluster"
  gpu:
    cluster_name: "gpu-cluster"

A node tagged gpu in your Kedro pipeline will run on gpu-cluster. Nodes without a matching tag fall back to __default__. Fields from the tagged entry override __default__ fields.
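
The merge rule can be pictured with a minimal Python sketch. `resolve_compute` is a hypothetical helper, not part of the plugin's API, and it assumes that only the first matching tag (in sorted order) is applied when a node carries several matching tags, which this page does not specify.

```python
def resolve_compute(compute: dict, node_tags: set) -> dict:
    """Return the effective compute settings for a node (illustrative only)."""
    resolved = dict(compute["__default__"])  # start from the default entry
    for tag in sorted(node_tags):
        if tag != "__default__" and tag in compute:
            resolved.update(compute[tag])  # tagged fields override defaults
            break  # assumption: only the first matching tag is applied
    return resolved


compute = {
    "__default__": {"cluster_name": "cpu-cluster"},
    "gpu": {"cluster_name": "gpu-cluster"},
}

print(resolve_compute(compute, {"gpu", "training"}))  # the gpu entry applies
print(resolve_compute(compute, {"preprocessing"}))    # falls back to __default__
```

Because the tagged entry is layered over a copy of `__default__`, any field the tagged entry omits keeps its default value.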

`execution`¶

Code packaging and container settings. All fields are optional.

execution:
  environment: "my-env@latest"
  code_directory: "."
  working_directory: /home/kedro

Field	Default	Description
`environment`	`null`	Azure ML environment name (e.g. `my-env@latest` or `my-env:3`)
`code_directory`	`null`	Local directory to upload as a code snapshot; `null` disables code upload
`working_directory`	`null`	Working directory inside the compute container. Set this when your Azure ML environment expects code at a specific path (e.g. `/home/kedro`). When `null`, Azure ML uses its default working directory.

The combination of environment and code_directory determines the deployment flow. When code_directory is set (e.g. "."), the plugin uploads a snapshot of your project and runs it inside the environment (code flow). When code_directory is null, the plugin expects the code to already be baked into the Docker image referenced by environment (image flow). See Deploy from CI/CD for guidance on choosing between the two.

`schedules`¶

Reusable named schedule definitions. Jobs reference them by name.

schedules:
  business_hours:
    cron:
      expression: "0 9 * * 1-5"
      time_zone: "Europe/London"

Each schedule entry has exactly one of cron or recurrence. See Schedule pipelines for end-to-end setup.
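
The "exactly one of" rule can be expressed as a small check. This is a sketch only: `schedule_kind` is a hypothetical function, and the plugin's own validation may be implemented differently.

```python
def schedule_kind(entry: dict) -> str:
    """Return "cron" or "recurrence"; raise if the entry has neither or both."""
    present = [key for key in ("cron", "recurrence") if entry.get(key) is not None]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of cron/recurrence, got {present}")
    return present[0]


business_hours = {
    "cron": {"expression": "0 9 * * 1-5", "time_zone": "Europe/London"},
}
print(schedule_kind(business_hours))
```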

`cron`¶

Field	Default	Description
`expression`	required	Cron expression (e.g. `"0 2 * * *"`)
`time_zone`	`"UTC"`	IANA time zone name (e.g. `"Europe/London"`)
`start_time`	`null`	ISO 8601 start time
`end_time`	`null`	ISO 8601 end time

`recurrence`¶

Field	Default	Description
`frequency`	required	Recurrence unit: `"minute"`, `"hour"`, `"day"`, `"week"`, or `"month"`
`interval`	required	Number of frequency units between runs
`time_zone`	`"UTC"`	IANA time zone name
`start_time`	`null`	ISO 8601 start time
`end_time`	`null`	ISO 8601 end time
`schedule.hours`	`null`	Hours of the day to trigger
`schedule.minutes`	`null`	Minutes of the hour to trigger
`schedule.week_days`	`null`	Days of the week to trigger (e.g. `["Monday", "Friday"]`)

`jobs`¶

Named job definitions. Each job maps a Kedro pipeline to an Azure ML pipeline submission.

jobs:
  training:
    pipeline:
      pipeline_name: "__default__"
      tags: ["training"]
    experiment_name: "training-experiment"
    display_name: "Daily training"
    compute: "gpu"
    workspace: "prod"
    description: "Run the training pipeline on GPU cluster"
    schedule: "business_hours"
    params:
      lookback_days: 30
    retry:
      max_retries: 3
      timeout: 3600

Field	Default	Description
`pipeline`	required	Pipeline selection and filter options (see below)
`workspace`	`null`	Named workspace entry; falls back to `__default__`
`experiment_name`	`null`	Azure ML experiment name
`display_name`	`null`	Display name shown in Azure ML Studio
`compute`	`null`	Named compute entry; falls back to `__default__`
`schedule`	`null`	An inline `ScheduleConfig`, the name of an entry under `schedules`, a list of either (one trigger is deployed per entry), or `null` for ad-hoc runs only
`params`	`null`	Job-scoped runtime parameters merged into the pipeline on `compile`, `run`, and `schedule` (see below)
`retry`	`null`	Retry settings applied to every step (see below)
`description`	`null`	Human-readable job description

`params`¶

Optional job-scoped runtime parameters, equivalent to passing --params for that job but stored in config so every compile, run, and schedule of the job picks them up. When a value is also given on the command line, the CLI --params value wins for that key; remaining job-level keys are kept. This lets a job carry stable defaults while still allowing one-off overrides at submission time.

jobs:
  training:
    pipeline:
      pipeline_name: "__default__"
    params:
      lookback_days: 30
      model: "lgbm"
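
The precedence described above behaves like a per-key dictionary merge. The sketch below assumes a shallow merge of top-level keys; `merge_params` is a hypothetical name, and how nested values are combined is not covered on this page.

```python
from typing import Optional


def merge_params(job_params: Optional[dict], cli_params: Optional[dict]) -> dict:
    """Shallow merge: CLI values win per key, other job-level keys are kept."""
    return {**(job_params or {}), **(cli_params or {})}


job_params = {"lookback_days": 30, "model": "lgbm"}
cli_params = {"lookback_days": 7}  # e.g. a one-off override passed via --params

print(merge_params(job_params, cli_params))
```

Here `lookback_days` takes the command-line value while `model` keeps the job-level default.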

Job factories¶

A jobs key that contains {placeholder} markers is a job factory: a templated job entry, mirroring a Kedro dataset factory. The concrete jobs are derived from your pipeline namespaces, in the same way that a dataset factory's concrete datasets are determined by the datasets your pipeline nodes reference. You write a few factory patterns, and each factory produces one job per matching namespace of its pipeline. No target list is required:

jobs:
  # one job per namespace of the `inference` pipeline
  "{region}-{model}-inference":
    schedule: "nightly"
    pipeline:
      pipeline_name: "inference"
      node_namespaces: ["{region}.{model}"]
  # a more-specific pattern overrides the schedule for one region
  "america-{model}-inference":
    schedule: "hourly"
    pipeline:
      pipeline_name: "inference"
      node_namespaces: ["{region}.{model}"]
  # literal (non-factory) jobs are kept verbatim and take precedence
  snapshot:
    pipeline: {pipeline_name: "snapshot"}

Bindings come from the pipeline. For each factory, the node_namespaces template defines the placeholder names and their namespace depth. The plugin enumerates the distinct namespaces of pipeline_name at that depth and binds the placeholders positionally (so the namespace europe.lgbm binds region=europe, model=lgbm). One job is produced per binding. Adding a variant to your pipelines makes its job appear with no azureml.yml edit. A factory name placeholder that is absent from its node_namespaces template is a configuration error. When node_namespaces holds more than one entry, only the first is the binding axis; the rest are not used for derivation but still render per job as ordinary runtime namespace filters.
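
Positional binding can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: `derive_bindings` is a hypothetical function, the template is assumed to consist only of dot-separated placeholders, and the namespace list is made up.

```python
def derive_bindings(template: str, namespaces: list) -> list:
    """Bind placeholders in a dotted template to namespaces, by position."""
    # "{region}.{model}" -> ["region", "model"]; depth is the number of segments
    names = [segment.strip("{}") for segment in template.split(".")]
    depth = len(names)
    bindings = []
    for namespace in namespaces:
        parts = namespace.split(".")
        if len(parts) < depth:
            continue  # too shallow to fill every placeholder
        binding = dict(zip(names, parts[:depth]))
        if binding not in bindings:  # keep distinct namespaces only
            bindings.append(binding)
    return bindings


# Hypothetical node namespaces of the `inference` pipeline
node_namespaces = ["europe.lgbm", "europe.lgbm.features", "america.xgb"]

for binding in derive_bindings("{region}.{model}", node_namespaces):
    print("{region}-{model}-inference".format(**binding), binding)
```

Deeper namespaces such as `europe.lgbm.features` are truncated to the template depth, so they collapse into the same binding as `europe.lgbm`.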

Resolution is forward-only. Job names are produced only by rendering placeholders into a pattern; names are never parsed back. When more than one pattern renders the same name, the most-specific one (most literal, non-placeholder characters) supplies the config, so per-region variation such as a different schedule is expressed by a more-specific pattern rather than an override table. Literal (non-factory) jobs take precedence over any pattern.
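
One way to picture the specificity rule is to count the characters left after removing every placeholder. The sketch below is hypothetical (`specificity` and `pick_config` are illustrative names), and it does not model how ties between equally specific patterns are broken.

```python
import re

PLACEHOLDER = re.compile(r"\{[^{}]+\}")


def specificity(pattern: str) -> int:
    """Count literal (non-placeholder) characters in a factory pattern."""
    return len(PLACEHOLDER.sub("", pattern))


def pick_config(candidates: dict, literal_jobs: dict, name: str):
    """Choose the config for a rendered job name (illustrative only)."""
    if name in literal_jobs:  # literal jobs beat every pattern
        return literal_jobs[name]
    best = max(candidates, key=specificity)  # most literal characters wins
    return candidates[best]


# Both patterns are assumed to render the name "america-lgbm-inference"
candidates = {
    "{region}-{model}-inference": {"schedule": "nightly"},
    "america-{model}-inference": {"schedule": "hourly"},
}
print(pick_config(candidates, {}, "america-lgbm-inference"))
```

The name is only ever used as a lookup key here; nothing parses it back into `region` or `model`, which matches the forward-only rule.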

{placeholder} (factory) and ${...} (OmegaConf) use different syntax and coexist. The namespace alone identifies the job, so no tags filter is needed. Job names use the namespace form of each placeholder verbatim (so europe.lgbm yields europe-lgbm-inference).

kedro azureml run -j <name> renders all bindings (overlaying literal jobs) and looks the requested name up; an unknown name is an error listing the available jobs.
kedro azureml resolve-patterns lists every derived job (see the CLI reference), which is how you discover the names to pass to -j.

There is no separate target list or provider key: the jobs are always derived from the pipeline namespaces, so adding a variant to your pipelines yields its job with no config edit.

For the dataset-factory analogy and why resolution is forward-only, see Job Factories; for a step-by-step recipe, see Define jobs with factories.

`retry`¶

Optional retry settings applied to every command step in the job. Maps to azure.ai.ml.entities.RetrySettings.

retry:
  max_retries: 3
  timeout: 3600

Field	Default	Description
`max_retries`	required	Maximum number of retry attempts for failed steps (must be >= 1)
`timeout`	`null`	Per-attempt timeout in seconds, or `null` for no limit

`pipeline` filter options¶

pipeline_name selects the Kedro pipeline. The remaining fields correspond to the parameters of Kedro's Pipeline.filter() method.

Field	Default	Description
`pipeline_name`	`"__default__"`	Kedro pipeline name
`from_nodes`	`null`	Start from these nodes
`to_nodes`	`null`	Run up to these nodes
`node_names`	`null`	Run only these specific nodes
`from_inputs`	`null`	Start from nodes that consume these datasets
`to_outputs`	`null`	Run up to nodes that produce these datasets
`node_namespaces`	`null`	Filter by namespace
`tags`	`null`	Filter by tag
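
As a rough sketch of how such a block could translate into a filter call, the helper below drops `pipeline_name` and any unset fields, leaving only keyword arguments. `filter_kwargs` is a hypothetical name, and the commented Kedro call is an assumption about usage rather than the plugin's actual code path.

```python
def filter_kwargs(pipeline_cfg: dict) -> dict:
    """Drop pipeline_name and unset (None) fields; the rest are filter arguments."""
    return {
        key: value
        for key, value in pipeline_cfg.items()
        if key != "pipeline_name" and value is not None
    }


pipeline_cfg = {
    "pipeline_name": "__default__",
    "tags": ["training"],
    "from_nodes": None,
    "node_namespaces": None,
}

kwargs = filter_kwargs(pipeline_cfg)
print(kwargs)
# With Kedro installed, the selection would then look roughly like:
#   from kedro.framework.project import pipelines
#   selected = pipelines[pipeline_cfg["pipeline_name"]].filter(**kwargs)
```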

Environment variables¶

The following environment variables are set automatically by the plugin during remote execution. They are reserved and should not be set directly.

Variable	Set by	Description
`KEDRO_AZUREML_MLFLOW_ENABLED`	Pipeline generator	Set to `"1"` on each step during remote execution to activate MLflow integration
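
Project code that needs to know whether it is running in a remote step can read the variable without setting it. A minimal sketch, where `mlflow_enabled` is a hypothetical helper:

```python
import os


def mlflow_enabled(environ=os.environ) -> bool:
    """True when the plugin-set flag is exactly "1"."""
    return environ.get("KEDRO_AZUREML_MLFLOW_ENABLED") == "1"


# Passing a plain dict makes the check easy to exercise locally
print(mlflow_enabled({"KEDRO_AZUREML_MLFLOW_ENABLED": "1"}))
print(mlflow_enabled({}))
```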
