Getting Started

In this tutorial we will take a Kedro project, connect it to Azure ML, and submit a pipeline run to cloud compute. Along the way we will install the plugin, configure a workspace, define a job, and see it running in Azure ML Studio.

Prerequisites

Before you begin, make sure you have:

Python 3.11+
An Azure ML workspace with at least one compute cluster
Azure credentials configured (az login)
An Azure ML environment created in your workspace (e.g. my-env@latest)
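
Part of this checklist can be verified from Python. The sketch below is not part of the plugin, and the function name find_problems is made up for illustration. It checks the interpreter version and whether an az executable is on the PATH. It cannot tell whether az login has actually been run, or whether the workspace, cluster, and environment exist.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # taken from the prerequisites list above


def find_problems(python_version, az_executable):
    """Return a list of human-readable problems; an empty list means OK.

    python_version: a tuple such as sys.version_info
    az_executable:  path to the Azure CLI, or None if it was not found
    """
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in python_version[:2])
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    if az_executable is None:
        problems.append("Azure CLI ('az') not found on PATH")
    return problems


if __name__ == "__main__":
    # Check the interpreter and PATH of the current environment.
    for problem in find_problems(sys.version_info, shutil.which("az")):
        print("Missing prerequisite:", problem)
```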

New to Kedro or Azure ML?

Coming from Azure ML? Kedro is a Python framework for building reproducible data science pipelines. You define nodes (functions) and a catalog (data sources), and Kedro handles execution order and data passing.

Coming from Kedro? Azure ML Pipelines let you run pipeline steps on managed cloud compute. You will need an Azure subscription, a workspace, and a compute cluster.
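
For readers coming from Azure ML, the idea that execution order follows from each node's declared inputs and outputs can be illustrated with a toy example. This is a standard-library sketch and not Kedro's actual implementation or API. The node and dataset names are illustrative, loosely modelled on the spaceflights starter.

```python
from graphlib import TopologicalSorter

# Toy stand-ins for nodes: name -> (input datasets, output datasets).
# Deliberately listed out of order to show that declaration order is irrelevant.
NODES = {
    "train_model": (["model_input_table"], ["regressor"]),
    "preprocess_companies": (["companies"], ["preprocessed_companies"]),
    "create_model_input_table": (["preprocessed_companies"], ["model_input_table"]),
}


def execution_order(nodes):
    """Order nodes so that every dataset is produced before it is consumed."""
    # Map each dataset to the node that produces it.
    producers = {
        output: name for name, (_, outputs) in nodes.items() for output in outputs
    }
    sorter = TopologicalSorter()
    for name, (inputs, _) in nodes.items():
        # Inputs without a producer (e.g. raw data) create no dependency.
        dependencies = [producers[i] for i in inputs if i in producers]
        sorter.add(name, *dependencies)
    return list(sorter.static_order())


if __name__ == "__main__":
    print(execution_order(NODES))
```

In a real project you declare nodes and datasets through Kedro's own pipeline and catalog definitions; the sketch only shows why you never have to spell out the order yourself.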

Step 1: Create a Kedro project

We will use the spaceflights-pandas starter:

kedro new --starter=spaceflights-pandas

Follow the prompts, then install the project dependencies:

With pip:

cd spaceflights-pandas
pip install -r requirements.txt

With uv:

cd spaceflights-pandas
uv sync

Step 2: Install the plugin

With pip:

pip install kedro-azureml-pipeline

With uv:

uv add kedro-azureml-pipeline

Let's verify the installation:

kedro azureml --help

You should see output starting with:

Usage: kedro azureml [OPTIONS] COMMAND [ARGS]...
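
If the command is not found, it can help to confirm that the package landed in the environment you think it did. A minimal sketch using only the standard library follows; the helper name installed_version is an assumption for illustration, and the distribution name is the one used in the install command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("kedro-azureml-pipeline")
    print(found or "kedro-azureml-pipeline is not installed in this environment")
```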

Step 3: Initialize the configuration

The plugin ships two hooks, AzureMLLocalRunHook and MlflowAzureMLHook, which are registered through Python entry points. Kedro discovers them automatically once the package is installed, so you do not need to register them yourself.

From the project root, run:

kedro azureml init

You should see:

Creating conf/base/azureml.yml...
Creating .amlignore...

Notice that two new files appeared:

conf/base/azureml.yml: plugin settings (workspace, compute, jobs)
.amlignore: controls which files are excluded from code uploads (similar to .gitignore)
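
To get a feel for how ignore patterns shrink an upload, here is a deliberately simplified filter. It uses plain glob matching rather than full .gitignore semantics (no negation, no anchoring), and the patterns in the test are hypothetical examples, not the contents of the generated .amlignore.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_ignored(path, patterns):
    """Approximate check: does any pattern exclude this relative POSIX path?"""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore anything below a directory of that name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch(parts[-1], pattern) or fnmatch(path, pattern):
            # File pattern: match against the file name or the whole path.
            return True
    return False


def files_to_upload(paths, patterns):
    """Keep only the paths that no pattern excludes."""
    return [path for path in paths if not is_ignored(path, patterns)]
```

Treat this only as a mental model: the authoritative behaviour is whatever Azure ML applies when it reads the real .amlignore file.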

The plugin supports two deployment flows. Choose the one that fits your setup:

Code upload: the plugin uploads a snapshot of your project to Azure ML on every run. This is the simplest way to get started.
Pre-built environment: your code is already installed inside the Azure ML environment (Docker image). This is faster for large projects because nothing is uploaded.

Step 4: Configure your workspace

Open conf/base/azureml.yml and replace the placeholders. The execution section differs depending on which deployment flow you chose:

Code upload:

workspace:
  __default__:
    subscription_id: "<your-subscription-id>"
    resource_group: "<your-resource-group>"
    name: "<your-workspace-name>"

compute:
  __default__:
    cluster_name: "<your-cluster-name>"

execution:
  environment: "<your-aml-environment>@latest"
  code_directory: "."

With code_directory: ".", the plugin snapshots your project directory and uploads it to Azure ML. The .amlignore file controls which files are excluded from the upload.

Pre-built environment:

workspace:
  __default__:
    subscription_id: "<your-subscription-id>"
    resource_group: "<your-resource-group>"
    name: "<your-workspace-name>"

compute:
  __default__:
    cluster_name: "<your-cluster-name>"

execution:
  environment: "<your-aml-environment>@latest"
  working_directory: "/home/kedro_docker"

With no code_directory, nothing is uploaded. The working_directory tells Azure ML where your pre-installed code lives inside the container. Your Docker image must already contain the Kedro project and all dependencies.
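
Before submitting anything, it is worth checking that no placeholder was left behind and that the file reflects the flow you intended. The sketch below does a plain text scan, not a YAML parse, and is not part of the plugin. The function names are made up for illustration, and the flow detection simply assumes that the presence of a code_directory key means code upload, as described above.

```python
import re
from pathlib import Path

# Matches the placeholder style used in this tutorial, e.g. <your-cluster-name>.
PLACEHOLDER = re.compile(r"<your-[a-z-]+>")


def find_placeholders(text):
    """Return (line_number, placeholder) pairs still present in the text."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            found.append((number, match.group()))
    return found


def deployment_flow(text):
    """Guess the flow from whether an uncommented code_directory key exists."""
    has_code_directory = re.search(r"^\s*code_directory\s*:", text, flags=re.MULTILINE)
    return "code upload" if has_code_directory else "pre-built environment"


if __name__ == "__main__":
    config = Path("conf/base/azureml.yml")
    if config.exists():
        content = config.read_text(encoding="utf-8")
        print("Flow:", deployment_flow(content))
        for number, placeholder in find_placeholders(content):
            print(f"Line {number}: still contains {placeholder}")
```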

Finding your Azure details

In the Azure Portal, open your Azure ML workspace. The Overview page shows the subscription ID, resource group, and workspace name. Compute clusters are listed under Manage > Compute > Compute clusters.
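
If you already have the workspace's Azure resource ID, the three workspace values can be read straight out of it, because the ID follows the standard /subscriptions/.../resourceGroups/.../providers/Microsoft.MachineLearningServices/workspaces/... layout. A small sketch follows; the helper name is illustrative, and the ID in the usage note is a dummy value.

```python
import re

_WORKSPACE_ID = re.compile(
    r"^/subscriptions/(?P<subscription_id>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.MachineLearningServices"
    r"/workspaces/(?P<name>[^/]+)/?$",
    re.IGNORECASE,
)


def parse_workspace_id(resource_id):
    """Split a workspace resource ID into the keys used under `workspace:`."""
    match = _WORKSPACE_ID.match(resource_id.strip())
    if match is None:
        raise ValueError(f"Not an Azure ML workspace resource ID: {resource_id!r}")
    return match.groupdict()
```

For example, parse_workspace_id("/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.MachineLearningServices/workspaces/my-ws") yields the subscription_id, resource_group, and name values to paste into azureml.yml.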

Step 5: Define a job

Add a jobs section to the same azureml.yml file:

jobs:
  training:
    pipeline:
      pipeline_name: "__default__"
    experiment_name: "spaceflights-training"
    display_name: "Training pipeline"

This is a single literal job. If you later have a family of similar jobs, such as one per region or model in a namespaced pipeline, you can define them all with one templated entry instead. See Job Factories.
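
To make the shape of a literal job entry concrete, the sketch below renders entries with the same keys as the example above from plain Python values. It is only an illustration of the structure: it is not the plugin's Job Factories feature, and the helper names are assumptions.

```python
def render_job(job_name, pipeline_name, experiment_name, display_name):
    """Render one literal job entry using the keys shown in this step."""
    return (
        f"  {job_name}:\n"
        f"    pipeline:\n"
        f'      pipeline_name: "{pipeline_name}"\n'
        f'    experiment_name: "{experiment_name}"\n'
        f'    display_name: "{display_name}"\n'
    )


def render_jobs(jobs):
    """Render a complete `jobs:` section from a list of job dictionaries."""
    return "jobs:\n" + "".join(render_job(**job) for job in jobs)


if __name__ == "__main__":
    print(
        render_jobs(
            [
                {
                    "job_name": "training",
                    "pipeline_name": "__default__",
                    "experiment_name": "spaceflights-training",
                    "display_name": "Training pipeline",
                }
            ]
        )
    )
```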

Step 6: Run on Azure ML

Now let's submit the job:

kedro azureml run -j training

After a moment, you should see a run URL printed to the terminal. Open it in your browser to see the pipeline running in Azure ML Studio:

https://ml.azure.com/runs/<run-id>?wsid=...

To block your terminal until the run completes, add --wait-for-completion:

kedro azureml run -j training --wait-for-completion
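
If you want to trigger the same submission from a script, a thin wrapper around the CLI is usually enough. The sketch below uses only the flags shown in this step. It assumes that the run URL appears in the command's output in the form shown above and that the process exit code reflects success or failure; neither is confirmed here. The helper names are made up for illustration.

```python
import re
import subprocess

# Assumed URL shape, based on the example URL in this step.
RUN_URL = re.compile(r"https://ml\.azure\.com/runs/\S+")


def build_run_command(job, wait=False):
    """Build the argument list for `kedro azureml run`."""
    command = ["kedro", "azureml", "run", "-j", job]
    if wait:
        command.append("--wait-for-completion")
    return command


def extract_run_url(output):
    """Return the first Azure ML Studio run URL found in the output, if any."""
    match = RUN_URL.search(output)
    return match.group() if match else None


def submit(job, wait=False):
    """Run the CLI from the project root; return (exit_code, run_url_or_None)."""
    result = subprocess.run(
        build_run_command(job, wait), capture_output=True, text=True
    )
    return result.returncode, extract_run_url(result.stdout + result.stderr)
```

Calling submit("training", wait=True) from the project root mirrors the second command above.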

Note

Azure ML compute is billed while your job is running. The spaceflights starter pipeline typically completes in a few minutes on a small cluster.

Summary

You have submitted a Kedro pipeline to Azure ML managed compute. Along the way, you:

Installed kedro-azureml-pipeline
Configured a workspace, compute target, and environment in azureml.yml
Defined a job that maps a Kedro pipeline to an Azure ML pipeline submission
Submitted the job and saw it running in Azure ML Studio

Next steps

How to use data assets: versioned Azure ML Data Assets
How to schedule pipelines: cron and recurrence schedules
Compile and inspect: verifying pipeline YAML before submitting
Deploy from CI/CD: automating submissions
How to authenticate: configuring Azure credentials in different environments
Configuration reference: all azureml.yml fields
Architecture overview: how the plugin translates Kedro to Azure ML