Getting Started

In this tutorial we will take a Kedro project, connect it to Azure ML, and submit a pipeline run to cloud compute. Along the way we will install the plugin, configure a workspace, define a job, and see it running in Azure ML Studio.

Prerequisites

Before you begin, make sure you have:

  • Python 3.11+
  • An Azure ML workspace with at least one compute cluster
  • Azure credentials configured (az login)
  • An Azure ML environment created in your workspace (e.g. my-env@latest)
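If you want to check the first and third items from a script, a minimal sketch along these lines may help. It only looks at the interpreter version and whether an az executable is on your PATH. It does not confirm that you are logged in or that the workspace, cluster, or environment exist. The function names are illustrative and not part of the plugin.

```python
import shutil
import sys


def python_is_recent_enough(minimum=(3, 11), current=None):
    """Return True if the interpreter version is at least `minimum`."""
    current = tuple(current or sys.version_info[:2])
    return current >= tuple(minimum)


def azure_cli_path():
    """Return the path to the `az` executable, or None if it is not on PATH."""
    return shutil.which("az")


# Report both checks; neither one contacts Azure.
print("Python 3.11+:", python_is_recent_enough())
print("Azure CLI found:", azure_cli_path() is not None)
```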

New to Kedro or Azure ML?

Coming from Azure ML? Kedro is a Python framework for building reproducible data science pipelines. You define nodes (functions) and a catalog (data sources), and Kedro handles execution order and data passing.

Coming from Kedro? Azure ML Pipelines let you run pipeline steps on managed cloud compute. You will need an Azure subscription, a workspace, and a compute cluster.
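To make "Kedro handles execution order and data passing" concrete, here is a toy illustration in plain Python. It is not Kedro's API or implementation. The node tuples, the dictionary used as a catalog, and the run function are all hypothetical stand-ins. The idea it shows is that each node declares named inputs and an output, and the order is derived from those names rather than from the order in which nodes are listed.

```python
from graphlib import TopologicalSorter


def clean(raw):
    """Drop missing values."""
    return [x for x in raw if x is not None]


def total(cleaned):
    """Sum the cleaned values."""
    return sum(cleaned)


# (function, input dataset names, output dataset name).
# Deliberately listed out of order to show that order is inferred.
NODES = [
    (total, ["cleaned"], "result"),
    (clean, ["raw"], "cleaned"),
]


def run(nodes, catalog):
    """Run nodes in dependency order, passing data through `catalog`."""
    producers = {output: func for func, _, output in nodes}
    # Map each node to the nodes that produce its inputs.
    graph = {
        func: {producers[name] for name in inputs if name in producers}
        for func, inputs, _ in nodes
    }
    spec = {func: (inputs, output) for func, inputs, output in nodes}
    for func in TopologicalSorter(graph).static_order():
        inputs, output = spec[func]
        catalog[output] = func(*(catalog[name] for name in inputs))
    return catalog
```

Calling run(NODES, {"raw": [1, None, 2]}) executes clean before total, even though total is listed first, and stores the sum under "result".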

Step 1: Create a Kedro project

We will use the spaceflights-pandas starter:

kedro new --starter=spaceflights-pandas

Follow the prompts, then install the project dependencies:

With pip:

cd spaceflights-pandas
pip install -r requirements.txt

With uv:

cd spaceflights-pandas
uv sync

Step 2: Install the plugin

With pip:

pip install kedro-azureml-pipeline

With uv:

uv add kedro-azureml-pipeline

Let's verify the installation:

kedro azureml --help

You should see output starting with:

Usage: kedro azureml [OPTIONS] COMMAND [ARGS]...

Step 3: Initialize the configuration

The plugin ships two hooks (AzureMLLocalRunHook and MlflowAzureMLHook) that are registered through Python entry points, so Kedro discovers them automatically once the package is installed.

From the project root, run:

kedro azureml init

You should see:

Creating conf/base/azureml.yml...
Creating .amlignore...

Notice that two new files appeared:

  • conf/base/azureml.yml: plugin settings (workspace, compute, jobs)
  • .amlignore: controls which files are excluded from code uploads (similar to .gitignore)
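To get a rough feel for how ignore patterns affect an upload, here is a simplified sketch that tests paths against a list of patterns. It is an approximation built on fnmatch, not the matching logic Azure ML applies. It does not handle negation with "!", leading-slash anchoring, or "**". The example patterns in the usage note are hypothetical and are not the contents of the generated .amlignore.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def load_patterns(text):
    """Parse ignore-file text, skipping blank lines and comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(path, patterns):
    """Approximate check: match each pattern against the path and its parts."""
    posix = PurePosixPath(path)
    for pattern in patterns:
        pattern = pattern.rstrip("/")
        # Match against the whole relative path.
        if fnmatch(posix.as_posix(), pattern):
            return True
        # Match against any single directory or file name in the path.
        if any(fnmatch(part, pattern) for part in posix.parts):
            return True
    return False
```

For example, with patterns loaded from the text "data/" and "*.pyc" on separate lines, is_ignored("data/01_raw/x.csv", patterns) is true and is_ignored("src/pkg/a.py", patterns) is false.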

The plugin supports two deployment flows. Choose the one that fits your setup:

  • Code upload: the plugin uploads a snapshot of your project to Azure ML on every run. Simplest way to get started.
  • Pre-built environment: your code is already installed inside the Azure ML environment (Docker image). Faster for large projects since nothing is uploaded.

Step 4: Configure your workspace

Open conf/base/azureml.yml and replace the placeholders. The execution section differs depending on which deployment flow you chose.

Code upload:

workspace:
  __default__:
    subscription_id: "<your-subscription-id>"
    resource_group: "<your-resource-group>"
    name: "<your-workspace-name>"

compute:
  __default__:
    cluster_name: "<your-cluster-name>"

execution:
  environment: "<your-aml-environment>@latest"
  code_directory: "."

With code_directory: ".", the plugin snapshots your project directory and uploads it to Azure ML. The .amlignore file controls which files are excluded from the upload.

Pre-built environment:

workspace:
  __default__:
    subscription_id: "<your-subscription-id>"
    resource_group: "<your-resource-group>"
    name: "<your-workspace-name>"

compute:
  __default__:
    cluster_name: "<your-cluster-name>"

execution:
  environment: "<your-aml-environment>@latest"
  working_directory: "/home/kedro_docker"

With no code_directory, nothing is uploaded. The working_directory tells Azure ML where your pre-installed code lives inside the container. Your Docker image must already contain the Kedro project and all dependencies.

Finding your Azure details

In the Azure Portal, open your Azure ML workspace. The Overview page shows the subscription ID, resource group, and workspace name. Compute clusters are listed under Manage > Compute > Compute clusters.
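Before moving on, it can be worth confirming that no placeholder was left behind. The sketch below scans text for the "<your-...>" pattern used in the templates above and reports the line numbers. The function name is illustrative. It is a plain text scan and does not validate the YAML or check the values against Azure.

```python
import re

# Matches template placeholders such as <your-subscription-id>.
PLACEHOLDER = re.compile(r"<your-[^<>\s]+>")


def find_placeholders(text):
    """Return (line_number, placeholder) pairs for unreplaced placeholders."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            found.append((number, match.group(0)))
    return found


# Usage, from the project root:
#   from pathlib import Path
#   text = Path("conf/base/azureml.yml").read_text(encoding="utf-8")
#   for number, placeholder in find_placeholders(text):
#       print(f"line {number}: still contains {placeholder}")
```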

Step 5: Define a job

Add a jobs section to the same azureml.yml file:

jobs:
  training:
    pipeline:
      pipeline_name: "__default__"
    experiment_name: "spaceflights-training"
    display_name: "Training pipeline"

Step 6: Run on Azure ML

Now let's submit the job:

kedro azureml run -j training

After a moment, you should see a run URL printed to the terminal. Open it in your browser to see the pipeline running in Azure ML Studio:

https://ml.azure.com/runs/<run-id>?wsid=...

To block your terminal until the run completes, add --wait-for-completion:

kedro azureml run -j training --wait-for-completion
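If you want to trigger the same submission from a script, for instance in a scheduled job, one option is a thin wrapper around the CLI. This sketch uses only the command and flags shown above. The helper names are illustrative, and it assumes the CLI exits with a non-zero status when submission fails, which is not confirmed by this tutorial.

```python
import subprocess


def build_run_command(job, wait=False):
    """Build the `kedro azureml run` command for a job defined in azureml.yml."""
    command = ["kedro", "azureml", "run", "-j", job]
    if wait:
        command.append("--wait-for-completion")
    return command


def submit(job, wait=False):
    """Run the command in the current directory (expected: the project root).

    Raises subprocess.CalledProcessError if the CLI exits with a non-zero
    status.
    """
    return subprocess.run(build_run_command(job, wait), check=True)


# Usage, from the project root:
#   submit("training", wait=True)
```

Keeping command construction separate from execution makes it easy to log or inspect the exact command before anything is submitted.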

Note

Azure ML compute is billed while your job is running. The spaceflights starter pipeline typically completes in a few minutes on a small cluster.

Summary

You have submitted a Kedro pipeline to Azure ML managed compute. Along the way, you:

  • Installed kedro-azureml-pipeline
  • Configured a workspace, compute target, and environment in azureml.yml
  • Defined a job that maps a Kedro pipeline to an Azure ML pipeline submission
  • Submitted the job and saw it running in Azure ML Studio

Next steps