Environment Variables

Konduktor automatically sets a number of environment variables in your containers that expose job metadata, networking information, and system configuration. They are available as soon as the container starts and can be read like any other environment variable, from shell scripts or from languages such as Python and Node.js.

Setting Environment Variables

Environment variables in Konduktor can come from multiple sources, depending on how your workload and configuration are defined. When the same variable name is defined in more than one place, Konduktor applies a priority order to decide which value takes precedence.

Priority Order

If the same variable appears in multiple places, the highest-priority source wins. Sources, from highest to lowest priority:

Priority	Source	Description
1	CLI	Variables passed on the command line with `konduktor launch --env`
2	Task (`task.yaml`)	Variables defined under the `envs:` block of a `task.yaml`
3	Environment Secrets (`kind=env`)	Variables created using `konduktor secret create --kind=env`
4	Config (`~/.konduktor/config.yaml`)	Variables defined globally in the Konduktor configuration file at `~/.konduktor/config.yaml`
5	Other/System Defaults	Variables generated automatically by Konduktor (e.g. `KONDUKTOR_JOB_NAME`, `NUM_NODES`)

While the priority order is CLI > `task.yaml` > env secret > `config.yaml` > system defaults, it only determines the values a container starts with. Any `export` (e.g. `export FOO=bar`) inside your `run:` script happens at runtime and overrides the value for every subsequent command in that script, so runtime exports take precedence over all of the sources above.
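The precedence rules can be modeled as a layered dictionary merge in which each higher-priority source overwrites the ones below it. This is a hypothetical illustration, not Konduktor's implementation: `resolve_env`, `apply_runtime_exports`, and the source labels are names invented here to mirror the priority table above.

```python
"""Hypothetical model of Konduktor's environment variable precedence.

Not Konduktor's code; it only shows how the documented order resolves conflicts.
"""

# Sources listed from LOWEST to HIGHEST priority, mirroring the priority table.
PRIORITY_ORDER = ["system", "config", "env_secret", "task", "cli"]


def resolve_env(sources: dict[str, dict[str, str]]) -> dict[str, str]:
    """Merge per-source variables so higher-priority sources win."""
    effective: dict[str, str] = {}
    for source in PRIORITY_ORDER:
        # A later (higher-priority) source overwrites earlier values of the same name.
        effective.update(sources.get(source, {}))
    return effective


def apply_runtime_exports(effective: dict[str, str], exports: dict[str, str]) -> dict[str, str]:
    """`export` statements in the run: script override every source."""
    return {**effective, **exports}


if __name__ == "__main__":
    sources = {
        "config": {"FOO": "from-config"},
        "env_secret": {"FOO": "from-secret"},
        "task": {"FOO": "from-task", "BAR": "task-only"},
        "cli": {"FOO": "from-cli"},
    }
    env = resolve_env(sources)
    print(env)  # {'FOO': 'from-cli', 'BAR': 'task-only'}
    print(apply_runtime_exports(env, {"FOO": "exported"})["FOO"])  # exported
```

A variable that appears in only one source (like `BAR` here) passes through unchanged; conflicts are the only case where the order matters.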

Examples

CLI

$ konduktor launch --env FOO=bar task.yaml

Or, to forward a variable that is already set in your local shell, pass only its name:

$ export FOO="bar" && konduktor launch --env FOO task.yaml

task.yaml

envs:
  FOO: bar

Env Secret

$ konduktor secret create --kind=env --inline FOO=bar my-env-secret-name

config.yaml

kubernetes:
  pod_config:
    spec:
      containers:
        - name: konduktor-container
          env:
            - name: FOO
              value: bar

Accessing Environment Variables

Environment variables can be accessed in several ways:

Shell Scripts

Task.yaml:

run: |
    # List all env vars
    env

    # $VAR syntax
    echo "Job name: $KONDUKTOR_JOB_NAME"
    echo "Number of nodes: $NUM_NODES"

    # ${VAR} syntax (required when the name is directly followed by other characters)
    echo "Master address: ${MASTER_ADDR}"
    echo "Local IP: ${LOCAL_ADDR}"
    echo "Log file: job_${KONDUKTOR_JOB_NAME}_rank${RANK}.log"

    # Using ${VAR:-default} for fallback values
    echo "GPU count: ${NUM_GPUS_PER_NODE:-0}"

Python

Task.yaml:

run: |
    python pythonfile.py

pythonfile.py:

import os

# Get environment variables
job_name = os.environ.get('KONDUKTOR_JOB_NAME', 'unknown')
num_nodes = int(os.environ.get('NUM_NODES', '1'))
gpu_count = int(os.environ.get('NUM_GPUS_PER_NODE', '0'))

print(f"Running job: {job_name}")
print(f"Nodes: {num_nodes}")
print(f"GPUs per node: {gpu_count}")

Node.js

Task.yaml:

run: |
    node jsfile.js

jsfile.js:

// Get environment variables
const jobName = process.env.KONDUKTOR_JOB_NAME || 'unknown';
// Always pass the radix to parseInt so values are parsed as base 10
const numNodes = parseInt(process.env.NUM_NODES, 10) || 1;
const gpuCount = parseInt(process.env.NUM_GPUS_PER_NODE, 10) || 0;

console.log(`Running job: ${jobName}`);
console.log(`Nodes: ${numNodes}`);
console.log(`GPUs per node: ${gpuCount}`);

Core Konduktor Environment Variables

These variables are always available in every Konduktor container:

Job Information

Variable	Description	Example Value
`KONDUKTOR_JOB_NAME`	Unique identifier for the current job	`my-training-job-a1b2`
`NUM_NODES`	Total number of nodes in the job	`2`
`NUM_GPUS_PER_NODE`	Number of GPUs allocated per node	`8`
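The job information variables are often read together, so it can help to parse them once into a typed structure. A minimal sketch; `JobInfo` and `read_job_info` are hypothetical names, and the defaults mirror the fallbacks used in the Python example earlier on this page.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class JobInfo:
    job_name: str
    num_nodes: int
    gpus_per_node: int

    @property
    def total_gpus(self) -> int:
        # Total GPUs across the whole job.
        return self.num_nodes * self.gpus_per_node


def read_job_info(env: Mapping[str, str] = os.environ) -> JobInfo:
    """Parse the job information variables; malformed integers raise ValueError."""
    return JobInfo(
        job_name=env.get("KONDUKTOR_JOB_NAME", "unknown"),
        num_nodes=int(env.get("NUM_NODES", "1")),
        gpus_per_node=int(env.get("NUM_GPUS_PER_NODE", "0")),
    )


if __name__ == "__main__":
    info = read_job_info()
    print(f"{info.job_name}: {info.num_nodes} node(s) x "
          f"{info.gpus_per_node} GPU(s) = {info.total_gpus} GPUs")
```

Passing a plain dictionary instead of `os.environ` makes the function easy to test outside a Konduktor container.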

Networking Information

Variable	Description	Example Value
`NODE_HOST_IPS`	Comma-separated list of all node hostnames	`job-123-workers-0-0.job-123,job-123-workers-0-1.job-123`
`MASTER_ADDR`	Hostname of the master/rank 0 node	`job-123-workers-0-0.job-123`
`LOCAL_ADDR`	Pod’s internal IP address	`10.104.2.17`
`RANK`	Current node’s rank (0 for master, 1+ for workers)	`0`
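These networking variables map directly onto the rendezvous arguments of PyTorch's `torchrun` launcher. The sketch below builds such a command; it assumes PyTorch is installed in the image, one worker process per GPU, and that port 29500 (torchrun's default master port) is free on `MASTER_ADDR`. `train.py`, `node_hosts`, and `torchrun_command` are hypothetical names used only for illustration.

```python
import os
import shlex
from typing import Mapping


def node_hosts(env: Mapping[str, str]) -> list[str]:
    """Split NODE_HOST_IPS into a list of hostnames, ignoring empty entries."""
    return [h.strip() for h in env.get("NODE_HOST_IPS", "").split(",") if h.strip()]


def torchrun_command(env: Mapping[str, str], script: str = "train.py",
                     port: int = 29500) -> list[str]:
    """Build a multi-node torchrun invocation from Konduktor's variables.

    Assumes one process per GPU; `script` is a hypothetical training script.
    """
    num_nodes = int(env["NUM_NODES"])
    hosts = node_hosts(env)
    # Sanity check: the host list and node count should agree.
    if hosts and len(hosts) != num_nodes:
        raise ValueError(f"NODE_HOST_IPS lists {len(hosts)} hosts but NUM_NODES={num_nodes}")
    return [
        "torchrun",
        f"--nnodes={num_nodes}",
        f"--nproc_per_node={env['NUM_GPUS_PER_NODE']}",
        f"--node_rank={env['RANK']}",
        f"--master_addr={env['MASTER_ADDR']}",
        f"--master_port={port}",
        script,
    ]


if __name__ == "__main__":
    if "NUM_NODES" in os.environ:
        print(shlex.join(torchrun_command(os.environ)))
    else:
        print("Not running inside a Konduktor container")
```

Inside the container, `subprocess.run(torchrun_command(os.environ), check=True)` would start the launcher on each node; every node computes the same arguments except `--node_rank`.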

System Configuration

Variable	Description	Example Value
`PYTHONUNBUFFERED`	Disables Python output buffering so logs appear immediately (any non-empty value, including `0`, enables unbuffered mode)	`0`
`JOB_COMPLETION_INDEX`	Kubernetes job completion index	`0`
`RESTART_ATTEMPT`	Job restart attempt number (0 for first attempt, 1+ for retries)	`0`
`KONDUKTOR_NODENAME`	Name of the node hosting the pod	`gke-cluster-default-pool-1a2b3c4d-xyz1`

Conditional Environment Variables

These variables are only set when specific features are enabled:

Tailscale (when `tailscale.secret_name` is configured)

Variable	Description	Example Value
`TS_USERSPACE`	Tailscale userspace networking flag	`true`
`TS_AUTHKEY`	Tailscale authentication key	`tskey-auth-...`
`POD_NAME`	Kubernetes pod name	`job-123-workers-0-0-abc123`
`POD_UID`	Kubernetes pod UID	`a1b2c3d4-e5f6-7890-abcd-ef1234567890`

SSH (when `ssh.enable` is true)

Variable	Description	Example Value
`KONDUKTOR_SSHPUB`	Public SSH key for the job	`ssh-rsa AAAAB3NzaC1...`
`KONDUKTOR_SSHPRIV`	Private SSH key for the job	`-----BEGIN OPENSSH PRIVATE KEY-----`
`KONDUKTOR_SSH_PORT`	SSH port number	`2222`

Git SSH (when git-ssh secret exists)

Variable	Description	Value
`GIT_SSH_COMMAND`	SSH command for Git operations	`ssh -i /run/konduktor/git-ssh-secret/gitkey -o StrictHostKeyChecking=no`

Default Secrets (when default secrets exist)

Variable	Description	Value
`KONDUKTOR_DEFAULT_SECRETS`	Path to mounted default secrets	`/konduktor/default-secrets`
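Because each variable in this section is only set when its feature is enabled, checking for its presence gives a rough indication of which features are active in a running container. This is a heuristic sketch: `FEATURE_MARKERS` and `enabled_features` are hypothetical names, and a variable you set yourself (for example your own `GIT_SSH_COMMAND`) would also be reported as enabled.

```python
import os
from typing import Mapping

# Variable whose presence suggests each optional feature is enabled.
FEATURE_MARKERS = {
    "tailscale": "TS_AUTHKEY",
    "ssh": "KONDUKTOR_SSH_PORT",
    "git_ssh": "GIT_SSH_COMMAND",
    "default_secrets": "KONDUKTOR_DEFAULT_SECRETS",
}


def enabled_features(env: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Report which optional features appear to be active (non-empty marker)."""
    return {feature: bool(env.get(var)) for feature, var in FEATURE_MARKERS.items()}


if __name__ == "__main__":
    for feature, on in enabled_features().items():
        # Only report presence; never print TS_AUTHKEY or KONDUKTOR_SSHPRIV values.
        print(f"{feature:16} {'enabled' if on else 'not detected'}")
```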

Practical Examples

Example: WANDB Integration

import os
import wandb

# Automatically name your WANDB run after the Konduktor job
job_name = os.environ['KONDUKTOR_JOB_NAME']
num_nodes = int(os.environ['NUM_NODES'])
gpu_count = int(os.environ['NUM_GPUS_PER_NODE'])

wandb.init(
    project="my-training-project",
    name=job_name,
    config={
        "nodes": num_nodes,
        "gpus_per_node": gpu_count,
        "total_gpus": num_nodes * gpu_count
    }
)

print(f"WANDB run started: {job_name}")
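If the script runs on every node of a multi-node job, the example above creates one W&B run per node, all with the same name. One way to keep them organized, sketched here, is to use `RANK` to give each node a distinct run name and put all of them in a W&B group named after the job. `project`, `group`, `name`, and `config` are standard `wandb.init` arguments; the `wandb_init_kwargs` helper and the `-rank{N}` suffix are conventions invented for this sketch. Calling `wandb.init` only when `RANK` is `0` is an equally valid alternative.

```python
import os
from typing import Any, Mapping


def wandb_init_kwargs(env: Mapping[str, str] = os.environ,
                      project: str = "my-training-project") -> dict[str, Any]:
    """Build wandb.init() keyword arguments that group all nodes of one job."""
    job_name = env["KONDUKTOR_JOB_NAME"]
    rank = int(env.get("RANK", "0"))
    num_nodes = int(env["NUM_NODES"])
    gpu_count = int(env["NUM_GPUS_PER_NODE"])
    return {
        "project": project,
        "group": job_name,                 # every node of this job shares one group
        "name": f"{job_name}-rank{rank}",  # naming convention chosen for this sketch
        "config": {
            "nodes": num_nodes,
            "gpus_per_node": gpu_count,
            "total_gpus": num_nodes * gpu_count,
            "rank": rank,
        },
    }


# Usage inside the container (requires the wandb package):
#   import wandb
#   wandb.init(**wandb_init_kwargs())
```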

Example: Checkpoint Resumption With Cloud Storage

When Konduktor restarts a job (through the Kubernetes Job controller), it launches a fresh pod with a clean filesystem: anything written locally, including files in `/tmp`, `/root`, or the working directory, is lost. To recover from a failure, you must save checkpoints to persistent cloud storage. The `RESTART_ATTEMPT` and `JOB_COMPLETION_INDEX` variables tell a restarted pod that it is a retry and which pod index it has, so it can locate and resume from its own checkpoints; the dedicated checkpoint resumption guide covers this in more detail.
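A minimal sketch of that pattern, under several assumptions: checkpoints go to a directory backed by persistent storage (for example a bucket mounted into the pod), `KONDUKTOR_JOB_NAME` stays the same across restarts of a job, and each pod writes under its own `JOB_COMPLETION_INDEX`. JSON files stand in for a real framework's save and load calls, and every function name here is hypothetical. On the first attempt (`RESTART_ATTEMPT=0`) training starts from step 0; on retries it resumes after the newest checkpoint.

```python
import json
import os
import re
from pathlib import Path
from typing import Mapping, Optional

# Checkpoint files are named step-<N>.json in this sketch.
_CKPT_RE = re.compile(r"^step-(\d+)\.json$")


def checkpoint_dir(root: Path, env: Mapping[str, str] = os.environ) -> Path:
    """One directory per job and per pod index, so pods never overwrite each other."""
    index = env.get("JOB_COMPLETION_INDEX", "0")
    return root / env["KONDUKTOR_JOB_NAME"] / f"index-{index}"


def save_checkpoint(directory: Path, step: int, state: dict) -> Path:
    """Write state to a temporary file, then rename it into place."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"step-{step}.json"
    tmp = directory / f"step-{step}.json.tmp"
    tmp.write_text(json.dumps(state))
    tmp.replace(path)  # atomic on POSIX filesystems; mounted buckets may differ
    return path


def latest_checkpoint(directory: Path) -> Optional[Path]:
    """Return the checkpoint file with the highest step number, or None."""
    if not directory.is_dir():
        return None
    found = []
    for p in directory.iterdir():
        m = _CKPT_RE.match(p.name)
        if m:
            found.append((int(m.group(1)), p))
    return max(found, key=lambda item: item[0])[1] if found else None


def resume_state(root: Path, env: Mapping[str, str] = os.environ) -> tuple[int, dict]:
    """Return (next_step, state): fresh on the first attempt, resumed on retries."""
    if int(env.get("RESTART_ATTEMPT", "0")) == 0:
        return 0, {}
    ckpt = latest_checkpoint(checkpoint_dir(root, env))
    if ckpt is None:
        return 0, {}  # the retry happened before any checkpoint was written
    step = int(_CKPT_RE.match(ckpt.name).group(1))
    return step + 1, json.loads(ckpt.read_text())
```

In a training loop you would call `resume_state(Path(os.environ["CHECKPOINT_ROOT"]))` once at startup and `save_checkpoint(...)` every few steps. `CHECKPOINT_ROOT` is a hypothetical variable you would define yourself, for example under `envs:` in `task.yaml`. Ignoring checkpoints on attempt 0 is a safeguard against picking up stale files if a job name is ever reused.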

Troubleshooting

If an environment variable is not set as expected:

Check if the feature is enabled (e.g., SSH, Tailscale)
Verify the variable name spelling
Check the Konduktor logs for any errors
Ensure you’re running the latest version of Konduktor
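The first two checks can be scripted from inside the container. This sketch lists which of the core variables documented on this page are unset or empty and, for each one, suggests a similarly spelled variable that is set, to catch typos. `CORE_VARS`, `missing_vars`, and `report_missing` are hypothetical names; the similarity check uses Python's standard `difflib.get_close_matches`.

```python
import difflib
import os
import sys
from typing import Mapping

# Core variables this page documents as always present in a Konduktor container.
CORE_VARS = [
    "KONDUKTOR_JOB_NAME", "NUM_NODES", "NUM_GPUS_PER_NODE",
    "NODE_HOST_IPS", "MASTER_ADDR", "LOCAL_ADDR", "RANK",
    "PYTHONUNBUFFERED", "JOB_COMPLETION_INDEX", "RESTART_ATTEMPT", "KONDUKTOR_NODENAME",
]


def missing_vars(names: list[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names that are unset or set to an empty string."""
    return [name for name in names if not env.get(name)]


def report_missing(names: list[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Describe each missing variable, suggesting a similarly spelled one if set."""
    lines = []
    for name in missing_vars(names, env):
        # A close match among the variables that ARE set often indicates a typo.
        others = [k for k in env if k != name]
        hint = difflib.get_close_matches(name, others, n=1, cutoff=0.8)
        lines.append(f"missing: {name}" + (f" (did you mean {hint[0]}?)" if hint else ""))
    return lines


if __name__ == "__main__":
    # Extra names to check (e.g. your own variables) can be passed as arguments.
    for line in report_missing(CORE_VARS + sys.argv[1:]) or ["all variables present"]:
        print(line)
```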
