# Trainy Konduktor

## Docs

- [MPI](https://docs.trainy.ai/MPI.md): Example task yamls for MPI
- [Validate API key](https://docs.trainy.ai/api-reference/auth/validate-api-key.md): Validates the API key and returns organization information. Used by MCP clients to verify credentials.
- [List all projects](https://docs.trainy.ai/api-reference/projects/list-all-projects.md): Returns all projects in the organization associated with the API key.
- [Add log names to a run](https://docs.trainy.ai/api-reference/runs/add-log-names-to-a-run.md): Adds new log names (metrics, console logs, etc.) to an existing run.
- [Compare metrics across multiple runs](https://docs.trainy.ai/api-reference/runs/compare-metrics-across-multiple-runs.md): Compares statistics for a specific metric across multiple runs. Returns min/max/mean/final values and identifies the best performing run.
- [Create a new run](https://docs.trainy.ai/api-reference/runs/create-a-new-run.md): Creates a new run in the specified project. If the project doesn't exist, it will be created. If externalId is provided and a run with that ID already exists, the existing run is returned (Neptune-style resume for multi-node distributed training).
- [Create model graph](https://docs.trainy.ai/api-reference/runs/create-model-graph.md): Creates a model graph visualization for a run, including nodes and edges.
- [Get files and artifacts from a run](https://docs.trainy.ai/api-reference/runs/get-files-and-artifacts-from-a-run.md): Returns file metadata with presigned URLs for downloading. URLs are valid for 5 days.
- [Get full run details](https://docs.trainy.ai/api-reference/runs/get-full-run-details.md): Returns complete run information including config, metadata, tags, status, and available log names.
- [Get run by ID](https://docs.trainy.ai/api-reference/runs/get-run-by-id.md): Decodes a SQID-encoded run ID and returns the numeric ID.
- [Get run details by display ID](https://docs.trainy.ai/api-reference/runs/get-run-details-by-display-id.md): Resolves a human-readable display ID (e.g., 'MMP-1') to a run and returns its details.
- [Get statistics for a run's metrics](https://docs.trainy.ai/api-reference/runs/get-statistics-for-a-runs-metrics.md): Computes statistics (min, max, mean, stddev) and detects anomalies for metrics in a run. Useful for quick analysis without downloading all data points.
- [List distinct metric names in a project](https://docs.trainy.ai/api-reference/runs/list-distinct-metric-names-in-a-project.md): Returns distinct metric names from the pre-computed metric summaries table. Useful for discovering available metrics before querying leaderboard or statistics. Optionally filter by a search substring or specific run IDs.
- [List runs with optional search and tag filtering](https://docs.trainy.ai/api-reference/runs/list-runs-with-optional-search-and-tag-filtering.md): Lists runs in a project with optional search using ILIKE substring matching. Supports tag filtering with OR logic (returns runs with ANY of the specified tags).
- [Query console logs from a run](https://docs.trainy.ai/api-reference/runs/query-console-logs-from-a-run.md): Returns console logs (stdout/stderr) from ClickHouse. Supports filtering by log type and pagination.
- [Query metrics from a run](https://docs.trainy.ai/api-reference/runs/query-metrics-from-a-run.md): Returns time-series metrics from ClickHouse. Supports filtering by metric name, group, and step range. Uses reservoir sampling to limit data points.
- [Rank runs by a metric](https://docs.trainy.ai/api-reference/runs/rank-runs-by-a-metric.md): Returns runs ranked by a metric aggregation (MIN, MAX, AVG, LAST, VARIANCE) using pre-computed metric summaries. Much faster than comparing individual runs. Useful for finding the best runs in a project by loss, accuracy, or any other metric.
- [Resume an existing run](https://docs.trainy.ai/api-reference/runs/resume-an-existing-run.md): Resumes an existing run, setting its status back to RUNNING. Returns the same response format as create. Use this when you want to log additional data (e.g., evaluation metrics) to a previously completed run. Provide exactly one of: runId (numeric), displayId (e.g., 'MMP-1'), or externalId (user-pro…
- [Update run config](https://docs.trainy.ai/api-reference/runs/update-run-config.md): Merges new configuration with existing run config. New keys override existing keys.
- [Update run notes](https://docs.trainy.ai/api-reference/runs/update-run-notes.md): Updates the notes/description on a run. Set to null or empty string to clear.
- [Update run status](https://docs.trainy.ai/api-reference/runs/update-run-status.md): Updates the status of an existing run (e.g., RUNNING, COMPLETED, FAILED).
- [Update run tags](https://docs.trainy.ai/api-reference/runs/update-run-tags.md): Replaces all tags on a run with the provided tags.
- [Authentication](https://docs.trainy.ai/authentication.md): How to authenticate and connect to your Trainy managed k8s cluster.
- [CLI Reference](https://docs.trainy.ai/cli-reference.md): Reference for the Konduktor CLI
- [Common Commands](https://docs.trainy.ai/commands.md): Commands for launching and managing your Konduktor Jobs
- [Konduktor Config Yamls](https://docs.trainy.ai/config-schema.md): Schema and examples for your `~/.konduktor/config.yaml`
- [Konduktor Serve Launch Deployment Yamls](https://docs.trainy.ai/deployment-schema.md): Schema and examples for your `konduktor serve launch` `deployment.yaml`
- [Serving Deployments (Experimental)](https://docs.trainy.ai/deployments.md): Konduktor makes serving general and vLLM deployments easy
- [Complex General](https://docs.trainy.ai/deployments/complex-general.md): Example deployment yamls for general deployments with `konduktor serve`
- [Complex vLLM](https://docs.trainy.ai/deployments/complex-vllm.md): Example deployment yamls for vLLM deployments with `konduktor serve`
- [Simple General](https://docs.trainy.ai/deployments/simple-general.md): Example deployment yamls for general deployments with `konduktor serve`
- [Simple vLLM](https://docs.trainy.ai/deployments/simple-vllm.md): Example deployment yamls for vLLM deployments with `konduktor serve`
- [Distributed Multi-Node Jobs](https://docs.trainy.ai/distributed-jobs.md): Jobs deployed via Konduktor can be scaled up to run on multiple nodes.
- [Environment Variables](https://docs.trainy.ai/env.md): Example task yamls for envs
- [Environment Variables](https://docs.trainy.ai/env-vars.md): Complete guide to all environment variables available in Konduktor containers and how to use them.
- [File Sync on Launch](https://docs.trainy.ai/launch-file-sync.md): How Konduktor uploads and synchronizes your code and files before workload execution.
- [Run Lifecycle](https://docs.trainy.ai/lifecycle.md): How jobs are scheduled.
- [Many Parallel Jobs](https://docs.trainy.ai/many-jobs.md): Enqueue and execute multiple jobs in parallel.
- [AWS - s3](https://docs.trainy.ai/minimal-cloud-permissions/AWS.md): S3 storage setup on Konduktor - AWS User permissions
- [GCP - gs](https://docs.trainy.ai/minimal-cloud-permissions/GCP.md): GS storage setup on Konduktor - Google Cloud Platform permissions
- [Get Started with Grafana](https://docs.trainy.ai/observability.md): Trainy: Konduktor users can connect their telemetry to Grafana, featuring pre-configured dashboards optimized to display the metrics, events, and logs most relevant to your workloads.
- [Overview](https://docs.trainy.ai/overview.md): Trainy-Konduktor: An ML/AI GPU platform for high performance batch jobs on k8s.
- [API Reference](https://docs.trainy.ai/pluto/api-reference/introduction.md): Programmatic access to the Pluto experiment tracking API
- [Changelog](https://docs.trainy.ai/pluto/changelog.md): Release history for Pluto SDK and Pluto Server.
- [Comparing runs](https://docs.trainy.ai/pluto/comparing.md): Compare experimental runs with side-by-side views, filtering, and custom table presets.
- [Custom Dashboards](https://docs.trainy.ai/pluto/dashboards.md): Create personalized dashboards to organize and monitor the metrics that matter most
- [Distributed Logging](https://docs.trainy.ai/pluto/distributed-logging.md): Log metrics from multinode and multiprocessing training jobs with torchrun
- [Exporting Neptune Runs](https://docs.trainy.ai/pluto/exporting-neptune-runs.md): Export existing Neptune experiments to Pluto using **neptune-exporter**
- [Files](https://docs.trainy.ai/pluto/files.md): Save and view files from your experiments
- [Run Forking](https://docs.trainy.ai/pluto/forking.md): Branch from an existing run at a specific step to explore training variations without starting from scratch.
- [Experiments](https://docs.trainy.ai/pluto/introduction.md): An experiment is an instance of a single run in Pluto.
- [Keyboard Shortcuts](https://docs.trainy.ai/pluto/keyboard-shortcuts.md): Keyboard shortcuts and mouse interactions for navigating Pluto efficiently
- [Linear](https://docs.trainy.ai/pluto/linear.md)
- [Logs](https://docs.trainy.ai/pluto/logs.md): View logs from your run
- [MCP Integration](https://docs.trainy.ai/pluto/mcp.md): Connect Pluto to AI coding assistants like Claude Code via the Model Context Protocol.
- [Neptune Scale Migration](https://docs.trainy.ai/pluto/neptune-migration.md): How to migrate from Neptune Scale to Pluto with zero code changes using the compatibility layer.
- [Overview](https://docs.trainy.ai/pluto/overview.md): Pluto is our managed and cloud-hosted experiment tracking tool for people developing/deploying ML models.
- [Querying Runs](https://docs.trainy.ai/pluto/querying-runs.md): Read-only helpers for fetching run metadata, metrics, files, and logs from Python.
- [Quickstart](https://docs.trainy.ai/pluto/quickstart.md): Create an account, generate an API key, and log your first experiment.
- [Run Lifecycle](https://docs.trainy.ai/pluto/run-lifecycle.md): Start, resume, and end a Pluto run from Python.
- [Settings](https://docs.trainy.ai/pluto/settings.md): Configure Pluto behavior using environment variables and runtime settings
- [Downsampling & Smoothing](https://docs.trainy.ai/pluto/smoothing.md): How Pluto applies downsampling and smoothing to line charts
- [System Metrics](https://docs.trainy.ai/pluto/sys.md): View the system metrics of your experiments
- [UI Preferences](https://docs.trainy.ai/pluto/ui-preferences.md): Customize the Pluto web interface: themes, keyboard shortcuts, and display options
- [Audio](https://docs.trainy.ai/pluto/visualizations/audio.md): Log and view audio data in Pluto
- [Histograms](https://docs.trainy.ai/pluto/visualizations/histograms.md): Log and view distribution data as histograms in Pluto
- [Images](https://docs.trainy.ai/pluto/visualizations/images.md): Log and view image data in Pluto
- [Line Plots](https://docs.trainy.ai/pluto/visualizations/lines.md): Log metric time series and interact with line charts in Pluto
- [Video](https://docs.trainy.ai/pluto/visualizations/video.md): Log and view video data in Pluto
- [Weights & Biases Migration](https://docs.trainy.ai/pluto/wandb-migration.md): How to migrate from Weights & Biases to Pluto with zero code changes using the dual-logging hook.
- [Python API](https://docs.trainy.ai/python-api.md): Work with Konduktor programmatically from Python.
- [MosaicML Composer](https://docs.trainy.ai/pytorch/composer.md): Train with MosaicML Composer
- [Torchrun](https://docs.trainy.ai/pytorch/torchrun.md): Perform a multinode job with torchrun
- [Ray/VeRL - LLM Reinforcement Learning](https://docs.trainy.ai/pytorch/verl.md): Production-ready RL training library for large language models (LLMs) by VeRL.
- [Quickstart](https://docs.trainy.ai/quickstart.md): Train your model with Trainy: Konduktor in a few simple steps. Before starting, make sure you’ve set up your account.
- [Return values from jobs](https://docs.trainy.ai/return-values.md): Retrieve a Python-readable result from a Konduktor job
- [File Sync during Run](https://docs.trainy.ai/runtime-file-sync.md): Understand how to persist or retrieve data while your workload is running.
- [Managing Secrets](https://docs.trainy.ai/secrets.md): Konduktor simplifies Kubernetes secret management with a CLI that makes secrets easily accessible within your workloads.
- [Default Secrets](https://docs.trainy.ai/secrets/default.md): Example task yamls for secrets
- [Env Secrets](https://docs.trainy.ai/secrets/env.md): Example task yamls for secrets
- [Git-SSH Secrets](https://docs.trainy.ai/secrets/git-ssh.md): Example task yamls for secrets
- [Setup](https://docs.trainy.ai/setup.md): Installation and Credentials Setup
- [Directory Upload](https://docs.trainy.ai/storage/directory-upload.md): Example task yamls for cloud storage
- [File + Directory Upload](https://docs.trainy.ai/storage/file-directory-upload.md): Example task yamls for cloud storage
- [Git SSH Key Upload](https://docs.trainy.ai/storage/private-repo-ssh.md): Example task yamls for cloud storage
- [Single File Upload](https://docs.trainy.ai/storage/single-file-upload.md): Example task yamls for cloud storage
- [Workdir File Upload](https://docs.trainy.ai/storage/workdir-file-upload.md): Example task yamls for cloud storage
- [Tailscale Interactive SSH (Experimental)](https://docs.trainy.ai/tailscale.md): Connect to your workloads via Tailscale SSH for interactive development and debugging
- [Konduktor Launch Task Yamls](https://docs.trainy.ai/task-schema.md): Schema and examples for your `konduktor launch` `task.yaml`
- [Troubleshooting & FAQ GCP](https://docs.trainy.ai/troubleshooting/GCP.md): Commonly asked questions and issues for Trainy on GCP/GKE

## OpenAPI Specs

- [openapi](https://docs.trainy.ai/api-reference/openapi.json)

## Optional

- [Community](https://discord.com/invite/HQUBJSVgAP)
- [Blog](https://trainy.ai/blog)