
# Overview

> Trainy-Konduktor: an ML/AI GPU platform for high-performance batch jobs on Kubernetes (k8s).

**At a glance**

* **Simple UX:** declare resources in YAML and run your workload with plain `bash`

```[expandable] theme={null}
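# my_task.yaml
name: tune
num_nodes: 2 # number of nodes; increase to scale up your workload

resources: # requested per node
  cpus: 15
  memory: 90 # GB
  accelerators: H100:8 # 8 H100 GPUs per node
  image_id: gcr.io/k8s-staging-jobset/pytorch-mnist:latest # container image to run
  labels:
    kueue.x-k8s.io/queue-name: user-queue # Kueue queue the job is submitted to
    maxRunDurationSeconds: "3200" # maximum run time for the job, in seconds

run: |
  set -e # stop at the first failing command
  NCCL_DEBUG=INFO torchrun --rdzv_id=123 --nnodes=$NUM_NODES --nproc_per_node=1 --master_addr=$MASTER_ADDR --master_port=1234 --node_rank=$RANK /workspace/mnist.py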
```

Run with:

```
$ konduktor launch my_task.yaml
```

* **Out-of-the-box observability:** Trainy provides telemetry (Prometheus metrics and logs) for tracking cluster performance, utilization, and health.
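As a worked example of how the settings in `my_task.yaml` map onto training processes: `num_nodes: 2` is passed to `torchrun --nnodes`, and `--nproc_per_node=1` starts one worker per node, so the job runs with a world size of 2. Assuming one GPU per worker process, that means only one of the eight requested H100s on each node is used. The Python sketch below computes this layout, assuming torchrun's default static rendezvous (used here because the command passes `--master_addr` and `--node_rank`), in which each worker's global rank is `node_rank * nproc_per_node + local_rank`. The `torchrun_layout` helper and `Worker` class are hypothetical illustrations, not part of Konduktor or PyTorch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    """One torchrun worker process (hypothetical helper type)."""
    node_rank: int
    local_rank: int
    global_rank: int


def torchrun_layout(nnodes: int, nproc_per_node: int) -> tuple[int, list[Worker]]:
    """Return (world_size, workers) for a homogeneous, static torchrun launch.

    Assumes static rendezvous, where ranks are laid out node by node:
    global_rank = node_rank * nproc_per_node + local_rank.
    """
    world_size = nnodes * nproc_per_node
    workers = [
        Worker(node, local, node * nproc_per_node + local)
        for node in range(nnodes)
        for local in range(nproc_per_node)
    ]
    return world_size, workers


if __name__ == "__main__":
    # Values taken from my_task.yaml: num_nodes: 2 and --nproc_per_node=1
    world_size, workers = torchrun_layout(nnodes=2, nproc_per_node=1)
    print("WORLD_SIZE =", world_size)
    for worker in workers:
        print(worker)

    # One worker per H100 (8 per node) instead of one worker per node
    world_size_8, _ = torchrun_layout(nnodes=2, nproc_per_node=8)
    print("WORLD_SIZE with 8 workers per node =", world_size_8)
```

Under the same assumptions, setting `--nproc_per_node=8` would start one worker per GPU, giving a world size of 16 across the two nodes.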

**Getting Started**

<CardGroup cols={2}>
  <Card title="Setup" href="/setup">
    Installation & Configuration
  </Card>

  <Card title="Quickstart" href="/quickstart">
    Launch your first job
  </Card>
</CardGroup>

<script src="https://asciinema.org/a/1DD6A1S52rTZceoYmTJKcg5da.js" id="asciicast-1DD6A1S52rTZceoYmTJKcg5da" async="true" />

## Examples

<CardGroup cols={2}>
  <Card title="Many Parallel Jobs" icon="sparkles" href="/many-jobs">
    Schedule multiple jobs in parallel for batch inference
  </Card>

  <Card title="Multi-node Jobs" icon="sparkles" href="/distributed-jobs">
    Scale a distributed PyTorch job across multiple machines
  </Card>

  <Card title="Interactive Workloads" icon="rectangle-terminal" href="/tailscale">
    SSH into your GPU containers and connect VS Code via Tailscale for debugging and development
  </Card>

  <Card title="Observability" icon="microscope" href="/observability">
    Monitor and troubleshoot your workloads with our curated Grafana dashboards
  </Card>
</CardGroup>
