> ## Documentation Index
> Fetch the complete documentation index at: https://docs.trainy.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Simple vLLM

> Example deployment yamls for vLLM deployments with `konduktor serve`

Note that some models may require authentication through Hugging Face tokens, which can be done using `konduktor secret` (see complex example [here](/deployments/complex-vllm)). The model `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` does not require one.

## Prerequisites

#### Current Working Directory

```
$ ls
deployment.yaml
```

#### Launching

```
$ konduktor serve launch deployment.yaml
```

## Deployment.yaml

```
# no autoscaling + default port (8000) + single GPU
name: serving-vllm-simple

resources:
  cpus: 4
  memory: 32
  accelerators: A100:1
  image_id: vllm/vllm-openai:v0.7.1
  labels:
    kueue.x-k8s.io/queue-name: user-queue

serving: 
  min_replicas: 1

run: |
  python3 -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
    --max-model-len 4096
```