> ## Documentation Index
> Fetch the complete documentation index at: https://docs.trainy.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Complex vLLM

> Example deployment yamls for vLLM deployments with `konduktor serve`

This example also demonstrates the creation and use of `--kind=env` secrets using `konduktor secret create`. This is required for some models such as `meta-llama/Meta-Llama-3.1-8B-Instruct`, which require Hugging Face tokens for authentication.

## Prerequisites

#### Setup

1. Create a `--kind=env` secret for your HF token called `my-hf-token`

```
$ konduktor secret create --kind=env --inline HUGGING_FACE_HUB_TOKEN=hf_ABC123 my-hf-token
```

2. Check that the secret was properly created with:

```
$ konduktor secret list
```

For more details, check out the setup of secrets <a href="/secrets" target="_blank">here</a>.

#### Current Working Directory

```
$ ls
deployment.yaml
```

#### Launching

```
$ konduktor serve launch deployment.yaml
```

## Deployment.yaml

```
# autoscaling + custom port + multi GPU
name: serving-vllm-complex

resources:
  cpus: 4
  memory: 32
  accelerators: A100:2
  image_id: vllm/vllm-openai:v0.7.1
  labels:
    kueue.x-k8s.io/queue-name: user-queue

serving: 
  min_replicas: 0
  max_replicas: 2
  ports: 9000

run: |
  python3 -m vllm.entrypoints.openai.api_server \
    --uvicorn-log-level warning \
    --model meta-llama/Meta-Llama-3.1-8B-Instruct \
    --max-model-len 8192 \
    --tensor-parallel-size 2 \
    --dtype half
```
