Konduktor Serve Launch Deployment YAML
Schema
Details
General
min_replicas (required)
max_replicas (optional)
- if min_replicas != max_replicas, autoscaling is enabled automatically
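The replica rule above can be read as a simple predicate. The following is a minimal sketch in Python, not Konduktor's actual implementation: the function name `autoscaling_enabled` is hypothetical, and both the fallback of an omitted max_replicas to min_replicas and the bounds check are assumptions.

```python
from typing import Optional


def autoscaling_enabled(min_replicas: int, max_replicas: Optional[int] = None) -> bool:
    """Return True when the replica bounds differ, per the rule above."""
    # Assumption: an omitted max_replicas is treated as equal to min_replicas.
    if max_replicas is None:
        max_replicas = min_replicas
    # Assumption: a maximum below the minimum is rejected as invalid.
    if max_replicas < min_replicas:
        raise ValueError("max_replicas must be >= min_replicas")
    return min_replicas != max_replicas
```

Under these assumptions, a deployment with min_replicas 1 and max_replicas 3 would autoscale, while one that sets only min_replicas would run a fixed number of replicas.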
resources: image_id (required)
- only vllm/vllm-openai:v0.7.1 or another version of this image is supported by the OpenAI API
min_replicas (required)
max_replicas (optional)
- if min_replicas != max_replicas, autoscaling is enabled automatically
probe (exclude)
- only /health is supported by the OpenAI API, so omit this field for simplicity; it defaults to /health
run (required)
python3 -m vllm.entrypoints.openai.api_server (required)
--model (required)
- some models, like Llama 3.1, require authentication through a Hugging Face token, which can be passed into the deployment using Konduktor Secrets
- ex. konduktor secret create --kind=env --inline HUGGING_FACE_HUB_TOKEN=hf_ABC123 my-hf-token
--max-model-len (required)
--tensor-parallel-size (required when using more than one GPU; otherwise optional)
- See here for more info on vllm.entrypoints.openai.api_server
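The flag requirements for the run command can be sketched as a small helper that assembles the command string. This is an illustrative sketch only: the function name `build_vllm_command` and its parameters are hypothetical, and setting --tensor-parallel-size equal to the GPU count is an assumption rather than something this page specifies.

```python
import shlex
from typing import List


def build_vllm_command(model: str, max_model_len: int, num_gpus: int = 1) -> str:
    """Assemble the `run` command from the flags listed above."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    args: List[str] = [
        "python3", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,                       # required
        "--max-model-len", str(max_model_len),  # required
    ]
    # Required only with more than one GPU.
    # Assumption: the tensor-parallel size equals the GPU count.
    if num_gpus > 1:
        args += ["--tensor-parallel-size", str(num_gpus)]
    # shlex.join quotes any argument that needs shell escaping.
    return shlex.join(args)


if __name__ == "__main__":
    print(build_vllm_command("meta-llama/Llama-3.1-8B-Instruct", 4096, num_gpus=2))
```

The resulting string is what would go under run; with a single GPU the --tensor-parallel-size flag is left out.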