Build with the world's best AI models.
Access all models through a single, OpenAI-compatible gateway. Sub-millisecond routing, unified billing, and 100% SDK compatibility.
Quickstart
Get up and running in less than 60 seconds using the official OpenAI SDKs.
from openai import OpenAI

client = OpenAI(
    api_key="sk-ht-xxxx",
    base_url="https://api.heytoken.ai"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Authentication
All API requests must include your API key in the Authorization HTTP header. You can generate and manage your keys in the API Keys dashboard.
Authorization: Bearer sk-ht-xxxxxxxxxxxx

Protect your Secret Key
Your API key carries the same privileges as your account. Never share it, commit it to version control, or use it in client-side code.
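A common pattern is to keep the key out of source code entirely and read it from an environment variable at runtime. A minimal sketch (HEYTOKEN_API_KEY is an illustrative variable name, not one the platform requires):

import os
from openai import OpenAI

# Read the secret from the environment so it never lands in version control
client = OpenAI(
    api_key=os.environ["HEYTOKEN_API_KEY"],
    base_url="https://api.heytoken.ai"
)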
Chat Completions
/v1/chat/completions

Generate intelligent responses using state-of-the-art language models. Our API supports high-performance streaming (SSE) and multimodal inputs.
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | ID of the model to use. See /v1/models for a full list. |
| messages (required) | array | A list of messages comprising the conversation. |
| stream | boolean | If true, partial message deltas will be sent via SSE. |
| temperature | number | Sampling temperature between 0 and 2. |
| max_tokens | integer | The maximum number of tokens to generate. |
| reasoning | boolean | Enable chain-of-thought for supported models (o1/DeepSeek). |
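For example, a non-streaming request that combines these parameters, reusing the client from the Quickstart (the prompt and values are illustrative):

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
    temperature=0.7,   # between 0 and 2; lower is more deterministic
    max_tokens=256     # cap the completion length
)
print(response.choices[0].message.content)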
Response Formats
Standard JSON (Non-streaming)
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello!"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}

Streaming (Server-Sent Events)
data: {"choices":[{"delta":{"content":" world"},"index":0}]}
data: [DONE]
Tip: Set stream: true to significantly reduce perceived latency (Time To First Token).
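With the Python SDK from the Quickstart, a streaming request might look like this:

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about latency."}],
    stream=True
)
for chunk in stream:
    # Each SSE event carries a partial delta; content can be None on some chunks
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)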
Embeddings
Convert text into high-dimensional vectors for semantic search, clustering, and RAG applications.
cURL Example
curl https://api.heytoken.ai/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-ht-xxxx" \
  -d '{
    "input": "The food was delicious and the service was excellent.",
    "model": "text-embedding-3-small"
  }'
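The same request through the Python SDK, reusing the client from the Quickstart:

embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="The food was delicious and the service was excellent."
)
print(len(embedding.data[0].embedding))  # dimensionality of the returned vector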
Image Generation

Create high-resolution images from text prompts using DALL-E 3, Midjourney, and Stable Diffusion.
- Multiple aspect ratios
- HD quality support
- Style consistency
Example Request
{
  "model": "dall-e-3",
  "prompt": "A futuristic city at sunset",
  "size": "1024x1024",
  "quality": "hd"
}
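Assuming the gateway exposes the standard OpenAI-compatible images endpoint, a sketch of the equivalent Python SDK call:

image = client.images.generate(
    model="dall-e-3",
    prompt="A futuristic city at sunset",
    size="1024x1024",
    quality="hd"
)
print(image.data[0].url)  # URL of the generated image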
Video Generation

Generate cinematic videos from text or images using Sora, Runway, and Kling.
1. Submit a generation task to receive a task_id.
2. Poll the status endpoint or wait for a webhook callback (a polling sketch follows this list).
3. Download the high-quality MP4 result once completed.
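As an illustration only: the task endpoints are not documented in this section, so the path and field names below (/v1/video/generations, status, video_url) are placeholders, not confirmed API details.

import time
import requests

BASE = "https://api.heytoken.ai"
HEADERS = {"Authorization": "Bearer sk-ht-xxxx"}

# Placeholder path: consult the video endpoint reference for the real one
task = requests.post(
    f"{BASE}/v1/video/generations",
    headers=HEADERS,
    json={"model": "sora", "prompt": "A drone shot over a neon city at night"},
).json()
task_id = task["task_id"]

# Poll until the task finishes, then download the MP4
while True:
    status = requests.get(f"{BASE}/v1/video/generations/{task_id}", headers=HEADERS).json()
    if status.get("status") == "completed":
        with open("result.mp4", "wb") as f:
            f.write(requests.get(status["video_url"]).content)
        break
    time.sleep(5)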
Realtime Voice (Beta)
Low-latency, full-duplex voice interactions using OpenAI Realtime and Google Gemini Multimodal.
This endpoint requires a WebSocket connection. Documentation for the /v1/realtime socket protocol is available upon request for Enterprise customers.
Rate Limits
Enforced to ensure fair usage and system stability across all developers.
| Limit | Value | Scope |
|---|---|---|
| RPM | 100 | Requests per minute |
| RPH | 1,000 | Requests per hour |
| TPM | 100k | Tokens per minute |
| Concurrency | 10 | Active streams |
Errors & Handling
We use standard HTTP status codes. A successful request returns a 2xx status code.
- Invalid or missing API key. Check your Authorization header format.
- Insufficient balance. Top up your credits in the Billing section.
- The requested model or endpoint does not exist.
- Too many requests. Please implement exponential backoff (a backoff sketch follows this list).
- Upstream provider is temporarily down. Our system will automatically retry on another channel.
- The model is currently experiencing high traffic. Try again in a few seconds.
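A minimal retry-with-backoff sketch for rate-limit errors, using the Python SDK:

import time
import openai

def chat_with_backoff(client, max_retries=5, **kwargs):
    # Double the wait after each rate-limited attempt
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("Still rate-limited after retries")

response = chat_with_backoff(client, model="gpt-4o",
                             messages=[{"role": "user", "content": "Hello!"}])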
Best Practices
1. Use streaming for a better user experience in chat applications.
2. Implement timeouts (30-60 s recommended) to handle long-running generations.
3. Set max_tokens to control costs and prevent unexpected usage spikes (see the example after this list).
4. Cache frequent requests to reduce latency and save on token costs.
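For instance, the timeout and token-cap recommendations above can be applied on the client and on each request (values are illustrative):

from openai import OpenAI

client = OpenAI(
    api_key="sk-ht-xxxx",
    base_url="https://api.heytoken.ai",
    timeout=60.0  # seconds; fail fast instead of hanging on long generations
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=512  # hard cap on completion length to bound cost
)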