LLM API

Documentation for our cost-friendly, state-of-the-art Large Language Model API

GoAPI now offers Large Language Model inference, referred to as LLM Inference. This service gives you API access to the models listed below. Our service and pricing model best fit users with high-throughput workloads.

Available models:

uncensored-small-32k-20240717
gpt-3.5-turbo
gpt-4o-mini
gpt-auto*
gpt-4o-plus**
gpt-4o**
claude-3-5-sonnet-20240620***

*Note: gpt-auto is a reverse-engineered version of the Dynamic tab in ChatGPT: OpenAI decides internally whether to use gpt-4o or gpt-3.5-turbo. In our tests, most responses were generated by gpt-4o.
**Note: gpt-4o-plus and gpt-4o are available on the Developer plan or above. gpt-4o-plus is a reverse-engineered version of the gpt-4o tab in ChatGPT, whereas gpt-4o is OpenAI's original gpt-4o API model.

***Note: claude-3-5-sonnet-20240620 is available on the Developer plan or above.

Pricing

All models are priced below OpenAI's official prices; see LLM API | PPU Quota | Endpoint Usage for details.

Special Note

Due to Cloudflare's settings, we recommend using the streaming method for OpenAI's completions API whenever possible.
2023/11/28 Update: If you need to use the non-streaming method, you can change the domain to https://proxy.goapi.xyz
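
The note above can be sketched as a small helper that picks the domain from the stream flag. This is only an illustration: the name chat_completions_url is hypothetical, and it assumes the proxy domain serves the same /v1/chat/completions path as the main domain, which this page does not confirm.

```python
# Hypothetical helper: choose the domain according to the Special Note.
API_BASE = "https://api.goapi.xyz"      # main domain, recommended with streaming
PROXY_BASE = "https://proxy.goapi.xyz"  # domain suggested for non-streaming calls


def chat_completions_url(stream: bool) -> str:
    """Return the endpoint URL for a streaming or non-streaming call."""
    # Assumption: the proxy domain exposes the same path as the main domain.
    base = API_BASE if stream else PROXY_BASE
    return f"{base}/v1/chat/completions"


print(chat_completions_url(stream=True))
print(chat_completions_url(stream=False))
```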

Basic Completions

POST

https://api.goapi.xyz/v1/chat/completions

Creates a model response for the given chat conversation.

Parameters:

Header

Name	Type	Required	Description
Authorization	string	✔️	Your GoAPI key, used for request authorization and sent as "Bearer GOAPI_KEY" in the examples below

Body

Name	Type	Required	Description
model	string	✔️	ID of the model to use
messages	array	✔️	A list of messages comprising the conversation so far
functions	array		A list of functions the model may generate JSON inputs for.
function_call	string/object		Controls how the model calls functions. "none" means the model will not call a function and instead generates a message. "auto" means the model can pick between generating a message or calling a function. Specifying a particular function via {"name": "my_function"} forces the model to call that function. "none" is the default when no functions are present. "auto" is the default if functions are present.
temperature	number		Defaults to 1. What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
top_p	number		Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
n	integer		Defaults to 1. How many chat completion choices to generate for each input message.
stream	boolean		Defaults to false. If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
stop	string/array		Defaults to null. Up to 4 sequences where the API will stop generating further tokens.
max_tokens	number		Defaults to inf. The maximum number of tokens to generate in the chat completion.
presence_penalty	number		Defaults to 0. Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
frequency_penalty	number		Defaults to 0. Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
logit_bias	map		Defaults to null. Modify the likelihood of specified tokens appearing in the completion.
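
As a rough illustration of the body parameters above, the sketch below assembles a request body and checks the ranges the table states (temperature between 0 and 2, penalties between -2.0 and 2.0, at most 4 stop sequences). The function build_chat_body and its client-side checks are hypothetical and not part of GoAPI; the server may validate differently.

```python
import json
from typing import Optional


def _check_range(name: str, value: Optional[float], low: float, high: float) -> None:
    """Raise ValueError if value is set and falls outside [low, high]."""
    if value is not None and not (low <= value <= high):
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def build_chat_body(model, messages, *, temperature=None, top_p=None, n=None,
                    stream=None, stop=None, max_tokens=None,
                    presence_penalty=None, frequency_penalty=None) -> dict:
    """Build a chat completions body; hypothetical client-side validation."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    _check_range("temperature", temperature, 0, 2)
    _check_range("presence_penalty", presence_penalty, -2.0, 2.0)
    _check_range("frequency_penalty", frequency_penalty, -2.0, 2.0)
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")

    body = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature, "top_p": top_p, "n": n, "stream": stream,
        "stop": stop, "max_tokens": max_tokens,
        "presence_penalty": presence_penalty,
        "frequency_penalty": frequency_penalty,
    }
    # Send only what was set, so the documented defaults apply to the rest.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


body = build_chat_body(
    "gpt-4o-mini",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.2,
    stop=["\n"],
)
print(json.dumps(body))
```

Omitting unset options keeps the payload small and leaves the defaults to the server rather than duplicating them in the client.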

Response Codes:

200: OK

Successful Response

400: Bad Request

The request format does not meet the requirements.

401: Unauthorized

The API key is incorrect.

500: Internal Server Error

The service is experiencing an error.
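
One way a client might act on the response codes above is sketched below. The exception class GoAPIError and the function check_status are hypothetical names used for illustration; the messages simply mirror the descriptions listed above.

```python
# Meanings copied from the Response Codes list above.
STATUS_MEANINGS = {
    200: "OK",
    400: "The request format does not meet the requirements.",
    401: "The API key is incorrect.",
    500: "The service is experiencing an error.",
}


class GoAPIError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


def check_status(status: int) -> None:
    """Return silently on 200, otherwise raise GoAPIError."""
    if status == 200:
        return
    # Codes not listed on this page get a generic message.
    raise GoAPIError(status, STATUS_MEANINGS.get(status, "Unexpected status code"))


check_status(200)  # no exception
try:
    check_status(401)
except GoAPIError as err:
    print(err)
```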

NON-STREAMING

Request Example

curl https://api.goapi.xyz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer GOAPI_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

Response Example

{
  "id": "chatcmpl-83jZ61GDHtdlsFUzXDbpGeoU193Mj",
  "object": "chat.completion",
  "created": 1695900828,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 9,
    "total_tokens": 28
  }
}
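
For readers not using curl, the same non-streaming request could be prepared with Python's standard library, roughly as follows. The helper names build_request and first_reply are hypothetical, and GOAPI_KEY is the same placeholder used in the curl example. The sketch only constructs the request and reads the sample response shown above; it does not contact the server.

```python
import json
import urllib.request

API_URL = "https://api.goapi.xyz/v1/chat/completions"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare a POST request equivalent to the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def first_reply(response: dict) -> str:
    """Return the assistant text of the first choice."""
    return response["choices"][0]["message"]["content"]


request = build_request("GOAPI_KEY", {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
})

# Abbreviated copy of the Response Example above.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I assist you today?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 19, "completion_tokens": 9, "total_tokens": 28},
}
print(first_reply(sample_response))  # prints "Hello! How can I assist you today?"
```

To actually send the request, pass it to urllib.request.urlopen and decode the body with json.load.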

STREAMING

Request Example

curl https://api.goapi.xyz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer GOAPI_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "stream": true
  }'

Response Example

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" How"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" can"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" I"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" assist"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" you"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" today"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"?"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
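
A client has to reassemble the message from the streamed chunks shown above. The sketch below is one possible way to do that, assuming each event arrives as a single "data:" line and that only one choice is requested (n = 1). The function name iter_deltas is hypothetical, and the sample lines are shortened copies of the events above.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from server-sent event lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip the blank lines separating events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk["choices"]:
            # The first chunk carries the role, the last an empty delta.
            content = choice["delta"].get("content")
            if content:
                yield content


sample_stream = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample_stream)))  # prints "Hello!"
```

With n greater than 1, fragments from different choices would be interleaved, so a real client would need to group them by the index field.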
