LLM API

Documentation for our cost-effective, state-of-the-art Large Language Model API

GoAPI now offers Large Language Model inference (LLM Inference). This service gives you API access to the models listed below. Our service and pricing model are best suited to high-throughput use cases.

Available models:

  • uncensored-small-32k-20240717
  • gpt-3.5-turbo
  • gpt-4o-mini
  • gpt-auto*
  • gpt-4o-plus**
  • gpt-4o**
  • claude-3-5-sonnet-20240620***

*Note: gpt-auto is a reverse-engineered version of the Dynamic tab in ChatGPT, where OpenAI decides internally whether to use gpt-4o or gpt-3.5-turbo. In our tests, most responses were generated by gpt-4o.
**Note: gpt-4o-plus and gpt-4o are available on the Developer plan or above. gpt-4o-plus is a reverse-engineered version of the gpt-4o tab in ChatGPT, whereas gpt-4o is OpenAI's original gpt-4o API model.
***Note: claude-3-5-sonnet-20240620 is available on the Developer plan or above.
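If you pick models programmatically, it can help to reject a model your plan cannot use before sending a request. The following is a minimal client-side sketch, assuming you track your plan yourself as a boolean; the names check_model and has_developer_plan are hypothetical, and the model sets are only a snapshot of the list and notes above.

```python
# Snapshot of the "Available models" list above; update it if the list changes.
AVAILABLE_MODELS = frozenset({
    "uncensored-small-32k-20240717",
    "gpt-3.5-turbo",
    "gpt-4o-mini",
    "gpt-auto",
    "gpt-4o-plus",
    "gpt-4o",
    "claude-3-5-sonnet-20240620",
})

# Models that the notes above restrict to the Developer plan or above.
DEVELOPER_PLAN_MODELS = frozenset({"gpt-4o-plus", "gpt-4o", "claude-3-5-sonnet-20240620"})


def check_model(model: str, has_developer_plan: bool) -> None:
    """Raise ValueError if `model` is not listed or not available on the caller's plan.

    `has_developer_plan` is a hypothetical flag you set yourself from your account
    details; it is not a field returned by the API.
    """
    if model not in AVAILABLE_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if model in DEVELOPER_PLAN_MODELS and not has_developer_plan:
        raise ValueError(f"{model!r} requires the Developer plan or above")


check_model("gpt-4o-mini", has_developer_plan=False)  # passes silently
```

The server remains the authority on access; this check only saves a round trip.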


Pricing

All models are priced below OpenAI's official prices. For details, see LLM API | PPU Quota | Endpoint Usage.

Special Note

Due to Cloudflare settings, we recommend using the streaming method ("stream": true) with the OpenAI-style completions API whenever possible.
2023/11/28 Update: If you need to use the non-streaming method, change your domain to https://proxy.goapi.xyz.


Basic Completions

POST

https://api.goapi.xyz/v1/chat/completions

Creates a model response for the given chat conversation.

Parameters:

Header
  • Authorization (string, required): Your GoAPI key, used to authorize the request. Send it as "Bearer GOAPI_KEY", as in the request examples.
Body
  • model (string, required): ID of the model to use.
  • messages (array, required): A list of messages comprising the conversation so far.
  • functions (array, optional): A list of functions the model may generate JSON inputs for.
  • function_call (string or object, optional): Controls how the model calls functions. "none" means the model will not call a function and instead generates a message. "auto" means the model can choose between generating a message and calling a function. Specifying a particular function via {"name": "my_function"} forces the model to call that function. "none" is the default when no functions are present; "auto" is the default if functions are present.
  • temperature (number, optional): Defaults to 1. The sampling temperature to use, between 0 and 2. Higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic.
  • top_p (number, optional): Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens comprising the top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered.
  • n (integer, optional): Defaults to 1. How many chat completion choices to generate for each input message.
  • stream (boolean, optional): Defaults to false. If set, partial message deltas are sent, as in ChatGPT. Tokens are sent as data-only server-sent events as they become available, and the stream is terminated by a data: [DONE] message.
  • stop (string or array, optional): Defaults to null. Up to 4 sequences where the API will stop generating further tokens.
  • max_tokens (number, optional): Defaults to inf. The maximum number of tokens to generate in the chat completion.
  • presence_penalty (number, optional): Defaults to 0. A number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  • frequency_penalty (number, optional): Defaults to 0. A number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  • logit_bias (map, optional): Defaults to null. Modifies the likelihood of specified tokens appearing in the completion.
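As an illustration of how these body parameters fit together, here is a minimal sketch that assembles a request body and checks a few of the documented ranges (temperature 0 to 2, penalties -2.0 to 2.0, at most 4 stop sequences) before sending. The helper name build_chat_body and the get_weather function are hypothetical, and the function schema assumes GoAPI accepts the same name/description/parameters (JSON Schema) shape as OpenAI's API.

```python
import json
from typing import Any


def build_chat_body(model: str, messages: list[dict[str, Any]], **options: Any) -> str:
    """Return a JSON body for POST /v1/chat/completions.

    Only a few of the documented ranges are checked client-side; the server
    remains the authority on what it accepts.
    """
    if not messages:
        raise ValueError("messages must contain at least one message")
    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    for name in ("presence_penalty", "frequency_penalty"):
        value = options.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be between -2.0 and 2.0")
    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")
    body: dict[str, Any] = {"model": model, "messages": messages}
    # Omit options left as None so the server applies its documented defaults.
    body.update({key: val for key, val in options.items() if val is not None})
    return json.dumps(body)


# Hypothetical function schema, written in OpenAI's name/description/parameters shape.
get_weather = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

body = build_chat_body(
    "gpt-3.5-turbo",
    [{"role": "user", "content": "What's the weather in Paris?"}],
    functions=[get_weather],
    function_call={"name": "get_weather"},  # force the model to call this function
    temperature=0.2,
)
```

Leaving an option out (or passing None) lets the server fall back to the defaults listed above, which keeps request bodies minimal.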

Response Codes:

200: OK
Successful Response
400: Bad Request
The request format does not meet the requirements.
401: Unauthorized
The API key is incorrect
500: Internal Server Error
The service encountered an internal error.
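The response codes above can be mapped to client-side errors. This is a minimal sketch; GoAPIError, check_status and should_retry are hypothetical names, not part of any GoAPI SDK. Treating only 500 as retryable is a design assumption, on the reasoning that a 400 or 401 will keep failing until the request format or API key is fixed.

```python
# Messages mirror the response codes documented above.
STATUS_MESSAGES = {
    400: "Bad Request: the request format does not meet the requirements",
    401: "Unauthorized: the API key is incorrect",
    500: "Internal Server Error: the service encountered an error",
}


class GoAPIError(Exception):
    """Hypothetical exception for non-200 responses."""

    def __init__(self, status: int, body: str = "") -> None:
        self.status = status
        self.body = body
        message = STATUS_MESSAGES.get(status, f"unexpected HTTP status {status}")
        super().__init__(message)


def check_status(status: int, body: str = "") -> None:
    """Return normally on 200; raise GoAPIError for any other status."""
    if status != 200:
        raise GoAPIError(status, body)


def should_retry(status: int) -> bool:
    """Assume only a 500 is worth retrying; 400 and 401 need the request or key fixed."""
    return status == 500
```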

NO STREAMING

Request Example

curl https://api.goapi.xyz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer GOAPI_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
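For callers using Python instead of curl, the same request can be built with the standard library. This is a minimal sketch: make_request and send are hypothetical helper names, reading the key from a GOAPI_KEY environment variable is an assumption, and sending the non-streaming call to https://proxy.goapi.xyz with the same /v1/chat/completions path is our reading of the 2023/11/28 note, not a confirmed detail. By default the sketch only builds the request.

```python
import json
import os
import urllib.request

API_BASE = "https://api.goapi.xyz"
# Per the 2023/11/28 note, non-streaming calls may use this domain instead;
# this sketch assumes the /v1/chat/completions path is unchanged there.
PROXY_BASE = "https://proxy.goapi.xyz"


def make_request(api_key: str, payload: dict, base_url: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send a non-streaming request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx statuses such as 400, 401 or 500.
    """
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}
req = make_request(os.environ.get("GOAPI_KEY", "GOAPI_KEY"), payload, base_url=PROXY_BASE)
# result = send(req)  # uncomment to actually call the API
```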

Response Example

{
  "id": "chatcmpl-83jZ61GDHtdlsFUzXDbpGeoU193Mj",
  "object": "chat.completion",
  "created": 1695900828,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 9,
    "total_tokens": 28
  }
}
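Reading a non-streaming response mostly means walking choices and usage. The sketch below parses the response example above (reformatted onto fewer lines) and collects the assistant text of every choice ordered by index, since n can be greater than 1; the helper name reply_texts is hypothetical.

```python
import json

# The non-streaming response example above, with some objects written on one line.
RESPONSE_TEXT = """
{
  "id": "chatcmpl-83jZ61GDHtdlsFUzXDbpGeoU193Mj",
  "object": "chat.completion",
  "created": 1695900828,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I assist you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 19, "completion_tokens": 9, "total_tokens": 28}
}
"""


def reply_texts(response: dict) -> list[str]:
    """Return the assistant content of every choice, ordered by choice index."""
    choices = sorted(response["choices"], key=lambda choice: choice["index"])
    return [choice["message"]["content"] for choice in choices]


response = json.loads(RESPONSE_TEXT)
texts = reply_texts(response)
usage = response["usage"]
print(texts[0])  # Hello! How can I assist you today?
print(usage["total_tokens"])  # 28
```

In this example, total_tokens equals prompt_tokens plus completion_tokens (19 + 9 = 28).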

STREAMING

Request Example

curl https://api.goapi.xyz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer GOAPI_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "stream": true
  }'

Response Example

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" How"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" can"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" I"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" assist"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" you"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" today"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"?"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
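To reassemble the reply from a stream like the one above, concatenate each chunk's delta content until the data: [DONE] line. This is a minimal sketch, assuming n = 1 (with n > 1, chunks for different choices would need to be grouped by their index); the helper name iter_deltas is hypothetical, and the sample lines are abridged from the example with the id, object, created and model fields removed.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE `data:` lines until the `data: [DONE]` sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk["choices"]:
            # The first chunk carries only the role; the last carries an empty delta.
            content = choice["delta"].get("content")
            if content:
                yield content


SAMPLE = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_deltas(SAMPLE)))  # Hello!
```

With Python's urllib.request.urlopen on a request whose body sets "stream": true, iterating the response object yields raw byte lines, which could be passed in as iter_deltas(line.decode("utf-8") for line in resp).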