Gemini 3.1 Flash Lite - OpenAI SDK - Full Reference
- Call Gemini-3.1-flash-lite-preview model using OpenAI SDK format
- Synchronous processing mode, returns conversation content in real-time
- Plain text conversation: Single-turn or multi-turn contextual dialogue, see simple_text and multi_turn examples in code samples
- System prompt: Customize AI role and behavior, see system_prompt example in code samples
- Multimodal input: Supports text + image mixed input, see vision and multi_image examples in code samples
https://direct.evolink.ai, which has better support for text models and long-lived connections. https://api.evolink.ai is the primary endpoint for multimodal services and serves as a fallback address for text models.Authorizations
##All APIs require Bearer Token authentication##
Get API Key:
Visit API Key Management Page to get your API Key
Add to request header:
Authorization: Bearer YOUR_API_KEY
Body
Chat model name
gemini-3.1-flash-lite-preview "gemini-3.1-flash-lite-preview"
List of chat messages, supports multi-turn dialogue and multimodal input
1Whether to return response in streaming mode
true: Streaming return, receives content in real-time chunksfalse: Returns complete response at once
false
Maximum number of completion tokens for the generated response, corresponding to Gemini's maxOutputTokens.
1 <= x <= 655362000
Maximum number of tokens for the generated response, compatible with the legacy OpenAI parameter.
1 <= x <= 655362000
Sampling temperature, controls output randomness
Description:
- Lower values (e.g., 0.2): More deterministic, focused output
- Higher values (e.g., 1.5): More random, creative output
0 <= x <= 20.7
Nucleus Sampling parameter
Description:
- Controls sampling from tokens with cumulative probability
- For example, 0.9 means selecting from tokens with cumulative probability up to 90%
- Default: 1.0 (considers all tokens)
Recommendation: Do not adjust temperature and top_p simultaneously
0 <= x <= 10.9
Frequency penalty coefficient. Range: -2.0 to 2.0. Corresponds to Gemini's frequencyPenalty.
-2 <= x <= 20
Presence penalty coefficient. Range: -2.0 to 2.0. Corresponds to Gemini's presencePenalty.
-2 <= x <= 20
Stop sequences. Supports a string or string array, corresponding to Gemini's stopSequences.
Number of generated candidates.
x >= 11
Limits reasoning effort. Gemini 3 supports low/high thinking levels; medium maps to the higher level and none is not supported.
low, medium, high "medium"
Random seed used to make output as reproducible as possible, corresponding to Gemini's seed.
12345
Whether to return token logprob information, corresponding to Gemini's responseLogprobs.
true
Number of top logprob values returned for each token, corresponding to Gemini's logprobs.
0 <= x <= 205
Response format settings, supporting JSON mode and JSON Schema, corresponding to Gemini's responseMimeType, responseSchema and responseJsonSchema.
- Option 1
- Option 2
Streaming response options. Can be set when stream is true.
List of tool definitions for Function Calling.
Controls tool-calling behavior.
none, auto, required Gemini extension parameters.
Response
Chat completion generated successfully
Unique identifier for the chat completion
"chatcmpl-20251010015944503180122WJNB8Eid"
Model name actually used
"gemini-3.1-flash-lite-preview"
Response type
chat.completion "chat.completion"
Creation timestamp
1760032810
List of chat completion choices
Token usage statistics