GenerativeModel

interface GenerativeModel


Provides an interface for performing content generation.

It supports both standard and streaming inferences, as well as methods for preparing and cleaning up model resources.

Typical usage:

val request = generateContentRequest {
    text("Your input text here.")
}

try {
    val result = generativeModel.generateContent(request)
    println(result.text)
} catch (e: GenAiException) {
    // Handle the exception.
}

Summary

Public functions

suspend @FeatureStatus Int
checkStatus()

Checks the current availability status of the content generation feature.

suspend Unit
clearCaches()

Clears all caches created by prefix caching.

Unit
close()

Releases resources associated with the content generation engine.

suspend CountTokensResponse
countTokens(request: GenerateContentRequest)

Counts the number of tokens in the request.

Flow<DownloadStatus>
download()

Downloads the required model assets for the content generation feature if they are not already available.

open suspend GenerateContentResponse
generateContent(prompt: String)

Performs asynchronous content generation on the provided input prompt.

suspend GenerateContentResponse
generateContent(request: GenerateContentRequest)

Performs asynchronous content generation on the provided input request.

open suspend GenerateContentResponse
generateContent(prompt: String, streamingCallback: StreamingCallback)

Performs streaming content generation inference on the provided input prompt.

suspend GenerateContentResponse
generateContent(
    request: GenerateContentRequest,
    streamingCallback: StreamingCallback
)

Performs streaming content generation inference on the provided input request.

open Flow<GenerateContentResponse>
generateContentStream(prompt: String)

Performs streaming content generation inference on the provided input prompt.

Flow<GenerateContentResponse>
generateContentStream(request: GenerateContentRequest)

Performs streaming content generation inference on the provided input request.

suspend String
getBaseModelName()

Returns the name of the base model used by this generator instance.

suspend Int
getTokenLimit()

Returns the total token limit for the API, including both input and output tokens.

suspend Unit
warmup()

Warms up the inference engine by loading necessary models and initializing runtime components.

Public functions

checkStatus

suspend fun checkStatus(): @FeatureStatus Int

Checks the current availability status of the content generation feature.

Returns
@FeatureStatus Int

a feature status indicating the feature's readiness.
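For illustration, a caller might gate inference on the returned status. A minimal sketch, assuming FeatureStatus exposes constants such as AVAILABLE and DOWNLOADABLE (the helper functions are hypothetical):

when (generativeModel.checkStatus()) {
    FeatureStatus.AVAILABLE -> runInference()     // assets are ready on-device
    FeatureStatus.DOWNLOADABLE -> startDownload() // fetch assets first; see download()
    else -> showUnavailable()                     // downloading, or unsupported device
}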

clearCaches

suspend fun clearCaches(): Unit

Clears all caches created by prefix caching.

When promptPrefix is provided in generateContent or generateContentStream, the system caches the prefix's processing to reduce inference time for subsequent requests that share the same prefix. This experimental method clears all such caches.
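A hedged sketch of that lifecycle; the promptPrefix builder field is an assumption about the request DSL:

// Requests that share the same prefix reuse the cached prefix processing.
val request = generateContentRequest {
    promptPrefix("You are a concise travel assistant.") // assumed DSL field
    text("Suggest three destinations for a winter trip.")
}
val response = generativeModel.generateContent(request)

// Later, once the cached prefixes are no longer useful:
generativeModel.clearCaches()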

close

fun close(): Unit

Releases resources associated with the content generation engine.

This should be called when the GenerativeModel is no longer needed. It is safe to call multiple times.
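For example, a model scoped to an AndroidX ViewModel could be released in onCleared(); this is a sketch of one common lifecycle, not a requirement of the API:

class SummaryViewModel(
    private val generativeModel: GenerativeModel
) : ViewModel() {
    override fun onCleared() {
        // Safe even if close() has already been called elsewhere.
        generativeModel.close()
        super.onCleared()
    }
}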

countTokens

suspend fun countTokens(request: GenerateContentRequest): CountTokensResponse

Counts the number of tokens in the request.

The number of tokens counted includes input and output tokens. The result can be compared with getTokenLimit to check if the request is within the token limit.

Parameters
request: GenerateContentRequest

a non-null GenerateContentRequest containing input content.

Returns
CountTokensResponse

a CountTokensResponse containing the number of tokens in the request.
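A minimal sketch of the pre-flight check, assuming CountTokensResponse exposes a totalTokens property and that maxOutputTokens mirrors the value set on the request:

val inputTokens = generativeModel.countTokens(request).totalTokens // assumed property
if (inputTokens + maxOutputTokens <= generativeModel.getTokenLimit()) {
    val response = generativeModel.generateContent(request)
} else {
    // Trim the input or lower maxOutputTokens before retrying.
}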

download

fun download(): Flow<DownloadStatus>

Downloads the required model assets for the content generation feature if they are not already available.

Use this method to proactively download models before inference. The returned Flow emits DownloadStatus to report progress and completion status.

Returns
Flow<DownloadStatus>

a Flow which will emit DownloadStatus values for download progress updates.
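A sketch of triggering the download and observing progress from a coroutine scope; the logging is illustrative, since the exact shape of DownloadStatus is not shown here:

scope.launch {
    generativeModel.download().collect { status ->
        // Drive a progress indicator from each emitted status.
        Log.d("GenAI", "Download status: $status")
    }
}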

generateContent

open suspend fun generateContent(prompt: String): GenerateContentResponse

Performs asynchronous content generation on the provided input prompt.

This is a convenience method that wraps the input prompt in a GenerateContentRequest with default generation parameters.

Parameters
prompt: String

the input prompt text.

Returns
GenerateContentResponse

a GenerateContentResponse containing the generated content.

Throws
GenAiException

if the inference fails.

See also
generateContent(request: GenerateContentRequest)
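For instance, inside a coroutine (articleText is a placeholder):

val response = generativeModel.generateContent("Summarize the following text: $articleText")
println(response.text)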

generateContent

suspend fun generateContent(request: GenerateContentRequest): GenerateContentResponse

Performs asynchronous content generation on the provided input request.

This is the standard, non-streaming version of inference. The complete generated results are returned once the model finishes processing.

This method is non-blocking; wrap the call in a try/catch block to receive the GenerateContentResponse and handle a potential GenAiException.

The coroutine that runs generateContent is cancellable. If the inference is no longer needed (e.g., the user navigates away or input changes), the coroutine can be cancelled.
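For example, a caller might retain the Job and cancel it when the input changes. This is a general coroutine pattern rather than API-specific behavior; scope, showResult, and showError are hypothetical:

var inferenceJob: Job? = null

fun onSubmit(request: GenerateContentRequest) {
    inferenceJob?.cancel() // abandon the now-stale inference
    inferenceJob = scope.launch {
        try {
            showResult(generativeModel.generateContent(request))
        } catch (e: GenAiException) {
            showError(e)
        }
    }
}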

Note that inference requests may fail under certain conditions.

Parameters
request: GenerateContentRequest

a non-null GenerateContentRequest containing input content.

Returns
GenerateContentResponse

a GenerateContentResponse containing the generated content.

Throws
GenAiException

if the inference fails.

generateContent

open suspend fun generateContent(prompt: String, streamingCallback: StreamingCallback): GenerateContentResponse

Performs streaming content generation inference on the provided input prompt.

This is a convenience method that wraps the input prompt in a GenerateContentRequest with default generation parameters.

Parameters
prompt: String

the input prompt text.

streamingCallback: StreamingCallback

a non-null StreamingCallback for receiving streamed results.

Returns
GenerateContentResponse

a GenerateContentResponse containing the final generated content.

Throws
GenAiException

if the inference fails.

See also
generateContent(request: GenerateContentRequest, streamingCallback: StreamingCallback)

generateContent

suspend fun generateContent(
    request: GenerateContentRequest,
    streamingCallback: StreamingCallback
): GenerateContentResponse

Performs streaming content generation inference on the provided input request.

Partial results are delivered incrementally through the provided StreamingCallback. The function suspends until all results are received and returns the complete, final GenerateContentResponse. If streaming is interrupted by a GenAiException, consider removing any already streamed partial output from the UI.
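A hedged sketch, assuming StreamingCallback can be supplied as a trailing lambda that receives each partial chunk of text (appendToUi is a hypothetical helper):

val finalResponse = generativeModel.generateContent(request) { partialText ->
    // Each chunk arrives as it is generated; append it to the UI.
    appendToUi(partialText)
}
// finalResponse holds the complete result once streaming finishes.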

The coroutine that runs generateContent is cancellable. If the inference is no longer needed (e.g., the user navigates away or input changes), the coroutine can be cancelled.

Note that inference requests may fail under certain conditions.

Parameters
request: GenerateContentRequest

a non-null GenerateContentRequest containing input content.

streamingCallback: StreamingCallback

a non-null StreamingCallback for receiving streamed results.

Returns
GenerateContentResponse

a GenerateContentResponse containing the final generated content.

Throws
GenAiException

if the inference fails.

generateContentStream

open fun generateContentStream(prompt: String): Flow<GenerateContentResponse>

Performs streaming content generation inference on the provided input prompt.

This is a convenience method that wraps the input prompt in a GenerateContentRequest with default generation parameters.

Parameters
prompt: String

the input prompt text.

Returns
Flow<GenerateContentResponse>

a Flow which will emit GenerateContentResponses as they are returned from the model.

Throws
java.lang.IllegalArgumentException

if request.candidateCount is greater than 1. If you need to receive multiple candidates in the final result, please use the generateContent(prompt: String, streamingCallback: StreamingCallback) method instead.

See also
generateContentStream(request: GenerateContentRequest)

generateContentStream

fun generateContentStream(request: GenerateContentRequest): Flow<GenerateContentResponse>

Performs streaming content generation inference on the provided input request.

Partial results are delivered incrementally through the returned Flow. Each GenerateContentResponse contains a single Candidate. The last emitted value contains a Candidate with a non-null FinishReason (e.g., FinishReason.STOP or FinishReason.MAX_TOKENS).

This streaming mode is useful for building a more responsive UI. Streaming can be interrupted by a GenAiException; in that case, consider removing any already streamed partial output from the UI.

The coroutine collecting the Flow is cancellable. If the inference is no longer needed (e.g., the user navigates away or the input changes), the coroutine can be cancelled.

Note that inference requests may fail under certain conditions.

Important: This function currently only supports a candidateCount of 1 in the GenerateContentRequest. Providing a candidateCount greater than 1 will result in an IllegalArgumentException. If you need to receive multiple candidates in the final result, please use the generateContent(request: GenerateContentRequest, streamingCallback: StreamingCallback) method instead.
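Putting this together, a collection sketch; the candidates and text accessors on the response are assumptions, and appendToUi is a hypothetical helper:

try {
    generativeModel.generateContentStream(request).collect { response ->
        val candidate = response.candidates.first() // assumed accessor
        appendToUi(candidate.text)                  // assumed accessor
        if (candidate.finishReason != null) {
            // Final emission (e.g., FinishReason.STOP or FinishReason.MAX_TOKENS).
        }
    }
} catch (e: GenAiException) {
    // Streaming was interrupted; consider clearing partial output from the UI.
}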

Parameters
request: GenerateContentRequest

a non-null GenerateContentRequest containing the input content.

Returns
Flow<GenerateContentResponse>

a Flow which will emit GenerateContentResponses as they are returned from the model.

Throws
java.lang.IllegalArgumentException

if request.candidateCount is greater than 1.

getBaseModelName

suspend fun getBaseModelName(): String

Returns the name of the base model used by this generator instance.

The model name may be used for logging, debugging, or feature gating purposes.

Returns
String

a String representing the base model name.

getTokenLimit

suspend fun getTokenLimit(): Int

Returns the total token limit for the API, including both input and output tokens.

This limit can be used with countTokens to check whether a request is within limits before running inference. The input size returned by countTokens plus the output size specified by GenerateContentRequest.maxOutputTokens should be no larger than the limit returned by this method.

Returns
Int

token limit.

warmup

suspend fun warmup(): Unit

Warms up the inference engine by loading necessary models and initializing runtime components.

While calling this method is optional, we recommend invoking it well before the first inference call to reduce the latency of the initial inference.

Throws
GenAiException

if the preparation fails.
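For example, an app might warm up the engine as soon as the relevant screen opens; lifecycleScope here stands in for whatever coroutine scope the host component provides:

lifecycleScope.launch {
    try {
        generativeModel.warmup()
    } catch (e: GenAiException) {
        // Preparation failed; the first inference may incur extra latency.
    }
}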