LanguageIdentifier

  • LanguageIdentifier is a client for identifying the language of a text string using ML Kit.

  • It provides one method that returns the single most likely language and another that returns a list of possible languages with confidence scores.

  • The LanguageIdentifier can be configured with custom options or use default settings for language identification.

  • Input text longer than 200 characters is truncated for language identification as longer input doesn't improve accuracy.

  • The API doesn't support detecting multiple languages within a single piece of text.

public interface LanguageIdentifier extends Closeable, LifecycleObserver, OptionalModuleApi

A LanguageIdentification client for identifying the language of a piece of text.

A LanguageIdentifier is created via LanguageIdentification.getClient(LanguageIdentificationOptions), or via LanguageIdentification.getClient() to use the default options. The following example creates a LanguageIdentifier with default options:

LanguageIdentifier languageIdentifier = LanguageIdentification.getClient();

This class can be used from any thread.

Constant Summary

float
DEFAULT_IDENTIFY_LANGUAGE_CONFIDENCE_THRESHOLD
The default confidence threshold for the identifyLanguage(String) call.
float
DEFAULT_IDENTIFY_POSSIBLE_LANGUAGES_CONFIDENCE_THRESHOLD
The default confidence threshold for the identifyPossibleLanguages(String) call.
String
UNDETERMINED_LANGUAGE_TAG
The BCP 47 language tag for "undetermined language".

Public Method Summary

abstract void
close()
Closes the identifier and releases its resources.
abstract Task<String>
identifyLanguage(String text)
Identifies the language in a supplied String and returns the most likely language.
abstract Task<List<IdentifiedLanguage>>
identifyPossibleLanguages(String text)
Identifies the language in a supplied String and returns a list of possible languages, omitting any language whose confidence score falls below the threshold set via LanguageIdentificationOptions.Builder.setConfidenceThreshold(float).

Constants

public static final float DEFAULT_IDENTIFY_LANGUAGE_CONFIDENCE_THRESHOLD

The default confidence threshold for the identifyLanguage(String) call.

Constant Value: 0.5

public static final float DEFAULT_IDENTIFY_POSSIBLE_LANGUAGES_CONFIDENCE_THRESHOLD

The default confidence threshold for the identifyPossibleLanguages(String) call.

Constant Value: 0.01

public static final String UNDETERMINED_LANGUAGE_TAG

The BCP 47 language tag for "undetermined language".

Constant Value: "und"

Public Methods

public abstract void close ()

Closes the identifier and releases its resources.

public abstract Task<String> identifyLanguage (String text)

Identifies the language in a supplied String and returns the most likely language.

Parameters
text the text for which to identify the language. Inputs longer than 200 characters are truncated to 200 characters, as longer input does not improve the detection accuracy.
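Returns
a Task that completes with the BCP 47 language tag of the most likely language, or UNDETERMINED_LANGUAGE_TAG ("und") if no language meets the confidence threshold.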
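
As an illustration of the identifyLanguage(String) contract described above, the following plain-Java sketch models the two documented behaviours: truncating the input to 200 characters and applying a confidence threshold to choose the result. It does not call ML Kit and is not the library's implementation. IdentifyLanguageSketch, prepareInput, and pickLanguage are hypothetical names, the scores are supplied by the caller rather than by a model, and the sketch assumes that "200 characters" means 200 Java char units and that a score exactly equal to the threshold is accepted. The reference states neither detail.

```java
import java.util.Map;

// Plain-Java model of the documented identifyLanguage(String) contract.
// Hypothetical helper for illustration only; it does not use ML Kit.
class IdentifyLanguageSketch {
    static final int MAX_INPUT_CHARS = 200;       // truncation limit stated in the reference
    static final float DEFAULT_THRESHOLD = 0.5f;  // mirrors DEFAULT_IDENTIFY_LANGUAGE_CONFIDENCE_THRESHOLD
    static final String UNDETERMINED = "und";     // mirrors UNDETERMINED_LANGUAGE_TAG

    // Assumption: the limit counts Java chars (UTF-16 code units).
    static String prepareInput(String text) {
        return text.length() <= MAX_INPUT_CHARS ? text : text.substring(0, MAX_INPUT_CHARS);
    }

    // Returns the highest-scoring tag, or "und" when no score reaches the threshold.
    // Assumption: a score equal to the threshold counts as meeting it.
    static String pickLanguage(Map<String, Float> scores, float threshold) {
        String best = UNDETERMINED;
        float bestScore = -1f;
        for (Map.Entry<String, Float> entry : scores.entrySet()) {
            if (entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return bestScore >= threshold ? best : UNDETERMINED;
    }
}
```

With scores of 0.9 for "en" and 0.05 for "de", pickLanguage returns "en" at the default threshold. If the top score were 0.3 instead, it would return "und".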

public abstract Task<List<IdentifiedLanguage>> identifyPossibleLanguages (String text)

Identifies the language in a supplied String and returns a list of possible languages, omitting any language whose confidence score falls below the threshold set via LanguageIdentificationOptions.Builder.setConfidenceThreshold(float).

Note that this API assumes the text is in a single language; the returned list contains all estimations for what that language could be, along with a confidence score for each possible language. The API does not detect multiple languages in a single text.

Parameters
text the text for which to identify the language. Inputs longer than 200 characters are truncated to 200 characters, as longer input does not improve the detection accuracy.
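Returns
a Task that completes with the list of IdentifiedLanguage candidates whose confidence score meets the threshold.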
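
The threshold cut-off described for identifyPossibleLanguages(String) can be modelled in plain Java as follows. This is a sketch under stated assumptions, not ML Kit code: PossibleLanguagesSketch and Candidate are hypothetical stand-ins (Candidate plays the role of IdentifiedLanguage), the candidate scores come from the caller, a score equal to the threshold is assumed to pass, and sorting by descending confidence is a convenience of the sketch rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the documented threshold cut-off.
// Hypothetical types for illustration only; it does not use ML Kit.
class PossibleLanguagesSketch {
    // Mirrors DEFAULT_IDENTIFY_POSSIBLE_LANGUAGES_CONFIDENCE_THRESHOLD.
    static final float DEFAULT_THRESHOLD = 0.01f;

    // Stand-in for IdentifiedLanguage: a BCP 47 tag plus a confidence score.
    static final class Candidate {
        final String languageTag;
        final float confidence;

        Candidate(String languageTag, float confidence) {
            this.languageTag = languageTag;
            this.confidence = confidence;
        }
    }

    // Keeps candidates whose confidence is at or above the threshold,
    // ordered from most to least confident (ordering is an assumption).
    static List<Candidate> filter(List<Candidate> all, float threshold) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : all) {
            if (c.confidence >= threshold) {
                kept.add(c);
            }
        }
        kept.sort((a, b) -> Float.compare(b.confidence, a.confidence));
        return kept;
    }
}
```

Because the model treats the text as being in a single language, every surviving candidate is an alternative guess for the whole input, not a language detected in one part of it.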