Create a custom translation model

Train and use a custom translation model by using the Google Cloud console. The following example uses AutoML Translation to train an English-to-Spanish translation model by using a dataset that contains technology-oriented segment pairs from software localization.

Before you begin

Before you can start using AutoML Translation, your project must have the Cloud Translation API enabled, and you must have the permissions that are granted by the following roles:

  • Viewer role to view existing resources in your project
  • Cloud Translation API Editor role to create and manage datasets and models
  • Storage Admin role to upload training data to a Cloud Storage bucket

Create a translation dataset and import segment pairs

  1. Download the archive file that contains the sample data for training the model, and extract the files.

    For this tutorial, you'll use the English-to-Spanish TSV file (en-es.tsv).

  2. Go to the AutoML Translation console.

  3. From the navigation pane, click Datasets to go to the Datasets page.

  4. Click Create dataset.

  5. In the Create dataset dialog, specify details about the dataset:

    1. Enter tutorial_dataset as the name for the dataset.
    2. Select English (EN) as your source language from the drop-down list.
    3. Select Spanish (ES) as your target language.
    4. Click Create.
  6. After the dataset is created, click the dataset name to view its details.

  7. Go to the Import tab and upload the en-es.tsv dataset to Cloud Storage:

    1. Select Upload files from your computer.
    2. Click Select files, and choose the en-es.tsv file that you previously downloaded and extracted.
    3. Click Browse to select an existing Cloud Storage bucket, or create a new one, to store your TSV file. The bucket region must be us-central1.
  8. Click Continue.

    AutoML Translation automatically splits your data into training, validation, and testing sets. You can view these splits and the imported sentence pairs in the Sentences tab of your dataset.
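Before uploading, it can help to confirm that the TSV file is well formed. The following is a minimal sketch, assuming each line holds one source segment and one target segment separated by a tab. The helper names `read_segment_pairs` and `split_pairs` are hypothetical, and the 80/10/10 ratio only illustrates the idea of a three-way split; it is not a statement of how AutoML Translation divides your data.

```python
import csv
import io
import random


def read_segment_pairs(tsv_text):
    """Parse tab-separated text into (source, target) pairs.

    Rows that do not have exactly two non-empty columns are skipped.
    """
    pairs = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 2 and row[0].strip() and row[1].strip():
            pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def split_pairs(pairs, train=0.8, validation=0.1, seed=0):
    """Shuffle and split pairs into training, validation, and testing sets.

    The default ratios are an assumption chosen for illustration only.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Made-up sample rows; the third row has no tab and is skipped.
sample = "Open the file\tAbra el archivo\nSave changes\tGuardar cambios\nrow without a tab\n"
pairs = read_segment_pairs(sample)
print(len(pairs), "valid segment pairs")
```

To check the real file, read it with `open("en-es.tsv", encoding="utf-8").read()` and pass the text to `read_segment_pairs`.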

Train a model

  1. Go to the AutoML Translation console.

  2. From the navigation pane, go to the Datasets page.

  3. Click the tutorial_dataset dataset.

  4. Go to the Train tab.

  5. Click Start training, which opens the Train new model pane.

  6. Enter tutorial_model for the model name.

  7. Click Start training.

Training a model can take several hours to complete.
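The same training step can also be started outside the console through the Cloud Translation REST API. The sketch below only builds the request and does not send it. The endpoint path and the `displayName` and `dataset` fields reflect my understanding of the v3 API and should be checked against the API reference; `build_create_model_request` is a hypothetical helper, and the project and dataset IDs are placeholders.

```python
import json

API_ROOT = "https://translation.googleapis.com/v3"


def build_create_model_request(project_id, dataset_id, display_name, location="us-central1"):
    """Return the (url, json_body) for a model-creation request (assumed API shape)."""
    parent = f"projects/{project_id}/locations/{location}"
    url = f"{API_ROOT}/{parent}/models"
    body = {
        "displayName": display_name,
        # Full resource name of the dataset to train from.
        "dataset": f"{parent}/datasets/{dataset_id}",
    }
    return url, json.dumps(body)


# Placeholder identifiers; substitute your own project and dataset IDs.
url, body = build_create_model_request("my-project", "my-dataset-id", "tutorial_model")
print("POST", url)
print(body)
```

Because training can run for hours, a request like this would return before the model is ready; the caller would need to check on the training job later rather than wait on the initial response.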

Evaluate the model

Check how your custom model compares to the default Google NMT model. The comparison is based on segment pairs from your test set.

  1. Go to the AutoML Translation console.

  2. From the navigation pane, go to the Models page.

  3. Click the tutorial_model model.

  4. Click the Evaluate tab.

In the Previous evaluations section, Cloud Translation shows your model's BLEU score compared to the Google NMT model. The BLEU (Bilingual Evaluation Understudy) score indicates how similar the candidate text is to the reference texts; values closer to 100 represent more similar texts.
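To make the score more concrete, here is a simplified sentence-level BLEU calculation. It is a sketch only: it assumes a single reference, whitespace tokenization, and no smoothing, so it will not reproduce the scores shown in the console, which are computed over the whole test set. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified BLEU on a 0-100 scale for one candidate and one reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped overlap: a candidate n-gram counts at most as often as it
        # appears in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: any empty n-gram level gives zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "abra el archivo de configuración"
print(bleu(reference, reference))                      # identical texts
print(bleu("abra el archivo de ajustes", reference))   # partial overlap
print(bleu("hola mundo", reference))                   # no overlap
```

Identical texts reach the maximum of 100, texts with no shared words score 0, and partial matches fall in between.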

Use the translation model

From the Google Cloud console, you can use your custom model to translate some text.

  1. Go to the AutoML Translation console.

  2. From the navigation pane, go to the Models page.

  3. Click the tutorial_model model.

  4. Click the Predict tab.

  5. In the English text box, enter text to translate and then click Translate.

    You can compare the results from your custom model to the Google NMT model.
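Outside the console, a custom model can be selected in a Cloud Translation v3 `translateText` request by passing its resource name in the `model` field. The following sketch builds such a request without sending it. `build_translate_request` is a hypothetical helper, the project and model IDs are placeholders, and authentication is left out.

```python
import json


def build_translate_request(project_id, model_id, text, location="us-central1"):
    """Return the (url, body) for a translateText call that uses a custom model."""
    parent = f"projects/{project_id}/locations/{location}"
    url = f"https://translation.googleapis.com/v3/{parent}:translateText"
    body = {
        # Resource name of the custom model to use for this request.
        "model": f"{parent}/models/{model_id}",
        "sourceLanguageCode": "en",
        "targetLanguageCode": "es",
        "contents": [text],
        "mimeType": "text/plain",
    }
    return url, body


# Placeholder identifiers; substitute your own project and model IDs.
url, body = build_translate_request("my-project", "my-model-id", "Open the settings file.")
print("POST", url)
print(json.dumps(body, indent=2))
```

The location in the URL matches the model's location, which in this tutorial is us-central1.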

Clean up

To avoid unnecessary Google Cloud charges, delete your model, dataset, and en-es.tsv file. You can also use the Google Cloud console to delete your project if you don't need it.

What's next