Search quality refers to the quality of search results in terms of ranking and recall as perceived by the user making the search query.
Ranking refers to the ordering of items and recall refers to the number of relevant items retrieved. An item (also referred to as a document) is any piece of digital content that Google Cloud Search can index. Types of items include Microsoft Office documents, PDF files, a row in a database, unique URLs, and so on. An item is comprised of:
- Structured metadata
- Indexable content
- ACLs
Cloud Search uses a variety of signals to retrieve and rank search query results, that is, the items returned for a search query. You can influence these signals through settings in the schema, the item's content and metadata (during indexing), and the search application. The goal of this document is to help you improve search quality by modifying these signal influencers.
For a summary of recommended and optional settings, refer to Summary of recommended and optional search quality settings.
Influence topicality score
Topicality refers to the relevance of a search result to the original query terms. Topicality of an item is calculated based on the following criteria:
- The importance of each query term.
- The number of hits (the number of times a query term appears in the item’s content or metadata).
- The type of match that each query term, and its variants, has with an item indexed in Cloud Search.
To influence a text property's topicality score, define the RetrievalImportance on the text property in your schema. A match on a property with high RetrievalImportance results in a higher score compared to a match on a property with low RetrievalImportance.
For example, suppose you have a data source with the following characteristics:
- The data source is used to store history for software bugs.
- Each bug has a name, description, and priority.
Most users would query this data source using the bug name, so you would set the RetrievalImportance on the name (the summary property in the following sample) to HIGHEST in the schema.
Conversely, most users may not query this data source using the description of the bug, so set the RetrievalImportance on the description to DEFAULT.
Following is a sample schema containing RetrievalImportance settings.
{
  "objectDefinitions": [
    {
      "name": "issues",
      "propertyDefinitions": [
        {
          "name": "summary",
          "textPropertyOptions": {
            "retrievalImportance": {
              "importance": "HIGHEST"
            }
          }
        },
        {
          "name": "description",
          "textPropertyOptions": {
            "retrievalImportance": {
              "importance": "DEFAULT"
            }
          }
        },
        {
          "name": "label",
          "isRepeatable": true,
          "textPropertyOptions": {
            "retrievalImportance": {
              "importance": "DEFAULT"
            }
          }
        },
        {
          "name": "comments",
          "textPropertyOptions": {
            "retrievalImportance": {
              "importance": "DEFAULT"
            }
          }
        },
        {
          "name": "project",
          "textPropertyOptions": {
            "retrievalImportance": {
              "importance": "HIGH"
            }
          }
        },
        {
          "name": "duedate",
          "datePropertyOptions": {}
        },
        ...
      ]
    }
  ]
}
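The schema above can also be assembled programmatically, which makes it harder to mistype an importance value. The following is a minimal Python sketch, not an official client: the helper name text_property is hypothetical, and the set of allowed importance values (DEFAULT, HIGHEST, HIGH, LOW, NONE) is an assumption that you should confirm against the current schema reference.

```python
import json

# Assumed set of RetrievalImportance values; verify against the schema reference.
VALID_IMPORTANCE = {"DEFAULT", "HIGHEST", "HIGH", "LOW", "NONE"}


def text_property(name: str, importance: str = "DEFAULT", repeatable: bool = False) -> dict:
    """Build one text property definition with a retrieval importance (hypothetical helper)."""
    if importance not in VALID_IMPORTANCE:
        raise ValueError(f"unknown importance: {importance!r}")
    prop = {
        "name": name,
        "textPropertyOptions": {"retrievalImportance": {"importance": importance}},
    }
    if repeatable:
        prop["isRepeatable"] = True
    return prop


# Mirrors the text properties of the "issues" object in the sample schema.
schema = {
    "objectDefinitions": [
        {
            "name": "issues",
            "propertyDefinitions": [
                text_property("summary", "HIGHEST"),
                text_property("description"),
                text_property("label", repeatable=True),
                text_property("comments"),
                text_property("project", "HIGH"),
            ],
        }
    ]
}

# json.dumps quotes the enum values, producing valid JSON for the schema body.
print(json.dumps(schema, indent=2))
```

Because the values pass through json.dumps, enum names such as HIGHEST are always emitted as quoted strings.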
In the case of HTML documents, tags such as <title> and <h1>, along with formatting settings such as font size and bolding, are used for determining the importance of various terms. If the ContentFormat is TEXT, ItemContent has DEFAULT retrieval importance; if it is HTML, its retrieval importance is determined on the basis of HTML properties.
Influence freshness
Freshness measures how recently an item has been modified and is determined by the createTime and updateTime properties in the ItemMetadata. Older items are demoted in the search results.
It is possible to influence how freshness is computed for an object by adjusting the freshnessProperty and freshnessDuration of FreshnessOptions in the schema.
The freshnessProperty allows you to use a date or timestamp property for computing freshness instead of the default updateTime.
In our previous example of a software bug tracking system, the due date could be used as a freshnessProperty such that items with a due date closest to the current date are considered “fresher” and obtain a ranking boost. Following is a sample schema containing freshnessProperty settings:
{
  "objectDefinitions": [
    {
      "name": "issues",
      "options": {
        "freshnessOptions": {
          "freshnessProperty": "duedate"
        }
      },
      "propertyDefinitions": [
        {
          "name": "summary",
          "textPropertyOptions": {
            "retrievalImportance": {
              "importance": "HIGHEST"
            }
          }
        },
        {
          "name": "duedate",
          "datePropertyOptions": {}
        },
        ...
      ]
    }
  ]
}
Use the freshnessDuration to identify when an item is considered out-of-date. For example, you may have a data source that is not indexed regularly or for which you do not want freshness to influence the ranking. You can achieve this goal by specifying a high value for freshnessDuration.
Suppose you have a data source with employee profile information. In this scenario, you might want a high freshnessDuration because changes to employee information are often not relevant to the ranking of the employee. The following sample schema sets freshnessDuration to 315360000s (10 years):
{
  "objectDefinitions": [
    {
      "name": "people",
      "options": {
        "freshnessOptions": {
          "freshnessDuration": "315360000s"
        }
      }
    }
  ]
}
You can also set freshnessDuration to a very small value for data sources whose content changes rapidly, such as a data source containing news articles. In this scenario, the most recently created or modified documents are most relevant. The following sample schema sets freshnessDuration to 259200s (3 days) for a data source containing rapidly changing content:
{
  "objectDefinitions": [
    {
      "name": "news",
      "options": {
        "freshnessOptions": {
          "freshnessDuration": "259200s"
        }
      }
    }
  ]
}
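Because freshnessDuration is written as a count of seconds followed by "s", it is easy to get the arithmetic wrong by a factor of ten. The following Python sketch derives the string from a timedelta instead. The helper names are hypothetical, and rejecting non-positive durations is this sketch's own sanity check rather than a documented rule.

```python
from datetime import timedelta


def freshness_duration(delta: timedelta) -> str:
    """Format a timedelta as a whole-second duration string such as "259200s"."""
    seconds = int(delta.total_seconds())
    if seconds <= 0:
        # Sanity check chosen for this sketch, not a documented constraint.
        raise ValueError("duration must be positive")
    return f"{seconds}s"


def freshness_options(delta: timedelta) -> dict:
    """Wrap the duration string in the options structure used by the samples."""
    return {"freshnessOptions": {"freshnessDuration": freshness_duration(delta)}}


# 3 days, suitable for rapidly changing content such as news.
news_options = freshness_options(timedelta(days=3))

# 3650 days (roughly 10 years), for content where freshness should barely matter.
people_options = freshness_options(timedelta(days=3650))

print(news_options)
print(people_options)
```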
Influence quality
Quality is a measurement of the accuracy and usefulness of an item. A data source can contain multiple semantically similar documents, each with a different level of quality. You can specify a quality value between 0 and 1 using SearchQualityMetadata. Items with higher values receive a ranking boost relative to items with lower values. Use this setting only if you need to influence or boost the quality of an item outside of the information provided to Cloud Search.
For example, suppose you have a data source containing employee benefits documents. You might use SearchQualityMetadata to boost the ranking of documents authored by Human Resources employees over documents authored by other employees.
Following are sample items containing SearchQualityMetadata settings for issues in a bug tracking system:
{
  "name": "datasources/.../items/issue1",
  "acl": {
    ...
  },
  "metadata": {
    "title": "Issue 1",
    "objectType": "issues"
  },
  ...
}
{
  "name": "datasources/.../items/issue2",
  "acl": {
    ...
  },
  "metadata": {
    "title": "Issue 2",
    "objectType": "issues",
    "searchQualityMetadata": {
      "quality": 0.5
    }
  },
  ...
}
{
  "name": "datasources/.../items/issue3",
  "acl": {
    ...
  },
  "metadata": {
    "title": "Issue 3",
    "objectType": "issues",
    "searchQualityMetadata": {
      "quality": 1
    }
  },
  ...
}
Given these items, when a user searches using the search term “issue,” Issue 3 (quality of 1) is ranked higher than Issue 2 (quality of 0.5) and Issue 1 (if nothing is specified, the default quality is 0).
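The relative ordering in this example can be reproduced with a toy model. The Python sketch below is not Cloud Search's ranking function; it only assumes that all other signals are equal, reads the quality value from each item, falls back to 0 when none is set, and sorts on that value. The function name effective_quality is hypothetical.

```python
def effective_quality(item: dict) -> float:
    """Return the item's quality value, defaulting to 0 when none is specified."""
    metadata = item.get("metadata", {})
    quality = metadata.get("searchQualityMetadata", {}).get("quality", 0.0)
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be between 0 and 1")
    return float(quality)


# Trimmed versions of the three sample items (ACLs and other fields omitted).
items = [
    {"metadata": {"title": "Issue 1", "objectType": "issues"}},
    {
        "metadata": {
            "title": "Issue 2",
            "objectType": "issues",
            "searchQualityMetadata": {"quality": 0.5},
        }
    },
    {
        "metadata": {
            "title": "Issue 3",
            "objectType": "issues",
            "searchQualityMetadata": {"quality": 1},
        }
    },
]

# Toy ordering: higher quality first, everything else assumed equal.
ranked_titles = [
    item["metadata"]["title"]
    for item in sorted(items, key=effective_quality, reverse=True)
]
print(ranked_titles)
```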
Influence using field type
Cloud Search allows you to influence ranking based on the value of enum or integer properties. For each integer or enum property, an OrderedRanking can be specified. This setting has the following values:
- NO_ORDER (default): The property does not affect ranking.
- ASCENDING: Items with higher values of this integer or enum property receive a ranking boost compared to items with lower values.
- DESCENDING: Items with lower values of the integer or enum property receive a ranking boost compared to items with higher values.
For example, suppose each bug in a bug tracking system has an enum property for storing the priority of the bug as either HIGH (1), MEDIUM (2), or LOW (3). In this scenario, setting an OrderedRanking of DESCENDING provides a ranking boost to HIGH priority bugs in comparison to LOW priority bugs. Following is a sample schema containing OrderedRanking settings for issues in a bug tracking system:
{
  "objectDefinitions": [
    {
      "name": "issues",
      "options": {
        "freshnessOptions": {
          "freshnessProperty": "duedate"
        }
      },
      "propertyDefinitions": [
        {
          "name": "summary",
          "textPropertyOptions": {
            "retrievalImportance": {
              "importance": "HIGHEST"
            }
          }
        },
        {
          "name": "duedate",
          "datePropertyOptions": {}
        },
        {
          "name": "priority",
          "enumPropertyOptions": {
            "possibleValues": [
              {
                "stringValue": "HIGH",
                "integerValue": 1
              },
              {
                "stringValue": "MEDIUM",
                "integerValue": 2
              },
              {
                "stringValue": "LOW",
                "integerValue": 3
              }
            ],
            "orderedRanking": "DESCENDING"
          }
        },
        ...
      ]
    }
  ]
}
A bug tracking system could also have an integer property called votes used to gather feedback from users on the relative importance of a bug. You could use the votes property to influence ranking by providing higher importance to the bugs with the most votes. In this case, you could specify OrderedRanking as ASCENDING for the votes property so that issues with the most votes receive a ranking boost. Following is a sample schema containing OrderedRanking settings for issues in a bug tracking system:
{
  "objectDefinitions": [
    {
      "name": "issues",
      "propertyDefinitions": [
        {
          "name": "summary",
          "textPropertyOptions": {
            "retrievalImportance": {
              "importance": "HIGHEST"
            }
          }
        },
        {
          "name": "description",
          "textPropertyOptions": {
            "retrievalImportance": {
              "importance": "DEFAULT"
            }
          }
        },
        {
          "name": "votes",
          "integerPropertyOptions": {
            "orderedRanking": "ASCENDING",
            "minimumValue": 0,
            "maximumValue": 1000
          }
        },
        ...
      ]
    }
  ]
}
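The direction of ASCENDING and DESCENDING is easy to confuse, so here is a toy Python model of the two examples above. It is only an illustration of which items are favored, assuming every other signal is equal; the actual size of the boost is not described here, and the bug records and the boost_key helper are hypothetical.

```python
def boost_key(value: int, ordered_ranking: str = "NO_ORDER") -> int:
    """Return a sort key where a larger key means a stronger ranking boost."""
    if ordered_ranking == "ASCENDING":
        return value   # higher values are favored
    if ordered_ranking == "DESCENDING":
        return -value  # lower values are favored
    if ordered_ranking == "NO_ORDER":
        return 0       # the property does not affect ranking
    raise ValueError(f"unknown OrderedRanking: {ordered_ranking!r}")


# Hypothetical bugs: priority uses HIGH=1, MEDIUM=2, LOW=3 as in the sample schema.
bugs = [
    {"id": "bug-a", "priority": 3, "votes": 12},
    {"id": "bug-b", "priority": 1, "votes": 4},
    {"id": "bug-c", "priority": 2, "votes": 40},
]


def order_by(prop: str, ordered_ranking: str) -> list:
    """List bug ids from most boosted to least boosted for one property."""
    ranked = sorted(bugs, key=lambda b: boost_key(b[prop], ordered_ranking), reverse=True)
    return [b["id"] for b in ranked]


by_priority = order_by("priority", "DESCENDING")  # HIGH (1) first
by_votes = order_by("votes", "ASCENDING")         # most votes first
unordered = order_by("votes", "NO_ORDER")         # input order is kept

print(by_priority, by_votes, unordered)
```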
Influence ranking through query expansion
Query expansion refers to expanding the terms in the query, using synonyms and spelling, to retrieve better results.
Use synonyms to influence search results
Cloud Search utilizes synonyms inferred from public web content to expand the query terms. You can also define custom synonyms to capture organization-specific terminology, such as common acronyms used within an organization or industry-specific terminology.
Custom synonyms can be defined within a data source or as a separate data source. By default, synonyms are applied to all data sources across all search applications. However, you can group synonyms by data source and search application. For information on defining custom synonyms including grouping by search application, refer to Define synonyms.
Use spelling to influence search results
Cloud Search provides spelling suggestions based on models built using the public Google Search data. If Cloud Search detects a misspelling in the context of a query, it returns the suggested query in the SpellResult. The suggested spelling can be displayed to the user as a suggestion. For example, the user might misspell the query term “employe” and could receive the suggestion “Did you mean employee?”
Cloud Search also uses spell corrections as synonyms to help retrieve documents that may otherwise be missed due to a spelling error.
Influencing ranking through search application settings
As mentioned in the Introduction to Google Cloud Search, a Search Application is a group of settings that, when associated with a search interface, provide contextual information about searches. The following configurations allow you to influence ranking through the search application:
- Scoring configuration
- Source configuration
The following two sections explain how these configurations are useful in influencing ranking.
Adjust the scoring configuration
For each search application, you can specify a ScoringConfig used for controlling the application of some signals during ranking. Currently, you can disable freshness and personalization.
If freshness is disabled, it is disabled for all data sources listed in the search application, regardless of the freshness options specified in the schema for the data source. Similarly, if personalization is disabled, owner boost and interaction boost don't affect the ranking.
For step-by-step instructions on configuring this setting, refer to Customize the search experience in Cloud Search.
Adjust the source configuration
The source configuration allows you to specify data source-level settings in a search application. The following settings are supported:
- Source importance
- Crowding
Set source importance
Source importance refers to the relative importance of a data source within a search application. This setting can be specified in the SourceImportance field inside SourceScoringConfig.
Items from a data source with HIGH source importance receive a ranking boost compared to items from a data source with a DEFAULT or a LOW source importance. Use this setting to influence ranking when you believe users would prefer results from certain data sources.
For example, suppose you have a product support portal containing external and internal troubleshooting data. In this scenario, you might want to configure your search application to prioritize results from the internal data source.
For step-by-step instructions on configuring this setting, refer to Customize the search experience in Cloud Search.
Set crowding
Crowding refers to the maximum number of results that can be returned from a data source in a search application. This value can be controlled using the numResults field in SourceCrowdingConfig.
This value defaults to 3, which means that after Cloud Search has shown 3 results from a data source, it starts presenting results from other data sources. Items from the first data source are reconsidered only if all data sources have reached their crowding limit or there are no more results from other data sources.
This setting is helpful in ensuring diversity of the search results and preventing one data source from dominating the search result page.
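To make the crowding behavior concrete, the following Python sketch simulates it on a ranked list of (data source, item) pairs. It is a simplified reading of the description above, not Cloud Search's implementation: the function name apply_crowding, the sample data, and the choice to reset all counters once every remaining source has hit its limit are assumptions of this sketch.

```python
from collections import Counter


def apply_crowding(ranked: list, num_results: int = 3) -> list:
    """Reorder (source, item) pairs so no source exceeds num_results per round."""
    if num_results < 1:
        raise ValueError("num_results must be at least 1")
    remaining = list(ranked)
    shown = Counter()  # results shown per source in the current round
    output = []
    while remaining:
        # Take the best-ranked result whose source is still under its limit.
        pick = next((r for r in remaining if shown[r[0]] < num_results), None)
        if pick is None:
            # Every source with results left is at its limit: reconsider them.
            shown.clear()
            continue
        remaining.remove(pick)
        shown[pick[0]] += 1
        output.append(pick)
    return output


# Hypothetical ranked results, best first, dominated by the "wiki" source.
ranked = [
    ("wiki", "w1"), ("wiki", "w2"), ("wiki", "w3"), ("wiki", "w4"),
    ("tickets", "t1"), ("wiki", "w5"), ("tickets", "t2"),
]

crowded = [item for _, item in apply_crowding(ranked, num_results=3)]
print(crowded)
```

With a limit of 3, the fourth wiki result is held back until the tickets results have been shown, which is the diversity effect described above.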
For step-by-step instructions on configuring this setting, refer to Customize the search experience in Cloud Search.
Influencing ranking through personalization
Personalization refers to the presentation of personalized search results based on the individual user accessing the result. You can influence ranking by prioritizing items based on the following criteria:
- Item ownership
- Item interaction
- User clicks
- Item language
The following sections address how to influence search quality based on these criteria.
Influence ranking based on item ownership
Item ownership refers to providing a ranking boost to items owned by the user performing the search query. Each item has an ItemAcl with an owners field. If the user executing a query is the owner of an item, then, by default, that item receives a ranking boost. You can turn off personalization in the search application.
Increase ranking based on item interaction
Item interaction refers to providing a ranking boost to items that the search query user interacted with (viewed, commented on, edited, and so on).
Item interaction signals are automatically obtained for Google Workspace products such as Drive and Gmail. For other products, you can provide item-level interaction data, including the type of interaction (view, edit), the timestamp of the interaction, and the principal (user who interacted with the item). Note that items with recent interactions obtain a higher ranking boost.
Increase ranking based on user clicks
Cloud Search collects the clicks on current search results and uses them to improve ranking for future searches by boosting items clicked previously by the same user.
Influence ranking through query interpretation
Cloud Search’s query interpretation feature automatically interprets the operators and filters in a user’s query, and converts those elements into a structured, operator-based query. Query interpretation uses operators defined in the schema, together with the indexed documents, to deduce what the user's query means. This feature allows a user to search with minimal keywords, yet still obtain precise results. For further information, refer to Structure a schema for optimal query interpretation.
Increase ranking based on item language
Cloud Search demotes items whose language does not match the language of the query. The following factors affect the ranking of items based on language:
- The query language. The auto-detected language of the search query, or the languageCode specified in the RequestOptions.
  If you build a custom search interface, you should set the languageCode to the user's interface language or language preference (for example, the language of the web browser or the search interface page). The auto-detected query language takes precedence over the languageCode, so that search quality is not compromised when a user types a query in a language that differs from their interface.
- The item language. The contentLanguage set in ItemMetadata at index time, or the content language automatically detected by Cloud Search.
  If a document's contentLanguage is left empty at index time, and the ItemContent is populated, Cloud Search attempts to detect the language used in the ItemContent and stores it internally. The auto-detected language is not added to the contentLanguage field.
If the query language and the item language match, no language demotion is applied. If they do not match, the item is demoted. Language demotion is not applied to documents where contentLanguage is empty and Cloud Search could not automatically detect the language. As a result, the ranking of a document is not impacted if Cloud Search can't detect its language.
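The language rules above reduce to two small decisions, sketched here in Python. The function names are hypothetical, and comparing language codes by case-insensitive equality is an assumption of this sketch; how Cloud Search treats regional variants is not covered here.

```python
from typing import Optional


def resolve_query_language(detected: Optional[str], language_code: Optional[str]) -> Optional[str]:
    """The auto-detected query language takes precedence over languageCode."""
    return detected or language_code


def resolve_item_language(content_language: Optional[str], detected: Optional[str]) -> Optional[str]:
    """Use contentLanguage when set, otherwise the language detected from the content."""
    return content_language or detected


def is_language_demoted(
    query_language: Optional[str],
    content_language: Optional[str],
    detected_item_language: Optional[str],
) -> bool:
    """Return True when the item would be demoted for a language mismatch."""
    item_language = resolve_item_language(content_language, detected_item_language)
    if not item_language or not query_language:
        # No known language on one side: no demotion is applied.
        return False
    # Assumption: plain case-insensitive comparison of language codes.
    return item_language.lower() != query_language.lower()


# A French query typed into an English interface is treated as French.
query_language = resolve_query_language(detected="fr", language_code="en")

print(is_language_demoted(query_language, "en", None))  # mismatch
print(is_language_demoted(query_language, "", "fr"))    # detected item language matches
print(is_language_demoted(query_language, "", None))    # item language unknown
```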
Increase ranking based on item context
You can increase the ranking for items which are more relevant to the context of a search query. The context (contextAttributes) is a set of named attributes that you can specify during indexing, and in the search request, to provide context for a specific search query.
For example, suppose an item, such as an employee benefit document, is more relevant in the context of a Location and Department, such as a city (San Francisco), state (California), country (USA), and a Department (Engineering). In this case, you could index the item with the following named attributes:
{
  ...
  "metadata": {
    "contextAttributes": [
      {
        "name": "Location",
        "values": [
          "San Francisco",
          "California",
          "USA"
        ]
      },
      {
        "name": "Department",
        "values": [
          "Engineering"
        ]
      }
    ]
  },
  ...
}
When the user enters a search query of "benefits" into the search interface, you might include the user's location information and department in the search request. For example, here's a search request containing location and department information for an Engineer in Chicago:
{
  ...
  "contextAttributes": [
    {
      "name": "Location",
      "values": [
        "Chicago",
        "Illinois",
        "USA"
      ]
    },
    {
      "name": "Department",
      "values": [
        "Engineering"
      ]
    }
  ],
  ...
}
Because both the indexed item and the search request contain the attributes of "Department=Engineering" and "Location=USA," the indexed item (an employee benefit document) appears higher in the search results.
Now suppose another user, an Engineer in India, enters a search query of "benefits" into the search interface. Here's a search request containing their location and department information:
{
  ...
  "contextAttributes": [
    {
      "name": "Location",
      "values": [
        "Bengaluru",
        "Karnataka",
        "India"
      ]
    },
    {
      "name": "Department",
      "values": [
        "Engineering"
      ]
    }
  ],
  ...
}
Because the only attribute shared by the indexed item and this search request is "Department=Engineering," the indexed item appears only slightly higher in the search results (when compared to the first search query of "benefits" entered by an Engineer located in Chicago, Illinois, USA).
Following are some example contexts you might want to use to increase ranking:
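The difference between the two requests comes down to how many name and value pairs they share with the indexed item. The Python sketch below computes that overlap for both users. It is illustrative only: the function name matching_attributes is hypothetical, and how Cloud Search weights each match is not specified here.

```python
def matching_attributes(item_context: list, request_context: list) -> set:
    """Return the (name, value) pairs present in both the item and the request."""
    requested = {attr["name"]: set(attr["values"]) for attr in request_context}
    matches = set()
    for attr in item_context:
        shared = set(attr["values"]) & requested.get(attr["name"], set())
        matches.update((attr["name"], value) for value in shared)
    return matches


# Context attributes of the indexed employee benefit document.
item_context = [
    {"name": "Location", "values": ["San Francisco", "California", "USA"]},
    {"name": "Department", "values": ["Engineering"]},
]

# Context attributes sent with each search request.
chicago_request = [
    {"name": "Location", "values": ["Chicago", "Illinois", "USA"]},
    {"name": "Department", "values": ["Engineering"]},
]
bengaluru_request = [
    {"name": "Location", "values": ["Bengaluru", "Karnataka", "India"]},
    {"name": "Department", "values": ["Engineering"]},
]

chicago_matches = matching_attributes(item_context, chicago_request)
bengaluru_matches = matching_attributes(item_context, bengaluru_request)

# More shared attributes means the item is more relevant to that user's context.
print(sorted(chicago_matches))
print(sorted(bengaluru_matches))
```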
- Location: Items can be more relevant to users in a particular location, such as a building, a city, a country, or a region.
- Job role: Items can be more relevant to users in a particular job role, such as Technical Writer or Engineer.
- Department: Items can be more relevant to certain departments, such as Sales or Marketing.
- Job level: Items can be more relevant to certain job levels, such as Director or CEO.
- Employee type: Items can be more relevant to certain types of employees, such as part-time and full-time employees.
- Tenure: Items can be more relevant to an employee's tenure, such as a new hire.
Influencing ranking through item popularity
Cloud Search boosts popular items in ranking; that is, it boosts those items which have received clicks in recent search queries.
Influencing ranking through clickboost
Cloud Search collects the clicks on current search results and uses them to improve ranking for future searches by boosting popular items for a particular search query.
Summary of recommended and optional search quality settings
The following table lists all of the recommended and optional search quality settings. These recommendations should help you achieve the most benefit from Cloud Search's ranking models.
Setting | Location | Recommended/optional | Details |
---|---|---|---|
Schema settings | |||
ItemContent field | ItemContent | Recommended | When creating or updating your schema, populate the unstructured content of an item. This field is used for generating snippets. |
RetrievalImportance field | RetrievalImportance | Recommended | When creating or updating a schema, set for text properties which are clearly important or topical. |
FreshnessOptions | FreshnessOptions | Optional | When creating or updating a schema, set to ensure that items aren't demoted because of incorrect data or cases when data is missing. |
Indexing settings | |||
createTime/updateTime | ItemMetadata | Recommended | Populate during indexing of an item. |
contentLanguage | ItemMetadata | Recommended | Populate during indexing of an item. If absent, Cloud Search attempts to detect the language used in the ItemContent. |
owners field | ItemAcl | Recommended | Populate during indexing of an item. |
Custom synonyms | _dictionaryEntry schema | Recommended | Define at data source-level or as separate data source during indexing. |
quality field | SearchQualityMetadata | Optional | To provide a base quality boost compared to other semantically similar items, set quality during indexing. Setting this field for all items in a data source nullifies its effect. |
item-level interaction data | interaction | Optional | If the data source records and provides access to user's interactions, populate the interactions for each item during indexing. |
integer/enum properties | OrderedRanking | Optional | When order of items is relevant, specify the ordered ranking for integer and enum properties during indexing. |
Search application settings | |||
Personalization=false | ScoringConfig or using the Cloud Search admin UI | Recommended | Set when creating or updating the search application. Ensure you provide the correct owner information as described in Influencing ranking through personalization. |
SourceImportance field | SourceScoringConfig | Optional | To bias the results from certain data sources, set this field. |
numResults field | SourceCrowdingConfig | Optional | To control the diversity of results, set this field. |
Next Steps
Here are a few next steps you might take:
- Learn how to leverage the _dictionaryEntry schema to define synonyms for terms commonly used in your company. To use the _dictionaryEntry schema, refer to Define synonyms.