You can set up Google Cloud Search to return results from your organization's SharePoint on-premises content in addition to your Google Workspace content. You use the Google Cloud Search SharePoint On-Prem connector and configure it to access a specific SharePoint data source.
Important considerations
Honored SharePoint settings
The Cloud Search SharePoint On-Prem connector always honors the Search Visibility setting on SharePoint, which can't be overridden. For draft documents, the permissions on the user account that the connector uses to access SharePoint control which draft documents are indexed and returned. If the account has only "Full Read" permissions, the connector honors the "Draft item visibility" settings on SharePoint.
You can also configure the connector to limit results based on user account access. You can use Google principals and external principals to define ACLs. To apply security trimming for SharePoint content, synchronize the following external identities with the Google Directory:
- Active Directory Users
- Active Directory Groups
- SharePoint Local Groups (with Active Directory users and groups as members)
To synchronize AD users and groups, you use Google Cloud Directory Sync, enabling identity mapped groups. To synchronize SharePoint local groups, you use the SharePoint Identity Connector.
The connector also needs to perform lookup with AD to fetch additional information to synchronize the principals. For example, lookup with AD lets the connector do the following:
- Map the SID for a domain group to the corresponding sAMAccountName.
- Map a user sAMAccountName to the email address for SharePoint local group memberships.
Search optimization
You can improve your users' experience by configuring the connector to return more relevant search results.
To do this, set values for the HTML content generation parameters in the SharePoint On-Prem connector configuration file. These parameters let you set which fields have higher or lower impact on matches.
To set up a schema, follow the instructions in Create and register a schema. When you set up a schema:
- To map the names of SharePoint content types to corresponding object definitions, the connector normalizes the content type names by excluding unsupported characters. For object definitions, the Cloud Search API supports only A-Z, a-z, and 0-9 as valid characters. For example, the content type "Announcements" maps to the object definition "Announcements". The content type "News Article" maps to "NewsArticle" (no space).
  When the connector can't match a content type with an object definition, the connector uses the fallback object type (itemMetadata.objectType). Learn more about metadata configuration parameters.
- To map SharePoint property names to property definitions, the connector normalizes the property names by decoding hex-encoded characters and removing "ows_" prefixes, then excluding unsupported characters (all characters except A-Z, a-z, and 0-9).
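The normalization rules above can be sketched in a few lines of Python. This is an illustration only, not the connector's actual code: the function names are hypothetical, and the sketch assumes that "hex-encoded characters" refers to SharePoint's `_xHHHH_` escape form (for example, `_x0020_` for a space).

```python
import re

# Characters outside A-Z, a-z, 0-9 are treated as unsupported.
_UNSUPPORTED = re.compile(r"[^A-Za-z0-9]")
# Assumption: hex-encoded characters look like _x0020_ (four hex digits).
_HEX_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def normalize_content_type(name: str) -> str:
    """Drop unsupported characters from a SharePoint content type name."""
    return _UNSUPPORTED.sub("", name)


def normalize_property(name: str) -> str:
    """Decode hex escapes, strip a leading 'ows_', then drop unsupported characters."""
    decoded = _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)
    if decoded.startswith("ows_"):
        decoded = decoded[len("ows_"):]
    return _UNSUPPORTED.sub("", decoded)


if __name__ == "__main__":
    print(normalize_content_type("News Article"))
    print(normalize_property("ows_Due_x0020_Date"))
```

Running names through a helper like this before you write the schema can help you pick object and property definition names that the connector is likely to match.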
Microsoft Outlook message handling
When the connector encounters Microsoft Outlook .msg files as it indexes content, it overrides the media type for the files and indexes them as application/vnd.ms-outlook.
Multi-tenant configurations
If your SharePoint is a multi-tenant deployment, where multiple customer sites are hosted on the same Web application, you need to configure site collection mode in the configuration file. In multi-tenant deployments, you get permissions only for your site collection and can't get Full Read permissions, as required by the SharePoint On-Prem connector.
To enable site collection mode:
- Give the connector user account site collection administrator permissions.
- Set sharepoint.server in your connector configuration file to the site collection URL, such as http://sharepoint.example.com/sites/sitecollection. The URL doesn't need to use the exact same case as on SharePoint.
- Set sharepoint.siteCollectionOnly in your connector configuration file to true.
If you have multiple site collections to index in a multi-tenant environment, you need to configure one connector instance for each of the site collections.
Known connector limitations
- The time it takes the connector to detect changes to items in the databases increases with the number of databases the connector monitors.
- Memory consumption increases with the number of unique users and groups that you use in ACLs for each site collection.
- You can configure the connector with identities from only one Active Directory Domain.
- Some common Active Directory and Windows principals, such as Everyone, BUILTIN\Users, and All Authenticated Users, aren't supported.
- Delete notifications are not instantaneous. It can take more than 4 hours for a connector to recognize that a user deleted content from the source repository.
System requirements
System requirements | |
---|---|
Operating system |
|
Software |
|
Authentication |
|
Deploy the connector
Prerequisites
Create a Google Workspace private key, which contains your service account ID. To learn how to get a private key, go to Configure access to the Google Cloud Search API.
Your Google Workspace administrator must add a data source to search. Record the data source ID.
If the connector returns results based on ACLs (results aren't public), your Google Workspace administrator must create two identity sources and give you their IDs:
- An identity source for syncing Active Directory users and groups.
- An identity source for SharePoint local groups.
The admin must also get your organization's Google Workspace customer ID and give it to you.
Learn how to get these values in Map user identities in Cloud Search.
Set up a user account for the connector that has Full Read permissions to SharePoint Web Application in the user policy.
If the SharePoint Web Application doesn't have a root site collection, create one.
If any site collections are write-locked, sign in to the SharePoint server with an account that has Admin privileges and run the PrepareWriteLockedSites.ps1 script.
To get data source metrics to inform your connector configuration, sign in to the SharePoint server with an account that has farm administration privileges and run diagnose_sp.ps1.
The output reports the numbers of web applications, documents, and user group memberships. Use this information to estimate how many connector instances you need, memory requirements, and document count.
Step 1. Install the Google Cloud Search SharePoint On-Prem connector software
Clone the connector repository from GitHub.
$ git clone https://github.com/google-cloudsearch/sharepoint-connector.git
$ cd sharepoint-connector
Check out the desired version of the connector:
$ git checkout tags/latest_version
Where latest_version is a value such as v1-0.0.5.
Build the connector.
$ mvn package
To skip tests when you build the connector, run mvn package -DskipTests instead of mvn package.
Copy the connector zip file to your local installation directory:
$ cp target/google-cloudsearch-sharepoint-connector-latest_version.zip installation-dir
$ cd installation-dir
$ unzip google-cloudsearch-sharepoint-connector-latest_version.zip
$ cd google-cloudsearch-sharepoint-connector-latest_version
Step 2. Create the SharePoint On-Prem connector configuration file
In the same directory as the connector installation, create a file. Google recommends that you name the file connector-config.properties so no additional command-line parameters are required to run the connector. If you plan to run many connector instances, add details to the name to distinguish it.
Add parameters as key/value pairs to the file contents, as in the following example:
### Sharepoint On-Prem Connector configuration ###

# Required parameters for data source access
api.sourceId=08ef8becd116faa4546b8ca2c84b2879
api.serviceAccountPrivateKeyFile=service_account.json
api.identitySourceId=08ef8becd116faa475de26d9b291fed9

# Required parameters for SharePoint on-premises access
sharepoint.server=http://sp-2016:32967/sites/doc-center-site-collection
sharepoint.siteCollectionOnly=true
sharepoint.username=contoso\\admin
sharepoint.password=pa$sw0rd
sharepoint.stripDomainInUserPrincipals=true

# Required parameters for AD lookup
adLookup.host=dc.contoso.com
adLookup.username=contoso\\admin
adLookup.password=pa$sw0rd
api.referenceIdentitySources=CONTOSO,contoso
api.referenceIdentitySource.contoso.id=08ef8becd116faa5d3783f8c5a80e5aa
api.referenceIdentitySource.CONTOSO.id=08ef8becd116faa5d3783f8c5a80e5aa

# Optional parameters for schema mapping
contentTemplate.sharepointItem.title=Title
contentTemplate.sharepointItem.unmappedColumnsMode=APPEND
For detailed descriptions of each parameter, go to the configuration parameters reference.
(Optional) Configure additional connector parameters, as needed. For details, go to Google-supplied connector parameters.
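Before you launch the connector, it can help to check that the configuration file contains the parameters this page marks as required. The following Python sketch is a hypothetical helper, not part of the connector. It handles only simple key=value lines and the doubled-backslash escape shown in the example, so it is not a full Java properties parser. The list of required keys is taken from the data source access and SharePoint access tables in the configuration parameters reference.

```python
# Keys that this page marks as "Required" for the content connector.
REQUIRED_KEYS = (
    "api.sourceId",
    "api.serviceAccountPrivateKeyFile",
    "sharepoint.server",
)


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # In a properties file a doubled backslash stands for one literal
        # backslash, as in the sharepoint.username example value.
        props[key.strip()] = value.strip().replace("\\\\", "\\")
    return props


def missing_required(props: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


if __name__ == "__main__":
    with open("connector-config.properties", encoding="latin-1") as handle:
        print(missing_required(parse_properties(handle.read())))
```

An empty list means that all three keys are present. The sketch doesn't validate the values themselves.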
Step 3. For HTTPS, add SharePoint as a trusted host
If SharePoint is configured to use HTTPS, get a SharePoint certificate to add it as a trusted host for the connector.
On the computer that will run the connector, open a browser and go to SharePoint.
In the warning page that opens, click I Understand the Risks and Add Exception. The page shows a message such as "This Connection is Untrusted" because the certificate is self-signed and not signed by a trusted Certificate Authority.
Once the View button is available, click it.
Go to the Details tab and click Export.
Save the certificate in the connector directory with the name sharepoint.crt.
Click Close then Cancel to close the windows.
Open a command prompt and enter the following command:
$ keytool -importcert -keystore cacerts.jks -storepass changeit -file sharepoint.crt -alias sharepoint
When prompted "Trust this certificate?", answer yes.
Step 4. Set up logging
In the directory that contains the connector binary, create a folder named logs.
In the same directory (not logs), create a Latin1-encoded file named logging.properties.
Add the following text to logging.properties:
handlers = java.util.logging.ConsoleHandler,java.util.logging.FileHandler
# Default log level
.level = INFO
# uncomment line below to increase logging level for SharePoint APIs
#com.google.enterprise.cloudsearch.sharepoint.level=FINE
# uncomment line below to increase logging level to enable API trace
#com.google.api.client.http.level = FINE
java.util.logging.ConsoleHandler.level = INFO
java.util.logging.FileHandler.pattern=logs/connector-sharepoint.%g.log
java.util.logging.FileHandler.limit=10485760
java.util.logging.FileHandler.count=10
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
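The FileHandler settings above determine how much disk space the rotating logs can use. In java.util.logging, limit is the approximate maximum size of one log file in bytes, count is the number of files in the rotation, and %g in the pattern is replaced by the generation number. The following Python sketch works through that arithmetic for the values shown. The helper names are hypothetical.

```python
def max_log_bytes(limit_bytes: int, count: int) -> int:
    """Approximate upper bound on disk space used by the rotating log set."""
    return limit_bytes * count


def log_file_names(pattern: str, count: int) -> list:
    """Expand the %g generation placeholder for each file in the rotation."""
    return [pattern.replace("%g", str(generation)) for generation in range(count)]


if __name__ == "__main__":
    total = max_log_bytes(10485760, 10)
    print(total // (1024 * 1024), "MiB")
    print(log_file_names("logs/connector-sharepoint.%g.log", 10)[0])
```

With the values shown, each file holds about 10 MiB and the full set holds about 100 MiB. Adjust limit or count if the connector host has less free space.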
Step 5. Configure the SharePoint On-Prem identity connector
This step is required to apply SharePoint On-Prem identity-based ACLs to search results. If you set up the connector with public ACLs, you can skip this step.
In the same directory as the SharePoint On-Prem connector installation, create a file and name it sharepoint-onprem-identity-connector.config.
Add parameters as key/value pairs to the file contents, as in the following example:
### SharePoint On-prem identity connector configuration ###

# Required parameters for data source access
api.customerId=C05d3djk8
api.serviceAccountPrivateKeyFile=service_account.json
api.identitySourceId=08ef8becd116faa475de26d9b291fed9

# Required parameters for SharePoint access
sharepoint.server=http://sp-2016:32967/sites/doc-center-site-collection
sharepoint.siteCollectionOnly=true
sharepoint.username=contoso\\admin
sharepoint.password=pa$sw0rd
sharepoint.stripDomainInUserPrincipals=true

# Required parameters for AD lookup
adLookup.host=dc.contoso.com
adLookup.username=contoso\\admin
adLookup.password=pa$sw0rd
api.referenceIdentitySources=CONTOSO,contoso
api.referenceIdentitySource.contoso.id=08ef8becd116faa5d3783f8c5a80e5aa
api.referenceIdentitySource.CONTOSO.id=08ef8becd116faa5d3783f8c5a80e5aa
The values are almost the same as for the SharePoint On-Prem connector, except that instead of api.sourceId, the parameter is api.customerId. The value of api.customerId is the customer ID that you got from your Google Workspace admin.
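Because the two files share most of their values, one way to keep them in sync is to derive the identity connector settings from the content connector settings. The following Python sketch is a hypothetical helper that shows the idea. It assumes the settings are already loaded into a dictionary, and it drops the contentTemplate keys only because the example identity configuration on this page doesn't include them.

```python
def to_identity_config(connector_props: dict, customer_id: str) -> dict:
    """Build identity connector settings from content connector settings."""
    identity = {
        key: value
        for key, value in connector_props.items()
        # api.sourceId is replaced by api.customerId. The schema mapping
        # keys are omitted to mirror the example identity configuration.
        if key != "api.sourceId" and not key.startswith("contentTemplate.")
    }
    identity["api.customerId"] = customer_id
    return identity


if __name__ == "__main__":
    connector = {
        "api.sourceId": "08ef8becd116faa4546b8ca2c84b2879",
        "api.identitySourceId": "08ef8becd116faa475de26d9b291fed9",
        "adLookup.host": "dc.contoso.com",
        "contentTemplate.sharepointItem.title": "Title",
    }
    for key, value in sorted(to_identity_config(connector, "C05d3djk8").items()):
        print(f"{key}={value}")
```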
Step 6. Launch the SharePoint On-Prem connector
In the following steps, you map the principals in both the on-premises Active Directory and the SharePoint site collection to identities in the Cloud Identity service. This synchronization is done with Google Cloud Directory Sync (GCDS) and the SharePoint On-Prem identity connector.
After GCDS synchronizes users and groups, run the SharePoint On-Prem identity connector to synchronize the SharePoint site collection groups. Lastly, run the SharePoint On-Prem connector to index and serve results to your Cloud Search users.
If you haven't already, configure and run GCDS. Make sure to enable identity mapped groups.
Run the SharePoint On-Prem identity connector:
$ java -Djava.util.logging.config.file=logging.properties -cp "google-cloudsearch-sharepoint-connector-version.jar" com.google.enterprise.cloudsearch.sharepoint.SharePointIdentityConnector -Dconfig=sharepoint-onprem-identity-connector.config
Run the SharePoint On-Prem connector. Use the command syntax for your SharePoint site security:
HTTP (no trusted host required):
$ java -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-sharepoint-connector-v1-version.jar
HTTPS (add SharePoint as the trusted host):
$ java -Djavax.net.ssl.trustStore=cacerts.jks -Djavax.net.ssl.trustStoreType=jks -Djavax.net.ssl.trustStorePassword=changeit -Djava.util.logging.config.file=logging.properties -jar google-cloudsearch-sharepoint-connector-v1-version.jar
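The HTTP and HTTPS launch commands differ only in the three trust store options. If you script the launch, you can build the argument list conditionally. The following Python sketch is a hypothetical wrapper, not part of the connector. Its default values mirror the commands above, and the JAR file name is a placeholder that you replace with the name of the file you built.

```python
import subprocess


def build_launch_command(jar_path, https=False, trust_store="cacerts.jks",
                         trust_store_password="changeit",
                         logging_config="logging.properties"):
    """Return the java argument list for an HTTP or HTTPS SharePoint site."""
    cmd = ["java"]
    if https:
        # These options are needed only when SharePoint uses HTTPS.
        cmd += [
            f"-Djavax.net.ssl.trustStore={trust_store}",
            "-Djavax.net.ssl.trustStoreType=jks",
            f"-Djavax.net.ssl.trustStorePassword={trust_store_password}",
        ]
    cmd += [f"-Djava.util.logging.config.file={logging_config}", "-jar", jar_path]
    return cmd


if __name__ == "__main__":
    # Placeholder JAR name; use the file produced by your build.
    command = build_launch_command(
        "google-cloudsearch-sharepoint-connector-v1-version.jar", https=True)
    subprocess.run(command, check=True)
```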
Configuration parameters reference
Data source access
Setting | Parameter |
Data source ID | api.sourceId=1234567890abcdef
Required. The Google Cloud Search data source ID set up by the Google Workspace administrator. |
Path to the service account private key file | api.serviceAccountPrivateKeyFile=PrivateKey.json
Required. The path to the Google Cloud Search service account key file. |
SharePoint on-premises access
Setting | Parameter |
SharePoint server URL | sharepoint.server=http://yoursharepoint.example.com/
Required. The URL of the SharePoint server as a fully-qualified host name, such as http://yoursharepoint.example.com/. If the host name is not fully-qualified, then you must set DNS override on the connector host. |
SharePoint username | sharepoint.username=YOURDOMAIN\\ConnectorUser
Required when you run the connector on Linux or on a Windows machine that is not part of the SharePoint Server AD domain. |
SharePoint password | sharepoint.password=user_password
Required when you run the connector on Linux or on a Windows machine that is not part of the SharePoint Server AD domain. |
Use Live Authentication to connect to SharePoint | sharepoint.username=AdaptorUser Live Authentication Id
|
Use ADFS Authentication to connect to SharePoint | sharepoint.username=AdaptorUser@yourdomain.com
|
Site collection indexing
Setting | Parameter |
Index type | sharepoint.siteCollectionOnly=boolean
Optional, except for multi-tenant SharePoint deployments (learn more). Set to true to have the connector index |
SharePoint Identity Mapping
Setting | Parameter |
Identity Source ID | api.identitySourceId=1234567890abcdef
Required. Identity source ID for syncing SharePoint Local Groups. The Google Cloud Search source ID set up by the Google Workspace administrator, as described in Add a data source to search. |
Reference Identity Sources | api.referenceIdentitySources=CONTOSO,contoso
A comma-delimited list of reference identity sources for Active Directory principals. The value matches the Active Directory NetBIOS name of the reference Active Directory principals. |
Reference Identity Source IDs | api.referenceIdentitySource.DOMAIN.id=identity-source-id
Required. The Identity Source ID for syncing Active Directory principals. |
Active Directory Lookup
Setting | Parameter |
Active Directory Host | adLookup.host=host
Required. Active Directory hostname, such as dc.contoso.com, or IP address. |
Active Directory lookup port | adLookup.port=port
Optional. Default is 389. Use 636 for SSL. |
Active Directory lookup method | adLookup.method=value
Optional. Default is `standard`. For SSL connections, set to `ssl`. |
Active Directory lookup user | adLookup.username=CONTOSO\user1
Required. User authorized to perform Active Directory lookups. |
Active Directory lookup password | adLookup.password=password123
Required. Password for user specified by |
HTML content generation
Setting | Parameter |
HTML template title field | contentTemplate.sharepointItem.title=Title
The SharePoint field to use as the HTML template title for generated HTML. |
HTML content high search quality fields | contentTemplate.sharepointItem.quality.high=highField1[,highField2,...]
A comma-separated list of fields to include in the generated HTML as high-quality fields. When the search query terms match these fields, the results are ranked higher. |
HTML content medium search quality fields | contentTemplate.sharepointItem.quality.medium=mediumField1[,mediumField2,...]
A comma-separated list of fields to include in the generated HTML as medium-quality fields. |
HTML content low search quality fields | contentTemplate.sharepointItem.quality.low=lowField1[,lowField2,...]
A comma-separated list of fields to include in the generated HTML as low-quality fields. |
HTML content unmapped columns | contentTemplate.sharepointItem.unmappedColumnsMode=APPEND
How the connector handles unmapped columns. Value is APPEND (default) or IGNORE.
|