Depending on your inventory, sharding (or breaking up feeds into multiple files) may be necessary.
When to use sharding
- Feed exceeds 200 MB for 1 file (after gzip compression).
  - Example: Generated availability feed is 1 GB. This should be sharded into 5 or more separate files (or shards).
- Partner inventory is distributed across systems and/or regions, resulting in difficulty reconciling the inventory.
  - Example: Partner has US and EU inventory that live in separate systems. The feed may be generated with 2 files (or shards), 1 for US and 1 for EU, with the same nonce and generation_timestamp.
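The 1 GB example can be checked with a short calculation. This is a minimal sketch, assuming the 1 GB figure is the gzip-compressed size and that 1 MB means 1,000,000 bytes; the helper name min_shard_count is hypothetical and not part of any feed API.

```python
# Assumption: the 200 MB limit is measured in decimal megabytes on the
# gzip-compressed file.
MAX_SHARD_BYTES = 200 * 1000 * 1000


def min_shard_count(compressed_feed_bytes: int) -> int:
    """Smallest shard count that keeps every shard at or under the limit,
    assuming the data could be split perfectly evenly."""
    # Integer ceiling division avoids floating-point rounding.
    shards = -(-compressed_feed_bytes // MAX_SHARD_BYTES)
    return max(1, shards)


print(min_shard_count(1_000_000_000))  # 1 GB feed -> 5
```

In practice an even split is rarely exact, which is why the example says 5 or more: adding a shard or two leaves headroom under the limit.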
General rules
- Each shard cannot exceed 200 MB for 1 file (after gzip compression).
- We recommend no more than 20 shards per feed. If you have a business justification that requires more than that amount, please contact support for further instruction.
- Individual records (one Merchant object, for example) must be sent in one shard; they cannot be split across multiple shards. However, they don't have to be sent in the shard with the same shard_number for future feeds.
- For better performance, your data should be split evenly among the shards so that all sharded files are similar in size.
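One way to satisfy the last two rules together is a greedy assignment: take the records from largest to smallest and place each one, whole, into the shard that is currently lightest. The sketch below is an illustration only; the function name, the record IDs, and the idea of using a per-record byte size as the weight are assumptions, not something the feed specification prescribes.

```python
import heapq


def assign_records_to_shards(record_sizes: dict[str, int],
                             total_shards: int) -> list[list[str]]:
    """Return one list of record IDs per shard; each record lands in exactly one shard."""
    if total_shards < 1:
        raise ValueError("total_shards must be at least 1")
    shards: list[list[str]] = [[] for _ in range(total_shards)]
    # Min-heap of (bytes assigned so far, shard_number).
    heap = [(0, n) for n in range(total_shards)]
    heapq.heapify(heap)
    # Largest records first, each into the currently lightest shard.
    for record_id, size in sorted(record_sizes.items(),
                                  key=lambda kv: kv[1], reverse=True):
        load, shard_number = heapq.heappop(heap)
        shards[shard_number].append(record_id)
        heapq.heappush(heap, (load + size, shard_number))
    return shards


# Hypothetical serialized sizes (in bytes) for five Merchant records.
sizes = {"merchant1": 50, "merchant2": 30, "merchant3": 20,
         "merchant4": 40, "merchant5": 10}
print(assign_records_to_shards(sizes, 2))
```

Because a record never has to stay in the same shard_number between feeds, the assignment can be recomputed from scratch each time the feed is generated.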
How to shard feeds
For each file (or shard), set the FeedMetadata to the following:
- processing_instruction set to PROCESS_AS_COMPLETE.
- shard_number set to the current shard of the feed (from 0 to total_shards - 1, without discontinuities).
- total_shards set to the total number of shards for the feed (starting from 1).
- nonce set to a unique identifier that is the same across all shards of the same feed but different from the value of other feeds. nonce must be a positive int (uint64).
- generation_timestamp is the timestamp in Unix epoch format. This should be the same across all shards of the feed.
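The field rules above can be sketched as a small helper that produces one metadata object per shard. The function name build_shard_metadata is hypothetical, and drawing the nonce from a random generator is only one possible way to obtain a unique positive uint64; any scheme that yields a value unique to the feed would do.

```python
import secrets
import time


def build_shard_metadata(total_shards: int, nonce: int,
                         generation_timestamp: int) -> list[dict]:
    """Build the metadata object for every shard of one feed."""
    if total_shards < 1:
        raise ValueError("total_shards starts from 1")
    if not 0 < nonce < 2**64:
        raise ValueError("nonce must be a positive uint64")
    return [
        {
            "processing_instruction": "PROCESS_AS_COMPLETE",
            "shard_number": shard_number,  # 0 .. total_shards - 1, no gaps
            "total_shards": total_shards,
            "nonce": nonce,  # same for all shards of this feed
            "generation_timestamp": generation_timestamp,  # same for all shards
        }
        for shard_number in range(total_shards)
    ]


# Assumption: a random value in [1, 2**64 - 1] serves as the per-feed nonce.
feed_nonce = secrets.randbelow(2**64 - 1) + 1
feed_metadata = build_shard_metadata(3, feed_nonce, int(time.time()))
```

Generating the nonce and timestamp once and reusing them for every shard avoids the most likely mistake, which is calling the clock or the random generator separately per file.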
Recommended: For each file (or shard), set the filename to indicate the feed type, the timestamp, the shard number, and the total number of shards. Shards should be roughly equal in size and are processed once all shards are uploaded.
Example:
“availability_feed_1574117613_001_of_002.json.gz”
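A filename in this shape can be produced with a format string, and the compressed size can be checked at the same time. This is a sketch under assumptions: the example name is read here as a zero-padded position (001) followed by the total (002), the helper names are hypothetical, and the 200 MB limit is taken as 200,000,000 bytes.

```python
import gzip
import json

MAX_SHARD_BYTES = 200 * 1000 * 1000  # assumed decimal megabytes


def shard_filename(feed_type: str, generation_timestamp: int,
                   position: int, total_shards: int) -> str:
    """Name a shard file by feed type, timestamp, position, and total."""
    return (f"{feed_type}_{generation_timestamp}_"
            f"{position:03d}_of_{total_shards:03d}.json.gz")


def compress_shard(shard: dict) -> bytes:
    """Serialize a shard to gzip-compressed JSON and enforce the size limit."""
    data = gzip.compress(json.dumps(shard).encode("utf-8"))
    if len(data) > MAX_SHARD_BYTES:
        raise ValueError("shard exceeds 200 MB after gzip; use more shards")
    return data


print(shard_filename("availability_feed", 1574117613, 1, 2))
# availability_feed_1574117613_001_of_002.json.gz
```

Whether the position in the filename is counted from 0 or from 1 is the partner's choice, since the filename is only a recommendation; the shard_number inside FeedMetadata is what must start at 0.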
Sharded Availability feed example
Shard 0
{ "metadata": { "processing_instruction": "PROCESS_AS_COMPLETE", "shard_number": 0, "total_shards": 3, "nonce": 111111, "generation_timestamp": 1524606581 }, "service_availability": [ { "availability": [ { "spots_total": 1, "spots_open": 1, "duration_sec": 3600, "service_id": "1000", "start_sec": 1577275200, "merchant_id": "merchant1", "confirmation_mode": "CONFIRMATION_MODE_SYNCHRONOUS" } ] } ] }
Shard 1
{ "metadata": { "processing_instruction": "PROCESS_AS_COMPLETE", "shard_number": 1, "total_shards": 3, "nonce": 111111, "generation_timestamp": 1524606581 }, "service_availability": [ { "availability": [ { "spots_total": 1, "spots_open": 1, "duration_sec": 3600, "service_id": "1000", "start_sec": 1577620800, "merchant_id": "merchant2", "confirmation_mode": "CONFIRMATION_MODE_SYNCHRONOUS" } ] } ] }
Shard 2
{ "metadata": { "processing_instruction": "PROCESS_AS_COMPLETE", "shard_number": 2, "total_shards": 3, "nonce": 111111, "generation_timestamp": 1524606581 }, "service_availability": [ { "availability": [ { "spots_total": 1, "spots_open": 1, "duration_sec": 3600, "service_id": "1000", "start_sec": 1576670400, "merchant_id": "merchant3", "confirmation_mode": "CONFIRMATION_MODE_SYNCHRONOUS" } ] } ] }
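Before uploading, the metadata of a shard set like the one above can be sanity-checked locally. The sketch below applies only the rules stated in this document (same nonce, same generation_timestamp, same total_shards, and shard numbers covering 0 to total_shards - 1); the function name check_shard_set is hypothetical and this is not how Google validates feeds.

```python
def check_shard_set(metadata_list: list[dict]) -> list[str]:
    """Return a list of problems found; an empty list means the set looks consistent."""
    if not metadata_list:
        return ["no shards supplied"]
    problems = []
    # These fields must be identical on every shard of the feed.
    for field in ("nonce", "generation_timestamp", "total_shards"):
        values = {m[field] for m in metadata_list}
        if len(values) != 1:
            problems.append(f"{field} differs across shards: {sorted(values)}")
    # Shard numbers must be exactly 0 .. total_shards - 1, with no gaps or repeats.
    total = metadata_list[0]["total_shards"]
    numbers = sorted(m["shard_number"] for m in metadata_list)
    if numbers != list(range(total)):
        problems.append(f"shard numbers {numbers} do not cover 0..{total - 1}")
    return problems


shards = [
    {"shard_number": n, "total_shards": 3,
     "nonce": 111111, "generation_timestamp": 1524606581}
    for n in range(3)
]
print(check_shard_set(shards))  # []
```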
Using sharding for partner distributed inventory
It can be challenging for partners to consolidate inventory distributed across multiple systems and/or regions into a single feed. Sharding can be used to resolve reconciliation challenges by setting each shard to match each distributed system’s inventory set.
For example, say a partner’s inventory is separated into 2 regions (US and EU inventory), which live in 2 separate systems.
The partner can break each feed into 2 files (or shards):
- Merchants feed: 1 shard for US, 1 shard for EU
- Services feed: 1 shard for US, 1 shard for EU
- Availability feed: 1 shard for US, 1 shard for EU
Follow the steps below to ensure the feeds are properly processed:
- Decide on an upload schedule, and configure each instance of inventory to follow the schedule.
- Assign unique shard numbers for each instance (e.g. US = N, EU = N + 1). Set total_shards to the total number of shards.
- At each scheduled upload time, decide on a generation_timestamp and nonce. In the FeedMetadata, set all instances to hold the same values for these two fields. generation_timestamp should be current or recent past (ideally, the partner’s read-at database timestamp).
- After all shards are uploaded, Google groups the shards via generation_timestamp and nonce.
Google will process the feed as one even though each shard represents a
different region of the partner’s inventory and could be uploaded at a
different time of the day as long as the generation_timestamp
is the same across all shards.
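Separate systems need some way to arrive at the same generation_timestamp and nonce without sharing a database. One possible approach, shown here purely as an assumption and not as a documented requirement, is to derive both values from the agreed upload schedule: each system rounds the current time down to the start of the schedule slot and hashes the feed type with that slot. The function name and the daily interval are hypothetical.

```python
import hashlib


def shared_feed_identity(feed_type: str, now: int,
                         interval_sec: int = 86400) -> tuple[int, int]:
    """Return (nonce, generation_timestamp) that every system computes identically
    for the same feed type and schedule slot."""
    # Assumption: the start of the current schedule slot is "recent past"
    # enough to serve as generation_timestamp.
    slot_start = now - (now % interval_sec)
    digest = hashlib.sha256(f"{feed_type}:{slot_start}".encode("utf-8")).digest()
    # First 8 bytes as an unsigned 64-bit integer; fall back to 1 if it is 0,
    # because the nonce must be positive.
    nonce = int.from_bytes(digest[:8], "big") or 1
    return nonce, slot_start


# The US and EU systems run at different moments within the same daily slot.
us_nonce, us_ts = shared_feed_identity("availability", 1524606581)
eu_nonce, eu_ts = shared_feed_identity("availability", 1524606581 + 600)
print(us_nonce == eu_nonce, us_ts == eu_ts)  # True True
```

Each system then writes its own fixed shard_number (for example US = 0, EU = 1) and the shared total_shards into FeedMetadata alongside these two values. If the systems can exchange a read-at database timestamp instead, that is closer to what the steps above recommend.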
Sharded Availability feed example by region
Shard 0 - US Inventory
{ "metadata": { "processing_instruction": "PROCESS_AS_COMPLETE", "shard_number": 0, "total_shards": 2, "nonce": 111111, "generation_timestamp": 1524606581 }, "service_availability": [ { "availability": [ { "spots_total": 1, "spots_open": 1, "duration_sec": 3600, "service_id": "1000", "start_sec": 1577275200, "merchant_id": "US_merchant_1", "confirmation_mode": "CONFIRMATION_MODE_SYNCHRONOUS" } ] } ] }
Shard 1 - EU Inventory
{ "metadata": { "processing_instruction": "PROCESS_AS_COMPLETE", "shard_number": 1, "total_shards": 2, "nonce": 111111, "generation_timestamp": 1524606581 }, "service_availability": [ { "availability": [ { "spots_total": 1, "spots_open": 1, "duration_sec": 3600, "service_id": "1000", "start_sec": 1577620800, "merchant_id": "EU_merchant_1", "confirmation_mode": "CONFIRMATION_MODE_SYNCHRONOUS" } ] } ] }