Partitioning Secondary Indexes by Term – What is it?

Rather than each partition having its own secondary index (a local index), we can construct a global index that covers data in all partitions.

👉 However, we cannot store that index on a single node!
👉 That node would likely become a bottleneck and defeat the purpose of partitioning!

Global Index and term partitioning?

A global index must also be partitioned. But it can be partitioned differently from the primary key index.

Here is how this might work…

Red cars from all partitions appear under the colour red in the index 🚗 🚗 🚗
But the index is partitioned so that colours starting with
- The letters A to R appear in partition 0
- The letters S to Z appear in partition 1

The index on the make of car is partitioned similarly…
- With the partition boundary being between F and H

We call this kind of index term-partitioned.

Because the term we are looking for determines the partition of the index
Here the term would be the colour red, for example
The name term comes from full-text indexes
- A particular kind of secondary index
- Where the terms are all the words that occur in a document
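
To make the colour example concrete, here is a minimal sketch of a term-partitioned index. The sample cars, the `BOUNDARIES` list and the function names are all made up for illustration, and routing on the first letter of the term is an assumption taken from the A to R / S to Z split above, not how any particular database does it.

```python
from bisect import bisect_right
from collections import defaultdict

# Assumed boundary: colours starting with a-r go to index partition 0,
# colours starting with s-z go to index partition 1.
BOUNDARIES = ["s"]

def index_partition_for(term):
    """Choose the index partition from the term itself."""
    return bisect_right(BOUNDARIES, term[0].lower())

# Made-up sample data, keyed by document ID.
cars = {
    191: {"colour": "red", "make": "Honda"},
    306: {"colour": "red", "make": "Ford"},
    515: {"colour": "silver", "make": "Audi"},
    768: {"colour": "red", "make": "Volvo"},
}

def build_colour_index(cars):
    """Return {index partition -> {colour -> [document IDs]}}."""
    partitions = defaultdict(lambda: defaultdict(list))
    for doc_id, car in sorted(cars.items()):
        colour = car["colour"]
        partitions[index_partition_for(colour)][colour].append(doc_id)
    return partitions

def find_by_colour(partitions, colour):
    """The term decides which single index partition to ask."""
    return partitions[index_partition_for(colour)].get(colour, [])

colour_index = build_colour_index(cars)
red_cars = find_by_colour(colour_index, "red")
```

All red cars end up under the same term in index partition 0, no matter which data partition the car itself lives on.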

How to apply term partitioning?

As before, we can partition the index by the term itself or by a hash of the term.

Partitioning by the term itself can be useful for range scans
- For example, on a numeric property such as the asking price of the car
Whereas partitioning on a hash of the term gives a more even distribution of load (as explained in an earlier post on hashing)

Term partitioning and range scans

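Partitioning by the term itself keeps neighbouring terms in the same index partition, so a range scan only needs to read the partitions whose ranges overlap the query.

For example
- On a numeric property such as the asking price of the car

Whereas partitioning on a hash of the term gives a more even distribution of load, but it scatters neighbouring terms, so range scans become less efficient.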
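
The trade-off can be sketched with the asking price example. The price boundaries, the number of partitions and the use of MD5 as the hash are assumptions picked for illustration only.

```python
import hashlib
from bisect import bisect_right

NUM_PARTITIONS = 4
# Assumed price ranges: below 5000, 5000-9999, 10000-19999, 20000 and up.
PRICE_BOUNDARIES = [5_000, 10_000, 20_000]

def range_partition(price):
    """Partition by the term itself: nearby prices share a partition."""
    return bisect_right(PRICE_BOUNDARIES, price)

def hash_partition(price):
    """Partition by a hash of the term: nearby prices are scattered."""
    digest = hashlib.md5(str(price).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

def partitions_for_range_scan(low, high):
    """With range partitioning, only the overlapping partitions are read."""
    return set(range(range_partition(low), range_partition(high) + 1))

def partitions_for_hash_scan(low, high):
    """With hash partitioning, any partition may hold a matching price,
    so a range query has to ask all of them."""
    return set(range(NUM_PARTITIONS))

by_range = partitions_for_range_scan(6_000, 8_000)
by_hash = partitions_for_hash_scan(6_000, 8_000)
```

Here a query for cars priced between 6000 and 8000 reads one index partition under range partitioning, but all four under hash partitioning.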

Global term partition index vs Document partition index

The advantage of a global term-partitioned index over a document-partitioned index…

✅ Is that it can make reads more efficient!
✅ Rather than doing scatter gather over all partitions

The client only needs to make a request to the partition containing the term that it wants…
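
A rough way to see the difference on the read path is to count how many partitions each kind of index has to ask. The two tiny in-memory indexes and the function names here are hypothetical, with the same assumed A to R / S to Z split on colour.

```python
# Document-partitioned (local) indexes: each data partition only indexes
# the cars it stores, so red cars are spread over both partitions.
local_indexes = [
    {"red": [191, 306], "silver": [515]},  # data partition 0
    {"red": [768], "black": [893]},        # data partition 1
]

# Term-partitioned (global) index: each index partition owns a range of terms.
global_index = [
    {"black": [893], "red": [191, 306, 768]},  # colours a-r
    {"silver": [515]},                         # colours s-z
]

def query_local(colour):
    """Scatter/gather: ask every partition, then merge the results."""
    results, requests = [], 0
    for partition in local_indexes:
        requests += 1
        results.extend(partition.get(colour, []))
    return sorted(results), requests

def query_global(colour):
    """Ask only the index partition that owns the term."""
    partition = 0 if colour[0].lower() < "s" else 1
    return sorted(global_index[partition].get(colour, [])), 1

local_result, local_requests = query_local("red")
global_result, global_requests = query_global("red")
```

Both queries return the same cars, but the local version needs one request per partition while the global version needs one request in total.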

However, the downside of a global index is that…

❌ Writes are slower and more complicated
❌ Because a write to a single document may now affect multiple partitions of the index.
❌ Every term in the document might be on a different partition, on a different node! 🤯
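
The write fan-out can be sketched by working out which index partitions one document touches. The `BOUNDARIES` table and function name are assumptions: colour splits at S as in the earlier example, and make splits at G, which is one possible boundary between F and H.

```python
from bisect import bisect_right

# Assumed boundaries per indexed field. Partition 0 holds colours a-r and
# makes a-f, partition 1 holds colours s-z and makes g-z.
BOUNDARIES = {"colour": ["s"], "make": ["g"]}

def index_partitions_touched(doc):
    """Return the set of index partitions that one document write must update."""
    touched = set()
    for field, boundaries in BOUNDARIES.items():
        if field in doc:
            first_letter = str(doc[field])[0].lower()
            touched.add(bisect_right(boundaries, first_letter))
    return touched

silver_ford = index_partitions_touched({"colour": "silver", "make": "Ford"})
red_audi = index_partitions_touched({"colour": "red", "make": "Audi"})
```

Writing the silver Ford has to update both index partitions (silver lives in partition 1, Ford in partition 0), while the red Audi happens to touch only partition 0.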

In an ideal world the index would always be up to date…

And every document written to the database would immediately be reflected in the index.
🤔 However, in a term-partitioned index, that would require a distributed transaction across all partitions affected by a write
- ❌ Which is not supported by all databases!

Asynchronous global term partitioning

In practice, updates to global secondary indexes are often asynchronous.

⚠️ This means if you read the index shortly after a write, the change you just made may not be reflected in the index…

For example:

Amazon DynamoDB
- States that its global secondary indexes are updated in a fraction of a second in normal circumstances…
- But may experience longer propagation delays in cases of faults in the infrastructure
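
The read-after-write gap can be simulated with a toy in-memory store. This is only a sketch under simple assumptions: the `AsyncGlobalIndex` class and its methods are hypothetical, the queue stands in for whatever propagation mechanism a real database uses, and it is not how DynamoDB is implemented. It also ignores removing old terms when a document changes.

```python
from collections import deque

class AsyncGlobalIndex:
    """Toy store whose secondary index is updated after the write returns."""

    def __init__(self):
        self.documents = {}     # primary data, updated immediately
        self.index = {}         # colour -> set of document IDs
        self.pending = deque()  # index updates that have not been applied yet

    def write(self, doc_id, colour):
        # The document is stored straight away, the index update is queued.
        self.documents[doc_id] = {"colour": colour}
        self.pending.append((colour, doc_id))

    def apply_pending(self):
        # Stands in for the background process that catches the index up.
        while self.pending:
            colour, doc_id = self.pending.popleft()
            self.index.setdefault(colour, set()).add(doc_id)

    def find_by_colour(self, colour):
        return sorted(self.index.get(colour, set()))

db = AsyncGlobalIndex()
db.write(191, "red")
before = db.find_by_colour("red")  # index has not caught up yet
db.apply_pending()
after = db.find_by_colour("red")   # the write is now visible
```

The document exists in the primary store the whole time, but the index query only finds it once the queued update has been applied.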

Other uses of global term-partitioned indexes:

Riak's search feature
The Oracle data warehouse
- Which lets you choose between local and global indexing

📚 Further Reading & Related Topics

If you’re exploring partitioning and secondary indexes by term, these related articles will provide deeper insights:

• Understanding Partitioning Proportional to Nodes – Learn how partitioning strategies based on terms differ from proportional partitioning, and how they impact data distribution and access.

• How Does Partitioning Work When Requests Are Being Routed? – Explore how term-based partitioning and request routing influence system performance and data retrieval efficiency in distributed systems.

Scalable Human Blog