PolyScale is a plug-and-play global database cache that reduces query latency
for globally distributed applications. Using PolyScale, database reads can be
distributed and cached, seamlessly scaling your current database without
writing code or altering query or transactional semantics.
This article describes how PolyScale’s cache is configured. It then maps the
configuration parameters to the key performance indicators of cache performance
using a simple but powerful statistical model. This analysis yields rules of
thumb for optimizing the effectiveness of PolyScale’s cache. Finally, it
discusses several ways these rules can be applied and enhanced.
Caching in PolyScale is controlled by configurations. Cache configurations give
fine-grained control over what is cached and for how long. Each entry in the
configuration specifies two things: a template and a time-to-live (TTL).
The template determines which queries will be cached. A single template groups
together multiple similar queries. The template specifies a query pattern and
any queries matching that pattern will be cached. For example, the following
template matches queries for the name and age of people where the name takes
some specific value:
SELECT name, age FROM people WHERE name = [?];
Any query matching this general form will be separately cached. For example,
each of

SELECT name, age FROM people WHERE name = 'Alice';
SELECT name, age FROM people WHERE name = 'Bob';

would be separately cached. Note that the template is sensitive to the columns
selected, the specific parameters of the WHERE clause, and any additional SQL.
The following examples would not be cached under the example template due to
the indicated (bold) difference from the template:

SELECT **age, name** FROM people WHERE name = 'Alice';
SELECT name, age, **address** FROM people WHERE name = 'Alice';
SELECT name, age FROM people WHERE name = 'Bob' **AND age > 20**;
SELECT name, age FROM people WHERE name = 'Bob' **ORDER BY name**;
Query templates can be constructed for queries of arbitrary complexity and can
include multiple wildcard parameters. In order to manage this, PolyScale
automatically creates query templates from the span of queries it observes
flowing through its connection. This summarization to the template level makes
configuring the cache simpler and forms the basis for the statistical analysis
of query traffic.
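As a toy illustration (not PolyScale's actual implementation, which infers templates automatically from observed traffic), a query can be reduced to a template by replacing its literal parameters with a placeholder:

```python
import re

# Hypothetical sketch: normalize a SQL query to a template by replacing
# string and numeric literals with the [?] placeholder.
def to_template(sql: str) -> str:
    sql = re.sub(r"'[^']*'", "[?]", sql)   # string literals -> [?]
    sql = re.sub(r"\b\d+\b", "[?]", sql)   # bare numbers -> [?]
    return sql.strip()

a = to_template("SELECT name, age FROM people WHERE name = 'Alice';")
b = to_template("SELECT name, age FROM people WHERE name = 'Bob';")
c = to_template("SELECT age, name FROM people WHERE name = 'Alice';")

print(a)       # SELECT name, age FROM people WHERE name = [?];
print(a == b)  # True: same template, but the results are cached separately
print(a == c)  # False: a different column order yields a different template
```

Note that real SQL normalization must handle quoting, comments, and dialect differences; the regular expressions above are only for illustration.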
The second component of the cache configuration is the TTL parameter. The TTL
specifies for how long a query result is cached. Any new queries that match the
query template and occur within the TTL period will be served directly from the
cache with no need to recalculate their result on the database server.
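The mechanics can be sketched with a minimal TTL cache; the class and the (template, parameters) key structure below are illustrative assumptions, not PolyScale's API:

```python
import time

# Minimal sketch of a TTL cache keyed by (template, parameters).
class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (result, write_time)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None  # miss: nothing cached for this key
        result, written = entry
        if time.monotonic() - written > self.ttl:
            del self.store[key]  # TTL expired: treat as a miss
            return None
        return result  # hit: serve without touching the database

    def put(self, key, result):
        self.store[key] = (result, time.monotonic())

cache = TTLCache(ttl_seconds=40.0)
key = ("SELECT name, age FROM people WHERE name = [?]", ("Alice",))
cache.put(key, [("Alice", 30)])
print(cache.get(key))  # served from cache within the TTL window
```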
The figure shows a hypothetical timing diagram for a sequence of queries. The
initial cache write occurs at 12:35:00 with a TTL of 40 seconds. Subsequently,
three more queries arrive within the TTL window and so are cache hits. Finally,
another query arrives outside the TTL window and it results in a miss and a new
cache write. The execution times for the cache hits are of course considerably
smaller than for the cache writes. Note that for now we make the simplifying
assumption that uncached query execution times are much smaller than
inter-query arrival times.
The inter-query arrival times called out in the figure are driven by external
factors and importantly by the exact queries captured by the query template. It
is the interplay between the TTL and the inter-query arrival times that drives
the cache efficiency.
The inter-query arrival times are likely to have complex statistical properties.
They may have multiple seasonalities (time of day, day of the week, etc.). They
may also be subject to unpredictable bursts of activity or sudden dry spells.
While these details are certainly important, at short time scales, we can make
the simplifying assumption that inter-query arrival times follow a homogeneous
Poisson process. Poisson processes are a complex and mathematically
sophisticated topic. Fortunately, in their simplest form, they are easy to
understand but still retain much of their predictive power.
The homogeneous Poisson process summarizes the query arrival dynamics with a
single parameter, the rate of query arrival, denoted 𝞴. The individual arrivals
are random with an exponential distribution but on average they arrive at the
rate 𝞴. For example, 𝞴 = 0.1 / sec means queries arrive on average once every
ten seconds. The queries depicted in the figure above arise from a Poisson
process with 𝞴 = 0.1 / sec. Such a model may seem overly simplistic but it
actually is useful and often quite accurate because it is the limiting case
under a broad set of circumstances.
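A quick way to build intuition for the model is to simulate it. The sketch below draws exponentially distributed inter-arrival gaps at rate 𝞴 = 0.1 / sec and checks that the average gap is near 1/𝞴 = 10 seconds:

```python
import random

# Simulate a homogeneous Poisson arrival process: inter-arrival gaps are
# exponential with rate lam, so the mean gap should be close to 1/lam.
random.seed(42)
lam = 0.1  # arrivals per second
gaps = [random.expovariate(lam) for _ in range(100_000)]
mean_gap = sum(gaps) / len(gaps)
print(round(mean_gap, 1))  # close to 1/lam = 10 seconds
```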
Using this model we can derive the statistical properties of the cache. To
begin, let’s consider the hit rate or the fraction of queries that are cache
hits versus cache misses. Intuitively, the hit rate should equal the expected
number of arrivals during a period of time equal to the TTL, divided by that
number plus one (the hits plus the one miss that resulted in the cache write).
That is indeed the case:

hit rate = 𝞴 · TTL / (1 + 𝞴 · TTL)
In the following plot, the hit rate is plotted against 𝞴 · TTL. This plot
highlights a useful rule of thumb: set the TTL to at least twice the expected
inter-query arrival time (that is, 𝞴 · TTL ≥ 2) to achieve a hit rate of 66%
or better.
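The rule of thumb can be checked by simulation. The sketch below replays a Poisson query stream through a TTL cache and compares the measured hit rate with the formula 𝞴 · TTL / (1 + 𝞴 · TTL):

```python
import random

# Estimate the cache hit rate under Poisson arrivals by simulation.
def simulate_hit_rate(lam, ttl, n_queries=200_000, seed=1):
    random.seed(seed)
    t, expiry = 0.0, None
    hits = 0
    for _ in range(n_queries):
        t += random.expovariate(lam)      # next query arrival
        if expiry is not None and t < expiry:
            hits += 1                     # within the TTL window: cache hit
        else:
            expiry = t + ttl              # miss: cache write starts a new window
    return hits / n_queries

lam, ttl = 0.1, 20.0                      # lam * TTL = 2
measured = simulate_hit_rate(lam, ttl)
predicted = lam * ttl / (1 + lam * ttl)
print(round(predicted, 2))                # 0.67, i.e. the 2/3 rule of thumb
print(round(measured, 2))                 # close to the prediction
```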
Another important statistical property of the cache is the age of a cache hit.
This is the difference in time between when the query that resulted in the hit
arrived and the time when the cached value was written. Under a homogeneous
Poisson process, the age of any specific cache hit is uniformly distributed over
the TTL. That is, the age is equally likely to be any time between zero and the
TTL. This makes it very easy to calculate statistics of the age of cache hits.
For example, a simple rule of thumb is that the average age of a cache hit is
half the TTL. For some applications, the concern may be the upper limits of the
age; this too can be calculated easily. The 99th percentile age of a cache hit
is 0.99 times the TTL, or approximately equal to the TTL. Reasoning about the
age of cache hits under the homogeneous Poisson process is particularly simple.
Closing (the loop)
PolyScale’s cache provides two primary benefits: improved query latency and
reduced database workload. To realize these benefits, the cache needs to be
configured. To configure the cache well, the pattern of queries arriving at the
database must be understood. While in the simplest of cases, this understanding
may be available, in general, acquiring this understanding is an ongoing
data-driven statistical process.
One of the most important properties of query traffic is the inter-query arrival
rate. This arrival rate dictates how best to configure the cache. PolyScale will
automatically measure and summarize arrival rates across query templates. These
measurements can then be used to configure the cache using the simple rules of
thumb: the TTL should be at least double the expected inter-query arrival time,
and the resulting cache hits will have an age uniformly distributed over the
TTL.
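Inverting the hit-rate formula gives a simple way to turn a measured arrival rate into a TTL recommendation; the helper below is a hypothetical sketch, not a PolyScale API:

```python
# Invert h = lam*TTL / (1 + lam*TTL) to choose a TTL for a target hit rate h,
# given a measured arrival rate lam (queries per second).
def ttl_for_hit_rate(lam: float, target: float) -> float:
    if not 0 < target < 1:
        raise ValueError("target hit rate must be strictly between 0 and 1")
    return target / (lam * (1 - target))

# Queries arriving once every 10 seconds on average (lam = 0.1/sec) need a
# TTL of 20 seconds for a 2/3 hit rate -- the "double the expected
# inter-arrival time" rule of thumb.
print(ttl_for_hit_rate(0.1, 2/3))  # 20.0 (up to floating-point rounding)
```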
When the query arrival patterns are more complex, with significant short-term
ebbs and flows, managing the cache configuration exceeds the capabilities of a
human. Something more must be done to configure the cache well. PolyScale aims
to automate the configuration of the cache by modelling the statistical
properties of query arrival and execution times on a real-time basis.
In the future, PolyScale will use analysis of query arrival patterns and
execution times to determine the most effective cache configuration
automatically.
The analysis and optimization run continuously in the background. At any time,
the most up-to-date recommendations on optimal cache configurations are
available and ready to be applied.
PolyScale requires no code to write, no infrastructure to manage, and no changes
to your current database. Automating the cache configuration closes the loop on
a true plug-and-play database caching solution.
Ready to try PolyScale.ai? Sign up for a free account here (no credit card required). Or try out PolyScale without bringing your own database with our live interactive demo playground.
Read the Quick Start Guide to understand how easily PolyScale can be integrated without code and without deploying servers.