PolyScale is a plug-and-play global database cache that reduces query latency for databases worldwide. Using PolyScale, database reads can be distributed and cached, seamlessly scaling your current database without writing code or altering query or transactional semantics.
This article describes how PolyScale's cache is configured. It then maps the parameters of that configuration to the key performance indicators (KPIs) of cache performance using a simple but powerful statistical model. The analysis yields rules of thumb for optimizing the effectiveness of PolyScale's cache, followed by a discussion of several ways these rules can be applied and extended.
Caching in PolyScale is controlled by configurations. Cache configurations give fine-grained control over what is cached and for how long. Each entry in the configuration specifies two things: a template and a time-to-live (TTL).
The template determines which queries will be cached. A single template groups together multiple similar queries. The template specifies a query pattern and any queries matching that pattern will be cached. For example, the following template matches queries for the name and age of people where the name has some specific value.
SELECT name, age FROM people WHERE name = [?];
Any query matching this general form will be cached separately. For example, each of
SELECT name, age FROM people WHERE name = 'Alice';
SELECT name, age FROM people WHERE name = 'Bob';
would be cached separately. Note that the template is sensitive to the columns selected, the specific parameters of the WHERE clause, and any additional SQL. The following examples would not be cached under the example template, due to the bolded difference from the template:
SELECT **age, name** FROM people WHERE name = 'Alice';
SELECT name, age, **address** FROM people WHERE name = 'Alice';
SELECT name, age FROM people WHERE name = 'Bob' **AND age > 20**;
SELECT name, age FROM people WHERE name = 'Bob' **ORDER BY name**;
Query templates can be constructed for queries of arbitrary complexity and can include multiple wildcard parameters. In order to manage this, PolyScale automatically creates query templates from the span of queries it observes flowing through its connection. This summarization to the template level makes configuring the cache simpler and forms the basis for the statistical analysis of query traffic.
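To make the templating idea concrete, here is a minimal sketch of how literal values might be collapsed into wildcard placeholders. This is an illustration only, not PolyScale's actual implementation; the function name and regexes are hypothetical:

```python
import re

def to_template(query: str) -> str:
    """Collapse literal values into wildcard placeholders so that
    similar queries map to the same cache template (illustrative only)."""
    # Replace quoted string literals, then bare numeric literals.
    templ = re.sub(r"'[^']*'", "[?]", query)
    templ = re.sub(r"\b\d+(\.\d+)?\b", "[?]", templ)
    return templ

q1 = "SELECT name, age FROM people WHERE name = 'Alice';"
q2 = "SELECT name, age FROM people WHERE name = 'Bob';"
q3 = "SELECT name, age, address FROM people WHERE name = 'Alice';"

assert to_template(q1) == to_template(q2)  # same template, cached separately
assert to_template(q1) != to_template(q3)  # extra column -> different template
print(to_template(q1))  # SELECT name, age FROM people WHERE name = [?];
```

Two queries that differ only in their literal parameters share a template, while a change to the selected columns produces a new one.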
The second component of the cache configuration is the TTL parameter. The TTL specifies for how long a query result is cached. Any new queries that match the query template and occur within the TTL period will be served directly from the cache with no need to recalculate their result on the database server.
The figure shows a hypothetical timing diagram for a sequence of queries. The initial cache write occurs at 12:35:00 with a TTL of 40 seconds. Subsequently, three more queries arrive within the TTL window and so are cache hits. Finally, another query arrives outside the TTL window and it results in a miss and a new cache write. The execution times for the cache hits are of course considerably smaller than for the cache writes. Note that for now we make the simplifying assumption that uncached query execution times are much smaller than inter-query arrival times.
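The hit/miss logic in the timing diagram can be sketched in a few lines. The timestamps below are hypothetical, chosen to mirror the figure (seconds after the 12:35:00 write, TTL of 40 seconds), and the sketch assumes negligible execution time relative to inter-arrival times, as stated above:

```python
def classify(arrivals, ttl):
    """Label each query arrival as a cache write (miss) or a hit,
    given a TTL in seconds."""
    events = []
    last_write = None
    for t in arrivals:
        if last_write is None or t - last_write >= ttl:
            events.append((t, "write"))  # miss: result computed and cached
            last_write = t
        else:
            events.append((t, "hit"))    # served directly from the cache
    return events

# Hypothetical timestamps mirroring the figure: TTL = 40 s.
arrivals = [0, 10, 22, 35, 47]
print(classify(arrivals, ttl=40))
# The first query writes the cache; the next three land inside the
# 40 s window and hit; the fifth arrives after expiry and writes again.
```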
The inter-query arrival times called out in the figure are driven by external factors and importantly by the exact queries captured by the query template. It is the interplay between the TTL and the inter-query arrival times that drives the cache efficiency.
The inter-query arrival times are likely to have complex statistical properties. They may have multiple seasonalities (time of day, day of the week, etc.). They may also be subject to unpredictable bursts of activity or sudden dry spells. While these details are certainly important, at short time scales, we can make the simplifying assumption that inter-query arrival times follow a homogeneous Poisson process. Poisson processes are a complex and mathematically sophisticated topic. Fortunately, in their simplest form, they are easy to understand but still retain much of their predictive power.
The homogeneous Poisson process summarizes the query arrival dynamics with a single parameter: the query arrival rate, denoted λ. Individual arrivals are random, with exponentially distributed inter-arrival times, but on average they arrive at rate λ. For example, λ = 0.1 / sec means queries arrive on average once every ten seconds. The queries depicted in the figure above arise from a Poisson process with λ = 0.1 / sec. Such a model may seem overly simplistic, but it is actually useful and often quite accurate because it is the limiting case under a broad set of circumstances.
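A homogeneous Poisson process is easy to simulate: draw inter-arrival times from an exponential distribution with mean 1/λ. A quick sanity check at λ = 0.1 / sec:

```python
import random

random.seed(0)
lam = 0.1  # queries per second: on average one query every 10 s

# Inter-arrival times of a homogeneous Poisson process are
# exponentially distributed with mean 1 / lam.
gaps = [random.expovariate(lam) for _ in range(100_000)]
mean_gap = sum(gaps) / len(gaps)
print(f"mean inter-arrival time: {mean_gap:.1f} s")  # close to 10 s
```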
Using this model we can derive the statistical properties of the cache. To begin, consider the hit rate: the fraction of queries that are cache hits rather than cache misses. Intuitively, the hit rate should equal the expected number of arrivals during a period equal to the TTL, divided by that number plus one (the hits plus the one miss that resulted in the write). That is indeed the case:

hit rate = λ·TTL / (1 + λ·TTL)
In the following plot, the hit rate is plotted against λ·TTL. This plot highlights a useful rule of thumb: set the TTL to at least twice the expected inter-query arrival time (equivalently, λ·TTL ≥ 2) to achieve a hit rate of 66% or better.
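The rule of thumb follows directly from the hit rate expression λ·TTL / (1 + λ·TTL); a short sketch:

```python
def hit_rate(lam: float, ttl: float) -> float:
    """Expected cache hit rate under a homogeneous Poisson process:
    lam * ttl expected hits per one miss that writes the cache."""
    x = lam * ttl
    return x / (1 + x)

# Rule of thumb: TTL at twice the mean inter-arrival time (lam * TTL = 2)
# yields a 2/3 hit rate; longer TTLs push the hit rate higher still.
print(hit_rate(lam=0.1, ttl=20))  # 0.666...
print(hit_rate(lam=0.1, ttl=40))  # 0.8
```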
Another important statistical property of the cache is the age of a cache hit: the difference between the time when the query that resulted in the hit arrived and the time when the cached value was written. Under a homogeneous Poisson process, the age of any specific cache hit is uniformly distributed over the TTL. That is, the age is equally likely to be any time between zero and the TTL. This makes it very easy to calculate statistics of the age of cache hits. For example, a simple rule of thumb is that the average age of a cache hit is half the TTL. For some applications, the concern is the upper tail of the age; this too is easy to calculate. The 99th percentile age of a cache hit is 0.99 times the TTL, or approximately equal to the TTL. Reasoning about the age of cache hits under the homogeneous Poisson process is particularly straightforward.
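Because the age is uniform on [0, TTL], these statistics can be checked with a quick Monte Carlo sketch (using the figure's 40-second TTL as an illustrative value):

```python
import random

random.seed(1)
TTL = 40.0  # illustrative TTL from the timing diagram, in seconds

# Under the Poisson model, cache-hit ages are uniform on [0, TTL].
ages = sorted(random.uniform(0, TTL) for _ in range(100_000))
mean_age = sum(ages) / len(ages)
p99_age = ages[int(0.99 * len(ages))]

print(f"mean age: {mean_age:.1f} s (expected TTL/2 = {TTL / 2} s)")
print(f"p99 age:  {p99_age:.1f} s (expected 0.99 * TTL = {0.99 * TTL} s)")
```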
Closing (the loop)
PolyScale’s cache provides two primary benefits: improved query latency and reduced database workload. To realize these benefits, the cache needs to be configured. To configure the cache well, the pattern of queries arriving at the database must be understood. While in the simplest of cases, this understanding may be available, in general, acquiring this understanding is an ongoing data-driven statistical process.
One of the most important properties of query traffic is the inter-query arrival rate. This arrival rate dictates how best to configure the cache. PolyScale will automatically measure and summarize arrival rates across query templates. These measurements can then be used to configure the cache using the simple rules of thumb that the TTL should be at least double the expected inter-query arrival time, and that the resulting cache hits will have an age uniformly distributed over the TTL.
When the query arrival patterns are more complex, with significant short-term ebbs and flows, managing the cache configuration exceeds the capabilities of a human. Something more must be done to configure the cache well. PolyScale aims to automate the configuration of the cache by modelling the statistical properties of query arrival and execution times on a real-time basis.
In the future, PolyScale will analyze query arrival patterns and execution times, and determine the most effective cache configuration from them automatically. The analysis and optimization will run continuously in the background, so that the most up-to-date recommendations on optimal cache configurations are always available and ready to be applied.
PolyScale requires no code to write, no infrastructure to manage, and no changes to your current database. Automating the cache configuration closes the loop on a truly plug-and-play database caching solution.
Ready to try PolyScale.ai? Sign up for an always-free account here (no credit card required) or try out the live interactive demo playground.