In 2023, it is extremely common for new startups to grow from a tiny amount of data to millions (or billions) of records very quickly. Maybe they are acquiring a large volume of customers through a digital storefront at an extremely fast pace. Or maybe they are streaming application events into an event log at high velocity. Or maybe they’re just trying to maintain a large variety of tables, objects, or unstructured text data.
Whatever the case, companies can quickly find themselves outgrowing their MySQL, Postgres, or SQL Server infrastructure and scrambling to keep up with data demands from consumers and internal stakeholders alike. The transactional SQL databases that power customer-facing websites and apps can quickly be brought to a crawl by innocent-seeming queries from data-hungry Tableau analysts. And efforts to tune the database (say, by adding indexes) or implement a caching layer (like memcached) can quickly mushroom into enormous daily engineering tasks.
Medium-Sized Databases Can Quickly Slow to a Crawl
Above, we see a very common architecture: a consumer-facing website powered by MySQL on Amazon RDS, plus an internal analyst building Tableau dashboards to analyze the data in MySQL. The database contains a record for each employee, some simple metadata (e.g., name, birth date, hire date), the department they work for, who they report to, their salary, and how those details have changed over time.
The analyst appears to be making a simple query to fetch every employee and their salary history. But joining a 300k-row table with a 2.7M-row table quickly overloads the MySQL server, which takes over 60 seconds to respond. Worse, the analyst then drags and drops Tableau components to create a dashboard showing the trend of male and female employees over the years. To answer the analyst’s request, the Tableau engine automatically generates a complex, inefficient SQL query that is passed on to MySQL. This new query takes almost 30 seconds to complete. It is also tied to an Age Range filter, so every time the analyst changes the age range, the query has to run again.
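The employee/salary-history join might look like the sketch below. This is a minimal, illustrative version using SQLite and table names modeled on the classic MySQL employees sample schema (`employees`, `salaries`); the real tables, at roughly 300k and 2.7M rows, are what make the same join expensive in production.

```python
import sqlite3

# In-memory stand-in for the production MySQL database (schema names are illustrative)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
cur.execute("CREATE TABLE salaries (emp_no INTEGER, salary INTEGER, from_date TEXT, to_date TEXT)")

cur.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [(1, "Ada", "Lovelace"), (2, "Alan", "Turing")])
cur.executemany("INSERT INTO salaries VALUES (?, ?, ?, ?)",
                [(1, 90000, "2020-01-01", "2021-01-01"),
                 (1, 95000, "2021-01-01", "9999-01-01"),
                 (2, 80000, "2020-06-01", "9999-01-01")])

# Every employee joined to every row of their salary history
rows = cur.execute("""
    SELECT e.emp_no, e.first_name, e.last_name, s.salary, s.from_date
    FROM employees e
    JOIN salaries s ON s.emp_no = e.emp_no
    ORDER BY e.emp_no, s.from_date
""").fetchall()
print(len(rows))  # 3: two salary rows for employee 1, one for employee 2
```

On a toy dataset this returns instantly; against millions of salary rows without the right indexes, the same shape of query is what ties up the server for a minute.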
What’s worse, in addition to keeping the analyst team waiting around for answers, the query itself eats up precious compute resources needed to power the consumer-facing website, which quickly degrades in performance. Consumers hitting the website at the same time will see slow load times, and the company may see users abandoning purchases they would otherwise have made. Without a talented DBA to reindex the MySQL database for these new queries, or a dedicated DevOps team to maintain an in-memory caching layer, this problem can have a major negative impact on the business.
That is, until the introduction of PolyScale.
PolyScale Serverless Edge Cache
PolyScale is a plug-and-play serverless database Cache-as-a-Service (CaaS) that reduces global query latency for databases. Using PolyScale, query execution for database reads is distributed and cached, seamlessly scaling your current database without writing code or altering query or transactional semantics.
With PolyScale, a team can easily spin up a distributed PolyScale CaaS just by signing up through the website, adding their database credentials, and pointing their Tableau client to the PolyScale endpoint. There are no servers to deploy, no Linux applications to install, no database tables to tune, and no caching layer to constantly “refresh”. Any data queried by Tableau is automatically cached across PolyScale’s globally distributed caching service, and Tableau queries that used to take minutes now execute in sub-millisecond time frames. Data is read quickly from PolyScale instead of triggering full table scans and raw-text searches on your production database. With PolyScale, teams can perform a massive upgrade to their data stack in a matter of minutes without getting snarled up in months-long engineering rollouts.
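In practice, “pointing the client at the endpoint” usually means changing only the host in the connection settings while credentials, database name, and queries stay the same. A rough sketch of that change (both hostnames below are hypothetical placeholders, not real endpoints; check your PolyScale dashboard for the actual value):

```python
# Before: Tableau (or any client) connects directly to the RDS instance
db_config = {
    "host": "mydb.abc123.us-east-1.rds.amazonaws.com",  # hypothetical RDS hostname
    "port": 3306,
    "user": "analyst",
    "database": "employees",
}

# After: swap only the host to the PolyScale cache endpoint.
# Everything else, including the SQL the client sends, is unchanged.
db_config["host"] = "cache.example-polyscale-endpoint.com"  # illustrative placeholder
```

Because the cache speaks the same wire protocol as the database, no application code changes.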
Setting up PolyScale to Address Performance Problems
Instead of having to tie up expensive resources like database administrators and DevOps engineers for weeks- or months-long technology projects, teams can implement a fix in a matter of minutes. In fact, in just a couple of clicks and a couple of screens, the Tableau dashboard now returns in 11 milliseconds instead of 60 seconds, and the production website is unaffected. Since PolyScale is smart enough to “refresh” any new data from the raw MySQL tables into its cache, there’s no need to spend time manually maintaining the cache. And since PolyScale’s cache is distributed globally, Tableau users can request data from anywhere in the world and won’t experience the latency usually associated with globally distributed data. It’s like having a no-code CDN for your MySQL server.
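Conceptually, this is a read-through cache: the first request for a query goes to the origin database and is stored; repeat requests are served from the cache until the entry is refreshed. The toy sketch below illustrates the idea with a simple TTL (PolyScale’s actual invalidation logic is more sophisticated than this and is not shown here; the class and its behavior are purely illustrative):

```python
import time

class ReadThroughCache:
    """Toy read-through cache keyed on the SQL text, with a time-to-live."""

    def __init__(self, execute, ttl=60.0):
        self.execute = execute   # function that runs SQL against the origin DB
        self.ttl = ttl           # seconds before a cached entry is considered stale
        self.store = {}          # sql text -> (timestamp, rows)
        self.hits = 0
        self.misses = 0

    def query(self, sql):
        entry = self.store.get(sql)
        if entry and time.monotonic() - entry[0] < self.ttl:
            self.hits += 1       # fresh entry: serve from cache, origin untouched
            return entry[1]
        self.misses += 1         # missing or stale: fetch from origin and store
        rows = self.execute(sql)
        self.store[sql] = (time.monotonic(), rows)
        return rows

# Stand-in "origin database" that just echoes the query
cache = ReadThroughCache(execute=lambda sql: f"rows for: {sql}")
cache.query("SELECT * FROM employees")   # miss: goes to the origin database
cache.query("SELECT * FROM employees")   # hit: served from the cache
print(cache.hits, cache.misses)          # 1 1
```

The point of the sketch is the operational win: the second identical query never touches the production database, which is what keeps the consumer-facing website responsive.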
Caching Caveats and Best Practices
Caching does have some caveats that are helpful to know about when your team starts planning your analytics architecture. For example, with Tableau dashboards that allow users to filter by date ranges or arbitrary values, it can be easy to accidentally query for uncached results. Our example above, which allows Tableau users to filter employees by age, is one of those scenarios. If we change the Age Range filter from 60-to-70 to 65-to-75, the underlying SQL changes, and since PolyScale does not already have these results, PolyScale will need to query MySQL to retrieve them.
In these scenarios it may be helpful to limit the filtering options available to Tableau users, so that the same queries repeat and the cache is used efficiently.
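The Age Range scenario above can be sketched with a cache keyed on the exact SQL text (a simplified model of how a query cache sees these requests; the helper names are illustrative): two requests with identical SQL yield one origin query, but changing the filter changes the SQL and forces a fresh miss.

```python
# Toy cache keyed on the exact SQL text (illustrative)
cache = {}

def cached_query(sql, run):
    if sql not in cache:
        cache[sql] = run(sql)   # cache miss: the origin database is queried
    return cache[sql]           # otherwise: served from cache

origin_calls = []               # record every query that reaches the "origin"

def run(sql):
    origin_calls.append(sql)
    return f"rows({sql})"

q1 = "SELECT * FROM employees WHERE age BETWEEN 60 AND 70"
q2 = "SELECT * FROM employees WHERE age BETWEEN 65 AND 75"

cached_query(q1, run)  # miss: first time this SQL is seen
cached_query(q1, run)  # hit: byte-identical SQL, origin not touched
cached_query(q2, run)  # miss: the filter changed the SQL text
print(len(origin_calls))  # 2 origin queries for 3 dashboard requests
```

Every distinct filter combination produces a distinct query, so dashboards with wide-open filters can generate a long tail of one-off misses.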
If you’d like to try out this workbook yourself, you can download a copy here (link to come). Or, you can sign up for your own free PolyScale account here (no credit card required). Connect your database to a cache and spend less time in your Tableau workbook waiting for data.