Amazon RDS - Reducing Costs and Increasing Performance with Serverless Database Caching

Ben Hagan
Nov 30, 2022

Arguably, Amazon’s Relational Database Service (Amazon RDS) set the benchmark with regard to ease of database setup and management, offering a broad array of features for deploying highly available, secure, and scalable database services across geographic regions. RDS provides resizable capacity while automating time-consuming administrative tasks such as hardware provisioning, database setup, patching, and backups. RDS supports multiple database engines, with an ever-growing list of compute resources and optimizations available.

In this blog post, we discuss how PolyScale (a serverless database edge cache) can augment an RDS architecture to increase query performance and concurrency, reduce latency and significantly lower the total cost of ownership (TCO).


Introducing PolyScale

PolyScale is an intelligent, serverless database edge cache. Using PolyScale, you can scale and distribute your database compute and storage, dramatically lowering latency for data-driven applications.

PolyScale can be plugged into any Postgres, MySQL, MariaDB or Microsoft SQL Server database in a matter of minutes. It intelligently caches the results of read queries (DQL), so repeat queries execute on PolyScale rather than at the database. PolyScale inspects SQL traffic for caching (rather than tailing Write-Ahead Logs, etc.) and hence places no resource overhead on the database. Write traffic (DML) simply passes through PolyScale to the database, so nothing changes with regard to write consistency.
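The read/write split described above can be illustrated with a small sketch. This is not PolyScale's implementation, just a hypothetical routing function showing the principle: DQL statements are candidates for caching, while everything else passes straight through to the origin database.

```python
# Illustrative sketch only: route SQL the way a caching proxy would.
# DQL (reads) is cacheable; DML/DDL passes through to the database.

CACHEABLE_PREFIXES = ("select", "show", "describe")

def route(statement: str) -> str:
    """Return 'cache' for read (DQL) statements, 'passthrough' otherwise."""
    first_word = statement.lstrip().split(None, 1)[0].lower()
    return "cache" if first_word in CACHEABLE_PREFIXES else "passthrough"
```

Because writes are never intercepted, the database remains the single source of truth for all mutations.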

PolyScale is also a global cache, utilizing a network of distributed regions so multi-region workloads can easily be supported.

With that context, let’s dive into some of the common cost and performance issues incurred with RDS and explore how PolyScale can be utilized to build a cost efficient and highly scalable database architecture.

RDS Provisioning and Sizing - Subscription Based Pricing

RDS offers tremendous flexibility with regard to provisioning and sizing of database instances. Users can choose from an ever-growing list of CPU, disk, memory and network configurations that scale from tiny “micro” instances to large instances with 96+ vCPUs.

RDS administrators are often forced to over-provision

Of course, it’s not always clear what performance and capacity are needed, let alone whether the workload fluctuates. Sizing a database could be an entire blog post (or book!) in its own right, but the basic sizing questions force a DBA or DevOps engineer to set enough disk capacity for data storage (based on ingest expectations) and enough memory and compute to satisfy the concurrency load and query complexity at peak times. This is typically based on an internal SLA that defines performance expectations as well as HA requirements. Again, each of these is a complex topic, and hence inaccuracy or over-provisioning often occurs.

This “always-on” compute model is, by its very nature, far from optimal: RDS instances do not autoscale up and down to satisfy resource needs at any given moment. For example, an ecommerce application may see periods of high activity during the day, and little to no usage during the night.

No matter the workload’s demands, the RDS instance will be running 24x7 with a predetermined set of resources unless manually adjusted. Any changes to resourcing, however, typically incur downtime, making adjustments complex in production. Idle CPU costs the same as fully utilized CPU, and that has a significant, ongoing impact on budgets.
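A quick back-of-the-envelope comparison makes the cost gap concrete. The hourly rate and utilization figures below are hypothetical, chosen only to illustrate how always-on billing compares with a consumption model for a workload that is busy a fraction of the time.

```python
# Hypothetical numbers for illustration: an always-on instance billed
# every hour vs. a consumption model billed only for busy hours.
HOURS_PER_MONTH = 730

def always_on_cost(hourly_rate: float) -> float:
    # Subscription model: pay for every hour, idle or not.
    return hourly_rate * HOURS_PER_MONTH

def consumption_cost(hourly_rate: float, utilization: float) -> float:
    # Consumption model: pay only for the fraction of hours actually used.
    return hourly_rate * HOURS_PER_MONTH * utilization

fixed = always_on_cost(0.50)          # assumed $0.50/hr rate -> $365.00/mo
used  = consumption_cost(0.50, 0.30)  # busy ~30% of the time -> $109.50/mo
```

The wider the gap between peak and average utilization, the larger the saving from consumption-based billing.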

Idle CPU cycles incur the same compute costs as fully utilized

This subscription-based pricing is commonplace in cloud computing; however, many consumers are demanding better cost optimization, and vendors are working to utilize compute resources more efficiently.

Serverless Architectures and Consumption Based Pricing

“Serverless” is a cloud computing paradigm in which machine resources are allocated on demand. There are two primary principles to consider (with others described below):

  • Consumption-based pricing - only pay for what you use.
  • No server management - an application can be deployed, auto-scaled, secured and fully managed without deploying servers or even needing to understand the underlying hardware running the service.

As noted, there are other properties that define “serverless”, depending on the vendor and/or application. One is scaling to zero, whereby no charges are incurred when the code is not running. High availability is another, whereby the application includes built-in availability and fault tolerance. Across the AWS product range, these three or four properties tend to fluctuate a little, as described in this article.

Serverless architectures move the pricing dynamic from subscription-based to consumption-based, which yields significant cost savings for consumers. Instead of the familiar “always-on” billing, services are billed based on actual usage. AWS has multiple serverless services available; in the context of this blog, Aurora Serverless v2, DynamoDB, Neptune Serverless and RDS Proxy are the most relevant. Aurora Serverless, for example, provides “an on-demand, autoscaling configuration for Amazon Aurora. It automatically starts up, shuts down, and scales capacity up or down based on your application’s needs.”

RDS Instance Reduction with PolyScale

PolyScale can be connected to your current database so that repeatable read queries execute at PolyScale, rather than the database. In most cases, this decreases costs by allowing a reduction in the database instance size. Of course, how much the cost can be reduced depends on the use case, i.e., what percentage of the database traffic is reads vs. writes. If you are unsure what your database traffic looks like, you can plug in PolyScale and it will analyze that traffic automatically, so it’s easy to see what value can be gained:

PolyScale provides a summary of the traffic, breaking down what percentage of the traffic is cacheable

With the introduction of PolyScale, a percentage of the reads are no longer served by the database. This frees up resources for other tasks (such as writes) and also increases query performance. Any cached reads served by PolyScale execute sub-millisecond with linear scalability.

PolyScale’s SaaS platform utilizes a serverless architecture. It will auto-scale to meet demand in real time, so you don’t have to manage infrastructure. Users are only charged for what they use, i.e., the consumption-based model discussed above.

This architecture can be thought of in simple terms as removing a percentage of the database (by downsizing the infrastructure) and replacing it with a serverless component (PolyScale). The cost benefits will depend on what portion of your traffic is read and cacheable. As noted above, this can be quickly ascertained by connecting PolyScale and utilizing a cache summary report.
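Estimating the offloaded portion is simple arithmetic. The percentages below are hypothetical stand-ins for the numbers a real cache summary report would give you: the read share of traffic and an assumed cache hit rate.

```python
# Back-of-the-envelope sketch: what fraction of total database queries
# the cache can absorb, given the read share of traffic and an assumed
# cache hit rate (both figures are hypothetical examples).

def offloaded_fraction(read_share: float, cache_hit_rate: float) -> float:
    """Fraction of all queries served by the cache instead of the DB."""
    return read_share * cache_hit_rate

# e.g. 80% reads with a 70% hit rate: 56% of all queries never
# reach the origin database.
offload = offloaded_fraction(0.80, 0.70)
```

That offloaded fraction is the headroom available for downsizing the RDS instance.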

As an example of this architecture, a PolyScale customer (Baselang) reduced database infrastructure spend by 75%. You can read the case study here.

Summary and Key Takeaways

  • Offsetting reads to PolyScale dramatically increases read query performance and concurrency.
  • RDS instances can be reduced in size, depending on the workload (read vs. write mix).
  • Moving read workloads to serverless means reads autoscale to meet varying demand, without operational overhead or downtime.
  • Moving read workloads to serverless means consumption-based pricing: only pay for what you use.

RDS Read Replica Costs and Performance

Amazon RDS Read Replicas are typically used to offload read traffic from your primary database as well as lowering geographic latencies for multi-region deployments.

Within the context of sizing and costs, it is typically recommended to size all read replicas the same as the origin database. This can lead to significant cost increases and, once again, exacerbates subscription-based pricing with underutilized compute resources.

It is also worth noting, when considering geographic coverage, that RDS imposes limits on the number of read replicas a given database can support. This may be workable for one, two or even three regions, but falls short for deployments on global platforms such as Vercel, Netlify or Deno Deploy.

Read replicas are often utilized to accelerate read performance. If the reads are not being served from the origin database, those resources are then freed to perform other tasks. Adding read replicas introduces horizontal scale to typically vertically scaled architectures.

PolyScale and Read Replicas

With the introduction of PolyScale, read replica traffic can be offset to the cache. In many cases, this eliminates the need for read replicas entirely.

Query performance can be dramatically accelerated. For example, even a trivial query can take hundreds of milliseconds to execute when subject to concurrency or resourcing issues. In contrast, PolyScale can execute that query in less than a millisecond when cached.

PolyScale has locations distributed across the globe which means, like a read replica, reads can be served from multiple locations. A key difference however, as noted previously, is that PolyScale does not read from the database logs to keep up with updates (as is common with read replicas) so no resource penalties are incurred to the database. PolyScale inspects the database traffic in real time and automatically invalidates the cache on data change.
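The idea of invalidating on data change can be sketched with a toy model. PolyScale's real invalidation is far more sophisticated; this hypothetical version simply tracks cached results per table and evicts a table's entries whenever a write statement touches that table.

```python
# Simplified, illustrative sketch of write-driven cache invalidation
# (not PolyScale's actual algorithm): entries are keyed by table, and
# any DML statement touching a table evicts that table's entries.
import re

class TableCache:
    def __init__(self):
        self._store = {}  # (table, query) -> cached result

    def get(self, table, query):
        return self._store.get((table, query))

    def put(self, table, query, result):
        self._store[(table, query)] = result

    def on_write(self, statement):
        # Naive table extraction from INSERT/UPDATE/DELETE statements.
        m = re.search(r"(?:insert\s+into|update|delete\s+from)\s+(\w+)",
                      statement, re.IGNORECASE)
        if m:
            table = m.group(1).lower()
            # Drop every cached entry belonging to the written table.
            self._store = {k: v for k, v in self._store.items()
                           if k[0] != table}
```

Because the invalidation is driven by the SQL traffic itself, the origin database never has to be polled or tailed.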

You can read more detail about read replicas and PolyScale in this blog article.

Summary and Key Takeaways

  • With PolyScale serving reads, read replicas can be removed from the architecture, offering a significant cost saving.
  • PolyScale offers a global Edge network, meaning latencies can be kept low in dozens of global locations.
  • PolyScale offers a serverless pricing model: you only pay for what you use.
  • There are no operational overheads, so replication lag and load issues are removed.

RDS Proxy Cost and Performance

Amazon RDS Proxy is a fully managed, highly available database proxy for RDS. It is often used to scale database connections to support serverless applications, but also offers other features in the areas of high availability and security.

Serverless applications (such as AWS Lambda, Vercel Edge Functions, Deno Deploy, Netlify Edge Functions, etc.) often create very large numbers of short-lived TCP connections. Each connection consumes some memory and CPU on the database, and hence only a finite number of connections are available based on the RDS instance size. Amazon RDS Proxy provides database connection pooling to help efficiently manage large numbers of connections and scale further.

In addition to scaling TCP connections, another challenge often seen with serverless applications, such as AWS Lambda, is the high latency caused by connecting to databases that are geographically far away. For example, a serverless function may execute in Australia while connecting to an RDS instance in US East, adding 200 ms+ of latency to each connection. This problem can be reduced by deploying multiple read replicas and potentially multiple RDS Proxy instances. However, that will impact costs significantly, and will never satisfy the low-latency requirements of global deployments across dozens of locations.
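The latency penalty above is dominated by round trips: a fresh database connection typically costs several (TCP handshake, TLS negotiation, authentication). A rough model, using assumed RTT figures for illustration, shows why terminating those round trips at a nearby point of presence matters so much.

```python
# Rough, illustrative latency model: connection setup cost is roughly
# the network round-trip time multiplied by the number of round trips
# needed (TCP + TLS + auth). RTT values below are assumptions.

def connection_setup_ms(rtt_ms: float, round_trips: int = 4) -> float:
    """Approximate time to establish a fresh database connection."""
    return rtt_ms * round_trips

far  = connection_setup_ms(200.0)  # e.g. Australia -> us-east-1 origin
near = connection_setup_ms(5.0)    # e.g. Australia -> nearby PoP
```

With an assumed 200 ms RTT the setup cost runs to hundreds of milliseconds per connection, versus tens of milliseconds against a nearby endpoint.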

PolyScale and RDS Proxy

PolyScale provides inbuilt connection pooling that offers two primary benefits: supporting very large numbers of concurrent, ephemeral TCP connections, far exceeding the origin database connection limit, and lowering TCP connection latency.

With Connection Pooling enabled within the cache, PolyScale maintains a connection to the database. If an incoming connection requests cached data, the connection goes no further than the PolyScale Point of Presence (PoP). If the incoming connection requests non-cached data (a cache miss), the already-established connection to the origin database is utilized.
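The reuse of already-established connections is the essence of pooling. The following is a minimal, generic sketch of the technique (not PolyScale's implementation): expensive origin connections are created lazily, handed out to many short-lived clients in turn, and capped at a maximum size.

```python
# Minimal connection-pool sketch, for illustration only: origin
# connections are created lazily, reused across many short-lived
# client requests, and capped at a maximum pool size.
from collections import deque

class Pool:
    def __init__(self, connect, max_size=10):
        self._connect = connect      # factory producing origin connections
        self._idle = deque()         # connections ready for reuse
        self._max = max_size
        self.created = 0             # total origin connections ever opened

    def acquire(self):
        if self._idle:
            return self._idle.popleft()  # reuse an existing connection
        if self.created >= self._max:
            raise RuntimeError("pool exhausted")
        self.created += 1
        return self._connect()           # open a fresh origin connection

    def release(self, conn):
        self._idle.append(conn)          # return to the pool for reuse
```

Thousands of ephemeral client connections can thus be served by a handful of persistent origin connections, which is what keeps the database's connection count within its limit.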

With these features, PolyScale not only supports database scaling to tens of thousands of concurrent connections, but also dramatically reduces latency for new connections. See our blog on Scaling Data-driven Microservices Globally with AWS Lambda for further details.

PolyScale Connection Pooling

Summary and Key Takeaways

  • If you are using RDS Proxy for connection pooling, you can instead utilize PolyScale’s inbuilt connection pooling to scale TCP connections
  • Since PolyScale is deployed to the edge, TCP connection latency can be dramatically reduced in multi-region environments (serverless functions etc)

Amazon RDS Query Performance

As noted, query performance is a large and complex topic. In the context of solving read-query performance, DBAs and developers alike may use various Application Performance Monitoring (APM) tools and analysis (such as enabling slow query logs) to explore which queries are performing poorly and why. Some are quite evident from UI interactions (or customer feedback!); however, some may only become apparent under load, where concurrency becomes an important factor. Testing is key.

Queries may be poorly optimized with regard to general SQL syntax (inefficient JOINs, etc.), index utilization, or moving large data volumes, e.g. using SELECT * or omitting LIMIT clauses.

ORM libraries (client libraries that implement Object-Relational Mapping) are commonplace when developing database-driven applications. These libraries define a set of abstractions that remove some of the complexities of working with databases. In many cases, the underlying SQL generated is abstracted away, and hence if poorly formed queries are being generated, they can be difficult to identify.

Using PolyScale to Increase Query Performance

PolyScale is a cache, as opposed to a database, and hence the query execution plan of each is very different. PolyScale will serve any cached query in less than 1 ms, and therefore can add significant performance benefits to slower queries.

Coupled with the fact that PolyScale does not suffer with the same concurrency contention as a database, this means performance does not degrade as load and concurrency increases.

Summary and Key Takeaways

  • PolyScale will serve any cached query in less than 1 ms, no matter the query complexity
  • PolyScale will support massive concurrency across all global regions


This article provides an overview of how PolyScale can augment an Amazon RDS architecture to reduce costs and increase performance.

Combining PolyScale with an Amazon RDS database yields several benefits. Firstly, SQL reads can be offset to execute at PolyScale. This results in significant resource savings, lowering infrastructure costs. Query performance is dramatically improved for queries that are cacheable. Inbuilt Connection Pooling enables very large numbers of TCP connections to be made without any resource impact on the database. This can eliminate the need for RDS Proxy entirely.

Finally, given PolyScale utilizes its own Edge network, all of the benefits listed above apply globally. In the case where multi-region deployments are important, read replicas can be replaced with PolyScale. PolyScale requires no capacity planning, sizing or server administration. PolyScale’s serverless compute model means users only pay for what they use, reducing costs significantly when compared to a subscription based pricing model.

Next Steps

Ready to try? Sign up for a free account here (no credit card required). Or try out PolyScale without bringing your own database with our live interactive demo playground.

Read the Quick Start Guide to understand how easily PolyScale can be integrated without code and without deploying servers.

Written by

Ben Hagan