Scaling Data-driven Microservices Globally with AWS Lambda

Sam Aybar
Aug 10, 2022

As the Functions as a Service (FaaS) market has matured since its introduction in 2014, developers have found it increasingly easy to deploy and scale applications. With minimal code and no need to manage infrastructure, FaaS allows the creation of basic microservices within minutes. There is no need to decide how much capacity you need – you’ve got as much as you need, on demand, and you pay for only what you use.

Leading players like AWS Lambda, Google Cloud Functions, Microsoft Azure Functions, and Netlify Edge Functions each have their own pros and cons, but most developers select a provider based on the cloud ecosystem they already use.


Scaling Your Application

One of the great benefits of FaaS is that the service provider delivers capacity in the selected locations precisely when its customers need it. Rather than sizing a server to meet peak needs, FaaS users pay only for the capacity they use, and no action is required to expand it.

Moreover, FaaS allows for easy geographic expansion. As your market grows and you look to serve markets around the globe, you are able to easily deploy your function to your provider’s global network, putting the business logic closer to the end consumer.

The Database Challenge

While FaaS makes it easy to bring compute power to the application tier when and where you need it, getting data to your functions is a harder problem.

Single region scaling

While a Lambda function can allow you to scale to support increased traffic, increased calls to a database can create problems that still need to be addressed. For example, as your FaaS usage increases you may see declining database performance due to concurrency issues, read/write contention, and excessive database connections. Solving this may require upgrading your database infrastructure, rewriting your application to reduce database calls, or re-architecting to separate reads and writes.

Multi region scaling

As easy as it is to roll out FaaS across regions, supporting multiple locations from a single database introduces its own set of problems. In addition to the issues mentioned above with single region scaling, global roll-outs introduce problems of data latency, as additional FaaS regions can be located further from the database.

Read replicas are one way to address this. While they can reduce latency, they also mean taking on additional cost and administrative time to manage more hardware, as well as rewriting application logic to separate read and write traffic. Moreover, there are limits to the number of read replicas a database can support (physically and often financially), which means not all functions will have nearby data.

An alternative is to cache data closer to your FaaS locations using Redis, Memcached or other similar solutions. While this can improve performance, it also means devoting resources to refactoring your application to support caching. Furthermore, figuring out which data to cache and when to expire is non-trivial when dealing with a complex application.

PolyScale Serverless Edge Cache

PolyScale is a plug-and-play serverless database Cache-as-a-Service (CaaS) that reduces global query latency for databases. Using PolyScale, query execution for database reads is distributed and cached, seamlessly scaling your current database without writing code or altering query or transactional semantics.

When working in conjunction with FaaS, simply replacing your call to your database with a call to PolyScale gives you the benefits of global caching without the effort. PolyScale takes care of deciding what to cache and how to invalidate it, with no code changes in your functions. As your application scales, PolyScale provides as much capacity as you need, where you need it.

Real World Example

To illustrate some of the challenges around getting data to the edge, we created a small database (AWS db.t3.micro) in AWS’s Oregon data center (us-west-2), and then created a Lambda function to access that data. You can find the code for the function here.

The Lambda function can be called with a query complexity parameter, which determines whether a simple SELECT by primary key or a more complex JOIN query is executed.


    SELECT
        emp_no, birth_date, hire_date, first_name, last_name, gender
    FROM
        employees
    WHERE
        emp_no = 10001;


    SELECT
        COUNT(*) AS employ_count,
        MAX(salary) AS max_salary,
        MIN(salary) AS min_salary
    FROM
        employees e
            INNER JOIN
        dept_emp de ON e.emp_no = de.emp_no
            INNER JOIN
        departments d ON de.dept_no = d.dept_no
            INNER JOIN
        salaries s ON s.emp_no = e.emp_no
    WHERE
        d.dept_name = 'Sales';

The simple SQL statement requires minimal database execution processing, while the complex statement requires more processing due to the multi-table join and min/max functions.
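As a sketch, the dispatch on the complexity parameter might look like the following (the `complexity` event field name is an assumption for illustration, not necessarily what the repo uses):

```javascript
// Pick the SQL statement based on the event's complexity parameter.
// (Sketch: the event field name is an assumption.)
const SIMPLE_QUERY =
  'SELECT emp_no, birth_date, hire_date, first_name, last_name, gender ' +
  'FROM employees WHERE emp_no = 10001';

const COMPLEX_QUERY =
  'SELECT COUNT(*) AS employ_count, MAX(salary) AS max_salary, ' +
  'MIN(salary) AS min_salary FROM employees e ' +
  'INNER JOIN dept_emp de ON e.emp_no = de.emp_no ' +
  'INNER JOIN departments d ON de.dept_no = d.dept_no ' +
  'INNER JOIN salaries s ON s.emp_no = e.emp_no ' +
  "WHERE d.dept_name = 'Sales'";

function selectQuery(event) {
  // Default to the cheap primary-key lookup unless the caller
  // explicitly asks for the multi-table join.
  return event.complexity === 'complex' ? COMPLEX_QUERY : SIMPLE_QUERY;
}
```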

We are able to use the Lambda function to simulate the behavior of an application deployed in a given AWS region. Within the Lambda function, we timed how long it took to access data from the database, which represents the time it would take an application in the same location to retrieve data from the database.

We deployed the Lambda function to five different AWS regions:

  • Oregon (us-west-2)
  • Virginia (us-east-1)
  • London (eu-west-2)
  • Sydney, Australia (ap-southeast-2)
  • Tokyo (ap-northeast-1)

For each location, we ran 100 queries and calculated the average time it took to query the database.
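As a sketch, the timing loop inside the function might look like this (`runQuery` stands in for whatever executes one query against the database; this is illustrative, not the repo’s exact code):

```javascript
// Average query latency over a batch of timed runs (sketch).
// `runQuery` is a hypothetical async function that executes one
// query and resolves when the result arrives.
async function averageLatencyMs(runQuery, runs = 100) {
  const timings = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    await runQuery();
    timings.push(Date.now() - start); // elapsed time for this query
  }
  return timings.reduce((a, b) => a + b, 0) / timings.length;
}
```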

As expected, response times clearly increase as you get further from the origin database.

NOTE: These response times represent the average time spent in the function accessing the database directly for the “simple query”, including network latency as well as actual execution time.

[Chart: average response time by region for direct database queries]


Managing Connections

One of the complexities in working with serverless infrastructure and relational databases is that establishing and maintaining TCP connections requires some resource allocation, as well as incurring a latency overhead for the connection to be initiated. Typically in a more monolithic application, client side connection pooling is utilized. In that scenario, a pool of long-lived connections is created and reused efficiently to eliminate the initiation latency overhead. The short-lived nature of Lambda functions, however, means likely having to initiate a connection with each new execution of the function.

Managing database connections in the context of Lambda functions can be quite complex and deserves a future blog article for further discussion. That said, in brief, one way to potentially improve database latency from Lambda functions is to reuse an already created TCP connection, or taking that a little further, use client side connection pooling. Connection reuse can easily be achieved by creating the connection outside the handler function that is exported within the Lambda. That way, if the Lambda function is re-using an existing container, the connections are available for reuse. To explore this architecture, we created a second Lambda function which follows this pattern. (You can find the code for this function here.)
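A minimal sketch of the connection-reuse pattern, assuming the mysql2/promise driver (the helper name, environment variables, and query are illustrative, not the repo’s exact code):

```javascript
// Memoize a factory so all invocations in a warm container share
// one result: here, a database connection promise.
function memoizeConnection(factory) {
  let cached = null;
  return () => {
    if (cached === null) cached = factory(); // created once per container
    return cached;
  };
}

// Sketch of use in a Lambda. Because `getConnection` lives outside
// the handler, warm invocations reuse the same TCP connection:
//
// const mysql = require('mysql2/promise');
// const getConnection = memoizeConnection(() =>
//   mysql.createConnection({
//     host: process.env.DB_HOST,
//     user: process.env.DB_USER,
//     password: process.env.DB_PASSWORD,
//     database: 'employees',
//   }));
//
// exports.handler = async (event) => {
//   const conn = await getConnection(); // reused on warm starts
//   const [rows] = await conn.query('SELECT ...');
//   return rows;
// };
```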

This function uses the serverless-mysql Node.js package, which not only pools connections but also helps manage the zombie connections often seen with large numbers of ephemeral database connections.

You can see the benefits of this approach illustrated in the following chart.

[Chart: direct connection vs. client-side pooling, simple query]

The “with client-side pooling” bar represents the average time to retrieve data in the Lambda function when the connection is opened at the launch of the container rather than on each invocation of the function.

Reusing an open connection saves time. The absolute amount varies, because the proportion of time spent establishing the connection, relative to overall query execution time, depends on how close the Lambda function is to the database. The speedup ranges from roughly 2x to 5x.

This pattern also holds for the more complex database query. On a percentage basis, the time savings are not as great, since the connection time represents a smaller fraction of the overall time. Nonetheless, the reduced latency is still notable when the Lambda is far from the database.

[Chart: direct connection vs. client-side pooling, complex query]

While the client-side pooling approach is an improvement, its benefits are limited to reducing connection times. That benefit is itself contingent on the configuration of the serverless module and can vary considerably under different traffic patterns (i.e., rates of container reuse). With PolyScale, both connection-time and query-time latencies can be reduced.

Enter PolyScale

PolyScale makes it easy to get your data to the edge with no coding. Simply configure a cache in PolyScale (using just your database hostname and port), swap the PolyScale hostname in for your database host in your application, and prepend your PolyScale cache ID to your database username, and you are up and running.
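For a Node.js client, the change might look like the following sketch (the hostnames and cache-ID format shown here are placeholders; use the exact values shown in your PolyScale dashboard):

```javascript
// Direct connection settings (placeholder values):
const direct = {
  host: 'mydb.us-west-2.rds.amazonaws.com', // placeholder
  port: 3306,
  user: 'app_user',
  password: process.env.DB_PASSWORD,
  database: 'employees',
};

// Via PolyScale: the same settings, but the host points at the
// PolyScale edge and the cache ID is prepended to the username.
const viaPolyScale = {
  ...direct,
  host: 'YOUR_POLYSCALE_HOSTNAME', // placeholder
  user: 'YOUR_CACHE_ID-app_user',  // placeholder format
};
```

Everything else about the application, including the queries themselves, stays the same.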

The benefits of using PolyScale are two-fold. First, because PolyScale sits closer to your application, network latency is reduced once the data is cached. Second, because PolyScale serves the data from the cache, it is much faster than running a query on the database, particularly when those queries are complex.

For the five locations in which we deployed the Lambda function, we can see the following comparisons between direct queries to the database and queries via PolyScale.

[Chart: direct database queries vs. PolyScale, simple query]

PolyScale provides blisteringly fast database response times almost independent of the distance between the Lambda function and the database. Using PolyScale erases the penalty Lambda functions must pay for accessing database resources while preserving the scalability benefits Lambda functions offer.

With more complex queries the absolute time savings can be enormous and game changing.

[Chart: direct database queries vs. PolyScale, complex query]

For the complex query, even for calls made from a Lambda function in the same region as the database, PolyScale saves over 300ms per call (9ms average via PolyScale vs. 335ms average directly to the database). In addition, PolyScale dramatically reduces the load on the database, allowing you to potentially downsize it, saving money while actually improving performance globally. Note that the PolyScale times for the complex query are higher than for the simple query because not all executions result in cache hits; averaging in the longer origin-database query time for the cache misses raises the average response time relative to the simple query.

PolyScale Connection Pooling

In addition to the benefits of the data being cached close to your function, PolyScale also can provide server-side connection pooling between PolyScale and your database. (You can read more about PolyScale pooling here.)

This has several advantages. First, your connection count can significantly exceed the database limit. Second, TCP connection times are dramatically reduced, due to both the shorter physical distance and the caching of parts of the authentication handshake. In most cases, as we have done here, it is best practice to leverage this feature when using PolyScale.



Conclusion

While there are many benefits to working with FaaS such as AWS Lambda, it also introduces complexity when working with databases. The benefit of locating compute logic closer to the end user can be undermined by the time it takes to retrieve data from a central database.

With PolyScale, you can easily solve this problem without having to change your code. By intelligently caching the data your functions need within the same location, PolyScale significantly improves application performance while reducing the load on your database.

Next Steps

Try a real-life interactive PolyScale demo in 2 minutes. Or sign up for a free account here (no credit card required).

Read the Quick Start Guide to understand how easily PolyScale can be integrated without code and without deploying servers.

Try it Yourself

If you’d like to run your own queries against your own database, comparing direct connections to PolyScale from various AWS regions, you can find the repository and instructions here. All that is required is a free PolyScale account, an AWS account, and the AWS Serverless Application Model (SAM) CLI set up on your machine.

Written by

Sam Aybar