Exploring Global Database Latency

Sam Aybar
Dec 07, 2022

Overview

When building an application, you want to deploy the entire application as close to your users as possible to maximize performance. However, if you have geographically dispersed users, this becomes more challenging, as your application can’t be everywhere at once. Static assets and business logic can be deployed close to users through CDNs and edge functions, but deploying your database close to dispersed users creates new challenges.

[Figure: PolyScale database latency]

While you may be able to locate your database close to where the bulk of your end consumers are, you cannot easily locate the database close to all users.

To deal with this issue, there have traditionally been two primary approaches:

  1. Deploy database read replicas
  2. Create a cache for your database at the application tier

Solving this problem is particularly important if you are trying to expand your business globally and serve customers in multiple regions, or if you are moving to a microservices architecture and leveraging global edge platforms like Netlify, Deno Deploy, Cloudflare Workers, or Vercel. In these scenarios, you cannot avoid serving data to locations that are far from your database, and the latency this introduces – if left unaddressed – will negatively impact the end-user experience.

PolyScale solves this issue by providing a global database cache as a service. Using PolyScale, database data can be distributed and cached, seamlessly scaling your current database without requiring changes in your code.

In a future blog post, we will explore the pros and cons of each of these approaches to providing global access to your database. This post, however, focuses on the latency that can result from accessing your database from a distant location; in other words, why you should care about making your database global.

To do so, we built a simple tool to allow you to see how many requests can be made to a database from a distant location in a fixed period of time. This test will connect to your database – using the database connection string you provide – and count how many requests it can make during the interval you specify. We’ve created the test as a simple AWS Lambda function that returns the number of requests made and the latency (measured as average ms per database request).
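At its core, the test is just a timed loop. Below is a minimal sketch in Python of that measurement logic; it is an illustration rather than the actual Lambda code, and run_query is a hypothetical stand-in for executing the provided SQL once against your database.

```python
import time

def benchmark(run_query, seconds):
    """Call run_query repeatedly until the time budget is spent,
    then report the request count and average latency in ms."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        run_query()  # execute the provided SQL statement once
        count += 1
    # average latency: total elapsed time divided by requests made
    avg_ms = round(seconds * 1000 / count, 1)
    return {"requestsMade": count, "avgResponseTime": avg_ms}
```

Because the loop is sequential, a slow round trip directly reduces how many requests fit in the window, which is why the request counts and average latencies below mirror each other.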

In addition to being able to query a single database, the function allows you to run the same query with two different databases simultaneously, making it easy to compare response times of two different database providers (compare Neon to Supabase for example!), different database locations, or the performance of a database with and without PolyScale.

Getting Started

We have deployed a Lambda function in four AWS locations:

  • eu-central-1 (Frankfurt)
  • us-east-1 (Virginia)
  • us-west-2 (Oregon)
  • ap-south-1 (Mumbai)

You can make a POST request to any of these endpoints in which you provide:

  1. One (or two) database URIs
  2. A SQL query to run against the databases
  3. The type of database (postgres or mysql)
  4. The number of seconds to run queries for (max of 20 seconds)

When you make a POST request with these parameters, the Lambda function will run queries for the amount of time specified, and then return a payload which includes the number of queries made to each database in that time period as well as the average latency.

As an example, we have a MySQL database deployed in AWS Paris (eu-west-3). The connection string is: mysql://polyscale:playground@database-1.ctiftbwzooit.eu-west-3.rds.amazonaws.com:3306/employees

We have created a PolyScale cache for this database and have a connection string of: mysql://590e97c6-8d85-4731-a035-a9247498b42e-polyscale:playground@psedge.global:3306/employees

We can make a curl request of:

curl --request POST 'https://jhiobprv65.execute-api.eu-central-1.amazonaws.com/Prod/' \
  --header 'Content-Type: application/json' \
  --data '{"databaseUrl1":"mysql://590e97c6-8d85-4731-a035-a9247498b42e-polyscale:playground@psedge.global:3306/employees","databaseUrl2":"mysql://polyscale:playground@database-1.ctiftbwzooit.eu-west-3.rds.amazonaws.com:3306/employees","type":"mysql","time":5,"sql":"SELECT * from departments limit 1"}'

From which we get a response of

{
  "db": [{
    "host": "psedge.global",
    "requestsMade": 6915,
    "avgResponseTime": 0.7
  },
  {
    "host": "database-1.ctiftbwzooit.eu-west-3.rds.amazonaws.com",
    "requestsMade": 471,
    "avgResponseTime": 10.6
  }],
  "time": 5,
  "sql": "SELECT * from departments limit 1"
}

Sample Results

You can see here that when querying the database directly, the latency increases as the distance from the origin database increases. But for queries which use PolyScale, response times are in the neighborhood of 1ms regardless of location. As the distance between the queries and the origin database increases, so too does the benefit of using PolyScale for caching.

Query:

SELECT * from departments limit 1

Comparison of requests in 5 Seconds (Direct vs via PolyScale):

| Location  | Requests (Direct) | Requests (PolyScale) | Latency (Direct) | Latency (PolyScale) |
|-----------|-------------------|----------------------|------------------|---------------------|
| Frankfurt | 471               | 6915                 | 10.6ms           | 0.7ms               |
| Virginia  | 62                | 4819                 | 82.7ms           | 1.1ms               |
| Oregon    | 39                | 5948                 | 134.8ms          | 0.8ms               |
| India     | 48                | 7424                 | 107.8ms          | 0.7ms               |
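As a sanity check on these numbers: since requests are issued sequentially on a single connection (as the figures suggest), the request count in each row is roughly the 5,000 ms window divided by the average per-request latency. A quick worked example:

```python
# Each row should roughly satisfy: requests ~= window_ms / avg_latency_ms
window_ms = 5 * 1000

def expected_requests(avg_latency_ms):
    return window_ms / avg_latency_ms

# Frankfurt, direct: 10.6 ms average latency -> ~471 requests observed
print(round(expected_requests(10.6)))  # 472
# India, via PolyScale: 0.7 ms average latency -> ~7424 requests observed
print(round(expected_requests(0.7)))   # 7143
```

The small gaps between the predicted and observed counts come from per-request jitter and loop overhead inside the Lambda.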

The previous query was computationally trivial, so the primary benefit of PolyScale in this case comes simply from delivering the response closer to the application. The benefits of PolyScale are even greater as SQL complexity increases, since more complex queries also require more compute time from the database.

Query:

SELECT
  COUNT(*) AS employ_count,
  MAX(salary) AS max_salary,
  MIN(salary) AS min_salary
FROM
  employees e
  INNER JOIN dept_emp de ON e.emp_no = de.emp_no
  INNER JOIN departments d ON de.dept_no = d.dept_no
  INNER JOIN salaries s ON s.emp_no = e.emp_no
WHERE
  d.dept_name = 'Sales'

Comparison of requests in 5 Seconds (Direct vs via PolyScale):

| Location  | Requests (Direct) | Requests (PolyScale) | Latency (Direct) | Latency (PolyScale) |
|-----------|-------------------|----------------------|------------------|---------------------|
| Frankfurt | 15                | 5311                 | 361.5ms          | 1.0ms               |
| Virginia  | 13                | 5923                 | 421.3ms          | 0.8ms               |
| Oregon    | 12                | 4771                 | 481.8ms          | 1.0ms               |
| India     | 13                | 6717                 | 433.1ms          | 0.7ms               |

Because more complex SQL queries typically require more database compute time, the introduction of PolyScale in this case reduces both the network latency and the query execution time.

The beauty of PolyScale is that you don’t need to do any coding at all to create the cache or to keep it current. You simply replace your original database credentials in your application with the string PolyScale gives you when you create the cache.
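For instance, if your application reads its connection string from a single configuration value, the swap is a one-line change. Using the sample connection strings from earlier in this post:

```python
# Before: connect directly to the origin database in AWS Paris (eu-west-3)
DATABASE_URL = "mysql://polyscale:playground@database-1.ctiftbwzooit.eu-west-3.rds.amazonaws.com:3306/employees"

# After: same credentials and database name, pointed at the PolyScale edge
DATABASE_URL = "mysql://590e97c6-8d85-4731-a035-a9247498b42e-polyscale:playground@psedge.global:3306/employees"
```

Everything downstream of the connection string, including your ORM and query code, stays exactly as it was.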

Try it Yourself

If you’d like to test your own database, you can either use the functions we’ve deployed or deploy your own. To try ours, make a POST request to any of the deployed endpoints (such as the eu-central-1 endpoint used in the example above).

Be sure to include a header of Content-Type: application/json and a JSON body of:

{
  "databaseUrl": ["postgres://USERNAME:PASSWORD@psedge.global:5432/DATABASE_NAME?application_name=CACHE_ID", "postgres://USERNAME:PASSWORD@DATABASE_HOST:5432/DATABASE_NAME"], // you can run queries against one or more databases at the same time
  "type": "postgres", // can be "postgres" or "mysql"
  "time": 5, // can be any value <= 20
  "sql": "SELECT * from todos limit 1" // can be a SELECT statement of your choice, based on your DB
}
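If you prefer to script the request rather than use curl, here is a minimal sketch using Python's standard library. It assumes the eu-central-1 endpoint from the example above and uses placeholder credentials; substitute your own connection string before running.

```python
import json
import urllib.request

# Endpoint from the example above (eu-central-1); swap in your own if you deployed one.
ENDPOINT = "https://jhiobprv65.execute-api.eu-central-1.amazonaws.com/Prod/"

payload = {
    "databaseUrl": ["postgres://USERNAME:PASSWORD@DATABASE_HOST:5432/DATABASE_NAME"],
    "type": "postgres",
    "time": 5,
    "sql": "SELECT * from todos limit 1",
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment to actually run the test against the endpoint:
# with urllib.request.urlopen(req) as response:
#     print(json.load(response))
```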

Deploy Your Own

You can find the repository here. If you wish to deploy it yourself, you need to first install the AWS CLI and the AWS SAM CLI. You can then clone the repository:

git clone https://github.com/samaybar/database-tester.git

In order to ensure that you have a unique AWS bucket for deployment, add a prefix of your choice to the BUCKET_NAME in Line 4 of the Makefile.

You can then deploy your own Lambda function with:

make bucket
make deploy

Assuming you have configured your AWS and AWS SAM cli correctly, the script connects with AWS and creates an API Gateway and Lambda function. The output of the commands will provide you with the endpoint to make your POST request to.

If you want to deploy to a different region, you can do so with

make reg=ap-south-1 bucket
make reg=ap-south-1 deploy

You can deploy in any AWS region, though we suggest you deploy in a region where PolyScale has nearby infrastructure. You can see our latest list here.

A Note about AWS Lambda

In writing the Lambda function to run this test, we observed a fair amount of variability in Lambda performance based on the memory configuration for the Lambda function. This isn’t entirely surprising: AWS Lambda allocates CPU power in proportion to the configured memory, so one would expect a function with more memory to run faster.

In the table below, you can see the number of requests in 5 seconds to the sample database in Paris, with and without PolyScale, using a Lambda with various levels of memory.

| Memory  | Requests (Direct) | Requests (PolyScale) | Latency (Direct) | Latency (PolyScale) |
|---------|-------------------|----------------------|------------------|---------------------|
| 128 MB  | 55                | 639                  | 92.9ms           | 7.8ms               |
| 512 MB  | 61                | 3739                 | 83.6ms           | 1.3ms               |
| 1024 MB | 62                | 4538                 | 82.7ms           | 1.1ms               |
| 2048 MB | 62                | 4819                 | 82.7ms           | 1.1ms               |
| 4096 MB | 62                | 6632                 | 82.3ms           | 0.8ms               |

For these specific runs, the results followed the pattern one might expect, with increasing compute power producing faster results. At other times, however, less powerful configurations would outperform more powerful ones. In addition, within the database request loop of a single invocation, variances of 10-15ms could be seen. With PolyScale on lower-memory Lambdas, the majority of requests would land in the 1ms range, but perhaps a third might fall in the 15ms range, driving up the overall reported PolyScale latency. In short, a low-powered Lambda function cannot keep up with receiving thousands of database responses per second.

An interesting blog post from a few years ago explored this issue; it is perhaps worth revisiting at some point.

Next Steps

Use the function provided here to see how global access directly to your database introduces latency. If you didn’t already set up a PolyScale account to try this walkthrough, sign up for your own free PolyScale account here (no credit card required). Connect your database to a cache and start delivering your data around the globe, faster.
