Optimizing and Scaling a MySQL Web Application
Or How to Stop Worrying When You See Timeouts and Learn to Love These Techniques
What Does it Mean to Optimize and Scale a MySQL Web Application?
Ask an application developer, a database admin, and a DevOps engineer what it means to “optimize and scale MySQL” and you’ll probably get three different answers. This article presents techniques from all three of these perspectives. As a heuristic, we use the following to define what it means to “scale MySQL”:
Your MySQL-backed web application’s traffic just increased 100x (yay!) but now you’re starting to see timeouts (no!). What should you do?
The techniques we discuss fall into these categories:
- Application Level Optimizations
- MySQL Level Optimizations
- Scaling Through Infrastructure
Note most of these techniques are also applicable to PostgreSQL and other SQL-RDBMSs. For simplicity, we colloquially use MySQL to refer to these other databases as well.
Application Level Optimizations
These optimization techniques occur at the application level.
Cache Queries in Redis
Caching database queries in an in-memory database like Redis will lead to fewer MySQL queries. Not every query should be cached, just those queries which are frequently accessed and/or slow.
As with all caching solutions, this will increase code complexity and introduce potentially hard-to-find bugs. This complexity comes primarily from keeping the cache up-to-date with MySQL. A common technique is when a table is updated, all its caches are deleted/invalidated. The next time a cache-able query is requested, it sees the cache is absent/invalid (aka cache miss), so it will query MySQL and cache the result.
This type of caching works great for high-read-low-write caches, such as queries which populate the content for the front page of a popular media site. But in high-write cases, caches are frequently missed, defeating the purpose of caching.
Prefetching Records to Prevent n+1 Queries
The n+1 query is defined as a query that fetches n records from the database, then runs an addition query for each of those records. If this definition is abstract and confusing, an example will clarify:
Suppose, we want to fetch all 5 users from an account as well as each user’s expenses. Here are the 5+1 queries
SELECT * FROM users WHERE account_id = 42;
# returns users 3,5,27,38,99# Now select the expenses for each user
SELECT * FROM expenses WHERE user_id = 3;
SELECT * FROM expenses WHERE user_id = 5;
SELECT * FROM expenses WHERE user_id = 27;
SELECT * FROM expenses WHERE user_id = 38;
SELECT * FROM expenses WHERE user_id = 99;
Prefetching (aka eager-loading) means if we expect the application to also query for the expenses, then we batch query them by user_id
, resulting in just 2 queries instead of 5.
SELECT * FROM users WHERE account_id = 42;SELECT * FROM expenses WHERE user_id IN (3,5,27,38,99);
Many ORM frameworks already support prefetching such as ActiveRecord’s include
method, so just be sure to use them.
Caching Query Results as Variables
If a query result is used multiple times during an operation, be sure to cache it as a variable. Here’s an example in Ruby:
Cache Columns and Summary Tables
A cache column on a table stores certain frequently accessed aggregate (ie. COUNT
SUM
MIN
MAX
AVG
) data. For example, if a messages
table is quite large and the number of unread messages for users
are accessed frequently, it may be a good idea to store unread_message_count
in the users
table as a column. As such, we would need to update this unread_message_count
every time a message gets read or new messages come in.
Similarly, a summary table is a table that holds a variety of aggregate queries. For example, a CRM dashboard might show the number of sales leads and closes by month by salesperson. Rather than query this data every time the dashboard is loaded, save the data beforehand in a summary table. This example works especially well with cases like monthly historical data, so it will be rarely updated.
Supplement Queries With a Search Engine Like ElasticSearch
Take the load off database by moving complex or heavy-text queries into a search engine like Elasticsearch. The cost here will be similar to the caching solutions mentioned above, primarily increased code complexity from keeping the search indices up-to-date with the database.
Archiving Old Rows in Large Tables
If a table becomes extremely large (ie. such as the messages
or activities
table on a social networking site), consider archiving old rows in a separate database. As such, the old data is not readily available on the UI and the user needs to explicitly request more data.
MySQL Level Optimizations
These optimization techniques involve changes at the MySQL level.
Select Only the Required Columns
A query using *
will return all columns, including joined tables. For example,
SELECT *
FROM posts
JOIN post_tags ON posts.id = post_tags.post_id
JOIN tags ON post_tags.tag_id = tags.id
will return all columns from all three tables adding unnecessary I/O, memory and CPU load to the MySQL server. Instead, return only the needed columns.
Use ENUM for categorical data types
Use ENUM
for categorical data types. For example
# Instead of
education VARCHAR(10) NOT NULL# Use ENUM
education ENUM ('highschool', 'college', 'postgrad') NOT NULL
Aside from a data integrity check, an ENUM
column only uses 1 byte for up to 255 distinct values.
Add Needed Indexes and Remove Unused Indexes
An index is a data structure (usually a b-tree) that speeds up certain queries for a specific table. An index works by sorting the table by one (or more) of its column(s). This sorting allows lookups by that column to take O(log_2(N))
time since its a binary search instead of O(N)
time as a linear search.
For queries that use multiple columns, use composite indexes which are indexes that sort a table by more than one column. For more details on how single and composite indexes work, check out this article.
To find missing indexes, use MySQL’s EXPLAIN
tool. When EXPLAIN
is prepended to a query, it runs the query and returns information like
mysql> EXPLAIN SELECT * FROM users WHERE account_id = 1 LIMIT 5\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: users
type: ref
possible_keys: index_users_on_account_id_and_uid
key: index_users_on_account_id_and_uid
key_len: 5
ref: const
rows: 104
Extra: Using where
Here is a good write-up from Sitepoint on how to read the EXPLAIN output
, but the main thing to watch out for is when the type
field has the value ALL
, which means the query performed a full table scan. A full table scans means there’s probably a missing index, so be sure to add them.
As always, indexes don’t come free. They increase the database write time and use more disk space, so be sure to add only indexes that will be used and remove un-used indexes.
Sometimes Many Simple Queries are Better Than a Single Complex Query
Sometimes breaking up a single complex query into multiple simple queries is better. For example, instead of
SELECT users.full_name FROM users
JOIN user_locations ON users.id = user_locations.user_id
JOIN locations ON locations.id = user_locations.location_id
WHERE locations.id = 4231;
Rewrite it as
SELECT locations.id FROM locations WHERE id = 4231;
SELECT user_locations.id FROM user_locations WHERE location_id = 21;
SELECT users.full_name FROM users WHERE id IN (123,456,789,101);
The reason why this is the case is because:
- The MySQL query cache can cache these simple queries for other uses. A complex join is less likely to be used in other contexts. But note the query cache has been deprecated in MySQL 8.0.
- Simple queries are less likely to be subject to lock contention.
- Multiple simple queries can be run across different replicas instead of being confined to a single master (more on replicas/master below).
- In earlier days when network speeds were slower, it was better to make only one trip to the database. This isn’t the case nowadays. Network speeds are much faster, especially when your app and database servers are hosted on the same cloud provider region.
Check The Slow Query Log
Now that we know how to optimize queries, where do we find slow queries to optimize?
In the (aptly named) Slow Query Log which records all queries that exceed a user-defined threshold called long_query_time
. You can enable the Slow Query Log with
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 5; # seconds
SET GLOBAL slow_query_log_file = '/path/filename';
Scaling Through Infrastructure Changes
Scaling MySQL through infrastructure essentially means adding more database servers or using more powerful database servers, called horizontal scaling or vertical scaling, respectively.
Use Replica Databases For Read Operations (Horizontal)
MySQL has built-in support for replicas, which as the name implies, are database copies of the master database. The way replication works is that data changes to the master database are saved to a binary log which the replica replays to sync itself with the master. As such a replica is not always synced up with master and this delay is called replica latency.
Replicas can be to take the load off the master by handling read queries that tolerate a certain level of replica latency. Furthermore, keeping reads and writes on separate databases will prevent read-write-locking issues.
Also, replicas can be used for backups (ie. when a new dev accidentally deletes client data on his first day).
Functional Partitioning and Micro-services (Horizontal)
Some applications can easily be partitioned. For example, an e-commerce app can be partitioned into a consumer-oriented database and a merchant-oriented database. Most joins should be within the same database but joins across databases can be done at the application level.
When the database is partitioned along functionality, this leads naturally to micro-services. In our case, the consumer-facing functionality can be its own micro-service talking with the consumer-oriented database and the merchant-facing functionality can be its own micro-service talking with the merchant-oriented database.
Table Sharding (Horizontal)
There seems to be some ambiguity of the term sharding, but we here we will define table sharding as the partitioning of a table by rows.
The are many performance issues with a table having too many rows, such as:
- queries are slower
- backups take longer
- migrations lock up the database for a longer duration
Table sharding partitions a single big table into many smaller tables (all with the same schema). This partitioning can occur along functional lines. For example, a users
table can be partitioned along geographic lines. Maybe an enterprise CRM has one huge customer who accounts for 50% of all the records in a table, so partitioning by this one customer and every one else is a good strategy. The rule of thumb with table sharding is that the resulting partitions should be roughly the same size. Also note these partitioned tables don’t even necessarily have to be on the same database server.
MySQL supports table sharding which it calls user defined partitioning.
Upgrading Server Hardware (Vertical)
Vertical scaling means upgrading the database server to a machine with more memory, CPU speed/cores, I/O capacity, etc. Nowadays cloud service providers like Heroku or AWS make this pretty easy.
For example, on Heroku, to upgrade the database server, first create a new (and more powerful) replica to follow the master. Once the replica is caught up, swap the ENV vars and make this replica the new master.
Conclusion
We looked at a variety of techniques to optimize and scale a MySQL application, but if the database is still not scaling even after all these techniques have been employed, it might be worthwhile to look at Non-SQL databases.
If your data models and relations are very graph-like with lots of queries being graph traversals, then a graph database like Neo4j and would be the obvious choice.
Another popular non-SQL database is MongoDB which is a document database, meaning there is no schema, no relations, and the entire database can thought of as a collection of JSONs. But from a scaling perspective, MongoDB automatically shards data.
Thanks for reading! Feel free to leave your own suggestions, corrections or other comments below — S