Single vs Composite Indexes in Relational Databases
When To Use Single Indexes Versus When to Use Composite Indexes
Single Indexes
In a relational database, an index is a data structure that increases retrieval speed at the expense of decreasing write speed as well as using more storage space.
Querying a database table of n
records by a field other than a key, requires O(n)
record reads. (Technically, n
should be the number of disk blocks used by that table, but for simplicity, lets assume it’s just the number of records).
For example, assuming ssn
is a unique field, a query like the one below reads on average n/2
records to find the match.
SELECT * FROM users WHERE users.ssn = '1234';
This is because ssn
is unique, so the database can stop when the first record is found.
However, if the field is not unique, a query like the one below reads all n
records (aka a full table scan) to find all matches.
SELECT * FROM users WHERE users.first_name = 'Teemo';
A full table read is slow, especially for large tables. This is where indexes can help.
An index, or more specifically, an index on a column is an additional data structure of the table’s records sorted (typically via b-tree) only on that column. Each record in the index also includes a pointer to the original record in the table, such that finding records in the index is equivalent to finding records in the original table.
For example, suppose we had the following users
table.
ID | first_name | last_name | Class | Position | ssn |
---------------------------------------------------------------
1 | Teemo | Shroomer | Specialist | Top | 2345 |
2 | Cecil | Heimerdinger | Specialist | Mid | 5461 |
3 | Annie | Hastur | Mage | Mid | 8784 |
4 | Fiora | Laurent | Slayer | Top | 7867 |
5 | Garen | Crownguard | Fighter | Top | 4579 |
6 | Malcolm | Graves | Specialist | ADC | 4578 |
7 | Irelia | Lito | Figher | Top | 5689 |
8 | Janna | Windforce | Controller | Support | 4580 |
9 | Jarvan | Lightshield | Figher | Top | 4579 |
10 | Katarina | DuCouteau | Assassin | Mid | 5608 |
11 | Kayle | Hex | Specialist | Top | 4794 |
12 | Emilia | LeBlanc | Mage | Mid | 3468 |
13 | Lee | Sin | Fighter | Jungle | 8085 |
14 | Lux | Crownguard | Mage | Mid | 4567 |
15 | Sarah | Fortune | Marksman | ADC | 6560 |
16 | Morgana | Hex | Controller | Support | 3457 |
17 | Orianna | Reveck | Mage | Mid | 9282 |
18 | Sona | Buvelle | Controller | Support | 4722 |
19 | Jericho | Swain | Mage | Mid | 5489 |
20 | Shauna | Vayne | Marksman | ADC | 2352 |
21 | Xin | Zhao | Fighter | Jungle | 6902 |
22 | Yorick | Mori | Tank | Top | 4840 |
23 | Wu | Kong | Fighter | Jungle | 4933 |
If we create an index on users.first_name
CREATE INDEX first_name_index ON users (first_name) USING BTREE;
it would create a sorting of the users
by their first_name
with a pointer to their primary key, something like this:
Annie -> 3
Cecil -> 2
Emilia -> 12
Fiora -> 4
Garen -> 5
Irelia -> 7
Janna -> 8
Jarvan -> 9
Jericho -> 19
Katarina -> 10
Kayle -> 11
Lee -> 13
Lux -> 14
Malcolm -> 6
Morgana -> 16
Orianna -> 17
Sarah -> 15
Shauna -> 20
Sona -> 18
Teemo -> 1
Wu -> 23
Xin -> 21
Yorick -> 22
Then a query like
SELECT * FROM users WHERE first_name = 'Teemo';
would take only O(log_2(n))
reads since the database can perform a binary search on this index, since it is sorted by first_name
.
Enforcing Uniqueness with Indexes
In addition to performance gains, indexes can also be used to enforce uniqueness of fields. For example, suppose we don’t want more than one user to have the same social security number field on a table. Creating an index with a UNIQUE
modifier
CREATE UNIQUE INDEX ssn_index ON users (ssn);
will raise an error when another record is created with an ssn
that already exists in the table.
Avoid Unnecessary Indexes
As per the no free lunch principle, index-based performance gains don’t come free. Its costs come in the form of:
- Additional storage space to store indexes
- Indexes also need to be updated when state-changing queries like
CREATE
UPDATE
andDELETE
are made
As such, adding unnecessary indexes can actually degrade performance overall. Here are some guidelines for when to use indexes:
- Do not use an index for low-read but high-write tables. As mentioned previously, indexes improve read performance but degrade write performance.
- Do not use an index if the field has low cardinality, the number of distinct values in that field. The whole idea behind the
O(log_2(n))
query performance is that we can roughly eliminate half the records, with each binary search step. But that’s only the case with high cardinality fields. - Do not use an index for small fixed-size tables. Small tables will see no noticeable performance gains with indexes. Note that some tables may be small now, but they are likely to grow large over time, like
users
. Other tables are small and will stay small likeice_cream_flavors
, because, come on, even Baskin-Robbins only has 31 flavors.
Composite Indexes
Like a single index, a composite index is also a data structure of records sorted on something. But unlike a single index, that something is not a field, but a concatenation of multiple fields.
For example, returning to our users
table example:
ID | first_name | last_name | class | position |
--------------------------------------------------------
1 | Teemo | Shroomer | Specialist | Top |
2 | Cecil | Heimerdinger | Specialist | Mid |
3 | Annie | Hastur | Mage | Mid |
4 | Fiora | Laurent | Slayer | Top |
5 | Garen | Crownguard | Fighter | Top |
6 | Malcolm | Graves | Specialist | ADC |
7 | Irelia | Lito | Figher | Top |
8 | Janna | Windforce | Controller | Support |
9 | Jarvan | Lightshield | Figher | Top |
10 | Katarina | DuCouteau | Assassin | Mid |
11 | Kayle | Hex | Specialist | Top |
12 | Emilia | LeBlanc | Mage | Mid |
13 | Lee | Sin | Fighter | Jungle |
14 | Lux | Crownguard | Mage | Mid |
15 | Sarah | Fortune | Marksman | ADC |
16 | Morgana | Hex | Controller | Support |
17 | Orianna | Reveck | Mage | Mid |
18 | Sona | Buvelle | Controller | Support |
19 | Jericho | Swain | Mage | Mid |
20 | Shauna | Vayne | Marksman | ADC |
21 | Xin | Zhao | Fighter | Jungle |
22 | Yorick | Mori | Tank | Top |
23 | Wu | Kong | Fighter | Jungle |
Creating a composite index on the class
and position
columns
CREATE INDEX class_pos_index ON users (class, position);
will create a composite index, sorted by a concatenation of those fields, something like:
class-position Primary Key
--------------------------------
AssassinMid -> 10
ControllerSupport -> 16
ControllerSupport -> 18
ControllerSupport -> 8
FigherTop -> 7
FigherTop -> 9
FighterJungle -> 13
FighterJungle -> 21
FighterJungle -> 23
FighterTop -> 5
MageMid -> 12
MageMid -> 14
MageMid -> 17
MageMid -> 19
MageMid -> 3
MarksmanADC -> 15
MarksmanADC -> 20
SlayerTop -> 4
SpecialistADC -> 6
SpecialistMid -> 2
SpecialistTop -> 1
SpecialistTop -> 11
TankTop -> 22
Given this composite index, a query like
SELECT * FROM users
WHERE
class = 'Specialist'
AND
position = 'Top';
will have improved retrieval time, because the composite index is sorted by class-position
. The database can find SpecialistTop
in O(log_2(n))
reads rather than a full table read .
Note that even a query on just the class
field
SELECT * FROM users WHERE class = 'Specialist';
will benefit from this composite index since class
is the first field. Searching the index that is sorted by class
is more or less equivalent to searching a composite index sorted by class-position
.
However, a query like
SELECT * FROM users WHERE position = 'Top';
will NOT benefit from this composite index because position
is the second field. A composite index sorted by class-position
cannot be used to quickly find a record by position
.
As such, the order of the constituent columns that comprise a composite index is important. The rule is if a composite index is made on column1
, column2
, column3
,…, columnN
. Then queries like
SELECT * FROM table
WHERE column1 = 'value';SELECT * FROM table
WHERE column1 = 'value1'
AND column2 = 'value2';SELECT * FROM table
WHERE column1 = 'value1'
AND column2 = 'value2'
AND column3 = 'value3'...SELECT * FROM table
WHERE column1 = 'value1'
AND column2 = 'value2'
AND column3 = 'value3'
...
AND columnN = 'valueN'
will have performance gains.
Guidelines for determining composite indexes
Like single indexes, composite indexes also come with the cost of slower write speeds and increased storage space. The choices of which fields should comprise a composite index and how to order these fields can be determined by considering the following:
- If certain fields tend to appear together in queries, then it’s a good idea to create a composite index on them. For example, in our
users
example table above, it’s probably a good idea to create a composite index on(last_name, first_name)
. - If we’re going to create an index on
field1
but also create an composite index on(field1, field2)
. Then creating just the composite index on(field1, field2)
is enough since it can be used for querying byfield1
alone. - Similar to single indexes, the cardinality of the fields matters to the effectiveness composite indexes. Obviously, if two fields have high cardinality, a composite index of those fields will also have high cardinality. But it’s also possible that several low cardinality fields can together create a high cardinality composite index depending on the distribution of values among those fields.
Enforcing Unique Combinations With Composite Indexes
Composite indexes can be used to enforce uniqueness of combinations of fields.
Oftentimes, we don’t require values in fields to be unique, but we require combinations of fields to be unique. For example, suppose an addresses
table has a street
field, address_number
field, and city
field. We don’t require its street
or house_number
or city
to be unique, since multiple addresses can share the same street
, house_number
or city
. However, we probably want to require every street-house_number-city
combination to be unique, since that pretty much defines an address. We can use a composite index for this with a UNIQUE
modifier.
CREATE UNIQUE INDEX index_st_no_city
ON addresses (street, house_number, city);
Feel free to leave comments, questions, suggestions, corrections below. — S