
Problem
Given a table in a relational database, we want a way to generate the values we are going to use as primary keys on it.
Using a sequential integer is a common solution for this. In this post we want to consider an alternative approach: random UUIDs as primary keys.
Deciding between these two options (integers and UUIDs) is a non-trivial decision with several trade offs involved.
What is a UUID?
The term UUID refers to a Universally Unique Identifier. It is also called GUID (Globally Unique Identifier) in some contexts. This is a standard defined in RFC 4122 which defines a UUID as 128-bit integer that is displayed as a UTF-8 encoded string with the following format:
aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
A simple solution to the problem of identifying records on a table is using an incremental integer. This is simple and allows for efficient ordering of the values in an index. However this approach has a drawback: in order to insert a record one needs to either know a value that is not already present in the table (which creates the new problem of how to obtain this value), or let the database generate a new value automatically in the insertion, which means that we don’t know the value until after the insertion has completed.
Pros
- You can know your PK before insertion, which avoids a round trip DB hit, and simplifies transactional logic in which you need to know the PK before inserting child records using that key as its foreign key (FK).
- At scale, when you have multiple databases containing a segment (shard) of your data, for example a set of customers, using a UUID means that one ID is unique across all databases, not just the one you’re in now. This makes moving data across databases safe.
- Security: UUID values do not expose the information about your data so they are safer to use in a URL. For example, if a customer with id 10 accesses his account via http://www.example.com/customers/10/ URL, it is easy to guess that there is a customer 11, 12, etc., and this could be a target for an attack.
Cons
- Performance: UUIDs as four times larger than a 4 byte integer. Using UUID values may cause performance issues due to their size and not being ordered. This is particularly troublesome in complex database schemas with several one-to-many and many-to-many relations that need joins and sorts to work with. The performance impact of having to store the larger keys in memory tends to add up. The significance of this aspect needs to be evaluated by the engineer on a case-by-case basis.
- Storage size: storing UUID values (16-bytes) takes more storage than integers (4-bytes) or even big integers(8-bytes).
Best practices when using UUIDs
Regardless of the approach you take regarding your primary keys, it’s a good idea to adhere to the following principles:
- Avoid exposing UUIDs in an browser-facing URL. Even if the UUID is not a primary key in the database but a an indirect reference to a record, exposing it provides information that may be exploited by an attacker.
- If you use UUIDs as primary keys, avoid storing them as strings (VARCHAR). Instead, use the mechanism provided by your DBMS to store the UUID in binary. For example, in MySQL you can store the UUID as BINARY(16) , use the function UUID_TO_BIN to store it and use the function BIN_TO_UUID to retrieve it.
- If you use UUIDs as primary keys, make sure that when records are inserted, they key is generated as a random value. Sequential values (such as generated by SQLServer’s newsequentialid) expose too much information about the underlying data.
- Tom Harrison’s blog post on the matter suggests using UUIDs as primary keys, but without exposing them outside the service. This means your service’s API would use a different, external identifier for the entities, and the service needs to translate from the external identifiers to your internal UUID primary keys.
Sources
- RFC 4122: https://tools.ietf.org/html/rfc4122
- Wikipedia article on UUIDs: https://en.wikipedia.org/wiki/Universally_unique_identifier
- About pros and cons of using UUIDs as primary keys: https://www.mysqltutorial.org/mysql-uuid/#:~:text=Using UUID for a primary,to use in a URL.
- Tom Harrison’s blog post about pros and cons of using UUIDs as primary keys: https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439
- Jeff Atwoods blog post about using UUIDs as primary key: https://blog.codinghorror.com/primary-keys-ids-versus-guids/
- On the trade offs involved in UUIDs vs integer primary keys: http://web.archive.org/web/20150511162734/http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html