Thomas Bandt

Storing JSON In Your SQL Database Is Just Fine

NoSQL, SQL, or NewSQL? Those database types and the concepts they are based on are not mutually exclusive, even though some dogmatists try to make you think otherwise.

Published on Monday, 23 May 2022

Let me start with a disclaimer: I have not built my career focusing on databases, so I am by no means an expert who knows all the details inside out. Instead, I would consider myself a practitioner who has helped to build several software systems over the last two decades. At some point, all involved at least one kind of database, if not more.

So far, I have actively worked with databases running on Microsoft Access (not kidding), SQL Server, MySQL, PostgreSQL, SQLite, LiteDB, MongoDB, and RavenDB, to name a few. And with working, I mean designing, optimizing, and keeping them up and running in production. Also, I would describe myself as obsessed with database performance optimization, and I like to do things "right." So normalizing a database schema is something I could do if you would wake me up at 3 am.

However, while I value a nicely normalized database schema, I also see its limitations, especially when it comes to complex or sometimes impossible queries or poor read performance. On the other hand, I also appreciate simple solutions to complex problems. So instead of putting a caching layer in front of a SQL database, e.g., with Redis, why not make compromises within the SQL database first? The fewer moving parts a system consists of, the better. Let me provide you with a few examples.

Example 1: "Complex" Data Structures

type BusinessClerkClientAccess =
    | Unrestricted
    | Restricted of ClientId list

type BusinessUserRole =
    | Admin
    | Manager
    | Clerk of BusinessClerkClientAccess

Let's assume we have a business user who can act within the system either as an admin, a manager, or a clerk. Admins and managers have access to all the clients. At the same time, clerks are separated between those with unrestricted access and those who are only allowed to access specific clients. Now, this is expressed through the data structure shown above.

How could we put that into a relational database schema? In our user table, we could have an integer column representing the role (given that we would map that integer to the role type). Maybe we would replace Clerk with RestrictedClerk and UnrestrictedClerk. We could also have an n:m table, which would primarily consist of two columns, the id of a client and the id of a user.

Now let's assume we need to check if a given user is allowed to access a particular client. What would we do? The most likely solution would be to fetch the user first. If its role would be that of a restricted clerk, we would then fetch all records from the other table to check if that set contains a record with the id of the user and the id of the given client.

Another solution could be to include some logic within the query itself and use a conditional expression and a subquery for restricted clerks so we can handle that all within a single pass.

But how about storing the role JSON serialized along with the user? Then we would be able to deserialize in an instant after fetching the user record. We would only have to make that one query. The second query and the complex mapping would not be necessary.

Example 2: Variable Data Structures

type BusinessUserAction =
    | MasterDataUpdated of
        {| Email: string option
           FirstName: string option
           LastName: string option
           Mobile: string option
           Role: BusinessUserRole option |}
    | SignedUp
    | Confirmed

Another use case is one where we don't know the exact data structure upfront or where it is possible to see it be extended or changed over time. We simply want to dump the data in the database. For example, in the system I am working on at the moment, I started introducing "action tables" that record "actions" to related entities over time. E.g. for users who reside in a user_account table, there is a user_account_action companion table. Next to some metadata, there is one central column called action. That column contains the serialized action values, which could be of any shape, as seen above. There would be little to no value in representing those actions as multiple table columns.

Example 3: The Notorious N+1 Problem

type Client =
    { Id: ClientId
      Name: string
      Features: Map<Feature, bool> }

Another variation of the first problem is the famous issue of n+1 queries. For example, let's assume we do have a set of clients, and every client can have its own set of features which can be either turned on or off.

We could again easily model this as two or three database tables, e.g., with a client table and a client_feature table which would contain the id of a client and the id of a feature.

If we had to get all clients and their enabled features at one, we would need to get a bit creative. The simplest thing would be to fetch all clients first and then perform another query for every one of the clients to get their features (n+1). We could also make it two queries only if we would fetch the clients first and then select all features assigned to one of the client ids we pass on to the second query. In another step, we would then have to filter and assign those features to our clients in the application code. Maybe we can get it down to a single query through a subquery if we return that subquery's result as a temporary column of the client, e.g., as a comma-separated list of the assigned features. But we would still have to parse that then.

If we instead serialize the list of features, we can store it in a single column with the client. Reading the assigned features now becomes as easy as reading the client's name.

JSON Is Not Just A Dumb String In Your Database

Some may argue that the serialized stuff is still just concatenated characters inside a table column that you cannot perform operations on as you can with SQL and regular tables and other objects. The thing is: Depending on what kind of database system you are using, the opposite might be true. For example, JSON functions and operators in PostgreSQL are incredibly powerful. They allow you to do all kinds of crazy stuff with the JSON data inside your tables during the runtime of your queries. E.g., you can still use them to dynamically join things together or filter for details contained in them.

The Weak Spot: Schema Changes

Of course, it's not all guns and roses. For example, while classic table columns can easily be transformed and migrated, the JSON stuff is a more complex beast to tackle. Also, suppose your application's data structures you are serializing change. In that case, you must ensure that your app is not breaking when trying to deserialize outdated JSON values. However, that's not impossible to do. It also applies to "schemaless" databases (e.g., document stores like MongoDB or key-value stores like Redis).

As with so many other things, too, if consciously applied, JSON can be a valuable and powerful extension of the already powerful capabilities of your SQL database. So give it a thought if you haven't already.

What do you think? Drop me a line and let me know!