Migrate Primary Keys to UUIDs - Sequelize/Node

This is how and why I took on the hellish task of migrating an existing Sequelize + TypeScript application to use UUIDs instead of auto-incrementing primary keys.

In my domain-driven design journey, I've come to realize that auto-incremented IDs just aren't gonna cut it anymore for my Sequelize + Node.js + TypeScript backend.

Why?

The reason for the migration is that we want the Domain Layer[1] code to be able to create Domain Objects without having to rely on a round-trip to the database.

With auto-incremented IDs, that's not possible (unless, of course, we were to do some hacky things).

Being able to create Domain Objects without having to rely on a db connection is desirable because it means our unit tests can run really quickly, and because it separates the concern of creating objects from the concern of persisting them.

This technique also simplifies how we can use Domain Events to allow other subdomains and bounded contexts to react to changes in our systems.
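To make that concrete, here's a minimal sketch (not this project's actual code) of an entity that assigns its own identity at creation time, assuming a recent version of the `uuid` package and illustrative `User`/`UserProps` names:

import { v4 as uuid } from 'uuid'

interface UserProps {
  email: string;
  name: string;
}

class User {
  public readonly id: string;
  public readonly props: UserProps;

  private constructor (props: UserProps, id?: string) {
    // No id passed in means this is a brand new User: generate the
    // identity right here, with no database round-trip.
    this.id = id ? id : uuid();
    this.props = props;
  }

  public static create (props: UserProps, id?: string): User {
    // Invariant/validation checks would live here.
    return new User(props, id);
  }
}

// Unit tests can now build fully-formed entities without a db connection.
const user = User.create({ email: 'khalil@example.com', name: 'Khalil' });
console.log(user.id); // => a freshly generated v4 UUID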

If you're interested in other approaches for generating identities in Domain-Driven Design, see "Chapter 5: Entities" in Vaughn Vernon's "Implementing Domain-Driven Design" for a more detailed discussion.

I realized that there's not a lot of information out there on how to migrate an existing production database with over 40 tables from auto-incremented IDs to UUIDs[2], so here's me documenting how I got it done.


Gettin' er' done

The best option was to just create a new database with the same tables but with different primary key data types. From there, I could insert the old data into the new database and then swap out the production databases.

Step 1: Dump the prod database + import the schema

I'm using MySQL, so I was able to dump the entire production database to a self-contained file and then import it locally with MySQL Workbench using the Data Import tool.
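For reference, the dump itself can be done with something along these lines (the flags and names here are illustrative; adjust for your setup):

mysqldump -u root -p --single-transaction --routines app_database > app_database_dump.sql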

Step 2: Create JSON datafiles of each table's data

The next thing I did was export all of the rows from each model to JSON files in the format of out/TableName.json.

import models from '../../src/infra/models'
import * as fs from 'fs'
import * as path from 'path'

/**
 * Each file gets written out to `/out/{modelName}.json`
 */

const writeToFile = (filePath: string, data: any): Promise<void> => {
  return new Promise((resolve, reject) => {
    fs.writeFile(path.join(__dirname, filePath), 
      JSON.stringify(data), 'utf8', (err) => {
        if (err) return reject(err);
        return resolve();
    });
  })
}


class CreateDatabaseData {
  // All sequelize models
  private models: any

  constructor (models: any) {
    this.models = models;
    this.init();
  }

  async init () {
    // ['User', 'Product', etc...]
    const modelNames = Object.keys(this.models);

    for (const modelName of modelNames) {
      const Model = this.models[modelName];
      console.log("Getting all the data for: " + modelName)
      // Select * from the current model
      const rawData = await Model.findAll({});
      console.log('Writing....');
      // Write it to the json file in the out/ folder
      await writeToFile('out/' + modelName + ".json", rawData);
    }

    console.log('Done.')
    // Exit with 0 to signal success
    process.exit(0);
  }
}

// Pass in the imported sequelize models
new CreateDatabaseData(models);

To execute this, I had to give Node.js a bit more RAM.

node --max_old_space_size=8192 -r ts-node/register scripts/uuid-migration/create-files.ts

At the end of this, I had all of the existing data from my 40 tables saved in JSON files.

Step 3: Drop the local production database

We drop the local production database and create a fresh new schema.

drop schema app_database;
create schema app_database;

Step 4: Change all of the Sequelize models to use UUIDs

Normally, in our Sequelize projects, we'll have all of our models in a /models folder. This step meant going into each model and updating every primary key and foreign key relationship to use UUIDs.

  user_id: {
-   type: DataTypes.INTEGER(11),
+   type: DataTypes.UUID,
-   autoIncrement: true,
+   defaultValue: DataTypes.UUIDV4,
    allowNull: false,
    primaryKey: true
  },

Sequelize allows you to choose between using UUIDV4 or UUIDV1 for the default value. It also allows you to supply your own UUID generation function.
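As a rough sketch of that last point (my own illustration, not this project's actual model), a model attribute with a custom generator might look like the following, assuming Sequelize v5-style imports and a recent `uuid` package:

import { Sequelize, DataTypes } from 'sequelize'
import { v4 as uuid } from 'uuid'

// Connection string is a placeholder
const sequelize = new Sequelize('mysql://user:pass@localhost:3306/app_database');

const User = sequelize.define('User', {
  user_id: {
    type: DataTypes.UUID,
    // DataTypes.UUIDV4 / DataTypes.UUIDV1 also work here; a plain
    // function lets you plug in your own generator
    defaultValue: () => uuid(),
    allowNull: false,
    primaryKey: true
  }
  // ...remaining attributes
});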

Step 5: Update the initial migration & any seeder files

Not only should we update the models, but we should also update the initial migration file, changing:

  user_id: {
-   type: Sequelize.INTEGER(11),
+   type: Sequelize.UUID,
-   autoIncrement: true,
+   defaultValue: Sequelize.UUIDV4,
    allowNull: false,
    primaryKey: true
  },

It's pretty much the same change as in step 4, just in our initial migration file instead. The one small difference is that here we have access to Sequelize rather than DataTypes.
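For completeness, here's a cut-down sketch of what the relevant part of an initial migration might look like after the change (file, table, and column names are illustrative and simply mirror the diff above):

// migrations/XXXXXXXXXXXXXX-create-users.js
module.exports = {
  up: (queryInterface, Sequelize) => {
    return queryInterface.createTable('users', {
      user_id: {
        type: Sequelize.UUID,
        defaultValue: Sequelize.UUIDV4,
        allowNull: false,
        primaryKey: true
      }
      // ...remaining columns, with foreign keys switched to Sequelize.UUID as well
    });
  },
  down: (queryInterface) => queryInterface.dropTable('users')
};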

Step 6: Run the migration

After we've updated the migrations and the models, it's time to run the Sequelize migration along with any seeder files.

npx sequelize db:migrate --env production && npx sequelize db:seed:all --env production

Step 7: Identify the order of table creation

In order to insert all of the existing data that we saved to JSON files, we need to know which order to insert it in, so that we don't reference tables that don't yet exist.

Looking at the order in which tables were created across my migration files, I could figure out the insert order (a small helper script for this is at the end of this step).

const modelOrder = [
  'User', // first table ever created
  'Product',
  // ...
  // ... more tables
  // last table
]

This took me a little while to do, but after I had them all saved in an array, I could move to the final step.
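As an aside, if your migration files follow the standard sequelize-cli naming convention (a timestamp prefix), a tiny script like this one (my own sketch, with an assumed migrations path) can at least list them in creation order to work from:

import * as fs from 'fs'
import * as path from 'path'

// sequelize-cli prefixes migration files with a timestamp, so sorting the
// filenames lists them in the order the tables were created. Mapping each
// file back to its model name is still a manual step.
const migrationsDir = path.join(__dirname, '../../migrations'); // adjust to your project
fs.readdirSync(migrationsDir)
  .filter((file) => file.endsWith('.js'))
  .sort()
  .forEach((file) => console.log(file));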

Step 8: Import the data

The last part is actually importing the data.

The general pseudocode of the script was:

Loop through each of the tables in the order created
  for each table 
    get the data file
    for each row in the data file
      prepareRowWithUUIDs(row attributes, row data)

In prepareRowWithUUIDs(row attributes, row data)
  for each attribute in row attributes
    if the type is UUID
      use the old auto-incremented row data to hash it into a new UUID
  
  return row data

Here's the final script.

import * as fs from 'fs'
import * as path from 'path'
import models from '../../src/infra/models'
import { get } from 'lodash'
const createUUID = require('uuid-by-string')

const modelOrder = [
  'User', // first table ever created
  'Product',
  // ...
  // ... more tables
  // last table
]

const OpenDataFile = (fileName) => {
  return JSON.parse(fs.readFileSync(path.join(__dirname, `out/${fileName}.json`), 'utf8'));
}

class MigrateToUUID {
  private models: any;

  constructor () {
    this.init();
  }

  async init () {
    // Use the sequelize models imported at the top of the file
    this.models = models;

    for (let modelName of modelOrder) {
      const model = this.models[modelName];
      const data = OpenDataFile(modelName);
      console.log(`Inserting data into ${modelName}...`)
      await this.insertDataToTable(model, data, modelName);
    }

    process.exit(0);
  }

  prepareRowWithUUIDs (rowAttributes: Object, rowData: Object): any {
    for (const attrName of Object.keys(rowAttributes)) {
      const attr = rowAttributes[attrName];

      if (get(attr, 'type.key') === "UUID") {
        const oldId = rowData[attrName];
        // Leave nullable foreign keys as-is
        if (oldId === null || oldId === undefined) continue;
        // Hash the old auto-incremented id into a deterministic UUID so that
        // foreign keys still line up with the primary keys they reference
        rowData[attrName] = createUUID(String(oldId));
      }
    }

    return rowData;
  }

  async insertDataToTable (modelInstance: any, data: any[], modelName: string) {
    for(let row of data) {
      row = this.prepareRowWithUUIDs(modelInstance.attributes, row);
      try {
        await modelInstance.create(row);
      } catch(err) {
        console.log(`Error occurred in model => ${modelName}`);
        console.log(err);
      }
    }
  }
}

new MigrateToUUID();

Finally, at that point, all I had to do was make sure everything still worked and swap out the old production database for the new one using UUIDs.

Overall, this process was pretty painful.

If anyone knows of a better way (or better tooling) to accomplish something like this, drop a comment so the next person doesn't have to go through the same kind of hell.

Discussion about UUID Performance

I was curious about the performance of UUIDs compared to ints and whether I should bother worrying about the performance tradeoffs right now.

After a good 10 minutes browsing the web and seeing engineers split between speaking negatively and positively about UUIDs, I figured I'd open my own Twitter thread.

I was delighted that some really smart domain modelers, including Vaughn Vernon himself, were able to share their experiences.

My original Twitter thread.

The comment that really eased me on this was:

https://twitter.com/bharam5/status/1120158050894131200

This was the whole reason why I wanted to use UUIDs: so that I didn't have to rely on a round-trip to the DB, and so that I'd actually be able to dispatch Domain Events[3] containing the entity id as part of the payload.

For my needs, and I'd wager for most people's needs getting started, it makes sense to defer unnecessary optimization.

Based on the discussion, a smart thing to do when necessary would be to compact the UUIDs or implement some caching (I like this suggestion; it seems like a really good use case for caching).

Further reading on UUID performance and optimization.

  • Storing UUIDs in an Optimized way
  • Storing UUID Values in MySQL Tables
  • MySQL/InnoDB - Clustering Index with UUID
  • MySQL UUID Smackdown: UUID vs. INT for Primary Key

Sequelize ORM

I use the Sequelize ORM for most of my Node/TypeScript projects so I was curious about the type of support available for these optimizations.

It looks like there's an open PR in the Sequelize repo to support binary(16) UUIDs.

This would help compact the UUID from a CHAR(36) down to a BINARY(16) so that it takes up less space.
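Until something like that lands in the ORM, the conversion itself is simple enough to do by hand at the mapping layer. Here's a minimal, ORM-independent sketch (my own illustration) of packing a UUID string into 16 bytes and back with Node's Buffer:

// Pack a 36-char UUID string into a 16-byte Buffer and back.
// Wiring this into a BINARY(16) column (getters/setters, mappers) is
// left to the persistence layer.
const uuidToBinary = (id: string): Buffer => {
  return Buffer.from(id.replace(/-/g, ''), 'hex');
};

const binaryToUuid = (buf: Buffer): string => {
  const hex = buf.toString('hex');
  return [
    hex.substring(0, 8),
    hex.substring(8, 12),
    hex.substring(12, 16),
    hex.substring(16, 20),
    hex.substring(20)
  ].join('-');
};

// Example
const id = '110ec58a-a0f2-4ac4-8393-c866d813b8d1';
const packed = uuidToBinary(id);           // 16 bytes instead of 36 chars
console.log(binaryToUuid(packed) === id);  // true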

This is part of the Domain-Driven Design w/ TypeScript & Node.js course. Check it out if you liked this post.


  1. The Domain Layer is the center of our Layered Architecture. It has zero dependencies on anything other than what belongs in the domain. This layer is encapsulated and doesn't know anything about web servers, HTTP, REST, databases, caching, or any other infrastructure or frameworks.

  2. I probably have the world record for the most unanswered questions on Stack Overflow.

  3. Domain Events are an excellent way to execute post-entity-creation tasks like, after adding an item to a Schedule in a scheduling application, sending an email to the affected parties. It allows the application service that sends emails to be decoupled from the Schedule domain entity (Schedule shouldn't need to know anything about emailing), yet still execute some code in response to a relevant domain event.


Discussion


Rahul
4 years ago

Great stuff again! Maybe I haven't gotten far enough to learn this yet, but would DB persistence then just be an event that's tied to entity creation?

Khalil Stemmler
4 years ago

The sequence for creating an entity goes like this:


  • Create an entity (like User) by calling User.create(props: UserProps).
  • When the entity is created, a unique UUID is assigned to it.
  • Pass the entity to a UserRepository.
  • The UserRepository uses a Mapper to structure the entity into the JSON object needed for the ORM to save it to the database (like `UserMap.toSequelizeProps(user: User): any` or `UserMap.toTypeORM(user: User): any`), including the UUID used for the primary key field.


This approach is better than relying on the database to come up with the unique primary key (auto-incremented). If we did it that way, we would need to interact with the database at Step 1 of that sequence I described in order to get a unique ID for the entity. A lot simpler to let the application decide on the unique ID.
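In code, that sequence looks roughly like this (a sketch building on the `User` entity sketched earlier in the post; `UserMap` and `SequelizeUserRepo` are illustrative names):

class UserMap {
  public static toSequelizeProps (user: User): any {
    return {
      user_id: user.id, // the app-generated UUID becomes the primary key
      email: user.props.email,
      name: user.props.name
    };
  }
}

class SequelizeUserRepo {
  constructor (private userModel: any) {}

  public async save (user: User): Promise<void> {
    const raw = UserMap.toSequelizeProps(user);
    await this.userModel.create(raw);
  }
}

// 1. Create the entity; the UUID is assigned in the domain layer.
const user = User.create({ email: 'khalil@example.com', name: 'Khalil' });
// 2-4. Pass it to the repository, which maps and persists it, e.g.:
// await new SequelizeUserRepo(models.User).save(user);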

John Angel
3 years ago

Hi! Thanks for all the content you provide to the community :) I'm a pretty junior dev but I liked this software architecture thing and aww all your content is gold for me :D Thanks!!

Quick question... maybe a dumb one: would you use a UUID as well for MongoDB databases to replace the auto-generated `_id` (ObjectId)?

Thanks again!

James
2 years ago

Hi, would you have the repository of this project? I'm going through the same situation, but I was confused by a few of the steps.

Ali
a year ago

After having built a fairly complex application with UUIDs as the sole primary keys and later switching them back to ints (while still keeping UUIDs for public-facing ids)... that was painful.


You see, UUIDs work fairly fine if you are just going to call the db for fairly simple things, i.e. "fetch me this by id". However, this whole thing starts to fall apart when you have complex joins or a lot of inserts/updates, due to the nature of UUIDs.

Now, among many things, the indexing cannot be ordered, and that is going to affect inserts and updates. Besides, it is going to take a lot more resources, because UUIDs are bulky, and your joins will suffer a lot!


In most applications, you wouldn't really mind having slow updates or inserts, but sacrificing on joins... they are quite important, especially if you want to keep your application fast without too many roundtrips and in-memory calculations, where the db is almost always going to win. And having a crippled system defeats any "elegant" solution that anyone can muster!


Now one could argue that they will go with NoSQL, where everything is "repeated". Sure. You can. But you will still need aggregations at some point (not talking about DDD aggregates), and that will be even slower due to the database being inherently slower at such tasks.


Anyways, I hope it worked out for you. Unfortunately for me, while I love the UUIDs, I don't use them as PKs anymore.

Daniel Overdevest
a year ago

For people who like the database approach:


I created a stored procedure for MSSQL, but I'm sure you can port it to other RDBMSs.


ALTER PROCEDURE api.sp_id_to_guid @tableName nvarchar(255)
AS
BEGIN
	declare @SQL nvarchar(4000)
	
	--= Add new GUID PK as id_new
	SELECT @SQL = 'ALTER TABLE ' + @tableName + ' ADD id_new UNIQUEIDENTIFIER DEFAULT newid() NOT null';
	EXEC(@SQL);

	--= Delete PK constraints
	declare @const nvarchar(200)
	set @const = QUOTENAME((
	 select kc.name
	 from sys.key_constraints as kc
	 where kc.parent_object_id = object_id(@tableName) and kc.type = 'PK'
	))
	SELECT @const
	exec('ALTER TABLE ' + @tableName + ' DROP CONSTRAINT ' + @const)

	--=Drop original Id field
	SELECT @SQL = 'ALTER TABLE ' + @tableName + ' DROP COLUMN Id';
	EXEC(@SQL);
		
	--= Rename id_new column as Id
	EXEC('sp_RENAME ''' + @tableName + '.id_new'', ''Id'', ''COLUMN''');

	--= Add PRIMARY KEY
	SELECT @SQL = 'ALTER TABLE ' + @tableName + ' ADD CONSTRAINT [PK_' + @tableName + '] PRIMARY KEY (Id)';
	EXEC(@SQL);
END

