Migrate Primary Keys to UUIDs - Sequelize/Node

This is how and why I took on the hellish task of migrating an existing Sequelize + TypeScript application to use UUIDs instead of auto-incrementing primary keys.

In my domain-driven design journey, I've come to realize that auto-incremented IDs just aren't gonna cut it anymore for my Sequelize + Node.js + TypeScript backend.

Why?

The reason for the migration is that we want the Domain Layer[1] code to be able to create Domain Objects without having to rely on a round-trip to the database.

With auto-incremented IDs, that's not possible (unless, of course, we were to do some hacky things).

Being able to create Domain Objects without having to rely on a db connection is desirable because it means our unit tests can run really quickly, and because it separates the concern of creating objects from the concern of persisting them.

This technique also simplifies how we can use Domain Events to allow other subdomains and bounded contexts to react to changes in our systems.
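To make that concrete, here's a minimal sketch (not this project's actual code) of an entity that assigns its own identity at creation time, assuming a recent version of the `uuid` package and illustrative `User`/`UserProps` names:

import { v4 as uuid } from 'uuid'

interface UserProps {
  email: string;
  name: string;
}

class User {
  public readonly id: string;
  public readonly props: UserProps;

  private constructor (props: UserProps, id?: string) {
    // No id passed in means this is a brand new User: generate the
    // identity right here, with no database round-trip.
    this.id = id ? id : uuid();
    this.props = props;
  }

  public static create (props: UserProps, id?: string): User {
    // Invariant/validation checks would live here.
    return new User(props, id);
  }
}

// Unit tests can now build fully-formed entities without a db connection.
const user = User.create({ email: 'khalil@example.com', name: 'Khalil' });
console.log(user.id); // => a freshly generated v4 UUID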

If you're interested in other approaches for generating identities in Domain-Driven Design, see "Chapter 5: Entities" in Vaughn Vernon's "Implementing Domain-Driven Design" for a more detailed discussion.

I realized that there's not a lot of information out there on how to migrate an existing production database with over 40 tables from auto-incremented IDs to UUIDs[2], so here's me documenting how I got it done.


Gettin' er' done

The best option was to just create a new database with the same tables but with different primary key data types. From there, I could insert the old data into the new database and then swap out the production databases.

Step 1: Dump the prod database + import the schema

I'm using MySQL, so I was able to dump the entire production database to a self-contained file and then import it locally with MySQL Workbench using the Data Import tool.
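For reference, the dump itself can be done with something along these lines (the flags and names here are illustrative; adjust for your setup):

mysqldump -u root -p --single-transaction --routines app_database > app_database_dump.sql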

Step 2: Create JSON datafiles of each table's data

The next thing I did was export all of the rows from each model to JSON files in the format of out/TableName.json.

import models from '../../src/infra/models'
import * as fs from 'fs'
import * as path from 'path'

/**
 * Each file gets written out to `/out/{modelName}.json`
 */

const writeToFile = (filePath: string, data: any): Promise<void> => {
  return new Promise((resolve, reject) => {
    fs.writeFile(path.join(__dirname, filePath), 
      JSON.stringify(data), 'utf8', (err) => {
        if (err) return reject(err);
        return resolve();
    });
  })
}


class CreateDatabaseData {
  // All sequelize models
  private models: any

  constructor (models: any) {
    this.models = models;
    this.init();
  }

  async init () {
    // ['User', 'Product', etc...]
    const modelNames = Object.keys(this.models);

    for (const modelName of modelNames) {
      const Model = this.models[modelName];
      console.log("Getting all the data for: " + modelName)
      // Select * from the current model
      const rawData = await Model.findAll({});
      console.log('Writing....');
      // Write it to the json file in the out/ folder
      await writeToFile('out/' + modelName + ".json", rawData);
    }

    console.log('Done.')
    // Exit with 0 to signal success
    process.exit(0);
  }
}

// Pass in the imported sequelize models
new CreateDatabaseData(models);

To execute this, I had to give Node.js a bit more RAM.

node --max_old_space_size=8192 -r ts-node/register scripts/uuid-migration/create-files.ts

At the end of this, I had all of the existing data from my 40 tables saved in JSON files.

Step 3: Drop the local production database

We drop the local production database and create a fresh new schema.

drop schema app_database;
create schema app_database;

Step 4: Change all of the Sequelize models to use UUIDs

Normally, in our Sequelize projects, we'll have all of our models in a /models folder. This step meant going into each model and updating every primary key and foreign key relationship to use UUIDs.

  user_id: {
-   type: DataTypes.INTEGER(11),
+   type: DataTypes.UUID,
-   autoIncrement: true,
+   defaultValue: DataTypes.UUIDV4,
    allowNull: false,
    primaryKey: true
  },

Sequelize allows you to choose between using UUIDV4 or UUIDV1 for the default value. It also allows you to supply your own UUID generation function.
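As a rough sketch of that last point (my own illustration, not this project's actual model), a model attribute with a custom generator might look like the following, assuming Sequelize v5-style imports and a recent `uuid` package:

import { Sequelize, DataTypes } from 'sequelize'
import { v4 as uuid } from 'uuid'

// Connection string is a placeholder
const sequelize = new Sequelize('mysql://user:pass@localhost:3306/app_database');

const User = sequelize.define('User', {
  user_id: {
    type: DataTypes.UUID,
    // DataTypes.UUIDV4 / DataTypes.UUIDV1 also work here; a plain
    // function lets you plug in your own generator
    defaultValue: () => uuid(),
    allowNull: false,
    primaryKey: true
  }
  // ...remaining attributes
});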

Step 5: Update the initial migration & any seeder files

Not only should we update the models, but we should also update the initial migration file, changing:

  user_id: {
-   type: Sequelize.INTEGER(11),
+   type: Sequelize.UUID,
-   autoIncrement: true,
+   defaultValue: Sequelize.UUIDV4,
    allowNull: false,
    primaryKey: true
  },

It's pretty much the same change as in step 4, just in our initial migration file instead. The one small difference is that here we have access to Sequelize rather than DataTypes.
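For completeness, here's a cut-down sketch of what the relevant part of an initial migration might look like after the change (file, table, and column names are illustrative and simply mirror the diff above):

// migrations/XXXXXXXXXXXXXX-create-users.js
module.exports = {
  up: (queryInterface, Sequelize) => {
    return queryInterface.createTable('users', {
      user_id: {
        type: Sequelize.UUID,
        defaultValue: Sequelize.UUIDV4,
        allowNull: false,
        primaryKey: true
      }
      // ...remaining columns, with foreign keys switched to Sequelize.UUID as well
    });
  },
  down: (queryInterface) => queryInterface.dropTable('users')
};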

Step 6: Run the migration

After we've updated the migrations and the models, it's time to run the Sequelize migration along with any seeder files.

npx sequelize db:migrate --env production && npx sequelize db:seed:all --env production

Step 7: Identify the order of table creation

In order to insert all of the existing data that we saved to JSON files, we need to know which order to insert it in, so that we don't reference tables that don't yet exist.

Looking at the order in which tables were created across my migration files, I could figure out the insert order (a small helper script for this is at the end of this step).

const modelOrder = [
  'User', // first table ever created
  'Product',
  // ...
  // ... more tables
  // last table
]

This took me a little while to do, but after I had them all saved in an array, I could move to the final step.
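As an aside, if your migration files follow the standard sequelize-cli naming convention (a timestamp prefix), a tiny script like this one (my own sketch, with an assumed migrations path) can at least list them in creation order to work from:

import * as fs from 'fs'
import * as path from 'path'

// sequelize-cli prefixes migration files with a timestamp, so sorting the
// filenames lists them in the order the tables were created. Mapping each
// file back to its model name is still a manual step.
const migrationsDir = path.join(__dirname, '../../migrations'); // adjust to your project
fs.readdirSync(migrationsDir)
  .filter((file) => file.endsWith('.js'))
  .sort()
  .forEach((file) => console.log(file));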

Step 8: Import the data

The last part is actually importing the data.

The general pseudocode of the script was:

Loop through each of the tables in the order created
  for each table 
    get the data file
    for each row in the data file
      prepareRowWithUUIDs(row attributes, row data)

In prepareRowWithUUIDs(row attributes, row data)
  for each attribute in row attributes
    if the type is UUID
      use the old auto-incremented row data to hash it into a new UUID
  
  return row data

Here's the final script.

import * as fs from 'fs'
import * as path from 'path'
import models from '../../src/infra/models'
import { get } from 'lodash'
const createUUID = require('uuid-by-string')

const modelOrder = [
  'User', // first table ever created
  'Product',
  // ...
  // ... more tables
  // last table
]

const OpenDataFile = (fileName) => {
  return JSON.parse(fs.readFileSync(path.join(__dirname, `out/${fileName}.json`), 'utf8'));
}

class MigrateToUUID {
  private models: any;

  constructor () {
    this.init();
  }

  async init () {
    // Use the sequelize models imported at the top of the file
    this.models = models;

    for (let modelName of modelOrder) {
      const model = this.models[modelName];
      const data = OpenDataFile(modelName);
      console.log(`Inserting data into ${modelName}...`)
      await this.insertDataToTable(model, data, modelName);
    }

    process.exit(0);
  }

  prepareRowWithUUIDs (rowAttributes: Object, rowData: Object): any {
    for (const attrName of Object.keys(rowAttributes)) {
      const attr = rowAttributes[attrName];

      if (get(attr, 'type.key') === "UUID") {
        const oldId = rowData[attrName];
        // Leave nullable foreign keys as-is
        if (oldId === null || oldId === undefined) continue;
        // Hash the old auto-incremented id into a deterministic UUID so that
        // foreign keys still line up with the primary keys they reference
        rowData[attrName] = createUUID(String(oldId));
      }
    }

    return rowData;
  }

  async insertDataToTable (modelInstance: any, data: any[], modelName: string) {
    for(let row of data) {
      row = this.prepareRowWithUUIDs(modelInstance.attributes, row);
      try {
        await modelInstance.create(row);
      } catch(err) {
        console.log(`Error occurred in model => ${modelName}`);
        console.log(err);
      }
    }
  }
}

new MigrateToUUID();

Finally, at that point, all I had to do was make sure everything still worked and swap out the old production database for the new one using UUIDs.

Overall, this process was pretty painful.

If anyone knows of a better way (or better tooling) to accomplish something like this, drop a comment so the next person doesn't have to go through the same kind of hell.

Discussion about UUID Performance

I was curious about the performance of UUIDs compared to ints and whether I should bother worrying about the performance tradeoffs right now.

After a good 10 minutes browsing the web and seeing engineers split between speaking negatively and positively about UUIDs, I figured I'd open my own Twitter thread.

I was delighted that some really smart domain modelers, including Vaughn Vernon himself, were able to share their experiences.

My original Twitter thread.

The comment that really eased me on this was:

https://twitter.com/bharam5/status/1120158050894131200

This was the whole reason why I wanted to use UUIDs: so that I didn't have to rely on a round-trip to the DB, and so that I'd actually be able to dispatch Domain Events[3] containing the entity id as part of the payload.

For my needs, and I'd wager for most people's needs getting started, it makes sense to defer unnecessary optimization.

Based on the discussion, a smart thing to do when necessary would be to compact the UUIDs or implement some caching (I like this suggestion; it seems like a really good use case for caching).

Further reading on UUID performance and optimization.

  • Storing UUIDs in an Optimized way
  • Storing UUID Values in MySQL Tables
  • MySQL/InnoDB - Clustering Index with UUID
  • MySQL UUID Smackdown: UUID vs. INT for Primary Key

Sequelize ORM

I use the Sequelize ORM for most of my Node/TypeScript projects so I was curious about the type of support available for these optimizations.

It looks like there's an open PR in the Sequelize repo to support binary(16) UUIDs.

This would help compact the UUID from a CHAR(36) down to a BINARY(16) so that it takes up less space.
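Until something like that lands in the ORM, the conversion itself is simple enough to do by hand at the mapping layer. Here's a minimal, ORM-independent sketch (my own illustration) of packing a UUID string into 16 bytes and back with Node's Buffer:

// Pack a 36-char UUID string into a 16-byte Buffer and back.
// Wiring this into a BINARY(16) column (getters/setters, mappers) is
// left to the persistence layer.
const uuidToBinary = (id: string): Buffer => {
  return Buffer.from(id.replace(/-/g, ''), 'hex');
};

const binaryToUuid = (buf: Buffer): string => {
  const hex = buf.toString('hex');
  return [
    hex.substring(0, 8),
    hex.substring(8, 12),
    hex.substring(12, 16),
    hex.substring(16, 20),
    hex.substring(20)
  ].join('-');
};

// Example
const id = '110ec58a-a0f2-4ac4-8393-c866d813b8d1';
const packed = uuidToBinary(id);           // 16 bytes instead of 36 chars
console.log(binaryToUuid(packed) === id);  // true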

This is part of the Domain-Driven Design w/ TypeScript & Node.js course. Check it out if you liked this post.


  1. The Domain Layer is the center of our Layered Architecture. It has zero dependencies on anything other than what belongs in the domain. This layer is encapsulated and doesn't know anything about web servers, HTTP, REST, databases, caching, or any other infrastructure or frameworks.

  2. I probably have the world record for the most unanswered questions on Stack Overflow.

  3. Domain Events are an excellent way to execute post-entity-creation tasks like, after adding an item to a Schedule in a scheduling application, sending an email to the affected parties. It allows the application service that sends emails to be decoupled from the Schedule domain entity (Schedule shouldn't need to know anything about emailing), yet still execute some code in response to a relevant domain event.


Discussion


Rahul
4 years ago

Great stuff again! Maybe I haven't gotten far enough to learn this yet, but would DB persistence then just be an event that's tied to entity creation?

Khalil Stemmler
4 years ago

The sequence for creating an entity goes like this:


  • Create an entity (like User) by calling User.create(props: UserProps).
  • When the entity is created, a unique UUID is assigned to it.
  • Pass the entity to a UserRepository.
  • The UserRepository uses a Mapper to structure the entity into the JSON object needed for the ORM to save it to the database (like `UserMap.toSequelizeProps(user: User): any` or `UserMap.toTypeORM(user: User): any`), including the UUID used for the primary key field.


This approach is better than relying on the database to come up with the unique primary key (auto-incremented). If we did it that way, we would need to interact with the database at Step 1 of that sequence I described in order to get a unique ID for the entity. A lot simpler to let the application decide on the unique ID.
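In code, that sequence looks roughly like this (a sketch building on the `User` entity sketched earlier in the post; `UserMap` and `SequelizeUserRepo` are illustrative names):

class UserMap {
  public static toSequelizeProps (user: User): any {
    return {
      user_id: user.id, // the app-generated UUID becomes the primary key
      email: user.props.email,
      name: user.props.name
    };
  }
}

class SequelizeUserRepo {
  constructor (private userModel: any) {}

  public async save (user: User): Promise<void> {
    const raw = UserMap.toSequelizeProps(user);
    await this.userModel.create(raw);
  }
}

// 1. Create the entity; the UUID is assigned in the domain layer.
const user = User.create({ email: 'khalil@example.com', name: 'Khalil' });
// 2-4. Pass it to the repository, which maps and persists it, e.g.:
// await new SequelizeUserRepo(models.User).save(user);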

John Angel
3 years ago

Hi! Thanks for all the content you provide to the community :) I'm a pretty junior dev but I liked this software architecture thing and aww all your content is gold for me :D Thanks!!

Quick question... maybe a dumb one: would you use a UUID as well for MongoDB databases to replace the auto-generated `_id` (ObjectId)?

Thanks again!

James
2 years ago

Hi, would you have the repository of this project? I'm going through the same situation, but I was confused by a few of the steps.

Ali
a year ago

After having built a fairly complex application with UUIDs as the sole primary keys and later switching them back to ints (while still keeping UUIDs for public-facing ids)... that was painful.


You see, UUIDs work fairly fine if you are just going to call the db for fairly simple things, i.e. "fetch me this by id". However, this whole thing starts to fall apart when you have complex joins or a lot of inserts/updates, due to the nature of UUIDs.

Now, among many things, the indexing cannot be ordered, and that is going to affect inserts and updates. Besides, it is going to take a lot more resources, because UUIDs are bulky, and your joins will suffer a lot!


In most applications, you wouldn't really mind having slow updates or inserts, but sacrificing on joins... they are quite important, especially if you want to keep your application fast without too many roundtrips and in-memory calculations, where the db is almost always going to win. And having a crippled system defeats any "elegant" solution that anyone can muster!


Now one could argue that they will go with NoSQL, where everything is "repeated". Sure. You can. But you will still need aggregations at some point (not talking about DDD aggregates), and that will be even slower due to the database being inherently slower at such tasks.


Anyways, I hope it worked out for you. Unfortunately for me, while I love the UUIDs, I don't use them as PKs anymore.

Daniel Overdevest
a year ago

For people who like the database approach:


I created a stored procedure for MSSQL, but I'm sure you can port it to other RDBMSs.


ALTER PROCEDURE api.sp_id_to_guid @tableName nvarchar(255)
AS
BEGIN
	declare @SQL nvarchar(4000)
	
	--= Add new GUID PK as id_new
	SELECT @SQL = 'ALTER TABLE ' + @tableName + ' ADD id_new UNIQUEIDENTIFIER DEFAULT newid() NOT null';
	EXEC(@SQL);

	--= Delete PK constraints
	declare @const nvarchar(200)
	set @const = QUOTENAME((
	 select kc.name
	 from sys.key_constraints as kc
	 where kc.parent_object_id = object_id(@tableName) and kc.type = 'PK'
	))
	SELECT @const
	exec('ALTER TABLE ' + @tableName + ' DROP CONSTRAINT ' + @const)

	--=Drop original Id field
	SELECT @SQL = 'ALTER TABLE ' + @tableName + ' DROP COLUMN Id';
	EXEC(@SQL);
		
	--= Rename id_new column as Id
	EXEC('sp_RENAME ''' + @tableName + '.id_new'', ''Id'', ''COLUMN''');

	--= Add PRIMARY KEY
	SELECT @SQL = 'ALTER TABLE ' + @tableName + ' ADD CONSTRAINT [PK_' + @tableName + '] PRIMARY KEY (Id)';
	EXEC(@SQL);
END

