Handling Collections in Aggregates (0-to-Many, Many-to-Many) - Domain-Driven Design w/ TypeScript

Last updated Jul 25th, 2019
In this article, we discuss how we can use a few CQS principles to handle unbounded 0-to-many or many-to-many collections in aggregates when designing web applications for performance.

The aggregate design article I wrote was definitely my most in-depth article yet. And that's because it's such a big topic.

In response to the article, I was asked a really good question about performance on collections. Check it out:

"I would like ask a question regarding the Artist-Genres (1-m) relationship.

In your example you limit the number of Genres an artist can have, but what do you do if there is no such limit?

Do you load all related Genres when initializing a new Artist entity?

Let's say there is a Post-Comment (1-m) relation where a Post can have hundreds or even thousands of Comments. When you have a getPost useCase, do you also load all Comments?

How do we handle when a collection will grow out of scope?"

Really good question and a valid concern. Let's get into it.


Let's visualize the Post and Comment classes.

interface PostProps {
  // WatchedList is a custom utility I made that 
  // encapsulates a collection/array. It's able to
  // tell when items are initial vs. newly added.
  comments: WatchedList<Comment>;
}

export class Post extends AggregateRoot<PostProps> {
  
  get comments (): Comment[] {
    return this.props.comments.currentItems();
  }

  private constructor (props: PostProps, id?: UniqueEntityID) {
    super(props, id);
  }
  ...
}

So by this design, there are actually 0-to-many Comments for a Post with no domain logic restricting an upper bound.

If every time we want to perform an operation on a post we have to retrieve every Comment for it, our system simply won't scale.

How do we remedy this?

CQS (Command Query Separation)

When we first start learning about DDD, we often run into terms like CQS, CQRS and Event Sourcing.

These topics can explode into complexity for developers just getting started with DDD, so I'm going to attempt to keep it as pragmatic as possible for relatively simple DDD projects (that might be contradictory - DDD is needed when our projects are complex 🤪).

Here's what's important for you to know now: CQS (Command Query Separation).

Fowler's explanation is that "we should divide an object's methods into two sharply separated categories:"

  • Queries: Return a result and do not change the observable state of the system (are free of side effects).
  • Commands: Change the state of a system but do not return a value.
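
To make the split concrete, here's a minimal sketch. CommentCounter is a hypothetical class made up purely for illustration (it isn't part of the blog codebase): getCount() is a query and increment() is a command.

```typescript
// Hypothetical example class, used only to illustrate CQS.
class CommentCounter {
  private count = 0;

  // Query: returns a result and leaves the observable state untouched.
  public getCount (): number {
    return this.count;
  }

  // Command: changes state and returns nothing.
  public increment (): void {
    this.count += 1;
  }
}

const counter = new CommentCounter();
counter.increment();
counter.increment();

// Calling the query any number of times never changes the answer.
console.log(counter.getCount(), counter.getCount()); // 2 2
```

Because the query has no side effects, we can call it as often as we like; only the command moves the system from one state to the next.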

Let's talk about commands first.

Commands

If we think about how we design our web applications, this is pretty much how we think of things when we do CRUD.

With respect to things that web developers are concerned about, here are some command-like equivalent terms:

  • CRUD: Create, Update, Delete
  • HTTP REST Methods: POST, PUT, DELETE, PATCH
  • Our Blog subdomain use cases: CreatePost, UpdatePost, DeletePost, PostComment, UpdateComment

These are writes. Writes make changes to the system in some way.

To illustrate, let's build the PostComment use case.

PostComment Use Case - Command

interface PostCommentRequestDTO {
  userId: string;
  postId: string;
  html: string;
}

export class PostCommentUseCase implements UseCase<PostCommentRequestDTO, Promise<Result<any>>> {
  private postRepo: IPostRepo;

  constructor (postRepo: IPostRepo) {
    this.postRepo = postRepo;
  }

  public async execute (request: PostCommentRequestDTO): Promise<Result<any>> {
    const { userId, postId, html } = request;

    try {
      // Retrieve the post
      const post: Post = await this.postRepo.findPostByPostId(postId);

      // Create a comment
      const commentOrError: Result<Comment> = Comment.create({ 
        postId: post.postId, 
        userId: UserId.create(userId),
        html
      });

      if (commentOrError.isFailure) {
        return Result.fail<any>(commentOrError.error);
      }
      // Get the comment from the result
      const comment: Comment = commentOrError.getValue();

      // Add a comment
      // => This adds the comment to the post's watched list
      post.addComment(comment);

      ...

    } catch (err) {
      console.log(err);
      return Result.fail<any>(err);
    }
  }
}

And here's addComment(comment: Comment): void from within Post.

interface PostProps {
  comments: WatchedList<Comment>;
}

export class Post extends AggregateRoot<PostProps> {
  
  get comments (): Comment[] {
    return this.props.comments.currentItems();
  }

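  public addComment (comment: Comment): void {
    // Add to the underlying WatchedList<Comment> so it's tracked as a
    // new item. (The `comments` getter returns a plain Comment[] copy,
    // so we go through this.props instead.)
    this.props.comments.add(comment);
  }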

  private constructor (props: PostProps, id?: UniqueEntityID) {
    super(props, id);
  }
  ...
}

In the code above, we've created a PostCommentUseCase where we retrieve the Post domain entity from the repo and use the Post domain model to post a comment with post.addComment(comment).

Let's stop right there for a sec...

When we retrieved the Post domain model, did we also retrieve all (possibly hundreds of) comments?

No.

Why not?

Well, we could set a limit on the number of Comments we return initially from our repo.

For example, our createBaseQuery() method in the PostRepo could look like this:

export class PostRepo implements IPostRepo {
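  private createBaseQuery (): any {
    // The ORM models are held on the repo (see this.models below)
    const models = this.models;
    return {
      where: {},
      include: [
        {
          model: models.Comment,
          as: 'Comment',
          // Only pull in the 5 most recent comments
          limit: 5,
          // Each ordering is its own [column, direction] pair
          order: [['date_posted', 'DESC']]
        }
      ]
    }
  }

  ...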
}

This would have the effect of returning the 5 most recent comments.

But don't we have to return all of the Comments in this Post? Doesn't that ruin our Post domain model?

No, it doesn't.

My question is, for this PostCommentUseCase (which we've identified as a COMMAND), did we need to have all the comments in order to execute it?

Is there some invariant that we need to enforce here on the comments in the list to post a new comment?

In the previous article, we looked at the fact that:

"...an 'aggregate' is a cluster of associated objects that we treat as a unit for the purpose of data changes." - Evans, 126

And in Vaughn Vernon's book, he says that:

...“When trying to discover the Aggregates, we must understand the model’s true invariants. Only with that knowledge can we determine which objects should be clustered into a given Aggregate. An invariant is a business rule that must always be consistent.” - Excerpt From: Vernon, Vaughn. “Implementing Domain-Driven Design.”

Emphasis on true invariants. There isn't any reason for us to need all of a Post's child Comments in order to execute this COMMAND.

That would only change if there were a rule limiting the total number of comments allowed on a post. Even then, unlike my Genres example in the previous article, if the upper bound was much higher (say, 6000), we might consider making totalComments: number a required member of the Post entity upon retrieval from the PostRepo.

A COUNT(*) WHERE post_id = "$" would be much more efficient than having to retrieve and reconstitute 6000 comments in memory in order to post a comment.
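
Here's a rough sketch of what that could look like. Everything in it is an assumption for illustration: the 6000 cap, the PostWithCount class, and the AddCommentResult type are made-up names. In the real model this logic would live on Post, with the count supplied by the PostRepo.

```typescript
// Hypothetical upper bound; 6000 is only the example figure from above.
const MAX_COMMENTS_PER_POST = 6000;

type AddCommentResult = { ok: true } | { ok: false; error: string };

// Simplified stand-in for the Post aggregate (assumed names throughout).
class PostWithCount {
  // Populated from a COUNT query instead of loading every comment.
  private totalComments: number;
  // Only the comments added during this operation are held in memory.
  private newComments: string[] = [];

  constructor (totalComments: number) {
    this.totalComments = totalComments;
  }

  public getTotalComments (): number {
    return this.totalComments;
  }

  // Enforces the invariant using only the count.
  public addComment (html: string): AddCommentResult {
    if (this.totalComments >= MAX_COMMENTS_PER_POST) {
      return { ok: false, error: 'Comment limit reached for this post.' };
    }
    this.newComments.push(html);
    this.totalComments += 1;
    return { ok: true };
  }
}

const almostFull = new PostWithCount(5999);
const first = almostFull.addComment('<p>Made it!</p>');
const second = almostFull.addComment('<p>Too late</p>');
console.log(first.ok, second.ok); // true false
```

The invariant is protected with a single number, so the aggregate stays small no matter how many comments the post already has.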


So let’s continue. I just pulled in Post and did post.addComment(comment). Next, we'll save it to the repo.

export class PostCommentUseCase implements UseCase<PostCommentRequestDTO, Promise<Result<any>>> {
  ... 
  public async execute (request: PostCommentRequestDTO): Promise<Result<any>> {
    const { userId, postId, html } = request;

    try {
      ...
      post.addComment(comment);

      // Save the post, cascading the save to the
      // comments repo as well for any new comments
      await this.postRepo.save(post);

      return Result.ok<any>();

    } catch (err) {
      console.log(err);
      return Result.fail<any>(err);
    }
  }
}

When I do postRepo.save(post), it’ll pass any new comments in the Post model to the commentRepo and save them like we did last time.

Nice.
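
If you're wondering how the save can tell which comments are new, here's a minimal sketch of the idea. SimpleWatchedList, InMemoryCommentRepo, and saveNewComments are hypothetical, stripped-down stand-ins (the real WatchedList and repos do more); the point is only that the list remembers what was added after loading, so the repo never needs the full collection.

```typescript
// Hypothetical, simplified stand-in for the WatchedList utility.
class SimpleWatchedList<T> {
  private initialItems: T[];
  private newItems: T[] = [];

  constructor (initialItems: T[] = []) {
    this.initialItems = [...initialItems];
  }

  // Anything added after construction is tracked as new.
  public add (item: T): void {
    this.newItems.push(item);
  }

  public getNewItems (): T[] {
    return [...this.newItems];
  }

  public currentItems (): T[] {
    return [...this.initialItems, ...this.newItems];
  }
}

interface CommentRecord {
  commentId: string;
  html: string;
}

// In-memory stand-in for a comment repository (assumed API).
class InMemoryCommentRepo {
  public saved: CommentRecord[] = [];

  public async saveBulk (comments: CommentRecord[]): Promise<void> {
    this.saved.push(...comments);
  }
}

// What a post repo's save might delegate to: persist only the new items.
async function saveNewComments (
  comments: SimpleWatchedList<CommentRecord>,
  repo: InMemoryCommentRepo
): Promise<void> {
  await repo.saveBulk(comments.getNewItems());
}

// Two comments came back from the (limited) query; one is added by the command.
const watched = new SimpleWatchedList<CommentRecord>([
  { commentId: 'c1', html: '<p>Existing</p>' },
  { commentId: 'c2', html: '<p>Also existing</p>' }
]);
watched.add({ commentId: 'c3', html: '<p>Brand new</p>' });

const commentRepo = new InMemoryCommentRepo();
saveNewComments(watched, commentRepo)
  .then(() => console.log(commentRepo.saved.map((c) => c.commentId)));
```

Only c3 is handed to the repo, regardless of whether the post had 2 comments loaded or 2000 sitting in the database.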

Let’s flip it around to some READs now.

Reads

Let's say that I'm working on creating the API call to return the Post as a resource.

Getting a Post by Id

The API call might look like this:

  • GET /post/:id

And the GetPostByIdUseCase simply retrieves that post.

interface GetPostByIdRequestDTO {
  postId: string;
}

interface GetPostByIdResponseDTO {
  post: Post;
}

export class GetPostByIdUseCase implements UseCase<GetPostByIdRequestDTO, Promise<Result<GetPostByIdResponseDTO>>> {
  private postRepo: IPostRepo;

  constructor (postRepo: IPostRepo) {
    this.postRepo = postRepo;
  }

  public async execute (request: GetPostByIdRequestDTO): Promise<Result<any>> {
    const { postId } = request;

    try {
      // Retrieve the post
      const post: Post = await this.postRepo.findPostByPostId(postId);

      // Return it, wrapped in the response DTO
      return Result.ok<GetPostByIdResponseDTO>({ post });
    } catch (err) {
      console.log(err);
      return Result.fail<any>(err);
    }
  }
}

And the PostRepo only returns the 5 most recent Comments in the post by default.

export class PostRepo implements IPostRepo {
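  private createBaseQuery (): any {
    const models = this.models;
    return {
      where: {},
      include: [
        {
          model: models.Comment,
          as: 'Comment',
          // Only pull in the 5 most recent comments
          limit: 5,
          // Each ordering is its own [column, direction] pair
          order: [['date_posted', 'DESC']]
        }
      ]
    }
  }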

  public async findPostByPostId (postId: PostId | string): Promise<Post> {
    const PostModel = this.models.Post;
    const query = this.createBaseQuery();
    query.where['post_id'] = (
      postId instanceof PostId ? (<PostId>postId).id.toValue() : postId
    );
    const post = await PostModel.findOne(query);
    if (!!post) return PostMap.toDomain(post);
    return null;
  }
}

That should be enough for the first call. And you could even tune that if you like.

What about retrieving the rest of the resource? Namely, the Comments.

Getting a Post's Comments by Post Id

Assume I'm reading the post via the UI and I start to scroll down. What happens if this post has over 1000 comments? What do we do now?

If we had some slick fetch-on-scroll functionality, we could make some async API calls on-scroll.

To fetch more comments, the API call might look like:

  • GET /post/:id/comments?offset=5

We could create a GetCommentsByPostId use case.

interface GetCommentsByPostIdRequestDTO {
  postId: string;
  offset: number;
}

interface GetCommentsByIdResponseDTO {
  comments: Comment[];
}

export class GetCommentsByPostIdUseCase implements UseCase<GetCommentsByPostIdRequestDTO, Promise<Result<GetCommentsByIdResponseDTO>>> {

  private commentsRepo: ICommentsRepo;

  constructor (commentsRepo: ICommentsRepo) {
    this.commentsRepo = commentsRepo;
  }

  public async execute (request: GetCommentsByPostIdRequestDTO): Promise<Result<any>> {
    const { postId, offset } = request;

    try {
      // Retrieve the comments
      const comments: Comment[] = await this.commentsRepo.findCommentsByPostId(postId, offset);

      // Return them
      return Result.ok<GetCommentsByIdResponseDTO>({
        comments
      });

    } catch (err) {
      console.log(err);
      return Result.fail<any>(err);
    }
  }
}
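
And the CommentsRepo pages through the comments 5 at a time.

export class CommentsRepo implements ICommentsRepo {
  private createBaseQuery (): any {
    return {
      where: {},
      limit: 5,
      // Keep the ordering stable so each offset returns the next page
      order: [['date_posted', 'DESC']]
    }
  }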

  public async findCommentsByPostId (postId: PostId | string, offset?: number): Promise<Comment[]> {
    const CommentModel = this.models.Comment;
    const query = this.createBaseQuery();
    query.where['post_id'] = (
      postId instanceof PostId ? (<PostId>postId).id.toValue() : postId
    );
    query.offset = offset ? offset : 0;
    const comments = await CommentModel.findAll(query);
    return comments.map((c) => CommentMap.toDomain(c));
  }
}

While we still use our reference to the post through postId, we go straight to the comments repository to get what we need for this query.
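
To see how the offsets line up with fetch-on-scroll, here's a small in-memory sketch. getPage and the comment ids are hypothetical; the slice stands in for what limit and offset do in the repo query, and it assumes the comments are kept in one stable order (newest first).

```typescript
// Matches the limit of 5 used in the queries above.
const PAGE_SIZE = 5;

// Hypothetical helper: returns one page, like LIMIT/OFFSET would.
function getPage<T> (items: T[], offset: number = 0, limit: number = PAGE_SIZE): T[] {
  return items.slice(offset, offset + limit);
}

// 12 made-up comment ids, assumed to already be sorted newest first.
const allCommentIds: string[] = Array.from({ length: 12 }, (_, i) => `comment-${i + 1}`);

// GET /post/:id already includes the first 5 comments.
const firstPage = getPage(allCommentIds, 0);

// Each scroll asks for the next offset: however many we've loaded so far.
const secondPage = getPage(allCommentIds, firstPage.length);
const thirdPage = getPage(allCommentIds, firstPage.length + secondPage.length);

// A page shorter than PAGE_SIZE tells the client there's nothing left.
const hasMore = thirdPage.length === PAGE_SIZE;

console.log(firstPage.length, secondPage.length, thirdPage.length, hasMore); // 5 5 2 false
```

The client only ever needs to remember how many comments it has loaded; that number becomes the offset in GET /post/:id/comments?offset=... on the next scroll.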

Forum conversation about leaving out the Aggregate for querying

From StackExchange,

"Don't use your Domain Model and aggregates for querying.

In fact, what you are asking is a common enough question that a set of principles and patterns has been established to avoid just that. It is called CQRS."

"'I can't imagine that anyone would advocate returning entire aggregates of information when you don't need it.' I'm trying to say that you are exactly correct with this statement. Do not retrieve an entire aggregate of information when you do not need it. This is the very core of CQRS applied to DDD. You don't need an aggregate to query. Get the data through a different mechanism (a repo works nicely), and then do that consistently."

Takeaway

  • If there's an invariant / business rule that needs to be protected by returning all of the elements in an associated collection under an aggregate boundary, return them all (like the case with Genres).
  • If there's no underlying invariant / business rule to protect by returning all unbounded elements in an associated collection under an aggregate boundary, don't bother returning them all for COMMANDS.
  • Execute QUERIES directly against the repos (or consider looking into how to build Read Models).
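
For the last point, here's one way a read model could be sketched. CommentRow, CommentView, and queryCommentViews are all hypothetical names, and the in-memory array stands in for a real database query. The idea is that a query maps rows straight to a view shaped for the UI, without reconstituting Post or Comment domain objects.

```typescript
// Hypothetical raw row shape, as it might come back from the database.
interface CommentRow {
  comment_id: string;
  post_id: string;
  user_id: string;
  html: string;
  date_posted: string; // ISO 8601 timestamp
}

// Hypothetical read model: just what the UI needs to render a comment.
interface CommentView {
  commentId: string;
  authorId: string;
  html: string;
  postedAt: string;
}

function toCommentView (row: CommentRow): CommentView {
  return {
    commentId: row.comment_id,
    authorId: row.user_id,
    html: row.html,
    postedAt: row.date_posted
  };
}

// Stand-in for a query: filter by post, newest first, one page at a time.
function queryCommentViews (
  rows: CommentRow[],
  postId: string,
  offset: number = 0,
  limit: number = 5
): CommentView[] {
  return rows
    .filter((row) => row.post_id === postId)
    .sort((a, b) => b.date_posted.localeCompare(a.date_posted))
    .slice(offset, offset + limit)
    .map(toCommentView);
}

const commentRows: CommentRow[] = [
  { comment_id: 'c1', post_id: 'p1', user_id: 'u1', html: '<p>First</p>', date_posted: '2019-07-01T10:00:00Z' },
  { comment_id: 'c2', post_id: 'p1', user_id: 'u2', html: '<p>Second</p>', date_posted: '2019-07-02T10:00:00Z' },
  { comment_id: 'c3', post_id: 'p2', user_id: 'u1', html: '<p>Other post</p>', date_posted: '2019-07-03T10:00:00Z' }
];

const commentViews = queryCommentViews(commentRows, 'p1');
// Newest first: c2, then c1. The comment on p2 is filtered out.
console.log(commentViews.map((view) => view.commentId));
```

No aggregate is loaded and no invariants are checked, because a query changes nothing.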

Discussion

8 Comments

Leo
4 years ago

Hi Khalil, as always, nice article and lecture. Thanks for that!


Now, adding more complexity to this use case... Let's say we want to setup nested comments and we have the following models:

// a post
{ 
  userId: 1,
  postId: 100,
  html: '...'
}

// a comment post
{
  userId: 2,
  commentId: 500,
  parent: { type: 'POST', id: 100 },
  html: '...'
}

// a nested comment for the above comment
{
  userId: 3,
  commentId: 501,
  parent: { type: 'COMMENT', id: 500 },
  html: '...'
}
  


How would you define the aggregates and commands for this case? I imagine the command for nested comments would require the top post parent Id, not only the comment parent id, so we can find the post easier... not sure yet, but again this case can have thousands of comments and nested trees.


Thank you!


Khalil Stemmler
4 years ago

Hey Leo, thanks for asking this question!


This is the fun stuff.


Bit of an aside, but I'm reminding myself of when I was in high school and I used to always call my friend Kirk to help me with math. Every time I'd call him, it was like he was fiending to do some math, and it'd weird me the hell out. Nerd :p


Anyways, that's me now with this stuff.


For writes (POST /api/comments/new):

I think the aggregate is probably the `Post` model where we either change `addComment(comment: Comment)` to `addComment(comment: Comment, parentComment?: Comment)` OR we create `postReply(comment: Comment, parentComment: Comment)`.


Here's what the use case might look like. I also used functional error handling techniques from a previous article.


export class PostCommentUseCase implements UseCase<PostCommentRequestDTO, Promise<Result<any>>> {
  ... 
  public async execute (request: PostCommentRequestDTO): Promise<Result<any>> {
    const { userId, postId, html } = request;

    try {
      // ... retrieve the post and create the comment, as in the use case above

      // Assumes PostCommentRequestDTO gains an optional parentCommentId,
      // so we read it from the payload here
      const { parentCommentId } = request;
      if (parentCommentId) {
        const parentComment: Comment = await this.postRepo.getCommentByCommentId(parentCommentId);
        const parentCommentFound = !!parentComment;

        // If the comment doesn't exist, let's return a Use Case error
        // like we do in the functional error handling article:
        // https://khalilstemmler.com/articles/enterprise-typescript-nodejs/functional-error-handling/

        if (!parentCommentFound) {
          return left(new PostCommentErrors.ParentCommentNotFoundError()) as Response;
        }

        // A new method for replying to comments.
        post.addReply(comment, parentComment);
      } else {
        post.addComment(comment);
      }

      // Save the post, cascading the save to the
      // comments repo as well for any new comments
      await this.postRepo.save(post);

      return right(Result.ok<any>()) as Response;

    } catch (err) {
      console.log(err);
      return left(Result.fail<any>(err)) as Response;
    }
  }
}


Khalil Stemmler
4 years ago

And you pretty much did the same thing I would do with the comment props. I would simplify the `CommentProps` as:


interface CommentProps {
  userId: UserId;
  commentId: CommentId;
  parentCommentId?: CommentId; // (optional)
  html: CommentText; // could be a value object to restrict size and validate
}


I think this relationship keeps it pretty linear, and doesn't rely on needing to know the entire tree of comments. Since there are no invariant rules to satisfy, you only need to add a new comment to the post aggregate with an optional parent comment id then save it in order to complete the PostComment command.

Pato
4 years ago

Your articles are always a nice read Khalil: very informative and well articulated.


I'm definitely looking forward for more!

Patryk
3 years ago

Let's say that each comment can be updated by the user who created it. How would you handle such a case?


I am just starting my journey with DDD and thanks to your articles, many things become more understandable. Thanks a lot for that!

Theu
3 years ago

as watchedList is for this type of case?

{
  "desc":  "hey",
  "id": 1,
  "user_id": "2",
  "comments": [...] // ??
}

Driekwartappel
3 years ago

Thanks for the article, but I didn't quite grasp this part:

"When I do postRepo.save(post), it’ll pass any new comments in the Post model to the commentRepo and save them like we did last time."


How does "postRepo.save(post)" know that we have added comments to the post? By checking whether the post comment doesn't have an ID?


Which then brings me to my more important question:

How would this work when editing or deleting a comment? How would "postRepo.save(post)" know we removed/edited a comment if we don't fetch all the comments?


przemek
2 years ago

First of all, thank you for sharing your knowledge. This is one of the most informative articles about the subject I have found so far. I would like to know your opinion about one aspect, which I would call DDD vs. scaling/performance. When you want to add a Comment to a Post, it seems that going through Post is not necessary (technically speaking). We do not have to fetch the Post just to check that it exists. We can simply rely on the ForeignKey defined in the DB, which doesn’t allow us to save a Comment when it contains the wrong post identifier.

Robert St. John
a year ago

Thank you for the excellent articles. I have a couple of questions related to this topic in particular. First, if, as you cited from Vernon, aggregates should reflect true invariants, and there is no invariant that necessitates all comments be known before appending a new comment to a post, should Comments then become a new aggregate, and not fetched along with the associated post? As Vernon also says in chapter 10, "design small aggregates." Second, how do the repositories you describe implement the save() operation? As save() takes a domain object rather than an ORM instance with dirty checking, that seems to necessitate the repository fetches the existing object, then perform a deep difference between the given domain instance and the fetched instance in order to know what to update. That sounds a bit difficult when the domain aggregate actually does have a nested collection of child entities, especially when considering whether order matters, and that perhaps the update to the aggregate could involve one or more updates to existing child entities. I wonder further, when using a functional paradigm and immutable data types, do updates like that involve very expensive deep copies as opposed to, for example, an in-place update of a child entity in the aggregate's array? How do you approach such updates when implementing repositories?

Sang
7 months ago

Hi Khalil, this post is very useful and gave me a new way of looking at developing software. I'm learning about DDD and Event Sourcing. I have a few questions. Do we save entities and value objects belonging to an aggregate root in one event store stream? Or do we save entities in separate streams? If we save them in one stream, then in the case of "Do you load all related Genres when initializing a new Artist entity?" with Event Store, we have to read all events to initialize an Artist. If we have hundreds of Genres, does it affect performance? Again, thank you for sharing!



About the author

Khalil Stemmler,
Software Essentialist ⚡
