Sunday, December 6, 2015

Enhance scalability with DDD. Part 1: Effective aggregates.

Modelling relations between entities in an object-oriented way may at first seem like quick and relatively easy work, especially when developers are empowered with an ORM tool such as Hibernate. But among all other design flaws, this one very often leads to poor performance and scalability bottlenecks. The best way to explain why is with a simple and straightforward example, which allows the reader to grasp the general idea without being overloaded with proprietary domain knowledge. In this post I'm going to touch only on modelling basics, so an experienced modeler won't find it much help.

Let's consider a user management domain described by the following rules:

1. Every user has a name. The name is 5 to 15 characters long;

2. The name is changeable and can be updated many times;

3. A user is one of two types: Parent or Child. Unlike a Child, a Parent user can have zero or many Children. Please see Figure 1.

4. The model should allow adding Child users to a Parent one.


  Figure 1: "one-to-many" association between Parent and Child users.


To warm up, let's have a look at the first two rules, where our task is to model a User entity with a name constrained by a minimum and maximum number of characters. The name could be represented by the primitive String type:
public class User {
  private String name;
  ...
  public void changeName(String name) {
    this.name = name;
  }
  ...
}

Actually, it's preferable to wrap the String in an immutable Value Object; let's call it Name:
public class Name {
  private final String name;

  public Name(String name) {
    validateLength(name);
    this.name = name;
  }

  // domain rule #1: the name has 5 to 15 characters
  private static void validateLength(String name) {
    if (name == null || name.length() < 5 || name.length() > 15) {
      throw new IllegalArgumentException("Name must be 5 to 15 characters long");
    }
  }
  // equals and hashCode omitted
}


This modelling option:
  • explicitly represents a domain concept, so the model is more readable and easier to reason about;
  • is aligned with the Single Responsibility Principle: the User class is not cluttered with name validation logic;
  • is aligned with the Open-Closed Principle: for example, if we want to extend the name with first and last names, we only have to modify the Name class, while the User class remains untouched.
For more information, please see this excellent talk by Dan Bergh Johnsson, where he gives further considerations on using the Value Object pattern instead of primitive data types.
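
To make the payoff concrete, here is a minimal usage sketch. The user variable is illustrative, and it assumes User is updated to accept a Name (as in the entity shown in the next section): invalid names are rejected at construction, so a User can never hold an invalid Name.

// A minimal usage sketch; the user variable is assumed to exist.
try {
    new Name("Bob");                  // too short: the constructor throws
} catch (IllegalArgumentException expected) { }

Name valid = new Name("Bob Smith");   // ok: 9 characters, within 5..15
user.changeName(valid);               // User performs no validation itself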

In order to model the last two domain rules, let's define all common functionality (including the name) in an abstract base class and implement the "one-to-many" relationship in concrete Parent and Child classes. A Hibernate solution with the hierarchy mapped to a single table with a discriminator column is below:

@Entity
@Table(name = "users")
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn(name = "usertype", discriminatorType = DiscriminatorType.STRING)
public abstract class User {
    @EmbeddedId
    private UserId id;

    @Embedded // assumes Name is marked @Embeddable
    private Name name;

    @Version
    private int version;

    public void changeName(Name name) {
        this.name = name;
    }
    ...
}



@Entity
@Table(name = "users")
@DiscriminatorValue("P")
public class ParentUser extends User {

    @OneToMany(fetch = FetchType.LAZY, mappedBy = "parent")
    private Set<ChildUser> childUsers = new HashSet<ChildUser>();
    ...
    public void addChild(ChildUser newChildUser) {
        this.childUsers.add(newChildUser);
    }
    ...
}


@Entity
@Table(name = "users")
@DiscriminatorValue("C")
public class ChildUser extends User {
    @ManyToOne
    @JoinColumn(name = "parent_id")
    private ParentUser parent;
    ...
}


@Embeddable
public class UserId {
    private String id;
    ...
}
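
Note that an identifier class used with @EmbeddedId must be Serializable and define value-based equals and hashCode. A fuller sketch of UserId could look like the following; the constructors and method bodies are my assumption, not part of the original snippet:

// A hedged sketch of a complete UserId identifier class.
@Embeddable
public class UserId implements Serializable {
    private String id;

    protected UserId() { } // no-arg constructor required by JPA

    public UserId(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof UserId)) return false;
        return id.equals(((UserId) other).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}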

 
The given implementation works well until Alice and Bob run into an issue.
Who are Alice and Bob?
They are the unlucky users of our model who suffer from its limited scalability. Let's imagine: Bob fetches a copy of a ParentUser object from the database and tries to change its name (please see Figure 2). At the same time, T1, Alice also fetches a copy of the same ParentUser object and tries to add a new ChildUser to it. A few seconds later, at T2, Bob has finished his operation and committed his transaction, whereas Alice is still in progress. Finally Alice manages to finish her job and tries to commit her own transaction at T3, but poor Alice gets an OptimisticLockingException with a message like: “ParentUser has been modified by Bob while you were working on it! Please reload ParentUser and do your operation again.”

Figure 2: Failed concurrent modification of the same user.
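
The conflict can be reproduced with two concurrent persistence contexts. Below is a minimal sketch, not production code; it assumes a configured EntityManagerFactory emf and an existing parentId and newChild, and it makes the aggregate-wide guard explicit by loading the root with LockModeType.OPTIMISTIC so the version is re-checked at commit.

// A minimal sketch of Bob's and Alice's conflicting transactions.
EntityManager bobEm = emf.createEntityManager();
EntityManager aliceEm = emf.createEntityManager();
bobEm.getTransaction().begin();
aliceEm.getTransaction().begin();

// T1: both load their own copy of the same aggregate root (same @Version value)
ParentUser bobsCopy = bobEm.find(ParentUser.class, parentId);
ParentUser alicesCopy = aliceEm.find(ParentUser.class, parentId,
        LockModeType.OPTIMISTIC); // version is re-checked when Alice commits

// T2: Bob corrects the misspelled name and commits; the version counter increments
bobsCopy.changeName(new Name("Robert Smith"));
bobEm.getTransaction().commit();

// T3: Alice adds a child and commits with a stale version: the commit fails
// with an OptimisticLockException (typically wrapped in a RollbackException)
alicesCopy.addChild(newChild);
aliceEm.getTransaction().commit();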

It is likely that the following dialog took place:
Alice: Bob, what did you do with that ParentUser?
Bob: I corrected a misspelling in the name!
Alice: Changing a name should NOT affect adding a new ChildUser!
Alice's conclusion is domain knowledge which we failed to model properly. Instead we modeled an object graph which includes two entities: ParentUser and ChildUser.
In his book Domain-Driven Design, Eric Evans calls this graph an Aggregate. An aggregate is a cluster of associated objects that we treat as a unit for the purpose of data changes.


Figure 3: Large cluster aggregate.


How to address this issue? 

Modifying the name and adding a new ChildUser belong to the same user concept, but they do not affect each other. In object-oriented words: they do not form the same invariant. Normally, in enterprise application development, we consider an invariant to be a business rule that must always be consistent. Since both operations have different consistency boundaries, they can be performed independently.

Eric Evans emphasized that complex associations between objects always come with a cost:

“It is difficult to guarantee the consistency of changes to objects in a model with complex associations. Invariants need to be maintained that apply to closely related groups of objects, not just discrete objects. Yet cautious locking schemes cause multiple users to interfere pointlessly with each other and make a system unusable.”

--Eric Evans. Domain-Driven Design

Vaughn Vernon suggests starting with the smallest possible aggregates and associating new entities only if a true invariant involving them is identified:
Rule: Design Small Aggregates:
What additional cost would there be for keeping the large-cluster Aggregate? Even if we guarantee that every transaction would succeed, a large cluster still limits performance and scalability.
Smaller Aggregates not only perform and scale better, they are also biased toward transactional success, meaning that conflicts preventing a commit are rare. This makes a system more usable. Your domain will not often have true invariant constraints that force you into large-composition design situations. Therefore, it is just plain smart to limit Aggregate size. When you occasionally encounter a true consistency rule, add another few Entities, or possibly a collection, as necessary, but continue to push yourself to keep the overall size as small as possible.
--Vaughn Vernon. Implementing Domain-Driven Design


Personally, I find Vaughn Vernon's book the most practical guide for getting started with DDD. The examples he provides are very simple and not overloaded with complex domain knowledge, so the reader can quickly pick up the concepts and apply them in their daily work.


Second attempt! A domain-driven solution.

Getting deeper into the domain, we've discovered that the two operations, changing the name and adding a new Child user, do not form the same invariant and have different consistency boundaries. The referencing entity ParentUser and the referenced entity ChildUser must not both be modified in the same transaction; only one or the other may be modified in a single transaction. Thus, they should not be guarded by the same concurrency control mechanism, which in our case is optimistic concurrency control implemented by a version counter. On the other hand, the ParentUser and ChildUser entities are logically connected, and somehow we have to keep track of this fact. How can we model such a dualistic nature?

Now consider the alternative model in Figure 4 and its implementation below. This time the ParentUser entity no longer holds an object reference to a set of ChildUsers; on the other side, the ChildUser is also free of a reference to its ParentUser object. Instead it has a simple property, the parent id, which logically relates the two entities. Thus, having completely removed the bidirectional object references between the two entities, we get two separate aggregates, each encompassing only one entity.



Figure 4. ParentUser and ChildUser are modeled as separate aggregates.


@Entity
@Table(name = "users")
@DiscriminatorValue("P")
public class ParentUser extends User {

 // removed set of ChildUsers 

}



@Entity
@Table(name = "users")
@DiscriminatorValue("C")
public class ChildUser extends User {
    // removed the ParentUser object reference

    @Embedded
    @AttributeOverride(name = "id", column = @Column(name = "parent_id"))
    private UserId parentUserId; // added a simple plain property instead

    private ChildUser(Name name, UserId parentUserId) {
        changeName(name); // name is private in User, so use its public method
        this.parentUserId = parentUserId;
    }
    ...
}

This technique is called a Disconnected Domain Model, and it's actually a form of lazy loading. We reference the ParentUser from the ChildUser only by its globally unique identity, UserId, not by holding a direct object reference.
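
Since there is no longer an object reference to traverse, a parent's children are fetched on demand through a repository query by identity. A minimal sketch of such a repository is below; the class name and the JPQL query are my illustration, not part of the original model:

// A hedged sketch: children are looked up by the parent's identity.
public class ChildUserRepository {

    @PersistenceContext
    private EntityManager em;

    public List<ChildUser> childrenOf(UserId parentUserId) {
        return em.createQuery(
                "select c from ChildUser c where c.parentUserId = :pid",
                ChildUser.class)
                .setParameter("pid", parentUserId)
                .getResultList();
    }
}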

With this model, Bob and Alice can perform their operations simultaneously:
Figure 5: Parallel modification of the same user. 

So, we've solved the transaction failure issue by modeling it away. Any ParentUser and ChildUser instances can now be manipulated by simultaneous user requests. That's pretty simple!

Since our aggregates don't hold direct references to each other, their persistent state can be moved around to reach large scale. For example, imagine a domain with a relatively small number of ParentUsers and an ever-growing number of ChildUsers; it's obvious that we would like to use some NoSQL solution which supports distributed storage with different scaling options for each entity. Thus, reference by identity plays an important role when it comes to large scale, especially in the NoSQL world. It's really crucial to treat the AGGREGATE as a consistency boundary and not be driven by a desire to design an object graph.

What if we need to read a bigger chunk of data?
Very often we need to fetch data across multiple aggregates; in other words, we require more data than a single aggregate type can provide. For example, let's assume that for read purposes we have to fetch and return a ParentUser together with all of its ChildUsers. Should we go back to the initial model with the larger aggregate?
Not necessarily! This is exactly the case where Martin Fowler's Data Transfer Object pattern can help:

The solution is to create a Data Transfer Object that can hold all the data for the call. It needs to be serializable to go across the connection. Usually an assembler is used on the server side to transfer data between the DTO and any domain objects.
--Martin Fowler. Patterns of Enterprise Application Architecture. 
In our case the mapping between the domain and the DTOs is not a flat one-to-one. But actually it shouldn't be! Moreover, in his example Martin Fowler provides exactly such a "not flat" mapping. I would also say that a good justification for applying such a costly pattern is overcoming the incompatibility between the domain (behavior-centric) and read (representation-centric) models. Sometimes, in complex domains, the dissonance between the two models causes so much overhead that it is worth considering a CQRS architecture.
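
As an illustration, a read-side DTO for our example could carry a parent together with all of its children in one serializable object, assembled on the server side. The sketch below is my own; the DTO fields, the assembler, and the accessors it relies on (such as name() and value()) are assumptions:

// A hedged sketch of a read-side DTO and its server-side assembler.
public class ParentUserDto implements Serializable {
    public String name;
    public List<String> childNames = new ArrayList<String>();
}

public class ParentUserAssembler {
    public ParentUserDto toDto(ParentUser parent, List<ChildUser> children) {
        ParentUserDto dto = new ParentUserDto();
        dto.name = parent.name().value();               // assumed accessors
        for (ChildUser child : children) {
            dto.childNames.add(child.name().value());
        }
        return dto;
    }
}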
