Domain-Driven Design: Working with Legacy Projects
by Vladimir Khorikov
Hi, everyone. My name is Vladimir Khorikov and welcome to my course, Domain-Driven Design: Working with Legacy Projects. I am a domain-driven design evangelist and very excited to share this course with you. We programmers all want to work on green field projects; you're most likely working on a legacy project right now or will be at some point in the future; there is no way around it. Each green field project becomes someone else's legacy project someday, therefore it's critical to develop skills that will help you be productive in this area. In this course you will learn how to approach legacy code bases, how to handle rewrites, when to do that, and when it is better to keep the existing legacy code base instead. Some of the major topics that we will cover include using the anticorruption layer pattern, how to create a clean and maintainable domain model in the environment with an existing legacy code base, how to evolve your domain model using synchronizing anticorruption layer, and guidelines to consider before conducting a refactoring or a rewrite. By the end of this course, you will know how to be productive with a legacy code base even if it's a complete mess. Before beginning this course you should be familiar with the C# programming language. I hope you will join me on this journey to learn about domain-driven design and working with legacy projects here at Pluralsight.
Hi. My name is Vladimir Khorikov and this is the course about domain-driven design and working with legacy projects. We programmers all like to work on greenfield projects, those that we build from the ground up ourselves. However, you are most likely working on a legacy project right now or will be at some point in the future. There is no way around it. Each greenfield project becomes someone else's legacy code base someday, therefore it's critical to develop skills that will help you be productive in this area. Productive in a sense that you will continue to deliver new functionality and keep the stakeholders happy, but also productive in a sense that you will make your way through the legacy code base by refactoring it one step at a time, reducing its maintenance cost and eventually increasing the speed at which you develop new functionality. In this course you will learn patterns and techniques that will help you do exactly that. Here's a quick outline. In the first module we will talk about what a legacy project is and how to approach it from the business point of view. We will discuss the pros and cons of full rewrites as opposed to gradual, incremental refactoring. In the next module I will introduce you to the legacy system we'll be working on throughout this course. In the third module we will start building new functionality inside the legacy code base. We will create a so-called bubble context with clean code, which we will separate from the surrounding mess and protect with an anticorruption layer. In the next module we will grow this bubble so that it has a separate database to work with. We will also promote the anticorruption layer into a standalone bounded context of its own. It will synchronize changes between the bubble and the existing legacy system. In the last module we will explore further ways of dealing with the legacy project. We will discuss two options.
The first one is turning it into a microservice that provides its functionality via a REST API. The second one is building a domain event channel on top of it. For this course you will need a basic knowledge of what domain-driven design is. You can watch Domain-Driven Design Fundamentals by Julie Lerman and Steve Smith or my Domain-Driven Design in Practice to gain such knowledge. I also recommend that you check out the Refactoring from Anemic Domain Model course.
What Is a Legacy Project?
Before we dive into the patterns and techniques of working with legacy projects, let's discuss what a legacy project is. You might have heard multiple definitions of it, perhaps even those that contradict each other. So what is it? One way to define it is to say that a legacy project is a project whose code base you inherited from another developer or a team of developers. You or your team is now in charge of that code base in a sense that you need to maintain and expand it with new functionality. So according to this definition, a legacy project is any project that was developed by someone else. By the way, for the purpose of this course I will use the terms legacy project, legacy code, and legacy code base interchangeably. That's because from the development point of view they are mostly the same thing. There are other definitions, too. For example, Michael Feathers, the author of the classic book Working Effectively with Legacy Code, defines legacy code as code that has no tests. Some people also define it as any system that works in production, and some as any code that you or someone else wrote more than a month ago. All these definitions have some merit to them, although the last one is probably more on the extreme side; however, none of them reflects the underlying meaning well. Let's take Michael Feathers' definition. It's true that legacy code usually doesn't have any tests, but does it mean that any code base that does have tests can be automatically considered non-legacy? I don't think so. Tests are code, too, and can be of bad quality just as much as the code base they cover. There are a lot of unmaintainable tests in the wild, tests that span hundreds of lines of code and are just as cryptic as the underlying code base. Another example is when a development team practices assertion-free testing. That's when you write tests that don't have any assertions in them in order to increase test coverage metrics. 
Needless to say, they don't add any value and having such tests is pretty much as bad as not having any tests at all, even worse actually because they can give you a false sense of security as they always pass regardless of what's going on in the code. So the presence or absence of tests is not a good metric when it comes to determining whether the project is legacy. How about this definition. Does it suffice to say that a legacy project is a project that was developed by someone else? Well, what if its code base is well structured and generally of good quality? You wouldn't consider such a system legacy because there is not much to refactor and improve in it and even if it was you who developed this system, it doesn't tell us anything. It could be you a year or two ago with no knowledge of best practices, unit testing, and refactoring techniques. Returning to a code base written by past you can be as difficult as working on someone else's code base. The same is true for the other two definitions. They don't tell us anything about the specifics of the code base and thus cannot be used to determine whether or not a project is legacy. Alright, so what is it then? How would I define a legacy code? A legacy code base is a code base that takes significant effort to maintain. The maintenance cost could be measured in the time it takes to develop new functionality or the number of bugs that pop up when you do so. It could be because you don't have unit tests or because the code base was developed by a less experienced team, but the root cause is not important for us. The reason why we call a code base legacy is because it's difficult, inefficient, or too risky to change. Therefore, whether or not the project is legacy is a measure of quality more than anything else. In the world of software development this label has gained a negative connotation precisely because we associate it with something messy and unmaintainable.
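To make the point about assertion-free testing concrete, here is a minimal sketch. It is written in Python for brevity (the course itself uses C#), and the apply_discount function, its deliberate bug, and the two check functions are all hypothetical names invented for this illustration. The first check only executes the code, so it passes and inflates coverage; the second one verifies the result and catches the bug.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test; it is meant to reduce the price."""
    return price + price * percent / 100  # deliberate bug: adds instead of subtracting


def check_without_assertions() -> None:
    # Runs the code, so it counts toward coverage, but verifies nothing.
    apply_discount(100.0, 10.0)


def check_with_assertion() -> None:
    # Verifies the outcome, so the bug above makes this check fail.
    assert apply_discount(100.0, 10.0) == 90.0


def passes(check) -> bool:
    """Run a check and report whether it finished without a failed assertion."""
    try:
        check()
        return True
    except AssertionError:
        return False


if __name__ == "__main__":
    print("assertion-free check passes:", passes(check_without_assertions))
    print("asserting check passes:", passes(check_with_assertion))
```

Both checks exercise exactly the same line of production code, which is why a coverage number alone cannot tell them apart.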
Legacy Projects and Bad Code
Note that in the minds of most software developers there is an intrinsic connection between a legacy code base and something unmaintainable. Why is that? It's definitely not common across other spheres of our lives. We usually associate legacy with something good, something valuable a person leaves behind like a heritage or an inheritance. So why is it different in the world of programming? That's because of software entropy. Entropy is the amount of disorder in a system. This notion takes root in the second law of thermodynamics, but can also be applied to software systems. We all tend to take shortcuts when adding new functionality, implement quick and dirty solutions, and apply duct tape to defects. This works well for the short term as we are able to release software quicker, but after a while things start to get really bad. The code base becomes so complex that it's virtually impossible to maintain. Fixing one bug introduces more bugs: you might fix one part of the software and break several others. It works like a domino effect. It's very easy to make such a code base unreliable and very hard to bring it back to stability. So each time you introduce something to the code base, the amount of disorder in it, or entropy, increases. You can see that code tends to deteriorate. If left without proper care such as constant cleaning and refactoring, the system becomes increasingly complex and disorganized and, unfortunately, that is the case for most legacy systems, especially those that have been around for a long time. Eventually, such a system becomes a big ball of mud with little structure, lots of spaghetti code, and duplication. The original architecture, if there was any, erodes beyond recognition. It's no wonder developers don't like working on legacy systems. Who would? And that brings us to the following topic. What to do with all this mess?
Rewrite or Not?
When you see a legacy code base, the first option that comes to mind is to rewrite the whole thing. We all like to work on greenfield projects and rewriting a legacy system is basically turning it into a greenfield one, which is exciting news for us programmers, but is it really a good decision? When it comes to dealing with a legacy project, you always need to view it from the business perspective, and that is: if the project makes money, it's a great project regardless of how it is written. The business doesn't care about the code. All it cares about is the functionality. It's often hard to accept this reality because we intrinsically connect the two together, but you need to understand the following. Code is a liability. Functionality is an asset. You can view this concept as the following formula. Project success is defined by the amount of functionality divided by the amount of code. It means that if you can develop functionality with as little code as possible, you should do that. The ideal here is to develop it with no code at all. How does this formula relate to legacy systems, you might ask? Well, if the system works, but the underlying code is a mess, it doesn't necessarily mean that you need to rewrite it. Business doesn't care about the code. Functionality and your ability to develop more of it is what matters. So when you propose to rewrite the whole thing and start the project from scratch, what you actually do is propose to increase the liability, which is code, without adding any assets, which is a horrible proposition from the business perspective. Think about it. Not that many systems under development survive through the development stage and get deployed and used in production; even fewer end up making money. If the system has successfully gone through all these stages and brings food to the table, it is great. Even if the code is a mess, there is no architecture, and it doesn't follow the best practices, it is still great. 
Your job now is to pick up the baton and make sure the project is evolving and being developed successfully. If you convince the stakeholders to rewrite it from scratch, how can you guarantee that your version would be better? You cannot. We all have this attitude that the ultimate goal for any legacy code base is to eventually replace it, and we should get over it. It's a never-ending treadmill of constant rewrites. Moreover, during such a rewrite you will have to support both systems and if the older one is still actively developed, you will also need to continuously catch up with it functionality-wise, which means the duration and the scope of the rewrite can increase indefinitely. So starting over is like declaring bankruptcy; it should be a last resort. Alright, but how to approach the legacy system then? The better attitude here is to aim at refactoring the older code base or replacing small parts of it while simultaneously delivering new business value. You need to focus on development of new functionality first. If you can do that in the context of the existing system, that's good and you should do that, but if the legacy system is a mess, you can still introduce new functionality without rewriting lots of the existing code. That is what we'll be doing in the rest of this course with the help of an anticorruption layer. Okay, one more thing. You as a programmer might view working on a legacy code base as daunting and tedious, so the desire to turn it into a new and shiny greenfield project by declaring a rewrite is understandable, but look at it in a different way. Rescuing existing code from a rewrite while still producing new functionality is the most challenging work out there, much more challenging than working on a greenfield project, and it's more valuable, too. Everyone can develop a project from scratch. It requires much more skill to be productive with an existing one. So view this as a challenge, not something boring. 
This work can be extremely rewarding.
When a Rewrite Is Acceptable
There still are situations where the rewrite would be a good choice, though. One such situation is when the legacy project doesn't work or doesn't make money. In this case there is no useful functionality in the existing code base and therefore no asset to lose. For such a project the rewrite can be beneficial because you can take all the good parts of the existing system, analyze its mistakes and come up with something better and you will get a nice morale boost to your team as they will have the sense of ownership for the new code base. Another situation is when the project is small. In this case rewriting it will not take long and the better code base will enable you to deliver new functionality faster, which is very similar to regular refactoring where you refactor one piece at a time while keeping the whole thing working. In this sense, the microservices architecture is the ultimate rewrite enabler. It erases the difference between refactoring and the full rewrite. If you keep the size of the services relatively small, the rewrite becomes a feasible task, something that you can implement within several days rather than months or even years, which makes its scope similar to that of refactoring.
In this module you learned what a legacy project is. While there are several definitions of this term, the most applicable for the purpose of our course is this. A legacy project is a project that takes significant effort to maintain. We talked about why we associate legacy projects with unmaintainable code. That's because of a phenomenon called software entropy. Entropy is the amount of disorder in a system. It increases each time we introduce something to the code base and don't clean it up or refactor after that. We discussed how the business views software projects. Code is a liability, functionality is an asset. You should adopt the same mental model. This formula will help you with that. Project success is the amount of functionality divided by the amount of code. We also discussed rewriting the whole thing. From the business perspective it's a horrible proposition because it increases the liability, which is code, without adding any assets, which is functionality. Rewriting should be a last resort. Instead, refactor or rewrite small parts of the legacy application while delivering new business value and producing new functionality. Also, view this process as a challenge. It's one of the most challenging and important types of work in software development. Finally, we talked about when a rewrite can be acceptable. It's when the project itself is small and can be rewritten fairly quickly or if the project doesn't actually work in production yet and therefore doesn't make money for the business. In the next module we will look at the sample legacy project. We will discuss its problem domain and we'll outline the plan for new functionality.
Introducing a Legacy Project
Hi. My name is Vladimir Khorikov and this is the course about domain-driven design and working with legacy projects. In this module we will look at the existing legacy project. I will guide you through its problem domain and we will talk about the underlying code and the database.
Legacy Project Introduction
The application we'll be working on is a package delivery system. As its name suggests, the system allows you to keep track of packages for delivery. There is a workflow each of the deliveries should follow. First, you need to create a new delivery. Here you choose a customer and specify the destination address. After you do that, the delivery is created and remains in status new. Then you can edit the package by selecting products to deliver and calculating the estimated cost of delivery. This will bring the delivery into the ready status. Finally, you can send the package to the customer. This will change the status of the package to in progress. This is how the application itself looks. You can see here a list of deliveries with their customers, statuses, and destination addresses. You can create a new delivery. For that you need to select a customer and write down the address. Let it be Some Street, Suite 300. The city and the state is Washington, D.C. and some zip code. You can see the new delivery has appeared in the list with the new status. Now we can specify what we are going to deliver by editing the package. Here we can choose up to four products and select the amounts for each of them. So let's say that the customer wants the Best Pizza Ever in the amount of 3 items and let's say that they also want a Fridge. Saving the selection. As you can see, the delivery has been moved to the ready status. We can open the same package once again and ask the system to recalculate the delivery cost, like this. The cost is $160. Finally, after all the preparations are done, we can mark the package as in progress, meaning that it's on its way to the customer. The application is rather simple, as you can see. Let's now look at the underlying code base and the database.
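The workflow just described (new, then ready, then in progress) can be sketched as a small state machine. This is an illustration in Python rather than the course's C#, and it is not the legacy application's actual code: the Delivery class, its method names, and the two guard rules (no editing after sending, at least one product per package) are assumptions made for the example. Only the three statuses and the limit of four products come from the walkthrough.

```python
from __future__ import annotations

from enum import Enum


class DeliveryStatus(Enum):
    NEW = "New"
    READY = "Ready"
    IN_PROGRESS = "In progress"


class Delivery:
    """Hypothetical model of the delivery workflow described above."""

    MAX_PRODUCT_LINES = 4  # the UI allows up to four products per package

    def __init__(self, customer: str, address: str) -> None:
        self.customer = customer
        self.address = address
        self.product_lines: dict[str, int] = {}
        self.status = DeliveryStatus.NEW  # a freshly created delivery is "new"

    def edit_package(self, product_lines: dict[str, int]) -> None:
        # Assumption: a package that is already on its way can no longer be edited.
        if self.status is DeliveryStatus.IN_PROGRESS:
            raise ValueError("A package in progress cannot be edited")
        # Assumption: an empty selection is rejected; the upper limit is from the UI.
        if not 1 <= len(product_lines) <= self.MAX_PRODUCT_LINES:
            raise ValueError("A package holds between one and four products")
        self.product_lines = dict(product_lines)
        self.status = DeliveryStatus.READY  # saving the selection makes it "ready"

    def send(self) -> None:
        if self.status is not DeliveryStatus.READY:
            raise ValueError("Only a ready package can be sent")
        self.status = DeliveryStatus.IN_PROGRESS


if __name__ == "__main__":
    delivery = Delivery("Some customer", "Some Street, Suite 300")
    delivery.edit_package({"Best Pizza Ever": 3, "Fridge": 1})
    delivery.send()
    print(delivery.status.value)
```

Keeping the status transitions inside the class means an invalid jump, such as sending a delivery that has no products yet, fails loudly instead of silently producing inconsistent data.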
We'll start with the database and just as in many legacy projects, it's awful. Right from the start you can see these weird table names. They are shortened and all in caps. If we look at this table which represents deliveries in the system, we can see that the column names are no better. It's hard to grasp the meaning behind these cryptic titles, but that's not the worst part. Renaming a column wouldn't be that big of a deal after all. There are other shortcomings in this table, too. First, all columns except the one that represents the primary key are nullable, which is a red flag because normally you would have both nullable and non-nullable columns in your tables. This suggests that the current state of the table structure is incorrect and doesn't represent the real data model. Alright, let's now try to map the columns to what we saw on the UI. The cost column is this one. This is the number or id. This is a reference to the customer and this one stands for the current status of the delivery, but as you can see on the screen, we can specify four products with the amounts for each of them; however, in the database we can only see two, product line 1 and product line 2 and their respective amounts. Where are the other two products? It turns out that there is a second table with information about deliveries. Here it is and there you can see the missing two products, product line 3 with its amount and product line 4. So the information about the products is spread across two tables and the application gathers it when it shows this window to the user, but the bigger problem here is that the delivery table is not normalized. You shouldn't have the products inlined into the parent table. That's database design 101. What you need to do instead is have a separate table with product lines and set the relationship between the two as one-to-many, meaning that for each delivery you will have multiple rows with the product line and its amount. 
This will allow you to get rid of these duplications here where the columns are basically the same with the only difference between them being the number of the line. This normalization would also help deal with the other problems with the current design. As you can see on the UI, the application allows for only up to four products at a time in a given delivery and the reason for that is that the underlying database can only handle four lines. So if the user wants to create a delivery with five lines, they will need to split those products into separate deliveries instead and it seems that at some point in the past the application allowed for only two products to be delivered at a time because as you can see here, the main delivery table only handles the information about two of them. It seems that there was a change request from stakeholders to increase that limit, but instead of refactoring the whole thing and extracting the product lines into a separate table, the developers just extended the existing design and they did that in the worst way possible, by adding a separate table, delivery two, with another two product lines in it. So the limit was increased, but if the stakeholders ever want to increase it again, the developers will have to modify the table structure again. Another issue with this table is that the product amounts are all represented with the char of size 2 data type. They should all be integers instead, but the application code seems to needlessly convert the strings to and from the integer type because if we look here, the amounts are currently handled by an up-down control, but that is small potatoes compared to the previous issue. Alright, that's it about this screen. Let's look at this screen for adding a new delivery. The customer field is represented with this CSTM field. 
It's a reference to the customers table, which we'll look at in just a minute and as I mentioned earlier, the STS column is for the status of the delivery, but where is the information about the address? The street, the city and state, and the zip code. We have a table here for all addresses. It pretty much mimics the screen we just saw. Here is the column with the street information, city and state, and the zip code and it also has a link back to delivery, which means that the delivery and the address tables are related as one-to-many. In other words, for each delivery there can be multiple addresses, but if you look at the UI again, there is no ability for us to add another address to the delivery, which means that the relation between the two should be one-to-one instead. So what we have here is another issue with the way the tables in the database are structured. Instead of having a reference from the address table to delivery, there should have been a reference the other way around, from delivery to address, or alternatively the address could have been inlined in the delivery table. Also note the inconsistencies in the column names. Some of the columns have this CLM postfix and others don't. Besides, the primary key column in this table is named differently than in the delivery one. It's called ID column whereas in other tables the primary key is called number column and of course the city and state shouldn't be stored in a single column. That is failing to adhere to the first normal form, plain and simple. Let's look at this table now. It contains information about customers and the name P is probably related to the word people, but no one really knows for sure because the original developer quit a long time ago. The columns you can see here are the name and two references to the address table. So what you might get from reviewing this table is that each customer might have two addresses. 
Maybe one of them is primary and the other one is secondary, but if you look at the data this table contains, there is not a single customer with the second address field filled out, so the second column is orphaned. It's still there because no one ever bothered to remove it. Alright, and finally there is the product table. Each product has an id, name, description, and weight. Note that for some reason the id column is not marked as primary key. Also, having the name and description stored as fixed-size strings means that all strings in there will be padded with trailing spaces to match their size and that means that in order to properly display those strings on the UI, the application has to trim them every time it loads the products from the database. A better way would be to have this data stored as varchar or nvarchar, which allows for variable string length. Another strange design decision here is that the table has two columns representing the weights of the products. One of them is weight in pounds and the other one is in kilograms and if we look at the data inside, we can see that for any given product there is data in one of the columns or the other, but not both, which means that some of the products have their weights encoded in pounds while others in kilograms. A better solution here of course would be to convert the weights to one of those units of mass and store them uniformly for all products, but it's a legacy project. You can't have high expectations here. Alright, that's all the tables we have in our legacy database. Let's create a diagram with them and summarize what we just saw.
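The two product-table quirks described above (values padded by fixed-size char columns, and a weight stored in either pounds or kilograms) are the kind of thing a small cleanup helper can hide from the rest of the code. The sketch below is in Python rather than the course's C#, and the function names are hypothetical; the only outside fact it relies on is that one pound is defined as exactly 0.45359237 kilograms.

```python
from typing import Optional

KG_PER_POUND = 0.45359237  # exact by the international definition of the pound


def clean_text(padded: str) -> str:
    """Strip the trailing spaces that a fixed-size char column pads values with."""
    return padded.rstrip(" ")


def weight_in_kg(pounds: Optional[float], kilograms: Optional[float]) -> float:
    """Return a product's weight in kilograms, whichever column holds the value.

    Assumes, as the data in the walkthrough suggests, that exactly one of the
    two weight columns is filled in for any given product.
    """
    if (pounds is None) == (kilograms is None):
        raise ValueError("Expected exactly one of the two weight columns to be set")
    if kilograms is not None:
        return kilograms
    return pounds * KG_PER_POUND


if __name__ == "__main__":
    print(repr(clean_text("Fridge        ")))
    print(weight_in_kg(pounds=10.0, kilograms=None))
```

Raising an error when both columns or neither column is set makes the "one or the other, but not both" observation an explicit, checked assumption rather than an unstated one.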
Recap: Database Introduction
Here is the diagram. As we discussed earlier, the most obvious thing here is naming. The uppercase is everywhere, the names themselves are cryptic, and there is no single naming convention. Some of the columns have this CLM postfix, others don't. Another issue is that all columns except for identifiers are made nullable. That's a lazy way to approach database design as it doesn't give information about which fields can and which cannot be empty. Next, there are no foreign key constraints between the tables. For example, the delivery table refers to the customer one with the CSTM column, but there is no constraint to actually enforce this relationship and so it's possible to delete a customer that is referenced by a delivery, which would bring the delivery to an inconsistent state because we would lose an important piece of information about it. There also are a few other minor issues like, for example, the fact that all string fields are represented with fixed-size char data types and that information about both the city and the state of an address is stored in a single column. The most important drawbacks, however, are related to the relationships between the tables. First of all, the information about product lines in a delivery is inlined into the delivery table itself, which means the application can have a maximum of four products in a single package. It would have been better to introduce another table instead like this and get rid of all those fields in the delivery table, but we have what we have. Secondly, data related to the deliveries is spread over two tables for no good reason. 
It doesn't add any benefits and only complicates the work with the deliveries as the application code needs to manually assemble that information from those two tables and finally, the relationship between the delivery table and address is one-to-many, meaning that one delivery can have multiple addresses whereas in reality, you cannot specify more than one address for a given package. So the relationship should point the other way. A single delivery should have only one address.
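For comparison, here is a rough sketch of what a normalized version of this schema could look like, using Python's built-in sqlite3 module so it can be run as is. It is not the legacy database and not a schema from the course: every table and column name below is hypothetical, and inlining the address into the delivery table is just one of the two alternatives mentioned earlier. The sketch shows non-nullable columns, foreign key constraints, a single address per delivery, and a separate product line table that removes the four-product limit.

```python
import sqlite3

# Hypothetical, readable names; the legacy tables use cryptic uppercase ones.
SCHEMA = """
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE product (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE delivery (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (id),
    status      TEXT NOT NULL,
    -- exactly one address per delivery, with city and state kept apart
    street      TEXT NOT NULL,
    city        TEXT NOT NULL,
    state       TEXT NOT NULL,
    zip_code    TEXT NOT NULL
);
CREATE TABLE product_line (
    -- one row per product in a delivery, so there is no fixed upper limit
    delivery_id INTEGER NOT NULL REFERENCES delivery (id),
    product_id  INTEGER NOT NULL REFERENCES product (id),
    amount      INTEGER NOT NULL CHECK (amount > 0),
    PRIMARY KEY (delivery_id, product_id)
);
"""


def create_database() -> sqlite3.Connection:
    """Create an in-memory database with the sketched schema."""
    connection = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless asked to, per connection.
    connection.execute("PRAGMA foreign_keys = ON")
    connection.executescript(SCHEMA)
    return connection


if __name__ == "__main__":
    db = create_database()
    tables = db.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    print([row[0] for row in tables])
```

With this structure a delivery with five products is simply five rows in product_line, and an attempt to delete a customer that a delivery still references is rejected by the database instead of leaving the delivery in an inconsistent state.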
Application Code Introduction
You can find the application code on GitHub or in the exercise files on Pluralsight. If you prefer GitHub, head over to my page and navigate to the legacy projects repository. Alternatively, you can use these short links. There are currently two folders in this repository, initial which contains the initial version of the source code and ACL which contains code with the anticorruption layer from the next module. By the time you watch this there will be another folder containing code with a synchronizing anticorruption layer which we'll be discussing in a later module. The database script is also here in the initial folder. Alright, so let's take a look at the application code. We will not be working with this code much so I will not dive deep into it. As you saw earlier, it's a desktop application. It's written using WPF. The code base itself is actually not bad. It's definitely not as awful as the database. Here is for example the view model that is responsible for editing the product lines in a package. There is some data the UI binds to such as product names and amounts, and commands for changing the products, loading the data from the database, and saving it back when clicking on the OK button. The DBHelper class uses plain SQL. This SQL is quite hard to read, but it's mostly the product of the messy database rather than the application code itself. This class uses Dapper to execute SQL queries. If this was a real project I wouldn't consider its code base legacy. The way I would approach it in real life is I would continue working on adding new features to it while simultaneously refactoring the database using the migration-based approach. Alternatively, because the project itself is quite small, a full rewrite would be a viable option here, but keep in mind that it's a simplified version. I need to keep a balance between the project size on one hand and its realism on the other. 
I just would not be able to realistically show you a fully fledged legacy project within the scope of this course so even though the code base here is pretty good, let's assume that it's not and that it is as messy as the underlying database or even more unmaintainable. Therefore, working on the existing code base is not an option for us and let's also assume that it's much larger than what I showed you and so rewriting it from scratch is not an option either. So what do we do when we cannot maintain the existing code base due to it being a complete mess and we cannot rewrite it either because that would be several years' worth of effort that the stakeholders would never approve? That's what we'll be discussing in the next module. And by the way, if you want to learn more about refactoring databases using the migration-based approach, check out my Database Delivery Best Practices course here on Pluralsight.
In this module you saw the legacy system we'll be working on throughout this course. The underlying database is a mess without any concern for data integrity, normalization, or readability. The application code is actually not bad, but for the purpose of this course we will assume that it is bad and unmaintainable. We will also assume that it's large enough so that a full rewrite is not an option for us. Note that although the application is a desktop app and quite small overall, the principles and techniques you'll be learning in this course are applicable to any type of legacy system, be it an API, a web application, or a desktop one like ours. In the next module we will start introducing new functionality to the project. You will learn the concept of an anticorruption layer and the bubble bounded context and we'll see how to maintain good separation between our new code and the existing legacy mess.
Creating a Bubble Context with a New Domain Model
Hi. My name is Vladimir Khorikov and this is the course about Domain-Driven Design and working with legacy projects. In this module we will start working on the legacy project. We will introduce new functionality to it and, in order to do that, we will create a clean bubble bounded context that will be surrounded by an anticorruption layer later on.
Alright, so we are now responsible for maintaining the legacy project and we have the first task already. Let's say that the business people came to us and asked for a new feature: changing the way the estimated cost of delivery is calculated. Let me show you this feature again. If we edit one of the packages, the estimated cost is displayed in this textbox. We can increase the amount of one of the products, click the Recalculate button, and the application will change the estimate. In our case it changed from 40 to 80. We can also add another product, increase its amount, and recalculate the cost again. And here is the place in the code where this functionality is implemented. It's pretty straightforward, as you can see. The application just sums up all amounts from all products in the delivery and multiplies the result by 40. The stakeholders are not satisfied with the way it is currently working. This estimate often turns out to be incorrect, and the amount of money the company spends on delivering a package often doesn't match this initial evaluation. To close this gap between the estimate and the real cost, the application needs a more precise calculation algorithm. This algorithm has to take into account the distance between the warehouse and the destination address, and it also needs to account for the mass of each product. So if you look at the database diagram, the new version of the algorithm will need to get two additional pieces of information: the address from this table and the weight from this one. So this is our first task for this project. Note once again that if it were a real project, I would recommend just going ahead and modifying this logic within the existing legacy code base, but this is an example, and examples have to be simple. So we will continue with the assumption I asked you to make in the previous module.
The assumption is that the application code base is much messier than you saw here and much larger as well, which means that implementing the change within this code base is hardly an option for us due to its enormous complexity. Rewriting it is not an option either because that would be a project for several months or even years with no new functionality to show for it. So what options do we have available then? As we discussed in the first module, the better option is to refactor the old code base while simultaneously delivering new business value. This last part, delivering new business value, is essential. Remember this formula we discussed earlier. Any development activity that does not result in new useful functionality is wasteful from the business point of view. Sure, the stakeholders can tolerate it if this activity is not too lengthy, but the perception still stands. As long as the new assets, that is the new functionality, equal 0, the whole effort results in 0 benefits. So you must always show something in the numerator. You should always focus on delivering immediate value, and especially when you refactor the old code base, because it's too easy to get lost in the refactoring and forget about the value it brings to the business.
This principle, focusing on delivering immediate value, has several implications. First of all, if you see that in order to develop new functionality you don't need to refactor some piece of legacy code, don't touch it. Try to keep the scope of refactoring as narrow as possible. Remember, the smaller the denominator in this formula, the better. Try to produce results fast and do the refactoring incrementally, but at the same time, don't compromise on code quality. If you do, you will be hit by the broken window effect. That is, if you make just a single compromise on quality, it will be increasingly easy for you to continue making such compromises in the future. If there is one suboptimal design decision in the code base, why not add another? This is how you will think about it. Eventually, this leads to degrading the quality of the new code and pretty much negates the whole point of the refactoring. The broken window effect is akin to software entropy, the phenomenon we discussed in the first module. So keep the scope of the refactoring as narrow as possible and at the same time keep its quality as high as possible, and if you need to compromise on something, for example because of deadline pressure, reduce the scope, not the quality. When dealing with a legacy code base, don't expect to understand how the whole application works; it can be quite overwhelming. Approach it from different angles, one step at a time. Implementing new functionality while refactoring small parts of the existing code base will help you with that. Keep in mind that not all code is worth refactoring. As I mentioned earlier, if a module of code doesn't need to change in the near future, leave it alone. Also, not everything in a large system will be well designed, even if you try to make it so. Here is a quote from Eric Evans in that regard: keeping everything well-designed is the enemy of good design.
The point is that the desire to refactor everything will not necessarily bring the most benefit. That's because of opportunity cost. In most cases your effort will bring more value if applied to developing new functionality rather than working on not-so-important pieces of the code base, and frankly, you just don't need to keep everything well-designed. There will always be some bad parts in the legacy code base. Refactoring is essential, but you need to be strategic about where to apply it, and that means, once again, applying it only to those parts of the system that require new functionality. Alright, narrow, targeted refactoring is the key, but what do you do if the legacy code base is a complete mess? What if the code base is so unmaintainable and hard to change that it will take you days or even weeks to implement a new feature, let alone refactor anything in it? In this case, it can be more practical to use approaches other than trying to refactor the legacy code base.
If you want to build the new functionality in a clean way, conforming to the existing code base is detrimental. At this point the legacy code base is effectively a big ball of mud, an unmaintainable piece of interconnected code parts with little or no structure. The mud tends to spread itself in a sense that if you try to conform to it, it will absorb your code, too and that is where the idea of bubble context with an anticorruption layer comes into play. So what is it? This pattern was first introduced by Eric Evans in his domain-driven design book. It is a layer of the software whose purpose is to separate the domain model from the outside code base. The main idea here is that if you want to start writing clean code with a brand new domain model, you can do that, but this domain model will need to somehow communicate with the surrounding mess and in order to keep your domain model clean, you need to pass this communication through a separate translator so that the legacy code base will not infect the clean bubble with its messiness. Here is a diagram. This is a larger legacy code base and this is our new bounded context, a small bubble of clean domain model. The anticorruption layer isolates this bubble from the harmful influence of the legacy code base. It talks to the legacy system through its existing interface and thus doesn't require modification of that system or at least not too much of it. Internally the layer performs a translation between the two models in both directions. So if our new domain model requires some data from the legacy system, we don't just use that data in whatever format it is represented there. We first pass it through the anticorruption layer, which translates that data into the notions of our new domain model and only after that use it. And the same is true when we are sending some data back to the legacy system. The bubble shouldn't know how to do that. 
Instead, it talks to the anticorruption layer using the language of its own domain model, and the anticorruption layer then performs the backward translation. Alright, that's basically it. We will start implementing the new domain model shortly, but before that let's do some groundwork and talk about some of the terms I used earlier, such as domain model and bounded context. You can learn more about them in my Domain-Driven Design in Practice course, but here is a short version. The term domain is a synonym for problem domain and basically means the actual problem the software is solving. You can also view it as a sphere of knowledge essential for a business. We call it domain knowledge. A model, or domain model, is a system of abstractions that represents selected aspects of the domain. It is the key part of the actual solution that our software represents. If you are familiar with such things as entities, value objects, aggregates, and domain events, they all reside in the scope of a domain model. You also need to know about the term ubiquitous language. It is a business-oriented language that all developers and domain experts should share and use when discussing the domain. The developers should also use it to name entities, value objects, and other domain classes in the domain model. Finally, a bounded context is the scope in which some particular domain model is consistent. Basically, a bounded context determines the area of applicability for a domain model and its ubiquitous language. That's because for any more or less complex system you cannot build just one unified domain model, and you also cannot speak the same ubiquitous language in this case, because words tend to have different meanings in different contexts and it's hard to agree upon a single ubiquitous language in a large project that involves a lot of people.
It is more beneficial to split the work into several subsystems with their own domain models and ubiquitous languages and thus tackle independent parts of the problem. If you draw a diagram depicting these notions together, it would look like this. The problem domain consists of several smaller subdomains, sub-problems, so to speak. On the right there is our software. It consists of multiple bounded contexts, each of which has a domain model inside. There could be some auxiliary code in bounded contexts, too, but the domain model is what we are interested in the most. Each bounded context tackles its own subdomain. Note that the software doesn't necessarily have a single executable. It can be implemented as a set of microservices. Alright, so this is the quick outline of these terms. For more details, refer to my other course, Domain-Driven Design in Practice.
Outlining the New Domain Model
Alright, back to our new requirement. This is going to be the new formula which we will be using to estimate the cost of delivery: the total weight of all products we need to deliver, in pounds, multiplied by the distance between our warehouse and the destination address, multiplied by the price per pound per mile, plus $20. The $20 is a flat fee that is charged regardless of the weight or distance. So let's say, for example, that we need to deliver a package of two items, 15 pounds each, over a distance of 10 miles. Then the cost estimate will look like this. The total weight of all products in the package is 2 times 15, which equals 30 pounds, times the distance of 10 miles, times 4 cents, which gives $12, plus the $20 flat fee, and that is $32. We will build a new domain model with an anticorruption layer around it for this new feature, and the first step here is to outline how this new domain model will look. This doesn't have to be a precise picture; it could be just a high-level overview of the key entities in the new model. So to estimate the delivery cost we will obviously need the delivery itself, so we will have a delivery entity. It will need to contain information about all products and their amounts that we need to deliver, and instead of representing them the way it is done in the legacy code base, that is, instead of repeating the four sets of fields, we can do that in a proper way by introducing a new concept, product line, that consists of the product and the amount. So the delivery entity will contain a collection of product lines. Aside from that, we will probably need to store the estimate somewhere, so we'll need another attribute for that, too. To calculate the weight of each item, we'll need to add a product entity with a weight property. To get the transportation distance, we'll also need to introduce the concept of address, which will consist of the fields from our existing legacy database.
The delivery will need to have the information about this address, so we will add it to the delivery class as well. That's it for this new requirement. Note that we are not bringing in the customer entity. That's because in order to calculate the cost of delivery, we don't need to know anything about the customer, only the products and the address of the delivery. That's an important point. Whenever you build a new domain model, be it a brand new model or a bubble of clean code surrounded by a legacy system like in our case, only bring in the concepts from the problem domain that you actually need to deliver the new functionality. Don't introduce entities just in case, without an actual need for them. That's especially true when you're creating a bubble context, because moving forward you will need to maintain not only the new domain model itself but also the anticorruption layer that separates that model from the legacy code. Remember, the less code you write to solve the problem, the better. Also note that we are outlining our new domain model as if there were no existing legacy code base whatsoever. That's an important point, too. Don't allow the legacy code base to influence how your new domain model looks. It should evolve independently from the rest of the code base. That's the whole point. We will handle the translation between the two in an anticorruption layer. For example, we will represent these four product lines as a collection, merge the two weight columns into a single field, and split city and state into two separate ones.
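To make the arithmetic above concrete, here is a minimal sketch of both calculations: the legacy rule (sum of amounts times 40) and the new formula. The class and method names are hypothetical and are not taken from the course code; the numbers (4 cents per pound per mile, the $20 flat fee) come from the worked example in the narration.

```csharp
using System;

// Hypothetical helper, only meant to illustrate the two formulas.
public static class DeliveryCostSketch
{
    // Values from the worked example: $0.04 per pound per mile plus a $20 flat fee.
    private const decimal PricePerPoundPerMile = 0.04m;
    private const decimal FlatFee = 20m;

    // Legacy rule as described: sum the amounts of all products and multiply by 40.
    public static decimal LegacyEstimate(params int[] amounts)
    {
        int total = 0;
        foreach (int amount in amounts)
            total += amount;

        return total * 40m;
    }

    // New rule: total weight (lb) * distance (miles) * price per pound per mile + flat fee.
    public static decimal NewEstimate(decimal totalWeightInPounds, decimal distanceInMiles)
    {
        return totalWeightInPounds * distanceInMiles * PricePerPoundPerMile + FlatFee;
    }
}
```

Calling `DeliveryCostSketch.NewEstimate(2 * 15m, 10m)` reproduces the example of two 15-pound items delivered over 10 miles.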
Creating the Bubble Context
The new bubble context with a new clean domain model is going to reside in a new Visual Studio project. So let's create one. Let me change the location real quick. Okay, as for the project's name, we could call it something like cost estimate or cost estimation because all it's going to do is contain the new logic for estimating the delivery cost, but there's going to be more functionality in our bubble soon and, knowing this up front, it is more beneficial to give it a less specific title. The name of our current project is PackageDelivery, so let's go with PackageDeliveryNew. I know it doesn't sound plausible, but it's hard to come up with a more appropriate title in this particular case. So we'll go with this one. Let me get rid of this automatically added class and create a new folder, Deliveries. This is where our new domain model will be located, and the first class in it is of course Delivery. Good. So what do we need here? First of all, the estimate itself. This is the field that will represent this column from the legacy database. Note that the data type of this column is float, which translates into double in .NET, but at the same time this field stores information about money, specifically how much it will cost to deliver the package. Because of that, the more appropriate data type here is decimal, not double, so let's change the type of this property to decimal. As we discussed earlier, the new domain model should not be influenced by the decisions made in the legacy system. We shouldn't compromise on the quality of our new code base. Next, the delivery needs the product lines, but instead of conforming to the existing data structure and repeating the four sets of properties, we will represent them with a collection, which is a much more appropriate way to do that. Here it is: a list of product lines, named Lines. Now creating a new class, ProductLine. It will consist of two fields, the product and the amount.
So basically we're collapsing the eight properties into just two. Next we need the product itself, and in that product we will need just a single property, weight in pounds. This property will represent both these columns at the same time because, remember, every product has either one of these fields or the other. So we will just need to convert that data into a single unit of mass. Note that we make it absolutely clear what unit of mass we are using here by specifying it in the property name itself. Also note that we don't add any other fields to this class. Weight is the only field we will ever need for the cost calculation. So the other two fields, name and description, shouldn't be brought in here, and we don't bring them in. Again, the less code we have, the better. Finally, the last class we need for the new functionality is the delivery address. This is going to be a new property in the delivery class, and we are also adding a new class for that property, Address. It will consist of a street, city, state, and zip code. Here are these fields in the legacy database: street, zip code, and the city and state merged together. We will separate these two fields when we do the conversion in the anticorruption layer. So here it is, our new domain model. Note that it's only a sketch, a starting point from which we will grow it. We will get rid of these public setters shortly, and we will also add the actual logic to our domain classes so that the domain model will not remain anemic for long. We will probably replace some of the primitive types here with separate classes, too, to get rid of the primitive obsession, but we'll see. We will not introduce any new abstractions unless we really need them. And by the way, if you wonder what the term anemic domain model means and why I'm implying that public setters are bad for your domain model, check out my course about anemic domain models and refactoring away from them here on Pluralsight.
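As a rough illustration of the starting point described above, the initial sketch of the domain model could look something like this. The class names follow the narration; the exact property names and the `int` and `double` types for amount and weight are assumptions, and the public setters are deliberately left in because the narration says they get removed later.

```csharp
using System.Collections.Generic;

// First-pass sketch of the bubble's domain model (property names are assumed).
public class Delivery
{
    // decimal rather than double, because the value represents money.
    public decimal CostEstimate { get; set; }

    // One collection instead of four repeated sets of product/amount fields.
    public List<ProductLine> Lines { get; set; } = new List<ProductLine>();

    public Address Address { get; set; } = new Address();
}

public class ProductLine
{
    public Product Product { get; set; } = new Product();
    public int Amount { get; set; }
}

public class Product
{
    // The unit of mass is spelled out in the property name itself.
    public double WeightInPounds { get; set; }
}

public class Address
{
    public string Street { get; set; } = "";
    public string City { get; set; } = "";
    public string State { get; set; } = "";
    public string ZipCode { get; set; } = "";
}
```

Keeping the model this small is intentional: each class carries only the data that the cost estimate needs.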
When it comes to working with a bubble context, it's useful to draw a map that shows how the concepts from the legacy code base translate to the new domain model. Such a map is called a translation map. So let's do that. It doesn't have to be a fully fledged diagram of all classes and their properties and methods, just a high-level overview of the key concepts and how they translate to each other. The main class in our domain model is of course Delivery. It will be translated from this delivery table; actually, from both delivery tables in the legacy database. Next we have the ProductLine class. This class doesn't have its own separate table in the database, but we will still introduce it because it will help us simplify the domain model. We will combine data from these eight fields into four instances of ProductLine. Moving on to the Product entity: it will get its information from the product table. Finally, there is the Address class, and we will query data from the address table in order to instantiate it. So here it is, our translation map. Note that we are using the domain model structure when talking about the new bounded context, and at the same time we are using the database structure when discussing the legacy system, not the legacy domain model. In theory, we could identify the set of legacy classes and integrate with them instead of the underlying database; however, that usually requires more effort than working directly with the database. In most cases, the legacy code base is in much worse shape than its database and it makes sense to not touch it at all. And so we will implement the translation between the two bounded contexts on the data level, without involving the legacy domain model or any legacy code for that matter.
Our anticorruption layer will refer to the database directly and convert data from it into our clean and shiny domain model, and it will do the backward conversion, too, when we need to save something back to the legacy database. We will see all that in the next module.
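Two of the translations mentioned in this module, merging the two weight columns and splitting the combined city and state value, can be sketched as small pure functions. Everything specific here is an assumption made for illustration: the narration does not name the second unit of mass (kilograms is assumed) or the format of the merged city/state value (a comma-separated "City, ST" is assumed), and the class and method names are hypothetical.

```csharp
using System;

// Hypothetical translation helpers an anticorruption layer might contain.
public static class LegacyTranslationSketch
{
    private const double PoundsPerKilogram = 2.20462;

    // Assumption: a legacy product row stores its weight in exactly one of two
    // columns, one in pounds and one in kilograms.
    public static double ToWeightInPounds(double? pounds, double? kilograms)
    {
        if (pounds.HasValue)
            return pounds.Value;
        if (kilograms.HasValue)
            return kilograms.Value * PoundsPerKilogram;

        throw new ArgumentException("The legacy row has no weight in either column.");
    }

    // Assumption: the merged legacy value looks like "City, ST".
    public static (string City, string State) SplitCityAndState(string cityAndState)
    {
        string[] parts = (cityAndState ?? string.Empty).Split(',');
        if (parts.Length != 2)
            throw new FormatException("Expected a value in the form 'City, ST'.");

        return (parts[0].Trim(), parts[1].Trim());
    }
}
```

Keeping such conversions in the anticorruption layer means the domain classes only ever see a single weight in pounds and separate city and state values.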
Identifying the Entry Point
Now that we have outlined the new domain model, it's time to discuss how exactly it will interact with the existing application. The new functionality needs to be incorporated somehow, in the sense that the users need to be able to start using it along with the old functionality. The best way to do that is to have a separate UI, a separate entry point devoted exclusively to the new bounded context. That can be achieved differently depending on what type of application you are working on. If you have a web application, you can create a separate web page that works with the new functionality and the new domain model only, although that's often not possible because the new functionality is rarely isolated from the rest of the legacy code base, and so it rarely happens that you are able to align the UI perfectly along the boundaries of the new domain model. Usually you need to have the new functionality in the already existing UI. If that's the case for your web application, you can implement the so-called IFrame integration, where you still create a separate web page but integrate it into an existing one using an IFrame. That can be tricky, but I have actually had some success with this approach in the past. It allows you to achieve a pretty good separation between the old and the new UI and still present it to the user as a single unified interface. If you've got an API, then it's simple; you just have some of the API endpoints handled by the old code base and other endpoints by the new one. From the user's standpoint, it's not going to make a difference. And finally, the third type of application is desktop apps like the one we're working on in this course. You cannot have a completely separate UI here. After all, it's a desktop application, and so you cannot just create another WPF UI for the functionality provided by our new domain model. In some cases it is possible, but in most it is not.
Having two separate UIs to handle the same task is unacceptable from the user's point of view, which is understandable. Who would want to work in two apps, one for creating a delivery and the other for calculating its estimated cost? In some cases, though, you can create a different UI window just for the new functionality alone, and that would be good enough. Generally you want to deal with the legacy code base as little as possible, and the approach with the separate window allows you to minimize this kind of interaction. But in our case, we are not able to do that either. As you can see here, the Recalculate button resides on this screen and it wouldn't make sense to move it to a separate UI window. It could also be an explicit requirement to preserve the UI as it is now, so having a separate window is not really an option for us. We will need to incorporate the bubble context into the legacy code base. So how do we do that? Here is where the calculation is currently implemented. What we need to do is replace this code with the new calculation logic from our bubble context. There should be some service that accepts information about the products and returns the cost estimate. The input information is the IDs of the products, which we can extract from these private fields, and the amounts. Let's add such a service to the new code base.
Rethinking the Domain Model
I'll name the service EstimateCalculator, and it will have a Calculate method. Now what parameters should we add here? As we discussed earlier, it needs all four products from this view model and the amounts the user has selected for them. So here they are: the id of the delivery, and all products the user has chosen on the UI. Note that the IDs are nullable because there could be no products in those slots. At the same time, the amounts are not nullable. If there is no product there, the amount equals 0. We can now use this service in the view model. First, let's add a private field with a reference to that service. We need to reference the new project from the legacy one. Good! Let's instantiate the calculator, and we can use it in this method now: EstimateCalculator.Calculate. We can get the delivery id from the delivery object using this strange-looking property. Product 1 id comes from the product1 field, and because it's nullable we need to use the safe navigation (null-conditional) operator. If the field is null, this operator returns null; otherwise, it returns the content of this property. Exactly what we need. The amount for the first product is stored in this property, and the same goes for the other three products. Let me just copy the code here. The compiler shows an error. That's because our service returns a decimal instead of a double, so we need to convert that value to a double here. Removing the old code. Good. Note once again that the new service doesn't conform to the old code base. While it would be more convenient to return a double from this service and avoid the conversion, we go with the decimal value nonetheless because it is more appropriate for representing a money amount. We could even introduce a separate value object class for it, named something like MoneyAmount, but we won't do that here because there is not much need for such a value object at this time. It's definitely an option, though.
Okay, so that is what the signature of the new service will look like, and let me actually collapse these parameters to just two lines to save space. Good. Now that we know what data will come into our new bounded context and what data we need to return from it, let's review our domain model. Do we need to change anything in it? Actually, we do. Look at the delivery class. When we added these three properties to it, we assumed that the only data we would have at hand is the delivery id and all other information about it would come from the database through the anticorruption layer; however, now it is clear that the legacy system is able to provide not only the id itself, but also the information about the products in the delivery, their identifiers and amounts. So we don't really need this collection anymore, and because we should follow the guideline I described earlier, keeping as little information in our new domain model as possible, it's a good idea to get rid of this property. The same is true for the estimate. We don't really need to save it to the database because the legacy code base does that already. Here the estimate gets recorded to this property, and when the user clicks the OK button, the estimate is saved along with the rest of the data about the delivery, so we can get rid of this property, too. It's enough to just return it from the service. No need to save it in the delivery object after that. Let's fix the compilation errors and run the application to see if this code works. I'm returning 0 just to make sure all the wiring between the legacy system and the bubble context is done correctly. If I edit this package and click Recalculate, you can see that the cost is updated. Very good. It's always a good idea to make sure that all the moving parts are working together correctly before proceeding with the actual implementation.
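The wiring described in this demo can be sketched roughly as follows. EstimateCalculator and its Calculate method are named in the narration; the parameter names, the `int` id type, and the stand-in legacy classes (LegacyProduct, LegacyViewModelSketch) are assumptions made so that the example is self-contained. As in the demo, the calculator is only a stub that returns 0 to verify the wiring.

```csharp
// New bubble context: the service the legacy view model calls into.
public class EstimateCalculator
{
    // Product ids are nullable because a slot may be empty; amounts are not (0 means none).
    public decimal Calculate(int deliveryId,
        int? productId1, int amount1, int? productId2, int amount2,
        int? productId3, int amount3, int? productId4, int amount4)
    {
        // Stub: returns 0 until the real calculation is implemented.
        return 0m;
    }
}

// Hypothetical stand-ins for the legacy classes (not the actual course code).
public class LegacyProduct
{
    public int Id { get; set; }
}

public class LegacyViewModelSketch
{
    private readonly EstimateCalculator _calculator = new EstimateCalculator();
    private readonly int _deliveryId;
    private readonly LegacyProduct _product1, _product2, _product3, _product4;

    public int Amount1 { get; set; }
    public int Amount2 { get; set; }
    public int Amount3 { get; set; }
    public int Amount4 { get; set; }
    public double CostEstimate { get; private set; }

    public LegacyViewModelSketch(int deliveryId, LegacyProduct product1,
        LegacyProduct product2, LegacyProduct product3, LegacyProduct product4)
    {
        _deliveryId = deliveryId;
        _product1 = product1; _product2 = product2;
        _product3 = product3; _product4 = product4;
    }

    public void RecalculateCost()
    {
        // ?. yields null for an empty slot; the legacy property is a double,
        // so the decimal result is converted explicitly.
        CostEstimate = (double)_calculator.Calculate(_deliveryId,
            _product1?.Id, Amount1, _product2?.Id, Amount2,
            _product3?.Id, Amount3, _product4?.Id, Amount4);
    }
}
```

The conversion to double stays on the legacy side, so the new service keeps returning decimal for the money amount.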
Recap: Rethinking the Domain Model
In the previous demo, we modified the new domain model after we received more information about how it's going to be used. Note once again that we keep it lean, in the sense that it contains only the information required to solve the particular problem, nothing more. No need to try to replicate the whole legacy project in the bubble. Let's look at the legacy database. The collection of product lines is no longer in the bubble context, so there is no need to query data from these fields in the old database. The same is true for the estimate field. We don't need to keep it in the delivery class anymore. Note that although we got rid of the collection of product lines, we can still keep the ProductLine class itself. We will convert the incoming data into instances of that class moving forward. So this is it. We have all the necessary pieces brought together and can proceed to the next step, implementing the anticorruption layer.
In this module we started working on the legacy project. We had our first requirement, changing the way the delivery cost estimate is calculated. We discussed the refactoring guidelines. You need to always keep in mind this formula: project success equals the amount of assets, which is functionality, divided by the amount of liability, which is code. That means you should always focus on delivering immediate business value. Refactor only those pieces of legacy code that have something to do with the new functionality you develop. Don't touch the other pieces. Try to keep the scope of refactoring as narrow as possible. Remember, the smaller the denominator in this formula, the better. At the same time, don't compromise on code quality. If you need to compromise on something, reduce the scope, not the quality. Keep in mind that not all code is worth refactoring. You will never be able to make everything in the system well designed, and in most cases your effort will bring more value if applied to developing new functionality rather than working on not-so-important pieces of the code base. We talked about the concept of the anticorruption layer. It is a layer of the software whose purpose is to separate the domain model from the outside code base. You need this layer to keep the domain model in the bubble context clean. When working on the new domain model, outline it first to get an idea of how it should look. In the process, follow two guidelines. First, bring in only those domain classes that you absolutely need to handle the problem you have at hand. Don't introduce additional concepts to the bubble context. And second, don't allow the legacy code base to influence the new domain model. Design it as if there were no legacy code base whatsoever. It's the job of the anticorruption layer to glue the two together. The new domain model should not know anything about the surrounding legacy mess.
In order to better understand the relationship between the bubble context and the legacy code base, write down the translation map. It will help you understand which concepts correspond to what data in the legacy code. You also need to identify the entry point for the bubble context. If you can, create a separate interface for it, either an API endpoint, a new web page, or a new desktop window. In some cases like ours, the entry point will reside inside the legacy code base. Don't hesitate to rethink the new domain model after you have all information about how it's going to be used. In our case, we were able to simplify the model by removing the estimate field and the product line collection from the delivery class after we discovered that the legacy code base would be able to provide this information and we don't need to query it from the database. In the next module we will proceed with the implementation of the anticorruption layer and we will also complete the task of calculating the delivery cost estimate.
Creating an Anticorruption Layer
Hi. My name is Vladimir Khorikov and this is the course about Domain-Driven Design and working with legacy projects. In this module we will continue working on the new functionality for the legacy project. You will see the implementation of the anticorruption layer that will protect the bubble context from the outside world.
Creating the Anticorruption Layer
In the previous module we outlined the new domain model and identified the entry point for the bubble context. It is time now to start implementing the actual functionality. To calculate the cost estimate, we will need information about the delivery itself and the four products the UI passes to the calculator. As we discussed earlier, we should get this information from the legacy database through the anticorruption layer so that when our domain model works with that data, it deals with it in the form of clean domain classes we defined here and that means it is time to build the anticorruption layer. Let's create a folder for it. I am calling it ACL, which stands for anticorruption layer. So we now need to select data from the legacy database and transform it into the domain entities, delivery, and products. How should we name classes that will do this work? It turns out that we already have a name for a class that queries data from the data store and transforms it into domain entities. It is repository and that's exactly what our anticorruption layer will consist of. It will contain a delivery repository and a product repository. Okay, so let's add delivery repository, cleaning this file, good. It will have a single public method, GetById. The implementation will consist of two parts, getting data from the legacy database and converting it into delivery. Here is once again the database diagram. As the delivery class needs only the information about the destination address, we need to query these two tables in order to get this info. Alright, so let's do that. I'm adding a method, GetLegacyDelivery, which will accept an id. As for its return type, it's a good idea to define a separate type that would be a one-to-one reflection of the legacy data. We need the delivery id and these three fields from the address table. This DeliveryLegacy class will contain them for us. Here it is. 
As you can see, it consists of the delivery id which is this property and the address information. Note that we have made this class private and embedded it into the delivery repository. There is no need to expose it to the outside world. We only want this class to temporarily hold legacy data for us. Also note that we only defined the fields we need here, nothing more. No need to query more information than necessary. We will use Dapper to query the database, but before we do that we need the connection string so let's create a new folder, Utils, and put a new class into it. I'm naming it Settings. It's going to be a simple static class with the connection string property. It will also have an Init method that will accept that string from the outside. Let's go to the application's composition root, extract the connection string into a variable, and initialize the Settings class with it. Note that I'm hard-coding the connection string here, but of course a proper way to do that would be to have an application settings file. Okay, getting back to the delivery repository. We can now instantiate the connection using the connection string from the Settings class. This is our SQL query to get the info from the legacy database. Note once again that we are querying only one field from the delivery table. That's all we need for now. Finally, to execute this query we will use Dapper. It's a lightweight ORM that perfectly fits our needs here. We need to install it first, though. Good. Now just writing connection.Query, specifying DeliveryLegacy as the type parameter. The first parameter of this method is the query itself, and the second is the anonymous type that contains all parameters we need to pass into that query. Returning the result, good. We can call this method from here. Good. That was the first step. Now we need to convert the legacy data into the delivery domain class so I'm writing MapLegacyDelivery. That will accept the DeliveryLegacy and return the new one. 
Using Resharper to create this method, alright, so here is the data from the legacy database once again and here is the new domain class. Note that in the bubble context we defined the city and state separately in the address class; however, they are joined together in the legacy one. So we need to separate them. I'm calling split on this field. The resulting array would contain a city and state. Now it could be that the legacy data field contains null or that it only holds the city name or the state name but not both. We need to guard against that. We have the assumption that both city and state should be present in that field and so we are stating this assumption explicitly. At this point you need to have a conversation with the domain experts about what to do in this kind of situation. It's possible that you could come up with some default value for such cases. In our project though, we will just throw an exception. This means that we assume there can be no invalid data in the database, which is a strong assumption for a legacy application, but we'll go with it for simplicity's sake. Alright, now that we have everything in place, we can instantiate an address class. The street property corresponds to the old CTR field. Note that all text fields in the legacy database are represented as fixed-size char values which means that they are padded with empty spaces to match the size. Because of that we need to trim them before bringing to our new domain model. The same for city, which is the first element of the array, state, and zip code. Very good. After the creation of address, we can go ahead and instantiate the delivery object itself. Let me put a semicolon here and remove these comments. Perfect.
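The mapping step described above can be sketched roughly as follows. This is a minimal illustration, not the course's actual code: the DeliveryLegacy property names are placeholders for the real legacy columns, and the single-space separator between city and state is an assumption, since the transcript does not say what the field is split on. The domain classes still have public setters here because that is how they look at this point of the walkthrough.

```csharp
using System;

// Hypothetical one-to-one reflection of the legacy row. The property names are
// placeholders, not the real legacy column names.
public class DeliveryLegacy
{
    public int Id { get; set; }
    public string Street { get; set; }
    public string CityState { get; set; } // city and state joined in one field
    public string ZipCode { get; set; }
}

// Domain classes as they look at this stage (public setters, no constructors).
public class Address
{
    public string Street { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string ZipCode { get; set; }
}

public class Delivery
{
    public Address Address { get; set; }
}

public static class DeliveryMapping
{
    public static Delivery MapLegacyDelivery(DeliveryLegacy legacy)
    {
        // Assumption: a single space separates city and state, so a valid value
        // splits into exactly two parts once the char padding is trimmed away.
        string[] cityAndState = (legacy.CityState ?? "").Trim().Split(' ');
        if (cityAndState.Length != 2)
            throw new Exception("Invalid city and state: " + legacy.CityState);

        // Fixed-size char columns are padded with spaces, so trim every field.
        var address = new Address
        {
            Street = (legacy.Street ?? "").Trim(),
            City = cityAndState[0].Trim(),
            State = cityAndState[1].Trim(),
            ZipCode = (legacy.ZipCode ?? "").Trim()
        };

        return new Delivery { Address = address };
    }
}
```

Throwing on anything other than exactly two parts states the assumption explicitly. A real legacy field may well need a different separator or a default value, which is the conversation to have with the domain experts.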
Strengthening the Domain Model with Proper Encapsulation
Note a couple of things in the current implementation. First, we didn't actually use the id property from the DeliveryLegacy class. We will fix that shortly. And second, the domain model itself is not properly encapsulated. It exposes public setters. That is not good because it's too easy to mess up with such classes. We'll fix that, too, but before that, let's introduce base classes so that we can factor out the common behavior from our domain objects. I'm talking about the Entity and ValueObject base classes. To learn more about these DDD concepts, check out my Domain-Driven Design in Practice course or this article where I fully describe the differences between them. In short, an entity is a concept with an inherent identity while a value object does not have such an identity. When working with multiple entities we treat them as different instances even if their properties are the same. As for value objects, we can treat them interchangeably. Here for example, the delivery class is an entity. Even if we have two deliveries for the same address, we still treat them as two separate deliveries, not the same one. Address on the other hand is a value object. If we have two instances of an address, we can replace one with the other as long as they have the same values in these fields. ProductLine is also a value object and the product itself is an entity. Alright, so let's create a new folder, Common. It will contain all classes that are related to our domain model, but could be reused across multiple bounded contexts should we ever decide to create another one. The first one is going to be Entity. I'm copying its content here. Let's review it real quick. The class itself is abstract, meaning that we cannot instantiate it. We can only create classes inheriting from it. It has the id property which will represent the identity of the object and we also override the equality members. 
We deem two entities the same if they have the same identifiers and here are the equality operators and the GetHashCode method. With this addition we can now modify the delivery class. As I mentioned earlier, it is an entity so we need to inherit it from the base Entity class and we will define a constructor. As you can see, it accepts the id of the delivery and its address. We can now get rid of the public setter on the property. In the delivery repository we need to specify the id of the delivery and the address. Very good! It's now impossible to accidentally forget to provide an id or an address when instantiating a delivery. Let's add another base class, ValueObject. Cleaning it up and copying its content. Good! Note that if you watched my previous courses, you might notice that this implementation is different from what I usually use and it's true, it is different. This version is shorter and more concise. You can read more about the differences between this and the old versions here. Let's review this class, too. You can notice that unlike the Entity base class, this one doesn't contain an identifier. That's because value objects don't have an inherent identity. Because of that, the equality logic differs, too. Instead of relying on the id property to compare two value objects, we compare them by their content. Here you can see an abstract method that when implemented in subclasses returns the components of the class. You will see its implementation shortly. Then in the Equals method we compare those components to each other using this SequenceEqual method. It basically goes through each component of the value object and compares it to the corresponding component from the other value object and the same is in the GetHashCode method. We aggregate hashes of all components inside the value object. The equality operators are the same as in the Entity base class. Alright, let's now inherit Address from ValueObject and implement the GetEqualityComponents method. 
Here's how we can do that. Yield return street, city, state, and zip code. As you can see, it's very simple and readable. Let's also add the constructor. I am selecting all the properties to be injected into it, moving the class into a separate file and getting rid of all setters as we don't need them anymore. Good! In the delivery repository we can use the constructor instead of assigning the properties. Street, city, state, and zip code. Deleting all this. Very good. Let's go through the remaining domain classes. ProductLine is a value object so I'm inheriting it from the ValueObject base class, implementing the equality components. Here we'll return the product itself and its amount and also getting rid of public setters. Moving the class into a separate file. The last one is product. It should inherit from Entity, removing the setter, adding constructor, and extracting it out of this file. Finally, we can move the calculator to its own file as well. No need to keep it together with delivery. Very good. We now have all the domain classes properly encapsulated.
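For reference, here is one way the two base classes and the encapsulated Delivery and Address could look. It is a sketch that follows the description above (identifier-based equality for entities, component-based equality through SequenceEqual for value objects) and should not be read as the exact code shown on screen. The int identifier type and the hash-combining constant are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class Entity
{
    public int Id { get; } // assumption: integer identifiers

    protected Entity(int id) { Id = id; }

    // Two entities are the same if they have the same type and identifier.
    public override bool Equals(object obj)
    {
        if (!(obj is Entity other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return GetType() == other.GetType() && Id == other.Id;
    }

    public static bool operator ==(Entity a, Entity b)
    {
        if (a is null && b is null) return true;
        if (a is null || b is null) return false;
        return a.Equals(b);
    }

    public static bool operator !=(Entity a, Entity b) => !(a == b);

    public override int GetHashCode() => (GetType().ToString() + Id).GetHashCode();
}

public abstract class ValueObject
{
    // Subclasses list the fields that take part in the comparison.
    protected abstract IEnumerable<object> GetEqualityComponents();

    public override bool Equals(object obj)
    {
        if (obj is null || GetType() != obj.GetType()) return false;
        var other = (ValueObject)obj;
        return GetEqualityComponents().SequenceEqual(other.GetEqualityComponents());
    }

    // Aggregates the hashes of all components into one value.
    public override int GetHashCode()
    {
        return GetEqualityComponents().Aggregate(1, (current, component) =>
        {
            unchecked { return current * 23 + (component?.GetHashCode() ?? 0); }
        });
    }

    public static bool operator ==(ValueObject a, ValueObject b)
    {
        if (a is null && b is null) return true;
        if (a is null || b is null) return false;
        return a.Equals(b);
    }

    public static bool operator !=(ValueObject a, ValueObject b) => !(a == b);
}

public class Address : ValueObject
{
    public string Street { get; }
    public string City { get; }
    public string State { get; }
    public string ZipCode { get; }

    public Address(string street, string city, string state, string zipCode)
    {
        Street = street;
        City = city;
        State = state;
        ZipCode = zipCode;
    }

    protected override IEnumerable<object> GetEqualityComponents()
    {
        yield return Street;
        yield return City;
        yield return State;
        yield return ZipCode;
    }
}

public class Delivery : Entity
{
    public Address Address { get; }

    public Delivery(int id, Address address) : base(id) { Address = address; }
}
```

With these in place, two Address instances holding the same four values compare as equal, while two Delivery instances are equal only when their ids match, regardless of the address.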
Recap: Creating the Anticorruption Layer
In the previous two demos we created an anticorruption layer. Again, the anticorruption layer is a code component that does the conversion between the bubble context and the legacy code. In our case the repository behaves as the anticorruption layer. It knows what delivery means for the bubble context and it also knows where to get it from. Note that when the bubble context doesn't have its own database, having repositories act as the anticorruption layer is often enough. In this case the anticorruption layer becomes part of that bubble. As the bubble grows and starts working with its own data storage, however, you need to promote the anticorruption layer into its own bounded context. We will do exactly that in the next modules. Also note that we did the conversion at the data level, meaning that our anticorruption layer goes directly to the legacy database. It's also possible to use some parts of the legacy code instead. For example, you could find an already existing data access class, use it to get the data, and then transform the result to the new domain model. In my experience, though, this is often a less optimal solution. It's usually hard to pinpoint the exact component in the legacy code base that does the job you need and so using it poses some challenges. This approach is also more error prone. You can think that you have found the piece of code you need, but it could be that it doesn't do all the required work. For example, it could return only partially initialized data. The database here is the ultimate source of truth. It's the most reliable source you can find to get the full picture in terms of the data you need. In our case we don't have any stored procedures. However, most legacy projects do have them. The guideline here is the same. If you think that you can use a stored procedure to get the required data and you are sure you will get it in full, use that stored procedure, otherwise write your own SQL queries. 
You also saw how we modified the new domain model. We got rid of all public setters and thus made the model encapsulated. Remember, never compromise on the code quality. Proper encapsulation is one of the key aspects of that quality. If you want to learn more about how to build encapsulated domain models and why it is important, check out my course, Refactoring from Anemic Domain Model Towards a Rich One, here on Pluralsight.
Implementing the New Requirement
We have the delivery repository ready. We can now use it to get delivery from the database by its id. Now we need to do the same for products. Let's create a product repository, which would be the second part of our anticorruption layer. Cleaning it up. Okay, the implementation is going to be mostly the same. The GetById method, method for retrieving the legacy product, a private class dedicated specifically to holding the legacy data. Again, we need to keep only the information we need in our domain model, nothing else. In the case of products, that means the weight columns and the id. Here they are. No need to keep the name or the description of the products. Now we're ready to do the querying, creating a connection, declaring a query, and using Dapper to execute it. Pretty simple, as you can see. I'm calling this method in GetById and now we need to map the legacy product into product and return it to the caller, using Resharper to generate the method. Now just as with the legacy delivery, here we also need to do some conversions. We need to merge the two fields with information about weight into one and because they both are nullable, here we can also end up in the situation where both fields contain nulls. That would be an illegal state from our domain's perspective because we expect the product to have at least one of these fields. In this situation, you need to once again talk to the domain experts and decide what to do. You could come up with some default value just for the sake of having one when doing the estimate or you might decide that a nullable weight is okay and you will use some predefined price tag for such deliveries when calculating the estimate. Regardless, that needs to be a business decision; you cannot make it on your own. For the purpose of this course we will just throw an exception and allow the application to fail fast. This could be an option for you, too, but again you need to discuss it with the business people. 
Alright, now we can calculate the weight in pounds. It is either the content of this field or if it's empty, the number of kilograms multiplied by 2.2. It's a good idea to define a constant that would represent this value in order to avoid magic numbers in code. Good. Returning a new product, with id, and weight in pounds. Very good. We have everything put together. We can now go to the estimate calculator and start implementing the actual calculation functionality. We have four product IDs here. They are all nullable, but we need at least one of them in order to calculate the estimate. We can indicate this precondition explicitly like this. Next the calculator needs to get the information about the delivery and the products from the database using the anticorruption layer. So I am defining a constructor and initializing both of the repositories here, saving them to private fields, and making those fields readonly. Good. Now we can use the delivery repository to get the delivery by its id. If the delivery is null, that means that the client code provided an incorrect id so I'm throwing an exception. Alright, here is the estimated delivery cost formula again. We will get the weight from the products. As for the distance, we need to somehow convert the delivery address, which is part of the delivery object, into the distance in miles between the storehouse and the point of destination. We will not dive into the details of how to do that for simplicity's sake and because it's not the primary focus of this course. Let's create a simple class, AddressResolver, with a single method, GetDistanceTo, that accepts an address and returns the distance in miles or nothing if the address is incorrect. Usually you would use some third-party API for that purpose. So let's assume that you do. Here goes a call to such an API. We'll be using 15 miles as the stub value. Okay, we need to create an instance of this class and save it along with the two repositories. Good. 
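Going back to the product mapping described above, the merge of the two nullable weight fields can be sketched like this. The ProductLegacy property names and the use of double for weights are assumptions made for illustration. The 2.2 conversion factor and the fail-fast exception come from the walkthrough.

```csharp
using System;

// Hypothetical reflection of the legacy product row: only the id and the two
// weight columns, both of which are nullable in the legacy database.
public class ProductLegacy
{
    public int Id { get; set; }
    public double? WeightInPounds { get; set; }
    public double? WeightInKilograms { get; set; }
}

public class Product
{
    public int Id { get; }
    public double WeightInPounds { get; }

    public Product(int id, double weightInPounds)
    {
        Id = id;
        WeightInPounds = weightInPounds;
    }
}

public static class ProductMapping
{
    // A named constant instead of a magic number in the conversion.
    private const double PoundsInKilogram = 2.2;

    public static Product MapLegacyProduct(ProductLegacy legacy)
    {
        // Both fields empty is an illegal state for the new domain model.
        // Failing fast is the choice made in this course; the real decision
        // belongs to the business.
        if (legacy.WeightInPounds == null && legacy.WeightInKilograms == null)
            throw new Exception("Invalid weight in product: " + legacy.Id);

        // Prefer the pounds field; otherwise convert kilograms to pounds.
        double weightInPounds = legacy.WeightInPounds
            ?? legacy.WeightInKilograms.Value * PoundsInKilogram;

        return new Product(legacy.Id, weightInPounds);
    }
}
```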
Now we can use it to get the distance. Once again, we need to put a guard clause here. If the distance is null, that means that the address is not found, which in turn means that the address the client provided is incorrect. Alright, we have the distance. Now we need to get the products and their weights. To do that, let's convert the four IDs and their amounts into a collection so that it's easier to deal with them. The collection will consist of the tuples that I'm declaring like this. Product1, amount1, and so on up to product4 and amount4. We need only those products that are not null, hence this Where clause and another Select statement that converts the tuple into a product line. I actually forgot to define a constructor here so let's add it real quick. Good. The product will come from the repository and we can use the amount as is, calling ToList, declaring a collection, and let me collapse the constructor invocation into a single line. Very good. And if any of the product lines contain a null instead of a product, that means that the product id that the client provided is incorrect so I'm throwing an exception here. We have the distance and the product weights and that means we can finally calculate the estimate. We can do that here, but the actual calculation logic looks more like a responsibility of the delivery class so instead, let's create a method in that class, GetEstimate, that will accept the distance, the product lines, and return the cost. Generating it, renaming the distance parameter to distance in miles. Alright, if the distance in miles is less than 0, that's a bug, too, and we need to throw an exception. The product line count should also be between 1 and 4 inclusive because in our current domain model, that's the physical limitation of our software and any value outside of this range means we did something wrong and provided incorrect, probably duplicated product lines. Now we are ready to calculate the total weight. 
It's the sum of all product weights multiplied by their amount. This is the total weight in pounds. The final estimate is going to be total weight multiplied by distance, multiplied by 4 cents, plus $20, and let's once again introduce constants to get rid of the magic numbers in code. The 4 cents is the price per mile per pound and the twenty dollars is an unconditional charge that doesn't depend on the weight or the distance and we need to also round the result to 2 decimal places. Very good. The calculation logic is ready. Here you can see the estimate calculator's code. Let's verify that the new feature works. Opening a delivery and you can see the cost has changed. It is now more precise than it was before. The new algorithm takes into account many more details than merely the number of products and if we save the product and open it again, you can see that the estimate is saved. Note that the saving functionality is part of the legacy code. We didn't implement it in the bubble context. What the new bounded context does is it provides the estimate itself and the legacy code base uses that estimate in place of the old one. The rest of its code base remained the same. Let's try one more. Recalculate, opening again, changing the amount. Very good. Everything works as expected.
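To make the formula concrete, here is a minimal sketch of the GetEstimate method as described: total weight times distance times 4 cents, plus a flat $20, rounded to 2 decimal places. The class shapes are trimmed down, the constant names are made up for this example, and double weights with a decimal result are an assumption. Note that decimal.Round uses banker's rounding by default, which may or may not be what the business expects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public double WeightInPounds { get; }
    public Product(double weightInPounds) { WeightInPounds = weightInPounds; }
}

public class ProductLine
{
    public Product Product { get; }
    public int Amount { get; }

    public ProductLine(Product product, int amount)
    {
        Product = product;
        Amount = amount;
    }
}

public class Delivery
{
    // Hypothetical constant names; the values come from the walkthrough.
    private const decimal PricePerMilePerPound = 0.04m;
    private const decimal FlatCharge = 20m;

    public decimal GetEstimate(double distanceInMiles, IReadOnlyList<ProductLine> productLines)
    {
        // Both checks guard against bugs in the calling code, so they throw.
        if (distanceInMiles < 0)
            throw new Exception("Invalid distance in miles: " + distanceInMiles);
        if (productLines.Count < 1 || productLines.Count > 4)
            throw new Exception("Invalid product line count: " + productLines.Count);

        // Sum of each product's weight multiplied by its amount.
        double totalWeightInPounds = productLines.Sum(x => x.Product.WeightInPounds * x.Amount);

        decimal estimate =
            (decimal)(totalWeightInPounds * distanceInMiles) * PricePerMilePerPound + FlatCharge;

        return decimal.Round(estimate, 2);
    }
}
```

As a worked example, two units of a 10-pound product shipped over the 15-mile stub distance give 20 × 15 × 0.04 + 20 = 32.00.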
Validation Errors vs. Preconditions
Let's take a look at all the checks we now have in the estimate calculator. Here they are. The check that the client code has provided at least one product, that the delivery exists in the database, the address is valid, and that all products also can be found in the data store. They are all errors, but not the same kind of errors. Now look at the first one for example. The situation when all four product IDs are null is a user input error. It means that the user has not yet selected any of the products, but still tries to calculate the delivery estimate. Obviously, it wouldn't work because we need at least one product to provide the estimate, but should we throw an exception in this case? No, we shouldn't. Exceptions are for exceptional situations only. Situations that represent a bug in our software. If all four product IDs are empty, it's not a bug and we shouldn't treat it as such. In this case we need to show the user an error message and the best way to do so is to incorporate the information about the potential error in the return type itself. We need to return an object that would contain either the delivery cost or an error message. Let's create a separate class for that. This is going to be the result class. If you are familiar with my other courses, you have encountered it already so I will not dwell on it here. The idea behind it is simple. It shows whether or not an operation was successful and if it wasn't, this class also provides the details of what exactly went wrong. Here you can see the corresponding properties with that information. These are static helper methods that simplify the work with this class and here is the typed version of this class. You can use it to specify a value in case of success. Alright, we can now change the return type from decimal to result of decimal. That will show the client code that this method can return an error and that this error must be processed somehow. 
As we decided earlier, the first error here is a validation error. So instead of throwing an exception, I'm now returning a Result.Fail with the same message. Now, what about the second one? The second error is not a validation error. If the delivery is not found in the database, that means something is seriously wrong in our software. It's not the user who causes this error, it is we programmers who messed things up. The user actions could not possibly lead to this outcome. The delivery should always be in the database. This is a bug in our software and the best thing to do here is to fail fast, which we already do with this exception. No need to return a result object. Alright, what about the next one? Try pausing the video to categorize this error yourself, and resume it when you are ready to compare our answers. Okay, so this is also a validation error. We assume that we use a fully functional address resolver here and the only reason it may not return a distance is because the address we provide is incorrect, which in turn means that the user made a mistake when they typed it in. So we need to replace this exception with the Result.Fail, too. Okay, the last if statement here is also a bug. If any of the IDs the client code provided are incorrect, that means there is a disconnect between the IDs in the database and the IDs the UI provides us with. No need to modify this exception either. Finally, if everything goes well, we can call Result.Ok and pass the cost estimate as the return object. So the idea behind this distinction between the two types of errors is in whether or not the user can do anything about it. If they can, it's a validation error and we should return a failed result. If not, it's a bug and we should fail fast with an exception. You can read more about the differences between the two in this article. I also covered them in my Functional C# course so you might want to check it out as well. 
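The distinction can be sketched in code as follows. The Result class below is a simplified stand-in written for this example, not the one from the course, and the Calculate method is deliberately reduced: the deliveryFound, distanceInMiles, and estimateFor parameters are hypothetical placeholders for the repository lookup, the address resolver call, and the cost calculation.

```csharp
using System;
using System.Linq;

public class Result
{
    public bool IsSuccess { get; }
    public bool IsFailure => !IsSuccess;
    public string Error { get; }

    protected Result(bool isSuccess, string error)
    {
        IsSuccess = isSuccess;
        Error = error;
    }

    // Static helpers that simplify creating results.
    public static Result Ok() => new Result(true, null);
    public static Result Fail(string error) => new Result(false, error);
    public static Result<T> Ok<T>(T value) => new Result<T>(value, true, null);
    public static Result<T> Fail<T>(string error) => new Result<T>(default(T), false, error);
}

// Typed version that carries a value in case of success.
public class Result<T> : Result
{
    private readonly T _value;

    public T Value => IsSuccess
        ? _value
        : throw new InvalidOperationException("A failed result has no value.");

    protected internal Result(T value, bool isSuccess, string error)
        : base(isSuccess, error)
    {
        _value = value;
    }
}

public static class EstimateCalculatorSketch
{
    public static Result<decimal> Calculate(
        int?[] productIds, bool deliveryFound, double? distanceInMiles,
        Func<double, decimal> estimateFor)
    {
        // Validation error: the user can fix it, so report it in the result.
        if (productIds.All(id => id == null))
            return Result.Fail<decimal>("Must provide at least 1 product");

        // Bug: no user action can cause this, so fail fast with an exception.
        if (!deliveryFound)
            throw new Exception("Delivery is not found");

        // Validation error: the user typed in an address that cannot be resolved.
        if (distanceInMiles == null)
            return Result.Fail<decimal>("Address is not found");

        return Result.Ok(estimateFor(distanceInMiles.Value));
    }
}
```

The caller checks IsFailure and shows Error to the user, or reads Value on success. A missing delivery never reaches that branch because it surfaces as an exception.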
Now that we modified the signature of the calculate method, we need to adjust the client code. The returned value is either an estimate or an error. If it's an error we can show it to the user and return from this method. Otherwise assign the value to the CostEstimate property. Let's verify that everything works. Let's create a new delivery with some customer, address, in Washington, D.C., and specify a zip code. Now if we open it again and try to calculate the delivery cost, it shows an error that we must provide at least one product. Had we not changed the behavior, the application would have just crashed, and if we select a product and set its amount to 1, then everything works. Very good. Aside from distinguishing between validation errors and bugs, it's important to also explicitly specify preconditions in domain classes. Preconditions are the conditions that must be met by the client code when instantiating a domain object or calling one of its public methods. Let's take the product class for example. We instantiate it using the id and weight in pounds parameters. Are there any restrictions on what values those parameters can take? Well, the id should not be less than 0. It would be an incorrect value for an id in our domain model. So we can specify this precondition explicitly like this. As for the weight in pounds, it should not be less than or equal to 0 because that would be a meaningless weight for a product. So here I'm specifying this precondition explicitly, too. Now there is a better and more readable way to express these preconditions. We can create a special class called Contracts. Here is its content. It's a simple static helper class that accepts a Boolean precondition and an error message, checks that precondition, and throws an exception if it's false. Here's how we can use it. Instead of the if statement, just call Contracts.Require id greater than or equal to 0, and the same for the second precondition. 
The weight should be greater than 0 and we can optionally indicate the message which will be used when throwing an exception. Compare the two ways of declaring a precondition now. The option with the Contracts class is not only more succinct, it's more readable, too. In the previous version, you had to invert this expression to make sense of it. That's because this expression is negated. It shows cases that are not acceptable from the domain class' point of view. By inverting this if statement, we save ourselves precious mental cycles. Now it's clear what this class expects, what the precondition here is. We don't need to infer it from the if statement with the inverted Boolean expression and if you wonder why I called this class Contracts, that's because the notion of precondition is part of the approach known as Design by Contract. You can learn more about it here. I can delete the old version now and clean the file. Also note that the Contracts class throws a special type of exception, ContractException. That's a useful way to indicate why the application has crashed. If you see this exception in your logs, you'll know right away that the reason for the crash is a violation of one of the preconditions. Precondition violations are also bugs. They indicate that we programmers introduced an error in the code base. Alright, let's also do the same in the product line class. Here the product should not be null and the amount should be greater than or equal to 0. In the address value object, all the input parameters should not be null and in the delivery, we can specify the preconditions for the id and the address. Note that we are using if statements in the public GetEstimate method. We can invert the expressions in them with the help of the Contracts class, too, for the sake of readability. DistanceInMiles greater than or equal to 0 and the product count is greater than 0 and not more than 4. 
Again, notice how much more readable these two lines are compared to the if statements just because you don't have to invert the Boolean expressions yourself when reasoning about the preconditions. It saves a lot of mental space.
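A Contracts helper along the lines described above could look like the sketch below, shown together with the Product preconditions. The member signatures are a best guess from the narration rather than a copy of the course code, and the error message text is illustrative.

```csharp
using System;

// Dedicated exception type, so a precondition violation is easy to spot in logs.
public class ContractException : Exception
{
    public ContractException() { }
    public ContractException(string message) : base(message) { }
}

public static class Contracts
{
    // Checks the precondition and throws if the client code violates it.
    public static void Require(bool precondition, string message = "")
    {
        if (!precondition)
            throw new ContractException(message);
    }
}

public class Product
{
    public int Id { get; }
    public double WeightInPounds { get; }

    public Product(int id, double weightInPounds)
    {
        // Each line states what the class expects, with no inverted condition.
        Contracts.Require(id >= 0);
        Contracts.Require(weightInPounds > 0, "Weight must be greater than 0");

        Id = id;
        WeightInPounds = weightInPounds;
    }
}
```

For comparison, the if-statement version of the first check reads `if (id < 0) throw ...`, which forces the reader to negate the condition to recover the actual requirement.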
Recap: Implementing the New Requirement
In the previous two demos we implemented the new requirement, calculation of the delivery cost estimate. Note how the presence of the anticorruption layer allowed us to protect the new domain model from the influence of the legacy code. That's the main benefit of having such a layer. It essentially allows you to combine the benefits of a full rewrite with a gradual refactoring, without having to deal with their drawbacks. Let's elaborate on this point. As we discussed in the previous modules, the benefit of a full rewrite is that you can implement a new domain model from scratch. It allows you to have a high development speed as you are not dragged down by the existing legacy mess and it also helps with the team's morale. No wonder here, who wants to work on an unmaintainable legacy code base? The problem with the full rewrite is that it's a big investment. You will need to work for a prolonged amount of time just to catch up with the functionality already existing in the legacy application, which can be months or even years' worth of effort. At the other end of the spectrum is refactoring of the existing code base. Here the biggest benefit is having the working software at each step of the development process. When you move gradually, you can improve a single piece of code at a time while still delivering the immediate business value in the form of new functionality. That's a very business-oriented approach and you should definitely prefer to take it when you can; however, the major drawback here is that you need to deal with the legacy mess and depending on the scale of this mess, it can slow your refactoring efforts significantly, maybe even to the point where you don't show any progress at all, which is also unacceptable from the business perspective. The approach with the new bounded context, the bubble context, helps you combine the benefits of the two. 
You are able to both build a clean domain model and at the same time, it doesn't take you too much time to do that because the scope of this model is very narrow. As a result, you have a working piece of software at each step of the development process. Your project employs both the old and the new code bases, which is very similar to the approach with gradual refactoring. The anticorruption layer is a crucial part of this approach. Again, it is the component that allows you to decouple the new domain model from the existing code base. This decoupling is a great benefit which is hard to overestimate. With it you are able to reason about the new domain model separately as if the surrounding legacy mess never existed and you are also able to build the model itself in a proper way using all the best practices and modern coding techniques and not compromise on its quality. In our case, it allowed us to build a highly encapsulated and clean domain model. Before moving on to new business requirements, let's touch upon a couple of other important points. The anticorruption layer we ended up with is read-only, meaning that it queries data from the legacy database, but doesn't save it back. That's because in our particular case we didn't have to do that. We incorporated the use of our bubble context into the existing user interface and so the cost estimate was eventually saved by the legacy code base, but the non-read-only version would work similarly. You could use the same repositories to codify the backward conversion from the new domain model to the old database. You should always remember this guideline we discussed earlier in this module. Bring in only the data you use. Don't introduce new data just in case because you might need it at some point in the future and when you do bring in data, remember this, adding data is a modeling job. It isn't just a field. In the legacy system that field means something, even if that meaning is a bit muddled. 
Adding it to the new domain model requires that you figure out that meaning, understand how it relates to other concepts in the bubble context, and work out the mapping between the two. All data should be properly encapsulated in a sense that you should not allow the client code to freely change that data in the new domain model. All operations with it should be properly guarded to disallow entering an illegal state. Don't just bring data from the legacy database for the convenience of viewing it alongside the other data. If you choose this lazy way, you will introduce a hole through your anticorruption layer and compromise your ability to evolve the new model independently from the legacy system. So again, proper translation means two things. First, you need to understand what the field means and how it relates to the existing concepts in the bubble context. And second, you need to make sure you maintain all invariants and preconditions in the new domain model. Finally, carefully plan the new features. Try to group features that work on similar data from the legacy system. It will help lower the cost of maintaining the anticorruption layer.
In this module we implemented the anticorruption layer. We created delivery and product repositories that queried data from the legacy database and converted it into the classes of the new domain model. Instead of directly querying the legacy database you could also use the data access code of the legacy project and then convert the results into the new domain model; however, this is often a less optimal solution. The database is the ultimate source of truth, the most reliable source you can find to get the full picture in terms of the data you need. We implemented the delivery cost estimation functionality. For that we built a new domain model with proper encapsulation. We also put an extensive set of validations in our new code base, some of them in the form of input validation and others in the form of assertions and preconditions. The difference between them is that input validation failures should not be treated as exceptional situations and thus should not result in throwing an exception. On the other hand, assertion and precondition violations represent bugs in our software and the application needs to fail fast when it encounters them. We built the new domain model as if it was a greenfield project and not a legacy one. We were able to do that due to the anticorruption layer, a crucial component that helps decouple the bubble context from the legacy code base. Note that even if the outside system is not a big ball of mud, you might still want to build some functionality differently from the rest of the system. If that's the case, the anticorruption layer will help you here, too. You will be able to introduce a new domain model that better suits the problem domain. Always maintain the bubble's isolation. Don't introduce data into it without translation, and only do that when you need this data for implementing new functionality. Don't bring it in otherwise. We discussed the benefits of creating a bubble context with an anticorruption layer. 
It combines the advantages of a full rewrite and the gradual refactoring without having to deal with their drawbacks. That is, you create a brand new code base using proper coding techniques and best practices. At the same time, you have the working software at each step of this process. In the next module we will grow our bubble by making it autonomous. We will introduce a dedicated data storage for our bubble and we'll cut all its connections to the legacy database.
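The distinction between input validation and preconditions mentioned in this summary can be sketched roughly as follows. This is only an illustration under assumptions: the `ValidationResult` type, the simplified `ProductLine` shape, and the error messages are hypothetical and are not taken from the course's code base.

```csharp
using System;

// Hypothetical result type used to report invalid input without exceptions.
public sealed class ValidationResult
{
    public bool IsValid { get; }
    public string Error { get; }

    private ValidationResult(bool isValid, string error)
    {
        IsValid = isValid;
        Error = error;
    }

    public static ValidationResult Ok() => new ValidationResult(true, null);
    public static ValidationResult Fail(string error) => new ValidationResult(false, error);
}

// Simplified, hypothetical version of a product line.
public sealed class ProductLine
{
    public string ProductName { get; }
    public int Amount { get; }

    // Input validation: bad user input is an expected situation, so it is
    // reported as a value the UI can display, not thrown as an exception.
    public static ValidationResult ValidateInput(string productName, int amount)
    {
        if (string.IsNullOrWhiteSpace(productName))
            return ValidationResult.Fail("Please select a product.");
        if (amount <= 0)
            return ValidationResult.Fail("Amount must be greater than zero.");
        return ValidationResult.Ok();
    }

    // Preconditions: by the time the constructor runs, the input should already
    // have been validated. A violation here is a bug in the caller, so fail fast.
    public ProductLine(string productName, int amount)
    {
        if (string.IsNullOrWhiteSpace(productName))
            throw new ArgumentException("Product name is required.", nameof(productName));
        if (amount <= 0)
            throw new ArgumentOutOfRangeException(nameof(amount), "Amount must be positive.");

        ProductName = productName;
        Amount = amount;
    }
}
```

The client code would call `ValidateInput` first and show the error to the user; the constructor's checks then act only as a safety net against programming mistakes.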
Making the Bubble Autonomous by Introducing a Separate Data Storage
Hi. My name is Vladimir Khorikov and this is the course about domain-driven design and working with legacy projects. In this module we will have yet another feature request, this time a more complicated one. It will require us to grow our bounded context into an autonomous bubble by introducing a separate data storage for it.
In the previous modules we implemented the delivery cost estimation functionality. We did a good job. We were able to implement it quickly despite the messy legacy code base and the database underneath it. The stakeholders are happy with it and they want us to continue, and so we received a new feature request. Now the business wants to be able to create deliveries with more than four products. Let's look at our software again. Right now, if you go ahead and edit one of the packages, you will see that there is room for only four different products, and if you need to add more, you will have to create two separate deliveries for the same customer and distribute the product lines between them, which is very inconvenient to say the least, and it's understandable that the stakeholders want to improve this functionality. And so here we are with this new task. How could we implement it? This is the legacy database and these are the two tables that store information about deliveries in the system. After looking at them, it becomes clear why the application UI was designed this way and doesn't allow for specifying more products. It's because the tables are not normalized. All four products are hardcoded into the delivery tables' structure with these columns. Instead of extracting them into a separate table, the original developers just inlined them into the parent table, hence these two repeating column names, product line 1, product line 1 amount, and so on. As we discussed in the second module, it seems as if the application initially allowed for only two product lines to be added to a single delivery, because the main delivery table has room for only two products and amounts. Later, when the business asked for the ability to add more, the developers, instead of eliminating this artificial limit, just extended it by introducing another table with two more sets of columns. 
It could be because the project's code base was a mess already at that time and so they decided to conform to the existing architecture instead of refactoring it, or maybe they felt time pressure and needed to ship this update quickly with the intention to refactor it later. Regardless of the reason, now we have a suboptimal design decision we need to deal with. So what can we do here? One option we have at hand is to continue down this path and just extend the second delivery table with more columns. In theory we could introduce up to several dozen or even hundreds of new columns to this table and that would satisfy the business needs, but of course, that would increase the maintenance costs of this project beyond all reason. We would need to deal with an enormous amount of duplication, and if we needed to change anything in the structure of the product lines, for example, store a discount for each of the product lines, that would be a disaster, and so we will not go down this path. Obviously, the database structure needs to be updated. Instead of all these columns we need to have a separate table that would keep the information about the product, its amount, and the delivery it belongs to, but how to do that exactly? We cannot change the database because the existing legacy software depends on it, and we definitely don't want to deal with the legacy code base. That was the whole point of having an anticorruption layer with the new domain model. That is where the concept of the synchronizing anticorruption layer will help us.
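To make the difference between the two shapes concrete, here is a minimal sketch, assuming a simplified legacy row where all four product slots are flattened into one class. The class and property names (`LegacyDeliveryRow`, `ProductLineRow`, `LegacyRowConverter`) are hypothetical; the real legacy tables use different column names and spread the slots across two tables.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified view of the denormalized legacy data: every product
// line occupies its own pair of columns, so the maximum is fixed by the schema.
public sealed class LegacyDeliveryRow
{
    public int? Product1Id { get; set; }
    public int? Product1Amount { get; set; }
    public int? Product2Id { get; set; }
    public int? Product2Amount { get; set; }
    public int? Product3Id { get; set; }
    public int? Product3Amount { get; set; }
    public int? Product4Id { get; set; }
    public int? Product4Amount { get; set; }
}

// Normalized shape: one row per product line, any number of rows per delivery.
public sealed class ProductLineRow
{
    public int DeliveryId { get; }
    public int ProductId { get; }
    public int Amount { get; }

    public ProductLineRow(int deliveryId, int productId, int amount)
    {
        DeliveryId = deliveryId;
        ProductId = productId;
        Amount = amount;
    }
}

public static class LegacyRowConverter
{
    // Turns the fixed slots into a list, skipping the slots that are not filled.
    public static List<ProductLineRow> ToLines(int deliveryId, LegacyDeliveryRow row)
    {
        var slots = new (int? ProductId, int? Amount)[]
        {
            (row.Product1Id, row.Product1Amount),
            (row.Product2Id, row.Product2Amount),
            (row.Product3Id, row.Product3Amount),
            (row.Product4Id, row.Product4Amount)
        };

        var lines = new List<ProductLineRow>();
        foreach (var slot in slots)
        {
            if (slot.ProductId.HasValue && slot.Amount.HasValue)
                lines.Add(new ProductLineRow(deliveryId, slot.ProductId.Value, slot.Amount.Value));
        }
        return lines;
    }
}
```

Adding a per-line discount to the normalized shape means one new property; in the fixed-slot shape it would mean four new columns, and more with every slot added.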
Synchronizing the Anticorruption Layer
Let's take a look at our existing bubble context once again. It consists of the new clean domain model and the anticorruption layer which we implemented in the form of repositories. The anticorruption layer does a good job of separating the domain model from the legacy mess, but it has limitations. Right now we are limited to operating only on the data that exists in the legacy bounded context. It means that when we design the bubble domain model we can only introduce data that already exists in the legacy database. We are restricted by its structure. The bubble context is completely dependent on the parent context. Any new information that doesn't already exist in it must be added to the legacy system somewhere, and in theory we can add it there, but this approach has its own drawbacks. If we add a new ProductLine table to the legacy database and use it from the bubble context, it would contribute to the already high technical debt in the legacy system. When it comes to working on legacy projects, you generally don't want to build on top of the existing legacy code base or the database because that would inevitably lead to increasing software entropy. This additional table, even if designed and implemented well, would increase the amount of code we would need to deal with when working with the legacy code base, and it would also introduce additional confusion when trying to make sense of it because it wouldn't be obvious why there are two tables with information about product lines. So we will not do that. Just as with trying to refactor the existing database, adding new tables to it is not the best option. What we can do instead is create a separate data storage for this information and synchronize it with the legacy one. Here's how it will look. The new data storage will hold all the information our bubble context needs. That means the bubble will become independent from the legacy database. It will now retrieve all required information from its own data store. 
The bubble will become autonomous and because of that it shouldn't reside inside the legacy project anymore. That's why I'm depicting it as a separate independent component and, just as any other independent piece of software, it will have its own database. We will need to upgrade the anticorruption layer, too. It will need to synchronize the old database with the new one, not merely query the old database as it does now. Hence the name Synchronizing Anticorruption Layer. Because the anticorruption layer will significantly increase in scope, it would be a good idea to treat it as a separate bounded context from now on. It will also become a completely independent piece of software. We will discuss how exactly the anticorruption layer will do that in the next module. You can view this new bounded context as the next step from the bubble. It is now an autonomous bubble as we are cutting the connections to all the legacy data. This autonomy will help us moving forward because we will not be limited to the legacy database anymore. You can see that it is more scalable. We are also preventing the legacy ball of mud from growing as we don't introduce new tables to its database.
Creating a New Database
Alright, I created a new database behind the scenes. This is its script. Use it to create your own copy of it. You can find it in the course's exercise files or on GitHub in this module's folder named SyncACL. You don't need to read it. I am opening it just to show that there is a bulk of SQL statements in here. You can run them in SQL Server Management Studio. And let's now look at the database more closely. First of all, look at its name. I named it PackageDeliveryNew to match the name of the bubble bounded context. It's always a good idea to name databases after your bounded contexts to avoid any misconceptions. It has three tables in it: Delivery, Product, and ProductLine. Here they are on a diagram. Let's compare it to the legacy database's structure. The first thing you might notice here is that the names of these tables and the columns in them are not weird, which is a big plus for readability. Next, just as with the bubble domain model, here we also bring in only the data we use. Note that there is no customer table in here and the product table contains only a single column with the weight of the product, and by the way, this single column is named WeightInPounds to avoid any potential confusion about the units of mass we are using. Note that there is no separate table for address either. All its fields are inlined into the delivery table. You can see here destination street, city, state, and zip code. That's because the concept of address in our domain model is a value object and has a one-to-one relationship with a parent delivery entity, meaning that there can be only one address for a single delivery. The best way to handle data for such a value object is to inline it into the entity's table. You can learn more about it in my Domain-Driven Design in Practice course or in this article. In short, there is a nice mental model you can adopt in order to understand why we are inlining this data. 
You can think of such value objects the same way you think of numbers. You wouldn't create a separate table for cost estimate, right? Although in theory you could. You could introduce something like a CostEstimate table with an id, a value, and the delivery id, but it doesn't make any sense because it's just a single value. You don't need to keep track of it and if it changes, you just replace the number with a new one. You don't need to preserve the identity of that specific number. The same is true for these fields. They too represent a value with no identity, a value object. The only difference here is that this value spans across multiple columns. Alright, and note also that I used a prefix named Destination for each of the address fields. This actually differs from the name of the property in the delivery entity so let me change it. Here it is. Renaming it to Destination, also renaming the related variables. Here it offers to modify some of the texts, but we won't do that. Good. Let's get back to the database. Of course, the city and state portions are separated in the new database. There is no need to keep them merged together in this column. All tables now have proper foreign key constraints, which is a great thing as it allows you to avoid data inconsistencies such as when a row in one table refers to a row in another by its id, but that second row no longer exists. Having foreign key constraints prevents such situations from happening, and by the way, notice that the IDs themselves don't have an identity specification. That is done for a reason. We don't allow the database to automatically set IDs for deliveries or products. We need those IDs to match what we have in the legacy database. The only exception here is ProductLine. In it we do allow for automatic identity assignment because there is no corresponding table in the other database. We have full control over it. Note that I changed the columns' data types, too. 
No fixed-size strings anymore, only nvarchar, and instead of float, the CostEstimate is represented with a decimal, which is a more appropriate data type for a money field. Finally, the nullability of the columns reflects the preconditions of the domain model. For example, a delivery should have a destination and all fields in the address value object itself are not nullable. Thus the fields related to the destination address in the delivery table are marked as non-nullable, too. The weight of the product should be specified as well, hence no nulls in this column either. Alright, here it is, a brand new database for our bubble context. Note that as you start everything from scratch, it's a good idea to follow database delivery best practices right off the bat. Put the database under source control and implement a migration-based approach to keep track of any changes in its schema or reference data. Refer to my other course to learn more about it.
Adjusting the Domain Model and Persistence Logic
Now that our bubble context has its own database, we need to adjust the persistence logic. We will also modify the domain model itself. We need to introduce more information to it in order to handle the new requirement independently from the legacy database. For example, right now the delivery class doesn't reference the product lines because they are provided by the legacy UI. This will not be possible anymore as that UI simply cannot handle more than four product lines. So we will have to build a new UI window and the view model for that purpose. We will talk about the UI and a new entry point for the autonomous bubble shortly. For now let's focus on the persistence component. We currently have two repositories. They are both part of the anticorruption layer that retrieves data from the legacy data storage. With the autonomous bubble we won't need them anymore. Our bubble will no longer know how to convert the legacy data. That is going to be the responsibility of a new synchronizing anticorruption layer, which we'll introduce later in this module. Instead, let's add new repositories that work with the new database. Because these new repositories don't work with the legacy data, we will put them in the deliveries folder along with the domain model classes. The first one is going to be the delivery repository. Changing it up, good. Note that as the database structure is robust and straightforward, we can use an ORM here. There is no need for handcrafted SQL anymore. The mapping between the domain model and the database is going to be simple and smooth, but because the use of ORMs is not the primary goal of this course, I'll skip them and we'll continue with Dapper. 
The scope of our project is not too large anyway so there is not going to be that big of a difference in terms of the amount of code here, but keep in mind that your project is most likely larger than this and so the use of a fully fledged ORM such as NHibernate or Entity Framework would be justified in your case. Alright, first of all, let's introduce some data classes. These are the classes that reflect the shape of the underlying database. This one is for the delivery table. Along with the data from the delivery table, we will also need information about product lines. Here's a class for it, too. Note that we are combining both tables into one; there is no need to have a separate class for the product one. The main API here would be the GetById method. It will accept the id and return a delivery. Inside we will retrieve the raw data first and then map it into a delivery object. So the algorithm is pretty much the same as with the old repositories. The difference here is that the source data comes from our own native data storage. Okay, the first method is GetRawData; it will accept an integer id and return both a delivery and the list of product lines in it. I'm using the new C# 7 syntax for returning a value tuple, creating a new SQL connection, and here is the SQL query. Note how clean it is. No weird names or joins. The first select statement retrieves data from the delivery table and the second statement combines data from both the product line and product tables. The SQL returns two result sets so we need a special Dapper functionality for that, QueryMultiple. It returns a reader which we then use to read the results one by one. The first result is the delivery data and the second is a list of product lines, returning them both, good. Note that Dapper handles all the mappings between the properties in the data classes and the data from SQL. We don't need to worry about that. Alright, using that method to get the raw data, and mapping it to a delivery instance. 
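A rough sketch of the raw-data step just described might look like the following. It assumes the Dapper NuGet package is referenced; the data classes, the column list, and the `DeliveryRawDataReader` name are assumptions made for illustration and may differ from the course's actual code. `QueryMultiple`, `GridReader.ReadSingleOrDefault<T>`, and `GridReader.Read<T>` are existing Dapper members.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using Dapper; // NuGet package; provides QueryMultiple and SqlMapper.GridReader

// Data classes mirroring the shape of the two result sets (assumed column names).
public class DeliveryData
{
    public int DeliveryID { get; set; }
    public decimal CostEstimate { get; set; }
    public string DestinationStreet { get; set; }
    public string DestinationCity { get; set; }
    public string DestinationState { get; set; }
    public string DestinationZipCode { get; set; }
}

// Combines columns from the ProductLine and Product tables in one class.
public class ProductLineData
{
    public int ProductID { get; set; }
    public int Amount { get; set; }
    public double WeightInPounds { get; set; }
}

public class DeliveryRawDataReader
{
    // Two SELECT statements in one batch produce two result sets.
    public const string Sql = @"
        SELECT DeliveryID, CostEstimate, DestinationStreet, DestinationCity,
               DestinationState, DestinationZipCode
        FROM dbo.Delivery
        WHERE DeliveryID = @id;

        SELECT l.ProductID, l.Amount, p.WeightInPounds
        FROM dbo.ProductLine l
        INNER JOIN dbo.Product p ON p.ProductID = l.ProductID
        WHERE l.DeliveryID = @id;";

    private readonly Func<IDbConnection> _connectionFactory;

    public DeliveryRawDataReader(Func<IDbConnection> connectionFactory)
    {
        _connectionFactory = connectionFactory
            ?? throw new ArgumentNullException(nameof(connectionFactory));
    }

    // Returns both result sets as a C# 7 value tuple.
    public (DeliveryData delivery, List<ProductLineData> lines) GetRawData(int id)
    {
        using (IDbConnection connection = _connectionFactory())
        using (SqlMapper.GridReader reader = connection.QueryMultiple(Sql, new { id }))
        {
            // Result sets must be read in the order the statements appear in the batch.
            DeliveryData delivery = reader.ReadSingleOrDefault<DeliveryData>();
            List<ProductLineData> lines = reader.Read<ProductLineData>().ToList();
            return (delivery, lines);
        }
    }
}
```

Passing a connection factory instead of a connection string is a design choice of this sketch only; it keeps the class independent of a specific ADO.NET provider.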
Creating the method, good. Now with all this data we can instantiate a delivery object, passing the id, the address, it will consist of the destination street, and three other parameters. Let me clean it up. Perfect! At this point we need to extend the delivery class. As you can see here, it only contains the information about the destination address. As I mentioned earlier, that's because the current implementation of the bubble doesn't need to store the product lines. The legacy code passes them to the bubble and it also doesn't need to keep the estimation cost because the bubble returns this estimate to the legacy context and it then saves it on its own, but because we won't have this tight integration with the legacy UI anymore, we will need to keep all this information in our domain model. So let's add a list of products to be delivered and an estimate. I'm using ReSharper to initialize these properties from the constructor. Good. And let me also put a couple more preconditions here to make sure the entity is always valid. The costEstimate should not be negative and the collection should not be null. Now we can pass the estimate to the constructor. As for the lines, we need to first convert them from the lines data. So for each line data instantiating a new product line, the product, and the amount. Saving to a local variable, using the delivery constructor, and returning the delivery. It looks good, but we are not done yet, actually. As you can see here, the ProductLine table also contains an id and we will need this id in our bubble context, too. We will use it to match the product lines in our domain model with the product lines already existing in the database to see which of the lines are new, deleted, modified, or unchanged. And because of that, we will need to treat the ProductLine domain class not as a value object, but as an entity in order to be able to assign an identity to it. 
It's easier to deal with value objects than with entities and so it's unfortunate that we need to make such a modification to our domain model, but you will need to make tradeoffs like this one from time to time; they are inevitable. We just will not be able to easily differentiate between product lines if we don't keep the IDs. In our case, the tradeoff is between the domain model purity and the simplicity of the persistence layer. So replacing the value object base class with entity, I need to accept an id in the constructor and pass it to the base one, adding a new precondition, and removing the GetEqualityComponents method. Unlike ValueObject, the base entity class handles equality by comparing the id of the entity, not its components. Now I can pass the id to the ProductLine class. Good. The delivery repository is ready. Let's create a product repository real quick. Create a new class. Good. So here is the product data with the id and the weight in pounds, a GetRawData method, a method that converts this data to a product instance; the public GetById method will call both of them one by one. Good. And because we now use the new database we also need to pass the new connection string to the bubble context. So let's rename this one to LegacyDatabaseConnectionString and introduce a new one, BubbleDatabaseConnectionString. It will target the PackageDeliveryNew database because, remember, this is how we named the new database. The new connection string will go to the Settings class, which is part of the bubble, and just to show you the difference between them, the DBHelper class is part of the legacy code base; here it is.
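The switch from value object to entity for ProductLine can be illustrated with a minimal sketch. The `Entity` base class below is a simplified stand-in written for this example, not the course's actual base class, and the specific preconditions and the convention that an id of 0 means "not yet saved" are assumptions.

```csharp
using System;

// Simplified entity base class: equality is decided by identity, not by content.
public abstract class Entity
{
    public int Id { get; }

    protected Entity(int id)
    {
        Id = id;
    }

    public override bool Equals(object obj)
    {
        if (!(obj is Entity other)) return false;
        if (ReferenceEquals(this, other)) return true;
        if (GetType() != other.GetType()) return false;
        // Assumption: Id 0 marks an entity that has not been saved yet,
        // so two such entities are never equal unless they are the same instance.
        if (Id == 0 || other.Id == 0) return false;
        return Id == other.Id;
    }

    public override int GetHashCode() => (GetType().ToString() + Id).GetHashCode();
}

public sealed class Product : Entity
{
    public double WeightInPounds { get; }

    public Product(int id, double weightInPounds) : base(id)
    {
        if (weightInPounds <= 0)
            throw new ArgumentOutOfRangeException(nameof(weightInPounds));
        WeightInPounds = weightInPounds;
    }
}

// Previously a value object compared by its components; now it carries an id
// so that lines in memory can be matched with rows in the ProductLine table.
public sealed class ProductLine : Entity
{
    public Product Product { get; }
    public int Amount { get; }

    public ProductLine(int id, Product product, int amount) : base(id)
    {
        if (id < 0) throw new ArgumentOutOfRangeException(nameof(id));
        if (product == null) throw new ArgumentNullException(nameof(product));
        if (amount <= 0) throw new ArgumentOutOfRangeException(nameof(amount));

        Product = product;
        Amount = amount;
    }
}
```

With this shape, two lines with the same id are considered the same line even when their amounts differ, which is what makes it possible to tell a modified line from a new one when saving.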
Recap: Creating a New Database
Let's recap what we did in the previous two demos. We've promoted the bubble context and made it autonomous. The bubble now has its own data that is stored in the native format. In other words, we don't need to make convoluted translations between the data storage and the domain model. The mapping is simple and straightforward. Our bubble context has grown up as we've cut all connections to the legacy data. When creating the database we followed the same guideline we discussed in the previous module. We designed it as if there was no legacy database whatsoever. The database structure is clean and follows all the best practices, for example, having foreign key constraints. We didn't allow the legacy code base to influence our design decisions. Note that although we used plain SQL to retrieve data from the database with the help of Dapper, it is now possible to employ a fully fledged ORM like NHibernate or Entity Framework, and if your project is more or less complex, you would probably want to do that because such an ORM will save you a lot of time that you would otherwise spend writing custom SQL queries. Another benefit here is that you can now test the new code base without any mocks or other test doubles. It is possible to write integration tests that involve the new database without touching the legacy system. We introduced the new versions of the delivery and product repositories. Unlike the old versions that comprised the anticorruption layer, the new ones are not part of that layer anymore as they work with the bubble's database instead of the legacy one. We still have the remnants of the old anticorruption layer in the bubble, the old delivery and product repositories, but we will soon move them into a separate bounded context, a synchronizing anticorruption layer that will not be part of the bubble anymore. After that, our bubble will not know anything about the anticorruption layer. 
We also updated the domain model to include data that we didn't have to include previously, the delivery cost estimate and the collection of product lines. All in all, the presence of its own data has made the bubble more scalable. It's now much easier to scale and evolve it as we are free to introduce data that is not present in the legacy system. Note that upgrading the bubble to an autonomous bubble is quite a large leap. The separate database means that the bubble's maintenance cost will increase significantly, but if you have long-term plans for this project, the leap is worth it. It's an investment that will pay off in the long run.
Identifying a New Entry Point for the Bubble
Now that we have the new domain model and a separate storage for it, it's time to think about the user interface. Look at the existing UI window again; it has space for only four different products and we will not be able to reuse it because of that. Even if we add several more lines here, this design is fundamentally flawed as the number of product lines is fixed. What we need to do instead is allow the user to come up with as many or as few lines as they want. We can add some technical limitation, for example, set the maximum number of lines to 100, but we shouldn't have to list all those hundred items up front on the UI. So we will create a separate window for our new functionality and that is going to be a new entry point for our bubble context, which is good news as we will be able to decouple our code base from the legacy system even more. Previously, we had to inject the bubble into this window and call it from the legacy ViewModel. Now we will have our own UI code working with the bubble. As we discussed earlier, we cannot have a separate executable with our own UI for the bubble and so we will still need to incorporate it into the legacy application, but it's not going to be a lot of work either. What we'll do here is add a new Edit Package window alongside this one. The main window will then have another button that will open this new window. When it comes to anything that relates to the bubble context, it's a good idea to keep it as isolated from the legacy as possible. So we will create a new folder in the UI project that will contain our new user interface. I'm calling it DeliveryNew to match our naming convention. Alright, the details of developing WPF interfaces are not the primary goal of this course so I will not bother you with showing the XAML files. Here I added a XAML view for editing a package offscreen. I also created a stub view model class that will glue our domain model and the view together. 
It already contains all the orchestration. For example, here is the delivery domain object the view model will represent. It will display its product lines and the cost estimate, and here are the commands that correspond to the buttons on the UI. The user will be able to add a new product line, delete an existing one, and recalculate the estimated cost. What we need to do is fill in the gaps in the logic. I'll show you how the stub looks, but before that we need to fix the compilation errors that we introduced after refactoring the domain model. The first one is in the delivery repository. This repository is part of the old anticorruption layer that we used before making our bubble autonomous. The error is due to the fact that we added two more parameters to the delivery's constructor. We won't be using this class anymore as we will introduce a new version of the anticorruption layer soon. So let me comment this line out and return null instead. The second issue is in the EstimateCalculator class that the legacy UI called in order to get the estimate. We won't use this class either because we will have a brand new UI window for this functionality that will work with the delivery domain object directly, so I'll comment out this code as well and replace it with returning 0. If we run the application, edit this package, and recalculate its cost, you can see that it turns to 0 due to our changes. Note that it might be a requirement for you to keep the functionality of the old UI intact. In this case you will need to maintain the old anticorruption layer and keep it functioning. This is rarely the case though, because what's the point in investing in this activity if we have a new, better UI for the same task? So we won't do that, and here is the new UI itself. It's a stub and doesn't exhibit any functionality yet, but it's still helpful to see what the interface will look like before starting to code it. 
This textbox will display the cost estimate and this grid will list all the product lines in the delivery. Alright, we can now start filling in the gaps.
Implementing the New User Interface
First of all, we need to instantiate an address resolver, a delivery repository, and get the delivery to work with by its id, and next we will handle the recalculation functionality. Here is the method that will be responsible for that. In it we need to first check that there are some product lines selected. After that we'll use the addressResolver to get the distance between our warehouse and the destination address. If the returned distance is null, this means the destination address the user specified is not found. Finally, we can calculate the delivery cost. Here we already have the method that does it for us, but we will need to modify it. It was written with the assumption that the delivery class stores neither the cost estimate nor the list of products, and so this method accepts this list as an input parameter and returns the estimate as a resulting value. What we need to do instead now is remove the parameter and replace it with a property. We don't need this part of the contract anymore because we got rid of this limitation. Our packages will now be able to have more than four products. Changing the text, and this last bit. Good. Aside from getting rid of this parameter, we also don't need to return the cost estimate. Instead we can update the CostEstimate property like this. And of course, we need to introduce a setter for this property in order to be able to do that, and let me also rename this method. It doesn't return anything anymore so I'm calling it RecalculateCostEstimate instead of GetEstimate. I can now use it in the ViewModel like this. And I also need to notify the UI that the CostEstimate property has changed so that it refreshes the textbox. Let's start the application once again. This method will be called after we click the Recalculate button. For now we don't have any product lines, hence this error message. The other two buttons on this UI screen are adding a new line and deleting an existing one. 
Let's implement the corresponding commands for them. The first one for adding a product line and the second one for deleting it. If you wonder what this LINQ expression is about, that's for determining when to enable the button. This expression means that the Delete button will only be enabled when the user selects a line in the grid. The input parameter here is of type ProductLine because the grid's data source is a list of product lines; here it is, the data source. Alright, to delete the line we need to create a corresponding method in the delivery class, DeleteLine. I'm using ReSharper to generate a method for me. Okay, currently the lines are represented as a read-only collection. So we need to introduce a private backing field with a collection that we can modify, along with the property for public consumption. Here it is. The property will now act as a proxy on top of this field. In the constructor, we need to assign the list of lines to the field instead of the property, and to delete a line we can use the Remove method on the mutable list of lines. Good. After removing the line we also need to notify the UI about the change in the list of products. Let me run the application. There is nothing to delete yet so I cannot show you the deletion functionality, but you can see that the button is now disabled. That's because we need to select a line in the list in order for it to become active. Note that the grid contains a name column. That's the name of the product and it makes sense to display it along with its weight and the amount the user has ordered, but if you look at the product domain class, we didn't include this field in our domain model initially and now it is clear that we do need the product name, so let's add it here, the property itself, and initialize it from the constructor. It's also a good idea to mention it in the contracts. 
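The recalculation flow described above, together with the UI notification, could be sketched like this. Everything here is an approximation under assumptions: the pricing formula and the `RatePerPoundMile` constant are invented for the example (the course's actual formula is not shown in this section), the address resolver is replaced by a plain delegate, and the error messages are hypothetical. Only `INotifyPropertyChanged` is a real framework interface.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;

// Simplified stand-in for a product line: weight of one unit and the amount ordered.
public sealed class Line
{
    public double WeightInPounds { get; }
    public int Amount { get; }

    public Line(double weightInPounds, int amount)
    {
        WeightInPounds = weightInPounds;
        Amount = amount;
    }
}

public sealed class Delivery
{
    // Hypothetical rate; the real pricing rules are not part of this sketch.
    private const decimal RatePerPoundMile = 0.01m;

    public IReadOnlyList<Line> Lines { get; }
    public decimal CostEstimate { get; private set; }

    public Delivery(IEnumerable<Line> lines)
    {
        if (lines == null) throw new ArgumentNullException(nameof(lines));
        Lines = lines.ToList();
    }

    // No input list and no return value: the method works on the entity's own
    // lines and stores the result in the CostEstimate property.
    public void RecalculateCostEstimate(double distanceInMiles)
    {
        if (distanceInMiles <= 0) throw new ArgumentOutOfRangeException(nameof(distanceInMiles));
        if (Lines.Count == 0) throw new InvalidOperationException("No product lines.");

        double totalWeight = Lines.Sum(l => l.WeightInPounds * l.Amount);
        CostEstimate = decimal.Round((decimal)(totalWeight * distanceInMiles) * RatePerPoundMile, 2);
    }
}

public sealed class EditPackageViewModel : INotifyPropertyChanged
{
    private readonly Delivery _delivery;
    private readonly Func<double?> _resolveDistance; // stand-in for the address resolver

    public event PropertyChangedEventHandler PropertyChanged;

    public decimal CostEstimate => _delivery.CostEstimate;
    public string Error { get; private set; }

    public EditPackageViewModel(Delivery delivery, Func<double?> resolveDistance)
    {
        _delivery = delivery ?? throw new ArgumentNullException(nameof(delivery));
        _resolveDistance = resolveDistance ?? throw new ArgumentNullException(nameof(resolveDistance));
    }

    public void Recalculate()
    {
        // Input validation: report problems to the user instead of throwing.
        if (_delivery.Lines.Count == 0) { Error = "Add at least one product line."; return; }

        double? distance = _resolveDistance();
        if (distance == null) { Error = "Address is not found."; return; }

        Error = null;
        _delivery.RecalculateCostEstimate(distance.Value);

        // Tell the bound textbox that the value has changed.
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(CostEstimate)));
    }
}
```

Note how the view model performs the checks that can legitimately fail because of user input, while the domain method keeps its own preconditions as a guard against programming errors.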
All products must have a non-null name and now let's change the product repository, modifying the product data class that represents our data model and passing the new field as a parameter to the constructor. Note that we don't need to do anything else here because our select statement retrieves all the fields from the product table and Dapper does the matching between those fields and these properties automatically for us. There are a couple more compilation errors left. The first one is in the product repository. This is also a part of the old anticorruption layer. We won't be using it anymore so I'm commenting this code and returning null instead of the product itself. The second error is in the delivery repository. Here we also need to select a product name when we query the delivery with its products from the database. So I'm adding a product name here and here in the SQL query. By the way, we have an error in this query. It should be p.ProductID, not l.ProductID. Now we can pass the product name to the constructor. Very good. Let me close all these tabs and return to the ViewModel. Let's test what we have so far. For that let's manually add a product line into the database. Any product would suffice here. Running the application. Very good. Here it is in the data grid. And we can even delete it from here. Of course, we haven't yet implemented the save functionality so the change will not be persisted in the database. If I reopen this package again, you can see that The Best Pizza Ever is still there. We can also recalculate the delivery cost estimate. Very good. Let's now implement the addition of new product lines. That is going to be another window with its own view model, AddProductLineViewModel. It will allow us to select a product and specify the amount. Let me run the application. It will look like this part of the old UI, the product itself and the amount. The difference would be that we will only allow for selection of one product. 
Another window will be responsible for choosing a product among all products in the database. It will open after clicking the Change button. Alright, just as before, I'm adding XAML views and stub view models for these two windows. Here is the one for adding a product. Here you can see it has the product and the amount properties, which we can use after showing the window to the user. So if the dialog result is true, meaning that the user has clicked the OK button, then we will add a line to the delivery and update the UI. If we run the application, here's how it looks on the interface, the two fields just as in the old UI window. This window doesn't have any functionality yet so let's implement it. The OK button will close the window by setting the dialog result to true. The CanSave method will indicate when the OK button is enabled: when some product is selected and when the amount is greater than 0. The Cancel button will just close the window without checking anything and the Change button will allow us to choose the product. Here I also instantiate a new view model. This one will show all products in the database. Here is the field that the UI will bind to and here is the selected product. To show all products we need to add a new method to our repository, GetAll. It will assign them to this list, generating the method and here is its content. As you can see, it selects all products from the database and maps them to the product domain class. The OK command will close the window with the precondition that some product is selected by the user. This product will automatically be saved to this property. This is done by the XAML binding. The Cancel button will also close the window, but without indicating a success. Very good. We can now show this window to the user and if they click OK, record the selected product into the property field and notify the UI about the change. Finally, we can add a product to the delivery. 
For that we will introduce a new method in the delivery class, generating it, changing the names of the parameters, and adding a new line to the line collection. The id is going to be 0 because there is no such line in the database yet. Running the application. Let's select the fridge and set the amount to 3, good. You can see the fridges have appeared in the list. We can also recalculate the cost and the textbox will show us the correct cost estimate. Finally, we can delete the pizza. Very good. Everything works as intended, everything except saving the delivery to the database. Let's fix that.
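The backing-field arrangement described in this clip can be sketched roughly as follows. This is a minimal illustration, not the course's actual code: the class shapes, the WeightInPounds property, and the omission of the line id and of the cost calculation are assumptions made to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-ins for the domain classes (assumed shapes).
public class Product
{
    public string Name { get; }
    public double WeightInPounds { get; }

    public Product(string name, double weightInPounds)
    {
        // Contract mentioned in the narration: products must have a non-null name.
        Name = name ?? throw new ArgumentNullException(nameof(name));
        WeightInPounds = weightInPounds;
    }
}

public class ProductLine
{
    public Product Product { get; }
    public int Amount { get; }

    public ProductLine(Product product, int amount)
    {
        Product = product ?? throw new ArgumentNullException(nameof(product));
        Amount = amount;
    }
}

public class Delivery
{
    // Private, mutable backing field.
    private readonly List<ProductLine> _lines;

    // Public read-only view that acts as a proxy on top of the field.
    public IReadOnlyList<ProductLine> Lines => _lines;

    public Delivery(IEnumerable<ProductLine> lines)
    {
        // The constructor assigns to the field, not to the property.
        _lines = new List<ProductLine>(lines);
    }

    public void AddLine(Product product, int amount)
    {
        _lines.Add(new ProductLine(product, amount));
    }

    public void DeleteLine(ProductLine line)
    {
        _lines.Remove(line);
    }
}
```

Callers can read Lines but can only change the collection through AddLine and DeleteLine, which keeps all modifications inside the Delivery class.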
Saving the Delivery to the Database
We have two buttons left to implement on our new user interface, OK and Cancel. So let's go back to the ViewModel. Here are the commands for these buttons again. ReSharper marks them as unused. The Cancel button will not do anything aside from closing the window. As for the OK one, it needs to save the delivery package to the database. For that we will be introducing a new method in the repository. Let me generate it, move it up. Okay, here we also create a new connection and execute this query first with these parameters. What this query does is it updates the delivery table with a new cost estimate and deletes all the product lines belonging to that delivery. This is the database schema. We are updating this field and removing product lines from this table. After all the product lines are cleaned up, we recreate them with this query. Here is the execution of it. Let me move all these parameters to separate lines. Good. Note that we are passing a collection of objects with this SQL statement, not just a single object. Dapper will iterate through this collection for us and run the query for each of them. Pretty handy. Let's launch the application and see how it works, adding a couple of fridges, deleting the pizza, and recalculating the delivery cost, reopening the package. Very good. You can see that all the changes have been successfully persisted. Two new fridges, no pizza, and the new delivery cost. Alright, let's quickly fix a small issue with our code. As you can see on this diagram, the cost estimate is actually nullable, whereas if you look at our data model, it is marked as non-nullable and the same is true in the domain model. Let's make them consistent. Changing the CostEstimate's type to nullable decimal and doing the same in the delivery domain class. We need to adjust the view model. If the CostEstimate is null we are showing 0. Very good, and let's also look at how we deal with product lines. We are not actually using the ProductLine id here. 
We only operate on the ProductID, the Amount, and the DeliveryID. Here these fields are on the database diagram. The ProductLineID is managed by the database itself. It assigns a value to it on each product line insertion. The reason why we are able to disregard the ProductLineID in our code base is our approach to persisting the lines. We just delete all the product lines in a given delivery and recreate them from scratch. We could do matching instead, where we determine which lines are new, which are deleted, and which are unchanged, and that could potentially bring some performance benefits because we would be able to reduce the number of SQL queries needed to update the delivery, but that would complicate the code. In most enterprise-level applications, performance is not a big enough concern to justify such a complication, and so it's a good idea to aim at code clarity and simplicity first and only after that target performance. There is another benefit to this approach; as we don't need to deal with the ProductLineID we can actually drop it from our domain model. Remember when we first defined the new domain model for the autonomous bubble, we had to turn the ProductLine class from a value object into an entity in order to be able to keep track of its identity. Now we don't need to do that anymore. Its identifier is irrelevant for us and so we can turn it back into a value object, which is great news because one of the most important guidelines in domain modeling is that you need to prefer value objects over entities. So let's refactor the ProductLine class once again, changing the base class to ValueObject of ProductLine, implementing missing members, removing the id, we don't need it anymore, and in the GetEquality members, return the product and the amount. Very good. Keep in mind that the process of domain modeling is highly iterative. 
There are a lot of new insights and tradeoffs you need to make along the way and sometimes during refactoring you will come full circle to a design decision you had at the beginning. You shouldn't view this activity as wasteful; it's a learning process and you gain a lot of domain expertise as a result of it. I'm showing you the full picture of such modeling and not just the end result to give you an example of how this process might look. Alright, we've introduced some compilation errors; let's fix them. In the delivery class, I can now get rid of the 0 and in the delivery repository we don't need to pass in the ProductLineID to the constructor either. Let me remove it from the ProductLine data and from here as well. Let's verify that we didn't break anything. Adding our pizza back, the TV set with the amount of 3, deleting the fridge, and saving it. Very good. You can see that everything works as before.
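To make the entity-to-value-object change more concrete, here is a minimal sketch of a generic value object base class and a ProductLine built on top of it. The member names EqualsCore and GetHashCodeCore, as well as the simplified Product class that is compared by its Id, are assumptions for illustration and may differ from the course's code.

```csharp
using System;

// Assumed base class: equality is defined by content, not by identity.
public abstract class ValueObject<T> where T : ValueObject<T>
{
    public override bool Equals(object obj)
    {
        var other = obj as T;
        if (ReferenceEquals(other, null))
            return false;
        return EqualsCore(other);
    }

    public override int GetHashCode() => GetHashCodeCore();

    protected abstract bool EqualsCore(T other);
    protected abstract int GetHashCodeCore();

    public static bool operator ==(ValueObject<T> a, ValueObject<T> b)
    {
        if (ReferenceEquals(a, null) && ReferenceEquals(b, null))
            return true;
        if (ReferenceEquals(a, null) || ReferenceEquals(b, null))
            return false;
        return a.Equals(b);
    }

    public static bool operator !=(ValueObject<T> a, ValueObject<T> b) => !(a == b);
}

// Simplified entity: identified by its Id (assumed shape).
public class Product
{
    public int Id { get; }
    public string Name { get; }

    public Product(int id, string name)
    {
        Id = id;
        Name = name ?? throw new ArgumentNullException(nameof(name));
    }
}

// ProductLine no longer has an id of its own: two lines are equal
// when they refer to the same product and have the same amount.
public class ProductLine : ValueObject<ProductLine>
{
    public Product Product { get; }
    public int Amount { get; }

    public ProductLine(Product product, int amount)
    {
        Product = product ?? throw new ArgumentNullException(nameof(product));
        Amount = amount;
    }

    protected override bool EqualsCore(ProductLine other)
    {
        return Product.Id == other.Product.Id && Amount == other.Amount;
    }

    protected override int GetHashCodeCore()
    {
        unchecked
        {
            return (Product.Id * 397) ^ Amount;
        }
    }
}
```

With this shape, two separately created lines for the same product and amount compare as equal, which is what lets the repository delete and recreate lines without tracking their database identifiers.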
Recap: Introducing a New User Interface
In the last three demos we implemented a new user interface for our autonomous bubble context. Just as the old database couldn't handle the new requirement, the old interface couldn't accommodate our needs either. It had room for only four product lines and so we had to come up with a new window in order to be able to add an unlimited number of product lines to a delivery, and we did it. The new interface perfectly fits our needs. It allows for both estimated cost calculation, which was the first requirement on this project, and the addition of any number of product lines. The introduction of a new user interface is quite a lot of work, but at the same time, it's easier to work with. That's because it's perfectly lined up with the boundaries of our bubble context and now all the layers of the bubble are brought together. The database, the domain model, and the UI all work within the same bounded context. Except for the small amount of code we needed to inject the new UI into the legacy application, the bubble has no connection to the old code base whatsoever. The separation between the old and the new code bases has become more explicit. It's important to maintain the separation. Don't allow the bubble to query data directly from the legacy database. It would quickly lead to erosion. It would damage the new code base. Any new data the bubble needs should be added to its database first. Because the domain model and the database are now perfectly aligned, we are able to use a fully fledged ORM here, like NHibernate or Entity Framework. There is no need for manually written SQL queries anymore. Note that we didn't remove the old window, but rather we now have both the old and the new windows working on the same functionality. You typically don't want to do that, though. The whole point of writing the new interface is to replace the old one with it. 
We will keep the old window for illustration purposes to show how the data is synchronized from the new database to the old one in the next module. Also note that while we didn't use a lot of value objects in our new domain model, in a larger and more complex system we definitely would. For example, we could introduce a Dollars value object to represent the cost estimate, which currently is of type decimal, or we could introduce a separate type for the concept of product weight. We could call it Pounds, which would eliminate any misconceptions about the unit of mass we are using to measure the weights of the products. Overall, getting rid of primitive obsession is a good thing, but there is not much benefit in doing so in our particular situation aside from increased readability, so we will not do that. Should the situation change, however, for example, should we need some additional logic associated with the delivery cost or the product weight, we would definitely add those value objects to our domain model. If you want to learn more about primitive obsession and how to get rid of it, check out my course about functional programming here on Pluralsight.
In this module we dealt with the new business requirement, adding more than four product lines to a single delivery. We couldn't implement it using the existing database because it only allowed us to add a maximum of four lines to a package, and so the strategy we employed in the previous modules wouldn't be enough here. The old bubble could only work with the data that exists in the legacy database and it could only do that in a way that is translatable back. So we had to grow our bubble into an autonomous bubble by cutting all connections to the legacy system. We introduced a separate database for that autonomous bubble. We also defined a new user interface for it. The new bounded context now has the full vertical slice of functionality from the UI to the database. Cutting the connection to the legacy system means that we removed the anticorruption layer from the boundaries of the bubble context. Well, technically it's still there, at least the remnants of it, but we will soon transform it into a bounded context of its own. It will synchronize the old and the new databases. While previously it was only the legacy system that was unaware of the anticorruption layer, now both the bubble and the legacy system will not know anything about it. When creating the database, just as with creating the domain model, we didn't allow the legacy system to influence our design decisions. We implemented the system as if it was a greenfield project, only looking at the requirements we had at hand. We didn't introduce any data we didn't need to implement the new requirement. While working on the new functionality, we built upon the domain model from the previous modules. We modified it several times until we came up with a satisfactory solution. Note that while doing so we came full circle with the ProductLine class. 
It started as a value object which we then converted to an entity, only to convert it back to a value object when we realized that we don't need to handle its identifiers. It's not uncommon to have your design decisions reverted. View it as a natural part of the process of domain modeling. In the next module we will talk about the anticorruption layer. We will promote it to a separate bounded context that would synchronize the legacy database with the new one.
Promoting the Anticorruption Layer to Synchronizing Anticorruption Layer
Hi. My name is Vladimir Khorikov and this is the course about domain-driven design and working with legacy projects. In this module we will be working on the anticorruption layer. We will promote it to become a synchronizing anticorruption layer within its own bounded context.
Upgrading the Anticorruption Layer
In the last module we promoted our bubble context into an autonomous bubble by cutting all connections to the legacy system. The bubble is now essentially a piece of software in its own right. Despite the fact that it's still integrated into the legacy application, it has a separate set of UI windows, a database, and of course a domain model. This autonomy made the bubble more scalable. We are now able to introduce functionality that would not be possible otherwise. For example, in the last module we overcame the issues of the old database that didn't allow a delivery to have more than four products in it. The new database doesn't have these issues. Here you can see the two UI windows. On the left there is the old window that has space for only 4 products. On the right there is a window we developed that can handle a variable number of product lines in the delivery and that doesn't have this limitation. This is all good, but the benefits of decoupling the bubble context from the legacy code base came at a price. We now have two independent databases, the new and the old one that handle the same data and we need to build a synchronization mechanism that would sync the changes between the two and do that in a way that would avoid any inconsistency issues, but before we do that, let's talk about why we would synchronize the changes in the first place. Indeed, can't we just remove the corresponding tables from the legacy database? After all, we have developed the new UI that handles the deliveries so why keep the old delivery tables in place? Well, when it comes to working on legacy projects, it is virtually impossible to delete anything from their databases. In our case it's impossible to delete the deliveries because the legacy system has other features and they rely on the presence of these tables. 
For example, when we create a new delivery by specifying a customer and the destination address, this delivery goes to the old delivery tables, and when we are marking a delivery as in progress, this feature too looks at the old tables. So the old package editing functionality is not the only one that relies on them. Should we remove the old delivery tables, we would need to refactor the legacy code base to make it work with the new database instead, and that is something we are trying to avoid due to the high maintenance cost of the legacy system. And even if we were able to isolate some part of the database and replace the functionality that works with it using our autonomous bubble, it's likely that some other external system still needs that part. Unfortunately, legacy systems are usually built without putting too much thought into proper integration techniques, and so different legacy applications can work with each other's databases directly, bypassing each other's code bases, which means that we are pretty much stuck with the legacy database for as long as all those other applications need it. Note that sometimes you can remove some tables from the legacy database, but those would likely be something insignificant, not the core delivery tables like in our case. If you are 100% confident that those tables are not used by anyone, there is no reason to keep them around and so you definitely should delete them, but once again, the core legacy tables would most likely have to stay for an indefinite amount of time. Because the original architects of the legacy systems have chosen integration through the database directly, those tables are your integration point from now on. You will communicate with external systems through those tables and will also receive updates via them. 
Note that integration through the database is one of the worst kinds of integration out there, precisely because it cements the database structure and prohibits you from ever changing or refactoring it. A better approach would be to keep a dedicated database for each application and communicate not via the databases, but using explicit messages. Alright, let's get back to our application. Our goal now would be to build the synchronization mechanism between the old and the new databases in a way that doesn't lead to any inconsistencies between the legacy system and the bubble context.
Deciding on Data Ownership
The first step towards synchronization between the databases is to decide who owns what data. This will allow you to build the data flow between the legacy system and the bubble. Let's recap how the processing of a delivery looks in our case. The first step is the creation of a delivery. All delivery packages in our system are created by users. Here you can see on the UI that we can click New Delivery, select a customer, say Devices for Everyone, choose some address, and the delivery appears in the list. During this first step the legacy application creates a new row in the delivery table and a new one in the address table. At this point the delivery has a customer and an address assigned. Also, its status is set to New. Note that no rows are created in the second delivery table because we don't specify any product lines just yet; that would be the next step. At this point we can edit the package by selecting one or more products here and calculating the estimated cost of delivery. When we do that, the status changes from New to Ready. So during the second step the legacy application modifies these columns related to the product lines, creates a new row in the second delivery table, and updates the status field to Ready. Let's outline this step here as well. Finally, the last step would be to mark the delivery as In Progress. This doesn't do anything other than modifying the status field of the delivery. Here is how we can depict it on the diagram. Alright, so this is how this process looked for the legacy application. As we have developed the bubble context, the second step is now handled by that bubble so the data flow should look like this. As soon as the legacy application creates a new delivery, the anticorruption layer should pick up that delivery and push the information about it to the new database. 
Then when the bubble bounded context modifies the product lines, the anticorruption layer has to pick up those changes, push them to the legacy database, and also update the status of the delivery to Ready, and this would be our data flow. Note what information we pass between the databases. After the first step, we pass the information about the delivery's id and address. We don't pass the status, nor do we pass the customer. The bubble doesn't use that information and so there is no need to copy it. On the way back from the bubble to the legacy storage, we copy the cost estimate and the product lines. Let me show it on the database diagram. We copy the estimate to this column and we also transform the product lines into these columns in the two delivery tables. That means that we create a row in the second delivery table if needed. Also, despite the fact that we don't store the status of the delivery in the bubble, we know that each time we update the packages in the delivery, we need to set the status to Ready and so the anticorruption layer will do that, too. So here it is, the flow of the data between the old and the new bounded contexts. Now that it's clear who modifies what, we can set up explicit data ownership, that is, we can specify which bounded context is the master of what data. Who is the source of truth, so to speak, and who is merely a reader of that data? It's clear that the legacy system should be the master of the delivery identifiers because it creates new deliveries. It also must handle any changes to a delivery's address and customer because that's what it does in step 1. The bubble context reads that data, reads the id and the address of the delivery, but it should not be allowed to change it. On the other hand, the bubble is the master of the product lines and the cost estimate. The legacy code base should not be able to modify them anymore. It should get them from the bubble from now on. 
It's important to explicitly outline these responsibilities because that would allow you to avoid any race conditions, situations when both bounded contexts simultaneously change some set of fields and bring the whole system into an inconsistent state. It's also helpful if you need to debug a synchronization failure. If you see that the two systems have different copies of data, for example, different copies of product lines for the same delivery, you don't need to manually merge those lines between the two systems. You know that the bubble context always has the latest version and so you can disregard whatever version the legacy database has and replace it with a copy from the bubble. Note that when pushing the product lines back from the bubble context to the legacy database we will be able to copy only four of them, simply because the legacy system cannot handle more. In our particular case, we don't actually need to do that because the application doesn't do anything with those product lines after we specify them. We are copying them to the legacy database for demonstration purposes only, to show how the synchronization works in both directions. However, in a real-world application you will need information about all the lines when you send the delivery to the customer. That means this feature of extending the number of products a single delivery can handle will become a cross-cutting feature and you will need to extend the bubble to handle this part of the application as well. Okay, to enforce the boundaries between the two systems we need to prohibit changing the product lines and the cost estimate by the legacy part of the software, and the easiest way to do that is to disable the OK button on the UI so that the user will not be able to save any changes here. Let's do that. I'm changing the OK command. The first delegate here indicates whether or not this command can be executed, and if I make it so that it always returns false, it should do the job. 
Running the application, and you can see the OK button is disabled now. I can recalculate the cost, modify the amount, nothing will be saved because the button is always disabled. Very good. There is now no way for the legacy code base to modify the product lines and the cost estimate.
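As a rough illustration of the always-disabled command, here is a minimal sketch. The Command class below is a hypothetical stand-in for whatever command helper the legacy application uses; only the idea that its first delegate decides whether the command can run comes from the narration. The view model name and the Saved property are likewise assumptions.

```csharp
using System;
using System.Windows.Input;

// Hypothetical command helper: the first delegate decides whether the
// command can run, the second performs the action.
public class Command : ICommand
{
    private readonly Func<bool> _canExecute;
    private readonly Action _execute;

    public Command(Func<bool> canExecute, Action execute)
    {
        _canExecute = canExecute ?? throw new ArgumentNullException(nameof(canExecute));
        _execute = execute ?? throw new ArgumentNullException(nameof(execute));
    }

    // Never raised in this sketch because the result of CanExecute never changes.
    public event EventHandler CanExecuteChanged { add { } remove { } }

    public bool CanExecute(object parameter) => _canExecute();

    public void Execute(object parameter)
    {
        // Guard against direct calls that skip the CanExecute check.
        if (CanExecute(parameter))
            _execute();
    }
}

// Assumed shape of the legacy package view model.
public class LegacyPackageViewModel
{
    public bool Saved { get; private set; }
    public Command OkCommand { get; }

    public LegacyPackageViewModel()
    {
        // The first delegate always returns false, so the bound OK button
        // stays disabled and Save is never invoked.
        OkCommand = new Command(() => false, Save);
    }

    private void Save()
    {
        Saved = true;
    }
}
```

Disabling the command only blocks changes made through this window; it is a UI-level guard, not a database-level one.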
Along with the data ownership and data flow, another important concern you need to consider before implementing the synchronizing anticorruption layer is the synchronization strategy, that is, how exactly the anticorruption layer will do the synchronization. Let's look at our data flow again. When the user creates a delivery, the next step for them is to specify the product lines, and because it's natural to want to do that immediately, it's important that the delivery is copied right away without any delays. If it's not, the bubble context will not be able to find the delivery in its database and thus will not be able to show the information about it to the user. So having this requirement at hand, how should we implement the synchronization? We will need to do polling. The anticorruption layer will have to actively sample the legacy database and see if there are any changes in the delivery tables. The same is true for the other way around. The anticorruption layer will be polling the bubble context's database in order to see if there is any information it needs to copy back, and for that we will need a mechanism to identify deliveries with changes. There are two options for that. You can introduce a flag in the delivery table that would tell if the delivery has changed and create a database trigger that would set that flag on each insert or update. The anticorruption layer then would read rows with that flag set to true and reset it back to false. This way you won't have to modify the legacy code base. The change will be limited only to the database itself. The other option is to create a dedicated table or even a database for the anticorruption layer and keep a copy of the legacy data in it. The anticorruption layer then will be constantly comparing its copy with that of the legacy database and push the changes to the bubble as soon as it notices them. 
A variation of this option is to keep not the full copy, but just a timestamp, but that requires the legacy database to actually have such a timestamp that would automatically change on each update. In our project, the legacy database doesn't have it. This second option requires no changes to the legacy database, but it takes more effort to implement. We will go with the first option though. In most cases it provides a better balance between purity and the amount of work we need to put into it. So here is our plan for the delivery table. We will introduce a new flag to it and a trigger that would update that flag each time the row is modified or inserted. Along with that, we will also create a small table named synchronization with another flag in it. The trigger will update it as well. We will do the same in the bubble's database, too. The reason for this other flag is that the anticorruption layer will be sampling changes in the deliveries pretty often and may potentially put a lot of pressure on the legacy database. This separate table will allow us to offload that pressure, and now the anticorruption layer will be sampling just this one table instead of all the existing deliveries. It will query the deliveries only if this table's flag is set to true. Alright, that's it about the delivery tables. We have another table that we will also need to synchronize, products. Here it is on the diagram. The bubble keeps information about them as well. For this one, though, we can implement another synchronization strategy. As the products don't change as often as deliveries, we don't have to comply with the same timing requirements. It is most likely enough to run the query once in some period of time, maybe even just once a day. You need to talk to the domain experts to determine what would be the optimal interval between the updates. In our sample project, we will go with one update per hour. The update itself is going to be simpler, too. 
The legacy bounded context is the master of all data in the products table and so this data will flow in one direction only, from the legacy database to the bubble.
Preparing the Databases
I've made the modifications we discussed previously. Here you can see the synchronization table in the legacy database. The anticorruption layer will monitor this table in order to see if there are any changes in this database. The table itself has just a single column, and if I open it, there is only one row here. Having this separate table will allow us to offload the pressure from the main tables. I have also added a new column to one of the delivery tables, IsSyncNeeded. Note that while it is not nullable, it has a default value attached to it and so the legacy software will continue working just fine even though it doesn't explicitly specify a value for this column. I've also created a trigger named AddressSync that fires every time an address is updated or inserted. There is no need to create a trigger for the delivery itself because we assume that all deliveries are created with a corresponding address and we don't need anything from the delivery table aside from its id. Here you can see the trigger sets the IsSyncNeeded flag to true for the corresponding rows in the delivery table. It looks at the inserted and deleted rows, joins them together, and if they don't match, updates the flag. Inserted and deleted are special in-memory tables. If you do an update, deleted will contain the old version of the updated rows and inserted the new one. With this comparison, we make sure that the address was indeed changed, because technically you can do an update that sets the same value in those columns and this trigger will still fire. And this one is for insertions. If the street column in the deleted row is null, that means it was an insert, not an update. We want to capture such events as well. So the IsSyncNeeded flag will be set in these two situations: when a new address is inserted, which means there is a new delivery, and when the address itself is updated. 
Along with modifying the flag in the delivery table, we are setting it in the synchronization table as well. This will signal the anticorruption layer that there are some changes in the legacy database. Note that the same filters are applied here. Alright, let me close all these and let's review the bubble database. Here you can see the same synchronization table with the same structure as in the legacy database. This is done for the same reason, to offload the pressure from the main delivery table when the anticorruption layer samples this database for new changes. The delivery table itself also gained a new column, IsSyncNeeded. And here is a trigger for this table. The idea behind it is the same. Whenever a delivery row is updated, we set the IsSyncNeeded flag for this row to true and update the synchronization table. Here we check for the cost estimate only because it's the only information we need to sync from the delivery table back to the legacy database. Also note that this trigger is for updates only. We don't need it to fire after inserts because the only way new deliveries will come to the bubble database is via the anticorruption layer. We only want to capture changes made by the end user. There is another trigger for the product line table which monitors changes in the delivery's product lines. Here we set the flag for the corresponding delivery row whenever we insert or delete a product line. These are the two pieces of information we need to pass back from the bubble to the legacy database: the CostEstimate and the ProductLines. I've added SQL scripts to the solution so that you can run them on your own to create the triggers and update the databases. This one is for the legacy database and this one is for the bubble. Alright, it is time now to create the anticorruption layer.
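The two-level flag check that these tables and triggers make possible can be sketched in plain C#, using an in-memory stand-in instead of a real database. Everything here is illustrative: the class names, the IsSyncRequired name for the flag in the synchronization table, and the MarkChanged method that plays the role of the trigger are assumptions, not the course's code.

```csharp
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for one delivery row with its per-row flag.
public class DeliveryRow
{
    public int Id { get; set; }
    public bool IsSyncNeeded { get; set; }
}

// In-memory stand-in for a database with a one-row synchronization table.
public class DatabaseStub
{
    // Hypothetical name for the single flag in the synchronization table.
    public bool IsSyncRequired { get; set; }

    public List<DeliveryRow> Deliveries { get; } = new List<DeliveryRow>();

    // Plays the role of the trigger: flags the row and the synchronization table.
    public void MarkChanged(int deliveryId)
    {
        DeliveryRow row = Deliveries.Single(d => d.Id == deliveryId);
        row.IsSyncNeeded = true;
        IsSyncRequired = true;
    }
}

public static class ChangePoller
{
    // Returns the ids of changed deliveries and resets the flags.
    public static IReadOnlyList<int> PollChangedDeliveryIds(DatabaseStub db)
    {
        // Cheap check first: only the small synchronization table is sampled.
        if (!db.IsSyncRequired)
            return new List<int>();

        // Only now look at the deliveries themselves.
        List<DeliveryRow> changed = db.Deliveries.Where(d => d.IsSyncNeeded).ToList();

        foreach (DeliveryRow row in changed)
            row.IsSyncNeeded = false;
        db.IsSyncRequired = false;

        return changed.Select(d => d.Id).ToList();
    }
}
```

In a real database the read and the flag reset would need to happen inside one transaction so that a change arriving between the two steps is not lost; the in-memory sketch leaves that concern out.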
Creating the Orchestration
As we discussed earlier, the anticorruption layer will now reside in its own bounded context and so we will introduce a separate executable for it. I'm creating a new project, a Console App. Changing the location, and the name. Note that in a real-world project you would want to create something like a Windows service, not a console application. We will go with the console for demonstration purposes, but keep in mind that all the code from here can be used in a Windows service as well. So how will the anticorruption layer look? First, we need to create a separate thread in which we will do the synchronization work. I'm adding a task for that purpose named deliverySyncThread. Second, we will need a class that does the actual work for us. Let's call it DeliverySynchronizer. And we will need the same for the product synchronization, a separate task and a new class that will do the job. Note that the separate task is needed because these two jobs will be executed at different intervals. Also note that we could use a timer instead of explicit threads, but timers have one shortcoming: they fire once in a given period of time regardless of whether the previous job has completed. Because we are going to check for new data every second, it could be that a new timer tick will come before the previous synchronization has finished processing. This is not good because events could start piling up one upon the other and we could potentially end up in a situation of thread starvation, when we don't have threads left in the thread pool to handle new timer ticks. We could avoid that by disabling the timer for the time of processing the event and then re-enabling it afterwards, but it's easier to just use threads instead. With them we are guaranteed to have only one active worker thread per job at any given time. Alright, we also need a cancellation token to cancel the work of the two threads when we want to stop the anticorruption layer. 
We also need the two connection strings, one for the legacy database and the other for the bubble, and the intervals between the synchronizations for both threads. Let's start initializing all this machinery. Cancellation token, delivery synchronizer, it will accept the two connection strings. Let me create the constructor. Good. I'm instantiating the task with some sync method, which we'll create in just a second, and with the task creation option saying that we need this task to be long-running, which essentially means that there will be a separate thread created underneath it. It will not be running on threads from the thread pool. Creating a stub for this method and starting the task. In the sync method we will run a loop until a cancellation is requested. Inside this loop we will ask the delivery synchronizer to sync the deliveries and then go to sleep for the given period of time. I am passing the cancellation token here in order to be able to interrupt the delay if we need to cancel the task. We need to mark the method as async to use the await keyword and wrap everything in a try-catch statement because it's the highest level possible where we can and should handle all exceptions from our anticorruption layer. After we catch an exception, we can log it and re-throw. Re-throwing is optional. You might want to just continue the loop until an explicit cancellation is requested. Let me create a stub method for logging and another one for syncing. Good. After all the initialization is completed, we can output this message and wait until any key is pressed. After that, we are cancelling the tasks using this cancellation token source, waiting for the first one, and then for the second. This wait will not interrupt the synchronization in the middle of it. We will always wait for the current job to complete. And because we are using the cancellation token here, Task.Delay can potentially throw a TaskCanceledException, so we need to catch that exception separately. 
This exception doesn't signal any errors. It's used solely for interrupting the delay so we are not doing anything here. Very good. The remaining part here is the product synchronizer. Instantiating it, creating a new task with another sync method, and starting that task. Let's write some stub code in the synchronizers to verify that the orchestration works. Here we will output this message and another one in the product synchronizer. Okay, the last piece here is this Sync2 method. We could copy the existing sync method and modify it for the needs of the product synchronizer, but we will generalize it to avoid code duplication. We will pass a delegate and a time interval to this method, call this delegate, and use the interval. Now we can call this method passing a reference to the delivery synchronizer's sync method and the time interval. Let me put them on separate lines and do the same for the product synchronizer. Good. We can run the application now to see how this whole orchestration works, making the console app the startup project, and you can see that the threads successfully output the messages to the console. There is only one message from the product synchronizer because it fires only once per hour while the delivery synchronizer is called every second. Perfect.
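The orchestration from this demo can be sketched roughly as follows. This is a minimal sketch, not the course's actual code: the StartWorker helper and the two stand-in delegates are hypothetical names, and it assumes a recent .NET SDK (C# 8 or later). It also swaps the awaited Task.Delay for a blocking wait on the cancellation token's wait handle, so that the loop stays on its dedicated thread and waiting on the task really does wait for the loop to end.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    public static void Main()
    {
        using var cts = new CancellationTokenSource();

        // Hypothetical stand-ins for DeliverySynchronizer.Sync and ProductSynchronizer.Sync.
        Action syncDeliveries = () => Console.WriteLine("Synchronizing deliveries");
        Action syncProducts = () => Console.WriteLine("Synchronizing products");

        // Two workers because the two jobs run at different intervals.
        Task deliveryTask = StartWorker(syncDeliveries, TimeSpan.FromSeconds(1), cts.Token);
        Task productTask = StartWorker(syncProducts, TimeSpan.FromHours(1), cts.Token);

        Console.WriteLine("Anticorruption layer is running. Press Enter to stop.");
        Console.ReadLine();

        cts.Cancel();
        // A job that is in progress finishes first; only the sleep is interrupted.
        Task.WaitAll(deliveryTask, productTask);
    }

    private static Task StartWorker(Action doSync, TimeSpan interval, CancellationToken token)
    {
        // LongRunning hints that the task should get its own thread, not a pool thread.
        var task = new Task(() => Sync(doSync, interval, token), TaskCreationOptions.LongRunning);
        task.Start();
        return task;
    }

    // One generalized loop shared by both synchronizers: a delegate plus an interval.
    private static void Sync(Action doSync, TimeSpan interval, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            try
            {
                doSync();
            }
            catch (Exception ex)
            {
                // Top-level handler: log and keep looping until cancellation is requested.
                Console.Error.WriteLine(ex);
            }

            // Sleeps for the interval but wakes up immediately once the token is cancelled.
            token.WaitHandle.WaitOne(interval);
        }
    }
}
```

Because each worker runs its job and only then goes to sleep, a new run can never start before the previous one has finished, which is the property the timer-based approach lacks.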
Synchronizing the Bubble with the Legacy
We can now proceed with the delivery synchronization. Let's put both synchronizers into separate files. Okay, we will need the connection strings this class accepts in the constructor, so I'm saving them to class fields. The sync method will do two jobs: synchronization from the legacy to the bubble, and from the bubble to the legacy. We will start with the first one. Let's outline it before starting the implementation. If the synchronization from the legacy is not needed, return, otherwise ReadUpdatedLegacyDeliveries, convert them into bubble deliveries and save the end result. This is what our plan looks like. To check if the synchronization from the legacy is needed, the anticorruption layer will query the synchronization table. We'll be using Dapper here so let me install this package. Installing, good. So the synchronization table is the first place we're checking to see if there are any updates in the legacy deliveries. If there are any, we are reading them from the database and we use a special DeliveryLegacy class that represents the legacy data structure. As we discussed earlier, we will need the id of the delivery and the columns related to its address. So I'm adding them to this class. Inside this method we need to create a connection to the legacy database and run some query that would select the legacy data. Now let's talk about what this query should look like. First of all, we need to select the deliveries that are marked with this IsSyncNeeded flag. Note the UPDLOCK (update lock) SQL hint here. It is needed to avoid deadlocks. With it we are locking those deliveries beforehand in order to update this flag on them later, which is the second step of this query. With this second step we are resetting the flag back to false, and finally we are resetting the synchronization table as well. So here it is, our main script. The triggers set these flags to true and the anticorruption layer reads the marked deliveries and resets the flags back. 
The next step would be to convert the legacy data into the new format. Let's introduce another class to represent this new format, DeliveryBubble. It contains the fields that the bubble expects from the legacy database, the id and the address. In the method we are mapping each legacy delivery. Let me create another method for that. Here we can actually reuse the code from the old anticorruption layer. You can see in the old delivery repository we have this code for converting the legacy delivery to the new domain class. Our code here would be essentially the same. It will validate the city and state, split them, and record them to the new class along with the other address fields. Finally, we can save the bubble deliveries, creating a connection to the bubble database, and here is the query we'll be using. The anticorruption layer will first try to update the delivery and if there are no rows with that id, it will insert a new one. Note once again that no triggers will activate here because we are not touching the fields that those triggers listen for. The trigger on the delivery table monitors the cost estimate column whereas we insert or update only the address and the delivery id. We are able to do that because we explicitly decided which bounded context owns what data and only copy that data to maintain the unidirectional data flow. Alright, and here's how we execute this query for each delivery. Dapper will iterate through this collection for us and will execute this query for each item separately. Let's comment out this method for now and see how the synchronization works so far. I'm adding a new delivery, selecting a customer, typing in some address in Washington, D.C. Okay, if I open the legacy delivery table you can see that the trigger has successfully marked this row with this flag and it also updated the synchronization table. Now let's run the anticorruption layer, making it the startup project and launching. 
If we go to the database and run this query again, you can see that the flag is reset and the same in the delivery table. The delivery is no longer marked for synchronization. Let's verify that it was copied to the bubble context. And here you go. It is here, which means that I can now open it in the new UI window, and the application will successfully do that for me. The two databases are fully synchronized. Very good. Let's do the backwards synchronization from bubble to legacy. The plan is going to be the same. We are looking at the synchronization table to see if there are any new updates, read them, convert them to legacy deliveries, and update the legacy database. Pretty straightforward. Here is the first method. It is similar to what we wrote previously. In the ReadUpdatedBubbleDeliveries method, I am creating a connection to the bubble database and declaring this query. In it we too lock the delivery table using this update lock hint and select the CostEstimate. Next, the query selects the product lines that correspond to the updated deliveries. Then it resets the flags in the deliveries and in the synchronization table. Again, the bubble context is the master of the cost estimate and the product lines. That's why we are selecting only that data and nothing else. There are two selects in this query so I'm using the QueryMultiple feature of Dapper. With the first one we are reading the deliveries and with the second, the product lines. Let me introduce a new class for product lines. I called it ProductLinesBubble to be consistent with our naming convention and I'm also introducing a new property in the delivery class. It will hold data about the delivery's product lines. So here, after we have selected all the deliveries and product lines, we can assign that new property to each delivery. Now the delivery class is complete. It contains both the cost estimate and the products. Okay, the next step is to map all bubble deliveries into legacy deliveries. 
We will do the mapping separately for each element of the collection. Now look at the DeliveryLegacy class. It is actually missing some columns. We need to convert the cost estimate and these product lines into these nine columns and in order to do that we would need those columns in this class, too. So here they are, all nine of them. Okay, let's get back to the method. For each bubble delivery we need to create a corresponding legacy instance with the id and the estimate. Now we need to map the product lines. If the number of lines is more than 0, then it means that there is at least one product in the delivery so we can do this: assign product id 1 and amount 1, and the same for the other three product lines. We basically convert the collection into this old and quirky data structure, adding it to the resulting collection and returning it to the caller. Finally, the last step is to save the deliveries to the database, creating a connection to the legacy storage, defining this query, and executing it. Note that as we only assign data about the cost estimate and product lines here, this query will not activate any triggers either. The trigger in the legacy database only monitors the data of which the legacy bounded context is the master. Alright, let's make sure the anticorruption layer works. I'm launching the layer itself and the application. Let's open the new Edit Package window and add a product line. Let it be a couch, with an amount of 2, and I will also recalculate the delivery cost. Saving it. If I hit Refresh, you can see that the status has changed to Ready, which means that the anticorruption layer has already synchronized this delivery to the legacy data storage. And indeed, when I open the old Edit Package window, which works with the legacy database, it does show me the product in here and the cost estimate. Let's quickly review the database as well. 
There is no IsSyncNeeded flag in here, which means that the anticorruption layer has successfully reset it after copying the data to the legacy database. Let's do one more test. If I delete this line and save the delivery, you can see that it gets deleted from here as well. Perfect. We have a fully functional synchronizing anticorruption layer now. The only remaining piece here is the product synchronizer. I will leave it as an exercise though. There is no need for such complexity there because we decided to synchronize the products only once every hour, so it would be more straightforward, too. Just select all products from the legacy database and update or insert missing products in the bubble database.
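The mapping from the bubble's product lines into the nine legacy columns, as described in this demo, could look something like the sketch below. The class shapes and property names (ProductId1, Amount1, and so on) are assumptions made for illustration, since the real legacy column names are not spelled out here, and the code assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shapes: the real classes and column names live in the course's source code.
public sealed record ProductLineBubble(int ProductId, int Amount);

public sealed record DeliveryBubble(
    int Id, decimal CostEstimate, IReadOnlyList<ProductLineBubble> Lines);

public sealed class DeliveryLegacy
{
    public int Id { get; set; }

    // Nine columns: the estimate plus four (product id, amount) pairs.
    public decimal CostEstimate { get; set; }
    public int? ProductId1 { get; set; }
    public int? Amount1 { get; set; }
    public int? ProductId2 { get; set; }
    public int? Amount2 { get; set; }
    public int? ProductId3 { get; set; }
    public int? Amount3 { get; set; }
    public int? ProductId4 { get; set; }
    public int? Amount4 { get; set; }
}

public static class DeliveryMapper
{
    public static DeliveryLegacy ToLegacy(DeliveryBubble delivery)
    {
        IReadOnlyList<ProductLineBubble> lines = delivery.Lines;
        if (lines.Count > 4)
            throw new InvalidOperationException("The legacy structure holds at most four product lines.");

        var legacy = new DeliveryLegacy { Id = delivery.Id, CostEstimate = delivery.CostEstimate };

        // Slots without a matching line stay null, so a removed line is cleared in the legacy row.
        if (lines.Count > 0) { legacy.ProductId1 = lines[0].ProductId; legacy.Amount1 = lines[0].Amount; }
        if (lines.Count > 1) { legacy.ProductId2 = lines[1].ProductId; legacy.Amount2 = lines[1].Amount; }
        if (lines.Count > 2) { legacy.ProductId3 = lines[2].ProductId; legacy.Amount3 = lines[2].Amount; }
        if (lines.Count > 3) { legacy.ProductId4 = lines[3].ProductId; legacy.Amount4 = lines[3].Amount; }

        return legacy;
    }
}

public static class Program
{
    public static void Main()
    {
        // One product line (say, a couch with an amount of 2) and a recalculated estimate.
        var delivery = new DeliveryBubble(15, 40m, new[] { new ProductLineBubble(3, 2) });
        DeliveryLegacy legacy = DeliveryMapper.ToLegacy(delivery);

        Console.WriteLine($"Delivery {legacy.Id}: product {legacy.ProductId1} x {legacy.Amount1}");
        Console.WriteLine($"Second slot is empty: {legacy.ProductId2 is null}");
    }
}
```

Keeping this translation in one small mapper means the quirky four-slot structure never leaks into the bubble's domain model.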
Recap: Building the Synchronizing Anticorruption Layer
In the last three demos we built the synchronizing anticorruption layer. We started with the databases. In them we created triggers that monitor the changes in the deliveries and mark those deliveries for synchronization. Note a couple of things here. First, the triggers only monitor the data the bounded contexts own. For example, the legacy bounded context is the master of the delivery id and its address and so the triggers only mark a legacy delivery for synchronization when the user creates a new one or updates its address. This allows us to set up a clear data flow between the bounded contexts and avoid merge conflicts. Second, we also added a separate table named synchronization that the triggers update as well. We did that in order to reduce the pressure on the main tables in the database. Because the anticorruption layer has this requirement to propagate the changes between the bounded contexts very quickly, it would have to sample them every second and that would create a lot of stress on the delivery tables. Of course, we wouldn't notice it because the databases in our sample project are small, but in a real-world application that could become an issue. Having a separate table takes off all this pressure and allows us to avoid this potential performance problem. After we prepared the databases we implemented the anticorruption layer itself. It now resides in its own bounded context and is hosted by a separate process. Unlike the previous version of the anticorruption layer, this one does the synchronization asynchronously in the background. Because of that you need to think about eventual consistency between the bounded contexts and decide what kind of synchronization strategy each piece of data needs. In our case, the deliveries needed to be synchronized very often in order to enable the user to edit the packages with the new UI without delays. The products, however, could be synchronized at a different pace. 
There is no rush to bring the two databases to consistency here. Because the anticorruption layer resides in its own bounded context, the bubble doesn't know anything about it anymore. At this point it should be clear that the anticorruption layer is a significant piece of software in its own right. It should be explicitly addressed in budgeting and planning, and that is its main drawback. It requires quite a lot of effort to implement. Still, if your legacy application is unmaintainable, which many legacy applications unfortunately are, this could be your only option to avoid a full rewrite. And the benefits of this approach are great, too. As we discussed earlier, it allows you to combine the pros of a full rewrite and gradual refactoring in the sense that you will be able to work on a greenfield domain model while still delivering business value, which is the best combination you can get in the world of legacy projects.
In this module we created the synchronizing anticorruption layer. It synchronized the autonomous bubble with the legacy database. To do that we explicitly outlined the data ownership between the bubble and the legacy bounded contexts. This is an important step that helps you avoid race conditions and merge conflicts. After that we built a data flow diagram between the old and the new systems. We discussed synchronization strategies. Different types of data require different approaches in terms of the maximum delay before the data has to appear in the other system. For example, deliveries must be synchronized very quickly, within a couple of seconds. Products, on the other hand, can wait. We used database triggers in order to mark the deliveries that changed. The anticorruption layer then propagated those changes and reset the marking. Another option could be to keep a copy of the legacy data in a separate data storage so that the anticorruption layer could compare the two versions. This would allow us to avoid adding the flags, but this approach requires more work and usually is less performant. That's the reason why we went with the triggers. The anticorruption layer was the final step in ensuring the autonomy of our bubble context. Now we have essentially two separate pieces of software brought together and it opens up a lot of options for enhancing the existing functionality. We are no longer bound by the shortcomings of the legacy code base. We can create our own and evolve it in parallel with the existing legacy mess, which is great news for both us developers and the business people as we both get what we want: the ability to work on a greenfield domain model on one hand and a steady stream of new functionality on the other. In the next module we will talk about further ways of dealing with the legacy project. We will talk about turning it into a microservice with its own REST API and about building a domain event channel on top of it.
Exploring Further Ways of Dealing with the Legacy Project
Hi. My name is Vladimir Khorikov and this is the course about domain-driven design and working with legacy projects. In this module we will explore other ways of dealing with legacy projects such as turning them into microservices and building a domain event channel on top of them.
When the Anticorruption Layer Is Not Enough
In the preceding modules we did an excellent job building the new functionality without touching much of the legacy code. The anticorruption layer, especially the synchronizing anticorruption layer, is a great tool that enabled us to do that, but there still are situations where it is not enough. The anticorruption layer is a flexible, but highly specialized piece of software. We made it synchronize our bubble bounded context with the legacy system, but what if there are several bubbles like ours? Each of them would have its own domain model, otherwise they would be just one bounded context, and each of them would work with the legacy application in a similar, yet different way. What would we do in this case? If we were to apply the same approach we used previously, we would need to implement a separate anticorruption layer for each of the bubbles. We'd not be able to reuse the same one and that could result in a lot of repetitive work. This additional effort could easily become prohibitive. One option here could be to keep only one bubble and put all new functionality into it, and that would work in most cases. However, there are situations where having only one bubble just doesn't make sense. For example, if you are dealing with a large legacy application and need to work on different parts of it simultaneously, you would want to split this work among several developers or teams of developers to reduce the communication overhead and improve the speed. The approach with the anticorruption layer could become an obstacle in such a situation. Are there any ways to overcome it? There are a couple. Let's review them in this last module of our course.
Exposing Legacy Assets as a Microservice
One option to improve the situation and make the anticorruption layer reusable is to expose the legacy application to other bounded contexts as a microservice. Here's how it could look. Because most legacy code bases are not designed to be used in scenarios with network communication, you would need to introduce an adaptor on top of the code base that would handle that communication. This adaptor will expose some JSON-based REST API that would work either with the legacy code base or with its database directly, like in our sample application. The adaptor would essentially act as an anticorruption layer; just like an anticorruption layer, it would do the translation. The difference here is that this translation would be not between the legacy system and some particular bubble, but rather between the legacy system and the REST API. This would enable the reuse of the translation effort. Here we are abstracting away one of the recipients of the translation and allowing any bubble to connect to the legacy assets using the same API endpoints. Note several important points here. First, you will need to introduce a separate set of data contracts that the client bubbles will use to communicate with the legacy microservice. You should not use the classes that already exist in the legacy application for that purpose. For example, if the legacy code base has a class for delivery, like in our case, don't use it as is in order to serialize data into JSON. Convert it into a nice-looking data transfer object and serialize that object instead. The resulting set of DTOs, data transfer objects, would become the data contracts between the legacy microservice and the external bubbles, or as Eric Evans calls it, the published language. Now when you use the API from the other end, from the bubble bounded contexts, you shouldn't use those data contracts as is either. 
This published language is still external for the bubbles and they need to convert it into their own domain models with their own ubiquitous languages. Remember, any information crossing a context boundary must be translated. Messages or data contracts are not somehow neutral. They are always expressed in a language based on some system of abstractions and you shouldn't let them enter a bubble that uses a different language. In practice, that means you should always keep the bubble's domain model separate from the data contracts. Note that I'm using the terms message, data contract, and data transfer object as synonyms here. In that sense, when you are working in a microservices environment, you always end up having an anticorruption layer. That's because you always translate the incoming messages into the language of your domain model before reacting to them. This very act of translation protects your bounded context from external influence as you are able to keep its model isolated at all times. You can see that each microservice has a built-in anticorruption layer inside.
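To make the translation step concrete, here is a minimal sketch of a bubble converting a data contract into its own domain model. Everything in it is hypothetical: the DeliveryDto shape, the Address and Delivery classes, and the validation rules are assumptions made for illustration, not the contracts of any real legacy microservice. It assumes C# 9 or later.

```csharp
using System;

// Hypothetical data contract (published language) received from the legacy microservice.
public sealed class DeliveryDto
{
    public int Id { get; set; }
    public string Street { get; set; } = "";
    public string City { get; set; } = "";
    public string State { get; set; } = "";
    public string ZipCode { get; set; } = "";
}

// The bubble's own domain model; it never references the DTO.
public sealed record Address(string Street, string City, string State, string ZipCode);

public sealed record Delivery(int Id, Address Destination);

// The bubble's built-in anticorruption layer: the only place that knows both languages.
public static class DeliveryTranslator
{
    public static Delivery ToDomain(DeliveryDto dto)
    {
        if (dto is null) throw new ArgumentNullException(nameof(dto));

        string street = (dto.Street ?? "").Trim();
        string city = (dto.City ?? "").Trim();
        string state = (dto.State ?? "").Trim().ToUpperInvariant();
        string zip = (dto.ZipCode ?? "").Trim();

        // Assumed invariants of the bubble's model, checked at the boundary.
        if (street.Length == 0) throw new ArgumentException("Street is required.", nameof(dto));
        if (city.Length == 0) throw new ArgumentException("City is required.", nameof(dto));
        if (state.Length != 2) throw new ArgumentException("State must be a two-letter code.", nameof(dto));

        return new Delivery(dto.Id, new Address(street, city, state, zip));
    }
}

public static class Program
{
    public static void Main()
    {
        var dto = new DeliveryDto
        {
            Id = 7, Street = " 1234 Main St ", City = "Washington", State = "dc", ZipCode = "20001"
        };

        Delivery delivery = DeliveryTranslator.ToDomain(dto);
        Console.WriteLine(delivery);
    }
}
```

If the published language changes, only the translator has to change; the domain classes stay as they are.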
Building a Domain Event Channel
Another option is to build a domain event channel out of the anticorruption layer. The main idea here is to take the synchronizing anticorruption layer which we ended up with in the previous module and instead of pushing updates from the legacy database to the bubble, make it push events on a message bus. This way it will announce all the changes happening in the legacy system and all interested parties will be able to subscribe to those changes and act accordingly. The main difference between this approach and the microservice one is that the event channel provides a push model as opposed to a pull one. With the microservice you need to poll the legacy system yourself, whereas the event channel will push the changes to you automatically. This is useful when you build a bubble that doesn't really need much information from the legacy application and doesn't need to modify anything in it either. For example, if you need to log how many deliveries the legacy system ships or keep any other kind of statistics, the domain event channel would be a perfect fit for you. With it, you are able to encode any required information in the events themselves so that the bubble won't ever need to reach out to the legacy application. The system will tell you what's going on with it in a series of events. Note that it only works if the bubble doesn't need to change anything in the legacy system. If it does, then you could still employ the event channel, but you will have to combine it with the microservice approach. You need to apply the same guidelines when you subscribe a bubble to this event channel. Events, just like regular JSON messages, are expressed in some language, too, so you need to translate them into your own domain language when processing those events. Both these approaches scale really well and you can even apply them together. In both cases you are able to effectively reuse the anticorruption layer when connecting external bubbles with the legacy application.
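The push model can be illustrated with a small in-process sketch. The EventChannel class below is only a stand-in for a real message bus, and the DeliveryChangedEvent and ShipmentStatistics types are hypothetical names invented for this example. It assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event the anticorruption layer would publish for each detected change.
public sealed record DeliveryChangedEvent(int DeliveryId, string State);

// Minimal in-process stand-in for a message bus.
public sealed class EventChannel
{
    private readonly List<Action<DeliveryChangedEvent>> _subscribers = new();

    public void Subscribe(Action<DeliveryChangedEvent> handler) => _subscribers.Add(handler);

    public void Publish(DeliveryChangedEvent e)
    {
        // Push model: every subscriber is notified without having to ask.
        foreach (Action<DeliveryChangedEvent> handler in _subscribers)
            handler(e);
    }
}

// A statistics bubble with its own language: it counts shipments per region.
public sealed class ShipmentStatistics
{
    private readonly Dictionary<string, int> _shipmentsByRegion = new();

    public void Handle(DeliveryChangedEvent e)
    {
        // Translate the event's vocabulary (state) into the bubble's own (region).
        string region = e.State.Trim().ToUpperInvariant();
        _shipmentsByRegion[region] = ShipmentsIn(region) + 1;
    }

    public int ShipmentsIn(string region) =>
        _shipmentsByRegion.TryGetValue(region, out int count) ? count : 0;
}

public static class Program
{
    public static void Main()
    {
        var channel = new EventChannel();
        var statistics = new ShipmentStatistics();
        channel.Subscribe(statistics.Handle);

        // In the real setup the synchronizing anticorruption layer would publish these.
        channel.Publish(new DeliveryChangedEvent(1, "dc"));
        channel.Publish(new DeliveryChangedEvent(2, "DC"));

        Console.WriteLine($"Shipments in DC: {statistics.ShipmentsIn("DC")}");
    }
}
```

The statistics bubble never queries the legacy system; everything it needs arrives inside the events.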
Here are some useful links that I referred to in this course. The source code for this course is available on GitHub. It contains three folders with different versions of the code base: initial, anticorruption layer, and synchronizing anticorruption layer. You can implement all the requirements yourself or go ahead and look at the end result. If you need an introduction or a course on the topic of domain-driven design, I recommend my Domain-Driven Design in Practice course. This is the original paper on the topic of the anticorruption layer from Eric Evans and this is the article about the differences between entities and value objects.
We've made great progress in this course. You have learned about the anticorruption layer, what it is, why it is useful, and how to implement it in practice. To summarize, it is useful because it allows you to combine the benefits of a full rewrite and gradual refactoring in the sense that it allows you to work on a greenfield domain model while still delivering business value. We implemented new feature requests in several steps. For the first change in the cost estimate calculation logic we introduced a bubble with an anticorruption layer in the form of repositories. It worked with the existing legacy database. For the next feature we had to introduce our own data storage because the old one wasn't enough for us. We promoted the bubble into an autonomous bubble by cutting all connections to the legacy system and turned the anticorruption layer into a synchronizing anticorruption layer. This autonomy allowed us to scale the bubble separately from the legacy application. Finally, we discussed further ways of dealing with the legacy system, exposing it to the outside world as a microservice and building a domain event channel on top of it. Both techniques help you reuse the anticorruption layer among several bubbles and scale your software even further. Be sure to subscribe to my blog where I will be putting announcements about more courses about domain-driven design. This is the short link for you to do that. Also, feel free to get in touch with me. Here are my email, Twitter handle, and blog. This is Vladimir Khorikov, thank you for listening.