Introduction to MongoDb with .NET part 44: a summary
July 27, 2016
In the previous post we saw how to set the read and write preferences for our MongoDb operations in code. We can set these options at various levels: at the client, at the database or at the collection level. We can also specify our preferences directly in the connection string.
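As a quick reminder, those levels can be sketched with the 2.x C# driver roughly as follows; the database and collection names are made up for illustration:

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");

// Database level: prefer reading from a secondary node if one is available
var db = client.GetDatabase("bookstore", new MongoDatabaseSettings
{
    ReadPreference = ReadPreference.SecondaryPreferred
});

// Collection level: wait until a majority of the nodes acknowledge each write
var books = db.GetCollection<BsonDocument>("books")
    .WithWriteConcern(WriteConcern.WMajority);

// The same preferences can go directly into the connection string:
// mongodb://localhost:27017/?readPreference=secondaryPreferred&w=majority
```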
The previous post was also the last in this series dedicated to MongoDb in .NET. We’ve come a long way and it’s time to summarise what we have learnt.
MongoDb is a document-based database that stores its data in BSON documents. In fact it is the most popular NoSql database out there at the time of writing this series, used by a wide range of companies as their data store. The default choice for storing data in a .NET project has most often been SQL Server. While SQL Server is probably still the most popular choice for .NET developers, they can choose from other well-tested alternatives depending on their project needs. MongoDb is very easy to set up and start working with.
It is a very flexible storage mechanism that lacks a fixed schema, i.e. there are virtually no constraints. We can store just about any JSON in any collection. The most important advantages of MongoDb are the following:
- Dynamic data structure with flexible schemas: you don’t need to define columns and tables. You can in fact store pretty much anything within the same collection
- Data migrations become a lot easier. If you change your domain structure then new documents will store the objects accordingly; in effect you drive a schema change simply by changing your custom objects
- MongoDb collections can represent our records in a much more object oriented way than relational databases. Object graphs can be directly stored in a document. If you extract a single item from a collection then you’ll immediately get its associated objects: orders with their order items, rock bands with their concerts, making it a breeze to perform operations on those linked objects
- Due to the lack of constraints such as foreign keys, updating and deleting items is easier, e.g. there’s no need for cascading deletes
- Scalability: MongoDb is highly scalable. We can easily create database clusters with primary and secondary nodes to ensure that our data store is always available
- It’s free. You can be a paying customer and get enhanced assistance from MongoDb but installing and using MongoDb at scale doesn’t cost anything
- Speed: MongoDb is very fast and efficient in querying and inserting items in a collection
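To illustrate the object graph point above, here is a minimal sketch; the Order and OrderItem classes and the collection name are made up:

```csharp
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;

public class OrderItem
{
    public string ProductName { get; set; }
    public int Quantity { get; set; }
}

public class Order
{
    public ObjectId Id { get; set; }
    public string CustomerName { get; set; }
    public List<OrderItem> Items { get; set; }
}

// The whole graph is stored as a single document - no join tables needed
var db = new MongoClient("mongodb://localhost:27017").GetDatabase("shop");
var orders = db.GetCollection<Order>("orders");
orders.InsertOne(new Order
{
    CustomerName = "John Smith",
    Items = new List<OrderItem>
    {
        new OrderItem { ProductName = "Guitar strings", Quantity = 2 }
    }
});
// Reading the order back returns its order items with it in one go
```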
MongoDb also comes with a number of disadvantages:
- Lack of professional tools: with SQL Server you can use SSMS for some very advanced GUI-based database operations, such as database profiling, SQL jobs, a query editor, IntelliSense and a whole lot more. There’s no equivalent in MongoDb.
- MongoDb doesn’t support multi-document transactions; only operations on a single document are atomic
- The lack of a schema is actually a disadvantage as well: you cannot associate objects through foreign keys, and you cannot enforce a compulsory data structure with rules like “NOT NULL”.
- No stored procedures and triggers
- Business intelligence tools of MS SQL have no counterparts in MongoDb
Things we have gone through
Aggregations with the different stages are a very neat feature in MongoDb that can get you started with complex analysis of data. Aggregations are also an entry point into Big Data analysis in MongoDb. I personally think that the way aggregations are built with stages where one stage passes a transformed document to the next makes them easier to work with than their MS SQL equivalent.
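A minimal sketch of such a staged pipeline with the C# driver might look as follows; the collection and field names are invented for the example:

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

var db = new MongoClient("mongodb://localhost:27017").GetDatabase("shop");

// Each stage hands its transformed documents over to the next one
var results = db.GetCollection<BsonDocument>("orders").Aggregate()
    .Match(new BsonDocument("status", "shipped"))          // filtering stage
    .Group(new BsonDocument                                // grouping stage
    {
        { "_id", "$customerName" },
        { "total", new BsonDocument("$sum", "$amount") }
    })
    .Sort(new BsonDocument("total", -1))                   // sorting stage
    .ToList();
```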
C# objects can be used to “translate” between BSON documents and POCOs. We can decorate the C# objects with Mongo-related attributes that declare how a certain property must be serialised. An example is BsonElement where we can specify what a property is called in the JSON, e.g. the C# Address property is serialised into “customer_address” in its JSON equivalent. It can be argued whether a C# object with Mongo attributes is still a real POCO that can be used as a pure domain object in Domain Driven Design. MongoDb attributes break the principle of persistence ignorance in DDD. I would only use those C# objects in the concrete repository as a middle translation layer between the domain objects and their MongoDb collections. However, if your project is not a good fit for DDD then you can obviously go ahead and decorate your POCO objects as you need.
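Such a decorated translation class could look roughly like this; the class, property and element names are made up:

```csharp
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class CustomerDocument
{
    [BsonId]
    public ObjectId Id { get; set; }

    // The C# property Address is serialised as "customer_address" in the JSON
    [BsonElement("customer_address")]
    public string Address { get; set; }

    [BsonElement("name")]
    public string Name { get; set; }

    // Don't write this element at all if the value is null
    [BsonIgnoreIfNull]
    public string Nickname { get; set; }
}
```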
MongoDb is quick and efficient as it is, but your queries can be made even faster and more efficient by adding the necessary indexes. MongoDb offers indexes in much the same way as relational databases do. We can create indexes on individual properties, array fields, text fields, properties within sub-documents etc. The query plan is a very helpful tool when you are trying to find the optimal index mix. The most important outputs in a query plan are the name of the index used, the number of documents investigated and the number of documents returned. The goal is to read as few documents in the collection as possible.
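Creating such indexes with the 2.x driver can be sketched as follows; the field names are made up:

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

var db = new MongoClient("mongodb://localhost:27017").GetDatabase("shop");
var customers = db.GetCollection<BsonDocument>("customers");

// Single-field ascending index
customers.Indexes.CreateOne(
    Builders<BsonDocument>.IndexKeys.Ascending("name"));

// Compound index including a property within a sub-document
customers.Indexes.CreateOne(
    Builders<BsonDocument>.IndexKeys
        .Ascending("name")
        .Descending("address.city"));

// The query plan can then be inspected in the mongo shell, e.g.:
// db.customers.find({name: "John"}).explain("executionStats")
```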
The write concern
When we write to the database, i.e. insert or update a document, the new or updated document is at first persisted to memory, not to disk. The records in memory are persisted to disk somewhat later in an asynchronous fashion. The lag is not dramatic, it can be a second, a little more, a little less, depending on how much data there is in memory. However, the lag is there, and if the database server dies within that window then the data not yet persisted to disk will be wiped out from memory. It won’t magically be recovered after a server restart.
By default when we send a write or an update to the server we get an acknowledgement back saying whether the operation was successful or not. If the acknowledgement says OK, then it means that the new or updated document was persisted to memory. It is not a 100% guarantee that the new data was saved on disk as well. This acknowledgement parameter is abbreviated by “w” and has a default value of 1, which reflects the scenario we have just described, i.e. we want an acknowledgement of the persistence to memory. To be more exact, 1 means that we want the acknowledgement from one node in the database. If we have a single node in our database then 1 is the highest value we can specify. In a database cluster, also called a replica set, we can increase this value if we want to get the acknowledgement from 2 or more database nodes, which will of course take more time to complete.
The “w” parameter in inserts and updates is accompanied by another persistence mechanism in MongoDb, called the journal. The journal is a log where MongoDb registers all insert, update and delete operations that were sent to it from the client. Journal entries, like documents, are at first held in memory and then persisted to disk with a small lag. The journal becomes important when a collection needs to be recovered after a server crash: all operations that figure in the on-disk journal but whose documents have not yet reached the collection on disk will be replayed from the journal. However, if a document was not written to disk AND the related journal entry was not persisted to disk either, then it’s lost forever. By default, when we insert a new document we don’t wait for the related journal entry to be persisted to disk. The journal part of the persistence options is abbreviated by “j” and has the default value of false.
Together “w” and “j” make up the write concern and as we said above their values default to 1 and false. That’s what makes MongoDb write/update operations so fast. At first we only write to memory and persistence to disk will happen asynchronously with a short lag. The trade-off is that we have a short window where the new data can be lost forever.
The “w” parameter can also have a value of 0 which means a so-called unacknowledged write. That’s when we don’t even wait for the initial memory-based acknowledgement. That’s the fastest fire-and-forget option.
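These combinations can be expressed with the WriteConcern class in the C# driver; a sketch, assuming a replica set with at least two nodes:

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

var db = new MongoClient("mongodb://localhost:27017").GetDatabase("shop");

// Default behaviour: acknowledged by one node, no journal wait (w = 1, j = false)
var defaultConcern = new WriteConcern(w: 1, journal: false);

// Safer but slower: acknowledged by two nodes AND flushed to the on-disk journal
var safeConcern = new WriteConcern(w: 2, journal: true);

// Fire-and-forget: no acknowledgement at all (w = 0)
var orders = db.GetCollection<BsonDocument>("orders")
    .WithWriteConcern(WriteConcern.Unacknowledged);
```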
Read preferences enter the picture when we build database clusters. A cluster consists of at least 3 database servers – nodes – where one server is the primary node and the others are secondary nodes. The read preference determines which node in the cluster you’d like to read the data from: the primary or one of the secondaries. We can remove some load from the primary node, which by default receives all write operations, if we read from the secondary nodes. However, since there’s a short delay in data propagation from the primary to the secondary nodes, reading from one of the secondaries may return stale data. At times that’s not a major concern, e.g. it’s probably not a big deal if a comment appears under a blog post with some delay. However, we almost always want to ensure that user data from the users collection is up to date.
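The blog comment vs. user data trade-off above could be expressed per collection like this; the collection names are made up:

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

var db = new MongoClient("mongodb://localhost:27017").GetDatabase("blog");

// Slightly stale comments are acceptable - offload these reads to a secondary
var comments = db.GetCollection<BsonDocument>("comments")
    .WithReadPreference(ReadPreference.SecondaryPreferred);

// User data must be up to date - always read it from the primary
var users = db.GetCollection<BsonDocument>("users")
    .WithReadPreference(ReadPreference.Primary);
```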
We also briefly looked at some tools built into the MongoDb client that help us with database profiling:
- Logging of slow queries
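The profiler that logs the slow queries can also be switched on from code by running a database command; a sketch, assuming a threshold of 100 ms:

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

var db = new MongoClient("mongodb://localhost:27017").GetDatabase("shop");

// Profiling level 1 logs every operation slower than the slowms threshold
var command = new BsonDocument { { "profile", 1 }, { "slowms", 100 } };
db.RunCommand<BsonDocument>(command);

// The logged operations end up in the system.profile collection
var slowQueries = db.GetCollection<BsonDocument>("system.profile")
    .Find(new BsonDocument())
    .ToList();
```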
That’s quite a lot of material to go through and we’ve still only covered the basics. Working with MongoDb in a real-life project will of course be more challenging. However, that’s almost always the case when you read a textbook on some technology with neat test cases and then try to implement what you learnt in a real project.
I hope this series will help you get started with using MongoDb in your next .NET project.
You can view all posts related to data storage on this blog here.