Introduction to MongoDb with .NET part 40: the write concern
June 8, 2016
Introduction
In the previous post we saw how indexes are handled in the MongoDb .NET driver. Normally MongoDb indexes are created through the mongo shell, but the driver also offers a couple of options in case you need to manage them from code.
In this post we’ll start the finishing phase in this series on MongoDb and .NET: write and read concerns. These terms are related to durability and consistency of the data in the database.
The write concern
When we write to the database, i.e. insert or update a document, the new or updated document is at first persisted to memory, not to disk. The records in memory are persisted to disk somewhat later in an asynchronous fashion. The lag is not dramatic: it can be around a second, a little more or less, depending on how much data there is in memory. However, the lag is there, and if the database server dies within that window then the data not yet persisted to disk is wiped out from memory. It won't magically be recovered after a server restart.
By default, when we send an insert or an update to the server we get an acknowledgement back saying whether the operation was successful or not. If the acknowledgement says OK, then it means that the new or updated document was persisted to memory. It is not a 100% guarantee that the new data was saved on disk as well. This write acknowledgement parameter is abbreviated "w" and has a default value of 1, which reflects the scenario we have just described, i.e. we want an acknowledgement of the persistence to memory.
The write parameter in inserts and updates is accompanied by another persistence mechanism in MongoDb, called the journal. The journal is a log where MongoDb registers all insert, update and delete operations that were sent to it from the client. Journal entries, just like documents, are at first written to memory and then persisted to disk with a small lag. The journal becomes important when a collection needs to be recovered after a server crash: all operations that figure in the journal but have not yet reached the collection on disk are replayed from the journal. However, if a document was not written to disk AND the related journal entry was not yet persisted to disk either, then it's lost forever. By default, when we insert a new document we don't wait for the related journal entry to be persisted to disk. The journal part of the persistence options is abbreviated "j" and has a default value of false.
Together, "w" and "j" make up the write concern, and as we said above their values default to 1 and false respectively. That's what makes MongoDb write/update operations so fast. At first we only write to memory and persistence to disk happens asynchronously with a short lag. The trade-off is that we have a short window where the new data can be lost forever.
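For comparison, this is how the default surfaces in the C# driver. The snippet below is a minimal sketch assuming the 2.x driver; the connection string, database and collection names are placeholders:

using System;
using MongoDB.Bson;
using MongoDB.Driver;

public class DefaultWriteConcernDemo
{
    public static void Run()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var companies = client.GetDatabase("bank").GetCollection<BsonDocument>("companies");

        //unless overridden, the collection uses WriteConcern.Acknowledged,
        //i.e. the memory-based acknowledgement described above, without a journal guarantee
        Console.WriteLine(companies.Settings.WriteConcern);
    }
}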
There is of course an option to change that behaviour. If you want to make sure that the data is really persisted and logged to disk then the “j” value must be set to true in the insertion options as follows:
db.companies.insert({"name" : "Samsung", "phone" : 5345346}, {"writeConcern" : {"w" : 1, "j" : true}})
The writeConcern parameter is also a document.
Setting j to true makes inserts considerably slower, but there are times when it is extra important to be sure that the new data was persisted to disk. Creating a new user account is a good candidate.
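In the .NET driver the journaled write concern can be expressed with the WriteConcern class. Here's a minimal sketch, assuming the 2.x C# driver; the database and collection names are placeholders:

using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public class JournaledInsertDemo
{
    public static async Task RunAsync()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var companies = client.GetDatabase("bank").GetCollection<BsonDocument>("companies");

        //WithWriteConcern returns a new collection handle with w = 1 and j = true;
        //the original "companies" object keeps its own settings
        var journaledCompanies = companies.WithWriteConcern(new WriteConcern(w: 1, journal: true));

        //the server acknowledges the insert only after the journal entry has reached the disk
        await journaledCompanies.InsertOneAsync(new BsonDocument { { "name", "Samsung" }, { "phone", 5345346 } });
    }
}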
The "w" parameter can also have a value of 0, which means a so-called unacknowledged write; we'll take up the other possible values in the next post. With an unacknowledged write we don't even wait for the initial memory-based acknowledgement. That's the fastest, fire-and-forget option of course:
db.companies.insert({"name" : "Opel", "phone" : 2446679809767}, {"writeConcern" : {"w" : 0}})
The server will respond with an empty write result. Occasionally we may go for this unsafe option. I can give an example from the company where I work. The statistics of an ongoing load test are inserted into MongoDB periodically. The data is used to show the statistics graphically to the end users. The "w" option is set to 0 in order to make the insertion of the large amount of frequently incoming data as fast as possible. It's fine if a batch is lost here and there. The next round of batches will succeed and the graphs can be redrawn then.
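The C# driver exposes the same option through the built-in WriteConcern.Unacknowledged value. Again a minimal sketch with the 2.x driver and placeholder names:

using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public class UnacknowledgedInsertDemo
{
    public static async Task RunAsync()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var companies = client.GetDatabase("bank").GetCollection<BsonDocument>("companies");

        //w = 0: the driver fires off the insert and does not wait for any acknowledgement,
        //so a lost write goes unnoticed
        var fireAndForget = companies.WithWriteConcern(WriteConcern.Unacknowledged);

        await fireAndForget.InsertOneAsync(new BsonDocument { { "name", "Opel" }, { "phone", 2446679809767 } });
    }
}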
We'll look at the write concern in a multi-server scenario, i.e. in replica sets, in the next post.
You can view all posts related to data storage on this blog here.