Introduction to MongoDb with .NET part 42: the read preference
June 14, 2016
In the previous post we briefly discussed what a replica set in MongoDb is. A replica set is a group of MongoDb servers that behave as a single unit in order to provide increased data availability. There is one primary node and typically two or more secondary nodes. The “w” part of the write concern can be set to the number of nodes that must acknowledge the write/update operation. It can also be set to “majority”, in which case we wait for an acknowledgement from the majority of the replica set nodes. We can also specify a tag value for “w” so that we wait for the servers with a specific tag to acknowledge the operation.
We eventually want to read from our database as well. An interesting question is where we want to read the data from in a replica set. This is where the read preference enters the picture. There’s also a related term called read concern which we’ll also go through in some detail.
With multiple database instances that hold eventually identical data sets we can specify which node we’d like to read from. We don’t provide an exact server name or IP address for the node. Instead we give MongoDb a hint about where to look for the data.
By default all our reads are directed to the primary node. That’s usually exactly what we want since all our writes go to the primary node first. As we said in the previous post there’s some lag between the data entry in the primary node and the data propagation to the secondary nodes. Hence it’s always safest to read from the primary node if you want to guarantee that the search operation retrieves what you wrote to the database before. This read preference mode is simply called primary.
There are other modes though that can be interesting. If your goal is to spread out your reads across the secondary nodes and you don’t care about the data propagation lag then the read mode can be set to secondary. Since the data is identical across all MongoDb servers only once the data propagation has finished, we call this scenario eventual consistency. There are times when it’s acceptable to occasionally read stale data. E.g. comments on a blog post don’t need to show up the very moment they are saved on the primary server, it’s fine to wait a couple of seconds. On the other hand there are scenarios where you absolutely want to read the latest available data set, e.g. the user session properties after a login. Reading stale session data could lead to subtle bugs and inconsistent application behaviour.
Then there are two modes somewhere between primary and secondary:
- primaryPreferred: we prefer to read from the primary node, but if the primary is not available it’s OK to read from one of the secondary nodes
- secondaryPreferred: this is the opposite of the above, i.e. normally we want to read from a secondary node, but if no secondary is available it’s OK if the result comes from the primary
There’s one more read mode called nearest. The MongoDb driver keeps track of the network latency to each node and reads from a node with the lowest latency, regardless of its role. We don’t know whether that node is primary or secondary so we’re not guaranteed to get the latest available data set.
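The five modes can be pictured with a small simulation. This is a simplified sketch in plain JavaScript, not the actual server selection algorithm of any driver: the `selectNode` function, the node objects and their `latencyMs` values are all made up for illustration, and the function always picks the single fastest candidate.

```javascript
// Simplified model of the read preference modes (hypothetical helper,
// not part of MongoDb or any driver).
function selectNode(mode, nodes) {
  var available = nodes.filter(function (n) { return n.up; });
  var primary = available.filter(function (n) { return n.type === "primary"; });
  var secondaries = available.filter(function (n) { return n.type === "secondary"; });

  // Pick the lowest-latency node from a list, or null if the list is empty.
  function fastest(list) {
    return list.reduce(function (best, n) {
      return best === null || n.latencyMs < best.latencyMs ? n : best;
    }, null);
  }

  switch (mode) {
    case "primary": return fastest(primary);
    case "primaryPreferred": return fastest(primary) || fastest(secondaries);
    case "secondary": return fastest(secondaries);
    case "secondaryPreferred": return fastest(secondaries) || fastest(primary);
    case "nearest": return fastest(available);
    default: throw new Error("Unknown read preference mode: " + mode);
  }
}

// Made-up replica set with one primary and two secondaries.
var nodes = [
  { name: "node-a", type: "primary", latencyMs: 12, up: true },
  { name: "node-b", type: "secondary", latencyMs: 5, up: true },
  { name: "node-c", type: "secondary", latencyMs: 30, up: true }
];

console.log(selectNode("primary", nodes).name);
console.log(selectNode("nearest", nodes).name);
```

With this sample data the primary mode selects node-a, whereas nearest selects node-b because it has the lowest latency. If node-a is marked as down then primary finds no node at all, while primaryPreferred falls back to a secondary.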
Here’s how we can set the read preference in the Mongo shell:
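A minimal sketch, assuming a shell session that is connected to a replica set. `db.getMongo().setReadPref()` and `cursor.readPref()` are Mongo shell helpers, while the `restaurants` collection name and the chosen mode are only placeholders:

```javascript
// Read preference mode to use; "secondary" is only an example value.
var readMode = "secondary";

// Only runs inside the Mongo shell, where the global "db" object exists.
if (typeof db !== "undefined") {
  // Set the read preference for every read on this connection.
  db.getMongo().setReadPref(readMode);

  // Or set it for a single query; "restaurants" is a placeholder collection.
  db.restaurants.find().readPref(readMode).forEach(printjson);
}
```

The connection-level setting applies to all subsequent reads in the session, whereas the cursor-level one affects only that single query.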
The read concern is related to the read preference: the read preference decides which node we read from, while the read concern decides which data on that node is acceptable to return. As of MongoDb 3.2 it can have two values: “local”, which is the default, and “majority”. With the majority read concern we indicate that we only want to read data that has propagated to a majority of nodes. The “local” option returns “the most recent data available to the MongoDB instance at the time of the query, even if the data has not been persisted to a majority of replica set members and may be rolled back.”
The read concern can be specified at multiple levels, e.g. directly in a find command:
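A minimal sketch of such a find command, assuming a Mongo shell session. The `restaurants` collection and the filter are placeholder values; the `readConcern` field with a `level` is what carries the read concern:

```javascript
// Find command document; "restaurants" and the filter are placeholder values.
var findCommand = {
  find: "restaurants",
  filter: { borough: "Brooklyn" },
  readConcern: { level: "majority" }
};

// Only runs inside the Mongo shell, where the global "db" object exists.
if (typeof db !== "undefined") {
  printjson(db.runCommand(findCommand));
}
```

The shell also has a cursor helper for the same purpose, so `db.restaurants.find({ borough: "Brooklyn" }).readConcern("majority")` should be an equivalent way to express the query.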
The above won’t work unless the MongoDb server was started with the --enableMajorityReadConcern option.