Introduction to MongoDb with .NET part 3: document insertions
March 21, 2016
Introduction
In the previous post we successfully installed the latest version of MongoDb on Windows. I think you’ll agree that it was a very simple and painless process. We started exploring the two most important tools of the MongoDb installation folder. Mongo.exe starts the client with which you can interface with the database using commands and queries. Mongod.exe in turn starts the database. We saw a couple of command and query examples such as inserting a new record, searching for one and also deleting one. The default query language of MongoDb is JavaScript and most parameters to the query functions will be in JSON. A JSON query parameter is in fact also a document.
In this post we’ll take a closer look at insertions. Don’t forget to start both mongo.exe and mongod.exe in two separate command prompts if you want to try the examples yourself.
Inserting documents
Inserting documents corresponds approximately to INSERT statements in standard SQL. We’ve already seen an example of insertions in the previous post and there’s really nothing magic about it. We call the JavaScript command “db.[collectionName].insert([some valid JSON])” and that’s it. The JSON we pass in is a document that is added to a collection. We know by now that a collection can be a heterogenous set of documents with varying JSON. Therefore it’s difficult to find the exact counterpart of a collection in a relational database, but a table is an acceptable answer. Each record in a table then roughly corresponds to a document.
The “db” keyword will automatically route the request to the currently selected database, hence we don’t need to provide the database name in JavaScript commands. This is analogous to writing e.g. “USE [databaseName]” in SQL after which each statement will be executed against that database unless it’s overridden.
It’s a requirement that all documents in a MongoDb collection have a unique ID which also serves as the primary and immutable key. It’s immutable since once we have set it for a document it cannot be updated. By default the ID field is called “_id”. If you provide no value for it then MongoDb will auto-assign an ID of type ObjectId to it. ObjectId is part of the BSON specification, and the MongoDb documentation describes the type in more detail on this page. In essence an ObjectId has the same function as a GUID, i.e. a globally unique identifier.
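To make the ObjectId a bit less mysterious, here’s a minimal sketch in plain JavaScript that reads the creation time out of an ObjectId’s 24-character hex form. It relies on the first 4 bytes of an ObjectId holding the number of seconds since the Unix epoch. The function name objectIdTimestamp and the sample value are made up for this illustration; they are not part of MongoDb.

```javascript
// Decode the creation time embedded in an ObjectId's 24-character hex form.
// The first 4 bytes (8 hex characters) hold seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// Hypothetical ObjectId value, invented for this example.
const sampleId = "56efc9e2a1b2c3d4e5f60718";
console.log(objectIdTimestamp(sampleId).toISOString());
```

The remaining bytes are there to keep IDs unique when several are generated within the same second.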
Let’s test if we can provide an ID ourselves. I execute the following commands in a mongo client shell. Press Enter after each line:
use loadtest
db.products.insert({"_id" : 1, "name" : "radio", "stock_level" : 100})
The server responds with…
WriteResult({ "nInserted" : 1 })
…i.e. one document was inserted into the products collection.
Next I’ll try to add one more product with the same ID:
db.products.insert({"_id" : 1, "name" : "tv", "stock_level" : 50})
We get a big NOPE from the database:
WriteResult({ "nInserted" : 0, "writeError" : { "code" : 11000, "errmsg" : "E11000 duplicate key error collection: loadtest.products index: _id_ dup key: { : 1.0 }" } })
“duplicate key error” tells us exactly what went wrong.
The example also shows that there’s no auto-incremented integer ID for documents in MongoDb, at least there’s nothing readily available like in MS SQL. If you want to achieve something similar then you’ll need to implement it in your .NET logic. Note that getting the number of documents in a collection and incrementing that number by one is a risky strategy when several clients write at the same time. Say this number is 20 at the moment you run the first query, i.e. “select the count of documents in the products collection”. Then another user adds a product just before you do and both of you will provide the same ID, i.e. 21. One of the two clients will get the error above. You can then keep incrementing the ID field until your query succeeds but that’s not a very “modern” approach. If you as the reader have already found a good solution to this problem then you’re welcome to present it in the comments section.
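The race described above is easier to see in code. The following is a minimal sketch in plain JavaScript with no MongoDb involved: FakeCollection is a made-up in-memory stand-in that only models the duplicate key rule, and nextSequence is a hypothetical helper. The second half shows one commonly used idea, which is not from this post: keep a separate counter and read and increment it in a single step. Against a real database that step would have to be an atomic update on the server, for example a findAndModify call with $inc on a dedicated counters collection.

```javascript
// In-memory stand-in for a collection that enforces unique _id values.
// No MongoDB is involved; this only models the duplicate-key rule.
class FakeCollection {
  constructor() {
    this.docs = new Map();
  }
  count() {
    return this.docs.size;
  }
  insert(doc) {
    if (this.docs.has(doc._id)) {
      return { nInserted: 0, writeError: { code: 11000 } };
    }
    this.docs.set(doc._id, doc);
    return { nInserted: 1 };
  }
}

const products = new FakeCollection();
products.insert({ _id: 1, name: "radio" });

// Risky approach: both clients read the count before either one inserts,
// so both compute the same ID.
const idSeenByClientA = products.count() + 1;
const idSeenByClientB = products.count() + 1;
const resultA = products.insert({ _id: idSeenByClientA, name: "tv" });
const resultB = products.insert({ _id: idSeenByClientB, name: "computer" });
console.log(resultA, resultB);

// Counter approach: reading and incrementing happen in one step,
// so two callers never receive the same number.
const counters = { products: products.count() };
function nextSequence(name) {
  counters[name] += 1;
  return counters[name];
}
const resultC = products.insert({ _id: nextSequence("products"), name: "computer" });
const resultD = products.insert({ _id: nextSequence("products"), name: "phone" });
console.log(resultC, resultD);
```

The in-memory counter is only safe here because this script runs on a single thread. The point of the sketch is the shape of the solution, not a production-ready implementation.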
Another popular identifier type is the GUID or UUID, such as the following:
434b1534-fc4d-47aa-9406-6d48cae713d6
f847a7d8-5e7f-41d7-a01b-85ee3f2776e8
f96589ff-cac7-4fe0-a486-12c7697de069
These are represented by the “uniqueidentifier” type in MS SQL. As it turns out GUIDs are not easy to work with in MongoDb and are best stored as strings. Here’s a thread on StackOverflow that describes the problem:
“Working with GUIDs has a few pitfalls, mostly related to how to work with the binary representation in the mongo shell and also to historical accidents which resulted in different drivers storing GUIDs using different byte orders.”
There’s no native GUID generator function in MongoDb JavaScript, so unfortunately we cannot just type something like “Guid.NewGuid()” as we would in C#. MongoDb has a UUID function but it requires an input string.
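If you want to generate GUID-like strings yourself while experimenting, a small helper is enough. The sketch below is plain JavaScript and the name newGuidString is made up for this example. It builds a random version 4 style identifier from Math.random(), which is not cryptographically strong, so treat it as an illustration rather than something to rely on in production.

```javascript
// Build a random version-4 style UUID string. Math.random() is not
// cryptographically strong, so treat this as an illustration only.
function newGuidString() {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, function (c) {
    const r = Math.floor(Math.random() * 16);
    // "x" takes any hex digit; "y" is limited to 8, 9, a or b (the variant bits).
    const v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

console.log(newGuidString());
```

In a real application the ID would more likely come from the C# side, where Guid.NewGuid() is available.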
Anyway, let’s see if we can insert an object with a string GUID:
db.products.insert({"_id" : "434b1534-fc4d-47aa-9406-6d48cae713d6", "name" : "tv", "stock_level" : 50})
WriteResult({ "nInserted" : 1 })
Let’s try another one:
db.products.insert({"_id" : "f96589ff-cac7-4fe0-a486-12c7697de069", "name" : "computer", "stock_level" : 80})
Yes, that seems to work fine.
The above SO thread says the following about GUIDs stored as strings:
“As far as storing your GUIDs as strings, that’s not an unheard of thing to do and it definitely makes viewing and querying the data in the mongo shell easier and avoids all the issues with different byte orders. The only disadvantage is that it uses more space (roughly double).”
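The “roughly double” figure is easy to check with a bit of arithmetic. This small sketch in plain JavaScript compares the payload of one of the GUIDs above as text and as raw bytes. It ignores the few extra bytes BSON adds around each value, so the numbers are only approximate.

```javascript
const guid = "434b1534-fc4d-47aa-9406-6d48cae713d6";

// As a string: one byte per character, since a GUID only contains ASCII characters.
const stringBytes = guid.length;

// As binary: drop the hyphens, then two hex digits make one byte.
const hexDigits = guid.replace(/-/g, "");
const binaryBytes = hexDigits.length / 2;

console.log(stringBytes, binaryBytes);
```

That gives 36 bytes for the string against 16 bytes for the binary form, which is where the quoted estimate comes from.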
We can therefore conclude that there’s a viable workaround for GUIDs at least.
It’s also an option to adopt the MongoDb ID style in our own C# models. After all, the ObjectId is very much a globally unique identifier, like the 3 GUIDs presented above. ObjectIds simply look different and are generated from other inputs, such as the current time in seconds. However, that has its own drawbacks. I don’t want to get on that track right now. We’ll return to the topic of IDs later on when we start writing some C# code. The ID of a domain object is its most important property and some developers tend to neglect it. Well, not you and me of course, but those other, very careless developers, right?
Size limits
We said before that we can easily insert JSON documents that represent our object relationships. E.g. Orders have OrderItems, and Authors have Books, and Cars have Tires etc. Those dependent objects can be saved as inline documents within the same JSON in an array, e.g. the menu-item example taken from this web site:
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
The menu items are subdocuments of menu arranged in an array. This is a very small document but real-life documents can grow very large this way. Imagine trying to put all customers of Amazon in a single array like that. Currently the maximum size of a single document in MongoDb is 16MB. It’s quite a lot for a single document but Amazon would fill it in a nanosecond.
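To get a feel for the 16MB limit, here’s a rough back-of-the-envelope sketch in plain JavaScript. It estimates size from the JSON text, whereas MongoDb stores documents as BSON, which is encoded differently, so the result is only a ballpark figure. It also assumes ASCII-only content. The names roughJsonBytes and roughCapacity are made up for this example.

```javascript
// MongoDB's per-document limit: 16 megabytes.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Rough size estimate based on the JSON text. BSON is encoded differently,
// so this is only a ballpark figure, and it assumes ASCII-only content.
function roughJsonBytes(doc) {
  return JSON.stringify(doc).length;
}

// Estimate how many copies of one embedded item would fit under the limit.
function roughCapacity(sampleItem) {
  return Math.floor(MAX_DOCUMENT_BYTES / roughJsonBytes(sampleItem));
}

const menuItem = { value: "New", onclick: "CreateNewDoc()" };
console.log(roughJsonBytes(menuItem), roughCapacity(menuItem));
```

Tiny items like a menu entry fit by the hundreds of thousands, but realistic subdocuments such as customers or order histories are far larger and the array keeps growing over time.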
This is an additional consideration you’ll need to go through when designing your documents. Should we embed the dependent objects in the same “mother” document or not? A general guideline is that if you want to easily access the dependent objects and update them on their own then you should place them in a separate collection, i.e. separate documents. It’s very likely that you’ll need to access and update the order lines of an order, i.e. you’ll need an Order collection and an OrderLine collection, and you’ll connect the order lines to the orders by some reference key, much like a foreign key in a relational database. Alternatively you can store the IDs of the order lines within an array of an Order and create the connection there. There’s nothing like “cascade delete” in MongoDb to ensure that there are no orphans so you’ll need to implement any related updates and deletes in your client code.
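The two designs are easier to compare side by side. Below is a minimal sketch in plain JavaScript that uses ordinary arrays as stand-ins for collections, so no MongoDb is involved. The field names such as order_id and the deleteOrder helper are made up for this illustration. The last part shows the kind of clean-up the client code has to do itself because there’s no cascade delete.

```javascript
// Option 1: order lines embedded in the order document.
const embeddedOrder = {
  _id: 1,
  customer: "Jane",
  lines: [
    { product: "radio", quantity: 2 },
    { product: "tv", quantity: 1 }
  ]
};

// Option 2: order lines live in their own collection and point back to the order.
let orders = [{ _id: 1, customer: "Jane" }];
let orderLines = [
  { _id: 10, order_id: 1, product: "radio", quantity: 2 },
  { _id: 11, order_id: 1, product: "tv", quantity: 1 },
  { _id: 12, order_id: 2, product: "computer", quantity: 1 }
];

// With no cascade delete, the client removes the dependents itself,
// otherwise the order lines would be left behind as orphans.
function deleteOrder(orderId) {
  orderLines = orderLines.filter(line => line.order_id !== orderId);
  orders = orders.filter(order => order._id !== orderId);
}

deleteOrder(1);
console.log(orders.length, orderLines.length);
```

With the embedded version the whole problem disappears, since deleting the order removes its lines too, but you pay for that with larger documents and less convenient updates of individual lines.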
We’ll return to this topic later on; I just wanted to give you a heads-up. As you can see, NoSql is an exciting technology but obviously not everything is rosy and painless in that world either.
In the next post we’ll start looking into querying.
You can view all posts related to data storage on this blog here.