Introduction to MongoDb with .NET part 34: indexing subdocuments
May 25, 2016 Leave a comment
Introduction
In the previous post we looked at how to index arrays in MongoDb using the client shell. We saw that there wasn’t anything special in the JSON syntax, it is the same as when adding an index to a scalar field. There’s a limitation to indexing array fields though. We cannot index two or more array fields, only one at a time. It’s fine to index as many scalar fields as you want plus one array field, but not more than that.
In this post we’ll see how to index fields within subdocuments.
Deep indexing
Recall that we have a restaurant collection with the following structure:
{ "_id" : ObjectId("56edc2ff03a1cd840734dba8"), "address" : { "building" : "2780", "coord" : [ -73.98241999999999, 40.579505 ], "street" : "Stillwell Avenue", "zipcode" : "11224" }, "borough" : "Brooklyn", "cuisine" : "American ", "grades" : [ { "date" : ISODate("2014-06-10T00:00:00Z"), "grade" : "A", "score" : 5 }, { "date" : ISODate("2013-06-05T00:00:00Z"), "grade" : "A", "score" : 7 }, { "date" : ISODate("2012-04-13T00:00:00Z"), "grade" : "A", "score" : 12 }, { "date" : ISODate("2011-10-12T00:00:00Z"), "grade" : "A", "score" : 12 } ], "name" : "Riviera Caterer", "restaurant_id" : "40356018" }
Let’s say that we want to find every restaurant which has at least one C grade. We can do that using the dot notation we saw before when we queried subdocuments:
db.restaurants.find({"grades.grade" : "C"})
The count() function will tell us that there are 2709 such documents.
Building an index on a field within a subdocument also follows the dot notation:
db.restaurants.createIndex({"grades.grade" : 1})
Let’s run the explain command to check whether our new index is in use:
db.restaurants.explain(true).find({"grades.grade" : "C"})
Indeed it is used for the query execution:
"inputStage" : { "stage" : "IXSCAN", "nReturned" : 2709, "keyPattern" : { "grades.grade" : 1 }, "indexName" : "grades.grade_1", "isMultiKey" : true,
2709 documents were returned from the index scan and that is also the total number of documents scanned:
"executionStats" : { "executionSuccess" : true, "nReturned" : 2709, "totalDocsExamined" : 2709,
That was easy, right?
Before we continue I want to mention two things.
The inputStage fragment above shows a property called isMultiKey that I haven’t explained yet. I should have done so in the previous post on array indexing but I missed it. An array index is called a multi-key index in MongoDb. Our subdocument in the above case is also an array of scores hence the index called “grades.grade_1” is a multi-key index. The query will show isMultiKey : true in that case, otherwise it will be false.
Another thing we can mention here is how to get the list of indexes of a collection. The getIndexes function helps us achieve just that:
db.restaurants.getIndexes()
The above query returns 3 indexes: the default index on the _id field, then the one we’ve just created and one more that we created earlier in this series which was a compound index on the borough and cuisine fields:
[ { "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "model.restaurants" }, { "v" : 1, "key" : { "borough" : 1, "cuisine" : 1 }, "name" : "borough_1_cuisine_1", "ns" : "model.restaurants" }, { "v" : 1, "key" : { "grades.grade" : 1 }, "name" : "grades.grade_1", "ns" : "model.restaurants" } ]
OK, let’s now look at a somewhat more complex query. We want to extract all restaurants that have at least one B grade with a score of 19 like here:
{ "date" : ISODate("2013-10-09T00:00:00Z"), "grade" : "B", "score" : 19 }
We saw the $elemMatch operator before and we need to use it again:
db.restaurants.find({"grades" : {$elemMatch : {"grade" : "B", "score" : 19}}})
The count function tells us that there are 1067 restaurants that match this criteria.
Let’s see what the explain function gives us:
db.restaurants.explain(true).find({"grades" : {$elemMatch : {"grade" : "B", "score" : 19}}})
There are two stages. The first stage is the index scan using the index we’ve created. It returns 8283 documents which correspond to the restaurants having at least one B grade:
"inputStage" : { "stage" : "IXSCAN", "nReturned" : 8283 "keyPattern" : { "grades.grade" : 1 }, "indexName" : "grades.grade_1", "isMultiKey" : true,
The next stage examines those documents and returns 1067 of them. So 1067 of the 8283 have at least one B grade along with a score of 19:
"executionStats" : { "executionSuccess" : true, "nReturned" : 1067, "executionTimeMillis" : 32, "totalKeysExamined" : 8283, "totalDocsExamined" : 8283
Let’s delete the above index and replace it with a compound index on the grade and score fields of the grades subdocument:
db.restaurants.dropIndex({"grades.grade" : 1}) db.restaurants.createIndex({"grades.grade" : 1, "grades.score" : 1})
Running the same explain command as above…
db.restaurants.explain(true).find({"grades" : {$elemMatch : {"grade" : "B", "score" : 19}}})
…will indeed show us that only 1067 documents were scanned in total, i.e. the compound index covered the total scan just fine:
"executionStats" : { "executionSuccess" : true, "nReturned" : 1067, "executionTimeMillis" : 9, "totalKeysExamined" : 1067, "totalDocsExamined" : 1067,
We’ll look at some index options in the next post.
You can view all posts related to data storage on this blog here.