Introduction to MongoDb with .NET part 34: indexing subdocuments

Introduction

In the previous post we looked at how to index arrays in MongoDb using the client shell. We saw that there wasn’t anything special in the JSON syntax, it is the same as when adding an index to a scalar field. There’s a limitation to indexing array fields though. We cannot index two or more array fields, only one at a time. It’s fine to index as many scalar fields as you want plus one array field, but not more than that.

In this post we’ll see how to index fields within subdocuments.

Deep indexing

Recall that we have a restaurant collection with the following structure:

{
        "_id" : ObjectId("56edc2ff03a1cd840734dba8"),
        "address" : {
                "building" : "2780",
                "coord" : [
                        -73.98241999999999,
                        40.579505
                ],
                "street" : "Stillwell Avenue",
                "zipcode" : "11224"
        },
        "borough" : "Brooklyn",
        "cuisine" : "American ",
        "grades" : [
                {
                        "date" : ISODate("2014-06-10T00:00:00Z"),
                        "grade" : "A",
                        "score" : 5
                },
                {
                        "date" : ISODate("2013-06-05T00:00:00Z"),
                        "grade" : "A",
                        "score" : 7
                },
                {
                        "date" : ISODate("2012-04-13T00:00:00Z"),
                        "grade" : "A",
                        "score" : 12
                },
                {
                        "date" : ISODate("2011-10-12T00:00:00Z"),
                        "grade" : "A",
                        "score" : 12
                }
        ],
        "name" : "Riviera Caterer",
        "restaurant_id" : "40356018"
}

Let’s say that we want to find every restaurant which has at least one C grade. We can do that using the dot notation we saw before when we queried subdocuments:

db.restaurants.find({"grades.grade" : "C"})

The count() function will tell us that there are 2709 such documents.

Building an index on a field within a subdocument also follows the dot notation:

db.restaurants.createIndex({"grades.grade" : 1})

Let’s run the explain command to check whether our new index is in use:

db.restaurants.explain(true).find({"grades.grade" : "C"})

Indeed it is used for the query execution:

"inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 2709,
                                
                                "keyPattern" : {
                                        "grades.grade" : 1
                                },
                                "indexName" : "grades.grade_1",
                                "isMultiKey" : true,

2709 documents were returned from the index scan and that is also the total number of documents scanned:

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 2709,
                "totalDocsExamined" : 2709,

That was easy, right?

Before we continue I want to mention two things.

The inputStage fragment above shows a property called isMultiKey that I haven’t explained yet. I should have done so in the previous post on array indexing but I missed it. An array index is called a multi-key index in MongoDb. Our subdocument in the above case is also an array of scores hence the index called “grades.grade_1” is a multi-key index. The query will show isMultiKey : true in that case, otherwise it will be false.

Another thing we can mention here is how to get the list of indexes of a collection. The getIndexes function helps us achieve just that:

db.restaurants.getIndexes()

The above query returns 3 indexes: the default index on the _id field, then the one we’ve just created and one more that we created earlier in this series which was a compound index on the borough and cuisine fields:

[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "model.restaurants"
        },
        {
                "v" : 1,
                "key" : {
                        "borough" : 1,
                        "cuisine" : 1
                },
                "name" : "borough_1_cuisine_1",
                "ns" : "model.restaurants"
        },
        {
                "v" : 1,
                "key" : {
                        "grades.grade" : 1
                },
                "name" : "grades.grade_1",
                "ns" : "model.restaurants"
        }
]

OK, let’s now look at a somewhat more complex query. We want to extract all restaurants that have at least one B grade with a score of 19 like here:

{
                        "date" : ISODate("2013-10-09T00:00:00Z"),
                        "grade" : "B",
                        "score" : 19
}

We saw the $elemMatch operator before and we need to use it again:

db.restaurants.find({"grades" : {$elemMatch : {"grade" : "B", "score" : 19}}})

The count function tells us that there are 1067 restaurants that match this criteria.

Let’s see what the explain function gives us:

db.restaurants.explain(true).find({"grades" : {$elemMatch : {"grade" : "B", "score" : 19}}})

There are two stages. The first stage is the index scan using the index we’ve created. It returns 8283 documents which correspond to the restaurants having at least one B grade:

"inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 8283                                
                                "keyPattern" : {
                                        "grades.grade" : 1
                                },
                                "indexName" : "grades.grade_1",
                                "isMultiKey" : true,

The next stage examines those documents and returns 1067 of them. So 1067 of the 8283 have at least one B grade along with a score of 19:

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 1067,
                "executionTimeMillis" : 32,
                "totalKeysExamined" : 8283,
                "totalDocsExamined" : 8283

Let’s delete the above index and replace it with a compound index on the grade and score fields of the grades subdocument:

db.restaurants.dropIndex({"grades.grade" : 1})
db.restaurants.createIndex({"grades.grade" : 1, "grades.score" : 1})

Running the same explain command as above…

db.restaurants.explain(true).find({"grades" : {$elemMatch : {"grade" : "B", "score" : 19}}})

…will indeed show us that only 1067 documents were scanned in total, i.e. the compound index covered the total scan just fine:

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 1067,
                "executionTimeMillis" : 9,
                "totalKeysExamined" : 1067,
                "totalDocsExamined" : 1067,

We’ll look at some index options in the next post.

You can view all posts related to data storage on this blog here.

Advertisements

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

ultimatemindsettoday

A great WordPress.com site

Elliot Balynn's Blog

A directory of wonderful thoughts

Robin Sedlaczek's Blog

Developer on Microsoft Technologies

HarsH ReaLiTy

My goal with this blog is to offend everyone in the world at least once with my words… so no one has a reason to have a heightened sense of themselves. We are all ignorant, we are all found wanting, we are all bad people sometimes.

Softwarearchitektur in der Praxis

Wissenswertes zu Webentwicklung, Domain-Driven Design und Microservices

the software architecture

thoughts, ideas, diagrams,enterprise code, design pattern , solution designs

Technology Talks

on Microsoft technologies, Web, Android and others

Software Engineering

Web development

Disparate Opinions

Various tidbits

chsakell's Blog

Anything around ASP.NET MVC,WEB API, WCF, Entity Framework & AngularJS

Cyber Matters

Bite-size insight on Cyber Security for the not too technical.

Guru N Guns's

OneSolution To dOTnET.

Johnny Zraiby

Measuring programming progress by lines of code is like measuring aircraft building progress by weight.

%d bloggers like this: