Introduction to MongoDb with .NET part 31: the query plan and our first index

Introduction

In the previous post we introduced a new sub-section in this series, namely how indexes work in MongoDb. We went through a general introduction of the topic without actually creating an index. We also looked at the stats() function that can be called on a collection to find some basic statistics such as the number of documents and index sizes. In addition we learnt a little bit about what a storage engine is and that the default storage engine that ships with MongoDb 3.2.4 is called WiredTiger.

In this post we’ll start creating indexes in the Mongo shell. Have both the mongo client and the mongo server ready in two different console windows. Remember, it’s the “mongo” and “mongod” commands.

The query plan

We’ll turn our attention to the demo zip codes database. Here’s a single document as a reminder:

{
        "_id" : "01001",
        "city" : "AGAWAM",
        "loc" : [
                -72.622739,
                42.070206
        ],
        "pop" : 15338,
        "state" : "MA"
}

We can think of multiple common queries against this collection but we’ll start with a simple one. We want to filter on the state field, e.g. we want to find all places in California, i.e. “CA”:

db.zipcodes.find({"state" : "CA"})

There’s a very useful function called “explain” that can be attached to collections. It returns the query plan of the query, i.e. some information about how the MongoDb would execute the query. The explain function is followed by the query we want to analyse. Here’s an example to retrieve the query plan for the above search:

db.zipcodes.explain().find({"state" : "CA"})

It returns a JSON document with a sub-document called “queryPlanner”. I got the following result:

"queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "model.zipcodes",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "state" : {
                                "$eq" : "CA"
                        }
                },
                "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "state" : {
                                        "$eq" : "CA"
                                }
                        },
                        "direction" : "forward"
                },
                "rejectedPlans" : [ ]
        }

The most important bit of information for us right now is the “winningPlan” section. The “stage” value of COLLSCAN means a collection scan. A collection scan is where MongoDb will have to scan every single document in the collection. To prove it we can run the explain command with a true input parameter which will also carry out the query. explain() in itself won’t actually execute the command to be analysed.

This version of explain returns a longer JSON document:

db.zipcodes.explain(true).find({"state" : "CA"})

The returned document includes everything from the previous output plus a section called “executionStats”:

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 1516,
                "executionTimeMillis" : 13,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 29353,
                "executionStages" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "state" : {
                                        "$eq" : "CA"
                                }
                        },
                        "nReturned" : 1516,
                        "executionTimeMillisEstimate" : 10,
                        "works" : 29355,
                        "advanced" : 1516,
                        "needTime" : 27838,
                        "needYield" : 0,
                        "saveState" : 229,
                        "restoreState" : 229,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "direction" : "forward",
                        "docsExamined" : 29353
                },
                "allPlansExecution" : [ ]
        }

We can concentrate on the following entries for the time being:

  • nReturned: the number of documents returned by the query
  • executionTimeMillis: the time it took to execute the query in milliseconds
  • totalDocsExamined: the number of documents that had to be scanned in order to complete the query. It is currently equivalent to the total number of documents in the zip codes collection which proves that MongoDb had to consider all documents

Create an index on the state property

Indexes are created using the createIndex function. The first argument is a JSON document describing the field which should be indexed and whether it’s in ascending or descending order. Index ordering matters most often if you’d like to sort the results in a certain order. For searches like in the above case the index ordering usually doesn’t make any difference. However, if you often search on two fields, e.g. borough and cuisine in the demo restaurants collection and you always sort the results by cuisine in a descending order then the corresponding compound index should reflect that. We’ll come back to this point later on in this series.

Let’s create the index on the state field and also give it a name:

db.zipcodes.createIndex({"state" : 1}, {"name" : "state-asc"})

The first argument describes the fields and the sort order of the index where 1 means an ascending and -1 a descending order. The second argument is optional where we can specify a range of optional arguments like the name of the index. We’ll come back to some of these options later on.

The response tells us that the number of indexes has increased from 1 to 2:

{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

The stats() function will also show the index we’ve just created.

Let’s check if the index makes any difference in the query plan. The explain function…

db.zipcodes.explain(true).find({"state" : "CA"})

…shows us a couple of significant improvements:

"inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "state" : 1
                                },
                                "indexName" : "state-asc",
                                [some ignored fields]
                                "indexBounds" : {
                                        "state" : [
                                                "[\"CA\", \"CA\"]"
                                        ]
                                }
                        }
...

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 1516,
                "executionTimeMillis" : 13,
                "totalKeysExamined" : 1516,
                "totalDocsExamined" : 1516,
                "executionStages" : {
                        "stage" : "FETCH",
                        "nReturned" : 1516,
                        "executionTimeMillisEstimate" : 0,
                        "works" : 1517,
                        "advanced" : 1516,
                        "needTime" : 0,
                        "needYield" : 0,
                        "saveState" : 11,
                        "restoreState" : 11,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "docsExamined" : 1516,
                        "alreadyHasObj" : 0,
  • The “winningPlan” sub-document now shows a stage type of IXSCAN which stands for index scan. The document also shows the name of the index used for the scan
  • An interesting field here is “indexBounds” which shows the range of the indexes checked. We’re only filtering on a single value so the range starts and ends with CA
  • The execution time of 13ms hasn’t changed in this example but it’s due to the zip codes collection being very small. Indexes show their power with large collections of millions of documents
  • totalDocsExamined: 1516 instead of the total collection size 29353 we had before. This means that only 1516 documents had to be scanned instead of the whole collection. 1516 is also the number of documents returned i.e. we have a 1:1 match between the scanned and returned documents
  • totalKeysExamined: 1516 index keys were examined instead of 0

This is often a good strategy when building indexes. Run the explain command before and after the index creation, preferably on an alpha database instead of messing with the production database. The goal is to reduce the number of documents scanned to get the result set. The best is when the number of scanned documents is equal to the number of returned documents.

We’ll continue this discussion in the next post.

You can view all posts related to data storage on this blog here.

Advertisements

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

ultimatemindsettoday

A great WordPress.com site

Elliot Balynn's Blog

A directory of wonderful thoughts

Robin Sedlaczek's Blog

Developer on Microsoft Technologies

HarsH ReaLiTy

A Good Blog is Hard to Find

Softwarearchitektur in der Praxis

Wissenswertes zu Webentwicklung, Domain-Driven Design und Microservices

the software architecture

thoughts, ideas, diagrams,enterprise code, design pattern , solution designs

Technology Talks

on Microsoft technologies, Web, Android and others

Software Engineering

Web development

Disparate Opinions

Various tidbits

chsakell's Blog

Anything around ASP.NET MVC,WEB API, WCF, Entity Framework & AngularJS

Cyber Matters

Bite-size insight on Cyber Security for the not too technical.

Guru N Guns's

OneSolution To dOTnET.

Johnny Zraiby

Measuring programming progress by lines of code is like measuring aircraft building progress by weight.

%d bloggers like this: