← Introduction to MongoDb with .NET part 30: starting with indexing

Introduction to Amazon Code Pipeline with Java part 10: the client web pages →

Introduction to MongoDb with .NET part 31: the query plan and our first index

May 20, 2016 Leave a comment

Introduction

In the previous post we introduced a new sub-section in this series, namely how indexes work in MongoDb. We went through a general introduction of the topic without actually creating an index. We also looked at the stats() function that can be called on a collection to find some basic statistics such as the number of documents and index sizes. In addition we learnt a little bit about what a storage engine is and that the default storage engine that ships with MongoDb 3.2.4 is called WiredTiger.

In this post we’ll start creating indexes in the Mongo shell. Have both the mongo client and the mongo server ready in two different console windows. Remember, it’s the “mongo” and “mongod” commands.

The query plan

We’ll turn our attention to the demo zip codes database. Here’s a single document as a reminder:

{
        "_id" : "01001",
        "city" : "AGAWAM",
        "loc" : [
                -72.622739,
                42.070206
        ],
        "pop" : 15338,
        "state" : "MA"
}

We can think of multiple common queries against this collection but we’ll start with a simple one. We want to filter on the state field, e.g. we want to find all places in California, i.e. “CA”:

db.zipcodes.find({"state" : "CA"})

There’s a very useful function called “explain” that can be attached to collections. It returns the query plan of the query, i.e. some information about how the MongoDb would execute the query. The explain function is followed by the query we want to analyse. Here’s an example to retrieve the query plan for the above search:

db.zipcodes.explain().find({"state" : "CA"})

It returns a JSON document with a sub-document called “queryPlanner”. I got the following result:

"queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "model.zipcodes",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "state" : {
                                "$eq" : "CA"
                        }
                },
                "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "state" : {
                                        "$eq" : "CA"
                                }
                        },
                        "direction" : "forward"
                },
                "rejectedPlans" : [ ]
        }

The most important bit of information for us right now is the “winningPlan” section. The “stage” value of COLLSCAN means a collection scan. A collection scan is where MongoDb will have to scan every single document in the collection. To prove it we can run the explain command with a true input parameter which will also carry out the query. explain() in itself won’t actually execute the command to be analysed.

This version of explain returns a longer JSON document:

db.zipcodes.explain(true).find({"state" : "CA"})

The returned document includes everything from the previous output plus a section called “executionStats”:

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 1516,
                "executionTimeMillis" : 13,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 29353,
                "executionStages" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "state" : {
                                        "$eq" : "CA"
                                }
                        },
                        "nReturned" : 1516,
                        "executionTimeMillisEstimate" : 10,
                        "works" : 29355,
                        "advanced" : 1516,
                        "needTime" : 27838,
                        "needYield" : 0,
                        "saveState" : 229,
                        "restoreState" : 229,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "direction" : "forward",
                        "docsExamined" : 29353
                },
                "allPlansExecution" : [ ]
        }

We can concentrate on the following entries for the time being:

nReturned: the number of documents returned by the query
executionTimeMillis: the time it took to execute the query in milliseconds
totalDocsExamined: the number of documents that had to be scanned in order to complete the query. It is currently equivalent to the total number of documents in the zip codes collection which proves that MongoDb had to consider all documents

Create an index on the state property

Indexes are created using the createIndex function. The first argument is a JSON document describing the field which should be indexed and whether it’s in ascending or descending order. Index ordering matters most often if you’d like to sort the results in a certain order. For searches like in the above case the index ordering usually doesn’t make any difference. However, if you often search on two fields, e.g. borough and cuisine in the demo restaurants collection and you always sort the results by cuisine in a descending order then the corresponding compound index should reflect that. We’ll come back to this point later on in this series.

Let’s create the index on the state field and also give it a name:

db.zipcodes.createIndex({"state" : 1}, {"name" : "state-asc"})

The first argument describes the fields and the sort order of the index where 1 means an ascending and -1 a descending order. The second argument is optional where we can specify a range of optional arguments like the name of the index. We’ll come back to some of these options later on.

The response tells us that the number of indexes has increased from 1 to 2:

{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

The stats() function will also show the index we’ve just created.

Let’s check if the index makes any difference in the query plan. The explain function…

db.zipcodes.explain(true).find({"state" : "CA"})

…shows us a couple of significant improvements:

"inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "state" : 1
                                },
                                "indexName" : "state-asc",
                                [some ignored fields]
                                "indexBounds" : {
                                        "state" : [
                                                "[\"CA\", \"CA\"]"
                                        ]
                                }
                        }
...

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 1516,
                "executionTimeMillis" : 13,
                "totalKeysExamined" : 1516,
                "totalDocsExamined" : 1516,
                "executionStages" : {
                        "stage" : "FETCH",
                        "nReturned" : 1516,
                        "executionTimeMillisEstimate" : 0,
                        "works" : 1517,
                        "advanced" : 1516,
                        "needTime" : 0,
                        "needYield" : 0,
                        "saveState" : 11,
                        "restoreState" : 11,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "docsExamined" : 1516,
                        "alreadyHasObj" : 0,

The “winningPlan” sub-document now shows a stage type of IXSCAN which stands for index scan. The document also shows the name of the index used for the scan
An interesting field here is “indexBounds” which shows the range of the indexes checked. We’re only filtering on a single value so the range starts and ends with CA
The execution time of 13ms hasn’t changed in this example but it’s due to the zip codes collection being very small. Indexes show their power with large collections of millions of documents
totalDocsExamined: 1516 instead of the total collection size 29353 we had before. This means that only 1516 documents had to be scanned instead of the whole collection. 1516 is also the number of documents returned i.e. we have a 1:1 match between the scanned and returned documents
totalKeysExamined: 1516 index keys were examined instead of 0

This is often a good strategy when building indexes. Run the explain command before and after the index creation, preferably on an alpha database instead of messing with the production database. The goal is to reduce the number of documents scanned to get the result set. The best is when the number of scanned documents is equal to the number of returned documents.

We’ll continue this discussion in the next post.

You can view all posts related to data storage on this blog here.

Filed under .NET, MongoDb Tagged with c#, mongodb

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.

Exercises in .NET with Andras Nemes

Introduction to MongoDb with .NET part 31: the query plan and our first index

Leave a comment Cancel reply

My profile

Andras Nemes

Verified Services

Follow my blog via email

Top Posts & Pages

History

My tweets

Blogs I Follow

Exercises in .NET with Andras Nemes

Introduction to MongoDb with .NET part 31: the query plan and our first index

Share:

Related

Leave a comment Cancel reply

My profile

Andras Nemes

Verified Services

Follow my blog via email

Top Posts & Pages

History

My tweets

Keywords

Blogs I Follow