Introduction to MongoDb with .NET part 36: other bits and pieces about the query plan

Introduction

In the previous post we looked at a couple of options we can provide when creating an index. With a unique index we can ensure that only unique values can be inserted into a property. A sparse index is often used in conjunction with a unique index. It lets us index fields that not every document has: documents where the field is missing or null are simply left out of the index. Finally we looked at foreground vs. background index creation. If an index is built on a foreground thread then it is relatively fast but it blocks all readers and writers of the database while it runs. Background index creation on the other hand is slower but doesn’t block other threads that want to read and write.
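As a quick refresher, those options are all passed in the second argument of createIndex. The nickname field below is just a made-up example to illustrate the syntax:

db.people.createIndex({"nickname" : 1}, {"unique" : true, "sparse" : true, "background" : true})

This builds the index on a background thread, only indexes documents that actually have a nickname field and rejects duplicate nickname values.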

In this post we’ll look at a couple of things related to the explain function.

Query plan modes

We’ve seen the query planner in action several times now. I hope you’re starting to understand how to read the returned values. If multiple stages are required to execute a query then we start reading the output from the innermost document, which describes the first stage, e.g. an index scan. Then we follow the stages outwards and we’ll see that the number of documents scanned and returned decreases.

Let’s look at an example. We created a people collection a couple of times in this series but we’ll create one again:

use model
db.people.drop()
db.people.insert({"name" : "John", "age" : 34, "sports" : ["football", "badminton", "squash"], "grades" : [50, 60, 45, 43]})
db.people.insert({"name" : "Mary", "age" : 28, "sports" : ["cricket", "badminton"], "grades" : [90, 60, 53]})
db.people.insert({"name" : "William", "age" : 29, "sports" : ["tennis", "basketball", "squash"], "grades" : [30, 40, 64, 90, 78]})
db.people.insert({"name" : "Bridget", "age" : 45, "sports" : ["baseball", "athletics", "running"], "grades" : [50, 90, 95, 40]})
db.people.insert({"name" : "Susan", "age" : 40, "sports" : ["football", "boxing", "karate"], "grades" : [70, 80, 78]})
db.people.insert({"name" : "John", "age" : 39, "sports" : ["cycling", "swimming", "basketball"], "grades" : [10, 90, 60, 50]})
db.people.insert({"name" : "John", "age" : 31, "sports" : ["handball", "skating", "baseball"], "grades" : [55, 65]})
db.people.insert({"name" : "Joanna", "age" : 28, "sports" : ["judo", "karate", "running"], "grades" : [88, 77, 66, 55]})
db.people.insert({"name" : "Christina", "age" : 27, "sports" : ["running", "swimming", "badminton"], "grades" : [51, 61, 46, 44]})
db.people.insert({"name" : "Homer", "age" : 48, "sports" : ["football", "skating", "boxing"], "grades" : [45, 47, 90, 99]})

Let’s create an index on the name property:

db.people.createIndex({"name" : 1})

We’ll now search for everyone called John aged over 32. We’ll also run the explain function in its default mode, i.e. without any parameters. That basic mode is called the “query planner” mode:

db.people.explain().find({"name" : "John", "age" : {$gt : 32}})

This basic mode doesn’t give us too much information. We know at least that our name index was used in the first stage. The second stage, called FETCH, had to apply an additional filter on the age field. Check the “filter” property to verify that:

"winningPlan" : {
                        "stage" : "FETCH",
                        "filter" : {
                                "age" : {
                                        "$gt" : 32
                                }
                        },
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "name" : 1
                                },
                                "indexName" : "name_1",
                                "isMultiKey" : false,
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 1,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "name" : [
                                                "[\"John\", \"John\"]"
                                        ]
                                }
                        }
                }

The second level of granularity is called the execution stats mode. We’ve seen its output when we passed in a true boolean parameter. We can also invoke it like this:

db.people.explain("executionStats").find({"name" : "John", "age" : {$gt : 32}})

This level of output tells us how many documents were scanned at each stage. The index scan returns 3 documents, i.e. the ones covered by the name index, as there are 3 documents with the name “John”. Then the fetch stage checks those 3 documents and applies the age filter. 2 out of the initial 3 are returned by the fetch stage. Here are the relevant parts:

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 2,
                "executionTimeMillis" : 0,
                "totalKeysExamined" : 3,
                "totalDocsExamined" : 3,
                "executionStages" : {
                        "stage" : "FETCH",
                        "filter" : {
                                "age" : {
                                        "$gt" : 32
                                }
                        },
                        "nReturned" : 2,

"inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 3,

The third level of granularity is called the all plans execution mode. It is useful when there are initially multiple possible query plans to get to the result set. In that case there will be one winning plan and one or more rejected plans. MongoDb runs the candidate plans against each other during a short trial period and picks the plan that reaches the result set the fastest.

To generate some rejected plans let’s create an index on the sports array:

db.people.createIndex({"sports" : 1})

Let’s now run the explain function in this most verbose mode with a modified query:

db.people.explain("allPlansExecution").find({"name" : "John", "age" : {$gt : 32}, "sports" : "swimming"})

There will be a long document with a winning plan and there should be 2 rejected plans as well. The winning plan uses the sports array index. The first rejected plan uses the name index instead. The second rejected plan performs 2 index scans in a stage called “AND_SORTED”.
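By the way, if you ever want to inspect one of the losing strategies in isolation you can force MongoDb to use a specific index with the hint function. Here we force the name index, i.e. the path of the first rejected plan:

db.people.explain("executionStats").find({"name" : "John", "age" : {$gt : 32}, "sports" : "swimming"}).hint({"name" : 1})

The output will show an IXSCAN on name_1 regardless of what the query planner would otherwise pick.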

The execution stats of the winning plan show that the sports index scan returns 2 documents which correspond to 2 documents where the sports array includes “swimming”, a John and a Christina. Then the fetch stage checks these 2 documents and returns 1 of them:

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 1,
                "executionTimeMillis" : 2,
                "totalKeysExamined" : 2,
                "totalDocsExamined" : 2,


"inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 2,

Why were the other 2 plans rejected? The most important property here is the one called totalKeysExamined. This is the number of index entries that had to be checked in order to execute the query. The sports index plan had to examine only 2 keys. The section in the output called “allPlansExecution” shows the number of keys examined in the rejected plans:

"allPlansExecution" : [
                        {
                                "nReturned" : 1,
                                "executionTimeMillisEstimate" : 0,
                                "totalKeysExamined" : 3,
                                "totalDocsExamined" : 3,

 {
                                "nReturned" : 1,
                                "executionTimeMillisEstimate" : 0,
                                "totalKeysExamined" : 3,
                                "totalDocsExamined" : 1,

The top section shows the name index plan, the bottom section shows the path with 2 index searches. Both had to check 3 index keys compared to the 2 in the winning plan. MongoDb then decided that the sports index path was the fastest and rejected the other two.

One last thing we can look at here is the number of documents scanned to produce the result set. Running the name query with the explain function…

db.people.explain("executionStats").find({"name" : "John"})

…will tell us that 3 documents were scanned in total:

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 3,
                 "totalDocsExamined" : 3,

The index scan returned 3 keys and MongoDb examined the three documents that the index keys pointed at. Without the index MongoDb would have examined all 10 documents in the collection so it’s a great improvement. However, there are cases where we can do even better. That’s when the index itself covers the entire query.

In the above query MongoDb had to examine those 3 documents in the collection because we didn’t specify a projection to limit the fields returned. MongoDb had to collect the name, age, sports, grades and _id fields from the documents.

Say we only want to see the name field in the result set. We’ll also modify our search to extract those documents where the name field includes an “r”:

db.people.find({"name" : /r/}, {"_id" : 0, "name" : 1})

We get 4 documents:

{ "name" : "Bridget" }
{ "name" : "Christina" }
{ "name" : "Homer" }
{ "name" : "Mary" }

Let’s see what the query planner tells us:

db.people.explain("executionStats").find({"name" : /r/}, {"_id" : 0, "name" : 1})

We can see that MongoDb didn’t need to examine a single document. The name index alone was enough to cover the query since we only asked for the name field. Note that the non-anchored regular expression still forced MongoDb to scan all 10 index keys, but totalDocsExamined stays at 0:

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 4,
                "executionTimeMillis" : 0,
                "totalKeysExamined" : 10,
                "totalDocsExamined" : 0,

So it can really be worth checking which properties you need. It may turn out that the index covers the query entirely, which is the absolute fastest way to retrieve the result set. Such queries are called covered queries in MongoDb.
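A query that needs more than one field can be covered as well if a compound index stores all the projected fields. As an illustration only, and assuming we don’t mind maintaining yet another index, the following compound index covers a query that filters on name and projects name and age:

db.people.createIndex({"name" : 1, "age" : 1})
db.people.explain("executionStats").find({"name" : "John"}, {"_id" : 0, "name" : 1, "age" : 1})

totalDocsExamined should again be 0 since both projected fields are stored in the index keys.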

Let’s finish this post with a tip on how to extract the total index size of a collection using the totalIndexSize() function:

db.people.totalIndexSize()

It returns 49152 bytes. If you want a per-index breakdown then call the stats function we saw earlier:

db.people.stats()

…which has a section on index sizes:

"nindexes" : 3,
        "totalIndexSize" : 49152,
        "indexSizes" : {
                "_id_" : 16384,
                "name_1" : 16384,
                "sports_1" : 16384
        },
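If you’re only interested in which indexes exist on the collection, rather than their sizes, then the getIndexes function lists their definitions:

db.people.getIndexes()

It returns an array with one document per index showing the key pattern, the index name and any extra options such as unique or sparse.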

We’ll look at text indexes in the next post.

You can view all posts related to data storage on this blog here.


About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.
