Introduction to MongoDb with .NET part 36: other bits and pieces about the query plan
May 27, 2016
Introduction
In the previous post we looked at a couple of options we can provide when creating an index. With a unique index we can ensure that only unique values can be inserted for a property. A sparse index is often used in conjunction with a unique index: it lets us index a field that not every document has, and documents that lack the field are simply left out of the index. Finally we looked at foreground vs. background index creation. If an index is created on a foreground thread then it is built relatively fast but it blocks all readers and writers in the database. Background index creation, on the other hand, is slower but doesn't block other threads that want to read and write.
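As a quick reminder, all three options can be combined in a single createIndex call. The email field below is purely hypothetical and only serves as an illustration:

db.people.createIndex({"email" : 1}, {"unique" : true, "sparse" : true, "background" : true})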
In this post we’ll look at a couple of things related to the explain function.
Query plan modes
We’ve seen the query planner in action several times now. I hope you’re starting to understand how to read the returned values. If multiple steps are required to execute a query then we start reading the output from the innermost document, which describes the first stage, e.g. an index scan. We then follow the stages outwards and we’ll see that the number of documents scanned and returned decreases.
Let’s look at an example. We have created a people collection a couple of times in this series but we’ll create one again:
use model
db.people.drop()
db.people.insert({"name" : "John", "age" : 34, "sports" : ["football", "badminton", "squash"], "grades" : [50, 60, 45, 43]})
db.people.insert({"name" : "Mary", "age" : 28, "sports" : ["cricket", "badminton"], "grades" : [90, 60, 53]})
db.people.insert({"name" : "William", "age" : 29, "sports" : ["tennis", "basketball", "squash"], "grades" : [30, 40, 64, 90, 78]})
db.people.insert({"name" : "Bridget", "age" : 45, "sports" : ["baseball", "athletics", "running"], "grades" : [50, 90, 95, 40]})
db.people.insert({"name" : "Susan", "age" : 40, "sports" : ["football", "boxing", "karate"], "grades" : [70, 80, 78]})
db.people.insert({"name" : "John", "age" : 39, "sports" : ["cyckling", "swimming", "basketball"], "grades" : [10, 90, 60, 50]})
db.people.insert({"name" : "John", "age" : 31, "sports" : ["handball", "skating", "baseball"], "grades" : [55, 65]})
db.people.insert({"name" : "Joanna", "age" : 28, "sports" : ["judo", "karate", "running"], "grades" : [88, 77, 66, 55]})
db.people.insert({"name" : "Christina", "age" : 27, "sports" : ["running", "swimming", "badminton"], "grades" : [51, 61, 46, 44]})
db.people.insert({"name" : "Homer", "age" : 48, "sports" : ["football", "skating", "boxing"], "grades" : [45, 47, 90, 99]})
Let’s create an index on the name property:
db.people.createIndex({"name" : 1})
We’ll now search for everyone with the name John aged over 32. We’ll also run the explain function in its default mode, i.e. without any parameters. That basic mode is called the “query planner” mode:
db.people.explain().find({"name" : "John", "age" : {$gt : 32}})
This basic mode doesn’t give us too much information. We can at least see that our name index was used in the first stage. The second stage, called FETCH, retrieves the documents found by the index scan and applies the remaining filter on the age field. Check the “filter” property to verify that:
"winningPlan" : { "stage" : "FETCH", "filter" : { "age" : { "$gt" : 32 } }, "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "name" : 1 }, "indexName" : "name_1", "isMultiKey" : false, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 1, "direction" : "forward", "indexBounds" : { "name" : [ "[\"John\", \"John\"]" ] } } }
The second level of granularity is called the execution stats mode. We’ve seen its output when we passed in a true boolean parameter. We can also invoke it like this:
db.people.explain("executionStats").find({"name" : "John", "age" : {$gt : 32}})
This level of output tells us how many documents were scanned at each stage. The index scan returns 3 entries, i.e. the keys in the name index, as there are 3 documents with the name “John”. The fetch stage then retrieves those 3 documents and applies the age filter: 2 out of the initial 3 are returned. Here are the relevant parts:
"executionStats" : { "executionSuccess" : true, "nReturned" : 2, "executionTimeMillis" : 0, "totalKeysExamined" : 3, "totalDocsExamined" : 3, "executionStages" : { "stage" : "FETCH", "filter" : { "age" : { "$gt" : 32 } }, "nReturned" : 2, "inputStage" : { "stage" : "IXSCAN", "nReturned" : 3,
The third level of granularity is called the all plans execution mode. It is useful when there are initially multiple possible query plans that could produce the result set. In that case there will be one winning plan and one or more rejected plans. MongoDb runs the candidate plans against each other during a short trial period and picks the one that retrieves the results the fastest.
To generate some rejected plans let’s create an index on the sports array:
db.people.createIndex({"sports" : 1})
Let’s now run the explain function in this most verbose mode with a modified query:
db.people.explain("allPlansExecution").find({"name" : "John", "age" : {$gt : 32}, "sports" : "swimming"})
There will be a long document with a winning plan and there should be 2 rejected plans as well. The winning plan uses the sports array index. The first rejected plan uses the name index instead. The second rejected plan combines the two indexes and performs 2 index scans in a stage called “AND_SORTED”.
The execution stats of the winning plan show that the sports index scan returns 2 entries, corresponding to the 2 documents whose sports array includes “swimming”: a John and a Christina. The fetch stage then checks those 2 documents and returns 1 of them:
"executionStats" : { "executionSuccess" : true, "nReturned" : 1, "executionTimeMillis" : 2, "totalKeysExamined" : 2, "totalDocsExamined" : 2, "inputStage" : { "stage" : "IXSCAN", "nReturned" : 2,
Why were the other 2 plans rejected? The most important property here is the one called totalKeysExamined. This is the number of index entries that had to be checked in order to execute the query. The sports index plan had to examine only 2 keys. The section in the output called “allPlansExecution” shows the number of keys examined in the rejected plans:
"allPlansExecution" : [ { "nReturned" : 1, "executionTimeMillisEstimate" : 0, "totalKeysExamined" : 3, "totalDocsExamined" : 3, { "nReturned" : 1, "executionTimeMillisEstimate" : 0, "totalKeysExamined" : 3, "totalDocsExamined" : 1,
The top section shows the name index plan, while the bottom one shows the plan with the 2 index scans. Both had to check 3 index keys compared to the 2 in the winning plan. MongoDb therefore decided that the sports index path was the fastest and rejected the other two.
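If you want to inspect one of the rejected strategies yourself, you can force a specific index with the hint function and compare the execution stats. For example, the call below should make MongoDb use the name index for the same query and report the 3 examined keys we saw in the rejected plan:

db.people.explain("executionStats").find({"name" : "John", "age" : {$gt : 32}, "sports" : "swimming"}).hint({"name" : 1})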
One last thing we can look at here is the number of documents that have to be scanned to produce the result set. Running the name query with the explain function…
db.people.explain("executionStats").find({"name" : "John"})
…will tell us that 3 documents were scanned in total:
"executionStats" : { "executionSuccess" : true, "nReturned" : 3, "totalDocsExamined" : 3,
The index scan returned 3 keys and MongoDb examined the three documents that those keys pointed at. Without the index MongoDb would have had to examine all 10 documents in the collection, so it’s a great improvement. However, there are cases where we can do even better. That’s when the index itself covers the entire query.
In the above query MongoDb had to examine those 3 documents in the collection because we didn’t specify a projection for the properties to be returned. MongoDb had to collect the name, age, sports, grades and _id fields from the documents.
Say we only want to see the name field in the result set. We’ll modify our search and extract those documents where the name field includes an “r”:
db.people.find({"name" : /r/}, {"_id" : 0, "name" : 1})
We get 4 documents:
{ "name" : "Bridget" } { "name" : "Christina" } { "name" : "Homer" } { "name" : "Mary" }
Let’s see what the query planner tells us:
db.people.explain("executionStats").find({"name" : /r/}, {"_id" : 0, "name" : 1})
We can see that MongoDb didn’t need to examine a single document. The name index was enough to cover the query by itself since we only wanted to look at the name field:
"executionStats" : { "executionSuccess" : true, "nReturned" : 4, "executionTimeMillis" : 0, "totalKeysExamined" : 10, "totalDocsExamined" : 0,
Note that totalKeysExamined is 10: the unanchored regular expression forced MongoDb to scan every key in the name index, but it never had to load a single document. So it can really be worth checking which properties you actually need in the result set. The index may cover the query entirely, which is the fastest possible way to retrieve the results. Such queries are called covered queries in MongoDb.
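Covered queries are not restricted to a single field either. As an illustrative sketch, a compound index on name and age should be able to cover our earlier name/age query too, as long as the projection only asks for those two fields and excludes _id. Keep in mind that this adds a fourth index which will also show up in the size statistics below:

db.people.createIndex({"name" : 1, "age" : 1})
db.people.explain("executionStats").find({"name" : "John", "age" : {$gt : 32}}, {"_id" : 0, "name" : 1, "age" : 1})

The execution stats should again show totalDocsExamined at 0 since both the filter and the projected fields can be resolved from the index keys alone.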
Let’s finish this post with a tip on how to extract the total index size of a collection using the totalIndexSize() function:
db.people.totalIndexSize()
It gives 49152 bytes. If you want to see the details then recall the stats function:
db.people.stats()
…which has a section on index sizes:
"nindexes" : 3, "totalIndexSize" : 49152, "indexSizes" : { "_id_" : 16384, "name_1" : 16384, "sports_1" : 16384 },
We’ll look at text indexes in the next post.
You can view all posts related to data storage on this blog here.