← Introduction to Amazon Code Pipeline with Java part 11: starting with the job agent

Introduction to MongoDb with .NET part 33: indexing arrays →

Introduction to MongoDb with .NET part 32: compound indexes

May 23, 2016 Leave a comment

Introduction

In the previous post we created our first index on the zip codes collection using the Mongo shell. We inserted a text on the state property. We also got introduced to the explain function which presents a query plan with or without executing the query depending on the boolean parameter to the function. It’s a good strategy to run the explain function before and after the index creation. Your goal should be to reduce the amount of documents scanned as much as possible.

In this post we’ll take a look at compound indexes.

Compound indexes

A compound index is one which consists of two or more properties of a collection. They are useful if a frequent query involves multiple filters. We can take an example from the restaurants collection. Here’s an example document:

db.restaurants.findOne()
{
        "_id" : ObjectId("56edc2ff03a1cd840734dba8"),
        "address" : {
                "building" : "2780",
                "coord" : [
                        -73.98241999999999,
                        40.579505
                ],
                "street" : "Stillwell Avenue",
                "zipcode" : "11224"
        },
        "borough" : "Brooklyn",
        "cuisine" : "American ",
        "grades" : [
                {
                        "date" : ISODate("2014-06-10T00:00:00Z"),
                        "grade" : "A",
                        "score" : 5
                },
                {
                        "date" : ISODate("2013-06-05T00:00:00Z"),
                        "grade" : "A",
                        "score" : 7
                },
                {
                        "date" : ISODate("2012-04-13T00:00:00Z"),
                        "grade" : "A",
                        "score" : 12
                },
                {
                        "date" : ISODate("2011-10-12T00:00:00Z"),
                        "grade" : "A",
                        "score" : 12
                }
        ],
        "name" : "Riviera Caterer",
        "restaurant_id" : "40356018"
}

We can imagine that a frequent query would filter on the borough and cuisine properties. E.g. give me all Thai restaurants in Manhattan:

db.restaurants.find({"borough" : "Manhattan", "cuisine" : "Thai"})

The explain command…

db.restaurants.explain(true).find({"borough" : "Manhattan", "cuisine" : "Thai"})

…will tell us the following:

In the absence of any indexes on the borough or cuisine fields MongoDb performs a complete query scan – recall the COLLSCAN stage in the winning plan sub-document
154 documents are returned and all 25362 documents in the collection were scanned

Let’s try to improve this. We’ll first see if a single index on the borough field helps. Insert the following index:

db.restaurants.createIndex({"borough" : 1})

We didn’t provide any name for the index. MongoDb will give it a default name of (property_name)_(1 or -1 whether ascending or descending), e.g.:

borough_1
state_-1

Running the same explain command gives us the following:

An index scan was performed with the borough_1 index
The index scan returned 10259 documents which is the number of restaurants in Manhattan in the collection
The total number of scanned documents were therefore 10259 which is at least some improvement compared to what we had before

Here are the relevant bits of the query plan to support the above – I’ve removed the “noise”:

 "winningPlan" : {
                        "stage" : "FETCH",
                        "filter" : {
                                "cuisine" : {
                                        "$eq" : "Thai"
                                }
                        },
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "borough" : 1
                                },
                                "indexName" : "borough_1",

...

 "executionStats" : {                
                "nReturned" : 154,
                
                "totalDocsExamined" : 10259,
                "executionStages" : {
                        "stage" : "FETCH",
                        "filter" : {
                                "cuisine" : {
                                        "$eq" : "Thai"
                                }
                        },
                        "nReturned" : 154,                        
                        "docsExamined" : 10259                        
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 10259,
                                
                                "indexName" : "borough_1",
                                
        },

We have therefore 2 stages in total. First the index scanning stage which returns 10259 documents, i.e. the restaurants in Manhattan. Then we have a FETCH stage where all 10259 documents are scanned and filtered on the cuisine field.

Let’s see if a compound index can further improve the results. First we’ll remove the index with the dropIndex function. The dropIndex function accepts the same JSON parameter with which we created the index:

db.restaurants.dropIndex({"borough" : 1})

The following command will create a compound index on the borough and the cuisine fields:

db.restaurants.createIndex({"borough" : 1, "cuisine" : 1})

Running the explain function now will tell us the following:

The compound index was indeed used to execute the query
The index scan stage returned 154 documents
The total number of documents examined is 154 which also equals the number of documents the query returns
The total execution time was 1 millisecond

Here are some relevant details from the query plan:

 "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "borough" : 1,
                                        "cuisine" : 1
                                },
                                "indexName" : "borough_1_cuisine_1",

...

 "executionStats" : {                
                "nReturned" : 154,
                "executionTimeMillis" : 1,
                "totalKeysExamined" : 154,
                "totalDocsExamined" : 154,
                "executionStages" : {
                        "stage" : "FETCH",
                        "nReturned" : 154,                        
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 154,
                                "executionTimeMillisEstimate" : 0,                                
                                "indexName" : "borough_1_cuisine_1",

We’re certainly better off compared to the state where we had no indexes at all.

Can the above index help us with a filter on the borough property only? Let’s try the following:

db.restaurants.explain(true).find({"borough" : "Manhattan"})

Yes, it definitely helps. The total number of documents returned is 10259 which is also the number of scanned documents. The query plan shows that the compound index was used to accelerate the query execution.

What about a query on the cuisine field only? Let’s see:

db.restaurants.explain(true).find({"cuisine" : "Thai"})

No, our compound index doesn’t help at all:

 "winningPlan" : {
                        "stage" : "COLLSCAN",

...

 "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 288,
                "executionTimeMillis" : 13,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 25362,
                "executionStages" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "cuisine" : {
                                        "$eq" : "Thai"
                                }
                        },
                        "nReturned" : 288,

All documents in the collection had to be scanned.

This is logical if you think about it. Let’s say we have a telephone book from all English speaking countries, i.e. one for Canada, one for the US etc. According to Wikipedia there are 67 sovereign states and 27 other entities that fit this criteria, so we have 94 telephone books. If we want to find all people with the last name Jones that live in the UK then we can consult the UK telephone book and ignore the rest. However, if we want to find everyone with the last name Jones then we have to scan all 94 telephone books to collect all the relevant “documents”.

The telephone books represent a compound index of country and last name, in that order. The above analysis suggests that we have to go from left to right when assessing the usability of a compound index for a specific search. The first member of a compound index is the main index key, then comes the secondary key, followed by the tertiary and so on. If we have a compound index on A, B and C then the following queries will use the index:

Filter on A
Filter on A and B
Filter on A, B and C

Other filters, such as the ones on B or B and C will need to scan all the documents in the collection.

We’ll look at indexing arrays in the next post.

You can view all posts related to data storage on this blog here.

Filed under .NET, MongoDb Tagged with c#, mongodb

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.

Exercises in .NET with Andras Nemes

Introduction to MongoDb with .NET part 32: compound indexes

Leave a comment Cancel reply

My profile

Andras Nemes

Verified Services

Follow my blog via email

Top Posts & Pages

History

My tweets

Blogs I Follow

Exercises in .NET with Andras Nemes

Introduction to MongoDb with .NET part 32: compound indexes

Share:

Related

Leave a comment Cancel reply

My profile

Andras Nemes

Verified Services

Follow my blog via email

Top Posts & Pages

History

My tweets

Keywords

Blogs I Follow