Introduction to MongoDb with .NET part 33: indexing arrays
May 24, 2016 Leave a comment
Introduction
In the previous post we looked at compound indexes in MongoDb. Compound indexes can be useful if you’d like to make multi-field searches more efficient. E.g. if there’s a frequent search with a filter on “name” and “age” then a compound index is a good candidate to make the search more efficient. The exact sequence of the keys in the compound definition is important. The leftmost key is the primary key, followed by the secondary key and so on. A search on the secondary key field won’t use the compound index.
In this post we’ll look at indexing array fields.
Array indexes
Indexing arrays is in fact not much different from indexing simple data types like integers and strings. Our demo collections – zip codes and restaurants – have no suitable array fields. We’ll therefore create our own little collection. We created one called “people” before but we’ll create it again from scratch. Run the following commands in a MongoDb client shell:
use model db.people.drop() db.people.insert({"name" : "John", "sports" : ["football", "badminton", "squash"], "grades" : [50, 60,45, 43]}) db.people.insert({"name" : "Mary", "sports" : ["cricket", "badminton"], "grades" : [90, 60, 53]}) db.people.insert({"name" : "William", "sports" : ["tennis", "basketball", "squash"], "grades" : [30, 40,64, 90, 78]}) db.people.insert({"name" : "Bridget", "sports" : ["baseball", "athletics", "running"], "grades" : [50, 90,95, 40]}) db.people.insert({"name" : "Susan", "sports" : ["football", "boxing", "karate"], "grades" : [70, 80,78]}) db.people.insert({"name" : "Andrew", "sports" : ["cyckling", "swimming", "basketball"], "grades" : [10, 90,60, 50]}) db.people.insert({"name" : "Phil", "sports" : ["handball", "skating", "baseball"], "grades" : [55, 65]}) db.people.insert({"name" : "Joanna", "sports" : ["judo", "karate", "running"], "grades" : [88, 77,66, 55]}) db.people.insert({"name" : "Christina", "sports" : ["running", "swimming", "badminton"], "grades" : [51, 61,46, 44]}) db.people.insert({"name" : "Homer", "sports" : ["football", "skating", "boxing"], "grades" : [45, 47,90, 99]})
That’s 10 documents in total with each having 2 array fields. We now want to find all people that like karate. That should be easy by now I hope:
db.people.find({"sports" : "karate"})
It finds two people: Susan and Joanna.
We add an index on an array field by the same createIndex command we saw before in action:
db.people.createIndex({"sports" : 1})
We can see the index in action by running our good friend the explain() function:
db.people.explain(true).find({"sports" : "karate"})
¨
It tells us that an index scan was performed using the sports_1 index which we’ve just created and that exactly two documents were scanned in total. Since that’s equal to the number of returned documents we’re satisfied.
Can we have a compound index on the name and the sports fields? Let’s see:
db.people.createIndex({"name" : 1, "sports" : 1})
Yes, that’s possible. What if we want to have a compound index on 2 array fields? After all a potential query could be to extract all people who like football and have at least one score above 80. Running the following command…
db.people.createIndex({"sports" : 1, "grades" : 1})
…tells us that compound indexes with 2 or more array fields are not allowed:
{ "ok" : 0, "errmsg" : "cannot index parallel arrays [grades] [sports]", "code" : 10088 }
Parallel arrays cannot be indexed. Each combination of the array values, such as (football, 90), (football, 85), (basketball, 55) etc. would need to be indexed which quickly causes the index size to explode. At this point of time compound indexes on 2 or more arrays are simply not allowed in MongoDb.
We’ll take up indexing fields in sub-documents in the next post.
You can view all posts related to data storage on this blog here.