Introduction to CouchDB with .NET part 18: Mango indexes and queries continued
June 23, 2017 1 Comment
Introduction
In the previous post we started discussing a new feature in CouchDB 2.0, namely Mango queries. Mango queries and Mango indexes are also based on views but these views are created for us, we don’t need to worry about them. Therefore Mango queries provide us with a tool to perform ad-hoc searches in CouchDB with a JSON-based query language. We spent most of the previous post on setting up a small database of ZIP codes that we use for our demos. The gateway to performing the queries is the POST /_find endpoint in the HTTP API. We attach the query to the HTTP request body. We looked at the various properties of the query where the selector is the most important. We also saw that the _id property is indexed by default and then carried out our first query based on the id.
In this post we’ll continue looking at Mango queries.
Indexes and our first operator demo
We can insert various operators into the query to describe the selector we want. The following query uses the $lt operator which stands for less-than. We want to select all ZIP codes whose value is less than 40000:
POST http://localhost:5984/zipcodes/_find
{ "selector": { "post_code": {"$lt": 40000} } }
The selector condition is itself a JSON object with the operator and its value. Operators and their values come in pairs like that.
Execute the query and…
…we get a warning along with the results:
{ "warning": "no matching index found, create an index to optimize query time", "docs": [ { "_id": "dfd33c43e5c559ed6f2343d6f9000aeb", "_rev": "1-29ac3928e126a88b3c97b7996ed94c7b", "post_code": 35801, "country": "United States", "country_abbreviation": "US", "places": [ { "place_name": "Huntsville", "longitude": -86.5673, "state": "Alabama", "state_abbreviation": "AL", "latitude": 34.7269 } ] }, //other results ignored ] }
That’s right, we have no index on the post code field yet. However, the query still could be executed which is very positive. We’re used to this behaviour from other popular database engines, that queries with no matching indexes are also allowed through.
Let’s create an index via the Fauxton UI. Select the zipcodes database, expand the + button of the All Document menu item and select Mango Indexes. You’ll land on a page with a hint on how to create a new index:
Change the index JSON into the following:
{ "index": { "fields": [ "post_code" ] }, "type": "json", "name": "post-code-index" }
…and press the green Create Index button.
We can view the available indexes in a tabular format in Fauxton by pressing the Table button:
If you look at the “def” column then you’ll see that an index has an extra property value “asc”. It means ascending of course. Indexes can be ascending or descending, with ascending being more common. The index will keep the indexed field in ascending or descending order depending on how the index was specified. We’ll execute the same query as above once again in a bit and you’ll notice that the results are sorted in ascending order by the post_code field automatically.
Should you need a descending index then here’s the format:
{ "index": { "fields": [ {"post_code": "desc" } ] } }
This ordering will make searches very efficient. We discussed before that view indexes must be refreshed if a new document is inserted into the database. That’s quite common for indexing in general: indexes must be updated when the underlying data changes and CouchDB is no exception. If there’s a new ZIP code document then this ZIP code must be squeezed into the ZIP code index. This implies that write operations are slower compared to the situation with no indexes at all. However, reads will be much faster. You’ll need to consider the frequency of reads and writes before you blindly start creating indexes on every single field. This latter strategy is never a good idea by the way. Normally read operations by far outweigh writes in an application which is an extra ingredient in the index planning process.
Furthermore you’ll notice that the “fields” property has an array as its value. It’s because we can insert compound indexes, i.e. indexes that consist of 2 ore more fields. If there’s a frequent search on two fields, e.g. post_code AND country then it can be a good idea to create a composite index of “post_code” and “country”. That kind of index will also help with searches on the post_code field alone. However, if there’s a search solely on the country field then the multi-key index won’t be usable as the primary sorting key is the post_code. Country is only the secondary sorting key in that case and a country-based query will need to go through all documents.
Here’s how we would enter a compound index:
{ "index": { "fields": [ "post_code", "country" ] }, "type": "json", "name": "post-code-country-compound-index" }
If we insert the above index and perform a search on the country field only then we still get the same warning as above:
{ "selector": { "country": "United States" } }
…
"warning": "no matching index found, create an index to optimize query time",
However, if you execute the same post-code based query as in the first example above and you’ll see that the warning is gone. Our compound index covers post-code based queries as well. Feel free to keep the compound index or delete it…:
…and recreate the index with the post_code field only, it’s up to you. Feel free to play around with single and compound indexes, ascending and descending etc. Note that currently we cannot define different sort orders for the different members of a compound key. The following will fail:
{ "index": { "fields": [ {"post_code": "asc"}, {"country": "desc"} ] }, "type": "json", "name": "post-code-country-compound-index" }
…with the following message:
“Sorts currently only support a single direction for all fields.”
This feature may be implemented in the future though.
Querying in Fauxton
We can execute queries via Fauxton as well via the following menu item:
Then we need to provide the selector and press the green button:
You can execute the demo queries via Fauxton or the HTTP API as you wish.
Before we continue let’s prove quickly that Mango indexes are in fact view design documents. The index was added into a design document and it will be listed in Fauxton:
Click on the design document ID and you’ll be shown a JSON document similar to the following:
{ "_id": "_design/3916955199b783713c3cda9576a78b38efb3dfbf", "_rev": "1-525e013a98def2b906073640dc72a955", "language": "query", "views": { "post-code-index": { "map": { "fields": { "post_code": "asc" } }, "reduce": "_count", "options": { "def": { "fields": [ "post_code" ] } } } } }
Some bits and pieces look different to what we saw before regarding views but the index called post-code-index is definitely a view. It has a map and a reduce phase as well.
Sorting, limiting, skipping and selecting properties
We mentioned before that the selector property is the most important part of a the JSON document passed to the Mango query. Before we concentrate more on the selector syntax and operators let’s quickly go through the others, they are very intuitive. Here’s an enhanced version of the post_code query we saw above:
{ "selector": { "post_code": {"$lt": 40000} }, "fields": ["post_code", "country"], "sort": [{"post_code": "desc"}], "limit": 5, "skip": 2 }
- We want to select the post_code and country fields only and ignore the rest
- We’d like to sort the result by post_code in a descending order
- We need only the top 5 elements after skipping two. These two are often used for paging through the results.
Here’s the result set:
{ "docs": [ { "post_code": 37201, "country": "United States" }, { "post_code": 35816, "country": "United States" }, { "post_code": 35801, "country": "United States" }, { "post_code": 33190, "country": "United States" }, { "post_code": 33124, "country": "United States" } ] }
We can add multiple sort properties and set their direction as follows:
{ "selector": { "post_code": {"$lt": 40000} }, "fields": ["post_code", "country"], "sort": [{"post_code": "desc"}, {"country": "desc"}], "limit": 5, "skip": 2 }
…which doesn’t bring us anything new since we only have a single country in the database. However, you see how the “sort” property is an array that can be extended.
We’ll continue with selectors in the next post.
You can view all posts related to data storage on this blog here.
Pingback: CouchDB Weekly News, June 29, 2017 – CouchDB Blog