Introduction to CouchDB with .NET part 11: various MapReduce examples
June 7, 2017 1 Comment
Introduction
In the previous post we continued exploring view design documents in CouchDB. We first saw how to filter the documents that will be considered in the map phase. We then examined various simple operations on the key returned by the mapping. Our first reducer example included the usage of the built-in _count function for a simple count aggregation. Lastly we mentioned the usage of the stale query parameter to tell CouchDB how to handle the updating of the view indexes.
In this post we’ll continue looking at querying through views and MapReduce functions in CouchDB.
Updating and inserting views through the HTTP API and complex keys and values
We currently have a design document called gender in the children database with the following JSON structure:
{ "_id": "_design/gender", "_rev": "3-965ad5bba5ede97ed0a62c010bac434f", "views": { "gender-query": { "map": "function (doc) {\n if (doc.gender) {\n emit(doc.gender, \"gender\");\n }\n}", "reduce": "_count" } }, "language": "javascript" }
Our goal is to update this document so that it includes a second view besides “gender-query”. We’ll also see how to emit compound values and keys from the map function at the same time.
In order to update the above document we need to send a PUT request with the following structure: /dbname/_design/document_id. The body must include the revision ID of the document we’re updating, i.e. 3-965ad5bba5ede97ed0a62c010bac434f in our example. The JSON body will also include any updates like in the following example:
PUT http://localhost:5984/children/_design/gender
…with the header Content-Type set to application/json. Here’s the JSON payload:
{ "_rev": "3-965ad5bba5ede97ed0a62c010bac434f", "views": { "gender-query": { "map": "function (doc) {\n if (doc.gender) {\n emit(doc.gender, \"gender\");\n }\n}", "reduce": "_count" }, "gender-query-complex-value": { "map": "function (doc) {\n if (doc.gender) { emit(doc.gender, [doc.first_name, doc.last_name, doc.age]); }}" } }, "language": "javascript" }
The API should return a 201 Created response:
{ "ok": true, "id": "_design/gender", "rev": "4-348d31bcb9553ca0f7a6982156c939e8" }
Consider now the value of the key-value pair in the above example. The selected JSON properties are indicated by an array as the mapped value. We can call this new view as follows:
GET http://localhost:5984/children/_design/gender/_view/gender-query-complex-value
Here’s an example of the complex value returned:
"value": [ "Sarah", "Maxwell", 8 ]
That’s similar to how we select specific columns in SQL like SELECT name, age FROM Person.
We can have a compound key as well using the same array technique. We’ll at the same time insert a new view using the HTTP API. It is the same request as above but the JSON document must not include the revision ID. Also, the URL must point to a non-existing design document.
PUT http://localhost:5984/children/_design/name
Payload:
{ "views": { "complex-names": { "map": "function (doc) {\n if (doc.first_name) { emit([doc.first_name, doc.last_name], doc.age); }}" } }, "language": "javascript" }
The new design document with a single view will be created:
{ "ok": true, "id": "_design/name", "rev": "1-d46d6acee2370142480d9915f763b324" }
GET http://localhost:5984/children/_design/name/_view/complex-names
Here’s an example of a key-value pair:
"key": [ "Charlie", "Brown" ], "value": 5
Compound keys are important when querying on multiple fields. They also play a role if we want to pass down multiple properties to the reducer from the map function. We’ll soon see examples for both.
Selecting ranges
We’ll now see how to select a range based on the key. More specifically we want to search on the children’s age. Let’s first add a couple of new documents to our children database so that we have a bit more variation of ages:
{ "first_name": "Chloe", "last_name": "Hughes", "gender": "f", "age": 12 } { "first_name": "Jessica", "last_name": "Taylor", "gender": "f", "age": 13 } { "first_name": "Alice", "last_name": "Robinson", "gender": "f", "age": 11 } { "first_name": "Kevin", "last_name": "White", "gender": "m", "age": 11 } { "first_name": "William", "last_name": "Hall", "gender": "m", "age": 12 } { "first_name": "Mark", "last_name": "Jackson", "gender": "m", "age": 13 }
So we have 3 new girls and 3 new boys. Let’s extend the “names” design document with a new view where the key is the age of the children. We can use Fauxton to update design documents:
That brings us to the same JSON editor we saw before. Add the following element to the views collection:
"age": { "map": "function (doc) {if (doc.age) {emit (doc.age, [doc.first_name, doc.last_name]); }}" }
GET http://localhost:5984/children/_design/name/_view/age
…will return all documents in the children database sorted by age in an ascending order. Here’s an example:
"key": 5, "value": [ "Charlie", "Brown" ]
In order to select a range of ages we can use the startkey and endkey query parameters. Here’s how to select all children of age 11 and above:
GET http://localhost:5984/children/_design/name/_view/age?startkey=11
Similarly we use the endkey to select the kids of up to 11 years of age:
GET http://localhost:5984/children/_design/name/_view/age?endkey=11
We can combine these parameters to return a range as follows:
GET http://localhost:5984/children/_design/name/_view/age?startkey=7&endkey=11
Keep in mind that both startkey and endkey are inclusive.
Using start and end keys together with setting the order to descending is confusing at first. Say we want to run the above example…
GET http://localhost:5984/children/_design/name/_view/age?startkey=11
…and return the kids in a descending order by age. The following should be correct, right?
GET http://localhost:5984/children/_design/name/_view/age?startkey=11&descending=true
However, you’ll see that the query returns all children of UP TO 11 years of age in a descending order. E.g. John Williams of age 6 is also included. Here the start key is defined as the start for the ordered documents. We start with age 11 and start counting down since we want to have a descending order. In order to achieve what we originally want we have to replace startkey with endkey:
GET http://localhost:5984/children/_design/name/_view/age?endkey=11&descending=true
The result set will be as intended.
Selecting exact key values
What if we want to select all children of a specific age? We can use the same “age” view and filter on the key in the URL as follows:
GET http://localhost:5984/children/_design/name/_view/age?key=11
This returns two elements, i.e. two kids of age 11:
{ "total_rows": 15, "offset": 9, "rows": [ { "id": "3031afee194a8e4295c021221a002cc7", "key": 11, "value": [ "Alice", "Robinson" ] }, { "id": "3031afee194a8e4295c021221a004a76", "key": 11, "value": [ "Kevin", "White" ] } ] }
We’ve already seen an example of the above technique before where we selected a certain gender. Don’t forget to put the search parameter between quotation marks when searching on string keys. You should also URL-encode the string value.
We can also specify multiple values for the keys. Let’s say we want to select all kids of age 6, 11 and 13. We’ll need a POST request for that since we have send a JSON body.
POST http://localhost:5984/children/_design/name/_view/age
Here’s the payload:
{ "keys": [6, 11, 13] }
If you want to filter on strings then don’t forget to put quotation marks around each search parameter in the “keys” array.
Querying on multiple properties
We now want to filter on the age and gender properties of the children documents. Let’s say we want to find all boys younger than 11 years. This is a case for compound keys. Modify the “name” design document we created above and add the following view:
"age-gender": { "map": "function (doc) {if (doc.age && doc.gender) { emit([doc.gender, doc.age], 'Full name: ' + doc.first_name + ' ' + doc.last_name); }}" }
GET http://localhost:5984/children/_design/name/_view/age-gender
{ "id": "8a6469af31cb8032e49c2e6648002bc3", "key": [ "f", 6 ], "value": "Full name: Emily Davies" }
We can assign a compound value to the startkey and endkey parameters. Consider the following example:
GET http://localhost:5984/children/_design/name/_view/age-gender?key=%5B“m”, 11]
That returns all the boys of age 11, i.e. where the compound key has the values “m” and 11.
What about the following?
GET http://localhost:5984/children/_design/name/_view/age-gender?endkey=%5B“m”, 11]
Recall that end and start keys are inclusive. The above example means that we want all children documents where the key includes “m” AND 11 or below. The compound search key is handled as a unit. Since the letter “f” comes before “m” in alphabetical order all girls will be included regardless of their age. Then the selection continues until we reach the boys with age 11. Hence the following girl will be part of the result set despite her age:
{ "id": "3031afee194a8e4295c021221a000088", "key": [ "f", 12 ], "value": "Full name: Chloe Hughes" }
The reason is that the combination “f” and 12 comes before the upper limit “m” / 11. The following boy won’t be part of the result set:
"first_name": "William", "last_name": "Hall", "gender": "m", "age": 12
OK, let’s now add a startkey as well:
GET http://localhost:5984/children/_design/name/_view/age-gender?endkey=%5B“m”, 11]&startkey=[“m”, 6]
That will return all boys of age between 6 and 11 with both limits being inclusive. We can therefore solve our original requirement, i.e. return all boys below the age of 11 as follows:
GET http://localhost:5984/children/_design/name/_view/age-gender?endkey=%5B“m”, 11]&startkey=[“m”, 0]
Alternatively we can assign an empty JSON object as the upper limit for the key:
http://localhost:5984/children/_design/name/_view/age-gender?startkey=%5B“m”, {}]&endkey=[“m”, 11]&descending=true
We’ll continue in the next post.
You can view all posts related to data storage on this blog here.
Pingback: CouchDB Weekly News, June 8, 2017 – CouchDB Blog