Introduction to CouchDB with .NET part 10: view design documents cont’d
June 5, 2017 1 Comment
Introduction
In theprevious post we started looking into a large and relatively complex topic within CouchDB namely design documents. Design documents are JSON documents like the data documents we’ve looked at so far but they have a special purpose. They use JavaScript for their logic. The JS functions are embedded in the JSON document which has sections for each type of role a design document can play: view, update, show/list, validation. We started discussing the view design documents that play an important role in querying a CouchDB database. We also introduced the idea of MapReduce that is heavily used in view-based queries.
In this post we’ll continue where we left off in the previous part and continue looking at how view design documents work. We’ll keep using the children database we created previously.
As a reminder, we ended up with the following map function for our first view index:
function (doc) { emit(doc.last_name, doc.first_name + ' ' + doc.last_name); }
Checking for presence of JSON properties
We know that CouchDB documents can have any valid JSON structure. They don’t need to adhere to any strict schema. So what happens if a document in the children database doesn’t have a first_name and last_name property? Like this new document I entered for this first demo:
{ "_id": "39ba3b2cbe146f86835663f68000086e", "age": 6 }
Running the same request as in the previous post, i.e.
GET http://localhost:5984/children/_design/query-demo/_view/full-name
…will include the following not-so-pretty array element:
{ "id": "39ba3b2cbe146f86835663f68000086e", "key": null, "value": "undefined undefined" }
We use the normal JS null-checking technique to filter out documents that shouldn’t be taken into account. Update the map function definition in Fauxton to the following:
function (doc) { if (doc.last_name && doc.first_name) { emit(doc.last_name, doc.first_name + ' ' + doc.last_name); } }
The document with no first and last name will be filtered out.
Filtering on gender
Let’s add a new view index which allows us to filter on the gender property. Follow the steps we took in the previous post and create a new view design document:
- Design document ID: _design/gender
- Index name: gender-query
- Map function: see below
- Reduce: NONE
Here’s the map function:
function (doc) { if (doc.gender) { emit(doc.gender, "gender"); } }
We return the gender as the key and a string “gender” for the value. Note that the value can be anything in this case, even NULL, but we’ll see soon how we can filter on the key.
We’ll execute the query with an extended URL. We’ve already seen the role of the “include_docs” flag and it can be used even here as follows:
GET http://localhost:5984/children/_design/gender/_view/gender-query?include_docs=true
We get the following result set:
{ "total_rows": 6, "offset": 0, "rows": [ { "id": "8a6469af31cb8032e49c2e6648000e92", "key": "f", "value": "gender", "doc": { "_id": "8a6469af31cb8032e49c2e6648000e92", "_rev": "1-e6d0d4d825cd6c4072316443065c67d7", "first_name": "Sophia", "last_name": "Jones", "gender": "f", "age": 7 } }, { "id": "8a6469af31cb8032e49c2e6648002bc3", "key": "f", "value": "gender", "doc": { "_id": "8a6469af31cb8032e49c2e6648002bc3", "_rev": "1-74c3846252052b1057f0d38f25f301e8", "first_name": "Emily", "last_name": "Davies", "gender": "f", "age": 6 } }, { "id": "8a6469af31cb8032e49c2e6648003a3e", "key": "f", "value": "gender", "doc": { "_id": "8a6469af31cb8032e49c2e6648003a3e", "_rev": "1-cfefdad2f8734ea1847d76a890a2d95b", "first_name": "Olivia", "last_name": "Walker", "gender": "f", "age": 8 } }, { "id": "8a6469af31cb8032e49c2e66480004fd", "key": "m", "value": "gender", "doc": { "_id": "8a6469af31cb8032e49c2e66480004fd", "_rev": "1-e1c9362343980d876a7b59cac61dd35c", "first_name": "John", "last_name": "Williams", "gender": "m", "age": 6 } }, { "id": "8a6469af31cb8032e49c2e6648001c23", "key": "m", "value": "gender", "doc": { "_id": "8a6469af31cb8032e49c2e6648001c23", "_rev": "1-367a393907c14c45c2a4d7ab05c89e04", "first_name": "Richard", "last_name": "Brown", "gender": "m", "age": 5 } }, { "id": "8a6469af31cb8032e49c2e6648002c80", "key": "m", "value": "gender", "doc": { "_id": "8a6469af31cb8032e49c2e6648002c80", "_rev": "1-a5710200c72e9d4f72ab71c12a053cce", "first_name": "Daniel", "last_name": "Roberts", "gender": "m", "age": 7 } } ] }
We can extend the URL to search on the key field. The following will return all boys in the children database:
GET http://localhost:5984/children/_design/gender/_view/gender-query?include_docs=true&key=”m”
I won’t copy the result set here but there will be three documents returned. Similarly the query…
GET http://localhost:5984/children/_design/gender/_view/gender-query?include_docs=true&key=”f”
…returns the three girls in the database.
We can limit the result set using the “limit” query parameter:
GET http://localhost:5984/children/_design/gender/_view/gender-query?include_docs=true&limit=2
As noted before the documents are returned in an ascending order by the key. In our case “f” comes before “m” so the full query…
GET http://localhost:5984/children/_design/gender/_view/gender-query?include_docs=true
…returns girls first. We can easily change the sort order using the “descending” query parameter:
GET http://localhost:5984/children/_design/gender/_view/gender-query?include_docs=true&descending=true
Now the boys will be shown first.
Counting with a reducer
Let’s edit the gender-query index in the Fauxton UI and add a count reducer. We want to see how many boys and girls we have in the database:
Here’s what our design document looks like in JSON now:
GET http://localhost:5984/children/_design/gender
{ "_id": "_design/gender", "_rev": "3-965ad5bba5ede97ed0a62c010bac434f", "views": { "gender-query": { "map": "function (doc) { if (doc.gender) { emit(doc.gender, \"gender\"); }}", "reduce": "_count" } }, "language": "javascript" }
The reduce phase is clearly visible with the “reduce” keyword. It’s set to “_count” which is a built-in aggregation function.
The “group” query parameter must be set to true to run the reducer:
GET http://localhost:5984/children/_design/gender/_view/gender-query?group=true
Here’s the result set:
{ "rows": [ { "key": "f", "value": 3 }, { "key": "m", "value": 3 } ] }
We have three boys and three girls as expected.
Note how the returned data set is now different with the reducer:
GET http://localhost:5984/children/_design/gender/_view/gender-query
We don’t get the same result anymore with all 6 documents as above:
{ "rows": [ { "key": null, "value": 6 } ] }
The reducer was automatically applied and it returned 6 which is the number of documents in the database. We can turn off the reducer as follows:
GET http://localhost:5984/children/_design/gender/_view/gender-query?reduce=false
We’re now back at the same result set as before where we had no reducer in the index.
The reduce phase is carried out after map is done. The data set from the map phase is applied on the reducer. Therefore we have a small chain with two members: select the keys and values in the map phase and then apply the reducer function on it.
Stale views
We mentioned that the indexes are recreated when a new document is added to the database. Recreating indexes can be a time-consuming process, however. We can tell CouchDB that it’s OK to take the most recent view result and skip the index rebuild step using the “stale” query parameter:
GET http://localhost:5984/children/_design/gender/_view/gender-query?reduce=false&stale=ok
If we now add a new document to the database…:
{ "_id": "39ba3b2cbe146f86835663f680001d9e", "_rev": "1-e6d0d4d825cd6c4072316443065c67d7", "first_name": "Sarah", "last_name": "Maxwell", "gender": "f", "age": 8 }
…and run the GET query above with the stale parameter set to true then Sarah will be excluded from the result set. You should get the same 6 documents as before. As soon as we remove the stale parameter…:
GET http://localhost:5984/children/_design/gender/_view/gender-query?reduce=false
…we get the full data set again with the 7 children.
Another possible value for the stale flag is “update_after” which will update the index after running the query but return a stale view when the query is executed. Let’s see what the effect is. Add a new document to the children database:
{ "_id": "39ba3b2cbe146f86835663f6800031ce", "_rev": "1-e1c9362343980d876a7b59cac61dd35c", "first_name": "Charlie", "last_name": "Brown", "gender": "m", "age": 5 }
…and execute the following GET request:
GET http://localhost:5984/children/_design/gender/_view/gender-query?reduce=false&stale=update_after
The result set should have 7 document, i.e. Charlie is not yet included. Execute the same “stale=ok” query again:
GET http://localhost:5984/children/_design/gender/_view/gender-query?reduce=false&stale=ok
We’ll now get all 8 documents including Charlie.
We’ll continue in the next post with more MapReduce examples.
You can view all posts related to data storage on this blog here.
Pingback: CouchDB Weekly News, June 8, 2017 – CouchDB Blog