Introduction to CouchDB with .NET part 6: batch updates and insertions
May 26, 2017
Introduction
In the previous post we looked at how CouchDB handles concurrency and at the notion of eventual consistency. Concurrency in a database means that multiple threads might want to access and modify the same data record at the same time. CouchDB solves the concurrency problem with a mechanism called Multi-Version Concurrency Control (MVCC). With MVCC, CouchDB keeps several revisions of the same document. If a thread reads a document while it is being updated, the reading thread gets the most recent complete revision, i.e. the one that existed before the update. In that case the caller receives an outdated revision, but a subsequent request will return the updated copy. This behaviour is called eventual consistency. Because it does not lock data, CouchDB achieves high availability, and in return it sacrifices data consistency to some degree.
In this post we’ll look at batch updates and insertions.
Batch modifications
CouchDB can execute groups of updates and insertions in a single batch. It’s really simple, actually: instead of sending just one document to the CouchDB HTTP API, we send several. The JSON objects need to be enclosed in an array assigned to the “docs” property. In the previous post we created a database called persons. Let’s add 5 documents to it in one go. We’ll use the same POST request as before, but we need to append “_bulk_docs” to the URL:
POST http://localhost:5984/persons/_bulk_docs
We set the Content-Type header to application/json.
Here’s the JSON body:
{
  "docs": [
    { "first-name": "Elvis", "last-name": "Presley", "age": 54 },
    { "first-name": "Marilyn", "last-name": "Monroe", "age": 75 },
    { "first-name": "Marlon", "last-name": "Brando", "age": 63 },
    { "first-name": "Roger", "last-name": "Moore", "age": 87 },
    { "first-name": "Greta", "last-name": "Garbo", "age": 79 }
  ]
}
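The series is about .NET, but the request itself is plain HTTP, so any client will do. As a language-neutral sketch, here is how the same request could be built with Python’s standard urllib module. It assumes CouchDB listens on localhost:5984 without authentication and that the persons database already exists; build_bulk_docs_request and send_bulk_docs are hypothetical helper names, not part of any CouchDB library.

```python
import json
import urllib.request

# Assumption: a local CouchDB on the default port, with no authentication.
COUCHDB_URL = "http://localhost:5984"


def build_bulk_docs_request(db_name, docs, base_url=COUCHDB_URL):
    """Build (without sending) a POST to /{db_name}/_bulk_docs, wrapping the docs in "docs"."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{db_name}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_bulk_docs(request):
    """Send the request and decode the JSON array of per-document results."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


persons = [
    {"first-name": "Elvis", "last-name": "Presley", "age": 54},
    {"first-name": "Marilyn", "last-name": "Monroe", "age": 75},
    {"first-name": "Marlon", "last-name": "Brando", "age": 63},
    {"first-name": "Roger", "last-name": "Moore", "age": 87},
    {"first-name": "Greta", "last-name": "Garbo", "age": 79},
]

request = build_bulk_docs_request("persons", persons)
# results = send_bulk_docs(request)  # requires a running CouchDB instance
```

Building the request separately from sending it keeps the payload logic testable without a database.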
If everything goes well, CouchDB responds with 201 Created and an array holding the ID and revision number of each of the 5 documents:
[
  { "ok": true, "id": "3559d9c81c785b6bfc27a34904018a42", "rev": "1-4a82a53090f81a0766448db0de7bf7bb" },
  { "ok": true, "id": "3559d9c81c785b6bfc27a3490401990e", "rev": "1-9c71af451db58eda5a2ba985148a5c0c" },
  { "ok": true, "id": "3559d9c81c785b6bfc27a34904019fa5", "rev": "1-b77c22d6889fb3595c9a13c55f0aed58" },
  { "ok": true, "id": "3559d9c81c785b6bfc27a3490401af41", "rev": "1-7d2c0c0874e94ba64edf168f10f75a77" },
  { "ok": true, "id": "3559d9c81c785b6bfc27a3490401b00f", "rev": "1-9d13836fe90d2a56e15ba6d41b3cab80" }
]
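Each row pairs a generated ID with a revision, and both are needed to modify the documents later. A minimal Python sketch that collects the successful rows of the response above into an ID-to-revision dictionary (revisions_by_id is a hypothetical helper name):

```python
import json

# The response body shown above, copied verbatim.
response_body = """[
  {"ok": true, "id": "3559d9c81c785b6bfc27a34904018a42", "rev": "1-4a82a53090f81a0766448db0de7bf7bb"},
  {"ok": true, "id": "3559d9c81c785b6bfc27a3490401990e", "rev": "1-9c71af451db58eda5a2ba985148a5c0c"},
  {"ok": true, "id": "3559d9c81c785b6bfc27a34904019fa5", "rev": "1-b77c22d6889fb3595c9a13c55f0aed58"},
  {"ok": true, "id": "3559d9c81c785b6bfc27a3490401af41", "rev": "1-7d2c0c0874e94ba64edf168f10f75a77"},
  {"ok": true, "id": "3559d9c81c785b6bfc27a3490401b00f", "rev": "1-9d13836fe90d2a56e15ba6d41b3cab80"}
]"""


def revisions_by_id(rows):
    """Map each document ID that was written successfully to its new revision."""
    return {row["id"]: row["rev"] for row in rows if row.get("ok")}


revisions = revisions_by_id(json.loads(response_body))
print(revisions["3559d9c81c785b6bfc27a34904018a42"])  # 1-4a82a53090f81a0766448db0de7bf7bb
```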
Now let’s update the first three documents in one batch. All we need to do is include the ID (_id) and the current revision number (_rev) of each document to be updated. The URL is the same as for insertions. Here’s the JSON payload I’m going to test:
{
  "docs": [
    {
      "_id": "3559d9c81c785b6bfc27a34904018a42",
      "_rev": "1-4a82a53090f81a0766448db0de7bf7bb",
      "first-name": "Elvis",
      "last-name": "Presley",
      "age": 68
    },
    {
      "_id": "3559d9c81c785b6bfc27a3490401990e",
      "_rev": "1-9c71af451db58eda5a2ba985148a5c0c",
      "first-name": "Marilyn",
      "last-name": "Monroe",
      "age": 79
    },
    {
      "_id": "3559d9c81c785b6bfc27a34904019fa5",
      "_rev": "1-b77c22d6889fb3595c9a13c55f0aed58",
      "first-name": "Marlon",
      "last-name": "Brando",
      "age": 66
    }
  ]
}
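Note that each entry carries the complete document and not just the changed age, because CouchDB replaces the whole body of a document on update. A Python sketch of how such a payload could be assembled from the ID/revision pairs returned by the insertion (to_update is a hypothetical helper name):

```python
import json


def to_update(doc_id, rev, fields):
    """Tag a full document body with the _id and _rev that CouchDB needs to accept an update."""
    # rev must be the document's current revision, otherwise CouchDB reports a conflict.
    return {"_id": doc_id, "_rev": rev, **fields}


updates = [
    to_update("3559d9c81c785b6bfc27a34904018a42", "1-4a82a53090f81a0766448db0de7bf7bb",
              {"first-name": "Elvis", "last-name": "Presley", "age": 68}),
    to_update("3559d9c81c785b6bfc27a3490401990e", "1-9c71af451db58eda5a2ba985148a5c0c",
              {"first-name": "Marilyn", "last-name": "Monroe", "age": 79}),
    to_update("3559d9c81c785b6bfc27a34904019fa5", "1-b77c22d6889fb3595c9a13c55f0aed58",
              {"first-name": "Marlon", "last-name": "Brando", "age": 66}),
]

# This body is POSTed to /persons/_bulk_docs, exactly like an insertion.
payload = json.dumps({"docs": updates}, indent=2)
print(payload)
```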
CouchDB responds with 201 Created and a new set of revision numbers:
[
  { "ok": true, "id": "3559d9c81c785b6bfc27a34904018a42", "rev": "2-f19b6df3bd9015872a7dde724c922c83" },
  { "ok": true, "id": "3559d9c81c785b6bfc27a3490401990e", "rev": "2-5eb075ac21567de210d0d080b4293f8a" },
  { "ok": true, "id": "3559d9c81c785b6bfc27a34904019fa5", "rev": "2-53da1176c8605ee7c21b7db5a1be0164" }
]
Now let’s see what happens if one update fails. In the next payload we’ll update the last two Person documents, i.e. Roger Moore and Greta Garbo. We’ll also update Elvis Presley again, but this time with his first revision number. We know from the previous post that this operation should fail, because it tries to update an outdated revision. I’ll test with the following JSON payload:
{
  "docs": [
    {
      "_id": "3559d9c81c785b6bfc27a3490401af41",
      "_rev": "1-7d2c0c0874e94ba64edf168f10f75a77",
      "first-name": "Roger",
      "last-name": "Moore",
      "age": 92
    },
    {
      "_id": "3559d9c81c785b6bfc27a3490401b00f",
      "_rev": "1-9d13836fe90d2a56e15ba6d41b3cab80",
      "first-name": "Greta",
      "last-name": "Garbo",
      "age": 81
    },
    {
      "_id": "3559d9c81c785b6bfc27a34904018a42",
      "_rev": "1-4a82a53090f81a0766448db0de7bf7bb",
      "first-name": "Elvis",
      "last-name": "Presley",
      "age": 72
    }
  ]
}
CouchDB responds with the following:
[
  { "ok": true, "id": "3559d9c81c785b6bfc27a3490401af41", "rev": "2-37eeec3925f444598e932b66c77bb40d" },
  { "ok": true, "id": "3559d9c81c785b6bfc27a3490401b00f", "rev": "2-4c2625988e2c84c74c4e18e52086874b" },
  { "id": "3559d9c81c785b6bfc27a34904018a42", "error": "conflict", "reason": "Document update conflict." }
]
The first two updates went through, but the last one failed with a conflict error. In other words, CouchDB handles each update and insertion in the batch as an independent unit: if one of them fails, the rest are not affected.
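Because the outcome is reported per document, client code has to inspect every row of the response. A minimal Python sketch that separates successes from failures, using the response above (split_results is a hypothetical helper name):

```python
import json

# The response body shown above, copied verbatim.
response_body = """[
  {"ok": true, "id": "3559d9c81c785b6bfc27a3490401af41", "rev": "2-37eeec3925f444598e932b66c77bb40d"},
  {"ok": true, "id": "3559d9c81c785b6bfc27a3490401b00f", "rev": "2-4c2625988e2c84c74c4e18e52086874b"},
  {"id": "3559d9c81c785b6bfc27a34904018a42", "error": "conflict", "reason": "Document update conflict."}
]"""


def split_results(rows):
    """Separate the rows of a _bulk_docs response into successes and failures.

    Each row stands on its own: a failed row says nothing about the others."""
    succeeded = [row for row in rows if row.get("ok")]
    failed = [row for row in rows if "error" in row]
    return succeeded, failed


succeeded, failed = split_results(json.loads(response_body))
for row in failed:
    # prints: 3559d9c81c785b6bfc27a34904018a42: conflict (Document update conflict.)
    print(f"{row['id']}: {row['error']} ({row['reason']})")
```

A conflicted row typically means the client must fetch the latest revision and decide whether to retry; the other rows in the batch are already stored.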
Transaction… sort of and no more
It’s worth noting that earlier versions of CouchDB supported the “all_or_nothing” flag to change that behaviour. When the flag was set to true, CouchDB would commit either all documents in the batch or none of them. We would use it in the following way:
{
  "all_or_nothing": true,
  "docs": [
    {
      "_id": "3559d9c81c785b6bfc27a3490401af41",
      "_rev": "2-37eeec3925f444598e932b66c77bb40d",
      "first-name": "Roger",
      "last-name": "Moore",
      "age": 95
    },
    {
      "_id": "3559d9c81c785b6bfc27a3490401b00f",
      "_rev": "2-4c2625988e2c84c74c4e18e52086874b",
      "first-name": "Greta",
      "last-name": "Garbo",
      "age": 83
    },
    {
      "_id": "3559d9c81c785b6bfc27a34904018a42",
      "_rev": "1-4a82a53090f81a0766448db0de7bf7bb",
      "first-name": "Elvis",
      "last-name": "Presley",
      "age": 78
    }
  ]
}
However, CouchDB 2 no longer supports this flag and rejects every document in the batch:
[
  { "id": "3559d9c81c785b6bfc27a3490401af41", "rev": "2-37eeec3925f444598e932b66c77bb40d", "error": "not_implemented", "reason": "all_or_nothing is not supported" },
  { "id": "3559d9c81c785b6bfc27a3490401b00f", "rev": "2-4c2625988e2c84c74c4e18e52086874b", "error": "not_implemented", "reason": "all_or_nothing is not supported" },
  { "id": "3559d9c81c785b6bfc27a34904018a42", "rev": "1-4a82a53090f81a0766448db0de7bf7bb", "error": "not_implemented", "reason": "all_or_nothing is not supported" }
]
The all_or_nothing flag provided at least some support for transactions in CouchDB, but it only worked on a single node, so it could not have worked in a typical clustered database anyway. You might still come across it in projects that use an older CouchDB version.
Batch modifications in CouchDB are therefore non-atomic, which means that each modification is carried out in isolation rather than as part of a single unit.
Mixing updates and insertions
We can mix insertions and updates in the same batch. The next JSON payload updates two existing documents and inserts a new one:
{
  "docs": [
    {
      "_id": "3559d9c81c785b6bfc27a3490401af41",
      "_rev": "2-37eeec3925f444598e932b66c77bb40d",
      "first-name": "Roger",
      "last-name": "Moore",
      "age": 95
    },
    {
      "_id": "3559d9c81c785b6bfc27a3490401b00f",
      "_rev": "2-4c2625988e2c84c74c4e18e52086874b",
      "first-name": "Greta",
      "last-name": "Garbo",
      "age": 83
    },
    { "first-name": "Freddie", "last-name": "Mercury", "age": 80 }
  ]
}
The response shows that the third document got its first revision, i.e. it was inserted successfully:
[
  { "ok": true, "id": "3559d9c81c785b6bfc27a3490401af41", "rev": "3-30aadf4451a04569b82b01175b391458" },
  { "ok": true, "id": "3559d9c81c785b6bfc27a3490401b00f", "rev": "3-767e04b13edd0ebeda8ad71a1808c768" },
  { "ok": true, "id": "3559d9c81c785b6bfc27a3490401b892", "rev": "1-de9b80faccbc3bcd0dc04a8e2eee51a0" }
]
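The number before the dash in a revision is its generation, which CouchDB increments with every change to a document. In a mixed batch, a generation of 1 therefore points to a document that has just been created. A small Python sketch of that check, applied to the response above (rev_generation is a hypothetical helper name):

```python
import json

# The response body shown above, copied verbatim.
response_body = """[
  {"ok": true, "id": "3559d9c81c785b6bfc27a3490401af41", "rev": "3-30aadf4451a04569b82b01175b391458"},
  {"ok": true, "id": "3559d9c81c785b6bfc27a3490401b00f", "rev": "3-767e04b13edd0ebeda8ad71a1808c768"},
  {"ok": true, "id": "3559d9c81c785b6bfc27a3490401b892", "rev": "1-de9b80faccbc3bcd0dc04a8e2eee51a0"}
]"""


def rev_generation(rev):
    """Return the generation prefix of a revision string, e.g. 3 for '3-30aadf...'."""
    generation, _, _ = rev.partition("-")
    return int(generation)


rows = json.loads(response_body)
# Rows written successfully with generation 1 are the freshly inserted documents.
inserted = [row["id"] for row in rows if row.get("ok") and rev_generation(row["rev"]) == 1]
print(inserted)  # ['3559d9c81c785b6bfc27a3490401b892']
```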
Batch deletions
We can also delete multiple documents in the same batch. For each document we send its _id and current _rev and set the _deleted property to true. The following JSON payload removes the first three documents from the persons database, i.e. John W. Smith, Elvis Presley and Marilyn Monroe:
{
  "docs": [
    { "_id": "3559d9c81c785b6bfc27a349040177b0", "_rev": "3-a11c420a45f2ca5334522e72aefb899e", "_deleted": true },
    { "_id": "3559d9c81c785b6bfc27a34904018a42", "_rev": "2-f19b6df3bd9015872a7dde724c922c83", "_deleted": true },
    { "_id": "3559d9c81c785b6bfc27a3490401990e", "_rev": "2-5eb075ac21567de210d0d080b4293f8a", "_deleted": true }
  ]
}
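A deletion in a batch is simply an update whose body consists of the _id, the current _rev and _deleted set to true. A Python sketch that builds the payload above from ID/revision pairs (to_deletion is a hypothetical helper name):

```python
import json


def to_deletion(doc_id, rev):
    """Build a _bulk_docs entry that deletes the given (current) revision of a document."""
    return {"_id": doc_id, "_rev": rev, "_deleted": True}


# Current revisions of John W. Smith, Elvis Presley and Marilyn Monroe.
current_revisions = {
    "3559d9c81c785b6bfc27a349040177b0": "3-a11c420a45f2ca5334522e72aefb899e",
    "3559d9c81c785b6bfc27a34904018a42": "2-f19b6df3bd9015872a7dde724c922c83",
    "3559d9c81c785b6bfc27a3490401990e": "2-5eb075ac21567de210d0d080b4293f8a",
}

payload = json.dumps(
    {"docs": [to_deletion(doc_id, rev) for doc_id, rev in current_revisions.items()]}
)
print(payload)
```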
It should succeed, and because a deletion is stored as a new revision, each of the three documents gets a new revision number:
[
  { "ok": true, "id": "3559d9c81c785b6bfc27a349040177b0", "rev": "4-05a41ea3b37ea0dea540fc34890c9369" },
  { "ok": true, "id": "3559d9c81c785b6bfc27a34904018a42", "rev": "3-af434f7f64b7304ae495bb58f05d1521" },
  { "ok": true, "id": "3559d9c81c785b6bfc27a3490401990e", "rev": "3-03208d7c4416be70af857a2b8ab22fbe" }
]
We’ll continue in the next post.