Introduction to CouchDB with .NET part 8: data replication

Introduction

In the previous post we saw how to view changes made to a database. The _changes API endpoint returns a JSON document listing the changes. The feed can be filtered to show only certain document IDs, and it can include the properties of each document in the response.

In this post we’ll take a look at data replication between two databases.

Data replication

By data replication we mean that a set of data in a database is copied over to another database. There’s always a source and a target database: the data flows from the source to the target, which is also called the destination database. Ideally only those documents are copied over that don’t yet exist in the target database. Replication is one means of creating backups or archives of data.

Data replication can be a one-off event where the DB administrator initiates the replication either in a management UI or in code. Another replication mode is continuous where a change in the source database is automatically copied over to the target database. Either way the expected outcome of the operation is that the source and target databases have the same set of active documents. Even the revision IDs must be the same. If a document was deleted in the source database then it must also be marked as deleted in the target database.

The source and target database can be either local or remote databases. If the source is local and the target is remote then we push the documents from local to remote. This is called push replication. The opposite, i.e. when the source is remote and the target is local, is called pull replication.
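The push/pull distinction can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this post, and the rule that a bare database name or a localhost URL counts as "local" is an assumption, not something CouchDB itself defines.

```python
from urllib.parse import urlparse

# Assumption: these host names mean "the CouchDB instance we are talking to".
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_remote(database: str) -> bool:
    """A bare database name or a localhost URL is local, anything else is remote."""
    host = urlparse(database).hostname
    return host is not None and host not in LOCAL_HOSTS


def replication_direction(source: str, target: str) -> str:
    """Name the replication mode for a given source/target pair."""
    source_remote = is_remote(source)
    target_remote = is_remote(target)
    if not source_remote and target_remote:
        return "push"   # local source, remote target
    if source_remote and not target_remote:
        return "pull"   # remote source, local target
    return "remote-to-remote" if source_remote else "local"
```

For example, replication_direction("bands", "http://mycompany.database.local:5984/bands") would be classified as a push replication under these assumptions.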

Demo preparations

Let’s create a new database called bands for the upcoming demos:

PUT http://localhost:5984/bands

We’ll also add a couple of documents:

POST http://localhost:5984/bands

Don’t forget the Content-Type: application/json header. I’ve inserted the following documents, one POST request per document:

{
	"name": "Queen",
	"members": 4,
	"genre": "rock"
}
{
	"name": "Genesis",
	"members": 5,
	"genre": "progressive"
}
{
	"name": "ACDC",
	"members": 5,
	"genre": "metal"
}
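If you prefer a script over a REST client, the same inserts could look roughly like the Python sketch below. It uses only the standard library. The helper names are hypothetical, and the sketch assumes that CouchDB runs on localhost:5984 and accepts unauthenticated requests, which may not be true for your installation.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB without authentication

BANDS = [
    {"name": "Queen", "members": 4, "genre": "rock"},
    {"name": "Genesis", "members": 5, "genre": "progressive"},
    {"name": "ACDC", "members": 5, "genre": "metal"},
]


def build_insert_request(db: str, doc: dict) -> urllib.request.Request:
    """Build a POST request that inserts a single document into the database."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{db}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def insert_all(db: str, docs: list) -> None:
    """Send one POST per document and print what the server answers."""
    for doc in docs:
        with urllib.request.urlopen(build_insert_request(db, doc)) as response:
            print(response.status, json.load(response))
```

Calling insert_all("bands", BANDS) would then send the three requests. CouchDB generates the document IDs since the documents carry no _id property.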

This is what the documents look like in the bands database now:

http://localhost:5984/bands/_all_docs?include_docs=true

{
  "total_rows": 3,
  "offset": 0,
  "rows": [
    {
      "id": "3559d9c81c785b6bfc27a3490401c28b",
      "key": "3559d9c81c785b6bfc27a3490401c28b",
      "value": {
        "rev": "1-d35ab60c0eb6780b5043b0e4c12eff13"
      },
      "doc": {
        "_id": "3559d9c81c785b6bfc27a3490401c28b",
        "_rev": "1-d35ab60c0eb6780b5043b0e4c12eff13",
        "name": "Queen",
        "members": 4,
        "genre": "rock"
      }
    },
    {
      "id": "3559d9c81c785b6bfc27a3490401d03d",
      "key": "3559d9c81c785b6bfc27a3490401d03d",
      "value": {
        "rev": "1-0b4d86c2c9dd6154f97496518f656571"
      },
      "doc": {
        "_id": "3559d9c81c785b6bfc27a3490401d03d",
        "_rev": "1-0b4d86c2c9dd6154f97496518f656571",
        "name": "Genesis",
        "members": 5,
        "genre": "progressive"
      }
    },
    {
      "id": "3559d9c81c785b6bfc27a3490401db60",
      "key": "3559d9c81c785b6bfc27a3490401db60",
      "value": {
        "rev": "1-e3c4850fd4685abe613870d1ab5d249f"
      },
      "doc": {
        "_id": "3559d9c81c785b6bfc27a3490401db60",
        "_rev": "1-e3c4850fd4685abe613870d1ab5d249f",
        "name": "ACDC",
        "members": 5,
        "genre": "metal"
      }
    }
  ]
}

Replication in Fauxton

The Fauxton UI provides a simple way to initiate data replication. Click the Replication item and fill in the source and target database options. We only have local databases so both will be local. Otherwise we would choose the Remote option and set the URL of the remote database:

One off data replication in the CouchDB Fauxton UI

We name the target database bands_backup and select the Create Target option since that database doesn’t exist yet. Click Replicate and the UI will tell us that replication has started. Unfortunately the UI won’t tell us when the process has finished, but after a few seconds we can check the documents of the bands_backup database via the HTTP API:

GET http://localhost:5984/bands_backup/_all_docs?include_docs=true

I won’t copy the response here since it’s identical to the set of documents in the bands database.
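Rather than comparing the two responses by eye, we could compare the document IDs and revisions in code. The following Python sketch works on two already parsed _all_docs responses. The function names are invented for this example, and it only checks the source against the target, not the other way around.

```python
def revisions_by_id(all_docs: dict) -> dict:
    """Map each document ID to its current revision in an _all_docs response."""
    return {row["id"]: row["value"]["rev"] for row in all_docs["rows"]}


def replication_gaps(source_all_docs: dict, target_all_docs: dict) -> dict:
    """Return {id: (source_rev, target_rev)} for documents the target lacks or holds
    in a different revision. target_rev is None if the document is missing."""
    source = revisions_by_id(source_all_docs)
    target = revisions_by_id(target_all_docs)
    return {
        doc_id: (rev, target.get(doc_id))
        for doc_id, rev in source.items()
        if target.get(doc_id) != rev
    }
```

An empty dictionary means that every active document in the source exists in the target with the same revision ID. Deleted documents are not listed by _all_docs so they are not covered by this check.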

Data replication via the HTTP API

I’ll first add 2 more documents to the bands database:

{
	"name": "Nirvana",
	"members": 3,
	"genre": "grunge"
}
{
	"name": "The Jacksons",
	"members": 5,
	"genre": "pop"
}

The endpoint we need is POST http://localhost:5984/_replicate with the following JSON body:

{
	"source": "bands",
	"target": "bands_backup"
}

We can set the “create_target” flag to true if we want the target database to be created. We can also set the doc_ids array to only replicate certain documents to the target database. If either the target or source databases are remote then we include their full URLs like http://mycompany.database.local:5984/bands .
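The optional properties are easy to get wrong when the body is assembled by hand, so here is a small Python sketch that builds it. The function name and its parameters are my own invention. Only the JSON property names source, target, create_target and doc_ids come from the CouchDB API described above.

```python
import json


def build_replication_body(source, target, create_target=False, doc_ids=None):
    """Build the JSON body for POST /_replicate, leaving out options that are not set."""
    body = {"source": source, "target": target}
    if create_target:
        body["create_target"] = True      # ask CouchDB to create the target database
    if doc_ids:
        body["doc_ids"] = list(doc_ids)   # replicate only these documents
    return body


# Replicate a single document to a remote target that does not exist yet.
body = build_replication_body(
    "bands",
    "http://mycompany.database.local:5984/bands",
    create_target=True,
    doc_ids=["3559d9c81c785b6bfc27a3490401c28b"],
)
print(json.dumps(body, indent=2))
```

Leaving the unset options out of the body keeps the simplest case identical to the two-property JSON shown above.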

Here’s the response from the operation. I’ve truncated the sequence IDs in the output for better legibility:

{
  "ok": true,
  "session_id": "a96959d0f6cf7b424407828a25ba2c9e",
  "source_last_seq": "5-g1AAAA...",
  "replication_id_version": 3,
  "history": [
    {
      "session_id": "a96959d0f6cf7b424407828a25ba2c9e",
      "start_time": "Tue, 30 May 2017 06:35:50 GMT",
      "end_time": "Tue, 30 May 2017 06:35:50 GMT",
      "start_last_seq": "3-g1AAAA...",
      "end_last_seq": "5-g1AAAA...",
      "recorded_seq": "5-g1AAAA...",
      "missing_checked": 2,
      "missing_found": 2,
      "docs_read": 2,
      "docs_written": 2,
      "doc_write_failures": 0
    },
    {
      "session_id": "557ee9481035f29302a2352ccc189a5a",
      "start_time": "Tue, 30 May 2017 05:50:23 GMT",
      "end_time": "Tue, 30 May 2017 05:50:23 GMT",
      "start_last_seq": 0,
      "end_last_seq": "3-g1AAAA...",
      "recorded_seq": "3-g1AAAA...",
      "missing_checked": 3,
      "missing_found": 3,
      "docs_read": 3,
      "docs_written": 3,
      "doc_write_failures": 0
    }
  ]
}

Here are the most important properties in the response:

  • ok: true if the replication has succeeded
  • source_last_seq: the last sequence number from the source database. This sequence ID is the change ID we saw in the previous post. It should normally be the same as “end_last_seq” and “recorded_seq” in the most recent element of the replication history array
  • history: an array of the replication history. The above output shows both replications we applied to the bands database
  • start_time and end_time: the start and end dates of the replication process
  • start_last_seq: the most recent change ID from which the replication is applied
  • end_last_seq: the replication is applied up to this change ID
  • recorded_seq: the last recorded change ID, normally the same as end_last_seq
  • missing_checked: the number of missing documents that were checked in the process
  • missing_found: the number of missing documents. This will be a positive integer in case of any document insertions.
  • docs_read: the number of documents read
  • docs_written: the number of documents replicated to the target database. This will be a positive integer in case of any document insertions. Normally missing_found will be the same as docs_written.
  • doc_write_failures: the number of documents where replication has failed
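The "normally the same" relationships in the list above can be turned into a quick sanity check. The Python sketch below is only an illustration with a made-up function name. It assumes that the first element of the history array is the most recent session, as in the output above.

```python
def check_replication(response: dict) -> list:
    """Return a list of human-readable problems found in a _replicate response."""
    problems = []
    if not response.get("ok"):
        problems.append("replication did not report ok")

    latest = response["history"][0]  # assumption: newest session comes first
    if response["source_last_seq"] != latest["end_last_seq"]:
        problems.append("source_last_seq differs from end_last_seq")
    if latest["recorded_seq"] != latest["end_last_seq"]:
        problems.append("recorded_seq differs from end_last_seq")
    if latest["docs_written"] != latest["missing_found"]:
        problems.append("docs_written differs from missing_found")
    if latest["doc_write_failures"] > 0:
        problems.append(f"{latest['doc_write_failures']} document(s) failed to replicate")
    return problems
```

An empty list means that the response matches the expectations described above. It is not a guarantee that the two databases are identical.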

We’ll now update a document in the source database. We’ll set a new member count for The Jacksons. This time we’ll do that in the Fauxton UI:

Update a document in the CouchDB UI Fauxton

Execute the same _replicate command as above. The history array was updated with an extra element:

{
      "session_id": "6c550878b1c70a9765b48764741612d0",
      "start_time": "Tue, 30 May 2017 06:57:35 GMT",
      "end_time": "Tue, 30 May 2017 06:57:35 GMT",
      "start_last_seq": "5-g1AAAA...",
      "end_last_seq": "6-g1AAAA...",
      "recorded_seq": "6-g1AAAA...",
      "missing_checked": 1,
      "missing_found": 0,
      "docs_read": 0,
      "docs_written": 0,
      "doc_write_failures": 0
    }

There are no missing documents, hence the docs_written property has the value 0: we’re only updating an existing document. You can check the same document in the bands_backup database to see that it was successfully updated:

Data replication copied over update of document in CouchDB

Continuous replication

We can also easily set up continuous replication. In the UI we have to check the Continuous option:

Set up continuous replication in CouchDB Fauxton UI

…whereas in the HTTP API we would set the continuous flag to true:

{
	"source": "bands",
	"target": "bands_backup",
	"continuous": true
}

The API will respond with a 202 Accepted.
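As a rough Python sketch, starting the continuous replication from a script could look like this. The function names are hypothetical, and the code assumes the same unauthenticated CouchDB instance on localhost:5984 as the rest of this post.

```python
import json
import urllib.request

REPLICATE_URL = "http://localhost:5984/_replicate"  # assumption: local CouchDB, no auth


def continuous_replication_request(source: str, target: str) -> urllib.request.Request:
    """Build the POST request that starts a continuous replication."""
    body = {"source": source, "target": target, "continuous": True}
    return urllib.request.Request(
        url=REPLICATE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_continuous_replication(source: str, target: str) -> bool:
    """Send the request and report whether CouchDB answered with 202 Accepted."""
    request = continuous_replication_request(source, target)
    with urllib.request.urlopen(request) as response:
        return response.status == 202
```

start_continuous_replication("bands", "bands_backup") should return True if the server accepted the task. Note that urlopen raises an HTTPError for 4xx and 5xx responses rather than returning them.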

So now if we update a record with a new field in Fauxton…:

Set a band to inactive in CouchDB Fauxton UI

…then the changes will be automatically propagated to the target database:

Automatic replication succeeded in CouchDB

The replication task is also visible in Fauxton:

Continuous replication task is visible in CouchDB Fauxton UI

We can also view the tasks via the HTTP API:

GET http://localhost:5984/_active_tasks

…which responds with an array of active tasks.
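The array can contain other kinds of tasks as well, so a script would typically filter it. Here is a minimal Python sketch with hypothetical function names. It relies on each task object having a "type" property that is "replication" for replication tasks. The sample task in the usage note below is trimmed and made up, not a real server response.

```python
import json
import urllib.request

ACTIVE_TASKS_URL = "http://localhost:5984/_active_tasks"  # assumption: local CouchDB, no auth


def fetch_active_tasks() -> list:
    """GET /_active_tasks and return the parsed JSON array."""
    with urllib.request.urlopen(ACTIVE_TASKS_URL) as response:
        return json.load(response)


def replication_tasks(active_tasks: list) -> list:
    """Keep only the replication tasks from an _active_tasks response."""
    return [task for task in active_tasks if task.get("type") == "replication"]
```

For instance, replication_tasks([{"type": "indexer"}, {"type": "replication", "source": "bands", "target": "bands_backup"}]) keeps only the second element.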

Note however that replications started through the _replicate endpoint are not persisted. A CouchDB server restart wipes out the list of active tasks, so these replication processes must be set up again after a server restart.

We’ll continue in the next post.

You can view all posts related to data storage on this blog here.

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.
