Introduction to MongoDb with .NET part 27: aggregation in the .NET driver

Introduction

In the previous post we looked at various other aggregation operators in MongoDb. With $first and $last we can find the highest and lowest member in a sorted collection. If you need to find e.g. the city with the highest population or the country with the lowest number of pets per capita then these operators will probably figure in the aggregation pipeline. The $unwind stage operator blows up the members of an array so that we can run a query on those members in an easier way. Without $unwind it’s difficult to access the individual members of an array. Note that $unwind can dramatically increase the result set of documents.

We’ve now gone through the most important aggregation operators in MongoDb. It’s time to turn our attention to how aggregations can be written in the .NET driver.

Starting point

We’ll run an aggregation on the zipcodes collection through the .NET driver. As a reminder here’s the structure of a zipcode document:

{
        "_id" : "01001",
        "city" : "AGAWAM",
        "loc" : [
                -72.622739,
                42.070206
        ],
        "pop" : 15338,
        "state" : "MA"
}

We want to extract those states whose populations exceed 5 million people, sort the resulting documents by state and take the first 5 documents. Here’s the query in plain JavaScript:

db.zipcodes.aggregate([
{$group: { "_id" : "$state", "population" : {$sum : "$pop"}}}, 
{$match : {population : {$gte : 5000000}}}, 
{$sort: {"_id" : 1}}, 
{$limit : 5}])

Here’s the result set:

{ "_id" : "CA", "population" : 29754890 }
{ "_id" : "FL", "population" : 12686644 }
{ "_id" : "GA", "population" : 6478216 }
{ "_id" : "IL", "population" : 11427576 }
{ "_id" : "IN", "population" : 5544136 }

Our goal in this section of the tutorial is to describe the above JavaScript query in C# in various ways.

The Aggregate method

We’ll look at various ways to write aggregations in the .NET driver. Note that not all aggregation stages are supported by a dedicated function. E.g. the $redact stage which we haven’t discussed in this series at all has no equivalent built into the driver.

The gateway into aggregations in the .NET driver is the Aggregate function which comes in 2 forms.

The first one requires no compulsory parameters at all, only an optionL AggregateOptions parameter. The function opens up a fluent API where we can append various pipeline stages. The most generic function to append a stage to a pipeline is called AppendStage. We can add just any type of stage to the pipeline with that.

We also need to get acquainted with the MongoDb object called BsonDocument. I’ve been avoiding this object up to now in this series. It’s a generic object to represent any type of MongoDb document. If there’s a document for which we have no strongly typed object representation in our code we can always turn to BsonDocument. In fact we could have gone through this tutorial only with BsonDocuments instead of building strongly typed objects like ZipCodeDb and RestaurantDb. I’m a great fan of strong typing with all its benefits so I wouldn’t encourage you to describe your documents with BsonDocuments.

However, BsonDocument can be handy with aggregations since it’s easier to describe what we want in the various pipeline stages with it rather than having strongly typed objects for each stage. Let’s go step by step and try one aggregation type at a time. We’ll go for the sorting stage using Aggregate, AppendStage and BsonDocument:

ModelContext modelContext = ModelContext.Create(new ConfigFileConfigurationRepository(), new AppConfigConnectionStringRepository());
var res = modelContext.ZipCodes.Aggregate()
	.AppendStage<BsonDocument>(new BsonDocument { { "$sort", new BsonDocument("_id", 1) } }).ToList();
foreach (var item in res)
{
	var nameList = item.Names.ToList();
	foreach (string name in nameList)
	{
		BsonValue bsonValue = item.GetValue(name);
		Console.WriteLine(string.Format("{0}: {1}", name, bsonValue));
	}
}

As you type “modelContext.ZipCodes.Aggregate().” in Visual Studio you’ll see that IntelliSense offers a wide range of dedicated pipeline stages like Limit, Match etc. Let’s not rush ahead and instead start with AppendStage since it enables us to add any kind of aggregation to the pipeline.

We can see the BsonDocument object in action in the above code. The following bit of code…:

new BsonDocument { { "$sort", new BsonDocument("_id", 1) } }

…is the loosely typed way of building the following in JavaScript:

{$sort: {"_id" : 1}}

“AppendStage” means that we want to retrieve the result set as a list of BsonDocument objects. This is to demonstrate how we can extract the properties and values out of a BsonDocument. Each BsonDocument item in the list will have a Name property which represents the key, or the “column” if you like. We can take that name and extract the corresponding value out of it.

Here’s the output of the first 3 items:

_id: 01001
city: AGAWAM
loc: [-72.622739, 42.070206]
pop: 15338
state: MA
_id: 01002
city: CUSHMAN
loc: [-72.51565, 42.377017]
pop: 36963
state: MA
_id: 01005
city: BARRE
loc: [-72.108354, 42.409698]
pop: 4546
state: MA

You’ll recognise the properties from our zip codes collection.

Since $sort doesn’t affect the document structure we can cast these objects into the strongly typed ZipCodeDb objects.

Here’s the modified code:

var stronglyTypedRes = modelContext.ZipCodes.Aggregate()
				.AppendStage<ZipCodeDb>(new BsonDocument { { "$sort", new BsonDocument("_id", 1) } }).ToList();
foreach (var item in stronglyTypedRes)
{
	Console.WriteLine(item);
}

Here’s the output of the first 3 items:

{
  "Id": "01001",
  "City": "AGAWAM",
  "Coordinates": [
    -72.622739,
    42.070206
  ],
  "Population": 15338,
  "State": "MA"
}
{
  "Id": "01002",
  "City": "CUSHMAN",
  "Coordinates": [
    -72.51565,
    42.377017
  ],
  "Population": 36963,
  "State": "MA"
}
{
  "Id": "01005",
  "City": "BARRE",
  "Coordinates": [
    -72.108354,
    42.409698
  ],
  "Population": 4546,
  "State": "MA"
}

We’ll keep exploring the aggregations in the .NET driver in the next post.

You can view all posts related to data storage on this blog here.

Advertisements

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

ultimatemindsettoday

A great WordPress.com site

iReadable { }

.NET Tips & Tricks

Robin Sedlaczek's Blog

Developer on Microsoft Technologies

HarsH ReaLiTy

A Good Blog is Hard to Find

Softwarearchitektur in der Praxis

Wissenswertes zu Webentwicklung, Domain-Driven Design und Microservices

the software architecture

thoughts, ideas, diagrams,enterprise code, design pattern , solution designs

Technology Talks

on Microsoft technologies, Web, Android and others

Software Engineering

Web development

Disparate Opinions

Various tidbits

chsakell's Blog

Anything around ASP.NET MVC,WEB API, WCF, Entity Framework & AngularJS

Cyber Matters

Bite-size insight on Cyber Security for the not too technical.

Guru N Guns's

OneSolution To dOTnET.

Johnny Zraiby

Measuring programming progress by lines of code is like measuring aircraft building progress by weight.

%d bloggers like this: