Introduction to MongoDb with .NET part 27: aggregation in the .NET driver
May 12, 2016 2 Comments
Introduction
In the previous post we looked at various other aggregation operators in MongoDb. With $first and $last we can find the highest and lowest member in a sorted collection. If you need to find e.g. the city with the highest population or the country with the lowest number of pets per capita then these operators will probably figure in the aggregation pipeline. The $unwind stage operator blows up the members of an array so that we can run a query on those members in an easier way. Without $unwind it’s difficult to access the individual members of an array. Note that $unwind can dramatically increase the result set of documents.
We’ve now gone through the most important aggregation operators in MongoDb. It’s time to turn our attention to how aggregations can be written in the .NET driver.
Starting point
We’ll run an aggregation on the zipcodes collection through the .NET driver. As a reminder here’s the structure of a zipcode document:
{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" }
We want to extract those states whose populations exceed 5 million people, sort the resulting documents by state and take the first 5 documents. Here’s the query in plain JavaScript:
db.zipcodes.aggregate([ {$group: { "_id" : "$state", "population" : {$sum : "$pop"}}}, {$match : {population : {$gte : 5000000}}}, {$sort: {"_id" : 1}}, {$limit : 5}])
Here’s the result set:
{ "_id" : "CA", "population" : 29754890 } { "_id" : "FL", "population" : 12686644 } { "_id" : "GA", "population" : 6478216 } { "_id" : "IL", "population" : 11427576 } { "_id" : "IN", "population" : 5544136 }
Our goal in this section of the tutorial is to describe the above JavaScript query in C# in various ways.
The Aggregate method
We’ll look at various ways to write aggregations in the .NET driver. Note that not all aggregation stages are supported by a dedicated function. E.g. the $redact stage which we haven’t discussed in this series at all has no equivalent built into the driver.
The gateway into aggregations in the .NET driver is the Aggregate function which comes in 2 forms.
The first one requires no compulsory parameters at all, only an optionL AggregateOptions parameter. The function opens up a fluent API where we can append various pipeline stages. The most generic function to append a stage to a pipeline is called AppendStage. We can add just any type of stage to the pipeline with that.
We also need to get acquainted with the MongoDb object called BsonDocument. I’ve been avoiding this object up to now in this series. It’s a generic object to represent any type of MongoDb document. If there’s a document for which we have no strongly typed object representation in our code we can always turn to BsonDocument. In fact we could have gone through this tutorial only with BsonDocuments instead of building strongly typed objects like ZipCodeDb and RestaurantDb. I’m a great fan of strong typing with all its benefits so I wouldn’t encourage you to describe your documents with BsonDocuments.
However, BsonDocument can be handy with aggregations since it’s easier to describe what we want in the various pipeline stages with it rather than having strongly typed objects for each stage. Let’s go step by step and try one aggregation type at a time. We’ll go for the sorting stage using Aggregate, AppendStage and BsonDocument:
ModelContext modelContext = ModelContext.Create(new ConfigFileConfigurationRepository(), new AppConfigConnectionStringRepository()); var res = modelContext.ZipCodes.Aggregate() .AppendStage<BsonDocument>(new BsonDocument { { "$sort", new BsonDocument("_id", 1) } }).ToList(); foreach (var item in res) { var nameList = item.Names.ToList(); foreach (string name in nameList) { BsonValue bsonValue = item.GetValue(name); Console.WriteLine(string.Format("{0}: {1}", name, bsonValue)); } }
As you type “modelContext.ZipCodes.Aggregate().” in Visual Studio you’ll see that IntelliSense offers a wide range of dedicated pipeline stages like Limit, Match etc. Let’s not rush ahead and instead start with AppendStage since it enables us to add any kind of aggregation to the pipeline.
We can see the BsonDocument object in action in the above code. The following bit of code…:
new BsonDocument { { "$sort", new BsonDocument("_id", 1) } }
…is the loosely typed way of building the following in JavaScript:
{$sort: {"_id" : 1}}
“AppendStage” means that we want to retrieve the result set as a list of BsonDocument objects. This is to demonstrate how we can extract the properties and values out of a BsonDocument. Each BsonDocument item in the list will have a Name property which represents the key, or the “column” if you like. We can take that name and extract the corresponding value out of it.
Here’s the output of the first 3 items:
_id: 01001 city: AGAWAM loc: [-72.622739, 42.070206] pop: 15338 state: MA _id: 01002 city: CUSHMAN loc: [-72.51565, 42.377017] pop: 36963 state: MA _id: 01005 city: BARRE loc: [-72.108354, 42.409698] pop: 4546 state: MA
You’ll recognise the properties from our zip codes collection.
Since $sort doesn’t affect the document structure we can cast these objects into the strongly typed ZipCodeDb objects.
Here’s the modified code:
var stronglyTypedRes = modelContext.ZipCodes.Aggregate() .AppendStage<ZipCodeDb>(new BsonDocument { { "$sort", new BsonDocument("_id", 1) } }).ToList(); foreach (var item in stronglyTypedRes) { Console.WriteLine(item); }
Here’s the output of the first 3 items:
{ "Id": "01001", "City": "AGAWAM", "Coordinates": [ -72.622739, 42.070206 ], "Population": 15338, "State": "MA" } { "Id": "01002", "City": "CUSHMAN", "Coordinates": [ -72.51565, 42.377017 ], "Population": 36963, "State": "MA" } { "Id": "01005", "City": "BARRE", "Coordinates": [ -72.108354, 42.409698 ], "Population": 4546, "State": "MA" }
We’ll keep exploring the aggregations in the .NET driver in the next post.
You can view all posts related to data storage on this blog here.
hi, how to manage more fields into _id?
let’s say i want to write this aggregate query, how to write this condition _id: {userKey: “$userKey”, appId:”$appId”} ?
db.mycollection.aggregate(
[
{ $match : { tipo:”sometype” } } ,
{$group: {
_id: {userKey: “$userKey”, appId:”$appId”} ,
uniqueIds: {$addToSet: “$_id”},
count: {$sum: 1}
}
},
{$match: {
count: {“$gt”: 1}
}
}
]);
I’m confused about where ModelContext comes from. I’ve only been reading the tutorial from the discussion of mongodb aggregations. Did you talk about this in an earlier post?