An example of grouping and joining collections in .NET: calculate total scores by student and subject
November 7, 2016 Leave a comment
Imagine that we have two collections. First we have a student collection with IDs and names. Then we also have a collection that holds the scores the students got in various subjects on several occasions. This latter collection also holds a reference to a student by the student ID. The goal is to join the two collections and calculate the total score of each student by subject.
There are various ways to solve this problem. The goal of this post is to show an example of using the LINQ GroupBy and GroupJoin operators to build an object with the information we need.
The demo objects
Here’s the Student object:
public class Student { public int Age { get; internal set; } public int Id { get; set; } public string Name { get; set; } }
…and this is our StudentScore class:
public class StudentScore { public int Points { get; set; } public int StudentId { get; set; } public string Subject { get; set; } }
Note the StudentId property in StudentScore. We’ll use that to join the Student and StudentScore collections later on.
The collections
We have 3 students:
IEnumerable<Student> students = new List<Student>() { new Student() { Id = 1, Name = "John", Age = 13}, new Student() { Id = 2, Name = "Mary", Age = 12}, new Student() { Id = 3, Name = "Anne", Age = 14} };
…and 24 student scores, 8 for each student across three different subjects, Maths, English and Biology:
IEnumerable<StudentScore> studentScores = new List<StudentScore>() { new StudentScore() { StudentId = 1, Subject = "Maths", Points = 54}, new StudentScore() { StudentId = 1, Subject = "Maths", Points = 32}, new StudentScore() { StudentId = 1, Subject = "Maths", Points = 45}, new StudentScore() { StudentId = 1, Subject = "English", Points = 55}, new StudentScore() { StudentId = 1, Subject = "English", Points = 54}, new StudentScore() { StudentId = 1, Subject = "Biology", Points = 32}, new StudentScore() { StudentId = 1, Subject = "Biology", Points = 27}, new StudentScore() { StudentId = 1, Subject = "Biology", Points = 52}, new StudentScore() { StudentId = 2, Subject = "Maths", Points = 44}, new StudentScore() { StudentId = 2, Subject = "Maths", Points = 37}, new StudentScore() { StudentId = 2, Subject = "Maths", Points = 49}, new StudentScore() { StudentId = 2, Subject = "English", Points = 59}, new StudentScore() { StudentId = 2, Subject = "English", Points = 64}, new StudentScore() { StudentId = 2, Subject = "Biology", Points = 42}, new StudentScore() { StudentId = 2, Subject = "Biology", Points = 67}, new StudentScore() { StudentId = 2, Subject = "Biology", Points = 50}, new StudentScore() { StudentId = 3, Subject = "Maths", Points = 53}, new StudentScore() { StudentId = 3, Subject = "Maths", Points = 72}, new StudentScore() { StudentId = 3, Subject = "Maths", Points = 48}, new StudentScore() { StudentId = 3, Subject = "English", Points = 54}, new StudentScore() { StudentId = 3, Subject = "English", Points = 59}, new StudentScore() { StudentId = 3, Subject = "Biology", Points = 32}, new StudentScore() { StudentId = 3, Subject = "Biology", Points = 87}, new StudentScore() { StudentId = 3, Subject = "Biology", Points = 34} };
Step 1: grouping across all subjects
We’ll complete the exercise in three steps. We’ll first see how to group the student scores by subject name using the GroupBy operator. We’ll calculate the total score across all subjects regardless of the student ID. This is of course not very useful but it demonstrates the simplest usage of the GroupBy operator:
IEnumerable<IGrouping<string, StudentScore>> subjectGroupings = studentScores .GroupBy(score => score.Subject); foreach (var subjectGroup in subjectGroupings) { Console.WriteLine($"Subject: {subjectGroup.Key}"); Console.WriteLine($"Total points across all students: {subjectGroup.Sum(g => g.Points)}"); }
In its simplest form GroupBy accepts the property to group by. We want to group the scores collection by the subject names. Normally people use “var” for the GroupBy return type as it is a bit long but I wanted you to see the actual return type. It is a collection of IGrouping objects where the first type parameter is the grouping key type. As we loop through the grouping collection we can retrieve the group key and the StudentScore objects that belong to that key. Since each key will have a collection of objects we can perform the usual LINQ operators on them, like Sum in the above case to calculate the total score for the subject.
Here’s the printout:
Subject: Maths
Total points across all students: 434
Subject: English
Total points across all students: 345
Subject: Biology
Total points across all students: 423
Step 2: joining students and scores
In this step we’ll join the two collections on the student ID and calculate the total score of each student across all subjects. Again, this is not very useful as we add the Maths, English and Biology scores together but we’ll demonstrate the simplest usage of the GroupJoin operator.
GroupJoin operates on a collection, like students and can join it with another collection, like student scores on a common join key, such as the student ID. Students will be the outer collection which is joined by the inner collection student scores. We also have the corresponding join keys from both collections: the outer join key Student.Id and the inner join key StudentScore.StudentId. GroupJoin also accepts a result selector function which accepts two parameters and returns a third. The first in-parameter is of type outer object, Student in our case. The second in-parameter is an IEnumerable of the inner object type, i.e. StudentScore. The return type can be anonymous or a “real” object. In the below code example we take the scores of each student and add them up using the Sum operator on the Points property of StudentScore:
var totalStudentScores = students.GroupJoin(studentScores , student => student.Id, score => score.StudentId, (student, scores) => new { Id = student.Id, StudentName = student.Name, TotalScore = scores.Sum(s => s.Points) } ); foreach (var student in totalStudentScores) { Console.WriteLine($"ID: {student.Id}, Name: {student.StudentName}, Total score: {student.TotalScore} "); }
The return type is an anonymous object hence I had to use “var” here. Here are the ingredients of the GroupJoin function:
- students: the outer collection for the join
- studentScores: the inner collection for the join
- student: the object key outer selector where we select its ID property
- score: the object for the inner selector where we select its StudentId property
- lastly we have the result selector function which accepts a student and the scores collection and returns the student ID, the student name and the total number of points across all subjects
Here’s the output:
ID: 1, Name: John, Total score: 351
ID: 2, Name: Mary, Total score: 412
ID: 3, Name: Anne, Total score: 439
Step 3: putting GroupBy and GroupJoin together
The last step is to put the above concepts together in the same LINQ statement. The solution below involves extending the result selector function to multiple statements. You’ll see the same grouping statement as above but only for a specific student ID. Once we have the groups we build a Dictionary object of string and int where the key is the subject name and the value is the total score for that subject. Finally we return an object with the student ID, the student name and the scores dictionary. For the print statement I used the wildly popular JSON.NET library for easier string formatting. The example also demonstrates that it’s perfectly fine to have a code block with some internal logic in the result selector:
var groupedStudentScores = students.GroupJoin(studentScores , student => student.Id, score => score.StudentId, (student, scores) => { var subjectGroups = scores.Where(score => score.StudentId == student.Id) .GroupBy(score => score.Subject); Dictionary<string, int> scoresDictionary = new Dictionary<string, int>(); foreach (var subjectGroup in subjectGroups) { string subjectName = subjectGroup.Key; int subjectScoreTotal = subjectGroup.Sum(s => s.Points); scoresDictionary[subjectName] = subjectScoreTotal; } return new { Id = student.Id, StudentName = student.Name, Scores = scoresDictionary }; } ); foreach (var student in groupedStudentScores) { Console.WriteLine($"ID: {student.Id}, Name: {student.StudentName}, Total score by subject: {JsonConvert.SerializeObject(student.Scores)} "); }
Here’s the output:
ID: 1, Name: John, Total score by subject: {“Maths”:131,”English”:109,”Biology”:111}
ID: 2, Name: Mary, Total score by subject: {“Maths”:130,”English”:123,”Biology”:159}
ID: 3, Name: Anne, Total score by subject: {“Maths”:173,”English”:113,”Biology”:153}
You can view all LINQ-related posts on this blog here.