An overview of grouping collections with LINQ in .NET

Introduction

The LINQ GroupBy operator is one of the most powerful LINQ operators. It groups the elements of a collection in various ways and lets us control how the elements and the results are selected. However, with its many overloads and Func parameters, GroupBy can be a bit difficult to understand at first. It is certainly more complex than, say, the Min() and Max() operators.

This post will go through the GroupBy operator and many of its overloads with a number of examples.

The demo objects

Imagine that we’re analysing the elements of an HTML page loaded in a browser: scripts, stylesheets, images etc. We want to calculate various statistics such as the total size by file type. Here’s the object that represents an element on an HTML page:

public class PageComponent
{
	public string Name { get; set; }
	public string Type { get; set; }
	public string Extension { get; set; }
	public int SizeBytes { get; set; }
}

Here’s the list of components for the demo. Note how the 2 entries at the end, a script and a stylesheet, are duplicates of 2 other entries in the list. We simulate a situation where those 2 elements were carelessly included twice on the page. It will be important in a code example later on.

IEnumerable<PageComponent> pageComponents = new List<PageComponent>()
			{
				new PageComponent() { Name = "jquery.js", Type = "script", Extension = "js", SizeBytes = 12345},
				new PageComponent() { Name = "datatables.js", Type = "script", Extension = "js", SizeBytes = 2345},
				new PageComponent() { Name = "knockout.js", Type = "script", Extension = "js", SizeBytes = 70386},
				new PageComponent() { Name = "greeter.ts", Type = "script", Extension = "ts", SizeBytes = 10794},
				new PageComponent() { Name = "myscript.ts", Type = "script", Extension = "ts", SizeBytes = 86124},
				new PageComponent() { Name = "mystyle.css", Type = "stylesheet", Extension = "css", SizeBytes = 8433},
				new PageComponent() { Name = "datatables.css", Type = "stylesheet", Extension = "css", SizeBytes = 27470},
				new PageComponent() { Name = "combined.css", Type = "stylesheet", Extension = "css", SizeBytes = 24627},
				new PageComponent() { Name = "externallink.html", Type = "html", Extension = "html", SizeBytes = 72639},
				new PageComponent() { Name = "googleads.html", Type = "html", Extension = "html", SizeBytes = 15873},
				new PageComponent() { Name = "nicepic.img", Type = "image", Extension = "img", SizeBytes = 24140},
				new PageComponent() { Name = "favicon.ico", Type = "image", Extension = "ico", SizeBytes = 64152},
				new PageComponent() { Name = "daily.img", Type = "image", Extension = "img", SizeBytes = 52667},
				new PageComponent() { Name = "selfie.png", Type = "image", Extension = "png", SizeBytes = 22922},
				new PageComponent() { Name = "mybeautifulface.png", Type = "image", Extension = "png", SizeBytes = 78416},
				new PageComponent() { Name = "hello.img", Type = "image", Extension = "img", SizeBytes = 65046},

				new PageComponent() { Name = "myscript.ts", Type = "script", Extension = "ts", SizeBytes = 86124},
				new PageComponent() { Name = "mystyle.css", Type = "stylesheet", Extension = "css", SizeBytes = 8433}
			};

Group by a single key

Let’s start with the simplest application of GroupBy which accepts a key selector. It is a simple lambda expression that returns the property by which we want to group the page components. Say we want to group the elements by their type:

IEnumerable<IGrouping<string, PageComponent>> groupByTypeBasic = pageComponents.GroupBy(g => g.Type);
foreach (var group in groupByTypeBasic)
{
	Console.WriteLine($"Element type: {group.Key}, items in group: {group.Count()}, total size: {group.Sum(g => g.SizeBytes)}");
	Console.WriteLine("============================");
	foreach (PageComponent pc in group)
	{
		Console.WriteLine($"Element name: {pc.Name}");
	}
	Console.WriteLine("============================");
	Console.WriteLine();
}

The return type of GroupBy is quite long and people most often just use “var”. However, I wanted to show the actual return type so that you are aware of the IGrouping type. Each IGrouping has a Key property, which holds the page element type in this case. The key is a string, hence the first type parameter. An IGrouping is also itself an enumerable of the elements within that group, so we can apply the usual LINQ operators on it such as Count and Sum. The elements are of type PageComponent, hence the second type parameter in IGrouping.

Here’s the output:

Element type: script, items in group: 6, total size: 268118
============================
Element name: jquery.js
Element name: datatables.js
Element name: knockout.js
Element name: greeter.ts
Element name: myscript.ts
Element name: myscript.ts
============================

Element type: stylesheet, items in group: 4, total size: 68963
============================
Element name: mystyle.css
Element name: datatables.css
Element name: combined.css
Element name: mystyle.css
============================

Element type: html, items in group: 2, total size: 88512
============================
Element name: externallink.html
Element name: googleads.html
============================

Element type: image, items in group: 6, total size: 307343
============================
Element name: nicepic.img
Element name: favicon.ico
Element name: daily.img
Element name: selfie.png
Element name: mybeautifulface.png
Element name: hello.img
============================
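
To make the shape of IGrouping a bit more concrete, here’s a minimal sketch that is independent of the demo list. The word list and the variable names are made up for illustration. It shows that a group can be treated as a plain sequence of its elements and that the groups can be materialised into a dictionary keyed by the grouping key:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data, unrelated to the PageComponent list.
string[] words = { "apple", "avocado", "banana", "blueberry", "cherry" };

// Group by first letter: the key is a char, the elements are strings.
IEnumerable<IGrouping<char, string>> byFirstLetter = words.GroupBy(w => w[0]);

foreach (IGrouping<char, string> group in byFirstLetter)
{
    // IGrouping<TKey, TElement> is an IEnumerable<TElement> with an extra Key property,
    // so it can be assigned to anything that expects a sequence of the elements.
    IEnumerable<string> members = group;
    Console.WriteLine($"{group.Key}: {string.Join(", ", members)}");
}

// Materialise the groups into a dictionary for later lookups by key.
Dictionary<char, List<string>> lookup = byFirstLetter.ToDictionary(g => g.Key, g => g.ToList());
Console.WriteLine(lookup['b'].Count);
// Prints:
// a: apple, avocado
// b: banana, blueberry
// c: cherry
// 2
```

The snippet is written as top-level statements, which need C# 9 or later. The statements below the using directives can also be pasted into a Main method.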

Grouping by the extension is equally easy. We just change the key selector lambda:

IEnumerable<IGrouping<string, PageComponent>> groupByExtensionBasic = pageComponents.GroupBy(g => g.Extension);
foreach (var group in groupByExtensionBasic)
{				
	Console.WriteLine($"Extension: {group.Key}");
	Console.WriteLine("============================");
	foreach (PageComponent pc in group)
	{
		Console.WriteLine($"Element name: {pc.Name}");
	}
	Console.WriteLine("============================");
	Console.WriteLine();				
}

Here’s the result set:

Extension: js
============================
Element name: jquery.js
Element name: datatables.js
Element name: knockout.js
============================

Extension: ts
============================
Element name: greeter.ts
Element name: myscript.ts
Element name: myscript.ts
============================

Extension: css
============================
Element name: mystyle.css
Element name: datatables.css
Element name: combined.css
Element name: mystyle.css
============================

Extension: html
============================
Element name: externallink.html
Element name: googleads.html
============================

Extension: img
============================
Element name: nicepic.img
Element name: daily.img
Element name: hello.img
============================

Extension: ico
============================
Element name: favicon.ico
============================

Extension: png
============================
Element name: selfie.png
Element name: mybeautifulface.png
============================

Note that grouping is based on equality of the keys. The GroupBy operator compares the key values, like “script” and “stylesheet”, to determine whether they are equal. Comparing simple values is easy: .NET can readily determine whether two strings or integers are equal to each other. However, we cannot rely on the built-in equality comparison when the grouping key is an instance of our own class, as we’ll see in the upcoming examples. Unless the class overrides Equals and GetHashCode, two instances are considered equal only if they are the very same object. Fortunately for us GroupBy has overloads that allow us to override the comparison logic.
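
One case worth knowing where no extra comparison logic is needed is a key built as an anonymous type, since C# anonymous types compare by the values of their properties. The following is only a sketch on a hypothetical subset of the demo data, modelled as tuples to keep it short. It groups by the combination of type and extension:

```csharp
using System;
using System.Linq;

// Hypothetical subset of the demo data as (Name, Type, Extension) tuples.
var components = new[]
{
    (Name: "jquery.js", Type: "script", Extension: "js"),
    (Name: "greeter.ts", Type: "script", Extension: "ts"),
    (Name: "myscript.ts", Type: "script", Extension: "ts"),
    (Name: "favicon.ico", Type: "image", Extension: "ico"),
};

// Two anonymous objects with equal Type and Extension values are equal,
// so they land in the same group without a custom comparer.
var byTypeAndExtension = components.GroupBy(c => new { Type = c.Type, Extension = c.Extension });

foreach (var group in byTypeAndExtension)
{
    Console.WriteLine($"{group.Key.Type}/{group.Key.Extension}: {group.Count()}");
}
// Prints:
// script/js: 1
// script/ts: 2
// image/ico: 1
```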

More complex keys

The grouping key can be a “real” object, not just a primitive type. The key selector argument is a lambda expression and we can put an entire code block there with some logic to build up the key.

In the next example we want to group the components by their size in bytes. More specifically we want to put them in size intervals of 10000 bytes each. So we want to have the interval 0-10000, then 10000-20000 and so on, where the lower bound is inclusive and the upper bound is exclusive.
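
The interval arithmetic itself is simple: subtracting the remainder of the division by the interval size rounds a size down to the lower bound. Here’s a small sketch of that calculation on its own. The helper name IntervalFor and the sample sizes are made up for this illustration:

```csharp
using System;

const int groupSize = 10000;

// Rounds a size down to the nearest multiple of groupSize and returns the interval around it.
(int LowerBound, int UpperBound) IntervalFor(int sizeBytes)
{
    int lowerBound = sizeBytes - sizeBytes % groupSize;
    return (lowerBound, lowerBound + groupSize);
}

// Hypothetical sample sizes chosen to show what happens at the interval boundaries.
int[] sizes = { 2345, 9999, 10000, 12345, 86124 };

foreach (int size in sizes)
{
    var (lower, upper) = IntervalFor(size);
    Console.WriteLine($"{size} -> {lower} - {upper}");
}
// Prints:
// 2345 -> 0 - 10000
// 9999 -> 0 - 10000
// 10000 -> 10000 - 20000
// 12345 -> 10000 - 20000
// 86124 -> 80000 - 90000
```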

For the grouping key we’ll build an object called PageComponentSizeGrouper with the following properties:

public class PageComponentSizeGrouper
{
	public int LowerBound { get; set; }
	public int UpperBound { get; set; }

	public override string ToString()
	{
		return string.Concat(LowerBound, " - ", UpperBound);
	}
}

Our grouping key will therefore be an object whose equality .NET cannot determine by itself. How could it “know” when two PageComponentSizeGrouper objects are equal? We have to declare the equality logic ourselves. Fortunately that’s a straightforward task. We need to implement the IEqualityComparer<T> interface. We declare that two PageComponentSizeGrouper objects are equal if their LowerBound and UpperBound properties are equal:

public class PageComponentSizeGrouperEqualityComparer : IEqualityComparer<PageComponentSizeGrouper>
{
	public bool Equals(PageComponentSizeGrouper x, PageComponentSizeGrouper y)
	{
		return x.LowerBound == y.LowerBound && x.UpperBound == y.UpperBound;
	}

	public int GetHashCode(PageComponentSizeGrouper obj)
	{
		// unchecked: the arithmetic may overflow, which is harmless for a hash code
		unchecked
		{
			return (obj.LowerBound * 397) ^ obj.UpperBound;
		}
	}
}

Here’s the grouping example code. Note how the key selector is now an extended code block with a separate return statement. We also supply the equality comparer as the second argument:

var groupByComponentSizeRange = pageComponents.GroupBy(pe =>
{
	int groupSize = 10000;
	int lowerBound = pe.SizeBytes - pe.SizeBytes % groupSize;
	int upperBound = lowerBound + groupSize;
	return new PageComponentSizeGrouper() { LowerBound = lowerBound, UpperBound = upperBound };
}, new PageComponentSizeGrouperEqualityComparer());

foreach (var group in groupByComponentSizeRange.OrderBy(g => g.Key.LowerBound))
{
	Console.WriteLine($"Size range: {group.Key.ToString()}, items in group: {group.Count()}");
	Console.WriteLine("============================");
	foreach (PageComponent pc in group)
	{
		Console.WriteLine($"Element name: {pc.Name}");
	}
	Console.WriteLine("============================");
	Console.WriteLine();
}

Here’s the output:

Size range: 0 - 10000, items in group: 3
============================
Element name: datatables.js
Element name: mystyle.css
Element name: mystyle.css
============================

Size range: 10000 - 20000, items in group: 3
============================
Element name: jquery.js
Element name: greeter.ts
Element name: googleads.html
============================

Size range: 20000 - 30000, items in group: 4
============================
Element name: datatables.css
Element name: combined.css
Element name: nicepic.img
Element name: selfie.png
============================

Size range: 50000 - 60000, items in group: 1
============================
Element name: daily.img
============================

Size range: 60000 - 70000, items in group: 2
============================
Element name: favicon.ico
Element name: hello.img
============================

Size range: 70000 - 80000, items in group: 3
============================
Element name: knockout.js
Element name: externallink.html
Element name: mybeautifulface.png
============================

Size range: 80000 - 90000, items in group: 2
============================
Element name: myscript.ts
Element name: myscript.ts
============================
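
As an alternative to a dedicated key class plus an equality comparer, the key can also be a value tuple, since two value tuples are equal when all their components are equal. The following is only a sketch of that idea, not a replacement for the example above. The helper SizeRange and the reduced data set are assumptions made for this illustration:

```csharp
using System;
using System.Linq;

// Hypothetical subset of the demo data as (Name, SizeBytes) tuples.
var components = new[]
{
    (Name: "datatables.js", SizeBytes: 2345),
    (Name: "mystyle.css", SizeBytes: 8433),
    (Name: "jquery.js", SizeBytes: 12345),
    (Name: "myscript.ts", SizeBytes: 86124),
    (Name: "myscript.ts", SizeBytes: 86124),
};

// Builds the grouping key as a value tuple with named components.
static (int LowerBound, int UpperBound) SizeRange(int sizeBytes, int groupSize)
{
    int lowerBound = sizeBytes - sizeBytes % groupSize;
    return (lowerBound, lowerBound + groupSize);
}

// No IEqualityComparer argument: tuple equality is already value-based.
var bySizeRange = components.GroupBy(c => SizeRange(c.SizeBytes, 10000));

foreach (var group in bySizeRange.OrderBy(g => g.Key.LowerBound))
{
    Console.WriteLine($"Size range: {group.Key.LowerBound} - {group.Key.UpperBound}, items in group: {group.Count()}");
}
// Prints:
// Size range: 0 - 10000, items in group: 2
// Size range: 10000 - 20000, items in group: 1
// Size range: 80000 - 90000, items in group: 2
```

A dedicated class with a comparer is still the better fit when the key needs its own behaviour, such as the ToString override used above.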

We can also group by PageComponent. Again we’ll need an equality comparer. We say that two PageComponent objects are equal if their names are equal:

public class PageElementEqualityComparer : IEqualityComparer<PageComponent>
{
	public bool Equals(PageComponent x, PageComponent y)
	{
		return x.Name.Equals(y.Name, StringComparison.InvariantCultureIgnoreCase);
	}

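	public int GetHashCode(PageComponent obj)
	{
		// Must follow the same case-insensitive rules as Equals so that equal names hash equally
		return StringComparer.InvariantCultureIgnoreCase.GetHashCode(obj.Name);
	}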
}

Next we just print the name of each element and how many times it occurs in the list:

var duplicatesFinder = pageComponents.GroupBy(pe => pe, new PageElementEqualityComparer());
foreach (var group in duplicatesFinder)
{
	Console.WriteLine($"Element name: {group.Key.Name}, occurrence: {group.Count()}");				
}

Here’s the result:

Element name: jquery.js, occurrence: 1
Element name: datatables.js, occurrence: 1
Element name: knockout.js, occurrence: 1
Element name: greeter.ts, occurrence: 1
Element name: myscript.ts, occurrence: 2
Element name: mystyle.css, occurrence: 2
Element name: datatables.css, occurrence: 1
Element name: combined.css, occurrence: 1
Element name: externallink.html, occurrence: 1
Element name: googleads.html, occurrence: 1
Element name: nicepic.img, occurrence: 1
Element name: favicon.ico, occurrence: 1
Element name: daily.img, occurrence: 1
Element name: selfie.png, occurrence: 1
Element name: mybeautifulface.png, occurrence: 1
Element name: hello.img, occurrence: 1

We immediately see that myscript.ts and mystyle.css appear twice.
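
If we’re only interested in the duplicates, we can filter the groups by their item count. Here’s a minimal sketch on a hypothetical list of names. It groups the strings directly and uses the built-in StringComparer.OrdinalIgnoreCase instead of a custom comparer, which is a different comparison rule than the one in PageElementEqualityComparer:

```csharp
using System;
using System.Linq;

// Hypothetical subset of the demo names; "MyScript.ts" differs from "myscript.ts" only in casing.
string[] names = { "jquery.js", "myscript.ts", "mystyle.css", "MyScript.ts", "mystyle.css" };

// StringComparer implements IEqualityComparer<string>, so it can be passed to GroupBy directly.
var duplicates = names
    .GroupBy(n => n, StringComparer.OrdinalIgnoreCase)
    .Where(g => g.Count() > 1)
    .Select(g => new { Name = g.Key, Occurrences = g.Count() })
    .ToList();

foreach (var d in duplicates)
{
    Console.WriteLine($"{d.Name} occurs {d.Occurrences} times");
}
// Prints:
// myscript.ts occurs 2 times
// mystyle.css occurs 2 times
```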

GroupBy with element selector

In the examples so far the element type in the IGrouping object was always PageComponent. That is the default element selector, i.e. GroupBy puts the original PageComponent elements into each group. We can override that if the elements need to be something else.

In the next example we transform each PageComponent into an anonymous object with some pretty useless properties but it’s perfect for demo purposes:

var groupByTypeWithElementSelector = pageComponents.GroupBy(
	pe => pe.Type,
	pe =>
	new { NameCapitals = pe.Name.ToUpper(), ShortName = pe.Name.Substring(0, 5), CharCount = pe.Name.Length });
foreach (var group in groupByTypeWithElementSelector)
{
	Console.WriteLine($"Element type: {group.Key}");
	Console.WriteLine("============================");
	foreach (var pc in group)
	{
		Console.WriteLine($"Element name capitals: {pc.NameCapitals}, short name: {pc.ShortName}, chars: {pc.CharCount}");
	}
	Console.WriteLine("============================");
	Console.WriteLine();
}

Here’s the output:

Element type: script
============================
Element name capitals: JQUERY.JS, short name: jquer, chars: 9
Element name capitals: DATATABLES.JS, short name: datat, chars: 13
Element name capitals: KNOCKOUT.JS, short name: knock, chars: 11
Element name capitals: GREETER.TS, short name: greet, chars: 10
Element name capitals: MYSCRIPT.TS, short name: myscr, chars: 11
Element name capitals: MYSCRIPT.TS, short name: myscr, chars: 11
============================

Element type: stylesheet
============================
Element name capitals: MYSTYLE.CSS, short name: mysty, chars: 11
Element name capitals: DATATABLES.CSS, short name: datat, chars: 14
Element name capitals: COMBINED.CSS, short name: combi, chars: 12
Element name capitals: MYSTYLE.CSS, short name: mysty, chars: 11
============================

Element type: html
============================
Element name capitals: EXTERNALLINK.HTML, short name: exter, chars: 17
Element name capitals: GOOGLEADS.HTML, short name: googl, chars: 14
============================

Element type: image
============================
Element name capitals: NICEPIC.IMG, short name: nicep, chars: 11
Element name capitals: FAVICON.ICO, short name: favic, chars: 11
Element name capitals: DAILY.IMG, short name: daily, chars: 9
Element name capitals: SELFIE.PNG, short name: selfi, chars: 10
Element name capitals: MYBEAUTIFULFACE.PNG, short name: mybea, chars: 19
Element name capitals: HELLO.IMG, short name: hello, chars: 9
============================

Note that we return an anonymous object from the element selector. However, we’re equally free to return a “normal” object if we wish to do so. We just need a class with those properties, e.g. TransformedPageComponentName, and write “new TransformedPageComponentName() {…}” instead.
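
Here’s what the named-class variant could look like. Only the class name TransformedPageComponentName comes from the text above. Its properties are an assumption that mirrors the anonymous type, and the data is a hypothetical subset of the demo list modelled as tuples:

```csharp
using System;
using System.Linq;

// Hypothetical subset of the demo data as (Name, Type) tuples.
var components = new[]
{
    (Name: "jquery.js", Type: "script"),
    (Name: "greeter.ts", Type: "script"),
    (Name: "mystyle.css", Type: "stylesheet"),
};

// The element selector now returns a named class instead of an anonymous object.
var groupByType = components.GroupBy(
    c => c.Type,
    c => new TransformedPageComponentName
    {
        NameCapitals = c.Name.ToUpperInvariant(),
        ShortName = c.Name.Substring(0, 5),
        CharCount = c.Name.Length
    });

foreach (var group in groupByType)
{
    foreach (TransformedPageComponentName t in group)
    {
        Console.WriteLine($"{group.Key}: {t.NameCapitals}, {t.ShortName}, {t.CharCount}");
    }
}
// Prints:
// script: JQUERY.JS, jquer, 9
// script: GREETER.TS, greet, 10
// stylesheet: MYSTYLE.CSS, mysty, 11

// Assumed shape of the class: it mirrors the properties of the anonymous type.
public class TransformedPageComponentName
{
    public string NameCapitals { get; set; }
    public string ShortName { get; set; }
    public int CharCount { get; set; }
}
```

The benefit of a named class is that the grouped elements can be returned from a method or passed around, which is not practical with anonymous types.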

GroupBy with a result selector

The GroupBy examples so far all return an enumerable of IGrouping objects. That is the default result selector. However, the result selector can also be overridden with a Func. This Func accepts a grouping key and the sequence of elements within the group and returns an object of some other type.

We return to the page size grouping example. This time we want to find the following:

  • The item count in each interval
  • The max element size in each interval
  • The min element size in each interval

Here’s the implementation:

var groupByComponentSizeWithResultSelector = pageComponents.GroupBy(pe =>
{
	int groupSize = 10000;
	int lowerBound = pe.SizeBytes - pe.SizeBytes % groupSize;
	int upperBound = lowerBound + groupSize;
	return new PageComponentSizeGrouper() { LowerBound = lowerBound, UpperBound = upperBound };
}, 
(sizeGroup, pageElements) => 
	new
	{
		ElementCountInGroup = pageElements.Count(),
		MinElementSize = pageElements.Min(pe => pe.SizeBytes),
		MaxElementSize = pageElements.Max(pe => pe.SizeBytes),
		GroupingKeyOrderer = sizeGroup.LowerBound,
		GroupingKeyToString = sizeGroup.ToString()				
	}
, new PageComponentSizeGrouperEqualityComparer());

foreach (var group in groupByComponentSizeWithResultSelector.OrderBy(g => g.GroupingKeyOrderer))
{
	Console.WriteLine($"Size range: {group.GroupingKeyToString}");
	Console.WriteLine($"Item count in group: {group.ElementCountInGroup}");
	Console.WriteLine($"Min element size: {group.MinElementSize}");
	Console.WriteLine($"Max element size: {group.MaxElementSize}");
	Console.WriteLine("============================");
	Console.WriteLine();
}

Note the parameters of the result selector:

  • sizeGroup: this is the grouping key and will be of type PageComponentSizeGrouper. Hence we can access its members, like LowerBound and ToString, when building the result
  • pageElements: this is the sequence of page elements within the group so we can apply any LINQ operator to it

Here’s the output:

Size range: 0 - 10000
Item count in group: 3
Min element size: 2345
Max element size: 8433
============================

Size range: 10000 - 20000
Item count in group: 3
Min element size: 10794
Max element size: 15873
============================

Size range: 20000 - 30000
Item count in group: 4
Min element size: 22922
Max element size: 27470
============================

Size range: 50000 - 60000
Item count in group: 1
Min element size: 52667
Max element size: 52667
============================

Size range: 60000 - 70000
Item count in group: 2
Min element size: 64152
Max element size: 65046
============================

Size range: 70000 - 80000
Item count in group: 3
Min element size: 70386
Max element size: 78416
============================

Size range: 80000 - 90000
Item count in group: 2
Min element size: 86124
Max element size: 86124
============================
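
A result selector does the same job as a plain GroupBy followed by a Select over the groups. The sketch below puts the two forms side by side on a hypothetical subset of the demo data, modelled as tuples and grouped by type to keep it short:

```csharp
using System;
using System.Linq;

// Hypothetical subset of the demo data as (Name, Type, SizeBytes) tuples.
var components = new[]
{
    (Name: "jquery.js", Type: "script", SizeBytes: 12345),
    (Name: "knockout.js", Type: "script", SizeBytes: 70386),
    (Name: "mystyle.css", Type: "stylesheet", SizeBytes: 8433),
};

// Variant 1: the result selector is passed straight to GroupBy.
var withResultSelector = components.GroupBy(
    c => c.Type,
    (type, items) => new { Type = type, Count = items.Count(), MaxSize = items.Max(i => i.SizeBytes) });

// Variant 2: plain GroupBy followed by Select over the IGrouping objects.
var withSelect = components
    .GroupBy(c => c.Type)
    .Select(g => new { Type = g.Key, Count = g.Count(), MaxSize = g.Max(i => i.SizeBytes) });

// Both produce the same anonymous type, so the sequences can be compared element by element.
Console.WriteLine(withResultSelector.SequenceEqual(withSelect)); // True
```

Which form to use is largely a matter of taste. The Select variant keeps the grouping and the projection as two separate steps.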

GroupBy with element and result selector

We’ll now extend the above example a little. We’ll override both the element and result selectors. This gets us to the most extended GroupBy overload where we supply a key selector, an element selector, a result selector and an equality comparer for the grouping key. We’ll extend the previous result set with the names of the elements with the min and max size. For the element selector we return the item name and a modified size where we simply deduct 1000 bytes from the original size:

var groupByComponentSizeWithResultAndElementSelector = pageComponents.GroupBy(pe =>
{
	int groupSize = 10000;
	int lowerBound = pe.SizeBytes - pe.SizeBytes % groupSize;
	int upperBound = lowerBound + groupSize;
	return new PageComponentSizeGrouper() { LowerBound = lowerBound, UpperBound = upperBound };
}, pe =>
	new
	{
		ElementName = pe.Name,
		ModifiedSize = pe.SizeBytes - 1000
	},
(sizeGroup, pageElements) =>
{
	int minElementSize = pageElements.Min(pe => pe.ModifiedSize);
	int maxElementSize = pageElements.Max(pe => pe.ModifiedSize);
	string minElementName = (from pe in pageElements where pe.ModifiedSize == minElementSize select pe.ElementName).FirstOrDefault();
	string maxElementName = (from pe in pageElements where pe.ModifiedSize == maxElementSize select pe.ElementName).FirstOrDefault();
	return new
	{
		ElementCountInGroup = pageElements.Count(),
		MinElementSize = minElementSize,
		MaxElementSize = maxElementSize,
		MinElementName = minElementName,
		MaxElementName = maxElementName,
		GroupingKeyOrderer = sizeGroup.LowerBound,
		GroupingKeyToString = sizeGroup.ToString()
	};
}
, new PageComponentSizeGrouperEqualityComparer());

foreach (var group in groupByComponentSizeWithResultAndElementSelector.OrderBy(g => g.GroupingKeyOrderer))
{
	Console.WriteLine($"Size range: {group.GroupingKeyToString}");
	Console.WriteLine($"Item count in group: {group.ElementCountInGroup}");
	Console.WriteLine($"Min element size: {group.MinElementSize}");
	Console.WriteLine($"Max element size: {group.MaxElementSize}");
	Console.WriteLine($"Min element name: {group.MinElementName}");
	Console.WriteLine($"Max element name: {group.MaxElementName}");
	Console.WriteLine("============================");
	Console.WriteLine();
}

Now the pageElements sequence passed to the result selector holds the elements returned by the element selector, i.e. the anonymous objects with the properties ElementName and ModifiedSize. Note how we can access those properties in the result selector. We have no access to the original PageComponent objects here since we overrode the element selector.

Here’s the output:

Size range: 0 - 10000
Item count in group: 3
Min element size: 1345
Max element size: 7433
Min element name: datatables.js
Max element name: mystyle.css
============================

Size range: 10000 - 20000
Item count in group: 3
Min element size: 9794
Max element size: 14873
Min element name: greeter.ts
Max element name: googleads.html
============================

Size range: 20000 - 30000
Item count in group: 4
Min element size: 21922
Max element size: 26470
Min element name: selfie.png
Max element name: datatables.css
============================

Size range: 50000 - 60000
Item count in group: 1
Min element size: 51667
Max element size: 51667
Min element name: daily.img
Max element name: daily.img
============================

Size range: 60000 - 70000
Item count in group: 2
Min element size: 63152
Max element size: 64046
Min element name: favicon.ico
Max element name: hello.img
============================

Size range: 70000 - 80000
Item count in group: 3
Min element size: 69386
Max element size: 77416
Min element name: knockout.js
Max element name: mybeautifulface.png
============================

Size range: 80000 - 90000
Item count in group: 2
Min element size: 85124
Max element size: 85124
Min element name: myscript.ts
Max element name: myscript.ts
============================
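
The result selector above first finds the min and max sizes and then searches for the matching element names. On .NET 6 or later the MinBy and MaxBy operators return the matching element directly. Here’s a minimal sketch assuming that framework version. The data mirrors the 70000 - 80000 group above:

```csharp
using System;
using System.Linq;

// Elements of one group, shaped like the output of the element selector above.
var elements = new[]
{
    (ElementName: "knockout.js", ModifiedSize: 69386),
    (ElementName: "externallink.html", ModifiedSize: 71639),
    (ElementName: "mybeautifulface.png", ModifiedSize: 77416),
};

// MinBy/MaxBy (available from .NET 6) return the element itself rather than the selected value.
var smallest = elements.MinBy(e => e.ModifiedSize);
var largest = elements.MaxBy(e => e.ModifiedSize);

Console.WriteLine($"Min: {smallest.ElementName} ({smallest.ModifiedSize})");
Console.WriteLine($"Max: {largest.ElementName} ({largest.ModifiedSize})");
// Prints:
// Min: knockout.js (69386)
// Max: mybeautifulface.png (77416)
```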

We’ve successfully transformed the original PageComponent list to something utterly different. GroupBy is a truly powerful LINQ operator for object mappings, transformations and even statistical calculations.

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.