Finding the set difference between two sequences using the LINQ Except operator

Say you have the following two sequences:

string[] first = new string[] {"hello", "hi", "good evening", "good day", "good morning", "goodbye" };
string[] second = new string[] {"whatsup", "how are you", "hello", "bye", "hi"};

If you’d like to find the values that appear in “first” but not in “second”, the LINQ Except operator makes it easy:

IEnumerable<string> except = first.Except(second);
foreach (string value in except)
{
	Console.WriteLine(value);
}

The “except” variable will contain “good evening”, “good day”, “good morning” and “goodbye”, because “hello” and “hi” also appear in “second”.
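One detail worth keeping in mind is that Except behaves like a set operation, so a value that occurs several times in the first sequence shows up only once in the result. The following is a minimal sketch of that behaviour; the ExceptDemo class, the Difference helper and the sample greetings are made up for illustration and are not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptDemo
{
	// Hypothetical helper: materialises the set difference into a list.
	public static List<string> Difference(IEnumerable<string> first, IEnumerable<string> second)
	{
		return first.Except(second).ToList();
	}

	public static void Main()
	{
		// "good day" appears twice in the first sequence.
		string[] first = { "hello", "good day", "hi", "good day", "goodbye" };
		string[] second = { "hello", "hi" };

		// Prints "good day" and "goodbye", each exactly once.
		foreach (string value in Difference(first, second))
		{
			Console.WriteLine(value);
		}
	}
}
```

If you need to keep the duplicates, Except is probably not the right tool; filtering with Where against a HashSet built from the second sequence is one alternative.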

The Except operator uses an equality comparer to determine whether two elements are equal. The default equality comparer for strings compares their contents, so you didn’t have to implement anything special here. However, if the two sequences hold objects of a custom class that doesn’t override Equals and GetHashCode, the default comparer falls back to reference equality, which won’t be enough:

public class Singer
{
	public int Id { get; set; }
	public string FirstName { get; set; }
	public string LastName { get; set; }
}

IEnumerable<Singer> singersA = new List<Singer>() 
{
	new Singer(){Id = 1, FirstName = "Freddie", LastName = "Mercury"} 
	, new Singer(){Id = 2, FirstName = "Elvis", LastName = "Presley"}
	, new Singer(){Id = 3, FirstName = "Chuck", LastName = "Berry"}
};

IEnumerable<Singer> singersB = new List<Singer>() 
{
	new Singer(){Id = 1, FirstName = "Freddie", LastName = "Mercury"} 
	, new Singer(){Id = 2, FirstName = "Elvis", LastName = "Presley"}
	, new Singer(){Id = 4, FirstName = "Ray", LastName = "Charles"}
	, new Singer(){Id = 5, FirstName = "David", LastName = "Bowie"}
};

IEnumerable<Singer> singersDiff = singersA.Except(singersB);
foreach (Singer s in singersDiff)
{
	Console.WriteLine(s.Id);
}

The singersDiff sequence will contain every element of singersA, because each Singer object is a separate instance and the default comparer compares references. This is where the second overload of the operator comes in, which accepts your own equality comparer:

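public class DefaultSingerComparer : IEqualityComparer<Singer>
{
	public bool Equals(Singer x, Singer y)
	{
		// The same instance (or two nulls) is equal; a single null is not.
		if (ReferenceEquals(x, y)) return true;
		if (x == null || y == null) return false;
		return x.Id == y.Id;
	}

	public int GetHashCode(Singer obj)
	{
		// Must agree with Equals: singers with equal IDs yield equal hash codes.
		return obj == null ? 0 : obj.Id.GetHashCode();
	}
}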

So two Singer objects are considered equal if their IDs are equal. You can use this comparer as follows:

IEnumerable<Singer> singersDiff = singersA.Except(singersB, new DefaultSingerComparer());
foreach (Singer s in singersDiff)
{
	Console.WriteLine(s.Id);
}

singersDiff will now correctly include singer #3 only.
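If you are on .NET 6 or later, the ExceptBy operator offers a way to get the same result without writing a comparer class. The sketch below assumes that runtime; the ExceptByDemo class and the DifferenceIds helper are hypothetical names introduced only for this illustration, and the Singer class is repeated so the snippet stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Singer
{
	public int Id { get; set; }
	public string FirstName { get; set; }
	public string LastName { get; set; }
}

public static class ExceptByDemo
{
	// Hypothetical helper: IDs of the singers in "a" whose ID does not occur in "b".
	public static List<int> DifferenceIds(IEnumerable<Singer> a, IEnumerable<Singer> b)
	{
		// ExceptBy takes the keys to exclude and a key selector for the first sequence.
		return a.ExceptBy(b.Select(s => s.Id), s => s.Id)
			.Select(s => s.Id)
			.ToList();
	}

	public static void Main()
	{
		var singersA = new List<Singer>
		{
			new Singer { Id = 1, FirstName = "Freddie", LastName = "Mercury" },
			new Singer { Id = 2, FirstName = "Elvis", LastName = "Presley" },
			new Singer { Id = 3, FirstName = "Chuck", LastName = "Berry" }
		};
		var singersB = new List<Singer>
		{
			new Singer { Id = 1, FirstName = "Freddie", LastName = "Mercury" },
			new Singer { Id = 2, FirstName = "Elvis", LastName = "Presley" },
			new Singer { Id = 4, FirstName = "Ray", LastName = "Charles" },
			new Singer { Id = 5, FirstName = "David", LastName = "Bowie" }
		};

		// Prints 3, the only ID in singersA that is missing from singersB.
		foreach (int id in DifferenceIds(singersA, singersB))
		{
			Console.WriteLine(id);
		}
	}
}
```

A dedicated comparer is still the better fit when the same equality rule is reused in several places, for example with Distinct, Union or Intersect.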


About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.