Finding the set difference between two sequences using the LINQ Except operator
March 10, 2017
Say you have the following two sequences:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] first = new string[] { "hello", "hi", "good evening", "good day", "good morning", "goodbye" };
string[] second = new string[] { "whatsup", "how are you", "hello", "bye", "hi" };
```
If you’d like to find the values that appear only in “first”, the LINQ Except operator makes this easy:
```csharp
IEnumerable<string> except = first.Except(second);
foreach (string value in except)
{
    Console.WriteLine(value);
}
```
The “except” sequence will contain “good evening”, “good day”, “good morning” and “goodbye”, because “hello” and “hi” also appear in “second”.
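Except behaves as a set operation: every value in the result is distinct, and the values keep the order in which they first occur in the first sequence. A minimal sketch with hypothetical arrays (`left` and `right` are illustration names, not from the post) where the first sequence contains a duplicate:

```csharp
using System;
using System.Linq;

// Hypothetical input: "b" appears twice in the first sequence.
string[] left = { "b", "a", "b", "c" };
string[] right = { "c" };

// "c" is removed because it is in right; "b" is returned only once.
string[] result = left.Except(right).ToArray();

Console.WriteLine(string.Join(", ", result)); // prints "b, a"
```

So if you need to keep duplicates from the first sequence, Except is not the right tool; a `Where` filter against a `HashSet<string>` built from the second sequence would be one alternative.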
The Except operator uses an equality comparer to decide whether two elements are equal. For strings, the default comparer compares values, so you didn’t have to supply a comparer of your own. With custom objects, however, the default comparer falls back to reference equality unless the class overrides Equals and GetHashCode, and that is usually not enough:
```csharp
public class Singer
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

IEnumerable<Singer> singersA = new List<Singer>()
{
    new Singer() { Id = 1, FirstName = "Freddie", LastName = "Mercury" },
    new Singer() { Id = 2, FirstName = "Elvis", LastName = "Presley" },
    new Singer() { Id = 3, FirstName = "Chuck", LastName = "Berry" }
};

IEnumerable<Singer> singersB = new List<Singer>()
{
    new Singer() { Id = 1, FirstName = "Freddie", LastName = "Mercury" },
    new Singer() { Id = 2, FirstName = "Elvis", LastName = "Presley" },
    new Singer() { Id = 4, FirstName = "Ray", LastName = "Charles" },
    new Singer() { Id = 5, FirstName = "David", LastName = "Bowie" }
};

// No comparer given: Singer does not override Equals, so references are compared.
IEnumerable<Singer> singersDiff = singersA.Except(singersB);
foreach (Singer s in singersDiff)
{
    Console.WriteLine(s.Id);
}
```
Because every Singer object is a separate instance, singersDiff will contain all three singers from singersA, including Freddie Mercury and Elvis Presley, even though their property values match entries in singersB. This is where the second overload of Except comes in: it accepts an IEqualityComparer&lt;Singer&gt; that defines your own notion of equality:
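```csharp
// Treats two Singer objects as equal when their Id values match.
public class DefaultSingerComparer : IEqualityComparer<Singer>
{
    public bool Equals(Singer x, Singer y)
    {
        if (ReferenceEquals(x, y)) return true;   // same instance, or both null
        if (x is null || y is null) return false; // only one of them is null
        return x.Id == y.Id;
    }

    // Must agree with Equals: singers with equal Ids get equal hash codes.
    public int GetHashCode(Singer obj)
    {
        return obj is null ? 0 : obj.Id.GetHashCode();
    }
}
```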
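In other words, two Singer objects count as equal if their IDs are equal. GetHashCode has to be consistent with Equals, because Except looks elements up by hash code internally. You can pass the comparer to Except as follows: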
```csharp
IEnumerable<Singer> singersDiff = singersA.Except(singersB, new DefaultSingerComparer());
foreach (Singer s in singersDiff)
{
    Console.WriteLine(s.Id);
}
```
singersDiff will now correctly contain only singer #3 (Chuck Berry), since IDs 1 and 2 also appear in singersB.
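The comparer overload is not limited to your own classes. As a sketch, passing the built-in `StringComparer.OrdinalIgnoreCase` to the string example from the beginning makes the comparison case-insensitive. The upper-case “HELLO” and “Hi” entries in `second` are a hypothetical variation for illustration:

```csharp
using System;
using System.Linq;

string[] first = { "hello", "hi", "good evening", "good day", "good morning", "goodbye" };
// Hypothetical variation: "hello" and "hi" now differ from first only in case.
string[] second = { "whatsup", "how are you", "HELLO", "bye", "Hi" };

// Default comparer: case matters, so nothing in first is removed.
string[] caseSensitive = first.Except(second).ToArray();

// StringComparer.OrdinalIgnoreCase implements IEqualityComparer<string>,
// so it fits the same overload that DefaultSingerComparer is passed to.
string[] caseInsensitive = first.Except(second, StringComparer.OrdinalIgnoreCase).ToArray();

Console.WriteLine(caseSensitive.Length); // prints 6
Console.WriteLine(string.Join(", ", caseInsensitive));
// prints "good evening, good day, good morning, goodbye"
```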