Determine if two strings are anagrams with C# .NET

Two strings are anagrams if they are made up of the same set of characters. Examples:

  • “hello” and “loleh”
  • “123123” and “312312”
  • “qwerty” and “wretqy”

The degree of “anagram-ness” can vary:

  • ignore case?
  • ignore non-numeric characters?
  • ignore whitespace?

In this post we’ll only consider word-characters only and the comparison will be case-insensitive to make the problem more interesting.

We’ll write a function that accepts two integers and returns a boolean: true if the strings are anagrams, otherwise false.

We’ll look at two solutions out of many that exist out there:

  • using a character map
  • using string sort

What is a character map? It is a map where the key is of type char and the value if of type integer. We collect the unique characters of a string and count how many times each character occurs in the string. E.g.:

CharCount
‘f’2
‘g’1
‘i’2
‘d’1
‘o’1

We do that for both strings and compare the counts of each unique character.

Let’s start with a skeleton:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

namespace Algorithms
{
    public class Anagram
    {
        public bool AreAnagrams(string first, string second)
        {
            if (first == null || second == null) return false;
            return AreAnagramsWithCharMap(first, second);  
            //return AreAnagramsWithSortedStrings(first, second);          
        }

        private bool AreAnagramsWithCharMap(string first, string second)
        {
            return false;
        }

        private bool AreAnagramsWithSortedStrings(string first, string second)
        {
            return false;
        }

        private Dictionary<char, int> MakeCharacterMap(string input)
        {
            string cleaned = CleanInput(input);
            return cleaned.Distinct().ToDictionary(c => c, c => cleaned.Count(s => s == c));
        }

        private string CleanInput(string input)
        {
            return Regex.Replace(input, @"[_]+|[^\w]+|[\d-]+", "").ToLower();
        }
    }
}

We start by some null-checking and return false if either of the two inputs is null. The AreAnagramsWithCharMap function has been wired up but it’s not yet implemented. The function for the second solution AreAnagramsWithSortedStrings has also been prepared.

We have two private functions as well:

  • CleanInput
    • this one takes a string and strips all underscores, white space and non-word characters from it and returns the lower-case version of it
  • MakeCharacterMap
    • first we clean the incoming input string
    • second we use a couple of LINQ operators to build the character map
      • Distinct() to gather the unique characters from the string
      • ToDictionary() to build the map itself, the key will be the character itself and for the value we count the number of occurrences of that character in the string

Let’s look at the implementation of AreAnagramsWithCharMap:

private bool AreAnagramsWithCharMap(string first, string second)
{
    var charMapFirst = MakeCharacterMap(first);
    var charMapSecond = MakeCharacterMap(second);
    if (charMapFirst.Count != charMapSecond.Count)
    {
        return false;
    }
    return charMapFirst.All(kvp => charMapSecond.ContainsKey(kvp.Key) ? kvp.Value == charMapSecond[kvp.Key] : false);
}

We first create the two character maps. If they differ in size then we can immediately return false. It means that one of the strings has at least one more character than the other so it’s pointless to continue.

Otherwise we make use of the All LINQ operator which return true of all the elements of a collection fulfil a certain condition. The condition is based on two parameters:

  • whether the character map contains the character as the key in the first place
  • whether that character occurs with the same frequency as in the source map

If both conditions are fulfilled for all characters in the character maps then we return true.

Here’s a set of unit tests:

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace Algorithms.Tests
{
    [TestClass]
    public class AnagramTests
    {
        [TestMethod]
        public void AreAnagramsTests()
        {
            Anagram a = new Anagram();
            Assert.IsTrue(a.AreAnagrams("hello", "___ hllOe!! 456 ???"));
            Assert.IsFalse(a.AreAnagrams("sdfs", null));
            Assert.IsFalse(a.AreAnagrams(null, "sdfs"));
            Assert.IsFalse(a.AreAnagrams(null, null));
            Assert.IsFalse(a.AreAnagrams("qwerty", "yewr"));
            Assert.IsFalse(a.AreAnagrams("qwerty", "qwertyuiop"));
            Assert.IsTrue(a.AreAnagrams("? par**lIame%%nt !", "partIAL men"));
            Assert.IsTrue(a.AreAnagrams("a gentleman", "elegant man"));
        }
    }
}

The will all pass.

The second solution takes the two cleaned strings, puts them in order and compares them. This solution is slightly less performant than the first one due to the string ordering though. The map comparison doesn’t involve any ordering so it’s somewhat quicker.

Here’s the implemented function:

private bool AreAnagramsWithSortedStrings(string first, string second)
{
    string sortedOne = string.Concat(CleanInput(first).OrderBy(c => c));
    string sortedTwo = string.Concat(CleanInput(second).OrderBy(c => c));            
    return sortedOne == sortedTwo;
}

We again clean the input string and then call the OrderBy LINQ operator. It returns an ordered collection of characters from the string, i.e. not an ordered string. Hence we need to embed this bit of code in string.Concat so that we build the string again from the characters. Finally we simply compare the two strings.

Wire up this function from the main one and rerun the unit tests. They will still pass.

public bool AreAnagrams(string first, string second)
{
    if (first == null || second == null) return false;
    //return AreAnagramsWithCharMap(first, second);
    return AreAnagramsWithSortedStrings(first, second);
}

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Elliot Balynn's Blog

A directory of wonderful thoughts

HarsH ReaLiTy

A Good Blog is Hard to Find

Software Engineering

Web development

Disparate Opinions

Various tidbits

chsakell's Blog

WEB APPLICATION DEVELOPMENT TUTORIALS WITH OPEN-SOURCE PROJECTS

Once Upon a Camayoc

Bite-size insight on Cyber Security for the not too technical.

%d bloggers like this: