Finding all href values in an HTML string with C# .NET

Say you’d like to collect all link URLs in an HTML text, e.g.:

     <a href="">Visit fantasticsite!</a>
     <a href="">Read the news</a>

The goal is to find the URLs in the two href attributes. An XML parser could do the job if the HTML code happens to be well-formed XML. This is of course not always the case, so the dreaded regular expressions provide a viable alternative.
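To make the XML option concrete, here is a minimal sketch using LINQ to XML. `XmlHrefFinder` and `TryFindHrefs` are hypothetical names; the sketch assumes the `<a>` elements carry no XML namespace, and it simply returns null when `XDocument.Parse` rejects the input as malformed:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XmlHrefFinder
{
    // Hypothetical helper: parses the input as XML and returns the href
    // values of all <a> elements, or null if the input is not well-formed XML.
    // Assumes the <a> elements are in no XML namespace (no xmlns declaration).
    public static List<string> TryFindHrefs(string html)
    {
        try
        {
            XDocument doc = XDocument.Parse(html);
            return doc.Descendants("a")
                      .Select(a => (string)a.Attribute("href")) // null when the attribute is missing
                      .Where(href => href != null)              // skip <a> elements without an href
                      .ToList();
        }
        catch (XmlException)
        {
            // Typical real-world HTML (unclosed tags, mismatched end tags) ends up here.
            return null;
        }
    }
}
```

For example, passing `<p><a href="a.html">A</p>` returns null because `</p>` arrives while `<a>` is still open: markup that browsers tolerate but an XML parser does not.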

The following code uses a Regex to find every href attribute in the input text and prints the groups captured for each match:

using System;
using System.Text.RegularExpressions;

class Program
{
	static void Main(string[] args)
	{
		string input = "<html><p><a href=\"\">Visit fantasticsite!</a></p><div><a href=\"\">Read the news</a></div></html>";
		FindHrefs(input);
		Console.WriteLine("Main done...");
	}

	private static void FindHrefs(string input)
	{
		// Both alternatives capture the value into group 1: a double-quoted value, or an unquoted run up to whitespace
		Regex regex = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))", RegexOptions.IgnoreCase);
		Match match;
		for (match = regex.Match(input); match.Success; match = match.NextMatch())
		{
			Console.WriteLine("Found a href. Groups: ");
			foreach (Group group in match.Groups) // Groups[0] is the whole match, Groups[1] the href value
				Console.WriteLine("Group value: {0}", group);
		}
	}
}

This gives the following output:

[Screenshot: FindHrefs Regexp in action]
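Each match reports two groups: `Groups[0]` is the whole match (the `href="..."` text) and `Groups[1]` holds just the value. If you only want the URLs, you can read group 1 directly. A minimal sketch with the same pattern, where `HrefExtractor` and `FindHrefValues` are hypothetical names and `Regex.Matches` stands in for the manual `Match`/`NextMatch` loop:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // Same pattern as in FindHrefs; both alternatives capture into group 1.
    private static readonly Regex HrefRegex = new Regex(
        "href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns only the captured values (Groups[1]),
    // without the surrounding href="..." text that Groups[0] contains.
    public static List<string> FindHrefValues(string input)
    {
        var values = new List<string>();
        foreach (Match match in HrefRegex.Matches(input))
        {
            values.Add(match.Groups[1].Value);
        }
        return values;
    }
}
```

Because `RegexOptions.IgnoreCase` is set and `\s*` surrounds the equals sign, attributes written as `HREF = "b.html"` are picked up as well.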

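The pattern in FindHrefs only handles double-quoted values explicitly; any other value falls through to `\S+`, which runs until the next whitespace. On its own, `<a href=b.html>B</a>` would therefore capture `b.html>B</a>`, and a single-quoted value is captured with its quotes. If your input may contain such attributes, a variant along these lines could help. The adjusted pattern is an assumption, not part of the original code, and `HrefVariant` is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefVariant
{
    // Assumed variant of the FindHrefs pattern:
    //  - quoted values may use double or single quotes;
    //  - unquoted values stop at whitespace or '>' instead of running on.
    // It does not check that the opening and closing quotes match.
    private static readonly Regex HrefRegex = new Regex(
        "href\\s*=\\s*(?:[\"'](?<1>[^\"']*)[\"']|(?<1>[^>\\s]+))",
        RegexOptions.IgnoreCase);

    public static List<string> FindHrefValues(string input)
    {
        var values = new List<string>();
        foreach (Match match in HrefRegex.Matches(input))
            values.Add(match.Groups[1].Value); // group 1 holds the bare value
        return values;
    }
}
```

Like any regex approach, it will still match href= text inside comments or scripts, so it suits quick scraping rather than full HTML processing.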


About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.
