Finding all href values in an HTML string with C# .NET

Say you’d like to collect all link URLs in an HTML document, for example:

<html>
   <p>
     <a href="http://www.fantasticsite.com">Visit fantasticsite!</a>
   </p>
   <div>
     <a href="http://www.cnn.com">Read the news</a>
   </div>
</html>

The goal is to find “http://www.fantasticsite.com” and “http://www.cnn.com”. Using an XML parser could be a solution if the HTML code is well-formed XML. That is of course not always the case, so the dreaded regular expressions provide a viable alternative.
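For the well-formed case, the XML parser route might look like the sketch below. It uses XDocument from System.Xml.Linq; the class and method names (XmlHrefFinder, FindHrefs) are made up for illustration. Element and attribute names are matched case-sensitively here, and XDocument.Parse throws an XmlException as soon as the markup is not well-formed, which is the reason the rest of this post falls back to a regular expression.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlHrefFinder
{
    // Hypothetical helper: parses the input as XML and returns the href
    // attribute of every <a> element. Throws System.Xml.XmlException if
    // the input is not well-formed XML.
    public static List<string> FindHrefs(string wellFormedHtml)
    {
        XDocument document = XDocument.Parse(wellFormedHtml);
        return document
            .Descendants("a")                                   // all <a> elements at any depth
            .Select(anchor => (string)anchor.Attribute("href")) // null when the attribute is missing
            .Where(href => href != null)
            .ToList();
    }
}

public static class XmlHrefDemo
{
    public static void Main()
    {
        string input = "<html><p><a href=\"http://www.fantasticsite.com\">Visit fantasticsite!</a></p><div><a href=\"http://www.cnn.com\">Read the news</a></div></html>";
        foreach (string href in XmlHrefFinder.FindHrefs(input))
        {
            Console.WriteLine(href);
        }
    }
}
```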

The following code uses the Regex class to find every href attribute in the input text and capture its value:

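using System;
using System.Text.RegularExpressions;

class Program
{
	static void Main(string[] args)
	{
		string input = "<html><p><a href=\"http://www.fantasticsite.com\">Visit fantasticsite!</a></p><div><a href=\"http://www.cnn.com\">Read the news</a></div></html>";
		FindHrefs(input);
		Console.WriteLine("Main done...");
		Console.ReadKey();
	}

	private static void FindHrefs(string input)
	{
		// The href value is either wrapped in double quotes (first alternative)
		// or an unquoted run of non-whitespace characters (second alternative).
		// Both alternatives capture the value into group 1.
		Regex regex = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))", RegexOptions.IgnoreCase);
		for (Match match = regex.Match(input); match.Success; match = match.NextMatch())
		{
			Console.WriteLine("Found a href. Groups: ");
			// Group 0 is the entire match, group 1 is the URL on its own.
			foreach (Group group in match.Groups)
			{
				Console.WriteLine("Group value: {0}", group);
			}
		}
	}
}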

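For each match the loop prints two groups: group 0, the entire matched text such as href="http://www.cnn.com", and group 1, the URL on its own.

[Screenshot: FindHrefs Regexp in action]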
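If only the URLs are needed, group 1 can be read directly instead of looping over every group. The sketch below keeps the same pattern, written as a verbatim string so the backslashes need no doubling, and collects the values into a list. The names HrefCollector and CollectHrefs are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefCollector
{
    // Same pattern as in FindHrefs. In a verbatim string a double quote is
    // written as "" and backslashes are taken literally.
    private static readonly Regex HrefRegex = new Regex(
        @"href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the captured URL (group 1) of every match.
    public static List<string> CollectHrefs(string input)
    {
        var urls = new List<string>();
        foreach (Match match in HrefRegex.Matches(input))
        {
            urls.Add(match.Groups[1].Value);
        }
        return urls;
    }
}

public static class HrefCollectorDemo
{
    public static void Main()
    {
        string input = "<html><p><a href=\"http://www.fantasticsite.com\">Visit fantasticsite!</a></p><div><a href=\"http://www.cnn.com\">Read the news</a></div></html>";
        foreach (string url in HrefCollector.CollectHrefs(input))
        {
            Console.WriteLine(url);
        }
    }
}
```

One caveat with this pattern: the unquoted alternative, \S+, runs up to the next whitespace character, so for markup like <a href=http://x.com>Click</a> it captures the closing bracket and the link text as well. Single-quoted values are also picked up by that alternative, with the quotes included.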



About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.

4 Responses to Finding all href values in an HTML string with C# .NET

  1. rsp says:

    Nice, Thanks Andras.

  2. Subhash PM says:

    Good Article… Thanks.. 🙂

  3. mathewpoc says:

    Excellent work. I appreciate it.

  4. Faniel Joseph Selvaraj says:

    Thanks Andras… it was very much useful…Thank you…
