Extracting information from a text using Regex and Match in C# .NET
December 19, 2014
Occasionally you need to extract some information from a free-text form. Consider the following text:
First name: Elvis
Last name: Presley
Address: 1 Heaven Street
City: Memphis
State: TN
Zip: 12345
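Say you need to extract the full name, the address, the city, the state and the zip code into a pipe-delimited string. The following function is one option. It describes each line of the form with a named capturing group such as (?<fn>...) and then reads the captured values back by name from Match.Groups: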
// Requires: using System.Text; using System.Text.RegularExpressions;
private static string ExtractJist(string freeText)
{
    // One named group per line of the form.
    // [^\r\n]* captures the rest of the line without a trailing carriage return,
    // and \r?\n accepts both Windows (\r\n) and Unix (\n) line endings.
    StringBuilder patternBuilder = new StringBuilder();
    patternBuilder.Append(@"First name: (?<fn>[^\r\n]*)\r?\n")
        .Append(@"Last name: (?<ln>[^\r\n]*)\r?\n")
        .Append(@"Address: (?<address>[^\r\n]*)\r?\n")
        .Append(@"City: (?<city>[^\r\n]*)\r?\n")
        .Append(@"State: (?<state>[^\r\n]*)\r?\n")
        .Append(@"Zip: (?<zip>[^\r\n]*)");

    Match match = Regex.Match(freeText, patternBuilder.ToString(), RegexOptions.IgnoreCase);

    // Read the captured values back by group name.
    string fullname = string.Concat(match.Groups["fn"].Value, " ", match.Groups["ln"].Value);
    string address = match.Groups["address"].Value;
    string city = match.Groups["city"].Value;
    string state = match.Groups["state"].Value;
    string zip = match.Groups["zip"].Value;

    return string.Concat(fullname, "|", address, "|", city, "|", state, "|", zip);
}
Call the function as follows:
string source = @"First name: Elvis
Last name: Presley
Address: 1 Heaven Street
City: Memphis
State: TN
Zip: 12345
";
string extracted = ExtractJist(source);
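One thing to keep in mind: Regex.Match does not throw when the text fails to match. It returns a Match whose Success property is false and whose groups are all empty, so ExtractJist would return " ||||" for malformed input. Below is a minimal sketch of a guarded variant. To keep it short it only looks at the City and State lines, and the class name MatchGuardSketch and the helper TryExtractCityState are made up for illustration; they are not part of the original function.

```csharp
using System.Text.RegularExpressions;

public static class MatchGuardSketch
{
    // Hypothetical helper: returns "City|State" on success,
    // or false and an empty string when the text does not match.
    public static bool TryExtractCityState(string freeText, out string extracted)
    {
        Match match = Regex.Match(
            freeText,
            @"City: (?<city>[^\r\n]*)\r?\nState: (?<state>[^\r\n]*)",
            RegexOptions.IgnoreCase);

        // Success is false when the pattern was not found anywhere in the text.
        if (!match.Success)
        {
            extracted = string.Empty;
            return false;
        }

        extracted = match.Groups["city"].Value + "|" + match.Groups["state"].Value;
        return true;
    }
}
```

The caller can then tell "no match" apart from "matched, but the fields were empty", which the plain string return value cannot express.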
Reblogged this on public interface IReadable { IEnumerable Read(); } and commented:
Regex has never been easy for me for some reason
It’s probably not easy for anyone, for that matter; it’s so cryptic. I know a handful of patterns by heart; otherwise I need to google for a solution.
What if the order of the subjects (first name, last name etc) were arbitrary?
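If the order is arbitrary, one possible approach is to stop describing the whole form in a single pattern and instead match one "Label: value" line at a time, collect the pairs into a dictionary, and assemble the output from that. The sketch below assumes every field sits on its own line and that labels contain no colon. The names UnorderedFieldParser and ExtractInAnyOrder are hypothetical, and missing fields are simply treated as empty strings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UnorderedFieldParser
{
    // Matches a single "Label: value" line.
    // RegexOptions.Multiline lets ^ match at the start of every line.
    private static readonly Regex FieldLine = new Regex(
        @"^(?<key>[^:\r\n]+):[ \t]*(?<value>[^\r\n]*)",
        RegexOptions.Multiline);

    public static string ExtractInAnyOrder(string freeText)
    {
        // Case-insensitive keys, so "first name" and "First name" are the same field.
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in FieldLine.Matches(freeText))
        {
            fields[m.Groups["key"].Value.Trim()] = m.Groups["value"].Value.Trim();
        }

        // Missing fields become empty strings (an assumption of this sketch).
        string Get(string key) => fields.TryGetValue(key, out string v) ? v : string.Empty;

        return string.Join("|",
            Get("First name") + " " + Get("Last name"),
            Get("Address"),
            Get("City"),
            Get("State"),
            Get("Zip"));
    }
}
```

Calling UnorderedFieldParser.ExtractInAnyOrder with the six lines of the sample text in any order should give the same pipe-delimited string each time. The trade-off is that the pattern no longer validates the form as a whole, so a check for required fields would have to be added separately.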