Inspecting the drives on the local PC using C# .NET

It’s very simple to enumerate the drives on a Windows computer. The DriveInfo object has a GetDrives static method which returns an array of DriveInfo objects. A DriveInfo describes a drive through its properties. It is similar to the FileInfo class which in turn describes a file.

Some properties can only be read if the drive is ready. A CD-ROM drive with no disc inserted, for example, is not ready. The following code prints the available drives:

DriveInfo[] drives = DriveInfo.GetDrives();
foreach (DriveInfo drive in drives)
{
	Console.WriteLine("Drive name: {0}", drive.Name);
	if (drive.IsReady)
	{
		Console.WriteLine("Total size: {0}", drive.TotalSize);
		Console.WriteLine("Total free space: {0}", drive.TotalFreeSpace);
		Console.WriteLine("Available free space: {0}", drive.AvailableFreeSpace);
		Console.WriteLine("Drive format: {0}", drive.DriveFormat);
	}
	else
	{
		Console.WriteLine("Device {0} is not ready.", drive.Name);
	}
	Console.WriteLine("Drive type: {0}", drive.DriveType);
}

I got the following output:

Drives enumerated

DriveType can have the following values:

  • CDRom: an optical drive such as a CD-ROM, DVD etc.
  • Fixed: a fixed disk, quite often labelled as “C:\”
  • Network: a mapped drive
  • NoRootDirectory: a drive with no root directory
  • Ram: a RAM drive
  • Removable: a removable drive such as a pen-drive
  • Unknown: a drive whose type could not be determined

Read all posts dedicated to file I/O here.

Using Amazon S3 with the AWS.NET API Part 1: introduction

Introduction

Cloud-based blob storage solutions abound and Amazon Web Services (AWS) is the leader – or one of the leaders – in that area. Amazon S3 (Simple Storage Service) provides a “secure, durable, highly-scalable object storage” solution, as it is stated on the homepage. You can use S3 to store just about any type of file: images, text files, videos, JAR files, HTML pages etc. All files are stored in a key-value map, i.e. each file has a key where the file itself is the value.

Purpose of S3

S3 is often used to store static components of web pages such as images or videos. S3 can be integrated with other Amazon components such as RedShift and Elastic MapReduce. It can also be used to transfer large amounts of data from one component to another.

However, in this series we’ll be concentrating on a slightly different but very specific usage: saving, deleting and checking for the existence of text based storage files. S3 can function as an important building block in a Big Data analysis system where a data mining application can pull the raw data from an S3 bucket, i.e. a container of files.

In S3 you can organise your files into pseudo-folders. I wrote “pseudo” as they are not real folders like the ones we create on Windows. They are rather used as visual containers so that you can organise your files in a meaningful way instead of putting all of them directly under the same bucket. Examples:

s3://sales/january-2015/monthly-sales.txt
s3://sales/february-2015/monthly-sales.txt

“sales” is the top bucket, “january-2015” is a folder and then we have the file itself. You are free to create subfolders within each folder and subfolders within the subfolders etc.
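To make the “pseudo” part more concrete, here is a minimal sketch in Java, which is also used elsewhere on this blog. It assumes only that an object key is a plain string in which the forward slash is treated as a folder separator by convention. The class and the helper methods folderOf and fileNameOf are made up for illustration and don't call the AWS SDK at all:

```java
import java.util.Arrays;
import java.util.List;

final class S3KeySketch {
    // Returns the pseudo-folder part of a key (everything before the last '/'),
    // or an empty string when the object sits directly under the bucket.
    static String folderOf(String key) {
        int lastSlash = key.lastIndexOf('/');
        return lastSlash < 0 ? "" : key.substring(0, lastSlash);
    }

    // Returns the part of the key after the last '/'.
    static String fileNameOf(String key) {
        return key.substring(key.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Hypothetical keys within the "sales" bucket from the example above.
        List<String> keys = Arrays.asList(
                "january-2015/monthly-sales.txt",
                "february-2015/monthly-sales.txt");
        for (String key : keys) {
            System.out.println(folderOf(key) + " -> " + fileNameOf(key));
        }
    }
}
```

In other words the “folder” is just a prefix of the key, which is why two files with the same name can live in the same bucket as long as their prefixes differ.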

Keep in mind that S3 is not used for updates though. Once you’ve uploaded a file to S3 then it cannot be updated in a one-step operation. Even if you want to edit a text file there’s no editor for it. You’ll need to delete the old file and upload a new one instead.

Amazon have also done a great job at providing SDKs for a range of platforms, like .NET, Java, Python etc. Programmatic access to S3’s services is equally available.

As with many other Amazon components don’t assume that S3 can only be used with other Amazon components. Any software capable of executing HTTP calls can access S3: Web API, .NET MVC, iOS, Java desktop apps, Windows services, you name it. Mixed architecture is quickly becoming the norm nowadays and S3 can be used either as part of a mixed solution or in an Amazon-only architecture.

Goals of this series

The goals of this series are two-fold:

  • Provide basic UI and programmatic knowledge to anyone looking for a fast, cheap, reliable and scalable cloud-based blob storage solution
  • Tie in with the previous series about another Amazon service called Kinesis which provides a reliable message queue service as entry point to a Big Data data mining system. We’ll take up S3 as an alternative for raw data storage towards the end of the series. The series on Kinesis ended where we stored the raw data on a local text file – we’ll replace it with S3 based storage

I’ll try to keep the two goals as separate as possible so that all readers with different motivations can follow through – those of you who are only interested in S3 as well as those that will see the greater picture.

I’ll assume that you have at least a test account of Amazon Web Services including the necessary access keys: an Amazon Access Key and a Secret Access Key. You’ll need to sign up with Amazon and then sign up for S3 within Amazon. You can create a free account on the S3 home page using the Create Free Account link button:

S3 create free account link button

Amazon has a lot of great documentation online. Should you get stuck you’ll almost always find an answer there. Don’t be afraid to ask in the comments section below.

In this post we’ll take an easy start and go through some of the visual aspects of S3.

S3 GUI

Log onto AWS and locate the S3 service link:

S3 link on Amazon UI

Before we go anywhere it’s important to mention regions in Amazon. A region in Amazon Web Services is the geographical location of the data centre where your selected service will be located. E.g. if you create a cloud-based server with Amazon EC2 in the region called US East then the server will be set up in Northern Virginia. It doesn’t mean that your home page deployed on that server won’t be reachable anywhere else. The region only determines the physical location of the service. So if you’re expecting the bulk of your customers to come from Japan then it’s wise to set up the first web server in the region called Asia Pacific (Tokyo). Also, if you set up a service in e.g. EU (Ireland) on the AWS UI, then log out and log in again, your service may not be visible at first. A good guess is that you need to select the correct region. The region is indicated in the URL, e.g.:

https://console.aws.amazon.com/s3/home?region=eu-west-1#

…where “eu-west-1” stands for EU (Ireland). You can normally select the region in the top right hand corner of the Amazon UI where you’ll see the user-friendly names of the regions.

Regions are important for virtually all the components in AWS. Take a wild guess, which component is an exception. Check out the top right hand corner of the S3 screen:

No regions for Amazon S3

So regions don’t play the same role here as in other components in the AWS product offering. However, the selected region can still be used to “optimize for latency, minimize costs, or address regulatory requirements” as it says in the Create a Bucket window we’ll soon see.

You’ll see that by default the screen will show all the top buckets:

S3 All buckets screen

Let’s try a couple of things to get our hands dirty. Click on the Create Bucket button. Give the bucket a name, select the nearest region to your location and press Create. We’ll skip logging right now:

Creating first bucket Amazon S3

The bucket is quickly created:

Created first bucket in Amazon S3

On the right hand side of the screen you can set various properties of the bucket such as logging and security:

Set properties of bucket in Amazon S3

We won’t go into them at all otherwise this series will lose its scope. The default values are usually fine for most purposes. If you ever need to modify the settings, especially those that have to do with permissions, then consult the AWS documentation of S3.

Click on the name of the bucket and you’ll see that it’s empty:

First bucket is empty in Amazon S3

Click the Upload button and click Add Files:

Upload a file in Amazon S3

Select some file on your hard drive, preferably a text file as they are easier to open. Click “Start Upload” and the file upload progress should appear on the right hand side:

Text file uploaded to bucket in Amazon S3

Click the file name to select it. The Actions drop-down will list several options available for a file such as Open, Download or Delete which are self-explanatory.

Now click the “Create Folder” button and give the folder a name:

Created folder in Amazon S3 bucket

You can click the folder name to open its contents – much like you do it with a double-click on the Windows file system. You can then upload different files in that folder and create subfolders.

This is enough for starters. We’ll start looking into some basic operations in code in the next post.

View all posts related to Amazon Web Services and Big Data here.

Setting the file access rule of a file with C# .NET

When creating a new file you can set the access control rule for it in code. There are a couple of objects to build the puzzle.

The FileInfo class, which describes a file in a directory, has a SetAccessControl method which accepts a FileSecurity object. The FileSecurity object has an AddAccessRule method where you can pass in a FileSystemAccessRule object. The FileSystemAccessRule constructor has 4 overloads, 2 of which accept an IdentityReference abstract class. One of the implementations of IdentityReference is SecurityIdentifier. The SecurityIdentifier constructor in turn has 4 overloads where the last one is probably the most straightforward to use. It accepts two parameters:

  • WellKnownSidType: an enumeration listing the commonly used security identifiers
  • A domainSid of type SecurityIdentifier: this can most often be ignored. Check out the MSDN link above to see which WellKnownSidType enumeration values require this

The following method will set the access control to “Everyone”, which is represented by WellKnownSidType.WorldSid. “Everyone” will have full control over the file indicated by FileSystemRights.FullControl and AccessControlType.Allow in the FileSystemAccessRule constructor:

FileInfo fi = new FileInfo(@"C:\myfile.txt");
if (!fi.Exists)
{
	// File.Create returns an open FileStream; dispose it so the file is not left locked.
	File.Create(fi.FullName).Dispose();
}
SecurityIdentifier userAccount = new SecurityIdentifier(WellKnownSidType.WorldSid, null);
FileSecurity fileAcl = new FileSecurity();
fileAcl.AddAccessRule(new FileSystemAccessRule(userAccount, FileSystemRights.FullControl, AccessControlType.Allow));
fi.SetAccessControl(fileAcl);

You can easily check the result by viewing the properties of the file:

File access control set to Everyone full control

Read all posts dedicated to file I/O here.

Reading a text file using a specific encoding in C# .NET

In this post we saw how to save a text file specifying an encoding in the StreamWriter constructor. You can indicate the encoding type when you read a file with StreamWriter’s sibling StreamReader. Normally you don’t need to worry about specifying the code page when reading files. .NET will automatically understand most encoding types when reading files.

Here’s an example how you can read a file with a specific encoding type:

string filename = @"c:\file-utf-seven.txt";
// Write the file with UTF-7 first so that there is something to read back.
using (StreamWriter streamWriter = new StreamWriter(filename, false, Encoding.UTF7))
{
	streamWriter.WriteLine("I am feeling great.");
}

using (StreamReader reader = new StreamReader(filename, Encoding.UTF7))
{
	Console.WriteLine(reader.ReadToEnd());
}

Read all posts dedicated to file I/O here.

Using Amazon Kinesis with the AWS.NET API Part 6: storage

Introduction

In the previous post we added some validation to our demo message handling application. Validation adds some sanity checks to our logic so that bogus inputs are discarded.

In this post, which will be the last in the series on Amazon Kinesis, we’ll be looking at storage. We’ll save the data on disk, which in itself is not too interesting, but we’ll also discuss some formats that are suitable for further processing.

Formats

It is seldom that we’re saving data just to fill up a data store. This is true in our case as well. We’re getting the messages from the Kinesis stream and we’ll be soon saving them. However, we’ll certainly want to perform some actions on the data, such as data aggregations:

  • Calculate the average response time for http://www.bbc.co.uk/africa between 12:15 and 12:30 on 13 January 2015 for users with Firefox 11
  • Calculate max response time in week 45 2014 for the domain cnn.com for users located in Seattle
  • Calculate the 99th percentile of the response time for http://www.twitter.com for February 2014

…etc. Regardless of where you’re planning to save the data, such as a traditional relational DB like MS SQL or a NoSql DB such as MongoDb, you’ll need to plan the storage format, i.e. what tables, collections, columns and datatypes you’ll need. As the next Amazon component we’ll take up on this blog is the blob storage S3, we’ll be concentrating on storing the raw data points in a text file. At first this may seem like a very bad idea but S3 is a very efficient, durable and scalable storage. However, don’t assume that this is a must for your Big Data system to work. You can save your data the way you want. Here we’re just paving the way for the next step.

As mentioned before in this series I have another, higher-level set of posts dedicated to Amazon architecture available here. I took up a similar topic there about message formats so I’ll re-use some of those explanations below.

The format will most likely depend on the mechanism that will eventually pull data from the raw data store. Data mining and analysis solutions such as Amazon RedShift or Elastic MapReduce (EMR) – which we’ll take up later on – will all need to work with the raw data. So at this stage you’ll need to do some forward thinking:

  • A: What mechanism will need to read from the raw data store for aggregation?
  • B: How can we easily – or relatively easily – read the raw data visually by just opening a raw data file?

B is important for debugging purposes if you want to verify the calculations. It’s also important if some customer is interested in viewing the raw data for some time period. For B you might want to store the raw data as it is, i.e. as JSON. E.g. you can have a text file with the following data points:

{"CustomerId": "abc123", "DateUnixMs": 1416603010000, "Activity": "buy", "DurationMs": 43253}
{"CustomerId": "abc123", "DateUnixMs": 1416603020000, "Activity": "buy", "DurationMs": 53253}
{"CustomerId": "abc123", "DateUnixMs": 1416603030000, "Activity": "buy", "DurationMs": 63253}
{"CustomerId": "abc123", "DateUnixMs": 1416603040000, "Activity": "buy", "DurationMs": 73253}

…i.e. with one data point per line.

However, this format is not really suitable for point A above. Other mechanisms will have a hard time understanding this data format. For RedShift and EMR to work most efficiently we’ll need to store the raw data in some delimited fields such as CSV or tab delimited fields. So the above data points will then be stored as follows in a tab-delimited file:

abc123     1416603010000    buy    43253
abc123     1416603020000    buy    53253
abc123     1416603030000    buy    63253
abc123     1416603040000    buy    73253

This is probably OK for point B above as well. It’s not too hard on your eyes to understand this data structure so we’ll settle for that. You might ask why we didn’t select some other delimiter, such as a pipe ‘|’ or a comma ‘,’. The answer is that our demo system is based on URLs and URLs can have pipes and commas in them making them difficult to split. Tabs will work better but you are free to choose whatever fits your system best.
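The delimiter argument can be illustrated with a minimal sketch, written in Java like the other added examples. The URL and the splitFields helper are made up for illustration and are not part of the demo application. The same five fields are joined once with tabs and once with commas and then split again:

```java
import java.util.regex.Pattern;

final class DelimiterSketch {
    // Splits a line on a literal delimiter; -1 keeps trailing empty fields.
    static String[] splitFields(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // Hypothetical URL that contains both a comma and a pipe.
        String url = "http://www.example.com/search?tags=a,b|c";
        String tabLine = String.join("\t", "yahoo", url, "GET", "432", "1417556120657");
        String commaLine = String.join(",", "yahoo", url, "GET", "432", "1417556120657");

        // The tab-delimited line gives back the original 5 fields.
        System.out.println(splitFields(tabLine, "\t").length);
        // The comma in the URL produces an extra, bogus field.
        System.out.println(splitFields(commaLine, ",").length);
    }
}
```

The comma-delimited line yields 6 fields instead of 5 because the URL itself is cut in two, which is exactly the kind of problem we avoid with tabs.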

Implementation

This time we’ll hide the implementation of the storage mechanism behind an interface. It will be a forward-looking solution where we’ll be able to easily switch between the concrete implementations. Open the demo C# application we’ve been working on so far and locate the WebTransaction object in the AmazonKinesisConsumer application. We’ll add a method to create a tab-delimited string out of its properties:

public string ToTabDelimitedString()
{
	StringBuilder sb = new StringBuilder();
	sb.Append(CustomerName)
		.Append("\t")
		.Append(Url)
		.Append("\t")
		.Append(WebMethod)
		.Append("\t")
		.Append(ResponseTimeMs)
		.Append("\t")
		.Append(UtcDateUnixMs);
	return sb.ToString();
}

Create a text file on your hard drive, like c:\raw-data\storage.txt. Add the following interface to AmazonKinesisConsumer:

public interface IRawDataStorage
{
	void Save(IEnumerable<WebTransaction> webTransactions);
}

…and also the following file based implementation:

public class FileBasedDataStorage : IRawDataStorage
{
	private readonly FileInfo _fileName;

	public FileBasedDataStorage(string fileFullPath)
	{
		// ArgumentNullException expects the name of the offending parameter.
		if (string.IsNullOrEmpty(fileFullPath)) throw new ArgumentNullException("fileFullPath");
		_fileName = new FileInfo(fileFullPath);
		if (!_fileName.Exists)
		{
			throw new ArgumentException(string.Concat("Provided file path ", fileFullPath, " does not exist."));
		}			
	}
		
	public void Save(IEnumerable<WebTransaction> webTransactions)
	{
		StringBuilder stringBuilder = new StringBuilder();
		foreach (WebTransaction wt in webTransactions)
		{
			stringBuilder.Append(wt.ToTabDelimitedString()).Append(Environment.NewLine);
		}

		using (StreamWriter sw = File.AppendText(_fileName.FullName))
		{
			sw.Write(stringBuilder.ToString());
		}
	}
}

The implementation of the Save method should be quite straightforward. We build a string with the tab delimited representation of each WebTransaction object, one per line, which is then appended to the storage file.

Here comes the updated ReadFromStream() method:

private static void ReadFromStream()
{
	IRawDataStorage rawDataStorage = new FileBasedDataStorage(@"c:\raw-data\storage.txt");
	AmazonKinesisConfig config = new AmazonKinesisConfig();
	config.RegionEndpoint = Amazon.RegionEndpoint.EUWest1;
	AmazonKinesisClient kinesisClient = new AmazonKinesisClient(config);
	String kinesisStreamName = ConfigurationManager.AppSettings["KinesisStreamName"];

	DescribeStreamRequest describeRequest = new DescribeStreamRequest();
	describeRequest.StreamName = kinesisStreamName;

	DescribeStreamResponse describeResponse = kinesisClient.DescribeStream(describeRequest);
	List<Shard> shards = describeResponse.StreamDescription.Shards;

	foreach (Shard shard in shards)
	{
		GetShardIteratorRequest iteratorRequest = new GetShardIteratorRequest();
		iteratorRequest.StreamName = kinesisStreamName;
		iteratorRequest.ShardId = shard.ShardId;
		iteratorRequest.ShardIteratorType = ShardIteratorType.TRIM_HORIZON;

		GetShardIteratorResponse iteratorResponse = kinesisClient.GetShardIterator(iteratorRequest);
		string iteratorId = iteratorResponse.ShardIterator;

		while (!string.IsNullOrEmpty(iteratorId))
		{
			GetRecordsRequest getRequest = new GetRecordsRequest();
			getRequest.Limit = 1000;
			getRequest.ShardIterator = iteratorId;

			GetRecordsResponse getResponse = kinesisClient.GetRecords(getRequest);
			string nextIterator = getResponse.NextShardIterator;
			List<Record> records = getResponse.Records;

			if (records.Count > 0)
			{
				Console.WriteLine("Received {0} records. ", records.Count);
				List<WebTransaction> newWebTransactions = new List<WebTransaction>();
				foreach (Record record in records)
				{
					string json = Encoding.UTF8.GetString(record.Data.ToArray());
					try
					{
						JToken token = JContainer.Parse(json);
						try
						{									
							WebTransaction wt = JsonConvert.DeserializeObject<WebTransaction>(json);
							List<string> validationErrors = wt.Validate();
							if (!validationErrors.Any())
							{
								Console.WriteLine("Valid entity: {0}", json);
								newWebTransactions.Add(wt);
							}
							else
							{
								StringBuilder exceptionBuilder = new StringBuilder();
								exceptionBuilder.Append("Invalid WebTransaction object from JSON: ")
									.Append(Environment.NewLine).Append(json)
									.Append(Environment.NewLine).Append("Validation errors: ")
									.Append(Environment.NewLine);
								foreach (string error in validationErrors)
								{
									exceptionBuilder.Append(error).Append(Environment.NewLine);																										
								}
								Console.WriteLine(exceptionBuilder.ToString());
							}									
						}
						catch (Exception ex)
						{
							//simulate logging
							Console.WriteLine("Could not parse the following message to a WebTransaction object: {0}", json);
						}
					}
					catch (Exception ex)
					{
						//simulate logging
						Console.WriteLine("Could not parse the following message, invalid json: {0}", json);
					}
				}

				if (newWebTransactions.Any())
				{
					try
					{
						rawDataStorage.Save(newWebTransactions);
						Console.WriteLine("Saved all new web transactions to the data store.");
					}
					catch (Exception ex)
					{
						Console.WriteLine("Failed to save the web transactions to file: {0}", ex.Message);
					}
				}
			}

			iteratorId = nextIterator;
		}
	}
}

Run both the consumer and producer applications and send a couple of web transactions to Kinesis. You should end up with the tab delimited observations in the storage file. In my case I have the following:

yahoo http://www.yahoo.com GET 432 1417556120657
google http://www.google.com POST 532 1417556133322
bbc http://www.bbc.co.uk GET 543 1417556148276
twitter http://www.twitter.com GET 623 1417556264008
wiki http://www.wikipedia.org POST 864 1417556302529
facebook http://www.facebook.com DELETE 820 1417556319381

This concludes our discussion of Amazon Kinesis. We’ve also set the path for the next series where we’ll be looking into Amazon S3. If you’re interested in a full Big Data chain using cloud-based Amazon components then you’re more than welcome to read on.

View all posts related to Amazon Web Services and Big Data here.

Java 8 Date and time API: the LocalDateTime class

Introduction

In this post we saw how to represent dates on the level of days, such as 2014-10-05 using the LocalDate class. This post discussed the usage of LocalTime to show the point of time within the 24-hr clock, such as 11:45:43.

LocalDate has no concept of time units below the day level. LocalTime has no concept of time above the level of hours. However, what if you need to represent the date as 2014-10-05 11:45:43, i.e. with both the day and time sections? You can turn to the aptly named LocalDateTime class which marries LocalTime and LocalDate.

The usage of LocalDateTime is very similar to both LocalDate and LocalTime. You can quickly read through the posts referenced above for further information. Most date-related methods are common for LocalDate, LocalTime and LocalDateTime.

LocalDateTime

You can get the current local date-time as follows:

LocalDateTime now = LocalDateTime.now();

This will get the current date and time according to the default time zone of your computer.

You can construct a new LocalDateTime instance using the various static “of” methods, e.g.

LocalDateTime someDateInPast = LocalDateTime.of(2014, Month.MAY, 23, 10, 23, 43);

You can add/subtract some units of time using the “plus” and “minus” methods. The “until” method will find the time span between the two time points in the provided unit of measurement:

LocalDateTime later = now.plusMinutes(321);
long until = now.until(later, ChronoUnit.MINUTES);

“until” will be 321 minutes as expected.

We saw that LocalDate and LocalTime do not support every value of the ChronoUnit enumeration, which is due to their level of granularity. LocalDateTime supports every unit from nanoseconds to eras, where ChronoUnit estimates an era as 1 billion years. Only ChronoUnit.FOREVER is excluded. You can therefore measure the difference between two LocalDateTime instances in anything from nanoseconds to eras, as long as the result fits into a “long”, which might not be the case with nanoseconds given a large enough time range.
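If you'd like to verify the supported units yourself, the following minimal sketch asks each of the three classes through isSupported. The class name, the support helper and the sample date are only there for illustration:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.ChronoUnit;

final class SupportedUnitsSketch {
    // Reports whether the unit is supported, in the order
    // LocalDate, LocalTime, LocalDateTime.
    static boolean[] support(ChronoUnit unit) {
        LocalDate date = LocalDate.of(2014, 10, 5);
        LocalTime time = LocalTime.of(11, 45, 43);
        LocalDateTime dateTime = LocalDateTime.of(date, time);
        return new boolean[] {
            date.isSupported(unit), time.isSupported(unit), dateTime.isSupported(unit)
        };
    }

    public static void main(String[] args) {
        ChronoUnit[] units = {
            ChronoUnit.NANOS, ChronoUnit.HOURS, ChronoUnit.DAYS,
            ChronoUnit.ERAS, ChronoUnit.FOREVER
        };
        for (ChronoUnit unit : units) {
            boolean[] s = support(unit);
            System.out.printf("%-8s LocalDate=%b LocalTime=%b LocalDateTime=%b%n",
                    unit, s[0], s[1], s[2]);
        }
    }
}
```

The time-based units are rejected by LocalDate, the date-based ones by LocalTime, and LocalDateTime accepts both groups.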

The isAfter and isBefore methods work as the method names imply:

LocalDateTime now = LocalDateTime.now();
LocalDateTime someDateInPast = LocalDateTime.of(2014, Month.MAY, 23, 10, 23, 43);
boolean before = someDateInPast.isBefore(now);
boolean after = someDateInPast.isAfter(now);

“before” will be true and “after” will be false as expected.

You can extract the LocalDate and LocalTime portions of LocalDateTime using the toLocalDate and toLocalTime methods:

LocalDate toLocalDate = now.toLocalDate();
LocalTime toLocalTime = now.toLocalTime();
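The split also works in the other direction. As a small sketch, with a made-up class name and a fixed sample value, the two halves can be put back together either with LocalDateTime.of or with the atTime method of LocalDate:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

final class SplitAndJoinSketch {
    // Splits a LocalDateTime into its date and time parts and joins them again.
    static LocalDateTime splitAndJoin(LocalDateTime original) {
        LocalDate datePart = original.toLocalDate();
        LocalTime timePart = original.toLocalTime();
        return LocalDateTime.of(datePart, timePart);
    }

    public static void main(String[] args) {
        LocalDateTime original = LocalDateTime.of(2014, 5, 23, 10, 23, 43);
        LocalDateTime rebuilt = splitAndJoin(original);
        // atTime is an alternative way of combining the two parts.
        LocalDateTime viaAtTime = original.toLocalDate().atTime(original.toLocalTime());
        System.out.println(rebuilt.equals(original));
        System.out.println(viaAtTime.equals(original));
    }
}
```

Both comparisons print true as no information is lost when a LocalDateTime is split into its date and time sections.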

You can extract the various portions of the LocalDateTime instance using the various “get” methods, such as:

LocalDateTime someDateInPast = LocalDateTime.of(2014, Month.MAY, 23, 10, 23, 43);
DayOfWeek dayOfWeek = someDateInPast.getDayOfWeek();
int dayOfYear = someDateInPast.getDayOfYear();
int year = someDateInPast.getYear();

The returned values are “FRIDAY”, 143 and 2014 respectively, i.e. the date in someDateInPast was the 143rd day of 2014.

View all posts related to Java here.

Java 8 Date and time API: the LocalTime class

Introduction

In this post we saw how to handle local date values to the level of days with the LocalDate object. A typical point in time handled through this object is e.g. 2014-03-02. There’s no concept of hours and minutes in that object.

LocalTime

The “time of day” equivalent of LocalDate is LocalTime and its usage is very similar. I recommend you read through the post referenced above as many methods, like the “plus” and “minus” ones still apply in the same form. LocalTime will have no concept of days, months and years. You can use this class if e.g. some of your logic depends on the time of day every day, regardless of the calendar day.

Here’s how you can find the current time of day:

LocalTime now = LocalTime.now();

This will find the current time in the default time zone of your computer.

You can also create a time using the “of” static method. You’ll set the time to 5:32am as follows:

LocalTime early = LocalTime.of(5, 32);

You can add/subtract some units of time using the “plus” and “minus” methods. The “until” method will find the difference between the two time points in the provided unit of measurement:

LocalTime now = LocalTime.now();
LocalTime later = now.plusHours(2);
long until = now.until(later, ChronoUnit.MINUTES);

“until” will be 120 as there are 120 minutes from “now” until “now + 2 hrs” of course. However, if you run this code at e.g. 23:30 in your time zone then “until” will be a negative value as 23:30 plus 2 hrs is 01:30. There’s no “next day” in LocalTime so “until” in that case will be -1320 which is the same as -22 hrs.
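As the result depends on when you run the code, it can be easier to see the wrap-around with fixed times instead of LocalTime.now(). Here's a minimal sketch where the class and method names are only illustrative:

```java
import java.time.LocalTime;
import java.time.temporal.ChronoUnit;

final class UntilWrapAroundSketch {
    // Minutes from "start" until "start + 2 hours" as LocalTime sees it.
    static long minutesUntilTwoHoursLater(LocalTime start) {
        LocalTime later = start.plusHours(2);
        return start.until(later, ChronoUnit.MINUTES);
    }

    public static void main(String[] args) {
        // 10:00 -> 12:00, no wrap-around: prints 120
        System.out.println(minutesUntilTwoHoursLater(LocalTime.of(10, 0)));
        // 23:30 -> 01:30, wraps past midnight: prints -1320
        System.out.println(minutesUntilTwoHoursLater(LocalTime.of(23, 30)));
    }
}
```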

Only those ChronoUnit enumerations are valid that make sense for the LocalTime class: Minutes, hours, seconds, etc., anything under the level of days. If you’re not sure then you can check if the ChronoUnit is supported using the isSupported method:

boolean supported = now.isSupported(ChronoUnit.CENTURIES);

The above code will yield “false”.

The isAfter and isBefore methods work as the method names imply:

LocalTime now = LocalTime.now();
LocalTime later = now.plusMinutes(10);
boolean before = now.isBefore(later);
boolean after = now.isAfter(later);

However, be careful with the return values. Just like above, it depends on when during the day you run this code so don’t assume that “before” will always be true and “after” will always be false in the above example. If you run this code at 23:58 then the return values will be the exact opposite as 23:58 + 10 minutes = 00:08 which will be before 23:58 and 23:58 comes after 00:08.

You can use the “compareTo” method in a similar manner: it returns a negative value, 0 or a positive value depending on which side of the comparison comes first. Again, the result will depend on the exact timing.
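The midnight case is easy to reproduce with a fixed time. The following sketch, with illustrative names only, compares a time with the same time plus 10 minutes, once in the middle of the day and once at 23:58:

```java
import java.time.LocalTime;

final class CompareAroundMidnightSketch {
    // True if "start" is before "start + 10 minutes" within the same 24-hour clock.
    static boolean isBeforeTenMinutesLater(LocalTime start) {
        return start.isBefore(start.plusMinutes(10));
    }

    // Sign of compareTo between "start" and "start + 10 minutes": -1, 0 or 1.
    static int compareSign(LocalTime start) {
        return Integer.signum(start.compareTo(start.plusMinutes(10)));
    }

    public static void main(String[] args) {
        LocalTime noon = LocalTime.of(12, 0);
        LocalTime lateNight = LocalTime.of(23, 58);
        // 12:00 vs 12:10 -> true and -1
        System.out.println(isBeforeTenMinutesLater(noon) + " " + compareSign(noon));
        // 23:58 vs 00:08 -> false and 1
        System.out.println(isBeforeTenMinutesLater(lateNight) + " " + compareSign(lateNight));
    }
}
```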

In the next post we’ll look at the LocalDateTime class.

View all posts related to Java here.

Saving a text file using a specific encoding in C# .NET

The StreamWriter object constructor lets you indicate the encoding type when writing to a text file. The following method shows how simple it is:

private static void SaveFile(Encoding encoding)
{
	Console.WriteLine("Encoding: {0}", encoding.EncodingName);
	string filename = string.Concat(@"c:\file-", encoding.EncodingName, ".txt");
	// The using block flushes and closes the writer even if WriteLine throws.
	using (StreamWriter streamWriter = new StreamWriter(filename, false, encoding))
	{
		streamWriter.WriteLine("I am feeling great.");
	}
}

We saw in this post how to get hold of a specific code page. We also saw that if you only use characters in the ASCII range, i.e. positions 0-127 then most encoding types will handle the string in a uniform way.

Call the above method like this:

SaveFile(Encoding.UTF7);
SaveFile(Encoding.UTF8);
SaveFile(Encoding.Unicode);
SaveFile(Encoding.UTF32);

So we’ll have 4 files at the end each named after the encoding type. Depending on the supported code pages on your PC Notepad may or may not be able to handle the encoding types. Notepad should not have any problem with UTF8 and UTF16. The UTF7 file will probably look OK, whereas UTF32 will most likely look strange. In my case the UTF32 file content looked like this:

I a m f e e l i n g g r e a t .

…i.e. with some bonus white-space in between the characters. Notepad was not able to correctly read UTF32.
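A plausible explanation for the extra white-space is the size of each character in the file. The sketch below is written in Java rather than C# and only counts the bytes the same sentence takes up in three encodings, without byte order marks. The class and method names are made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedLengthSketch {
    // Number of bytes needed to store the text in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "I am feeling great.";
        // The LE variants are used so that no byte order mark is added to the count.
        System.out.println("UTF-8:  " + encodedLength(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16: " + encodedLength(text, StandardCharsets.UTF_16LE));
        System.out.println("UTF-32: " + encodedLength(text, Charset.forName("UTF-32LE")));
    }
}
```

Each of the 19 ASCII characters takes 1 byte in UTF-8, 2 bytes in UTF-16 and 4 bytes in UTF-32. The extra bytes are all zeros, so an editor that doesn't recognise UTF-32 will likely show them as gaps between the letters.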

If you don’t specify an encoding then StreamWriter uses UTF-8 without a byte order mark, which will suffice in most situations. If you’re unsure then select UTF-8.

Providing an encoding type which cannot handle certain characters will result in replacement characters being shown. If we change the string to be saved to “öåä I am feeling great.” and call the SaveFile method like

SaveFile(Encoding.ASCII);

…then you’ll see the following content in Notepad:

??? I am feeling great.

ASCII could not handle the Swedish characters öåä and replaced them with question marks.
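The same replacement can be shown without writing a file. Here's a minimal sketch in Java, where the class and method names are only illustrative. It encodes the string into ASCII bytes and decodes it again:

```java
import java.nio.charset.StandardCharsets;

final class AsciiReplacementSketch {
    // Encodes the text as ASCII and decodes it again; characters that ASCII
    // cannot represent are replaced with '?' during encoding.
    static String roundTripThroughAscii(String text) {
        byte[] asciiBytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(asciiBytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Unicode escapes for the three Swedish characters keep the source file pure ASCII.
        String text = "\u00f6\u00e5\u00e4 I am feeling great.";
        System.out.println(roundTripThroughAscii(text));
    }
}
```

The three characters are lost for good at the encoding step. Reading the bytes back with another encoding will not restore them.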

Read all posts dedicated to file I/O here.

Using Amazon Kinesis with the AWS.NET API Part 5: validation

Introduction

In the previous post we got as far as having a simple but functioning messaging system. The producer and client apps are both console based and the message handler is the ready-to-use Amazon Kinesis. We have a system that we can build upon and scale up as the message load increases. Kinesis streams can be scaled to handle virtually unlimited amounts of messages.

This post on Kinesis will discuss message validation.

You’ll need to handle the incoming messages from the stream. Normally they should follow the specified format, such as JSON or XML with the predefined property names and casing. However, this is not always guaranteed as Kinesis does not itself validate any incoming message. Also, your system might be subject to fake data. So you’ll almost always need to have some message validation in place and log messages that cannot be processed or are somehow invalid.

Open the demo application we’ve been working on so far and let’s get to it.

Validation

We ended up with the following bit of code in AmazonKinesisConsumer:

if (records.Count > 0)
{
	Console.WriteLine("Received {0} records. ", records.Count);
	foreach (Record record in records)
	{
		string json = Encoding.UTF8.GetString(record.Data.ToArray());
		Console.WriteLine("Json string: " + json);
	}
}

We’ll build up the new code step by step and present the new version of the ReadFromStream() method at the end.

Our first task is to check if “json” is in fact valid JSON. There’s no dedicated method for that in JSON.NET so we’ll just see if the string can be parsed into a generic JToken:

string json = Encoding.UTF8.GetString(record.Data.ToArray());
try
{
	JToken token = JContainer.Parse(json);
}
catch (Exception ex)
{
	//simulate logging
	Console.WriteLine("Could not parse the following message, invalid json: {0}", json);
}

Normally every message that cannot be parsed should be logged and analysed. Here we just print the unparseable message to the console. If you’re interested in logging you can check out the posts on this blog here and here.

Next we want to parse the JSON into a WebTransaction object:

try
{
	JToken token = JToken.Parse(json);
	try
	{
		WebTransaction wt = JsonConvert.DeserializeObject<WebTransaction>(json);
	}
	catch (Exception ex)
	{
		//simulate logging
		Console.WriteLine("Could not parse the following message to a WebTransaction object: {0}", json);
	}
}
catch (Exception ex)
{
	//simulate logging
	Console.WriteLine("Could not parse the following message, invalid json: {0}", json);
}

Next we can perform some validation on the object itself. We’ll make up some arbitrary rules:

  • The web method can only be one of the following: GET, POST, PUT, HEAD, DELETE, OPTIONS, TRACE, CONNECT
  • Acceptable range for response times: 0-30000 ms, probably not wide enough, but it’s OK for now
  • We only accept valid URLs using a validator function I’ve found here. It might not be perfect but at least we can filter out useless inputs like “this is spam” or “you’ve been hacked”

We’ll add the validation rules to WebTransaction.cs of the AmazonKinesisConsumer app:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public class WebTransaction
{
	private string[] _validMethods = { "get", "post", "put", "delete", "head", "options", "trace", "connect" };
	private int _minResponseTimeMs = 0;
	private int _maxResponseTimeMs = 30000;

	public long UtcDateUnixMs { get; set; }
	public string CustomerName { get; set; }
	public string Url { get; set; }
	public string WebMethod { get; set; }
	public int ResponseTimeMs { get; set; }

	public List<string> Validate()
	{
		List<string> brokenRules = new List<string>();
		if (!IsWebMethodValid())
		{
			brokenRules.Add(string.Format("Invalid web method: {0}", WebMethod));
		}
		if (!IsResponseTimeValid())
		{
			brokenRules.Add(string.Format("Response time outside acceptable limits: {0}", ResponseTimeMs));
		}
		if (!IsValidUrl())
		{
			brokenRules.Add(string.Format("Invalid URL: {0}", Url));
		}
		return brokenRules;
	}

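	private bool IsWebMethodValid()
	{
		//a missing WebMethod property is deserialised as null so we guard against it
		return WebMethod != null && _validMethods.Contains(WebMethod.ToLowerInvariant());
	}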

	private bool IsResponseTimeValid()
	{
		return ResponseTimeMs >= _minResponseTimeMs
			&& ResponseTimeMs <= _maxResponseTimeMs;
	}

	private bool IsValidUrl()
	{
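		//a missing or empty Url property cannot be a valid URL
		if (string.IsNullOrWhiteSpace(Url)) return false;

		Uri uri;
		string urlToValidate = Url;
		//prepend "http://" if there's no scheme so that Uri can parse the value
		if (!urlToValidate.Contains(Uri.SchemeDelimiter)) urlToValidate = string.Concat(Uri.UriSchemeHttp, Uri.SchemeDelimiter, urlToValidate);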
		if (Uri.TryCreate(urlToValidate, UriKind.RelativeOrAbsolute, out uri))
		{
			try
			{
				if (Dns.GetHostAddresses(uri.DnsSafeHost).Length > 0)
				{
					return true;
				}
			}
			catch
			{
				return false;
			}
		}

		return false; 
	}

}

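The Validate method will collect all validation errors into a list, so a single message can report several broken rules at once. IsWebMethodValid() and IsResponseTimeValid() should be quite straightforward. IsValidUrl() prepends “http://” if the URL has no scheme, tries to build a Uri from the result and then checks whether the host name can be resolved through DNS. Keep in mind that this means a network lookup for every message. If you don’t understand the IsValidUrl function check out the StackOverflow link referred to above.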
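
To see the rule-collecting pattern in isolation, here’s a cut-down sketch that needs no network access. SimpleTransaction is a hypothetical stand-in for WebTransaction that I made up for this illustration: it leaves out the URL rule and hard-codes the 0-30000 ms range:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for WebTransaction with the URL rule left out
public class SimpleTransaction
{
	private static readonly string[] ValidMethods = { "get", "post", "put", "delete", "head", "options", "trace", "connect" };

	public string WebMethod { get; set; }
	public int ResponseTimeMs { get; set; }

	// Collects every broken rule instead of stopping at the first one
	public List<string> Validate()
	{
		List<string> brokenRules = new List<string>();
		if (WebMethod == null || !ValidMethods.Contains(WebMethod.ToLowerInvariant()))
		{
			brokenRules.Add(string.Format("Invalid web method: {0}", WebMethod));
		}
		if (ResponseTimeMs < 0 || ResponseTimeMs > 30000)
		{
			brokenRules.Add(string.Format("Response time outside acceptable limits: {0}", ResponseTimeMs));
		}
		return brokenRules;
	}
}
```

A SimpleTransaction with WebMethod “GET” and a response time of 120 returns an empty list, whereas one with WebMethod “FETCH” and a response time of -5 returns two entries, one per broken rule.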

We can use the Validate method from within the ReadFromStream() method as follows:

List<WebTransaction> newWebTransactions = new List<WebTransaction>();
foreach (Record record in records)
{
	string json = Encoding.UTF8.GetString(record.Data.ToArray());
	try
	{
		JToken token = JToken.Parse(json);
		try
		{
			WebTransaction wt = JsonConvert.DeserializeObject<WebTransaction>(json);
			List<string> validationErrors = wt.Validate();
			if (!validationErrors.Any())
			{
				Console.WriteLine("Valid entity: {0}", json);
				newWebTransactions.Add(wt);
			}
			else
			{
				StringBuilder exceptionBuilder = new StringBuilder();
				exceptionBuilder.Append("Invalid WebTransaction object from JSON: ")
				.Append(Environment.NewLine).Append(json)
				.Append(Environment.NewLine).Append("Validation errors: ")
				.Append(Environment.NewLine);
				foreach (string error in validationErrors)
				{
					exceptionBuilder.Append(error).Append(Environment.NewLine);																										
				}
				Console.WriteLine(exceptionBuilder.ToString());
			}									
		}
		catch (Exception ex)
		{
			//simulate logging
			Console.WriteLine("Could not parse the following message to a WebTransaction object: {0}", json);
		}
	}
	catch (Exception ex)
	{
		//simulate logging
		Console.WriteLine("Could not parse the following message, invalid json: {0}", json);
	}
}

As you can see we’re also collecting all valid WebTransaction objects into a list. That’s a preparation for the next post where we’ll store the valid objects on disk.

Here’s the current version of the ReadFromStream method:

private static void ReadFromStream()
{
	AmazonKinesisConfig config = new AmazonKinesisConfig();
	config.RegionEndpoint = Amazon.RegionEndpoint.EUWest1;
	AmazonKinesisClient kinesisClient = new AmazonKinesisClient(config);
	String kinesisStreamName = ConfigurationManager.AppSettings["KinesisStreamName"];

	DescribeStreamRequest describeRequest = new DescribeStreamRequest();
	describeRequest.StreamName = kinesisStreamName;

	DescribeStreamResponse describeResponse = kinesisClient.DescribeStream(describeRequest);
	List<Shard> shards = describeResponse.StreamDescription.Shards;

	foreach (Shard shard in shards)
	{
		GetShardIteratorRequest iteratorRequest = new GetShardIteratorRequest();
		iteratorRequest.StreamName = kinesisStreamName;
		iteratorRequest.ShardId = shard.ShardId;
		iteratorRequest.ShardIteratorType = ShardIteratorType.TRIM_HORIZON;

		GetShardIteratorResponse iteratorResponse = kinesisClient.GetShardIterator(iteratorRequest);
		string iteratorId = iteratorResponse.ShardIterator;

		while (!string.IsNullOrEmpty(iteratorId))
		{
			GetRecordsRequest getRequest = new GetRecordsRequest();
			getRequest.Limit = 1000;
			getRequest.ShardIterator = iteratorId;

			GetRecordsResponse getResponse = kinesisClient.GetRecords(getRequest);
			string nextIterator = getResponse.NextShardIterator;
			List<Record> records = getResponse.Records;

			if (records.Count > 0)
			{
				Console.WriteLine("Received {0} records. ", records.Count);
				List<WebTransaction> newWebTransactions = new List<WebTransaction>();
				foreach (Record record in records)
				{
					string json = Encoding.UTF8.GetString(record.Data.ToArray());
					try
					{
						JToken token = JToken.Parse(json);
						try
						{									
							WebTransaction wt = JsonConvert.DeserializeObject<WebTransaction>(json);
							List<string> validationErrors = wt.Validate();
							if (!validationErrors.Any())
							{
								Console.WriteLine("Valid entity: {0}", json);
								newWebTransactions.Add(wt);
							}
							else
							{
								StringBuilder exceptionBuilder = new StringBuilder();
								exceptionBuilder.Append("Invalid WebTransaction object from JSON: ")
									.Append(Environment.NewLine).Append(json)
									.Append(Environment.NewLine).Append("Validation errors: ")
									.Append(Environment.NewLine);
								foreach (string error in validationErrors)
								{
									exceptionBuilder.Append(error).Append(Environment.NewLine);																										
								}
								Console.WriteLine(exceptionBuilder.ToString());
							}									
						}
						catch (Exception ex)
						{
							//simulate logging
							Console.WriteLine("Could not parse the following message to a WebTransaction object: {0}", json);
						}
					}
					catch (Exception ex)
					{
						//simulate logging
						Console.WriteLine("Could not parse the following message, invalid json: {0}", json);
					}
				}
			}

			iteratorId = nextIterator;
		}
	}
}

Run the application with F5. This will start the project that is set as the start-up project. You can start the other one using the technique we saw in the previous post: right-click, Debug, Start new instance. You’ll have two console windows running. If you had some messages left in the Kinesis stream then they should be validated now. I can see the following output:

Initial validation messages for Kinesis

Let’s now send some new messages to Kinesis:

Validation errors from messages to Kinesis

Great, we have some basic validation logic in place.

We’ll discuss storing the messages in the next post which will finish the series on Amazon Kinesis.

View all posts related to Amazon Web Services and Big Data here.

Getting the byte array of a string depending on Encoding in C# .NET

You can take any string in C# and view its byte array data depending on the Encoding type. You can get hold of the encoding type using the Encoding.GetEncoding method. Some frequently used code pages have their short-cuts:

  • Encoding.ASCII
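  • Encoding.BigEndianUnicode – this is UTF-16 with big-endian byte order
  • Encoding.Unicode – this is UTF-16 with little-endian byte order
  • Encoding.UTF7 – marked as obsolete from .NET 5 onwards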
  • Encoding.UTF32
  • Encoding.UTF8

Once you’ve got hold of an encoding you can call its GetBytes method to return the byte array representation of a string. You can use this method whenever another method requires a byte array input instead of a string.

For backward compatibility the positions 0-127 are the same in most encoding types. These cover the standard English alphabet (both lower and upper case), the numbers, punctuation plus some other characters. So if you only take characters from this range then the byte values in the array will be the same in ASCII, UTF-8 and most legacy code pages. UTF-16 and UTF-32 keep the same numeric values but store each character in 2 or 4 bytes, so their arrays contain extra zero bytes. You can view the ASCII characters here: ASCII character set.
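
As a quick illustration of how the built-in encodings treat a character from the 0-127 range, consider the following sketch. EncodingComparison and BytesAsText are names I made up for this example, only the Encoding members come from the framework:

```csharp
using System;
using System.Text;

public static class EncodingComparison
{
	// Returns the encoded bytes of the input as a comma-separated string
	public static string BytesAsText(Encoding encoding, string input)
	{
		return string.Join(",", encoding.GetBytes(input));
	}
}

// BytesAsText(Encoding.ASCII, "A")            gives "65"
// BytesAsText(Encoding.UTF8, "A")             gives "65"
// BytesAsText(Encoding.Unicode, "A")          gives "65,0"
// BytesAsText(Encoding.BigEndianUnicode, "A") gives "0,65"
// BytesAsText(Encoding.UTF32, "A")            gives "65,0,0,0"
```

The value 65 shows up in every case, only the padding and the byte order differ.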

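The following code will print the same values for both the ASCII and Chinese encoding types. Note that on .NET Core and .NET 5+ code page encodings like this one are normally only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance):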

string input = "I am feeling great";
byte[] asciiEncoded = Encoding.ASCII.GetBytes(input);
Console.WriteLine("Ascii");
foreach (byte b in asciiEncoded)
{
	Console.WriteLine(b);
}

Encoding chinese = Encoding.GetEncoding("Chinese");
byte[] chineseEncoded = chinese.GetBytes(input);
Console.WriteLine("Chinese");
foreach (byte b in chineseEncoded)
{
	Console.WriteLine(b);
}

If you’re trying to ASCII-encode a Unicode string which contains non-ASCII characters then you’ll see the ASCII byte value of 63, i.e. ‘?’, in their place:

string input = "öåä I am feeling great";
byte[] asciiEncoded = Encoding.ASCII.GetBytes(input);
Console.WriteLine("Ascii");
foreach (byte b in asciiEncoded)
{
	Console.WriteLine(b);
}

The first 3 positions will print 63 as the Swedish ‘öåä’ characters cannot be handled by ASCII. For example, whenever you visit a website and see question marks and other funny characters instead of proper text then you know that there’s an encoding problem: the text was encoded with one encoding type but decoded with another, or it contains characters that the chosen encoding cannot represent.
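
The “funny characters” effect is easy to reproduce by encoding a string with one encoding and decoding the bytes with another. The following is a minimal sketch where MojibakeDemo and Reinterpret are made-up names for this illustration:

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
	// Encodes the text with one encoding and decodes the resulting bytes with another
	public static string Reinterpret(string text, Encoding encodeWith, Encoding decodeWith)
	{
		byte[] bytes = encodeWith.GetBytes(text);
		return decodeWith.GetString(bytes);
	}
}

// Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
// Reinterpret("ö", Encoding.UTF8, latin1)          gives "Ã¶"
// Reinterpret("ö", Encoding.ASCII, Encoding.ASCII) gives "?"
// Reinterpret("ö", Encoding.UTF8, Encoding.UTF8)   gives "ö"
```

UTF-8 stores ‘ö’ as the two bytes 195 and 182. A Latin-1 decoder reads them as two separate characters, hence the garbled result in the first case.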

View all posts related to Globalization here.
