Exercises in .NET with Andras Nemes

Using NumberStyles to parse numbers in C# .NET Part 2

March 13, 2015 1 Comment

In the previous post we looked at some basic usage of the NumberStyles enumeration. The enumeration allows to parse other representations of numeric values.

Occasionally negative numbers are shown with a trailing negative sign like this: “13-“. There’s a solution for that:

string number = "13-";
int parsed = int.Parse(number, NumberStyles.AllowTrailingSign);

“parsed” will be -13 as expected.

Read more of this post

Filed under .NET, Globalization Tagged with c#, globalization, number format

Using Amazon Elastic MapReduce with the AWS .NET API Part 8: connection to our Big Data demo

March 12, 2015 Leave a comment

Introduction

In the previous post we saw how to start an Amazon EMR cluster and have it execute a Hive script which performs a basic aggregation step.

This post will take up the Big Data thread where we left off at the end of the previous 2 series on the blob storage component Amazon S3 and the NoSql data store DynamoDb. Therefore the pre-requisite of following the code examples in this post is familiarity with what we discussed in those topics.

Read more of this post

Filed under .NET, Amazon, Big Data Tagged with amazon, amazon cloud, aws, big data, c#, elastic mapreduce

Using NumberStyles to parse numbers in C# .NET Part 1

March 11, 2015 1 Comment

There are a lot of number formats out there depending on the industry we’re looking at. E.g. negative numbers can be represented in several different ways:

-14
(14)
14-
14.00-
(14,00)

…and so on. Accounting, finance and other, highly “numeric” fields will have their own standards to represent numbers. Your application may need to parse all these strings and convert them into proper numeric values. The static Parse method of the numeric classes, like int, double, decimal all accept a NumberStyles enumeration. This enumeration is located in the System.Globalization namespace.

Read more of this post

Filed under .NET, Globalization Tagged with c#, globalization, number format

The Conditional attribute to control execution of parts of the code in C# .NET

March 10, 2015 Leave a comment

In this post we saw how to use the #if preprocessor to control the execution of code. The compiler will understand those instructions and compile away bits of code.

There’s another way to achieve something similar using the Conditional attribute. You can decorate methods with this attribute. It accepts a string parameter which defines the name of the symbol that must be defined in order for the method to be carried out.

Read more of this post

Filed under .NET, C# language features Tagged with c#, preprocessor

Using Amazon Elastic MapReduce with the AWS .NET API Part 7: indirect Hive with .NET

March 9, 2015 1 Comment

Introduction

In the previous post we looked at the basics of how EMR can interact with Amazon S3 and DynamoDb using some additional commands and properties of Hive. In part 5 of this series we saw how to start a new cluster and assign a couple of steps to it using .NET. In this post we’ll marry the two: we’ll start up a new cluster and assign the full aggregation job flow to it in code.

It’s not possible to send a set of Hive commands in plain text to the cluster. The Hive script must be stored somewhere where EMR can find it. The .NET code can only point out the location. All this means that we’ll need to prepare the Hive script beforehand. An ideal place to store all types of files is of course Amazon S3 and we’ll follow that line here as well. This is the reason why I called this post “indirect Hive”.

Preparation

First delete all records from the aggregation-tests DynamoDb table from the previous post so that we won’t confuse the old results with the new ones.

Next, create a text file called hive-aggregation-script.txt and insert the following statements:

create external table if not exists raw_urls (url string, response_time int) row format delimited fields terminated by '|' location 's3://raw-urls-data/';
create external table if not exists aggregations (id string, url string, avg_response_time double) stored by 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' tblproperties("dynamodb.table.name" = "aggregation-tests", "dynamodb.region" = "eu-west-1", "dynamodb.column.mapping" = "id:id,url:url_name,avg_response_time:average_response_time");
create table if not exists temporary_local_aggregation (id string, url string, avg_response_time double);
insert overwrite table temporary_local_aggregation SELECT reflect("java.util.UUID", "randomUUID"), url, avg(response_time) FROM raw_urls GROUP BY url;
insert into table aggregations select * from temporary_local_aggregation;
drop table aggregations;
drop table raw_urls;

All of these statements are copied directly from the previous post. It’s just a step-by-step instruction to create the external tables, run the aggregations and store them in DynamoDb.

Upload hive-aggregation-script.txt to S3 in a bucket called “hive-aggregation-scripts”.

.NET

Open the EMR tester console application we started building in part 5 referenced above. We have a method called StartCluster in EmrDemoService.cs. We’ll first insert a similar method. It starts a cluster and returns the job flow id. Insert the following method in EmrDemoService:

public string StartNewActiveWaitingJobFlow()
{
	using (IAmazonElasticMapReduce emrClient = GetEmrClient())
	{
		try
		{
			StepFactory stepFactory = new StepFactory(RegionEndpoint.EUWest1);

			StepConfig installHive = new StepConfig()
			{
				Name = "Install Hive step",
				ActionOnFailure = "TERMINATE_JOB_FLOW",
				HadoopJarStep = stepFactory.NewInstallHiveStep(StepFactory.HiveVersion.Hive_Latest)
			};

			StepConfig installPig = new StepConfig()
			{
				Name = "Install Pig step",
				ActionOnFailure = "TERMINATE_JOB_FLOW",
				HadoopJarStep = stepFactory.NewInstallPigStep()
			};

			JobFlowInstancesConfig jobFlowInstancesConfig = new JobFlowInstancesConfig()
			{
				Ec2KeyName = "insert-your-ec2-key-name",
				HadoopVersion = "2.4.0",
				InstanceCount = 3,
				KeepJobFlowAliveWhenNoSteps = true,
				MasterInstanceType = "m3.xlarge",
				SlaveInstanceType = "m3.xlarge"
			};

			RunJobFlowRequest runJobFlowRequest = new RunJobFlowRequest()
			{
				Name = "EMR cluster",
				Steps = { installHive, installPig },
				Instances = jobFlowInstancesConfig
			};

			RunJobFlowResponse runJobFlowResponse = emrClient.RunJobFlow(runJobFlowRequest);
			String jobFlowId = runJobFlowResponse.JobFlowId;
			Console.WriteLine(string.Format("Job flow id: {0}", jobFlowId));
			bool oneStepNotFinished = true;
			while (oneStepNotFinished)
			{
				DescribeJobFlowsRequest describeJobFlowsRequest = new DescribeJobFlowsRequest(new List<string>() { jobFlowId });
				DescribeJobFlowsResponse describeJobFlowsResponse = emrClient.DescribeJobFlows(describeJobFlowsRequest);
				JobFlowDetail jobFlowDetail = describeJobFlowsResponse.JobFlows.First();
				List<StepDetail> stepDetailsOfJob = jobFlowDetail.Steps;
				List<string> stepStatuses = new List<string>();
				foreach (StepDetail stepDetail in stepDetailsOfJob)
				{
					StepExecutionStatusDetail status = stepDetail.ExecutionStatusDetail;
					String stepConfigName = stepDetail.StepConfig.Name;
					String statusValue = status.State.Value;
					stepStatuses.Add(statusValue.ToLower());
					Console.WriteLine(string.Format("Step config name: {0}, step status: {1}", stepConfigName, statusValue));
				}
				oneStepNotFinished = stepStatuses.Contains("pending") || stepStatuses.Contains("running");
				Thread.Sleep(10000);
			}
			return jobFlowId;
		}
		catch (AmazonElasticMapReduceException emrException)
		{
			Console.WriteLine("Cluster list creation has failed.");
			Console.WriteLine("Amazon error code: {0}",
				string.IsNullOrEmpty(emrException.ErrorCode) ? "None" : emrException.ErrorCode);
			Console.WriteLine("Exception message: {0}", emrException.Message);
		}
	}

	return string.Empty;
}

You should understand all of that from part 5 referred to above. It is just a slightly reduced form of the StartCluster method: we have no logging, no tags, no termination protection, we don’t need those right now.

So the method returns a job flow ID. We can use that ID to add new steps to be executed in the cluster. Adding extra steps is also performed by the StepFactory and StepConfig objects. StepFactory has a method called NewRunHiveScriptStep which accepts the S3 address of the Hive script you’d like to have executed.

The below method adds a step to the job flow, monitors its progress and periodically prints out the step status, such as “pending” or “running”. When the step has completed the method prints the final status to the console window. The cluster is finally terminated:

public void RunHiveScriptStep()
{
	using (IAmazonElasticMapReduce emrClient = GetEmrClient())
	{
		try
		{
			String activeWaitingJobFlowId = StartNewActiveWaitingJobFlow();
			if (!string.IsNullOrEmpty(activeWaitingJobFlowId))
			{
				StepFactory stepFactory = new StepFactory(RegionEndpoint.EUWest1);
				String scriptLocation = "s3://hive-aggregation-scripts/hive-aggregation-script.txt";
				StepConfig runHiveScript = new StepConfig()
				{
					Name = "Run Hive script",
					HadoopJarStep = stepFactory.NewRunHiveScriptStep(scriptLocation),
					ActionOnFailure = "TERMINATE_JOB_FLOW"
				};
				AddJobFlowStepsRequest addHiveRequest = new AddJobFlowStepsRequest(activeWaitingJobFlowId, new List<StepConfig>(){runHiveScript});
				AddJobFlowStepsResult addHiveResponse = emrClient.AddJobFlowSteps(addHiveRequest);
				List<string> stepIds = addHiveResponse.StepIds;
				String hiveStepId = stepIds[0];

				DescribeStepRequest describeHiveStepRequest = new DescribeStepRequest() { ClusterId = activeWaitingJobFlowId, StepId = hiveStepId };
				DescribeStepResult describeHiveStepResult = emrClient.DescribeStep(describeHiveStepRequest);
				Step hiveStep = describeHiveStepResult.Step;
				StepStatus hiveStepStatus = hiveStep.Status;
				string hiveStepState = hiveStepStatus.State.Value.ToLower();
				bool failedState = false;
				StepTimeline finalTimeline = null;
				while (hiveStepState != "completed")
				{
					describeHiveStepRequest = new DescribeStepRequest() { ClusterId = activeWaitingJobFlowId, StepId = hiveStepId };
					describeHiveStepResult = emrClient.DescribeStep(describeHiveStepRequest);
					hiveStep = describeHiveStepResult.Step;
					hiveStepStatus = hiveStep.Status;
					hiveStepState = hiveStepStatus.State.Value.ToLower();
					finalTimeline = hiveStepStatus.Timeline;
					Console.WriteLine(string.Format("Current state of Hive script execution: {0}", hiveStepState));
					switch (hiveStepState)
					{
						case "pending":
						case "running":
							Thread.Sleep(10000);
							break;
						case "cancelled":
						case "failed":
						case "interrupted":
							failedState = true;
							break;
					}
					if (failedState)
					{
						break;
					}
				}
				if (finalTimeline != null)
				{
					Console.WriteLine(string.Format("Hive script step {0} created at {1}, started at {2}, finished at {3}"
						, hiveStepId, finalTimeline.CreationDateTime, finalTimeline.StartDateTime, finalTimeline.EndDateTime));
				}
				TerminateJobFlowsRequest terminateRequest = new TerminateJobFlowsRequest(new List<string> { activeWaitingJobFlowId });
				TerminateJobFlowsResponse terminateResponse = emrClient.TerminateJobFlows(terminateRequest);
			}
			else
			{
				Console.WriteLine("No valid job flow could be created.");
			}
		}
		catch (AmazonElasticMapReduceException emrException)
		{
			Console.WriteLine("Hive script execution step has failed.");
			Console.WriteLine("Amazon error code: {0}",
				string.IsNullOrEmpty(emrException.ErrorCode) ? "None" : emrException.ErrorCode);
			Console.WriteLine("Exception message: {0}", emrException.Message);
		}
	}
}

Call the method from Main:

static void Main(string[] args)
{		

	EmrDemoService emrDemoService = new EmrDemoService();
	emrDemoService.RunHiveScriptStep();

	Console.WriteLine("Main done...");
	Console.ReadKey();
}

Here’s a sample output from the start of the program execution:

Here’s a sample printout from the start of the Hive script execution:

The step is also visible in the EMR GUI:

The step has completed:

Let’s check in DynamoDb if the aggregation script has actually worked. Indeed it has:

In the next part we’ll apply all we have learnt about EMR to our original overall Big Data demo we started building in the series on Amazon Kinesis.

View all posts related to Amazon Web Services and Big Data here.

Filed under .NET, Amazon, Big Data Tagged with amazon, amazon cloud, aws, big data, c#, elastic mapreduce

Python language basics 5: floating point numbers

March 8, 2015 Leave a comment

Introduction

In the previous post we took up the integer built-in data type in Python. In this post we’ll continue with how floating point numbers are declared and used in Python.

As a reminder let’s restate that an important characteristic of Python is duck-typing. In practice it means that we don’t need to declare the type of a variable. It also means that you don’t need to look for keywords like “float” or “double” like in Java or C#, they don’t exist in Python.

Floating point numbers

Like most other programming languages Python supports floating point numbers. I’m not even aware of a single serious language out there which doesn’t support this data type. Floating point numbers in Python are signed and are represented as 64 bit double precision numbers. On this page you can read more about the degree of precision of floats in Python – it’s actually the same as how the double data type is supported in C# and Java.

As mentioned in the post on integers referenced above it’s not even possible to declare a variable like…

float x = 4

…in Python, it will give you a compilation error. Also, there’s no distinction between the different types of floating point numbers like “floats” and “doubles”. Here’s a very simple declaration of a float in Python:

d = 7.344

Scientific notation is represented by “e”. The following expression…:

scientific = 4e10
print(scientific)

…means 4 x (10 to the power of 8), i.e. 40000000000.0.

For small numbers we can also use scientific notation:

verysmall = 0.543e-2

…which reads 0.543 times 10 to the power of -2 and gives 0.00543.

We saw in the post on integers how the int constructor can be used to build integers. There’s a similar float constructor which converts other data types to float.

From a string:

f = float("4.56")

From an integer:

f = float(13)

…where f will become 13.0.

You don’t need to worry about converting between integers and floats when working with both types within the same statement. The result will be converted to float automatically so that you don’t run the risk of losing precision. Example:

fres = 13.6 / 4
print(res)

‘res’ will be 3.4 and not 3.

Read all Python-related posts on this blog here.

Filed under python Tagged with language course, python

Python language basics 4: integers

March 7, 2015 Leave a comment

Introduction

In the previous post we discussed the role of white space in Python We saw how it was used to give structure to the code. In a number of other popular languages like C# or Java curly braces ‘{}’ are used to delimit code blocks, but Python is cleaner.

In this post we’ll start looking at some of the built-in data types in Python:

Read more of this post

Filed under python Tagged with language course, python

Combining parts of full file path in C# .NET

March 6, 2015 1 Comment

The Path class in the System.IO namespace provides a number of useful filepath-related methods. Say that you have the following ingredients of a full file path:

string drive = @"C:\";
string folders = @"logfiles\october\";
string fileName = "log.txt";

Read more of this post

Filed under .NET, C# language features Tagged with c#, file i/o

Using Amazon Elastic MapReduce with the AWS .NET API Part 6: Hive with Amazon S3 and DynamoDb

March 5, 2015 1 Comment

Introduction

In the previous post we looked at how to interact with EMR using the AWS .NET SDK. We saw examples of how to get the cluster list, start a new cluster, assign steps to the cluster and finally terminate it.

In this post we’ll return to the Hive CLI to see how EMR can interact with Amazon S3 and DynamoDb. We investigated both components here and here.

Read more of this post

Filed under .NET, Amazon, Big Data Tagged with amazon, amazon cloud, aws, elastic mapreduce, hadoop, hive

Converting a string into a DateTime with an exact date format in C# .NET

March 4, 2015 Leave a comment

How do you read the date “03-10-2014”? For some of you this will be October 03 2014. For others, especially those from the US this will be interpreted as March 10 2014. This is because dates are represented in different formats in different cultures.

Consider the following code:

string dateString = "03-10-2014";
DateTime parsedDate = DateTime.Parse(dateString);
string toString = parsedDate.ToLongDateString();

Read more of this post

Filed under .NET, C# language features Tagged with c#, datetime, format

← Older posts

Newer posts →

Exercises in .NET with Andras Nemes

Using NumberStyles to parse numbers in C# .NET Part 2

Using Amazon Elastic MapReduce with the AWS .NET API Part 8: connection to our Big Data demo

Using NumberStyles to parse numbers in C# .NET Part 1

The Conditional attribute to control execution of parts of the code in C# .NET

Using Amazon Elastic MapReduce with the AWS .NET API Part 7: indirect Hive with .NET

Python language basics 5: floating point numbers

Python language basics 4: integers

Combining parts of full file path in C# .NET

Using Amazon Elastic MapReduce with the AWS .NET API Part 6: Hive with Amazon S3 and DynamoDb

Converting a string into a DateTime with an exact date format in C# .NET

My profile

Andras Nemes

Verified Services

Follow my blog via email

Top Posts & Pages

History

My tweets

Blogs I Follow

Share:

Share:

Share:

Share:

Share:

Share:

Share:

Share:

Share:

Share:

My profile

Verified Services

Follow my blog via email

Top Posts & Pages

History

Keywords

Blogs I Follow