Using Amazon RedShift with the AWS .NET API Part 6: Postgresql to master node using ODBC

Introduction

In the previous post we tested how to connect to the master node in code using the AWS .NET SDK and ODBC. We also executed our first simple Postgresql statement remotely. In this post we'll continue along those lines and execute some more Postgresql statements on our master node.

Preparation

We'll execute most of the scripts we saw in an earlier blog post. Prepare a text file called postgresscript.txt with the following content and save it somewhere on your hard drive:

drop table if exists url_response_times;
drop table if exists url_aggregations;

create table if not exists url_response_times (url varchar, response_time int);
insert into url_response_times(url, response_time) values ('www.facebook.com', 512);
insert into url_response_times(url, response_time) values ('www.facebook.com', 742);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 983);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 1532);
insert into url_response_times(url, response_time) values ('www.facebook.com', 783);
insert into url_response_times(url, response_time) values ('www.facebook.com', 912);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 437);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 1764);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 142);
insert into url_response_times(url, response_time) values ('www.facebook.com', 824);
insert into url_response_times(url, response_time) values ('www.facebook.com', 381);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 798);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 725);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 236);
insert into url_response_times(url, response_time) values ('www.facebook.com', 315);
insert into url_response_times(url, response_time) values ('www.facebook.com', 853);
create table if not exists url_aggregations (url varchar, avg_response_time double precision);

insert into url_aggregations (url, avg_response_time) select url, avg(response_time) from url_response_times group by url;
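
For reference, with this data set the final insert-select should add two rows to url_aggregations: www.facebook.com with an average response time of 665.25 and www.yahoo.com with 827.125.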

Executing the batch and other statements

Open the C# RedShift demo console application we’ve been working on. Add the following method to RedShiftDemoService.cs:

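// The method below assumes the following using directives at the top of RedShiftDemoService.cs:
// using System;
// using System.IO;
// using System.Data.Odbc;
// using Amazon.Redshift;
// using Amazon.Redshift.Model;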
public void ExecuteModelScriptInRedShift(string clusterName)
{
	using (IAmazonRedshift redshiftClient = GetRedShiftClient())
	{
		try
		{
			string script = File.ReadAllText(@"C:\path-to-your-script\postgresscript.txt");
			DescribeClustersRequest describeClustersRequest = new DescribeClustersRequest() { ClusterIdentifier = clusterName };
			DescribeClustersResponse describeClustersResponse = redshiftClient.DescribeClusters(describeClustersRequest);
			Cluster firstMatch = describeClustersResponse.Clusters[0];

			string mainDbName = firstMatch.DBName;
			string endpointAddress = firstMatch.Endpoint.Address;
			int endpointPort = firstMatch.Endpoint.Port;
			string masterUsername = firstMatch.MasterUsername;
			string password = "your master node password";

			string odbcConnectionString = string.Concat("Driver={PostgreSQL Unicode}; Server=", endpointAddress
				, "; Database=", mainDbName, "; UID=", masterUsername, "; PWD=", password
				, "; Port=", endpointPort);
			using (OdbcConnection conn = new OdbcConnection(odbcConnectionString))
			{
				try
				{
					conn.Open();
					OdbcCommand odbcCommand = new OdbcCommand(script, conn);
					int totalRowsInserted = odbcCommand.ExecuteNonQuery();
					string selectUrlRecords = "select * from url_response_times;";
					string selectAggregationRecords = "select * from url_aggregations;";

					odbcCommand.CommandText = selectUrlRecords;
					using (OdbcDataReader dataReader = odbcCommand.ExecuteReader())
					{
						Console.WriteLine("About to read the contents of url observations, {0} records found.", dataReader.RecordsAffected);
						int recordCounter = 1;
						while (dataReader.Read())
						{
							Console.WriteLine("Printing information on record #{0}", recordCounter);
							int columnCount = dataReader.FieldCount;									
							for (int i = 0; i < columnCount; i++)
							{
								Console.WriteLine("Column name: {0}, column value: {1}", dataReader.GetName(i), dataReader.GetValue(i));
							}
							recordCounter++;
						}
					}

					Console.WriteLine();
					odbcCommand.CommandText = selectAggregationRecords;
					using (OdbcDataReader dataReader = odbcCommand.ExecuteReader())
					{
						Console.WriteLine("About to read the contents of url aggregations, {0} records found.", dataReader.RecordsAffected);
						int recordCounter = 1;
						while (dataReader.Read())
						{
							Console.WriteLine("Printing information on record #{0}", recordCounter);
							int columnCount = dataReader.FieldCount;
							for (int i = 0; i < columnCount; i++)
							{
								Console.WriteLine("Column name: {0}, column value: {1}", dataReader.GetName(i), dataReader.GetValue(i));
							}
							recordCounter++;
						}
					}

					string deleteCommands = "drop table if exists url_response_times; drop table if exists url_aggregations;";
					odbcCommand.CommandText = deleteCommands;
					odbcCommand.ExecuteNonQuery();
				}
				catch (Exception ex)
				{
					Console.WriteLine("Exception caught while communicating with RedShift master node: ");
					Console.WriteLine(ex.Message);
				}
			}
		}
		catch (AmazonRedshiftException e)
		{
			Console.WriteLine("Postgresql command execution has failed.");
			Console.WriteLine("Amazon error code: {0}",
				string.IsNullOrEmpty(e.ErrorCode) ? "None" : e.ErrorCode);
			Console.WriteLine("Exception message: {0}", e.Message);
		}
	}
}

Let’s go through this step by step.

We first read in the contents of the script file. We then locate the cluster and build up the connection string as we saw in the previous post; there's nothing new here. In the previous post we saw an example of using the OdbcDataAdapter object, so here we do something different. The OdbcCommand object is quite generic: it accepts any SQL command, even several at a time, and executes them through its various "Execute…" methods. ExecuteNonQuery is meant for statements that don't return a result set, such as inserts and deletes, and it returns the number of rows affected. In the above example it returns 16 + 2 = 18: 16 from the insertions into the url_response_times table and 2 more when the url_aggregations table is filled in.
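
As a quick sanity check you can inspect the totalRowsInserted variable right after the ExecuteNonQuery call in the method above. A minimal sketch, relying on the 18-row count described above:

// The driver reports the rows affected by the whole batch:
// 16 INSERTs plus the 2 rows produced by the insert-select.
if (totalRowsInserted != 18)
{
	Console.WriteLine("Unexpected row count from the script: {0}", totalRowsInserted);
}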

The ExecuteReader method is used when you query the database with a SELECT statement. It returns an OdbcDataReader object which you can iterate over to extract all the values. We do that for both tables and print their contents. You've probably noticed that reading the two tables duplicates essentially the same code. This can easily be avoided by concatenating the SELECT statements and consuming both result sets from a single reader as follows:

odbcCommand.CommandText = string.Concat(selectUrlRecords, selectAggregationRecords);
using (OdbcDataReader dataReader = odbcCommand.ExecuteReader())
{
	// Each SELECT produces its own result set; NextResult() advances to the next one.
	do
	{
		while (dataReader.Read())
		{
			int columnCount = dataReader.FieldCount;
			for (int i = 0; i < columnCount; i++)
			{
				Console.WriteLine("Column name: {0}, column value: {1}", dataReader.GetName(i), dataReader.GetValue(i));
			}
		}
	}
	while (dataReader.NextResult());
}

After printing some information about the tables' contents we clean up after ourselves by sending two DROP statements through the same OdbcCommand object.

We haven't used the ExecuteScalar method here. It returns a single value: the first column of the first row in the result set, and ignores everything else. It's typically used with aggregate queries, such as SELECT COUNT(*) or SELECT AVG(…), where you expect exactly one value back.
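
A minimal sketch with our tables, assuming it runs before the two DROP statements:

odbcCommand.CommandText = "select count(*) from url_response_times;";
// ExecuteScalar returns the first column of the first row as a plain object.
object observationCount = odbcCommand.ExecuteScalar();
Console.WriteLine("Number of observations: {0}", Convert.ToInt64(observationCount));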

SQL commands with parameters

You’ve probably heard of SQL injection attacks and how harmful they can be for your database driven applications.

One way to reduce the risk of such an attack is to use parameterised SQL statements. These are statements where you leave placeholders in the SQL text and supply the actual values separately, so the database treats them as data rather than as executable SQL. This is a way of sanitising user input on a public web page where you cannot control what your visitors type into, say, a questionnaire.

Say you’d like to filter the records by URL and response time. Here’s how you can prepare a statement template and substitute the placeholders:

odbcCommand.CommandText = "select * from url_response_times where url = ? and response_time > ?;";
odbcCommand.Parameters.Add("@url", OdbcType.NVarChar).Value = "www.facebook.com";
odbcCommand.Parameters.Add("@response_time", OdbcType.Int).Value = 700;
using (OdbcDataReader dataReader = odbcCommand.ExecuteReader())
{
	while (dataReader.Read())
	{
		int columnCount = dataReader.FieldCount;
		for (int i = 0; i < columnCount; i++)
		{
			Console.WriteLine("Column name: {0}, column value: {1}", dataReader.GetName(i), dataReader.GetValue(i));
		}
	}
}

All placeholders are denoted by '?'. The parameter names passed to the Add method make no difference: ODBC binds the values positionally, in their order of appearance. The below code will therefore give the same result set:

odbcCommand.Parameters.Add("@mickey", OdbcType.NVarChar).Value = "www.facebook.com";
odbcCommand.Parameters.Add("@pluto", OdbcType.Int).Value = 700;

We now know the basics of communicating with RedShift using the AWS .NET SDK and ODBC. We’ll get somewhat more serious in the next post where we’ll start looking into a database schema often used in conjunction with data warehousing and data mining: the star schema.

View all posts related to Amazon Web Services and Big Data here.


About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.

3 Responses to Using Amazon RedShift with the AWS .NET API Part 6: Postgresql to master node using ODBC

  1. Jason Yang says:

    Hi Andras

    Thank you for the great post!

    I am new to Redshift. I want to connect to it without using the Amazon Redshift driver, because our program will be used by clients who may not have it installed on their computers. Could you please tell me if this is possible?

    Thank you so much!

    Regards
    Jason

    • Andras Nemes says:

      Hi Jason, you’ll need to package the AWS dependencies in your installation. This is true of every dependency that the client application needs.
      //Andras

