Using Amazon RedShift with the AWS .NET API Part 6: Postgresql to master node using ODBC

Introduction

In the previous post we tested how to connect to the master node in code using the AWS .NET SDK and ODBC. We also executed our first simple Postgresql statement remotely. In this post we'll continue along those lines and execute some more Postgresql statements on our master node.

Preparation

We'll execute most of the scripts we saw in an earlier blog post. Prepare a text file called postgresscript.txt with the following content and save it somewhere on your hard drive:

drop table if exists url_response_times;
drop table if exists url_aggregations;

create table if not exists url_response_times (url varchar, response_time int);
insert into url_response_times(url, response_time) values ('www.facebook.com', 512);
insert into url_response_times(url, response_time) values ('www.facebook.com', 742);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 983);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 1532);
insert into url_response_times(url, response_time) values ('www.facebook.com', 783);
insert into url_response_times(url, response_time) values ('www.facebook.com', 912);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 437);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 1764);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 142);
insert into url_response_times(url, response_time) values ('www.facebook.com', 824);
insert into url_response_times(url, response_time) values ('www.facebook.com', 381);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 798);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 725);
insert into url_response_times(url, response_time) values ('www.yahoo.com', 236);
insert into url_response_times(url, response_time) values ('www.facebook.com', 315);
insert into url_response_times(url, response_time) values ('www.facebook.com', 853);
create table if not exists url_aggregations (url varchar, avg_response_time double precision);

insert into url_aggregations (url, avg_response_time) select url, avg(response_time) from url_response_times group by url;
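
For reference, with this data set the final insert-select should add two rows to url_aggregations: www.facebook.com with an average response time of 665.25 and www.yahoo.com with 827.125.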

Executing the batch and other statements

Open the C# RedShift demo console application we’ve been working on. Add the following method to RedShiftDemoService.cs:

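// The method below assumes the following using directives at the top of RedShiftDemoService.cs:
// using System;
// using System.IO;
// using System.Data.Odbc;
// using Amazon.Redshift;
// using Amazon.Redshift.Model;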
public void ExecuteModelScriptInRedShift(string clusterName)
{
	using (IAmazonRedshift redshiftClient = GetRedShiftClient())
	{
		try
		{
			string script = File.ReadAllText(@"C:\path-to-your-script\postgresscript.txt");
			DescribeClustersRequest describeClustersRequest = new DescribeClustersRequest() { ClusterIdentifier = clusterName };
			DescribeClustersResponse describeClustersResponse = redshiftClient.DescribeClusters(describeClustersRequest);
			Cluster firstMatch = describeClustersResponse.Clusters[0];

			string mainDbName = firstMatch.DBName;
			string endpointAddress = firstMatch.Endpoint.Address;
			int endpointPort = firstMatch.Endpoint.Port;
			string masterUsername = firstMatch.MasterUsername;
			string password = "your master node password";

			string odbcConnectionString = string.Concat("Driver={PostgreSQL Unicode}; Server=", endpointAddress
				, "; Database=", mainDbName, "; UID=", masterUsername, "; PWD=", password
				, "; Port=", endpointPort);
			using (OdbcConnection conn = new OdbcConnection(odbcConnectionString))
			{
				try
				{
					conn.Open();
					OdbcCommand odbcCommand = new OdbcCommand(script, conn);
					int totalRowsInserted = odbcCommand.ExecuteNonQuery();
					string selectUrlRecords = "select * from url_response_times;";
					string selectAggregationRecords = "select * from url_aggregations;";

					odbcCommand.CommandText = selectUrlRecords;
					using (OdbcDataReader dataReader = odbcCommand.ExecuteReader())
					{
						Console.WriteLine("About to read the contents of url observations, {0} records found.", dataReader.RecordsAffected);
						int recordCounter = 1;
						while (dataReader.Read())
						{
							Console.WriteLine("Printing information on record #{0}", recordCounter);
							int columnCount = dataReader.FieldCount;									
							for (int i = 0; i < columnCount; i++)
							{
								Console.WriteLine("Column name: {0}, column value: {1}", dataReader.GetName(i), dataReader.GetValue(i));
							}
							recordCounter++;
						}
					}

					Console.WriteLine();
					odbcCommand.CommandText = selectAggregationRecords;
					using (OdbcDataReader dataReader = odbcCommand.ExecuteReader())
					{
						Console.WriteLine("About to read the contents of url aggregations, {0} records found.", dataReader.RecordsAffected);
						int recordCounter = 1;
						while (dataReader.Read())
						{
							Console.WriteLine("Printing information on record #{0}", recordCounter);
							int columnCount = dataReader.FieldCount;
							for (int i = 0; i < columnCount; i++)
							{
								Console.WriteLine("Column name: {0}, column value: {1}", dataReader.GetName(i), dataReader.GetValue(i));
							}
							recordCounter++;
						}
					}

					string deleteCommands = "drop table if exists url_response_times; drop table if exists url_aggregations;";
					odbcCommand.CommandText = deleteCommands;
					odbcCommand.ExecuteNonQuery();
				}
				catch (Exception ex)
				{
					Console.WriteLine("Exception caught while communicating with RedShift master node: ");
					Console.WriteLine(ex.Message);
				}
			}
		}
		catch (AmazonRedshiftException e)
		{
			Console.WriteLine("Postgresql command execution has failed.");
			Console.WriteLine("Amazon error code: {0}",
				string.IsNullOrEmpty(e.ErrorCode) ? "None" : e.ErrorCode);
			Console.WriteLine("Exception message: {0}", e.Message);
		}
	}
}

Let’s go through this step by step.

We first read in the contents of the script file. We then locate the cluster and build up the connection string as we saw in the previous post; there's nothing new here. In the previous post we saw an example of using the OdbcDataAdapter object, so here we do something different. The OdbcCommand object is quite generic: it accepts any SQL command, even several at a time, and executes them through its various "Execute…" methods. ExecuteNonQuery is meant for statements that don't return a result set, such as inserts and deletes, and it returns the number of rows affected. In the above example it returns 16 + 2 = 18: 16 from the insertions into the url_response_times table and 2 more when the url_aggregations table is filled in.
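
As a quick sanity check you can inspect the totalRowsInserted variable right after the ExecuteNonQuery call in the method above. A minimal sketch, relying on the 18-row count described above:

// The driver reports the rows affected by the whole batch:
// 16 INSERTs plus the 2 rows produced by the insert-select.
if (totalRowsInserted != 18)
{
	Console.WriteLine("Unexpected row count from the script: {0}", totalRowsInserted);
}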

The ExecuteReader method is used when you query the database with a SELECT statement. It returns an OdbcDataReader object which you can iterate over to extract all the values. We do that for both tables and print their contents. You've probably noticed that reading the two tables duplicates essentially the same code. This can easily be avoided by concatenating the SELECT statements and consuming both result sets from a single reader as follows:

odbcCommand.CommandText = string.Concat(selectUrlRecords, selectAggregationRecords);
using (OdbcDataReader dataReader = odbcCommand.ExecuteReader())
{
	// Each SELECT produces its own result set; NextResult() advances to the next one.
	do
	{
		while (dataReader.Read())
		{
			int columnCount = dataReader.FieldCount;
			for (int i = 0; i < columnCount; i++)
			{
				Console.WriteLine("Column name: {0}, column value: {1}", dataReader.GetName(i), dataReader.GetValue(i));
			}
		}
	}
	while (dataReader.NextResult());
}

After printing some information about the tables' contents we clean up after ourselves by sending two DROP statements through the same OdbcCommand object.

We haven't used the ExecuteScalar method here. It returns a single value: the first column of the first row in the result set, and ignores everything else. It's typically used with aggregate queries, such as SELECT COUNT(*) or SELECT AVG(…), where you expect exactly one value back.
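
A minimal sketch with our tables, assuming it runs before the two DROP statements:

odbcCommand.CommandText = "select count(*) from url_response_times;";
// ExecuteScalar returns the first column of the first row as a plain object.
object observationCount = odbcCommand.ExecuteScalar();
Console.WriteLine("Number of observations: {0}", Convert.ToInt64(observationCount));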

SQL commands with parameters

You’ve probably heard of SQL injection attacks and how harmful they can be for your database driven applications.

One way to reduce the risk of such an attack is to use parameterised SQL statements. These are statements where you leave placeholders in the SQL text and supply the actual values separately, so the database treats them as data rather than as executable SQL. This is a way of sanitising user input on a public web page where you cannot control what your visitors type into, say, a questionnaire.

Say you’d like to filter the records by URL and response time. Here’s how you can prepare a statement template and substitute the placeholders:

odbcCommand.CommandText = "select * from url_response_times where url = ? and response_time > ?;";
odbcCommand.Parameters.Add("@url", OdbcType.NVarChar).Value = "www.facebook.com";
odbcCommand.Parameters.Add("@response_time", OdbcType.Int).Value = 700;
using (OdbcDataReader dataReader = odbcCommand.ExecuteReader())
{
	while (dataReader.Read())
	{
		int columnCount = dataReader.FieldCount;
		for (int i = 0; i < columnCount; i++)
		{
			Console.WriteLine("Column name: {0}, column value: {1}", dataReader.GetName(i), dataReader.GetValue(i));
		}
	}
}

All placeholders are denoted by '?'. The parameter names passed to the Add method make no difference: ODBC binds the values positionally, in their order of appearance. The below code will therefore give the same result set:

odbcCommand.Parameters.Add("@mickey", OdbcType.NVarChar).Value = "www.facebook.com";
odbcCommand.Parameters.Add("@pluto", OdbcType.Int).Value = 700;

We now know the basics of communicating with RedShift using the AWS .NET SDK and ODBC. We’ll get somewhat more serious in the next post where we’ll start looking into a database schema often used in conjunction with data warehousing and data mining: the star schema.

View all posts related to Amazon Web Services and Big Data here.


About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.

3 Responses to Using Amazon RedShift with the AWS .NET API Part 6: Postgresql to master node using ODBC

  1. Jason Yang says:

    Hi Andras

    Thank you for the great post!

    I am new to Redshift. I want to connect to it without using the Amazon Redshift driver, because our program will be used by clients who may not have it installed on their computers. Could you please tell me if this is possible?

    Thank you so much!

    Regards
    Jason

    • Andras Nemes says:

      Hi Jason, you’ll need to package the AWS dependencies in your installation. This is true of every dependency that the client application needs.
      //Andras

