Using Amazon RedShift with the AWS .NET API Part 3: connecting to the master node

Introduction

In the previous post of this series we quickly looked at what a massively parallel processing database is. We also launched our first Amazon RedShift cluster.

In this post we’ll connect to the master node and start issuing PostgreSQL commands.

If you don’t have a RedShift cluster available at this point, you can follow the steps in the previous post so that you can try the example code.

Connecting to RedShift

There’s no freely available GUI made specifically for RedShift, nor a standard one like SQL Server Management Studio for MS SQL. We’ll use a generic DB tool called SQL Workbench/J to connect to the master node. If you have your cluster ready, you can go through the following documentation on how to download and configure SQL Workbench/J to connect to the master node. I used the JDBC connection steps under the “Connect to Your Cluster over a JDBC Connection in SQL Workbench/J” section: here’s the how-to page.

Go through the entire process and connect to the master node using the JDBC connection string, the master node username and password. The JDBC connection string will include the name of the cluster, such as “a-first-test”, and the name of the database, “urls”, that we created. It’s important to note that there’s no easy way to switch the database context in PostgreSQL like you would in MS SQL with “use ‘database_name'”. If you want to switch to another database, you’ll need to reconnect with the database name modified in the connection string.
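
For illustration, a JDBC URL for SQL Workbench/J could look like the example below. The endpoint host here is made up; copy the real one from your cluster’s configuration page in the AWS console. Also note that the URL prefix depends on the driver you registered in SQL Workbench/J: the standard PostgreSQL driver uses “jdbc:postgresql://”, while Amazon’s own RedShift JDBC driver uses “jdbc:redshift://”.

jdbc:postgresql://a-first-test.abc123xyz456.us-east-1.redshift.amazonaws.com:5439/urls

Switching to the dev database would then simply mean replacing the trailing “/urls” with “/dev” and reconnecting.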

If you get stuck you’re welcome to ask in the comments section below. At the end you should have an empty tab open in the SQL Workbench/J GUI, ready to write some SQL commands.

PostgreSQL basics

There’s a load of documentation available on the PostgreSQL docs page for the different versions of the language. However, as mentioned before, RedShift implements only a limited subset of PostgreSQL. Therefore another important documentation source is the RedShift SQL language manual. You’ll find several links to important pages there that describe the differences between standard PostgreSQL and its RedShift implementation.

OK, let’s start. I’ll go through commands similar to the ones in this post from the Elastic MapReduce series. We’ll simulate a very basic scenario where we measure the response times of URLs and want to calculate the average response time per URL.

We’ll first check what databases are available in our cluster:

SELECT datname FROM pg_database WHERE datistemplate = false;

It gave me 3 databases: urls, dev and one called padb_harvest. We created urls in the previous post, dev is created for you automatically, and padb_harvest appears to be an internal system database that RedShift maintains for its own housekeeping, so we don’t need to worry about it.

Keep in mind that “urls” is the current database context, so we don’t need to switch to it; all database-specific commands will be directed at that database.
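
If you ever want to double-check which database your session is pointing at, the standard current_database() function is available in RedShift as well:

select current_database();

It should return “urls” here.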

Let’s create our first table. Note that in RedShift a varchar column declared without a length defaults to varchar(256):

create table if not exists url_response_times (url varchar, response_time int);

Let’s check the list of tables with the following command:

select * from information_schema.tables where table_schema = 'public' order by table_name;

It will show you the table we’ve just created.

Let’s insert a couple of records into the table:

insert into url_response_times(url, response_time) values ('www.facebook.com', 512);
insert into url_response_times(url, response_time) values ('www.facebook.com',742);
insert into url_response_times(url, response_time) values ('www.yahoo.com',983);
insert into url_response_times(url, response_time) values ('www.yahoo.com',1532);
insert into url_response_times(url, response_time) values ('www.facebook.com',783);
insert into url_response_times(url, response_time) values ('www.facebook.com',912);
insert into url_response_times(url, response_time) values ('www.yahoo.com',437);
insert into url_response_times(url, response_time) values ('www.yahoo.com',1764);
insert into url_response_times(url, response_time) values ('www.yahoo.com',142);
insert into url_response_times(url, response_time) values ('www.facebook.com',824);
insert into url_response_times(url, response_time) values ('www.facebook.com',381);
insert into url_response_times(url, response_time) values ('www.yahoo.com',798);
insert into url_response_times(url, response_time) values ('www.yahoo.com',725);
insert into url_response_times(url, response_time) values ('www.yahoo.com',236);
insert into url_response_times(url, response_time) values ('www.facebook.com',315);
insert into url_response_times(url, response_time) values ('www.facebook.com',853);
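
As a side note, RedShift also accepts multi-row VALUES lists, which are more efficient than issuing many single-row inserts since each statement is a separate round trip to the cluster. The two rows below are only a sketch of the syntax; we’ve already inserted our test data above:

insert into url_response_times (url, response_time) values
('www.facebook.com', 512),
('www.yahoo.com', 983);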

Let’s make sure that the records have been entered:

select * from url_response_times;

Indeed, they have:

[Screenshot: URL response times entered in the RedShift urls database]

We’ll create another table that will hold the aggregations:

create table if not exists url_aggregations (url varchar, avg_response_time double precision);

We’ll calculate the response times averages and insert them into the above table as follows:

insert into url_aggregations (url, avg_response_time)
select url, avg(response_time) from url_response_times group by url;

We use the standard “avg” function to calculate the average response times. One caveat: in RedShift, unlike in standard PostgreSQL, avg applied to an integer column returns an integer and truncates the fractional part, so cast response_time to double precision first if you need exact averages.
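
As an aside, the same result could be achieved in a single step with a CREATE TABLE AS statement, which RedShift supports; the table name below is only for illustration:

create table url_aggregations_ctas as
select url, avg(response_time) as avg_response_time
from url_response_times
group by url;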

Let’s check the records in url_aggregations:

select * from url_aggregations;

…and there you have it:

[Screenshot: Basic URL response time averages in the Amazon RedShift aggregation table]

That wasn’t too difficult. You can see that PostgreSQL is not too different from MS SQL. As this is not a course dedicated to PostgreSQL, I won’t go into too much detail on that front.

This is enough for now. We’ll start looking into the RedShift sections of the .NET AWS SDK in the next post.

View all posts related to Amazon Web Services and Big Data here.


