postgresql | Exercises in .NET with Andras Nemes

Using Amazon RedShift with the AWS .NET API Part 9: data warehousing and the star schema 3

April 13, 2015

Introduction

In the previous post we started formulating a couple of PostgreSQL statements to fill in the dimension tables and the aggregation values. We saw that it wasn’t particularly difficult to calculate some basic aggregations over combinations of URL and Customer. We ignored the median and percentile values and simply set them to 0. I’ve decided to dedicate a separate post to those functions, as I found them a lot more complex than min, max and average.
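For readers who want to see what such an aggregation computes, here is a minimal Python sketch of a GROUP BY over URL and customer. It is not the SQL from the series: the record layout (url, customer, response time in milliseconds) and the sample values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (url, customer, response_time_ms).
# Illustrative values only; they are not taken from the series.
raw_records = [
    ("/home", "customer_a", 354),
    ("/home", "customer_a", 1345),
    ("/contact", "customer_a", 1300),
    ("/home", "customer_b", 450),
    ("/home", "customer_b", 500),
]


def aggregate_by_url_and_customer(records):
    """Group records by (url, customer) and compute min, max and average,
    mirroring SELECT ... MIN(), MAX(), AVG() ... GROUP BY url, customer."""
    groups = defaultdict(list)
    for url, customer, response_time in records:
        groups[(url, customer)].append(response_time)
    return {
        key: {"min": min(vals), "max": max(vals), "avg": mean(vals)}
        for key, vals in groups.items()
    }


if __name__ == "__main__":
    # One output line per (url, customer) combination.
    for (url, customer), stats in sorted(aggregate_by_url_and_customer(raw_records).items()):
        print(url, customer, stats)
```

In RedShift the same result comes from a single SELECT with MIN, MAX and AVG and a GROUP BY on the two columns; the sketch only makes the grouping explicit.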

Median in RedShift

The median is also a percentile value: it is the 50th percentile. So we could use the percentile function for the median as well, but the median has its own dedicated function in RedShift. It’s not as compact as min(), where you pass in a column and get back a single value.
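To see why the median is just one point on the percentile scale, here is a minimal Python sketch of a continuous percentile. It assumes the linear interpolation rule used by PERCENTILE_CONT, where the 1-based row number is RN = 1 + p(N - 1) over the N sorted non-null values; with p = 0.5 it gives the usual median. It only illustrates the arithmetic and is not RedShift's implementation.

```python
import math


def percentile_cont(values, p):
    """Continuous percentile via linear interpolation between the rows at
    FLOOR(RN) and CEILING(RN), where RN = 1 + p * (N - 1)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(v for v in values if v is not None)  # NULLs are ignored
    if not ordered:
        return None
    rn = 1 + p * (len(ordered) - 1)  # 1-based row number, possibly fractional
    frn, crn = math.floor(rn), math.ceil(rn)
    if frn == crn:
        # RN lands exactly on a row: no interpolation needed.
        return ordered[frn - 1]
    # Weight the two neighbouring rows by their distance from RN.
    return (crn - rn) * ordered[frn - 1] + (rn - frn) * ordered[crn - 1]


def median(values):
    """The median is simply the 50th percentile."""
    return percentile_cont(values, 0.5)
```

For example, median([450, 500]) gives 475.0, because RN = 1.5 falls halfway between the two rows, while median([1345, 354, 1300]) returns the middle row, 1300.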

Using Amazon RedShift with the AWS .NET API Part 8: data warehousing and the star schema 2

April 9, 2015

Introduction

In the previous post we discussed the basics of data warehousing and the different commonly used database schemas associated with it. We also set up a couple of tables: one raw data table which we filled with some raw data records, two dimension tables and a fact table.

In this post we’ll build upon the existing tables and present a couple of useful PostgreSQL statements in RedShift. Keep in mind that RedShift supports only a limited subset of PostgreSQL compared to the full version, so you often need to be resourceful.

Fill in the dimension tables

Recall that we have two dimension tables: DimUrl and DimCustomer. Both are referenced from the fact table by their primary keys. We haven’t added any data to them yet; we’ll do that now.
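One common way to fill dimension tables is to take the distinct values from the raw data and give each one a surrogate key. The Python sketch below mimics that idea in memory; it is not the SQL used in the series, and the record layout, sample values and helper names (build_dimension, build_star) are hypothetical. In the series the fact table stores aggregates next to the keys; here each raw row is just translated to its keys to show how the references work.

```python
# Hypothetical raw records: (url, customer, response_time_ms); illustrative only.
sample_records = [
    ("/home", "customer_a", 354),
    ("/contact", "customer_a", 1300),
    ("/home", "customer_b", 450),
]


def build_dimension(values):
    """Give each distinct value a surrogate key 1, 2, ... in sorted order,
    loosely like inserting SELECT DISTINCT results into a keyed table."""
    return {value: key for key, value in enumerate(sorted(set(values)), start=1)}


def build_star(records):
    """Return (dim_url, dim_customer, fact_keys).

    dim_url and dim_customer map each distinct value to its key; fact_keys
    translates every raw record into (url_key, customer_key, response_time)."""
    dim_url = build_dimension(url for url, _, _ in records)
    dim_customer = build_dimension(customer for _, customer, _ in records)
    fact_keys = [
        (dim_url[url], dim_customer[customer], response_time)
        for url, customer, response_time in records
    ]
    return dim_url, dim_customer, fact_keys


if __name__ == "__main__":
    dim_url, dim_customer, fact_keys = build_star(sample_records)
    print(dim_url)
    print(dim_customer)
    print(fact_keys)
```

The point of the design is that the fact rows carry small integer keys instead of repeating the URL and customer strings.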

Using Amazon RedShift with the AWS .NET API Part 7: data warehousing and the star schema

April 5, 2015

Introduction

In the previous post we dived into executing PostgreSQL statements on a RedShift cluster using C# and ODBC. We saw how to execute a single statement or several of them at once. We also tested a parameterised query, which can protect us from SQL injection.

In this post we’ll deviate from .NET a little and concentrate on the basics of data warehousing and data mining in RedShift. In particular, we’ll learn about a popular schema type often used in data mining: the star schema.

Star and snowflake schemas

I went through the basic characteristics of star and snowflake schemas elsewhere on this blog, so I’ll copy the relevant parts here.

My profile

Andras Nemes
