Using Amazon DynamoDb for IP and co-ordinate based geo-location services part 8: creating the lng/lat coordinates source file for DynamoDb

Introduction

In the previous post we successfully queried the limited IPv4 range table in DynamoDb and found the geoname ID that belongs to a single IP. We used three of the available integer properties in the table to narrow down the number of records that had to be scanned, reducing both the query execution time and the risk of exceptions.

In this post we’ll start the same process for the lng/lat coordinate range. More specifically, we’ll prepare the raw data file that can be uploaded into DynamoDb through S3. The process will be very similar to what we saw in the post where we created the IPv4 range source file, so it is a good idea to quickly re-scan that post to refresh your memory of the process.
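
As a rough preview of the extraction step, the sketch below pulls the location ID and the lng/lat pair out of the MaxMind CSV and writes a simplified flat file. The file names, column positions and the tab-delimited output format are all assumptions for illustration; the post itself goes through the actual DynamoDb-specific format.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LngLatSourceFileBuilder {
    public static void main(String[] args) throws IOException {
        // Hypothetical column positions: geoname_id, latitude, longitude.
        // Check the header row of your MaxMind CSV before relying on these.
        final int geonameIdCol = 0;
        final int latitudeCol = 7;
        final int longitudeCol = 8;

        try (PrintWriter writer = new PrintWriter("lnglat-source.txt")) {
            Files.lines(Paths.get("maxmind-blocks.csv"))
                    .skip(1) // skip the CSV header row
                    .map(line -> line.split(","))
                    .forEach(cols -> writer.println(
                            cols[geonameIdCol] + "\t" + cols[latitudeCol] + "\t" + cols[longitudeCol]));
        }
    }
}
```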

Read more of this post

Using Amazon DynamoDb for IP and co-ordinate based geo-location services part 7: querying the IPv4 range table

Introduction

In the previous post we loaded the limited IP range records into DynamoDb. As we’re only talking about 250 records we could have added them in code one by one. However, that strategy would never work for the full MaxMind IP data set of 10 million records. So instead we looked at the built-in Import/Export functionality in DynamoDb. You’ll be able to go through the same process when you’re ready to import the full data set.

In this post we’ll see how to query the IP range database to extract the ID of the nearest geolocation. We’ll get to use the AWS Java SDK.
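
As a taster, a query with the AWS Java SDK could look something like the sketch below: the hash key narrows the query to a single partition and a filter expression keeps only the range that actually contains the IP. The table name, key schema and attribute names here are assumptions for illustration only; the post walks through the real ones.

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import com.amazonaws.services.dynamodbv2.model.QueryResult;

import java.util.HashMap;
import java.util.Map;

public class IpRangeLookup {
    public static void main(String[] args) {
        // 16909060 is 1.2.3.4 in 32-bit integer form; see part 3 for the conversion
        long ipAsInt = 16909060L;

        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

        Map<String, AttributeValue> values = new HashMap<>();
        values.put(":networkId", new AttributeValue().withN("1"));
        values.put(":ip", new AttributeValue().withN(Long.toString(ipAsInt)));

        // Table and attribute names are hypothetical placeholders
        QueryRequest request = new QueryRequest()
                .withTableName("ipv4-ranges")
                .withKeyConditionExpression("network_id = :networkId")
                .withFilterExpression("ip_start <= :ip AND :ip <= ip_end")
                .withExpressionAttributeValues(values);

        QueryResult result = client.query(request);
        result.getItems().forEach(item ->
                System.out.println("Geoname ID: " + item.get("geoname_id").getN()));
    }
}
```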

Read more of this post

Using Amazon DynamoDb for IP and co-ordinate based geo-location services part 6: uploading IPv4 range to DynamoDb

Introduction

In the previous post we successfully created a limited IPv4 range file ready to be uploaded to DynamoDb. We saw how the relevant bits were extracted from the reduced subset of the MaxMind CSV source file and how the DynamoDb-specific input file was created.

In this post we’ll see how to upload the source file to DynamoDb using the bulk insertion tools available there. We’ll only import our limited test data but the same steps apply for large data sets as well.

Read more of this post

Using Amazon DynamoDb for IP and co-ordinate based geo-location services part 5: creating the IPv4 source file for DynamoDb

Introduction

In the previous post we went through our strategy for saving the longitude-latitude coordinates in DynamoDb for our geo-spatial queries later on. We said that we would save the records in DynamoDb in a format that suits queries made through a library designed by AWS, which in turn uses a geo-library from Google.

In this post we’ll finally see some action. We’re ready to format and upload the IP range to DynamoDb. We’ll demonstrate the techniques using only a small subset of the MaxMind raw data source. I strongly recommend you follow the same strategy and not try to upload 10 million rows at once: make sure the process works for a small subset from start to finish and then go for the real thing. The steps outlined in this series will also apply to the full, paid version of the CSV source.
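
If you want to carve out such a test subset yourself, something as simple as the following will do. The file names and the row count are placeholders:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CsvSubsetExtractor {
    public static void main(String[] args) throws IOException {
        // Keep the header plus roughly 250 data rows for a manageable test file
        try (Stream<String> lines = Files.lines(Paths.get("maxmind-full.csv"))) {
            Files.write(Paths.get("maxmind-subset.csv"),
                    lines.limit(251).collect(Collectors.toList()));
        }
    }
}
```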

Read more of this post

Big Data series: a summary of Amazon Big Data tools we have discussed

Introduction

We have gone through a lot of material about Big Data on this blog. This post summarises the Amazon Cloud components one by one: what they do and what their roles are in a Big Data architecture.

The components

Read more of this post

Using Amazon DynamoDb for IP and co-ordinate based geo-location services part 4: lng/lat range strategy

Introduction

In the previous post we discussed our strategy to save the IP ranges in DynamoDb. We saw that it would be very inefficient to store the IPs as strings and run our queries based on some string manipulation. Instead we’ll store the IP ranges as lower and upper limit integers.
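
The conversion itself is simple: each octet of a dotted-decimal IPv4 address is a digit in base 256. A minimal sketch:

```java
public class Ipv4Converter {
    // 1.2.3.4 -> 1*256^3 + 2*256^2 + 3*256 + 4 = 16909060
    public static long toInteger(String dottedDecimal) {
        long result = 0;
        for (String octet : dottedDecimal.split("\\.")) {
            result = result * 256 + Long.parseLong(octet);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toInteger("1.2.3.4")); // prints 16909060
    }
}
```

An IP then falls into a range whenever lower <= ip <= upper, which is exactly the kind of numeric comparison DynamoDb handles well.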

In this post we’ll discuss our strategy to save the longitude-latitude ranges for cities.

Read more of this post

Using Amazon DynamoDb for IP and co-ordinate based geo-location services part 3: IPv4 range strategy

Introduction

In the previous post we went through the details of the CSV source files that show the IP and lng/lat ranges and the actual locations. We saw that the two source files are linked by the location ID.

The next task is to import the source into DynamoDb. Recall that we want to handle queries based on IPs and lng/lat pairs separately; those are the primary goals of this series. The way to query an IP database is very different from querying a lng/lat database. An IP will fit into some IP range and we’d like to find that record. If, on the other hand, you have a lng/lat co-ordinate pair and would like to find the nearest city/school/hospital etc. within a certain radius, then that query will involve some complex maths instead.
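
To give a flavour of that “complex maths”: the distance between two lng/lat points on a sphere is typically computed with the haversine formula. Here is a sketch using the standard formula; it is not the AWS/Google library the series relies on, just an illustration of what a radius query has to evaluate.

```java
public class Haversine {
    private static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two lat/lng points in kilometres
    public static double distanceKm(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    public static void main(String[] args) {
        // Roughly 5,570 km between London and New York
        System.out.println(distanceKm(51.5074, -0.1278, 40.7128, -74.0060));
    }
}
```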

Read more of this post

Using Amazon RedShift with the AWS .NET API Part 10: RedShift in Big Data

Introduction

In the previous post we discussed how to calculate the more complex parts of the aggregation script: the median and nth percentile of the URL response time.

This post will take up the Big Data thread where we left off at the end of the series on Amazon S3. We’ll also refer to what we built at the end of the series on Elastic MapReduce. That post showed how to run an aggregation job via the AWS .NET SDK on an available EMR cluster. Familiarity with what we discussed in those topics is therefore a prerequisite for following the code examples in this post.

In this post our goal is to show an alternative to EMR. We’ll also see how to import the raw data source from S3 into RedShift.
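
As a preview of the import step, loading a file from S3 into RedShift revolves around the COPY command. Below is a minimal JDBC sketch; the cluster endpoint, credentials, table and bucket names are all placeholders, and the RedShift JDBC driver must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class RedShiftS3Import {
    public static void main(String[] args) throws SQLException {
        // Cluster endpoint, database and credentials are placeholders
        String url = "jdbc:redshift://my-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/mydb";

        try (Connection conn = DriverManager.getConnection(url, "masteruser", "password");
             Statement stmt = conn.createStatement()) {
            // COPY pulls the raw data file straight from S3 into the target table
            stmt.execute(
                "COPY url_response_times "
                + "FROM 's3://my-bucket/raw-data/response-times.csv' "
                + "CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>' "
                + "CSV");
        }
    }
}
```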

Read more of this post

Using Amazon RedShift with the AWS .NET API Part 9: data warehousing and the star schema 3

Introduction

In the previous post we started formulating a couple of Postgresql statements to fill in the dimension tables and the aggregation values. We saw that it wasn’t particularly difficult to calculate some basic aggregations over combinations of URL and Customer. We ignored the calculation of the median and percentile values and set them to 0. I’ve decided to dedicate a post just to those functions as I thought they were a lot more complex than min, max and average.

Median in RedShift

Median is itself a percentile value: it is the 50th percentile. So we could use the percentile function for the median as well, but median has its own dedicated function in RedShift. It’s not a compact function like min(), where you pass in one or more arguments and get a single value back.
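
In RedShift, median() is a window function: it needs an OVER clause with an optional PARTITION BY rather than a plain argument list. A hedged JDBC sketch against a hypothetical table (the connection details, table and column names are assumptions):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class MedianQuery {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:redshift://my-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/mydb";

        // median() must be used as a window function, hence the OVER clause.
        // DISTINCT collapses the per-row window output to one row per URL.
        String sql = "SELECT DISTINCT url, "
                + "median(response_time) OVER (PARTITION BY url) AS median_response_time "
                + "FROM url_response_times";

        try (Connection conn = DriverManager.getConnection(url, "masteruser", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString("url") + ": " + rs.getDouble("median_response_time"));
            }
        }
    }
}
```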

Read more of this post

Using Amazon DynamoDb for IP and co-ordinate based geo-location services part 2: MaxMind source files

Introduction

In the previous post we outlined the goals of this series and the tools that we’re going to use. We’ve got as far as downloading MaxMind’s free version of their geo-location source with IPs and longitude-latitude co-ordinates. We saw that the downloaded package had a number of CSV files.

In this post we’ll start off by looking at the structure of those files and how they are connected.
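
As a quick illustration of that link, the sketch below joins the two files in memory on the shared location ID. The file names and column positions are placeholders; the post describes the real column layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SourceFileJoiner {
    public static void main(String[] args) throws IOException {
        // Hypothetical layout: geoname_id in column 0, city name in column 1;
        // subList(1, ...) skips the CSV header row
        Map<String, String> cityByGeonameId = new HashMap<>();
        List<String> locations = Files.readAllLines(Paths.get("locations.csv"));
        for (String line : locations.subList(1, locations.size())) {
            String[] cols = line.split(",");
            cityByGeonameId.put(cols[0], cols[1]);
        }

        // Hypothetical layout: IP range in column 0, geoname_id in column 1
        List<String> blocks = Files.readAllLines(Paths.get("blocks.csv"));
        for (String line : blocks.subList(1, blocks.size())) {
            String[] cols = line.split(",");
            System.out.println(cols[0] + " -> " + cityByGeonameId.get(cols[1]));
        }
    }
}
```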

The source files

Read more of this post
