Using Amazon DynamoDb for IP and co-ordinate based geo-location services part 3: IPv4 range strategy

Introduction

In the previous post we went through the details of the CSV source files that show the IP and lng/lat ranges and the actual locations. We saw that the two source files are linked by the location ID.

The next task is to import the source into DynamoDb. Recall that we want to handle queries based on IPs and lng/lat pairs separately; those are the primary goals of this series. Querying an IP database is very different from querying a lng/lat database. An IP address fits into some IP range and we’d like to find the record for that range. If, on the other hand, you have a lng/lat co-ordinate pair and would like to find the nearest city, school, hospital etc. within a certain radius, then the query will involve some more complex maths instead.

Strategy

As a result we’ll divide the “Blocks” CSV file into two different tables in DynamoDb: one for the IP range and another one for the lng/lat range.

In this post we’ll go through the strategy to store the IP ranges.

IP range table

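Let’s first consider how we want to save the IP range data. The IP ranges in the source are represented as strings in CIDR format, e.g. 1.0.0.0/24. The /24 suffix is the prefix length: the first 24 bits identify the network and the remaining 8 bits vary within the block. The entry 1.0.0.0/24 therefore covers the following range:

From 1.0.0.0 to 1.0.0.255 (or from 1.0.0.1 to 1.0.0.254 if the network and broadcast addresses are left out)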

We’ll see how to convert the CIDR format into lower and upper range IP addresses later on.

Say you get the IP address 1.0.0.22. You could run some unwieldy string manipulation query and find that it lies within that range. However, that query would be extremely inefficient. Keep in mind that the database will have about 10 million rows. Finding that one row would take a considerable amount of time even in a fast, full-blown relational database like MySQL. In DynamoDb you’ll face some extra limitations as well, such as the read throughput and the upper limit on the number of rows a single query can scan, which is about 12000 rows at once (DynamoDb caps the data read by one query call at 1 MB). Therefore we need a query strategy that scans as few rows as possible and doesn’t involve strings.

Luckily for us this is nothing new and there’s a well-tested way of achieving this. The IPs need to be turned into their decimal representations. You can read the details of what that means and how it is achieved in this post.
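To make the decimal representation concrete, here’s a minimal Java sketch of the conversion. The class and method names (IpToDecimal, toDecimal, toDotted) are made up for this illustration and aren’t taken from the linked post. The idea is simply to treat the four octets as digits of a base-256 number:

```java
final class IpToDecimal {

    /** Converts a dotted IPv4 string such as "1.0.0.22" into its decimal value. */
    static long toDecimal(String ip) {
        String[] octets = ip.trim().split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long result = 0;
        for (String octet : octets) {
            int value = Integer.parseInt(octet);
            if (value < 0 || value > 255) {
                throw new IllegalArgumentException("Octet out of range in: " + ip);
            }
            // Each octet is one base-256 "digit".
            result = result * 256 + value;
        }
        return result;
    }

    /** Converts the decimal value back into dotted notation. */
    static String toDotted(long decimal) {
        return ((decimal >> 24) & 0xFF) + "." + ((decimal >> 16) & 0xFF) + "."
                + ((decimal >> 8) & 0xFF) + "." + (decimal & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("1.0.0.22")); // 16777238
        System.out.println(toDotted(16777238L));   // 1.0.0.22
    }
}
```

A long is used rather than an int because the largest address, 255.255.255.255, is 4294967295, which doesn’t fit into a signed 32-bit integer.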

So the idea is that we turn the IP ranges into lower and upper limit integers – or longs to be exact. Furthermore, we’ll take the first octet of the IP range, e.g. “127” of 127.0.0.1, and use it as one part of a composite key in a DynamoDb IP range table. The other element in the composite key will be the lower limit of the IP range. Here’s an extract from our own IP range table to help you visualise what I mean:

[Image: IP range table in DynamoDb]

network_head is the first octet of the IP range and will act as the primary hash key. network_start_integer represents the lower limit of the IP range as a decimal and will act as the primary range key in our DynamoDb table. If you don’t know what is meant by those key types then read this introduction on this blog.

You’ll recognise the geoname_id column. network_last_integer is the decimal representation of the upper limit of the IP range.
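As a rough illustration of how one CIDR entry could be turned into the four columns described above, consider the following Java sketch. The class IpRangeRow and its method names are hypothetical, the geoname ID 12345 is a placeholder rather than a real value from the source files, and the sketch assumes that the whole CIDR block is stored, including its first and last address:

```java
final class IpRangeRow {
    final int networkHead;          // hash key: first octet of the range
    final long networkStartInteger; // range key: lower limit as decimal
    final long networkLastInteger;  // upper limit as decimal
    final long geonameId;           // link to the locations file

    private IpRangeRow(int head, long start, long last, long geonameId) {
        this.networkHead = head;
        this.networkStartInteger = start;
        this.networkLastInteger = last;
        this.geonameId = geonameId;
    }

    /** Builds a row from a CIDR string such as "1.0.0.0/24". */
    static IpRangeRow fromCidr(String cidr, long geonameId) {
        String[] parts = cidr.trim().split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Not a CIDR block: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("Bad prefix length in: " + cidr);
        }
        long base = toDecimal(parts[0]);
        long size = 1L << (32 - prefix);   // number of addresses in the block
        long start = base & ~(size - 1);   // clear the host bits
        long last = start + size - 1;
        return new IpRangeRow((int) (start >> 24), start, last, geonameId);
    }

    private static long toDecimal(String ip) {
        String[] octets = ip.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long result = 0;
        for (String octet : octets) {
            int value = Integer.parseInt(octet);
            if (value < 0 || value > 255) {
                throw new IllegalArgumentException("Octet out of range in: " + ip);
            }
            result = (result << 8) | value;
        }
        return result;
    }

    public static void main(String[] args) {
        IpRangeRow row = fromCidr("1.0.0.0/24", 12345L);
        System.out.println(row.networkHead + " " + row.networkStartInteger
                + " " + row.networkLastInteger + " " + row.geonameId);
    }
}
```

Note that a block wider than /8 spans more than one first octet. This sketch files such a block under the first octet only, so a real import would have to split those blocks or handle them some other way.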

Hence our IP query can be based on integers only. We’ll see exactly how it’s done using the AWS Java SDK later on.
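Until we get to the real SDK code, the lookup logic can be tried out in memory. The sketch below is not the DynamoDb query itself: it’s a plain Java imitation in which a HashMap plays the role of the hash key and a TreeMap plays the role of the sorted range key. The class InMemoryIpRangeTable and its methods are invented for this illustration, and the sketch assumes that the stored ranges don’t overlap and that no range crosses a first-octet boundary:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;

final class InMemoryIpRangeTable {
    // network_head -> rows sorted by network_start_integer.
    // Each row value holds {network_last_integer, geoname_id}.
    private final Map<Integer, TreeMap<Long, long[]>> partitions = new HashMap<>();

    void put(long start, long last, long geonameId) {
        int head = (int) (start >> 24);
        TreeMap<Long, long[]> partition = partitions.get(head);
        if (partition == null) {
            partition = new TreeMap<>();
            partitions.put(head, partition);
        }
        partition.put(start, new long[] {last, geonameId});
    }

    /** Finds the geoname ID of the range containing the decimal IP, if any. */
    OptionalLong findGeonameId(long ip) {
        // Step 1: pick the partition by the first octet (the hash key).
        TreeMap<Long, long[]> partition = partitions.get((int) (ip >> 24));
        if (partition == null) {
            return OptionalLong.empty();
        }
        // Step 2: take the row with the greatest start that is <= ip.
        Map.Entry<Long, long[]> candidate = partition.floorEntry(ip);
        if (candidate == null) {
            return OptionalLong.empty();
        }
        // Step 3: the IP must not lie beyond the upper limit of that row.
        long[] row = candidate.getValue();
        return ip <= row[0] ? OptionalLong.of(row[1]) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        InMemoryIpRangeTable table = new InMemoryIpRangeTable();
        table.put(16777216L, 16777471L, 12345L);             // 1.0.0.0/24, placeholder ID
        System.out.println(table.findGeonameId(16777238L));  // 1.0.0.22
    }
}
```

In DynamoDb terms, steps 1 and 2 would presumably correspond to a single Query on the hash key with a “less than or equal” condition on the range key, read in descending order and limited to one item, so that only one row needs to be read. Step 3 is then a simple integer comparison on the returned item.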

We’ll take a look at our strategy for the lng/lat range in the next part.

View all posts related to Amazon Web Services and Big Data here.

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.
