Using Amazon DynamoDb for IP and co-ordinate based geo-location services part 2: MaxMind source files

Introduction

In the previous post we outlined the goals of this series and the tools that we’re going to use. We’ve got as far as downloading MaxMind’s free version of their geo-location source with IPs and longitude-latitude co-ordinates. We saw that the downloaded package had a number of CSV files.

In this post we’ll start off by looking at the structure of those files and how they are connected.

The source files

At the time of writing this post the ZIP package was just below 30MB in size and contained the files visible in the screenshot below. Unzip it, which can take a while because of the large files inside.

MaxMind downloaded source files

The unzipped files are quite large, but they are still smaller than the 546MB IPv4 “Blocks” CSV file of the paid version. If you already have an application designed for opening large files then I recommend you use that. If not, you can try the preview version of EmEditor, which can effortlessly open large files in chunks.

In this series we’ll concentrate on IPv4 and lng/lat co-ordinates and ignore IPv6. Open the file called GeoLite2-City-Blocks-IPv4.csv:

Screenshot from IPv4 range source file

The file contains millions of rows like that.
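If you'd rather peek at the file from code than in an editor, something like the following minimal sketch should do. It reads only the header and the first few rows, so the size of the file doesn't matter. The function name peek_rows is made up for this example, and the header line in the stand-in data is an assumption based on the column list in this post. The two data rows are the ones quoted later in this post.

```python
import csv
import io
from itertools import islice


def peek_rows(lines, count=5):
    """Return the header and the first `count` data rows of a CSV source.

    `lines` can be an open file or any iterable of text lines. Rows are
    read lazily, so only the requested rows are ever held in memory.
    """
    reader = csv.reader(lines)
    header = next(reader)
    return header, list(islice(reader, count))


# Stand-in for the real file. With the downloaded file you would write:
#   with open("GeoLite2-City-Blocks-IPv4.csv", newline="") as f:
#       header, rows = peek_rows(f)
sample = io.StringIO(
    # Assumed header, taken from the column list in this post
    "network,geoname_id,registered_country_geoname_id,"
    "represented_country_geoname_id,is_anonymous_proxy,"
    "is_satellite_provider,postal_code,latitude,longitude\n"
    "2.62.9.177/32,,2017370,,1,0,,,\n"
    "5.11.17.0/24,,2635167,,0,1,,,\n"
)

header, rows = peek_rows(sample, count=1)
print(header)
print(rows)
```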

Let’s look at the columns:

  • network: the IP address range in CIDR format. We’ll look into this in more detail later on in the series
  • geoname_id: the location ID where the IP belongs – we’ll see soon where this ID is stored
  • registered_country_geoname_id: location ID where the IP is registered
  • represented_country_geoname_id: location ID which the IP represents
  • is_anonymous_proxy,is_satellite_provider: whether the IP is a proxy or belongs to a satellite
  • postal_code,latitude,longitude: you probably understand what these mean
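To get a feel for what the CIDR notation in the network column stands for, here's a minimal sketch that uses Python's standard ipaddress module to turn a CIDR block into its first and last address. The helper name cidr_to_range is made up for this example and 192.0.2.0/24 is a reserved documentation range, not a row from the MaxMind file. This is only an illustration and not necessarily how we'll handle ranges later in the series.

```python
import ipaddress


def cidr_to_range(cidr):
    """Return the first and last address of a CIDR block.

    Each address is returned twice: as dotted text and as an integer.
    """
    network = ipaddress.ip_network(cidr)
    first = network.network_address
    last = network.broadcast_address  # same as `first` for a /32 block
    return str(first), str(last), int(first), int(last)


# 192.0.2.0/24 is a documentation-only range used here for illustration
first_ip, last_ip, first_int, last_int = cidr_to_range("192.0.2.0/24")
print(first_ip, last_ip)
print(last_int - first_int + 1, "addresses in the block")
```

The integer form is handy because a lookup of a single IP then becomes a simple numeric comparison against the two ends of the range.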

registered_country_geoname_id may look a bit mysterious but it is strongly related to the proxy and satellite provider flags. Here’s an example of a proxy in the CSV file:

2.62.9.177/32,,2017370,,1,0,,,

This IP doesn’t point to a “real” physical location. You’ll see that the “geoname_id” and the lng-lat co-ordinates are empty. However, we know that the proxy was registered in the location with ID 2017370. A satellite provider row is similar:

5.11.17.0/24,,2635167,,0,1,,,
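The two rows above can be read into a typed record so that the empty fields and the flags become explicit. The following is a minimal sketch under a few assumptions: the Block class and the parse_block and describe helpers are names invented for this example, and the column order is the one listed earlier in this post.

```python
import csv
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    network: str
    geoname_id: Optional[int]
    registered_country_geoname_id: Optional[int]
    represented_country_geoname_id: Optional[int]
    is_anonymous_proxy: bool
    is_satellite_provider: bool
    postal_code: Optional[str]
    latitude: Optional[float]
    longitude: Optional[float]


def _optional(value, convert):
    """Convert a CSV field, mapping the empty string to None."""
    return convert(value) if value else None


def parse_block(line):
    """Parse one data row of the IPv4 blocks file (assumed column order)."""
    f = next(csv.reader([line.strip()]))
    return Block(
        network=f[0],
        geoname_id=_optional(f[1], int),
        registered_country_geoname_id=_optional(f[2], int),
        represented_country_geoname_id=_optional(f[3], int),
        is_anonymous_proxy=f[4] == "1",
        is_satellite_provider=f[5] == "1",
        postal_code=_optional(f[6], str),
        latitude=_optional(f[7], float),
        longitude=_optional(f[8], float),
    )


def describe(block):
    """Label a row by its flags."""
    if block.is_anonymous_proxy:
        return "anonymous proxy"
    if block.is_satellite_provider:
        return "satellite provider"
    return "regular"


proxy = parse_block("2.62.9.177/32,,2017370,,1,0,,,")
satellite = parse_block("5.11.17.0/24,,2635167,,0,1,,,")
print(describe(proxy), proxy.registered_country_geoname_id)
print(describe(satellite), satellite.registered_country_geoname_id)
```

Both records end up with geoname_id, latitude and longitude set to None, which matches the empty fields in the raw rows.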

Here’s the documentation of the different “country” definitions, copied from MaxMind’s developer page:

“We now distinguish between several types of country data. The country is the country where the IP address is located. The registered_country is the country in which the IP is registered. These two may differ in some cases.

Finally, we also include a represented_country key for some records. This is used when the IP address belongs to something like a military base. The represented_country is the country that the base represents. This can be useful for managing content licensing, among other uses.”

So where do these IDs point to? The answer lies in the other CSV files that come in different languages which you can see in the file names: “de”, “fr” etc. We’ll concentrate on the English version so open the one called GeoLite2-City-Locations-en.csv:

Screenshot from MaxMind Locations file

That’s the ID that the “Blocks” file is pointing at. The file contains more columns but my screen is not wide enough to show all of them. View all columns in the source to see what’s available: continent, country, region, city etc. You’ll see that in some places the source is incomplete. That’s the “price” you pay for the free version but it’s great for evaluation and code testing.
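The link between the two files is a plain ID lookup, which can be sketched as follows. Several things here are assumptions made for illustration: the helper names load_locations and resolve_location, the cut-down two-column stand-in for the locations file, the country names that I believe those two IDs refer to (check them against the actual file), and the fallback to registered_country_geoname_id when geoname_id is empty. The real locations file has more columns than the sample shows.

```python
import csv
import io


def load_locations(lines):
    """Index the rows of a locations CSV by their integer geoname_id."""
    return {int(row["geoname_id"]): row for row in csv.DictReader(lines)}


def resolve_location(block_row, locations):
    """Find the location row for a blocks row.

    Falls back to the registered country when geoname_id is empty.
    The fallback is an assumption of this sketch, not MaxMind's rule.
    """
    for column in ("geoname_id", "registered_country_geoname_id"):
        value = block_row.get(column)
        if value:
            return locations.get(int(value))
    return None


# Illustrative stand-in for GeoLite2-City-Locations-en.csv (cut down)
locations = load_locations(io.StringIO(
    "geoname_id,country_name\n"
    "2017370,Russia\n"
    "2635167,United Kingdom\n"
))

# The two rows quoted earlier in this post, with an assumed header line
blocks = list(csv.DictReader(io.StringIO(
    "network,geoname_id,registered_country_geoname_id,"
    "represented_country_geoname_id,is_anonymous_proxy,"
    "is_satellite_provider,postal_code,latitude,longitude\n"
    "2.62.9.177/32,,2017370,,1,0,,,\n"
    "5.11.17.0/24,,2635167,,0,1,,,\n"
)))

for block in blocks:
    location = resolve_location(block, locations)
    print(block["network"], location["country_name"] if location else None)
```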

Our next goal is to put all that into DynamoDb. However, we cannot just copy-paste these files into DynamoDb tables. We’ll need to import the records in a special format so that they fit the queries we’ll be executing and the libraries that we’ll use.

We’ll consider our strategy in the next post.

View all posts related to Amazon Web Services and Big Data here.


About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.
