Using Amazon DynamoDb for IP and co-ordinate based geo-location services part 5: creating the IPv4 source file for DynamoDb

Introduction

In the previous post we went through our strategy for saving the longitude-latitude coordinates in DynamoDb for our geo-spatial queries later on. We said that we would store the records in a format that suits the queries of a library designed by AWS, which in turn uses a geo-library from Google.

In this post we’ll finally see some action. We’re ready to format and upload the IP range to DynamoDb. We’ll show the techniques using only a small subset of the MaxMind raw data source. I strongly recommend you follow the same strategy and not try to upload 10 million rows at once. Make sure the process works for a small subset from start to finish and then go for the real thing. The steps outlined in this series will also apply to the full, paid version of the CSV source.

Preparation

Open the Blocks-IPv4 CSV file, copy some records from it over to another CSV file and save it somewhere on disk. Make sure you copy the headers as well. I’ll go with the following range in the sample:

From the first record…

1.0.0.0/24,2077456,2077456,,0,0,,-27.0000,133.0000

…until the end of the 1.0.x range i.e….

1.0.255.0/24,1151254,1605651,,0,0,83110,7.9833,98.3667

The source file gives 275 records at the time of writing this post. I saved the file as IPv4-range-sample.csv.
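
If you’d rather not copy the rows by hand, a small program can pull the sample out of the source file. The following is only a minimal sketch, assuming the file has a single header line and that every data row starts with the network in CIDR form. The class name SampleExtractor, the keep method and the two command-line arguments are made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class SampleExtractor {

    // Keeps the header line and every row whose network starts with the given prefix.
    static boolean keep(String line, boolean isHeader, String networkPrefix) {
        return isHeader || line.startsWith(networkPrefix);
    }

    // Usage (hypothetical): java SampleExtractor <full-blocks-csv> <sample-csv>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Expected two arguments: source CSV path and target CSV path");
            return;
        }
        // Stream line by line so the full source file never has to fit in memory
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            String line;
            boolean isHeader = true;
            while ((line = in.readLine()) != null) {
                if (keep(line, isHeader, "1.0.")) {
                    out.write(line);
                    out.newLine();
                }
                isHeader = false;
            }
        }
    }
}
```

The prefix "1.0." selects the same 1.0.x range as the manual copy described above.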

Next, you’ll need a Java project in your preferred IDE. I use NetBeans for all Java development. I’ll go with a Maven application as we’ll need to be able to download a couple of libraries from the Maven repository.

Transforming the IPv4 Block file

Actually we won’t transform the source file but rather read the necessary elements from it and create an input file for DynamoDb ready to be imported from Amazon S3. Here’s a short description of the steps we’re going to take:

  • Create the source file for DynamoDb based on the reduced IPv4 sample
  • Upload it to S3
  • Import it from S3 to DynamoDb using the built-in bulk insertion tool in DynamoDb

Before I present any code let’s see step by step what it will need to carry out:

  • Read the reduced CSV source file line by line
  • Extract the first and second columns, i.e. the CIDR address and the geoname ID
  • Convert the CIDR address into lower and upper limit IPs, i.e. we’ll get two IP numbers out of one CIDR address
  • Convert the IPs into their decimal representations
  • Extract the head element of the IP, i.e. the first octet, such as 1 for 1.0.0.0
  • Use all those elements to build a row that can be appended to a DynamoDb-formatted JSON file. DynamoDb cannot just be fed any textual source file: each row must clearly show the boundaries of every attribute and its type. In this example we only have numeric fields, so all of them are denoted by “n”. Within a row, the end-of-text character (0x03) separates an attribute name from its value and the start-of-text character (0x02) separates one attribute from the next
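
To make the conversion steps in the list concrete, here is a small sketch that works out the numbers for a single CIDR block with plain bit arithmetic. It is not the library-based approach used in this post; the class and method names are invented for the illustration, and it assumes a well-formed IPv4 CIDR string. Note that this sketch counts the whole block, network and broadcast addresses included, so its boundaries can differ by one at each end from a tool that reports only usable host addresses.

```java
final class CidrRangeSketch {

    // Converts a dotted IPv4 literal such as "1.0.0.0" to its decimal value.
    // Assumes four well-formed octets; no validation is done here.
    static long toDecimal(String ip) {
        long value = 0;
        for (String octet : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(octet);
        }
        return value;
    }

    // Returns {first, last} decimal addresses of the whole block.
    static long[] rangeOf(String cidr) {
        String[] parts = cidr.split("/");
        int prefixLength = Integer.parseInt(parts[1]);
        long blockSize = 1L << (32 - prefixLength);
        // Clear the host bits to get the first address of the block
        long first = toDecimal(parts[0]) & ~(blockSize - 1);
        long last = first + blockSize - 1;
        return new long[] {first, last};
    }

    // The "head element": everything before the first dot.
    static String headOf(String cidr) {
        return cidr.substring(0, cidr.indexOf('.'));
    }

    public static void main(String[] args) {
        long[] range = rangeOf("1.0.0.0/24");
        // prints: 1 16777216 16777471
        System.out.println(headOf("1.0.0.0/24") + " " + range[0] + " " + range[1]);
    }
}
```

The value 16777216 is simply 1 × 256³, which is why the decimal form makes range comparisons on IPs straightforward.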

There’s an Apache Commons library that can convert a CIDR address into lower and upper limit IPs. You’ll need to add the following commons-net library to your Java project:

http://mvnrepository.com/artifact/commons-net/commons-net/2.0

If you’ve started a Maven project then your POM file should include commons-net as follows:

<dependency>
       <groupId>commons-net</groupId>
       <artifactId>commons-net</artifactId>
       <version>2.0</version>
</dependency>

A class called SubnetUtils will be used from that library. The other classes used in this code block are all part of the standard JDK.

Insert the following code blocks into your project:

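// Imports needed at the top of the class file that hosts the two methods
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.math.BigInteger;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.Charset;
import java.util.regex.Pattern;
import org.apache.commons.net.util.SubnetUtils;

private static void createIpRangeSourceFileForDynamoDb()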
{
    String sourceFileFullPath = "C:\\path-to-your-limited-csv-source-file\\IPv4-range-sample.csv";
    String targetFileFullPath = "C:\\path-to-your-limited-csv-source-file\\GeoIP2-City-Blocks-IPv4-S3-ImportSourceFile.json";

    char endOfTextCharacter = 0x03;
    char startOfTextCharacter = 0x02;

    try
    {
        File targetFile = new File(targetFileFullPath);
        if (!targetFile.exists())
        {
            boolean createNewFile = targetFile.createNewFile();
            if (!createNewFile)
            {
                throw new IOException("Could not create target file");
            }
        }
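        String line;
        int lineCounter = 0;
        // try-with-resources closes both the reader and the writer, even if an exception is thrown.
        // The writer appends, so delete the target file before re-running to avoid duplicate rows.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(sourceFileFullPath), Charset.forName("UTF-8")));
                PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(targetFile, true))))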
        {
            while ((line = br.readLine()) != null)
            {
                if (lineCounter == 0)
                {
                    lineCounter++;
                    continue;
                }
                lineCounter++;
                try
                {
                    String[] columns = line.split(",");
                    String ipCidr = columns[0];
                    String geonameId = columns[1];
                    SubnetUtils subnetUtils = new SubnetUtils(ipCidr);
                    SubnetUtils.SubnetInfo info = subnetUtils.getInfo();
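                    // Note: getLowAddress() and getHighAddress() return the first and last usable host
                    // addresses, i.e. they skip the network and broadcast addresses (1.0.0.1 and 1.0.0.254
                    // for 1.0.0.0/24). getNetworkAddress() and getBroadcastAddress() give the boundaries
                    // of the whole block if you need those instead.
                    String lowAddress = info.getLowAddress();
                    String highAddress = info.getHighAddress();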
                    Pattern pattern = Pattern.compile(Pattern.quote("."));
                    String[] split = pattern.split(lowAddress);
                    String headElement = split[0].trim();
                    long lowIpAsDecimal = convertIpToDecimalValue(lowAddress);
                    long highIpAsDecimal = convertIpToDecimalValue(highAddress);
                    StringBuilder rowBuilder = new StringBuilder();
                    rowBuilder.append("geoname_id").append(endOfTextCharacter)
                                .append("{\"n\":\"").append(geonameId).append("\"}")
                                    .append(startOfTextCharacter).append("network_head").append(endOfTextCharacter)
                                    .append("{\"n\":\"").append(headElement).append("\"}")
                                    .append(startOfTextCharacter).append("network_start_integer").append(endOfTextCharacter)
                                    .append("{\"n\":\"").append(lowIpAsDecimal).append("\"}")
                                    .append(startOfTextCharacter).append("network_last_integer").append(endOfTextCharacter)
                                    .append("{\"n\":\"").append(highIpAsDecimal).append("\"}")
                                    .append(System.lineSeparator());
                    out.print(rowBuilder.toString());
                }
                catch (Exception ex)
                {
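                    // Report which source line failed so it can be inspected later
                    System.out.println("Skipping line " + lineCounter + ": " + ex.getMessage());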
                }
            }
        }
    }
    catch (IOException ex)
    {
        System.out.println(ex.getMessage());
    }

    System.out.println("File creation done");
}

private static long convertIpToDecimalValue(String ip) throws UnknownHostException
{
    // An IPv4 literal is parsed directly, no DNS lookup takes place
    byte[] bytes = InetAddress.getByName(ip).getAddress();
    // Signum 1 makes BigInteger treat the four bytes as an unsigned big-endian number
    return new BigInteger(1, bytes).longValue();
}
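
A quick way to sanity-check the integers that end up in the target file is to convert them back to dotted notation. The helper below is a hypothetical addition, not part of the code above, and assumes the value fits in the IPv4 range.

```java
final class DecimalToIpSketch {

    // Turns a decimal IPv4 value back into dotted notation by
    // extracting the four octets from the most significant one down.
    static String toDotted(long value) {
        return ((value >> 24) & 0xFF) + "."
                + ((value >> 16) & 0xFF) + "."
                + ((value >> 8) & 0xFF) + "."
                + (value & 0xFF);
    }

    public static void main(String[] args) {
        // prints: 1.0.0.0
        System.out.println(toDotted(16777216L));
    }
}
```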

It’s probably best if you run that code in Debug mode and see what happens exactly. If everything goes well then you should have a .json file which if opened in Notepad++ should look as follows:

[Image: DynamoDb compatible IPv4 source file]
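
Since the control characters are invisible in many editors, it can help to print a row with the delimiters spelled out. The sketch below rebuilds one row in the same layout as the code above and replaces the two control characters with readable markers. The class name, the helper methods and the <ETX>/<STX> markers are invented for this illustration, and the two integers are illustrative values for the 1.0.0.0/24 block; the exact boundaries depend on how the low and high addresses are derived.

```java
final class RowFormatSketch {

    static final char ETX = 0x03; // end-of-text: separates an attribute name from its value
    static final char STX = 0x02; // start-of-text: separates one attribute from the next

    // Formats a single numeric attribute as name, ETX, then {"n":"value"}
    static String numberAttribute(String name, String value) {
        return name + ETX + "{\"n\":\"" + value + "\"}";
    }

    // Joins the four attributes of one record with STX
    static String row(String geonameId, String head, long first, long last) {
        return String.join(String.valueOf(STX),
                numberAttribute("geoname_id", geonameId),
                numberAttribute("network_head", head),
                numberAttribute("network_start_integer", Long.toString(first)),
                numberAttribute("network_last_integer", Long.toString(last)));
    }

    // Replaces the control characters with readable markers for inspection only
    static String visible(String row) {
        return row.replace(String.valueOf(ETX), "<ETX>")
                  .replace(String.valueOf(STX), "<STX>");
    }

    public static void main(String[] args) {
        System.out.println(visible(row("2077456", "1", 16777216L, 16777471L)));
    }
}
```

Each attribute should appear as a name, the ETX marker and a small {"n":"..."} JSON fragment, with STX markers between the attributes.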

Upload the file to a folder in S3. Make sure that this input file is the only object in that folder. You can also create another empty folder called “logs” at this point; the DynamoDb import process will write its log messages there.

We’ll see how to upload these records into DynamoDb in the next post.

View all posts related to Amazon Web Services and Big Data here.


About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.
