Using Amazon DynamoDb for IP and co-ordinate based geo-location services part 7: querying the IPv4 range table

Introduction

In the previous post we loaded the limited IP range records into DynamoDb. As we’re only talking about some 250 records we could have added them in code one by one. However, that strategy would never work for the full MaxMind IP data set of 10 million records. So instead we looked at the built-in Import/Export functionality in DynamoDb. You’ll be able to go through the same process when you’re ready to import the full data set.

In this post we’ll see how to query the IP range database to extract the ID of the nearest geolocation. We’ll get to use the AWS Java SDK.

Querying DynamoDb

We won’t go into the details of querying DynamoDb in general. You can read about it in the AWS documentation pages if you want to know more. We’ll concentrate on the specific query to get one data record from our test IP range table.

First off we’ll need the AWS Java SDK in our project. You can either download it from the AWS website or get it from the Maven repository if you have a Maven project:

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>1.9.9</version>
</dependency>

It’s possible that there will be a higher version by the time you read this post.

Next we’ll need a client to DynamoDb. There are multiple ways to get hold of an object which can communicate with DynamoDb. The easiest is to build an AmazonDynamoDBClient object as follows:

String amazonAccessKey = "your-amazon-access-key";
String secretAccessKey = "your-secret-key-id";
String dynamoDbEndpoint = "dynamodb.eu-west-1.amazonaws.com"; //set this to the region where you have set up the DynamoDb table
AWSCredentials credentials = new BasicAWSCredentials(amazonAccessKey, secretAccessKey);
ClientConfiguration clientConfiguration = new ClientConfiguration().withMaxErrorRetry(5);
AmazonDynamoDBClient dynamoDbClient = new AmazonDynamoDBClient(credentials, clientConfiguration);
dynamoDbClient.setEndpoint(dynamoDbEndpoint);

An alternative is to save the credentials in a file called “credentials” with no file extension. On Windows I had to save this in my user directory in a folder called “.aws”. Note the ‘.’ in the folder name. So in my case I saved the file in the c:\users\andras.nemes\.aws folder. The file contents follow a special format to save the AWS keys:

[default]
aws_access_key_id = your-amazon-access-key
aws_secret_access_key = your-secret-key-id

You can also store this file on Amazon in your user profile. The following code should then work both locally and after deploying the application on Amazon:

public AmazonDynamoDBClient getDynamoDbClient()
{
    AWSCredentialsProvider credentialsProvider = null;
    AWSCredentials awsCredentials = null;
    String dynamoDbEndpoint = "dynamodb.eu-west-1.amazonaws.com";
    try
    {
        // On Amazon: try the credentials of the instance profile first
        credentialsProvider = new InstanceProfileCredentialsProvider();
        awsCredentials = credentialsProvider.getCredentials();
    } catch (AmazonClientException e)
    {
        // Locally: fall back to the [default] profile in the credentials file
        credentialsProvider = new ProfileCredentialsProvider();
        awsCredentials = credentialsProvider.getCredentials();
    }
    AmazonDynamoDBClient dynamoDbClient = new AmazonDynamoDBClient(awsCredentials);
    dynamoDbClient.setEndpoint(dynamoDbEndpoint);

    return dynamoDbClient;
}

Again, make sure you set the correct DynamoDb region endpoint, otherwise the code won’t find your table. The AWS regions and endpoints documentation page lists the region endpoints for every Amazon service that you can use in your code.

The following code will show you a way to find the geoname ID belonging to a single IP address. I selected an IP that exists in the limited IP range table so that I know for sure that we’ll get a valid result. We’ll go through the code in words afterwards:

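import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.regex.Pattern;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.QueryOutcome;
import com.amazonaws.services.dynamodbv2.document.RangeKeyCondition;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;

public void testSingleIpLookup()
{
    String ip = "1.0.0.5";
    String ipRangeTable = "geo-ip-range-test";
    try
    {
        // The first octet of the IP is the hash key of the table
        Pattern pattern = Pattern.compile(Pattern.quote("."));
        String[] split = pattern.split(ip);
        String headElement = split[0];
        long ipAsDecimal = convertIpToDecimalValue(ip);

        AmazonDynamoDBClient dynamoClient = getDynamoDbClient();
        DynamoDB dynamoDb = new DynamoDB(dynamoClient);
        Table table = dynamoDb.getTable(ipRangeTable);
        // Range key condition: the range must start at or below the IP
        RangeKeyCondition networkStartCondition = new RangeKeyCondition("network_start_integer")
                .le(ipAsDecimal);
        Map<String, Object> expressionAttributeValues = new HashMap<>();
        expressionAttributeValues.put(":val", ipAsDecimal);

        // Filter expression: the range must end at or above the IP
        QuerySpec querySpec = new QuerySpec().withHashKey("network_head", Integer.parseInt(headElement))
                .withRangeKeyCondition(networkStartCondition)
                .withFilterExpression("network_last_integer >= :val")
                .withValueMap(expressionAttributeValues);
        ItemCollection<QueryOutcome> query = table.query(querySpec);
        Iterator<Item> iterator = query.iterator();
        if (iterator.hasNext())
        {
            Item next = iterator.next();
            String geonameId = next.get("geoname_id").toString();
            System.out.println("Found geoname id: " + geonameId);
        }
        else
        {
            System.out.println("No IP range found for " + ip);
        }

    } catch (Exception ex)
    {
        System.out.println("Exception while querying the IP range table: " + ex.getMessage());
    }
}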

We first declare the IP to be searched for and the name of the DynamoDb table. Next we find the head element and decimal form of the IP the same way as we did before. Here comes a reminder of the convertIpToDecimalValue method:

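private long convertIpToDecimalValue(String ip) throws UnknownHostException
{
    // For a literal IP address only the format is checked, there's no DNS lookup
    byte[] bytes = InetAddress.getByName(ip).getAddress();
    // Signum 1 makes BigInteger read the bytes as an unsigned number
    String bi = new BigInteger(1, bytes).toString(2);
    long decimalValue = Long.parseLong(bi, 2);
    return decimalValue;
}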
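
As a cross-check on the conversion, here is a minimal sketch that computes the head element and the decimal value with plain arithmetic, i.e. without InetAddress or BigInteger. The class and method names (IpToDecimalSketch, toDecimal, headElement) are made up for this illustration and are not part of the project code:

```java
public class IpToDecimalSketch {

    // Hypothetical helper: folds the four octets of a dotted IPv4 string
    // into one unsigned 32-bit value stored in a long.
    public static long toDecimal(String ip) {
        String[] octets = ip.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Expected four octets: " + ip);
        }
        long value = 0;
        for (String octet : octets) {
            int part = Integer.parseInt(octet);
            if (part < 0 || part > 255) {
                throw new IllegalArgumentException("Octet out of range: " + octet);
            }
            // Shift the running value one byte to the left and append the octet
            value = (value << 8) | part;
        }
        return value;
    }

    // Hypothetical helper: the first octet, used as the hash key value.
    public static int headElement(String ip) {
        return Integer.parseInt(ip.split("\\.")[0]);
    }

    public static void main(String[] args) {
        String ip = "1.0.0.5";
        // 1 * 256^3 + 0 * 256^2 + 0 * 256 + 5 = 16777221
        System.out.println(headElement(ip) + " / " + toDecimal(ip));
    }
}
```

For “1.0.0.5” this gives the head element 1 and the decimal value 16777221, the same values that go into the query.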

Then comes a series of query declarations. We want to use as many parameters as possible to narrow down the range of records that DynamoDb must read: the head element plus the lower and upper limits of the IP range. If the search isn’t narrow enough, especially if we leave out the head element, then DynamoDb will need to go through the records one by one until it finds a match. If the matching record is located towards the end of the full IP range table with millions of rows then the query will almost certainly fail with an exception: either the read throughput or the upper limit of scanned rows, which is slightly above 12k, will be exceeded.

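In the above case we want to find the record whose head element is equal to the head element of the IP (the hash key network_head) and whose range contains the decimal value of the IP: network_start_integer must be less than or equal to that value (the range key condition) and network_last_integer must be greater than or equal to it (the filter expression).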
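
To make the matching logic easier to follow without a DynamoDb table at hand, here is a minimal in-memory sketch of the same three conditions. It is only an illustration, not how DynamoDb works internally. The class RangeLookupSketch and its methods are hypothetical, and so are the sample rows: the range limits 16777216 to 16777471 for geoname ID 2077456 are an assumption (a /24 block starting at 1.0.0.0), and the second row is pure filler:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class RangeLookupSketch {

    // One row of the IP range table, reduced to the fields the query touches
    static final class IpRange {
        final long start;
        final long last;
        final String geonameId;

        IpRange(long start, long last, String geonameId) {
            this.start = start;
            this.last = last;
            this.geonameId = geonameId;
        }
    }

    // network_head (hash key) -> rows sorted by network_start_integer (range key)
    private final Map<Integer, TreeMap<Long, IpRange>> partitions = new HashMap<>();

    public void add(int head, long start, long last, String geonameId) {
        partitions.computeIfAbsent(head, h -> new TreeMap<>())
                .put(start, new IpRange(start, last, geonameId));
    }

    // Returns the geoname ID of the range containing the IP, or null if none does
    public String lookup(int head, long ipAsDecimal) {
        TreeMap<Long, IpRange> partition = partitions.get(head); // hash key equality
        if (partition == null) {
            return null;
        }
        // Range key condition: network_start_integer <= ipAsDecimal
        for (IpRange range : partition.headMap(ipAsDecimal, true).values()) {
            // Filter expression: network_last_integer >= ipAsDecimal
            if (range.last >= ipAsDecimal) {
                return range.geonameId;
            }
        }
        return null;
    }

    // Sample data, see the assumptions named in the text above
    public static RangeLookupSketch sampleTable() {
        RangeLookupSketch table = new RangeLookupSketch();
        table.add(1, 16777216L, 16777471L, "2077456");
        table.add(1, 16777472L, 16778239L, "example-id");
        return table;
    }

    public static void main(String[] args) {
        System.out.println(sampleTable().lookup(1, 16777221L));
    }
}
```

Note how the rows that start below the IP but also end below it are still read and only then discarded. The filter expression in the real query behaves similarly: it is applied after the records matching the key conditions have been read.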

The query will return a collection of items. We’re only expecting one so we extract it using iterator.next().

If you run this code then the geoname ID should be 2077456. The IP “1.0.0.5” gives a decimal value of 16777221, which lies between the lower and upper limits of the following data record in DynamoDb:

DynamoDb record found based on query

We’ll go through a similar process for the longitude-latitude ranges starting in the next post.

View all posts related to Amazon Web Services and Big Data here.

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.
