Using Amazon DynamoDb for IP and co-ordinate based geo-location services part 11: uploading the geolocation range to DynamoDb
May 16, 2015 Leave a comment
Introduction
In the previous post we successfully queried the coordinate range database in DynamoDb. We used the query endpoints that are built into the AWS geo-location library to find the data records within the radius around a centre point.
Where are we now?
We’ve got quite far with our project. We have the ability to query an IPv4 and coordinate range table in DynamoDb. We can extract a geoname ID that belongs to either an IP or a latitude-longitude pair. The next step is to dress up those IDs with real location data such as “Stockholm” or “Tehran”.
We’ve looked at the contents of the MaxMind geolocation CSV data source in this post. We saw that it contained a number of geographical properties such as continent, time zone and city. It’s up to you how much of those properties you’d like to save in the database. In our own real life example we store the following columns in DynamoDb:
- geoname ID
- continent
- country ISO code
- country
- subdivision one
- subdivision two
- time zone
- city
We’ll follow this structure in this post but if you later need to get hold of e.g. the continent code as well then you can obviously extend the code accordingly.
Geolocation range in DynamoDb
We’ve now seen two full examples of how to create a DynamoDb import source file and how to import it into DynamoDb using S3 and the import/export mechanism available in DynamoDb. In this post I’ll only present the Java code to create the import source file and the geolocation range table structure for DynamoDb. You can figure out the rest based on what we’ve seen so far.
As usual, we’ll only test with a subset of the full “City Locations-en.csv” source file. Create a file called “City-location-sample.csv”. I will go with the following random records:
geoname_id,locale_code,continent_code,continent_name,country_iso_code,country_name,subdivision_1_iso_code,subdivision_1_name,subdivision_2_iso_code,subdivision_2_name,city_name,metro_code,time_zone
1392,en,AS,Asia,IR,Iran,21,”Mazandaran”,,,Shahr,,Asia/Tehran
6349,en,AS,Asia,IR,Iran,30,”Razavi Khorasan”,,,Shadman,,Asia/Tehran
104625,en,AS,Asia,SA,”Saudi Arabia”,03,”Al Madinah al Munawwarah”,,,”Madinat Yanbu` as Sina`iyah”,,Asia/Riyadh
104699,en,AS,Asia,SA,”Saudi Arabia”,08,”Northern Borders”,,,Linah,,Asia/Riyadh
104923,en,AS,Asia,SA,”Saudi Arabia”,02,”Makkah Province”,,,Khulays,,Asia/Riyadh
105072,en,AS,Asia,SA,”Saudi Arabia”,14,”Mintaqat `Asir”,,,”Khamis Mushait”,,Asia/Riyadh
2984496,en,EU,Europe,FR,France,A,Alsace,68,Haut-Rhin,Rammersmatt,,Europe/Paris
2984513,en,EU,Europe,FR,France,J,”Île-de-France”,78,Yvelines,Rambouillet,,Europe/Paris
2984516,en,EU,Europe,FR,France,M,Lorraine,88,Vosges,Rambervillers,,Europe/Paris
2984517,en,EU,Europe,FR,France,C,Auvergne,03,Allier,Rambert,,Europe/Paris
3200546,en,EU,Europe,SI,Slovenia,,,,,Gmajnica,,Europe/Ljubljana
3200560,en,EU,Europe,RS,Serbia,,,,,Glozan,,Europe/Belgrade
3200582,en,EU,Europe,SI,Slovenia,,,,,Globoko,,Europe/Ljubljana
3200600,en,EU,Europe,HR,Croatia,03,”Sisacko-Moslavacka Zupanija”,,,Glina,,Europe/Zagreb
3200742,en,EU,Europe,HR,Croatia,01,”Zagreb County”,,,Galgovo,,Europe/Zagreb
3200799,en,EU,Europe,HR,Croatia,08,”Primorsko-Goranska Zupanija”,,,Fuzine,,Europe/Zagreb
3200802,en,EU,Europe,RS,Serbia,,,,,Futog,,Europe/Belgrade
4349727,en,NA,”North America”,US,”United States”,MD,Maryland,,,”Bryans Road”,511,America/New_York
4349755,en,NA,”North America”,US,”United States”,MD,Maryland,,,Buckeystown,511,America/New_York
4349903,en,NA,”North America”,US,”United States”,MD,Maryland,,,Burtonsville,511,America/New_York
9320468,en,EU,Europe,DE,Germany,RP,Rheinland-Pfalz,,,Kirchsahr,,Europe/Berlin
9403875,en,EU,Europe,NO,Norway,12,”Hordaland Fylke”,,,Valestrandfossen,,Europe/Oslo
9403879,en,EU,Europe,NO,Norway,14,”Sogn og Fjordane Fylke”,,,”Ovre Ardal”,,Europe/Oslo
9403962,en,EU,Europe,NO,Norway,12,”Hordaland Fylke”,,,”Vestra Mosterhamn”,,Europe/Oslo
9404001,en,EU,Europe,NO,Norway,11,”Rogaland Fylke”,,,Hauge,,Europe/Oslo
9404004,en,EU,Europe,NO,Norway,02,Akershus,,,Oppegard,,Europe/Oslo
9404172,en,EU,Europe,NO,Norway,12,”Hordaland Fylke”,,,Soreidgrenda,,Europe/Oslo
9404188,en,EU,Europe,NO,Norway,02,Akershus,,,Brarud,,Europe/Oslo
9408331,en,NA,”North America”,CA,Canada,QC,Quebec,,,Saint-Antonin,,America/Toronto
9521847,en,NA,”North America”,MX,Mexico,MEX,”Estado de Mexico”,,,Tonatico,,America/Mexico_City
9536679,en,EU,Europe,NO,Norway,20,”Finnmark Fylke”,,,Akkarfjord,,Europe/Oslo
9536767,en,EU,Europe,NO,Norway,14,”Sogn og Fjordane Fylke”,,,”Byrknes Nordre”,,Europe/Oslo
9858348,en,NA,”North America”,US,”United States”,MI,Michigan,,,”Harsens Island”,505,America/Detroit
9865215,en,AS,Asia,JP,Japan,18,Fukui,,,Fuku,,Asia/Tokyo
9865229,en,AS,Asia,JP,Japan,18,Fukui,,,Togo-nikacho,,Asia/Tokyo
9866845,en,EU,Europe,IE,Ireland,L,Leinster,MH,”Co Meath”,Irishtown,,Europe/Dublin
9875219,en,NA,”North America”,MX,Mexico,MEX,”Estado de Mexico”,,,”La Gloria”,,America/Mexico_City
9883562,en,EU,Europe,ES,Spain,EX,Extremadura,BA,Badajoz,Olivenza,,Europe/Madrid
Here’s the code to turn the raw data sample file into a DynamoDb import file. Note that we’ll use “unknown” in case some element is missing. Also, there are some elements within quotations such as “North America”. The quotations will be removed during the source file transformation process:
private static void createLocationRangeSourceFile() { String sourceFileFullPath = "C:\\path-to-limited-source.file\\City-location-sample.csv"; String targetFileFullPath = "C:\\path-to-limited-source.file\\GeoIP2-City-Location-S3-ImportSourceFile.json"; char endOfTextCharacter = 0x03; char startOfTextCharacter = 0x02; try { File targetFile = new File(targetFileFullPath); if (!targetFile.exists()) { boolean createNewFile = targetFile.createNewFile(); if (!createNewFile) { throw new IOException("Could not create target file"); } } InputStream fis = new FileInputStream(sourceFileFullPath); BufferedReader br = new BufferedReader(new InputStreamReader(fis, Charset.forName("UTF-8"))); String line; int lineCounter = 0; try (PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(targetFile, true)))) { while ((line = br.readLine()) != null) { if (lineCounter == 0) { lineCounter++; continue; } lineCounter++; try { String[] columns = line.split(","); String geonameId = columns[0]; String continent = columns[3].equals("") ? "unknown" : columns[3]; String country_code = columns[4].equals("") ? "unknown" : columns[4]; String country_name = columns[5].equals("") ? "unknown" : columns[5]; String subdivision_1 = columns[7].equals("") ? "unknown" : columns[7]; String subdivision_2 = columns[9].equals("") ? "unknown" : columns[9]; String city = columns[10].equals("") ? "unknown" : columns[10]; String timezone = columns[12].equals("") ? "unknown" : columns[12]; continent = removeQuotes(continent); country_name = removeQuotes(country_name); subdivision_1 = removeQuotes(subdivision_1); subdivision_2 = removeQuotes(subdivision_2); city = removeQuotes(city); StringBuilder rowBuilder = new StringBuilder(); rowBuilder.append("geoname_id").append(endOfTextCharacter) .append("{\"n\":\"").append(geonameId).append("\"}") .append(startOfTextCharacter).append("city_name").append(endOfTextCharacter) .append("{\"s\":\"").append(city).append("\"}") .append(startOfTextCharacter).append("continent_name").append(endOfTextCharacter) .append("{\"s\":\"").append(continent).append("\"}") .append(startOfTextCharacter).append("country_iso_code").append(endOfTextCharacter) .append("{\"s\":\"").append(country_code).append("\"}") .append(startOfTextCharacter).append("country_name").append(endOfTextCharacter) .append("{\"s\":\"").append(country_name).append("\"}") .append(startOfTextCharacter).append("time_zone").append(endOfTextCharacter) .append("{\"s\":\"").append(timezone).append("\"}") .append(startOfTextCharacter).append("subdivision_1_name").append(endOfTextCharacter) .append("{\"s\":\"").append(subdivision_1).append("\"}") .append(startOfTextCharacter).append("subdivision_2_name").append(endOfTextCharacter) .append("{\"s\":\"").append(subdivision_2).append("\"}") .append(System.lineSeparator()); out.print(rowBuilder.toString()); System.out.println(); } catch (Exception ex) { System.out.println(ex.getMessage()); } } } } catch (IOException ex) { System.out.println(ex.getMessage()); } System.out.println("File creation done"); } private static String removeQuotes(String source) { String res = source; if (res.startsWith("\"")) { res = res.substring(1); } if (res.endsWith("\"")) { res = res.substring(0, res.length() - 1); } return res; }
Here’s a portion of the file generated by the above code:
Here are the details of the geolocation range table in DynamoDb:
- Name: geo-location-range-test
- Primary key type: hash
- Hash attribute name: “geoname_id” of type Number
- (Click Continue to come to the indexes page)
- Skip this page, we won’t have any additional global or local index in this table
- (Click continue)
- Specify read and write capacity units at 5. The same remarks apply as before: the write throughput should be much larger when you’re ready to import the real database. The read throughput should be large as well in production.
- (Click continue)
- You can select to set up a basic alarm – this is not vital for this demo exercise but is very useful for production databases. I let you decide whether you want to be notified in case of a throughput limit breach
- Click continue to reach the Review pane and click Create
Wait for the table to reach status ACTIVE.
Finally go through the same process to insert the records into DynamoDb:
- Save the import source file in S3
- Use the Import/Export button in DynamoDb to set up an Import job
- Wait for the job to finish
If all goes well then you should have the records available in DynamoDb:
In the next post we’ll finish the series by extracting the actual geo-location from DynamoDb.
View all posts related to Amazon Web Services and Big Data here.