Introduction to MongoDb with .NET part 5: data import

Introduction

In the previous post we started discussing queries in MongoDb. Specifically we looked at the find and findOne functions. Both can accept a JSON parameter, a JSON document to be exact, to limit the result set returned. find() returns all documents that match the filter, or all documents if no filter was provided. findOne returns at most one document, even if several documents match the criteria. findOne can be very useful if you’d like to get familiar with a collection by viewing one of its documents. We’ve also quickly looked at two additional functions: pretty() formats the JSON result set on the screen in a more readable way, whereas count() returns the number of documents in a collection.
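To recap the difference between find and findOne without needing a running server, here’s a toy model in plain JavaScript (Node.js). The find, findOne and matches functions below are hypothetical helpers over an in-memory array, not the MongoDb shell or driver API, and they only support top-level equality filters. The sample documents are made up for illustration.

```javascript
// Toy model of find()/findOne() semantics over an in-memory array.
// Hypothetical helpers: only top-level equality filters are supported.

// True if every key in the filter has the same value in the document.
function matches(doc, filter) {
  return Object.keys(filter).every((key) => doc[key] === filter[key]);
}

// Like find(): all matching documents, or all documents if no filter is given.
function find(docs, filter = {}) {
  return docs.filter((doc) => matches(doc, filter));
}

// Like findOne(): at most one matching document, null if none match.
function findOne(docs, filter = {}) {
  return docs.find((doc) => matches(doc, filter)) ?? null;
}

// Made-up sample documents.
const people = [
  { name: "Anna", city: "Stockholm" },
  { name: "Bert", city: "Stockholm" },
  { name: "Cecil", city: "Oslo" },
];

console.log(find(people, { city: "Stockholm" }).length); // 2
console.log(findOne(people, { city: "Stockholm" }).name); // Anna
console.log(find(people).length); // 3 (no filter: every document)
```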

In this post we’ll step back a little from querying and instead look at how to import data into a MongoDb database. Specifically, we’ll create two real-life collections from ready-made data sets. Adding the records one by one ourselves would be very tedious, and there are at least two freely available data sets that can be imported into MongoDb directly. The goal is to be able to run meaningful queries against realistic data.

Importing JSON collections

In this section we’ll look at a tool called mongoimport, which imports JSON documents saved in a JSON file into the Mongo server. There’s a related tool called mongorestore. It performs a similar import task to mongoimport, but it works with binary BSON files, such as the ones produced by mongodump. We won’t go into mongorestore in this post, but its usage is just as simple as that of mongoimport. You can find more information about mongorestore here.

Mongoimport, in the words of the MongoDb documentation, “imports content from an Extended JSON, CSV, or TSV export created by mongoexport, or potentially, another third-party export tool.” It’s available in the same bin folder as mongo.exe and mongod.exe.

There are at least two downloadable record sets that can be imported into MongoDb: a restaurants collection and a ZIP codes collection.

We’ll use them to simulate restoring a database from a previously exported JSON file.

Let’s start with the restaurants collection. The JSON file can be viewed on this web page. Copy the JSON content from the browser and save it in a file called restaurants.json somewhere on your hard drive where you can easily find it, e.g. directly on your main drive, like C:\restaurants.json. The full file path to the data set will be important in a second. I saved it in my users folder, i.e. c:\users\andras.nemes\.

Next, open two command prompts and start mongod.exe in one of them. In the other one, first use “cd” to navigate to the folder where you’ve saved the data set, then run the following command:

mongoimport --db model --collection restaurants --drop --file restaurants.json

The above command imports the restaurants data set into a database called “model” and a collection called “restaurants”. The --drop flag drops the “restaurants” collection first if it already exists, so that we can always recreate the data from scratch. If everything goes well, you’ll see the following output in the command prompt:

[current date] connected to: localhost
[current date] dropping: model.restaurants
[current date] imported 25359 documents
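If you script your imports, you may want to check the reported count programmatically. Here’s a minimal sketch that extracts it from mongoimport’s console output. The first pattern matches the summary line shown above. The second pattern is an assumption about how newer mongoimport releases word the summary (“N document(s) imported successfully.”), so verify it against your own version. importedCount is a hypothetical helper name.

```javascript
// Pull the imported-document count out of mongoimport's console output.
// Pattern 1: the "imported N documents" line shown above.
// Pattern 2: ASSUMED wording of newer releases; check it against your version.
function importedCount(output) {
  const patterns = [
    /imported (\d+) documents?/,
    /(\d+) document\(s\) imported successfully/,
  ];
  for (const pattern of patterns) {
    const match = pattern.exec(output);
    if (match) return Number(match[1]);
  }
  return null; // no summary line found
}

const sample = "[current date] imported 25359 documents";
console.log(importedCount(sample)); // 25359
```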

Awesome, let’s continue with the ZIP codes collection. The process is identical to the one above, so let’s go through it quickly.

  1. Copy the JSON from this link and save it as zipcodes.json in the same folder as restaurants.json
  2. Run the following mongoimport command from the same console as above: mongoimport --db model --collection zipcodes --drop --file zipcodes.json
  3. Check the response, it should say “imported 29353 documents”

Let’s use what we’ve learnt so far and check a couple of things.

Connect to the server with “mongo.exe” from the command prompt where you executed the mongoimport commands. We’ll first view the available databases:

show dbs

This command should list at least “local” and “model”, which confirms that the model database has been created. Next we’ll switch to the model database and ask it to list its collections:

use model
show collections

This should result in two collections being shown: restaurants and zipcodes, just as expected.

The collections

Let’s look at one document in each collection to get an idea of what they look like:

db.restaurants.findOne()

Here’s a restaurant:

{
        "_id" : ObjectId("56edc2ff03a1cd840734dba8"),
        "address" : {
                "building" : "2780",
                "coord" : [
                        -73.98241999999999,
                        40.579505
                ],
                "street" : "Stillwell Avenue",
                "zipcode" : "11224"
        },
        "borough" : "Brooklyn",
        "cuisine" : "American",
        "grades" : [
                {
                        "date" : ISODate("2014-06-10T00:00:00Z"),
                        "grade" : "A",
                        "score" : 5
                },
                {
                        "date" : ISODate("2013-06-05T00:00:00Z"),
                        "grade" : "A",
                        "score" : 7
                },
                {
                        "date" : ISODate("2012-04-13T00:00:00Z"),
                        "grade" : "A",
                        "score" : 12
                },
                {
                        "date" : ISODate("2011-10-12T00:00:00Z"),
                        "grade" : "A",
                        "score" : 12
                }
        ],
        "name" : "Riviera Caterer",
        "restaurant_id" : "40356018"
}

You’ll probably understand most of the properties. I’m not actually sure what “grade” means in the grades array: all entries here have grade “A” but different scores.
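The grades property is an array of embedded documents, so a client usually has to walk through it to get anything useful out of it. Here’s a small sketch that summarises the grades of the restaurant above. The document is trimmed to the fields used, and the dates are plain ISO strings instead of the shell’s ISODate(...) values; summariseGrades is a hypothetical helper.

```javascript
// Summarise the embedded grades array of a restaurant document.
// Trimmed copy of the document above; dates as plain ISO strings.
const restaurant = {
  name: "Riviera Caterer",
  grades: [
    { date: "2014-06-10T00:00:00Z", grade: "A", score: 5 },
    { date: "2013-06-05T00:00:00Z", grade: "A", score: 7 },
    { date: "2012-04-13T00:00:00Z", grade: "A", score: 12 },
    { date: "2011-10-12T00:00:00Z", grade: "A", score: 12 },
  ],
};

// Assumes at least one entry in grades (reduce without a seed needs one).
function summariseGrades(doc) {
  const scores = doc.grades.map((g) => g.score);
  // Pick the most recent entry by date instead of trusting the array order.
  const latest = doc.grades.reduce((a, b) =>
    new Date(a.date) >= new Date(b.date) ? a : b
  );
  return {
    inspections: doc.grades.length,
    latestGrade: latest.grade,
    minScore: Math.min(...scores),
    maxScore: Math.max(...scores),
    averageScore: scores.reduce((sum, s) => sum + s, 0) / scores.length,
  };
}

console.log(summariseGrades(restaurant).averageScore); // 9
```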

There are a couple of new features:

  • The ISODate function can parse date strings and, when called without arguments, return the current date and time. E.g. if you type “ISODate()” in the Mongo client and press Enter, it will respond with the current UTC date, such as ISODate("2016-03-21T19:44:17.749Z")
  • The restaurant has both an ObjectId and a restaurant ID of its own, stored as a string of digits. This brings us back to the discussion we had about the ID field of a domain object. The above JSON demonstrates a mixed strategy: we let MongoDb assign its own object ID but we also supply our own ID. It’s very likely that our client code, whether .NET, Java, Python or something else, will ignore the object ID and only use the restaurant ID for CRUD operations, i.e. INSERT, SELECT, UPDATE and DELETE. We’ll see an example of that later on in the series when it’s time to discuss the .NET MongoDb driver
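Both features have close counterparts in plain JavaScript, which is what the Mongo shell runs on. The sketch below uses the standard Date object for what ISODate does with the dates in the grades array. It also shows that the MongoDb-assigned ObjectId is not entirely meaningless: its first four bytes are a big-endian count of seconds since the Unix epoch, i.e. the creation time. objectIdToDate is a hypothetical helper; in the shell itself, an ObjectId has a getTimestamp() method for the same purpose.

```javascript
// ISODate("...") in the shell parses an ISO-8601 string into a date;
// the JavaScript Date constructor does the same for strings like these.
const graded = new Date("2014-06-10T00:00:00Z");
console.log(graded.toISOString()); // 2014-06-10T00:00:00.000Z

// ISODate() without arguments gives "now" in UTC; new Date() is the equivalent.
console.log(new Date().toISOString());

// Hypothetical helper: read the timestamp stored in the first 4 bytes
// (the first 8 hex characters) of a 24-character ObjectId hex string.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The _id of the restaurant document above:
console.log(objectIdToDate("56edc2ff03a1cd840734dba8").toISOString());
// 2016-03-19T21:22:07.000Z
```

The timestamp only tells you when the document was inserted, which is exactly why the restaurant ID, and not the ObjectId, is the better candidate for a business key.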

Next we’ll get familiar with the ZIP codes collection:

db.zipcodes.findOne()

Here’s an example document:

{
        "_id" : "01001",
        "city" : "AGAWAM",
        "loc" : [
                -72.622739,
                42.070206
        ],
        "pop" : 15338,
        "state" : "MA"
}

I’ll just copy the explanation of each property from the relevant MongoDb documentation:

  • The _id field holds the zip code as a string.
  • The city field holds the city name. A city can have more than one zip code associated with it as different sections of the city can each have a different zip code.
  • The state field holds the two letter state abbreviation.
  • The pop field holds the population.
  • The loc field holds the location as a longitude, latitude pair; note that longitude comes first, as in [-72.622739, 42.070206] above.
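Since a city can have several ZIP codes, a city’s population is the sum of the pop values of all its documents. Here’s a client-side sketch of that idea. The AGAWAM document is the one shown above; the two EXAMPLETOWN documents are made-up sample data in the same shape. populationByCity is a hypothetical helper; MongoDb can of course do this kind of grouping on the server as well.

```javascript
// Sum the population of every ZIP code belonging to the same city.
// AGAWAM is the document above; EXAMPLETOWN is made-up sample data.
const zipDocs = [
  { _id: "01001", city: "AGAWAM", loc: [-72.622739, 42.070206], pop: 15338, state: "MA" },
  { _id: "00001", city: "EXAMPLETOWN", loc: [-72.5, 42.1], pop: 1000, state: "MA" },
  { _id: "00002", city: "EXAMPLETOWN", loc: [-72.6, 42.1], pop: 2500, state: "MA" },
];

// Key by city AND state: the same city name exists in several states.
function populationByCity(docs) {
  const totals = new Map();
  for (const { city, state, pop } of docs) {
    const key = `${city}, ${state}`;
    totals.set(key, (totals.get(key) ?? 0) + pop);
  }
  return totals;
}

// loc is stored longitude first, then latitude.
const [longitude, latitude] = zipDocs[0].loc;

console.log(populationByCity(zipDocs).get("EXAMPLETOWN, MA")); // 3500
console.log(longitude, latitude); // -72.622739 42.070206
```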

Great, we’ll use these two collections for the upcoming query examples.

Read the next part here.

You can view all posts related to data storage on this blog here.

About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.
