Harvesting SNAP Data with Cloud9 and MongoDB (Part 1)

Overview

In my last post, I discussed SNAP data. In this post, we will cover the process of harvesting USDA SNAP CSV data. We will store the data in MongoDB and augment it to support fast geo queries, which we will use to locate SNAP stores near a particular location.
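To make the idea of augmenting the data for geo queries concrete, here is a minimal Python sketch. It is not the procedure from the next post: the column names (Store_Name, Longitude, Latitude), the "location" field name, and the sample row are all assumptions made for illustration. It turns one CSV row into a document carrying a GeoJSON point, and builds the kind of $near filter MongoDB accepts for "what is close to here" lookups.

```python
import csv
import io


def to_geo_document(row):
    """Copy a CSV row and add a GeoJSON Point under a 'location' key.

    The 'Longitude' and 'Latitude' column names are assumptions.
    """
    doc = dict(row)
    doc["location"] = {
        "type": "Point",
        # GeoJSON order is [longitude, latitude], not [latitude, longitude].
        "coordinates": [float(row["Longitude"]), float(row["Latitude"])],
    }
    return doc


def near_query(longitude, latitude, max_meters):
    """Build a MongoDB $near filter for documents within max_meters of a point."""
    return {
        "location": {
            "$near": {
                "$geometry": {
                    "type": "Point",
                    "coordinates": [longitude, latitude],
                },
                "$maxDistance": max_meters,
            }
        }
    }


# A made-up sample row standing in for the real USDA CSV.
SAMPLE_CSV = "Store_Name,Longitude,Latitude\nExample Market,-122.4194,37.7749\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
doc = to_geo_document(rows[0])
query = near_query(-122.42, 37.77, 1000)
```

A filter like this only works once the collection has a 2dsphere index on the field holding the point; with that in place, MongoDB returns matches sorted from nearest to farthest.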

We will use two Cloud-hosted services to reduce the effort of setting up an environment for working with the data. For our purposes, the free plan of each service will be adequate.

  • Cloud9 is a Cloud-hosted IDE. We will use the shell environment it provides to perform the tasks necessary to download SNAP data and import it into a mongo database.

  • MongoLab provides a Cloud-hosted MongoDB service. We will use it to import and serve SNAP data to support our queries.

Cloud9

Go to Cloud9 and sign up using your GitHub account.

Create a new workspace

Create a Cloud9 workspace

Name your new workspace

Call it “snapdb” and select the “Custom” option (which gives you a barebones workspace with just a README.md file). Cloud9 will take a moment while it provisions your new cloud-hosted development workspace.

Name your Cloud9 workspace

Open your new workspace

Click the “Start Editing” button. Your workspace will open in a new window (or tab) and initialize.

Open workspace

Prepare your workspace

Right-click on README.md and choose “Delete” from the context menu to get rid of it.

Delete README

Right-click on the workspace's root folder and create a new file. Call it “notes.md”. Double-click the file to open it. We’ll use it to save notes about the mongo database that we’ll create in the next step.

MongoLab

Create a new database

Go to MongoLab and sign up for an account. Once you’re signed in, create a new database.

Create database

Create a new database - configuration options

Choose a free sandbox database on Amazon; name it “snapdb”.

Configure database

Open the database page

Open database page

Add a database user

To access the database from a client, we need to create a database user. The database user credentials will be used when connecting to snapdb. Click the link that says “Click here”.

Create a database user

Set database user credentials

Choose “apiuser” for the username and “snapdb” for the password.

Set database user credentials

Save the database information

You’re done with database creation. Save the connection information and database credentials in the notes.md file in your Cloud9 workspace.

Database info

Database notes
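The saved notes are enough to assemble a standard MongoDB connection string later on. As a rough sketch, the Python below does this with the user name, password, and database name chosen above. The host and port are placeholders, not real values; use the ones shown on your own database page. The mongo_uri helper is hypothetical, written only for this illustration.

```python
from urllib.parse import quote_plus


def mongo_uri(user, password, host, port, database):
    """Assemble a mongodb:// connection string.

    User name and password are percent-encoded so that characters
    such as '@' or ':' do not break the URI.
    """
    return "mongodb://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# Host and port below are placeholders (assumptions), not real MongoLab values.
uri = mongo_uri("apiuser", "snapdb", "db.example.com", 27017, "snapdb")
print(uri)
```

Keeping the five pieces separate in notes.md, rather than only the assembled string, makes it easier to reuse them with tools that take host, port, and credentials as individual options.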

Next steps

We now have everything in place to get SNAP data and load it into a mongo database. In our next post, we’ll walk through the steps of downloading, importing, transforming, and querying SNAP data interactively from the Cloud9 shell.