This post gives a brief introduction to using MongoDB with Ruby. We will see how to install and connect to MongoDB, how to populate the database with fake documents generated in just a couple of lines of code, and how to build queries that give insights into the stored documents.
Prerequisites
Let’s assume you already have Ruby installed.
Throughout this post, we are going to create a small playground consisting of a few files.
Let’s create a project directory for them: mkdir -p ~/projects/rumo, followed by cd ~/projects/rumo (rumo for Ruby Mongo). I also like to use rvm to create a fresh, separate Ruby environment for each project: rvm --rvmrc --create 2.3.1, then cd ./ to load the configuration. But that’s optional.
The next thing you need is MongoDB. Using brew (the package manager for macOS), it’s as easy as brew install mongo. In any other case, like building MongoDB from source, there is excellent guidance available on the official website. In case you haven’t read the (full) instructions on the official page, here’s the gist of how to start Mongo:
- MongoDB needs a directory where it can store all its data. For me, it’s: mkdir -p ~/.mongodb/data/
- Open another shell and start MongoDB with the previously created data directory passed as an argument: mongod --dbpath ~/.mongodb/data
Having MongoDB installed and RVM in place, it’s time to create the Ruby files for our playground. Our project file structure is going to look like this:
├── .rvmrc
├── Gemfile
├── Rakefile
└── lib
    ├── boot.rb
    ├── database.rb
    └── document.rb
Create Gemfile
with the following lines:
source 'https://rubygems.org'
gem 'rake'
gem 'mongo'
gem 'faker' # for generating fake documents
Run bundle install to fetch the gems. We will use the faker gem in a moment to generate test data to populate our MongoDB with, while the rake gem will allow us to boot an IRB session with our project files already loaded.
The Rakefile to achieve the latter looks as follows:
task default: %w[console]

task :console do
  require 'irb'
  require 'irb/completion'
  require_relative './lib/boot.rb'
  ARGV.clear
  IRB.start
end
The lib/boot.rb
ties everything together and loads both the installed gems and our custom files (which we are going to see hereafter):
require 'mongo'
require 'json'
require 'faker'
require 'date'
require_relative 'document'
require_relative 'database'
Generating documents
To have something in the database to run queries against, we need data.
Faker fulfills exactly this need: generating data for all kinds of purposes.
The Faker::Book
class, for example, provides generators for author names, ISBNs, publishers, and much more.
We mainly use this to generate documents for a fictitious book store.
The below code should be sufficiently self-explanatory.
Create lib/document.rb
with the following contents:
module RuMo
  class Document
    # Generate an array of qty fake book documents.
    def self.generate_many(qty = 5)
      (1..qty).map { generate_one }
    end

    # Generate a single fake book document as a Hash.
    def self.generate_one
      {
        title: Faker::Book.title,
        author: Faker::Book.author,
        publisher: Faker::Book.publisher,
        isbn: Faker::Code.isbn,
        price: Faker::Commerce.price,
        release_date: Faker::Date.between(Date.new(2000, 1, 1), Date.today),
        genres: (0..rand(4)).map { Faker::Book.genre } # 1 to 4 genres, possibly repeating
      }
    end
  end
end
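If you want to peek at a sample document before touching the database, you can call the generator directly from an IRB session; the exact values will differ on every run, since Faker randomizes them:
RuMo::Document.generate_one
# => a Hash along the lines of { title: "...", author: "...", publisher: "...", isbn: "...", price: 12.34, release_date: <some Date>, genres: [...] }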
Accessing the database
Our generated books need a place to be stored. Accessing Mongo and storing data is surprisingly easy:
- creating a connection via Mongo::Client.new(<uri>),
- accessing a collection by Mongo::Client#[Symbol], and
- inserting an array of documents with Mongo::Collection#insert_many(Array).
Create lib/database.rb with the following contents:
module RuMo
  class Database
    attr_reader :name, :port

    def initialize(name: 'test', port: 27017)
      @name = name
      @port = port
    end

    # Memoized client connection to the local MongoDB instance.
    def conn
      @conn ||= Mongo::Client.new("mongodb://127.0.0.1:#{port}/#{name}")
    end

    # The books collection we store our documents in.
    def collection
      @collection ||= conn[:books]
    end

    def insert(documents = [])
      collection.insert_many(documents)
    end
  end
end
Populating the database
Now that we have everything set up, let’s insert some documents and start writing some queries. Recall that we can boot directly into our project using the rake task we wrote:
rake
2.3.1 :001 > db = RuMo::Database.new
2.3.1 :002 > db.insert RuMo::Document.generate_many(100)
D, [2016-11-19T11:06:58.950593 #85952] DEBUG -- : MONGODB | 127.0.0.1:27017 | test.insert | STARTED | {"insert"=>"books", "documents"=>[{:title=>"I Sing the Body Electric", :author=>"Humberto Friesen", :publisher=>"Parragon", :isbn=>"777712218-5", :price=>47.51, :genres=>["Narrative nonfiction", "Classic"], :_id=>BSON::ObjectId('583024421c705c4fc09494...
D, [2016-11-19T11:06:58.952993 #85952] DEBUG -- : MONGODB | 127.0.0.1:27017 | test.insert | SUCCEEDED | 0.002316s
=> #<Mongo::BulkWrite::Result:0x007fe047969a40 @results={"n_inserted"=>100, "n"=>100, "inserted_ids"=>[BSON::ObjectId('583024421c705c4fc09494de'), <... and 99 more BSON::ObjectId>]}>
That’s it. From now on, we omit Mongo’s debug outputs.
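In case you want to silence them in your own session as well, you can raise the driver’s log level; a one-liner, assuming the driver’s default Mongo::Logger is in place:
Mongo::Logger.logger.level = ::Logger::WARN # only warnings and errors from the driver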
Now that we have a batch of sample books in our database, we can start to build useful queries. Let’s start by finding books that cost no more than (i.e., $lte: less than or equal to) 15 (whatever the unit of price in our system is):
2.3.1 :003 > affordable_books = db.collection.find({ price: { :$lte => 15 } })
=> #<Mongo::Collection::View:0x70300620577380 namespace='test.books' @filter={"price"=>{"$lte"=>15}} @options={}>
2.3.1 :004 > affordable_books.count
=> 16
2.3.1 :005 > affordable_books.first
=> {"_id"=>BSON::ObjectId('583024421c705c4fc094947e'), "title"=>"The Monkey's Raincoat", "author"=>"Prince Bosco", "publisher"=>"Target Books", "isbn"=>"964194353-7", "price"=>0.33, "genres"=>["Fiction narrative", "Humor", "Reference book", "Mythology"]}
Obviously, your numbers might vary.
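Since find returns a lazy Mongo::Collection::View, further refinements can be chained onto it before any documents are fetched. As a sketch, listing the five cheapest of the affordable books:
affordable_books.sort(price: 1).limit(5).each { |book| puts book['title'] }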
Aggregation
Aggregations go beyond simply finding data: they allow you to transform and filter (think map-reduce) your data, and also to apply some basic statistical analysis to it. We will take a look at two concrete examples of aggregations. Every aggregation consists of one or more operations which together form a pipeline. Each kind of pipeline stage can be skipped, occur once, or even occur multiple times.
Most popular genres
Our first aggregation counts how often each genre is associated with the books in our database, sorted in descending order.
query =
  [
    { :$unwind => '$genres' },
    {
      :$group => {
        _id: '$genres',
        count: { :$sum => 1 }
      }
    },
    { :$sort => { count: -1 } }
  ]
The query comprises three stages:
- The $unwind stage splits each document by the specified array field. To borrow an analogy: consider a box full of candy bars, each consisting of several pieces; unwind breaks each bar into its pieces and puts the pieces back into the box. More concretely, using the above example query: one document containing the genres, say, Graphic Novel and Fiction goes in, and two documents with one genre each come out. Each document still has the genres field, but now the first document has the value genres: 'Graphic Novel' while the second one has genres: 'Fiction'.
- The $group stage groups the incoming documents by genres. The field (or set of fields) to group a set of documents by is specified through the _id key. As part of this stage, we also count the number of documents in each group and store the result in the count field.
- As the last step in the above pipeline, we sort the documents by the count field in descending order (-1: descending, 1: ascending).
Use Mongo::Collection#aggregate to run the above query:
2.3.1 :006 > db.collection.aggregate(query).each { |doc| puts doc }
{"_id"=>"Fiction narrative", "count"=>16}
{"_id"=>"Biography/Autobiography", "count"=>16}
...
{"_id"=>"Folklore", "count"=>2}
Number of books published per year
Our second aggregation is slightly more complex: we extract the publication year from within each document and count the number of books published each year. This query is enlightening insofar as it shows how to extract data from deep within a document.
query =
  [
    {
      :$project => {
        year: { :$year => '$release_date' }
      }
    },
    {
      :$group => {
        _id: '$year',
        books: { :$sum => 1 }
      }
    },
    { :$sort => { books: -1 } }
  ]
The query comprises three stages:
- The $project stage is essentially a map function: each incoming document is transformed into exactly one outgoing document for the next pipeline stage. In our concrete example, we extract the year from the release_date using MongoDB’s built-in $year aggregation operator and store the result in a field called year. After this stage, each document contains two fields: _id (that’s always included), and the extracted year.
- The $group stage groups the incoming documents by year (recall: the fields to group the data by are specified through the _id field), and captures the number of books per year in a new books field.
- As the last step in the above pipeline, we sort the documents by the books field in descending order (-1: descending, 1: ascending).
We already know how to run the query, and after the above explanation, it’s also clear what it should return: a list of documents with two fields (_id as the year and books as the number of books published in that year), sorted descending by the book count.
2.3.1 :007 > db.collection.aggregate(query).each { |doc| puts doc }
{"_id"=>2009, "books"=>12}
{"_id"=>2003, "books"=>12}
...
{"_id"=>2004, "books"=>3}
Conclusion
We saw how to install MongoDB and get it running, how easy it is to connect to the database from Ruby, and how queries and aggregations are structured. I hope you find this brief introduction and basic project helpful for your further explorations with Ruby and MongoDB. As a next step, you could try a more sophisticated wrapper around the plain MongoDB interface, Mongoid for example.