Join Query in MongoDB

As a general rule, you should try to avoid joining as much as possible in MongoDB. While in traditional relational databases you try to avoid repetition of data and therefore joins are necessary, it is perfectly all right to repeat data in MongoDB and consequently avoid joins. The hardest part is to step away from the relational mindset and to think in terms of organizing your data in so called documents. That may not be easy at first, especially if you have been working for many years with relational data structures.

All that being said, there are still cases where joins are acceptable in MongoDB. At this time MongoDB joins can only be done programmatically, however, since none of the drivers provide any join capabilities. Since the join cannot be done in one step, you would have to loop through and query the related data for each document (‘record’) from your main collection (‘table’).

For manual queries you could do the join in just one step as you will see near the bottom of this article.

Let’s take a closer look at an example. In a traditional relational database we have a DEPARTMENT table and an EMPLOYEE table. Each employee record contains a reference DEPID to a record in the department table. All department records are unique in their table, for instance if we have 2 employees in the IT department, there is only 1 IT department record. Below illustrates the data model.

empdeprel

In MongoDB, we have a different model. Within each employee document we simply include the department data directly. That means if we have more than one employee within the same department, the department data will be repeated. That’s right, we have duplicate data and that is OK. Think of this in terms of querying data. It’s a lot faster to get all the data from one place than it would be from 2 places. In this example it doesn’t seem like much, but on a larger scale (think Big Data) it can make all the difference.

A case where a join is acceptable in MongoDB would be if you want to include details of each employee’s manager. Since the manager is also an employee and has a document in the same collection, his/her subordinates will simply contain a reference to the manager’s document, as opposed to contain a copy of the manager’s employee data.

The following screenshot illustrates how the employee and department data would look like in a relational database.
empdepreldata

In MongoDB the employee and department data could go together into a single “employees” collection and would look like this:

db.employees.find().sort({"_id": 1});

JSON Output:

{ 
    "_id" : NumberInt(1), 
    "lastName" : "Mouse", 
    "firstName" : "Mickey", 
    "department" : {
        "depName" : "IT"
    }
}
{ 
    "_id" : NumberInt(2), 
    "lastName" : "Mouse", 
    "firstName" : "Minnie", 
    "department" : {
        "depName" : "IT"
    }, 
    "manId" : NumberInt(1)
}
{ 
    "_id" : NumberInt(3), 
    "lastName" : "Duck", 
    "firstName" : "Donald", 
    "department" : {
        "depName" : "IT"
    }, 
    "manId" : NumberInt(1)
}
{ 
    "_id" : NumberInt(4), 
    "lastName" : "Smith", 
    "firstName" : "Jane", 
    "department" : {
        "depName" : "HR"
    }
}
{ 
    "_id" : NumberInt(5), 
    "lastName" : "Smith", 
    "firstName" : "John", 
    "department" : {
        "depName" : "HR"
    }, 
    "manId" : NumberInt(4)
}

If you wanted to include the manager details with each employee document you could run the following “join” on the mongo command line:

db.employees.find().sort({"_id": 1}).forEach(
  function(curEmp) {
    if (curEmp.manId != null) {
      curEmp.manager = db.employees.findOne({ "_id": curEmp.manId });
    }
    printjson (curEmp);
  }
);

This gives us the following output:

{ 
    "_id" : NumberInt(1), 
    "lastName" : "Mouse", 
    "firstName" : "Mickey", 
    "department" : {
        "depName" : "IT"
    }
}
{ 
    "_id" : NumberInt(2), 
    "lastName" : "Mouse", 
    "firstName" : "Minnie", 
    "department" : {
        "depName" : "IT"
    }, 
    "manId" : NumberInt(1), 
    "manager" : {
        "_id" : NumberInt(1), 
        "lastName" : "Mouse", 
        "firstName" : "Mickey", 
        "department" : {
            "depName" : "IT"
        }
    }
}
{ 
    "_id" : NumberInt(3), 
    "lastName" : "Duck", 
    "firstName" : "Donald", 
    "department" : {
        "depName" : "IT"
    }, 
    "manId" : NumberInt(1), 
    "manager" : {
        "_id" : NumberInt(1), 
        "lastName" : "Mouse", 
        "firstName" : "Mickey", 
        "department" : {
            "depName" : "IT"
        }
    }
}
{ 
    "_id" : NumberInt(4), 
    "lastName" : "Smith", 
    "firstName" : "Jane", 
    "department" : {
        "depName" : "HR"
    }
}
{ 
    "_id" : NumberInt(5), 
    "lastName" : "Smith", 
    "firstName" : "John", 
    "department" : {
        "depName" : "HR"
    }, 
    "manId" : NumberInt(4), 
    "manager" : {
        "_id" : NumberInt(4), 
        "lastName" : "Smith", 
        "firstName" : "Jane", 
        "department" : {
            "depName" : "HR"
        }
    }
}

Edit: 29-Nov-2015 – MongoDB 3.2 introduces the $lookup Aggregation Operator. This operator performs a left outer join to another collection in the same database. The syntax is as follows:

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

To follow the example of including each employee’s manager we can use the following $lookup:

db.employees.aggregate([
{$lookup:{
  from:"employees",
  localField:"manId",
  foreignField:"_id",
  as:"manager"
}}
]);

An example of mapping an array of values to individual documents can be found in Join Array in MongoDB.

Leave a Reply

Your email address will not be published. Required fields are marked *

*