Exercise 3 — Grouping and Joining

Topics: $group, $bucket, $lookup, accumulator operators ($sum, $avg, $min, $max, $push, $addToSet)

Reading before you start: 06 - Relationship Stages, 07 - Aggregation Stages, 13 - Accumulator Operators

Database: aggregation_lab — see Setup - Database


Part A — Grouping with $group

Task A1

Group all employees by their role. For each role, count how many employees have that role.

Expected output format:

{ _id: "DEVELOPER", count: 6 }
{ _id: "LEAD", count: 2 }
// ...

Task A2

Group employees by department_id. For each department, compute:

  • employeeCount: the number of employees
  • avgSalary: the average salary
  • maxSalary: the highest salary

Task A3

Group tasks by status. For each status, compute:

  • taskCount: total number of tasks
  • totalHours: sum of all hours

Sort by totalHours descending.

Task A4

Group orders by paymentMethod. For each method, count the number of orders and collect all unique status values using $addToSet.

Task A5

Group tasks by project_id. For each project, collect all task titles into an array using $push. Also compute totalHours.

Sort by totalHours descending.


Part B — Bucketing with $bucket

Task B1

Bucket employees by salary into these ranges:

  • 0 – 40,000
  • 40,000 – 55,000
  • 55,000 – 70,000
  • 70,000 – 100,000

For each bucket, output the count and the employee names (using $push).

Task B2

Bucket employees by age into:

  • 20 – 30
  • 30 – 40
  • 40 – 60

For each bucket, compute the average salary and count.


Part C — Joining with $lookup

Task C1

Start from the employees collection. Use $lookup to join each employee with their department from the departments collection.

  • from: "departments"
  • localField: "department_id"
  • foreignField: "_id"
  • as: "department"

Then use $unwind to turn the department array into a single object.

Finally, $project only the employee name, role, salary, and the department name (as departmentName).

Task C2

Start from the tasks collection. Use $lookup to join each task with its project.

  • from: "projects"
  • localField: "project_id"
  • foreignField: "_id"
  • as: "project"

Unwind the project. Then $project the task title, status, hours, and the project’s title (as projectTitle).

Task C3

Start from the orders collection. Use $lookup to join each order with its customer.

  • from: "customers"
  • localField: "customer_id"
  • foreignField: "_id"
  • as: "customer"

Unwind the customer. Project the order date, status, paymentMethod, the customer name (as customerName), and the customer loyaltyTier.

Sort by date ascending.


Part D — Combining Grouping and Joining

These tasks require you to combine $lookup with $group and other stages you’ve learned previously.

Task D1

Which department has the highest total salary cost?

  1. Group employees by department_id, summing their salaries as totalSalary
  2. Sort by totalSalary descending
  3. Limit to 1

Task D2

How many tasks does each project have, and what is the project title?

  1. Group tasks by project_id, counting as taskCount
  2. Use $lookup to join with projects to get the project title
  3. Unwind the project
  4. Project projectTitle, taskCount
  5. Sort by taskCount descending

Task D3

What is the total revenue per customer?

  1. Unwind the items array in orders
  2. Add a lineTotal field (price * quantity)
  3. Group by customer_id, summing lineTotal as totalSpent
  4. Use $lookup to join with customers
  5. Unwind the customer
  6. Project customerName, totalSpent
  7. Sort by totalSpent descending

Task D4

Create a department overview that shows for each department:

  • Department name (from departments collection)
  • Number of employees
  • Average employee salary
  • Total employee salary

Start from employees, group by department_id, then use $lookup to get the department name.


Part E — Creative Challenge

Task E1

Log analysis: Group the log_entries by level (INFO, WARNING, ERROR). For each level, count the entries and collect all unique service names. Sort by count descending.

Task E2

Order analytics dashboard: Design a pipeline that answers the question: “Which products are ordered most frequently, and what is their total revenue?”

You will need to: unwind items, group by product, compute both count and total revenue, then sort.

Task E3

Cross-collection report: Create a comprehensive “team productivity report” that shows, for each employee:

  • Their name and role
  • Their department name
  • How many tasks are assigned to them
  • Total hours of their assigned tasks

This requires you to think about which collection to start from and how to join and group the data. There are multiple valid approaches — pick one and implement it.

Save the result to a collection called team_productivity_report.


Reflection:

  • What is the difference between $push and $addToSet in a $group stage?
  • After $lookup, why do you usually need $unwind?
  • What happens in a $lookup when there is no matching document in the foreign collection?
  • Why is _id: null useful in a $group stage?