Exercise 3 — Grouping and Joining
Topics:
$group,$bucket,$lookup, accumulator operators ($sum,$avg,$min,$max,$push,$addToSet)Reading before you start: 06 - Relationship Stages, 07 - Aggregation Stages, 13 - Accumulator Operators
Database:
aggregation_lab— see Setup - Database
Part A — Grouping with $group
Task A1
Group all employees by their role. For each role, count how many employees have that role.
Expected output format:
{ _id: "DEVELOPER", count: 6 }
{ _id: "LEAD", count: 2 }
// ...Task A2
Group employees by department_id. For each department, compute:
employeeCount: the number of employeesavgSalary: the average salarymaxSalary: the highest salary
Task A3
Group tasks by status. For each status, compute:
taskCount: total number of taskstotalHours: sum of allhours
Sort by totalHours descending.
Task A4
Group orders by paymentMethod. For each method, count the number of orders and collect all unique status values using $addToSet.
Task A5
Group tasks by project_id. For each project, collect all task titles into an array using $push. Also compute totalHours.
Sort by totalHours descending.
Part B — Bucketing with $bucket
Task B1
Bucket employees by salary into these ranges:
- 0 – 40,000
- 40,000 – 55,000
- 55,000 – 70,000
- 70,000 – 100,000
For each bucket, output the count and the employee names (using $push).
Task B2
Bucket employees by age into:
- 20 – 30
- 30 – 40
- 40 – 60
For each bucket, compute the average salary and count.
Part C — Joining with $lookup
Task C1
Start from the employees collection. Use $lookup to join each employee with their department from the departments collection.
from:"departments"localField:"department_id"foreignField:"_id"as:"department"
Then use $unwind to turn the department array into a single object.
Finally, $project only the employee name, role, salary, and the department name (as departmentName).
Task C2
Start from the tasks collection. Use $lookup to join each task with its project.
from:"projects"localField:"project_id"foreignField:"_id"as:"project"
Unwind the project. Then $project the task title, status, hours, and the project’s title (as projectTitle).
Task C3
Start from the orders collection. Use $lookup to join each order with its customer.
from:"customers"localField:"customer_id"foreignField:"_id"as:"customer"
Unwind the customer. Project the order date, status, paymentMethod, the customer name (as customerName), and the customer loyaltyTier.
Sort by date ascending.
Part D — Combining Grouping and Joining
These tasks require you to combine $lookup with $group and other stages you’ve learned previously.
Task D1
Which department has the highest total salary cost?
- Group employees by
department_id, summing their salaries astotalSalary - Sort by
totalSalarydescending - Limit to 1
Task D2
How many tasks does each project have, and what is the project title?
- Group tasks by
project_id, counting astaskCount - Use
$lookupto join withprojectsto get the projecttitle - Unwind the project
- Project
projectTitle,taskCount - Sort by
taskCountdescending
Task D3
What is the total revenue per customer?
- Unwind the
itemsarray inorders - Add a
lineTotalfield (price * quantity) - Group by
customer_id, summinglineTotalastotalSpent - Use
$lookupto join withcustomers - Unwind the customer
- Project
customerName,totalSpent - Sort by
totalSpentdescending
Task D4
Create a department overview that shows for each department:
- Department name (from
departmentscollection) - Number of employees
- Average employee salary
- Total employee salary
Start from employees, group by department_id, then use $lookup to get the department name.
Part E — Creative Challenge
Task E1
Log analysis: Group the log_entries by level (INFO, WARNING, ERROR). For each level, count the entries and collect all unique service names. Sort by count descending.
Task E2
Order analytics dashboard: Design a pipeline that answers the question: “Which products are ordered most frequently, and what is their total revenue?”
You will need to: unwind items, group by product, compute both count and total revenue, then sort.
Task E3
Cross-collection report: Create a comprehensive “team productivity report” that shows, for each employee:
- Their name and role
- Their department name
- How many tasks are assigned to them
- Total hours of their assigned tasks
This requires you to think about which collection to start from and how to join and group the data. There are multiple valid approaches — pick one and implement it.
Save the result to a collection called team_productivity_report.
Reflection:
- What is the difference between
$pushand$addToSetin a$groupstage?- After
$lookup, why do you usually need$unwind?- What happens in a
$lookupwhen there is no matching document in the foreign collection?- Why is
_id: nulluseful in a$groupstage?