Exercise 5 — Advanced Patterns and Capstone
Topics:
$expr,$let, set operators, multi-stage pipeline designReading before you start: 14 - Variable and Reference Expressions, 15 - Set Operators
Database:
aggregation_lab— see Setup - Database
Part A — Reference Expressions: $expr and $let
Task A1
Find all projects where actualCost is greater than half of estimatedCost.
Use $expr inside $match to compare the two fields:
$expr: { $gt: ["$actualCost", { $divide: ["$estimatedCost", 2] }] }Task A2
Find all employees where the number of skills is greater than the number of ratings.
Use $expr with $gt and $size on both arrays.
Task A3
Use $let to define a variable totalFunding as estimatedCost for each project, then compute fundingPerTeamMember as totalFunding / teamSize.
$addFields: {
fundingPerTeamMember: {
$let: {
vars: {
totalFunding: "$estimatedCost",
teamSize: { $size: "$team" }
},
in: { $divide: ["$$totalFunding", "$$teamSize"] }
}
}
}Project title, fundingPerTeamMember.
Task A4
Use $let to compute a “weighted score” for each employee:
- Define
avgRatas the average of ratings - Define
skillCountas the size of skills array - Compute the result as
(avgRat * 10) + (skillCount * 3)
Project name and weightedScore.
Task A5
Find all tasks where hours is greater than the average hours across all TODO tasks (which is about 12.5).
Approach:
- First, compute the average in a separate mental step (or hardcode it)
- Then use
$matchwith$exprand$gtto filter
Alternatively, challenge yourself: can you solve this with a single pipeline? (Hint: $group with _id: null first, then $lookup back.)
Part B — Set Operators
Task B1
The departments collection has a tags array. Write a pipeline that computes the intersection of tags between the “Engineering” and “Product” departments.
Approach:
- Match only those two departments
- Group them with
_id: null, pushing alltagsarrays - Use
$setIntersectionon the two arrays
Or use a simpler approach: match Engineering, add a $lookup for Product, then compute the intersection.
Task B2
For each department, compute the set difference between its tags and the array ["tech", "product"]. This shows which tags are unique to each department (not in the common set).
$addFields: {
uniqueTags: {
$setDifference: ["$tags", ["tech", "product"]]
}
}Task B3
For each department, check whether its tags are a subset of ["tech", "product", "core", "management"].
Use $setIsSubset.
Task B4
Compute the union of all skills across all employees in the Engineering department.
- Match employees in Engineering (
department_id: ObjectId("bbb000000000000000000001")) - Unwind skills
- Group with
_id: null, using$addToSetto collect unique skills
Alternatively, use $setUnion by grouping the skill arrays and reducing them.
Part C — Capstone Challenge
Time budget: This section is designed to take about 90 minutes. Plan your pipelines on paper before typing.
Capstone 1: Company Dashboard Report
Build a single pipeline that produces a company overview report with the following information for each department:
departmentName: the name of the departmentlocation: where the department is locatedemployeeCount: number of employeesactiveCount: number of active employeesaverageSalary: average salarytopSkills: a deduplicated list of all skills across all employees in the department (you’ll need$unwindon skills followed by$addToSetin$group)averageRating: the average of all individual ratings across all employees in the department (hint:$unwindratings before grouping)
Save the result to company_dashboard.
Requirements:
- Start from the
employeescollection - Join with
departmentsusing$lookup - The final output should have one document per department
- Think carefully about the order of
$unwindand$groupoperations
Capstone 2: Project Risk Assessment
Build a pipeline that identifies at-risk projects. A project is “at-risk” if:
- Its status is not
"COMPLETED" - Its budget utilization (
actualCost / estimatedCost) is above 70% - OR it has fewer than 50% of milestones completed
The output should include:
titlestatus,prioritybudgetUtilization(percentage)completionRate(percentage of milestones completed)riskLevel: use$switchto assign"CRITICAL"if both conditions are met,"HIGH"if budget > 70%,"MEDIUM"if completion < 50%,"LOW"otherwiseteamSizeteamMembers: use$lookupto resolve the team member ObjectIds to employee names
Sort by riskLevel (you may need to assign numeric values and sort by those).
Capstone 3: Customer Lifetime Value
Build a pipeline that computes each customer’s lifetime value and profile:
customerName,customerType,loyaltyTier,citytotalOrders: number of orderstotalSpent: sum of all order totals (each order’s total = sum ofprice * quantityfor all items)averageOrderValue:totalSpent / totalOrdersproductsPurchased: deduplicated list of all product names they’ve orderedspendingTier: use$switch—"VIP"if totalSpent > 2000,"REGULAR"if > 500,"NEW"otherwise
Save to customer_lifetime_value.
Part D — Open Design Challenge
Task D1
Design your own analysis. Pick any combination of the collections in the database and design a pipeline that answers a business question you find interesting.
Requirements:
- Must use at least 5 different stages
- Must use at least 2 different expression operators
- Must include a comment at the top explaining what question you’re answering
Some ideas for inspiration:
- “Which employees work on the most projects?”
- “What is the average task load per person per project?”
- “Which services produce the most log entries per hour?”
- “Do business customers spend more per order than private customers?”
- “Is there a correlation between team size and budget utilization?”
Present your pipeline and explain the result.
Final Reflection:
- Look back at your Exercise 1 pipelines. How much more complex can you build now?
- Which operator or concept was the hardest to understand? What helped it click?
- When would you use
$exprvs. a regular$matchcondition?- How would you approach designing a pipeline for a new problem from scratch? What’s your process?