Exercise 5 — Advanced Patterns and Capstone

Topics: $expr, $let, set operators, multi-stage pipeline design

Reading before you start: 14 - Variable and Reference Expressions, 15 - Set Operators

Database: aggregation_lab — see Setup - Database

Part A — Reference Expressions: `$expr` and `$let`

Task A1

Find all projects where actualCost is greater than half of estimatedCost.

Use $expr inside $match to compare the two fields:

$expr: { $gt: ["$actualCost", { $divide: ["$estimatedCost", 2] }] }

Task A2

Find all employees where the number of skills is greater than the number of ratings.

Use $expr with $gt and $size on both arrays.

Task A3

Use $let to define a variable totalFunding as estimatedCost for each project, then compute fundingPerTeamMember as totalFunding / teamSize.

$addFields: {
  fundingPerTeamMember: {
    $let: {
      vars: {
        totalFunding: "$estimatedCost",
        teamSize: { $size: "$team" }
      },
      in: { $divide: ["$$totalFunding", "$$teamSize"] }
    }
  }
}

Project title, fundingPerTeamMember.

Task A4

Use $let to compute a “weighted score” for each employee:

Define avgRat as the average of ratings
Define skillCount as the size of skills array
Compute the result as (avgRat * 10) + (skillCount * 3)

Project name and weightedScore.

Task A5

Find all tasks where hours is greater than the average hours across all TODO tasks (which is about 12.5).

Approach:

First, compute the average in a separate mental step (or hardcode it)
Then use $match with $expr and $gt to filter

Alternatively, challenge yourself: can you solve this with a single pipeline? (Hint: $group with _id: null first, then $lookup back.)

Part B — Set Operators

Task B1

The departments collection has a tags array. Write a pipeline that computes the intersection of tags between the “Engineering” and “Product” departments.

Approach:

Match only those two departments
Group them with _id: null, pushing all tags arrays
Use $setIntersection on the two arrays

Or use a simpler approach: match Engineering, add a $lookup for Product, then compute the intersection.

Task B2

For each department, compute the set difference between its tags and the array ["tech", "product"]. This shows which tags are unique to each department (not in the common set).

$addFields: {
  uniqueTags: {
    $setDifference: ["$tags", ["tech", "product"]]
  }
}

Task B3

For each department, check whether its tags are a subset of ["tech", "product", "core", "management"].

Use $setIsSubset.

Task B4

Compute the union of all skills across all employees in the Engineering department.

Match employees in Engineering (department_id: ObjectId("bbb000000000000000000001"))
Unwind skills
Group with _id: null, using $addToSet to collect unique skills

Alternatively, use $setUnion by grouping the skill arrays and reducing them.

Part C — Capstone Challenge

Time budget: This section is designed to take about 90 minutes. Plan your pipelines on paper before typing.

Capstone 1: Company Dashboard Report

Build a single pipeline that produces a company overview report with the following information for each department:

departmentName: the name of the department
location: where the department is located
employeeCount: number of employees
activeCount: number of active employees
averageSalary: average salary
topSkills: a deduplicated list of all skills across all employees in the department (you’ll need $unwind on skills followed by $addToSet in $group)
averageRating: the average of all individual ratings across all employees in the department (hint: $unwind ratings before grouping)

Save the result to company_dashboard.

Requirements:

Start from the employees collection
Join with departments using $lookup
The final output should have one document per department
Think carefully about the order of $unwind and $group operations

Capstone 2: Project Risk Assessment

Build a pipeline that identifies at-risk projects. A project is “at-risk” if:

Its status is not "COMPLETED"
Its budget utilization (actualCost / estimatedCost) is above 70%
OR it has fewer than 50% of milestones completed

The output should include:

title
status, priority
budgetUtilization (percentage)
completionRate (percentage of milestones completed)
riskLevel: use $switch to assign "CRITICAL" if both conditions are met, "HIGH" if budget > 70%, "MEDIUM" if completion < 50%, "LOW" otherwise
teamSize
teamMembers: use $lookup to resolve the team member ObjectIds to employee names

Sort by riskLevel (you may need to assign numeric values and sort by those).

Capstone 3: Customer Lifetime Value

Build a pipeline that computes each customer’s lifetime value and profile:

customerName, customerType, loyaltyTier, city
totalOrders: number of orders
totalSpent: sum of all order totals (each order’s total = sum of price * quantity for all items)
averageOrderValue: totalSpent / totalOrders
productsPurchased: deduplicated list of all product names they’ve ordered
spendingTier: use $switch — "VIP" if totalSpent > 2000, "REGULAR" if > 500, "NEW" otherwise

Save to customer_lifetime_value.

Part D — Open Design Challenge

Task D1

Design your own analysis. Pick any combination of the collections in the database and design a pipeline that answers a business question you find interesting.

Requirements:

Must use at least 5 different stages
Must use at least 2 different expression operators
Must include a comment at the top explaining what question you’re answering

Some ideas for inspiration:

“Which employees work on the most projects?”
“What is the average task load per person per project?”
“Which services produce the most log entries per hour?”
“Do business customers spend more per order than private customers?”
“Is there a correlation between team size and budget utilization?”

Present your pipeline and explain the result.

Final Reflection:

Look back at your Exercise 1 pipelines. How much more complex can you build now?

Which operator or concept was the hardest to understand? What helped it click?

When would you use $expr vs. a regular $match condition?

How would you approach designing a pipeline for a new problem from scratch? What’s your process?

Deep Thought

Explorer

Exercise 5 - Advanced and Capstone

Exercise 5 — Advanced Patterns and Capstone

Part A — Reference Expressions: `$expr` and `$let`

Task A1

Task A2

Task A3

Task A4

Task A5

Part B — Set Operators

Task B1

Task B2

Task B3

Task B4

Part C — Capstone Challenge

Capstone 1: Company Dashboard Report

Capstone 2: Project Risk Assessment

Capstone 3: Customer Lifetime Value

Part D — Open Design Challenge

Task D1

Graph View

Table of Contents

Backlinks

Deep Thought

Explorer

Exercise 5 - Advanced and Capstone

Exercise 5 — Advanced Patterns and Capstone

Part A — Reference Expressions: $expr and $let

Task A1

Task A2

Task A3

Task A4

Task A5

Part B — Set Operators

Task B1

Task B2

Task B3

Task B4

Part C — Capstone Challenge

Capstone 1: Company Dashboard Report

Capstone 2: Project Risk Assessment

Capstone 3: Customer Lifetime Value

Part D — Open Design Challenge

Task D1

Graph View

Table of Contents

Backlinks

Part A — Reference Expressions: `$expr` and `$let`