Exercise 2 — Reshaping Documents

Topics: $addFields, $project, $unwind, $out, $replaceRoot

Reading before you start: 04 - Document Stages (sections on $unwind and $out), 05 - Structure Stages

Database: aggregation_lab — see Setup - Database


Part A — Adding Fields with $addFields

Task A1

Add a field ratingCount to each employee that contains the number of ratings they have. Use $size.

Task A2

Add a field skillCount to each employee that counts how many skills they have. Also add a field seniorEmployee set to the constant value false.

Task A3

For each employee, add a field emailDomain that copies the value of the email field.

Note: We’re just copying the full email here — extracting the domain part requires string expressions we haven’t learned yet. That’s fine.

Task A4

For each project, add a field teamSize that counts how many members are in the team array. Also add a field milestoneCount with the number of milestones.

Task A5

For each project, add the following computed fields:

  • costDifference: the difference between estimatedCost and actualCost (use $subtract)
  • totalReviewScores: the sum of all review ratings for the first employee in the team array — this is tricky, skip if stuck and come back later.

Part B — Projecting Fields with $project

Task B1

From the employees collection, create a pipeline that outputs only name, role, and salary for each employee. The _id field should be excluded.

Task B2

From the projects collection, project only the title and status. Additionally, rename the field estimatedCost to budget.

Task B3

Create an employee “summary card” by projecting:

  • name (keep as-is)
  • role (keep as-is)
  • department renamed from department_id
  • numberOfSkills computed using $size on skills
  • topRating computed using $max on ratings
  • averageRating computed using $avg on ratings

Exclude _id.

Task B4

From the orders collection, project:

  • date
  • status
  • itemCount: the number of items in the items array
  • customer renamed from customer_id

Part C — Unwinding Arrays

Task C1

Unwind the skills array in the employees collection. How many documents do you get? Why is this number different from the original 12?

Task C2

Unwind the items array in the orders collection. Project only the date, status, and the unwound items field. How many documents are in the result?

Task C3

Unwind the milestones array in the projects collection. Then filter ($match) to keep only milestones where completed is true. Project title (from the project) and the milestone name and deadline.

Task C4

Unwind the ratings array of employees. Then sort the unwound documents by the individual ratings value in descending order. Limit to 10 results. Project name and ratings.


Part D — Combining Structure Stages

Task D1

Find all active developers. Add a field salaryLevel that is the string "HIGH" if salary is greater than 50000, or "STANDARD" otherwise.

Hint: You haven’t learned $cond yet — instead, solve this by running two separate pipelines: one that matches salary > 50000 and one that matches salary <= 50000, and observe the results.

Task D2

Build a pipeline on projects:

  1. Match only projects with status "IN_PROGRESS"
  2. Add a field budgetUsedPercent by dividing actualCost by estimatedCost and multiplying by 100 — use $multiply and $divide
  3. Project only title, budgetUsedPercent, and priority
  4. Sort by budgetUsedPercent descending

Task D3

Build a pipeline on tasks:

  1. Match only tasks with status "DONE"
  2. Project title, hours, and project_id
  3. Sort by hours descending
  4. Save the result to a new collection called completed_task_report using $out

After running it, verify by querying: db.completed_task_report.find()

Task D4

Unwind the items array in orders. Then add a field lineTotal computed as items.price * items.quantity (use $multiply with "$items.price" and "$items.quantity"). Project date, items.product, items.quantity, items.price, and lineTotal. Sort by lineTotal descending.


Part E — Creative Challenge

Task E1

Using $replaceRoot, take the log_entries collection and create a pipeline that replaces each document’s root with just the metadata sub-object. What happens for entries where metadata contains different fields?

Task E2

Design a “project health report”. Using any combination of stages you’ve learned so far ($match, $sort, $limit, $skip, $addFields, $project, $unwind, $out), create a pipeline on the projects collection that:

  • Filters to only in-progress or planning projects
  • Computes useful metrics (team size, milestone count, budget usage)
  • Presents only the most relevant fields
  • Saves the output to a new collection

There is no single correct answer — design the report the way you think would be most useful. Write a brief comment at the top explaining what your report shows.


Reflection:

  • What is the difference between $addFields and $project? When would you choose one over the other?
  • What happens to the total document count after $unwind? Why?
  • Why must $out always be the last stage?