Aggregation Stages

Group and summarize — process data across multiple documents.



Aggregation stages are fundamentally different from other stages: instead of processing documents one at a time, they process documents collectively. This enables grouping, summing, counting, and other cross-document operations.


$group — Group by Key

The $group stage groups documents by a specified key and applies accumulator expressions to each group.

db.<collection>.aggregate([
  {
    $group: {
      _id: <grouped_by_expression>,         // what to group by
      <new_field>: { <accumulator>: <expr> } // computed fields
    }
  }
]);
  • _id defines the grouping key — documents with the same _id value end up in the same group
  • Each additional field uses an accumulator like $sum, $avg, $min, $max, $push, etc.
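
The two bullets can be made concrete with a plain-JavaScript sketch of what $group does conceptually. It evaluates the _id expression for every document, collects documents that share a key, and then reduces each group with its accumulators. This illustrates the semantics under stated assumptions and is not MongoDB's implementation. groupDocs, evalExpr and the accumulators table are hypothetical names. Only top-level "$field" paths and literals are supported. Keys are compared via JSON.stringify, and $min/$max handle numbers only. It runs in Node.js without a database.

```javascript
// Minimal sketch of $group semantics in plain JavaScript (Node.js), not MongoDB's implementation.
// groupDocs, evalExpr and accumulators are hypothetical names used only for illustration.

// Keep only numeric values, as $sum and $avg do
const nums = (vals) => vals.filter((v) => typeof v === "number");

// Each accumulator reduces the values collected for one group to a single result.
// $min/$max are simplified to numbers only; MongoDB compares any BSON type.
const accumulators = {
  $sum: (vals) => nums(vals).reduce((a, b) => a + b, 0),
  $avg: (vals) => {
    const n = nums(vals);
    return n.length ? n.reduce((a, b) => a + b, 0) / n.length : null;
  },
  $min: (vals) => (nums(vals).length ? Math.min(...nums(vals)) : null),
  $max: (vals) => (nums(vals).length ? Math.max(...nums(vals)) : null),
  $push: (vals) => vals.slice(),
};

// "$field" reads a top-level field; anything else (e.g. 1 or null) is treated as a literal
function evalExpr(doc, expr) {
  return typeof expr === "string" && expr.startsWith("$") ? doc[expr.slice(1)] : expr;
}

function groupDocs(docs, spec) {
  const { _id: keyExpr, ...fields } = spec;
  const groups = new Map(); // JSON-encoded key -> { key, members }
  for (const doc of docs) {
    const key = evalExpr(doc, keyExpr) ?? null; // a missing key groups under null
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, members: [] });
    groups.get(id).members.push(doc);
  }
  // Apply every accumulator field to each group's members
  return [...groups.values()].map(({ key, members }) => {
    const out = { _id: key };
    for (const [name, accSpec] of Object.entries(fields)) {
      const [op, expr] = Object.entries(accSpec)[0];
      out[name] = accumulators[op](members.map((d) => evalExpr(d, expr)));
    }
    return out;
  });
}

// Hypothetical sample documents (not data from this guide)
const sampleProjects = [
  { type: "RESEARCH_PROJECT", budget: 100 },
  { type: "RESEARCH_PROJECT", budget: 300 },
  { type: "REQUEST_PROJECT", budget: 50 },
];

// Similar in spirit to { $group: { _id: "$type", projectCount: { $sum: 1 }, avgBudget: { $avg: "$budget" } } }
// Logs RESEARCH_PROJECT (projectCount 2, avgBudget 200) and REQUEST_PROJECT (projectCount 1, avgBudget 50)
console.log(groupDocs(sampleProjects, {
  _id: "$type",
  projectCount: { $sum: 1 },
  avgBudget: { $avg: "$budget" },
}));
```

Passing _id: null gives every document the same key. That is why this setting produces a single global group.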

Example: Count projects by type

db.projects.aggregate([
  {
    $group: {
      _id: "$type",
      projectCount: { $sum: 1 }
    }
  },
  { $out: "projectReport" }
]);

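Output (because of $out, these documents are written to the projectReport collection rather than returned to the caller):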

{ _id: "REQUEST_PROJECT",    projectCount: 1 }
{ _id: "RESEARCH_PROJECT",   projectCount: 1 }
{ _id: "MANAGEMENT_PROJECT", projectCount: 1 }

[Figure: How $group works visually]

Advanced Example: Find the highest-funded project

This example chains $addFields, $group, $lookup, $unwind and $replaceRoot:

db.projects.aggregate([
  // Step 1: Calculate total funding per project
  {
    $addFields: {
      projectFunding: { $sum: "$fundings.amount" }
    }
  },
  // Step 2: Find the maximum funding across all projects
  {
    $group: {
      _id: null,                                  // group ALL documents
      projectFunding: { $max: "$projectFunding" }
    }
  },
  // Step 3: Look up which project(s) have that funding
  {
    $lookup: {
      from: "projects",
      let: { funds: "$projectFunding" },
      pipeline: [
        {
          $addFields: {
            funds: { $sum: "$fundings.amount" }
          }
        },
        {
          $match: {
            $expr: { $eq: ["$funds", "$$funds"] }
          }
        }
      ],
      as: "projects"
    }
  },
  // Step 4: Unwrap and promote the project to root
  { $unwind: { path: "$projects" } },
  { $replaceRoot: { newRoot: "$projects" } }
]);

Note: Setting _id: null groups all documents into a single group — useful for computing global aggregates like max, min, or total.
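
The same logic can be mirrored in plain JavaScript to show why Step 3 is needed. The _id: null group keeps only the maximum value and not the project it came from, so the pipeline has to find the matching project(s) again. Every project tied at the maximum is returned. This sketch assumes each project has a fundings array of { amount } objects, as in the example. highestFundedProjects, totalFunding and the sample data are hypothetical.

```javascript
// Plain-JavaScript mirror of the "highest-funded project" pipeline (illustration only).
// Assumes each project looks like { title, fundings: [{ amount }, ...] }.

// Like { $sum: "$fundings.amount" }: non-numeric amounts are ignored, no fundings gives 0
const totalFunding = (p) =>
  (p.fundings ?? []).reduce((s, f) => s + (typeof f.amount === "number" ? f.amount : 0), 0);

function highestFundedProjects(projects) {
  if (projects.length === 0) return []; // $group on empty input emits nothing
  // Steps 1-2 ($addFields + $group with _id: null): one global maximum, no documents kept
  const maxFunding = Math.max(...projects.map(totalFunding));
  // Steps 3-4 ($lookup + $unwind + $replaceRoot): re-find every project at that maximum,
  // carrying the computed total in a "funds" field as the $lookup sub-pipeline does
  return projects
    .map((p) => ({ ...p, funds: totalFunding(p) }))
    .filter((p) => p.funds === maxFunding);
}

// Hypothetical sample data with a tie at 500
const sample = [
  { title: "A", fundings: [{ amount: 200 }, { amount: 300 }] },
  { title: "B", fundings: [{ amount: 500 }] },
  { title: "C", fundings: [{ amount: 100 }] },
];
console.log(highestFundedProjects(sample).map((p) => p.title)); // [ 'A', 'B' ]
```

A shorter pipeline could $sort on the computed total and follow it with $limit: 1. That would also find a top project, but it returns only one document when several projects share the highest funding.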


$bucket — Group by Value Ranges

The $bucket stage groups documents into “buckets” based on value intervals. Think of it as a histogram.

db.<collection>.aggregate([
  {
    $bucket: {
      groupBy: <expression>,                    // field to bucket by
      boundaries: [<low1>, <low2>, <low3>,...], // bucket boundaries (ascending)
      default: <literal>,                       // optional: _id of the bucket for out-of-range values
      output: {
        <field1>: { <accumulator>: <expr> },
        ...
      }
    }
  }
]);

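The boundaries array defines the edges of each bucket and must be sorted in ascending order. A value falls into bucket i if it is ≥ boundaries[i] and < boundaries[i+1]. The lower bound is inclusive and the upper bound is exclusive, and each bucket's _id is its lower bound. A document whose value falls outside every range causes an error unless a default bucket is specified.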
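
The assignment rule can be written out as a small function. bucketId is a hypothetical plain-JavaScript helper that is restricted to numeric values and uses the same boundaries as the subprojects example. It returns the bucket's _id, which is the lower bound of the matching range. When the value lies outside every range, it falls back to a default _id or throws, mirroring $bucket's optional default setting.

```javascript
// Sketch of how $bucket assigns a value: boundaries[i] <= value < boundaries[i + 1].
// bucketId is a hypothetical helper; it handles numbers only (MongoDB also allows other
// types, such as dates, as long as all boundaries share one type).
function bucketId(value, boundaries, defaultId) {
  if (typeof value === "number") {
    for (let i = 0; i < boundaries.length - 1; i++) {
      // Lower bound inclusive, upper bound exclusive; the bucket's _id is its lower bound
      if (value >= boundaries[i] && value < boundaries[i + 1]) return boundaries[i];
    }
  }
  // Outside every range (or not a number): fall back to the default bucket, if any
  if (defaultId !== undefined) return defaultId;
  throw new Error(`value ${value} is outside the boundaries and no default was given`);
}

const boundaries = [0, 51, 101]; // same boundaries as the subprojects example
console.log(bucketId(50, boundaries));           // 0
console.log(bucketId(51, boundaries));           // 51
console.log(bucketId(100, boundaries));          // 51
console.log(bucketId(101, boundaries, "Other")); // Other
```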

Example: Group subprojects by research focus

// Buckets (lower bound inclusive, upper bound exclusive):
//   [0, 51)   -> 0–50:   "low applied research"
//   [51, 101) -> 51–100: "high applied research"

db.subprojects.aggregate([
  {
    $bucket: {
      groupBy: "$appliedResearch",
      boundaries: [0, 51, 101],
      output: {
        count: { $sum: 1 },
        titles: { $push: "$title" }
      }
    }
  }
]);

Output:

{
  _id: 0,      // bucket for values 0–50
  count: 3,
  titles: ["ERP SAP", "Web-based Systems", "API Design SAP"]
}
{
  _id: 51,     // bucket for values 51–100
  count: 1,
  titles: ["Embedded Systems"]
}

When to use $bucket vs $group:

  • $group: when you want to group by discrete values (categories, types, names)
  • $bucket: when you want to group by value ranges, usually numeric or date ranges (age ranges, price tiers, score bands)
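
Most of the difference comes down to how the group key is computed. In this plain-JavaScript sketch, the hypothetical countBy helper plays the role of $group with { $sum: 1 }. Giving it a discrete field produces $group-style groups. Giving it a function that maps a value to the lower bound of its range produces $bucket-style groups. The products are hypothetical. Like $bucket, the sketch emits only ranges that actually contain documents.

```javascript
// countBy is a hypothetical helper that groups items by keyFn(item) and counts them,
// roughly like { $group: { _id: <key>, count: { $sum: 1 } } }.
function countBy(items, keyFn) {
  const counts = new Map();
  for (const item of items) {
    const key = keyFn(item);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

// Range key: lower bound of the [boundaries[i], boundaries[i + 1]) range holding v.
// "Other" stands in for a $bucket default; real $bucket throws if no default is set.
const rangeKey = (boundaries) => (v) => {
  for (let i = 0; i < boundaries.length - 1; i++) {
    if (v >= boundaries[i] && v < boundaries[i + 1]) return boundaries[i];
  }
  return "Other";
};

// Hypothetical products
const products = [
  { category: "book", price: 12 },
  { category: "book", price: 45 },
  { category: "game", price: 60 },
];

// Discrete key ($group-style): one group per distinct category
console.log(countBy(products, (p) => p.category));
// Range key ($bucket-style): the empty [100, 200) range produces no output
const byPrice = rangeKey([0, 50, 100, 200]);
console.log(countBy(products, (p) => byPrice(p.price)));
```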

Accumulators Available in Aggregation Stages

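These operators work as accumulators inside $group and $bucket. $sum, $avg, $min and $max can also be used as ordinary expressions in stages such as $addFields and $project. There they operate on an array within a single document, as in Step 1 of the highest-funded example. $push and $addToSet are accumulator-only.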

Operator     Description
$sum         Sum of values, or a count (with $sum: 1)
$avg         Average of values
$min         Minimum value
$max         Maximum value
$push        Collect all values into an array
$addToSet    Collect unique values into an array
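
This sketch contrasts the two ways these operators show up, using hypothetical documents and plain JavaScript array methods in place of the server. As accumulators, they combine values across the documents of a group. Here all three documents form one group, as with _id: null. As an expression in $addFields, $sum combines the elements of an array inside each document.

```javascript
// Sketch contrasting accumulator use (across documents) with expression use (inside one document).
// The documents are hypothetical; only plain JavaScript is used, no MongoDB driver.
const docs = [
  { tag: "sap", hours: [1, 2] },
  { tag: "web", hours: [3] },
  { tag: "sap", hours: [4, 5, 6] },
];

const sum = (xs) => xs.reduce((a, b) => a + b, 0);

// Accumulators in $group / $bucket (all three documents treated as one group): many docs -> one value
const count = docs.length;               // { $sum: 1 }           -> 3
const pushed = docs.map((d) => d.tag);   // { $push: "$tag" }     -> ["sap", "web", "sap"]
const uniqueTags = [...new Set(pushed)]; // { $addToSet: "$tag" } -> "sap", "web" (MongoDB does not guarantee order)

// Expression in $addFields / $project: one document's array -> one value per document
const perDocTotals = docs.map((d) => sum(d.hours)); // { $sum: "$hours" } -> [3, 3, 15]

console.log(count, pushed, uniqueTags, perDocTotals);
```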

→ See 13 - Accumulator Operators for detailed examples.


Next: 08 - Expressions Overview — learn the expression language that powers all pipeline stages.