MongoDB Aggregation Pipeline


What is Aggregation?

When working with databases in real-world applications, you often need more than simple CRUD operations. You need to analyze, transform, and summarize data across documents and collections. This is where aggregation comes in.

MongoDB provides three mechanisms for aggregating data:

| Mechanism | Scope | Complexity | Use Case |
| --- | --- | --- | --- |
| Query Methods | Single collection | Low | Simple counts, distinct values |
| Aggregation Framework | Multiple collections | Medium–High | Complex transformations, joins, analytics |
| MapReduce | Multiple collections | Very High | Deprecated since MongoDB 5.0; use the Aggregation Framework instead |

This guide focuses on the Aggregation Framework, which is the recommended approach for all aggregation tasks. Simple query methods are covered briefly as a starting point.

Note: MapReduce has been deprecated since MongoDB 5.0 and should no longer be used. The db.collection.group() method was removed in MongoDB 4.2. All aggregation tasks should use the Aggregation Pipeline instead.
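As a rough illustration of what replaces the removed group() method, the sketch below counts documents per status with an aggregation pipeline. The collection name orders and the field status are assumptions made up for this example; they are not part of this guide's dataset. The guard around db lets the snippet load outside mongosh without failing.

```javascript
// Hypothetical collection "orders" with a "status" field (illustrative names only).
const ordersPerStatus = [
  // Keep only documents that actually have a status field.
  { $match: { status: { $exists: true } } },
  // Emit one document per distinct status, counting its members.
  { $group: { _id: "$status", count: { $sum: 1 } } },
  // Largest groups first.
  { $sort: { count: -1 } },
];

// In mongosh, `db` is a global; anywhere else this guard skips the call.
if (typeof db !== "undefined") {
  console.log(db.orders.aggregate(ordersPerStatus).toArray());
}
```

Each element of the array is one stage, and documents flow through the stages in the order listed.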


How to Read This Guide

This guide is organized from simple to complex. Work through the chapters in order:

Part 1: Foundations

  1. 01 - Query Methods — Simple aggregation with count(), distinct(), and group()
  2. 02 - The Aggregation Pipeline — Core concept: how pipelines work
  3. 03 - Sample Data — The example dataset used throughout this guide

Part 2: Pipeline Stages

  1. 04 - Document Stages — Control which documents flow through: $match, $sort, $skip, $limit, $unwind, $out
  2. 05 - Structure Stages — Control what each document looks like: $addFields, $project, $replaceRoot
  3. 06 - Relationship Stages — Join data from other collections: $lookup
  4. 07 - Aggregation Stages — Group and summarize: $group, $bucket

Part 3: Expressions

  1. 08 - Expressions Overview: What expressions are and how they work
  2. 09 - Arithmetic Operators: $add, $subtract, $multiply, $divide, $abs
  3. 10 - Comparison Operators: $eq, $ne, $gt, $lt, $gte, $lte
  4. 11 - Boolean and Control Flow Operators: $and, $or, $not, $cond, $switch, $cmp
  5. 12 - Array Operators: $arrayElemAt, $concatArrays, $in, $map, $filter, $reduce, $isArray
  6. 13 - Accumulator Operators: $sum, $min, $max, $avg, $addToSet, $push
  7. 14 - Variable and Reference Expressions: $$CURRENT, $expr, $let
  8. 15 - Set Operators: $setDifference, $setIntersection, $setUnion, $setIsSubset

Quick Reference: All Pipeline Stages

| Stage | Purpose |
| --- | --- |
| $match | Filter documents (like find()) |
| $sort | Sort documents |
| $skip | Skip first n documents |
| $limit | Limit to first n documents |
| $unwind | Deconstruct arrays into separate documents |
| $out | Write results to a collection (replaces entire collection) |
| $merge | Write results to a collection (can insert, update, or replace individual documents; more flexible than $out) |
| $addFields | Add new fields to documents |
| $project | Select/rename/compute fields |
| $replaceRoot | Replace the document root |
| $lookup | Join with another collection |
| $group | Group by key and aggregate |
| $bucket | Group by value ranges |
| $count | Count documents in the stream |
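To show how several of these stages combine, here is a minimal sketch of a paginated report. It assumes a hypothetical orders collection whose documents carry a status field and an items array (with productId, quantity, and price), plus a hypothetical products collection with a name field. None of these names come from the guide's dataset.

```javascript
// Pagination settings (illustrative values).
const pageSize = 10;
const page = 2; // 1-based page number

const topItemsPipeline = [
  // Filter early so later stages process fewer documents.
  { $match: { status: "shipped" } },
  // One output document per element of the items array.
  { $unwind: "$items" },
  // Join each line item with its product document.
  {
    $lookup: {
      from: "products",
      localField: "items.productId",
      foreignField: "_id",
      as: "product",
    },
  },
  // Compute a new field from existing ones.
  { $addFields: { lineTotal: { $multiply: ["$items.quantity", "$items.price"] } } },
  // Highest line totals first, then cut out one page.
  { $sort: { lineTotal: -1 } },
  { $skip: (page - 1) * pageSize },
  { $limit: pageSize },
  // Shape the final documents; $lookup yields an array, so take its first element.
  {
    $project: {
      _id: 0,
      orderId: "$_id",
      productName: { $arrayElemAt: ["$product.name", 0] },
      lineTotal: 1,
    },
  },
];

// In mongosh, `db` is a global; anywhere else this guard skips the call.
if (typeof db !== "undefined") {
  console.log(db.orders.aggregate(topItemsPipeline).toArray());
}
```

Stage order matters here: $sort must run before $skip and $limit for the pages to be stable, and $match is placed first to reduce the work done by $unwind and $lookup.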

Quick Reference: All Expression Operators

| Category | Operators |
| --- | --- |
| Arithmetic | $add, $subtract, $multiply, $divide, $abs |
| Comparison | $eq, $ne, $gt, $gte, $lt, $lte, $cmp |
| Boolean | $and, $or, $not |
| Control Flow | $cond, $switch |
| Array | $arrayElemAt, $concatArrays, $in, $map, $filter, $reduce, $isArray |
| Accumulators | $sum, $min, $max, $avg, $addToSet, $push |
| Variables | $$CURRENT, $expr, $let |
| Set | $setDifference, $setIntersection, $setUnion, $setIsSubset |
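Expression operators are used inside stages rather than on their own. The sketch below places one operator from several of the categories above into a single $addFields stage. The collection name products and the fields price, quantity, scores, tags, and extraTags are hypothetical and serve only as an illustration.

```javascript
// A single stage that uses operators from several categories (field names are illustrative).
const expressionStage = {
  $addFields: {
    // Arithmetic: multiply two fields of the current document.
    total: { $multiply: ["$price", "$quantity"] },
    // Control flow with a comparison as its condition.
    sizeLabel: {
      $cond: { if: { $gte: ["$quantity", 100] }, then: "bulk", else: "standard" },
    },
    // Array: keep only the scores above 80; "$$s" refers to the current element.
    highScores: {
      $filter: { input: "$scores", as: "s", cond: { $gt: ["$$s", 80] } },
    },
    // Set: combine two arrays, dropping duplicates.
    allTags: { $setUnion: ["$tags", "$extraTags"] },
  },
};

// In mongosh, `db` is a global; anywhere else this guard skips the call.
if (typeof db !== "undefined") {
  console.log(db.products.aggregate([expressionStage]).toArray());
}
```

A single "$" prefix reads a field from the document, while "$$" refers to a variable such as the element name declared with as in $filter.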