Java 8 Collectors - groupingBy, partitioningBy and joining
Author: Venkata Sudhakar
The Collectors utility class in Java 8 provides a rich set of ready-made reductions for the Stream API's collect() terminal operation. While collect(Collectors.toList()) and collect(Collectors.toSet()) are well known, the most powerful collectors are groupingBy(), partitioningBy(), counting(), summarizingInt(), and joining(). These let you perform SQL-like aggregations entirely in Java without writing loops: grouping orders by status, partitioning customers into those inside and outside a region, computing average order values, or joining names into a comma-separated string.

Collectors can be composed: a downstream collector passed as the second argument to groupingBy() performs a secondary aggregation on each group. For example, groupingBy(status, counting()) counts orders per status, groupingBy(status, summingDouble(Order::getAmount)) sums order amounts per status, and groupingBy(status, toList()) collects each group into a List, which is also what the single-argument groupingBy(status) does by default. This composability makes Collectors one of the most expressive features of the Java Stream API for data aggregation tasks.

The example demonstrates the most useful Collectors on a list of orders from a data migration context: grouping, partitioning, summarising, and joining.
The grouping part of the example gives the following output:
Orders by status:
COMPLETED: 3 orders
PENDING: 2 orders
FAILED: 2 orders
Count by status: {COMPLETED=3, PENDING=2, FAILED=2}
Revenue by region:
APAC avg: $290.00
EMEA avg: $780.00
AMER avg: $825.00
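The grouping calls behind output of this kind can be sketched as follows. This is a minimal sketch, not the original listing: the Order class, its field names, and the six sample orders are hypothetical, so the figures differ from the output above. A TreeMap is passed as the map factory only to make the key order predictable when printing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupingSketch {

    // Hypothetical order type; the field names are assumptions for illustration.
    static final class Order {
        private final String id;
        private final String status;
        private final String region;
        private final double amount;

        Order(String id, String status, String region, double amount) {
            this.id = id;
            this.status = status;
            this.region = region;
            this.amount = amount;
        }

        String getId() { return id; }
        String getStatus() { return status; }
        String getRegion() { return region; }
        double getAmount() { return amount; }
    }

    // Hypothetical sample data.
    static List<Order> sampleOrders() {
        return Arrays.asList(
            new Order("ORD-1", "COMPLETED", "APAC", 250.0),
            new Order("ORD-2", "PENDING", "EMEA", 900.0),
            new Order("ORD-3", "COMPLETED", "APAC", 120.0),
            new Order("ORD-4", "FAILED", "AMER", 1200.0),
            new Order("ORD-5", "PENDING", "AMER", 400.0),
            new Order("ORD-6", "COMPLETED", "EMEA", 760.0));
    }

    // groupingBy(classifier): each group is collected into a List by default.
    static Map<String, List<Order>> byStatus(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(Order::getStatus));
    }

    // groupingBy with a counting() downstream: number of orders per status.
    static Map<String, Long> countByStatus(List<Order> orders) {
        return orders.stream().collect(
            Collectors.groupingBy(Order::getStatus, TreeMap::new, Collectors.counting()));
    }

    // groupingBy with a summingDouble() downstream: total amount per status.
    static Map<String, Double> totalByStatus(List<Order> orders) {
        return orders.stream().collect(
            Collectors.groupingBy(Order::getStatus, TreeMap::new,
                Collectors.summingDouble(Order::getAmount)));
    }

    // groupingBy with an averagingDouble() downstream: average amount per region.
    static Map<String, Double> averageByRegion(List<Order> orders) {
        return orders.stream().collect(
            Collectors.groupingBy(Order::getRegion, TreeMap::new,
                Collectors.averagingDouble(Order::getAmount)));
    }

    public static void main(String[] args) {
        List<Order> orders = sampleOrders();
        byStatus(orders).forEach((status, group) ->
            System.out.println(status + ": " + group.size() + " orders"));
        System.out.println("Count by status: " + countByStatus(orders));
        System.out.println("Total by status: " + totalByStatus(orders));
        System.out.println("Average by region: " + averageByRegion(orders));
    }
}
```

With this hypothetical data, countByStatus returns {COMPLETED=3, FAILED=1, PENDING=2} and averageByRegion returns {AMER=800.0, APAC=185.0, EMEA=830.0}.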
The partitioning, statistics, joining, and lookup part of the example gives the following output:
High value (>$500): ORD-2, ORD-4, ORD-6
Normal value: ORD-1, ORD-3, ORD-5
Amount stats: count=6, sum=3630.00, min=120.00, max=1200.00, avg=605.00
Order IDs: [ORD-1, ORD-2, ORD-3, ORD-4, ORD-5, ORD-6]
Lookup ORD-3: Order[id=ORD-3, status=COMPLETED, region=APAC, amount=120.0]
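The calls that produce results of this shape can be sketched as below. Again this is a sketch rather than the original listing: the Order class is hypothetical and pared down to an ID and an amount, and the amounts are assumed values picked to be consistent with the statistics shown above (six orders totalling 3630.00).

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class PartitionSketch {

    // Hypothetical order type reduced to the two fields this sketch needs.
    static final class Order {
        private final String id;
        private final double amount;

        Order(String id, double amount) {
            this.id = id;
            this.amount = amount;
        }

        String getId() { return id; }
        double getAmount() { return amount; }
    }

    // Hypothetical sample data.
    static List<Order> sampleOrders() {
        return Arrays.asList(
            new Order("ORD-1", 250.0),
            new Order("ORD-2", 900.0),
            new Order("ORD-3", 120.0),
            new Order("ORD-4", 1200.0),
            new Order("ORD-5", 400.0),
            new Order("ORD-6", 760.0));
    }

    // partitioningBy always yields exactly two keys, true and false.
    // The downstream maps each order to its ID and joins the IDs into one string.
    static Map<Boolean, String> idsByHighValue(List<Order> orders) {
        return orders.stream().collect(
            Collectors.partitioningBy(
                o -> o.getAmount() > 500.0,
                Collectors.mapping(Order::getId, Collectors.joining(", "))));
    }

    // summarizingDouble gathers count, sum, min, max and average in one pass.
    static DoubleSummaryStatistics amountStats(List<Order> orders) {
        return orders.stream().collect(Collectors.summarizingDouble(Order::getAmount));
    }

    // joining(delimiter, prefix, suffix) builds a bracketed, comma-separated list.
    static String idList(List<Order> orders) {
        return orders.stream()
            .map(Order::getId)
            .collect(Collectors.joining(", ", "[", "]"));
    }

    // toMap builds an ID -> Order lookup; it throws IllegalStateException
    // on duplicate keys unless a merge function is supplied.
    static Map<String, Order> byId(List<Order> orders) {
        return orders.stream().collect(
            Collectors.toMap(Order::getId, Function.identity()));
    }

    public static void main(String[] args) {
        List<Order> orders = sampleOrders();
        Map<Boolean, String> parts = idsByHighValue(orders);
        System.out.println("High value (>$500): " + parts.get(true));
        System.out.println("Normal value: " + parts.get(false));
        System.out.println("Amount stats: " + amountStats(orders));
        System.out.println("Order IDs: " + idList(orders));
        System.out.println("Lookup ORD-3 amount: " + byId(orders).get("ORD-3").getAmount());
    }
}
```

Passing a downstream collector to partitioningBy works the same way as with groupingBy; without it, each partition is collected into a List.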
Key Collectors to memorise:
Collectors.toList(), toSet(), toUnmodifiableList() - basic collection. Note that toUnmodifiableList() was added in Java 10; on Java 8 use collectingAndThen(toList(), Collections::unmodifiableList).
Collectors.toMap(keyFn, valueFn) - build a Map. Add a merge function as the third argument to handle duplicate keys; without one, a duplicate key throws IllegalStateException.
Collectors.groupingBy(classifier) and groupingBy(classifier, downstream) - group by a key with optional secondary aggregation.
Collectors.partitioningBy(predicate) - split into exactly two groups, keyed true and false.
Collectors.joining(delimiter, prefix, suffix) - build strings.
Collectors.counting(), summingInt/Long/Double(), averagingInt/Long/Double(), summarizingInt/Long/Double() - numeric aggregations.
Collectors.collectingAndThen(downstream, finisher) - apply a final transformation to the collected result.
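Three of the collectors in this list are easy to misread from a one-line description, so here is a small sketch of toMap() with a merge function, collectingAndThen(), and the three-argument joining(). The class name, method names, and the list of region strings are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CollectorRecipes {

    // Hypothetical input: region names with repeats.
    static List<String> sample() {
        return Arrays.asList("APAC", "EMEA", "APAC", "AMER", "EMEA", "APAC");
    }

    // toMap with a merge function: every value starts at 1 and
    // Integer::sum combines the values of duplicate keys.
    static Map<String, Integer> occurrences(List<String> values) {
        return values.stream().collect(
            Collectors.toMap(Function.identity(), v -> 1, Integer::sum));
    }

    // collectingAndThen: collect into a List, then wrap it so callers
    // cannot modify it (the Java 8 route to an unmodifiable result).
    static List<String> distinctSorted(List<String> values) {
        return values.stream()
            .distinct()
            .sorted()
            .collect(Collectors.collectingAndThen(
                Collectors.toList(), Collections::unmodifiableList));
    }

    // joining with delimiter, prefix and suffix.
    static String bracketed(List<String> values) {
        return values.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> regions = sample();
        System.out.println("Occurrences: " + occurrences(regions));
        System.out.println("Distinct: " + bracketed(distinctSorted(regions)));
    }
}
```

Here occurrences(sample()) maps APAC to 3, EMEA to 2 and AMER to 1, and bracketed(distinctSorted(sample())) yields [AMER, APAC, EMEA].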