
[Golang] Go 1.25's New Garbage Collector: Green Tea 🍵 Garbage Collector

RoLingG Golang 2025-11-24


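Official change document: green tea garbage collector

Differences in High-Level Design Goals

Differences in Design Philosophy

Original description of the old GC: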

Go’s garbage collector implements a classic tri-color parallel marking algorithm. This is, at its core, just a graph flood, where heap objects are nodes in the graph, and pointers are edges. However, this graph flood affords no consideration to the memory location of the objects that are being processed. As a result, it exhibits extremely poor spatial locality—jumping between completely different parts of memory—poor temporal locality—blithely spreading repeated accesses to the same memory across the GC cycle—and no concern for topology.

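Analysis:

[Old GC] Treats the heap purely as a graph and marks it concurrently with a "flood fill", paying no attention to where the objects physically sit in memory.

Problems with the old GC: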

As a result, it exhibits extremely poor spatial locality—jumping between completely different parts of memory—poor temporal locality—blithely spreading repeated accesses to the same memory across the GC cycle—and no concern for topology.

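In plain terms: the marker jumps between unrelated regions of memory (poor spatial locality), revisits the same memory at scattered points throughout the GC cycle instead of all at once (poor temporal locality), and ignores memory topology altogether.

Original description of the new GC: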

Green Tea: a parallel marking algorithm that, if not memory-centric, is at least memory-aware, in that it endeavors to process objects close to one another together.

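Analysis:

[New GC] The primary goal becomes being "memory-aware": physically adjacent objects are processed together whenever possible, to improve cache and NUMA behavior.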


Differences in the Core Algorithm

Smallest Unit of Work

Original text:

The core idea behind the new parallel marking algorithm is simple. Instead of scanning individual objects, the garbage collector scans memory in much larger, contiguous blocks. The shared work queue tracks these coarse blocks instead of individual objects, and the individual objects waiting to be scanned in a block are tracked in that block itself.

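In other words: rather than scanning one object at a time, the collector scans larger contiguous blocks of memory. The shared work queue tracks these coarse blocks instead of individual objects, and each block itself keeps track of which of its objects are still waiting to be scanned.

Analysis:

[Old GC] The smallest unit of work is a single object.
[New GC] The smallest unit of work is a span (a contiguous 8 KiB block).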

Enqueue and Dequeue

Original text:

When scanning finds a pointer to a small object, it sets that object’s gray bit … If the gray bit was not already set and the object’s span is not already enqueued … it enqueues the span.
When the scan loop dequeues a span, it computes the difference between the gray bits and the black bits … scans any objects that had their gray bit set but not their black bit.

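Analysis:

[Old GC] Every pointer it finds causes the target object to be pushed onto the work stack right away.
[New GC] The whole span is enqueued only when a pointer lands on a small object whose gray bit was not yet set and the span is not already in the queue; on dequeue, all pending objects in the span are scanned in one batch.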
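A minimal sketch of the two rules quoted above, assuming a toy span with at most 64 objects so that a single uint64 can serve as each bitmap. The names span, workQueue, markPointer and drainOne are hypothetical and do not match the runtime's real structures. A span enters the queue only when a gray bit is newly set and the span is not already queued; on dequeue, the objects to scan are exactly gray &^ black. A real collector would scan each of those objects and call markPointer for the pointers it finds; the sketch just returns their indices.

```go
package main

import (
	"fmt"
	"math/bits"
)

// span is a toy small-object span with at most 64 objects, so one uint64
// can hold each bitmap. Hypothetical names; not the runtime's layout.
type span struct {
	gray     uint64 // bit i set: object i is known reachable
	black    uint64 // bit i set: object i has already been scanned
	enqueued bool   // span is currently sitting in the work queue
}

// workQueue holds spans, not objects (FIFO order).
type workQueue struct{ spans []*span }

// markPointer models finding a pointer to object idx in span s: set its
// gray bit, and enqueue the span only if the bit was newly set and the
// span is not already queued.
func (q *workQueue) markPointer(s *span, idx int) {
	bit := uint64(1) << idx
	if s.gray&bit != 0 {
		return // already gray: nothing to do
	}
	s.gray |= bit
	if !s.enqueued {
		s.enqueued = true
		q.spans = append(q.spans, s)
	}
}

// drainOne dequeues the oldest span and "scans" every object that is gray
// but not yet black (gray &^ black), returning their indices in order.
func (q *workQueue) drainOne() []int {
	s := q.spans[0]
	q.spans = q.spans[1:]
	s.enqueued = false
	toScan := s.gray &^ s.black
	s.black |= toScan // these objects are now considered scanned
	var scanned []int
	for p := toScan; p != 0; p &= p - 1 { // p &= p-1 clears the lowest set bit
		scanned = append(scanned, bits.TrailingZeros64(p))
	}
	return scanned
}

func main() {
	q := &workQueue{}
	a, b := &span{}, &span{}
	q.markPointer(a, 3)
	q.markPointer(b, 0)
	q.markPointer(a, 7) // a is already queued: no second entry
	q.markPointer(a, 3) // already gray: ignored
	fmt.Println(len(q.spans)) // 2
	fmt.Println(q.drainOne()) // [3 7]
	fmt.Println(q.drainOne()) // [0]
}
```

Because marks that arrive while a span is waiting only flip bits, one queue entry can end up covering many objects.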

The Large-Object Path

Original text:

Larger objects continue to use the old algorithm … The choice of which algorithm to use is made when scanning encounters a pointer.

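Analysis:

[Old GC] All objects are treated the same way.
[New GC] Large objects still go through the old algorithm while small objects go through the new one, forming a "hybrid path".

Here, "large" means larger than 32 KiB. More precisely:

  • ≤ 32 KiB → small object, placed in a span of 8 KiB, 16 KiB, …, managed by Green Tea's new algorithm
  • > 32 KiB → large object, allocated directly as whole pages, still marked the old GC's way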
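The routing choice is made per pointer, based on the size of the object it points to. A minimal sketch using the 32 KiB boundary stated in this article; the identifiers maxSmallSize, markPath and choosePath are made up for illustration and are not runtime names.

```go
package main

import "fmt"

// Threshold as stated in this article (assumption): objects up to 32 KiB
// take Green Tea's span path, larger ones keep the old per-object path.
const maxSmallSize = 32 << 10

type markPath int

const (
	spanPath   markPath = iota // Green Tea: set gray bit, enqueue the span
	objectPath                 // old GC: push the object itself
)

func (p markPath) String() string {
	if p == spanPath {
		return "span"
	}
	return "object"
}

// choosePath is a hypothetical helper: decided when scanning encounters a
// pointer, using the size of the object the pointer refers to.
func choosePath(objSize uintptr) markPath {
	if objSize <= maxSmallSize {
		return spanPath
	}
	return objectPath
}

func main() {
	for _, sz := range []uintptr{16, 4096, 32 << 10, 64 << 10} {
		fmt.Println(sz, choosePath(sz))
	}
}
```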

Key Implementation Optimizations

Guarding Against the Single-Object Case

Original text:

If a span has only a single object to scan … we track the object that was marked when the span was enqueued … if the hit flag is not set, then the garbage collector can directly scan the span’s representative.

Analysis:
[Old GC] Has no such concept.
[New GC] The "representative object + hit flag" pair ensures that sparse spans are not penalized.
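A minimal sketch of the representative/hit idea, assuming a toy span of at most 64 objects with uint64 bitmaps; all names are hypothetical. When a span is enqueued, the object that triggered it is remembered as the representative; any further mark while the span waits sets hit. If hit is still false at dequeue time, only the representative can be pending, so it is scanned directly and the bitmap walk is skipped.

```go
package main

import (
	"fmt"
	"math/bits"
)

// span is a toy small-object span (at most 64 objects) extended with the
// representative object and hit flag. Names are hypothetical.
type span struct {
	gray, black uint64 // per-object mark bitmaps
	enqueued    bool
	rep         int  // object whose marking caused the span to be enqueued
	hit         bool // another object was marked while the span was queued
}

// mark sets object idx gray and enqueues or "hits" the span as needed.
func mark(s *span, idx int, queue *[]*span) {
	bit := uint64(1) << idx
	if s.gray&bit != 0 {
		return // already gray
	}
	s.gray |= bit
	if !s.enqueued {
		s.enqueued, s.rep, s.hit = true, idx, false
		*queue = append(*queue, s)
	} else {
		s.hit = true // span already queued: it now has 2+ objects pending
	}
}

// scan handles a dequeued span. fast reports whether only the
// representative was scanned, skipping the gray &^ black bitmap walk.
func scan(s *span) (objs []int, fast bool) {
	s.enqueued = false
	if !s.hit {
		s.black |= uint64(1) << s.rep
		return []int{s.rep}, true
	}
	toScan := s.gray &^ s.black
	s.black |= toScan
	for p := toScan; p != 0; p &= p - 1 {
		objs = append(objs, bits.TrailingZeros64(p))
	}
	return objs, false
}

func main() {
	var q []*span
	sparse, dense := &span{}, &span{}
	mark(sparse, 5, &q)
	mark(dense, 1, &q)
	mark(dense, 2, &q)
	mark(dense, 9, &q)
	// For brevity, iterate over the queue instead of dequeuing.
	for _, s := range q {
		objs, fast := scan(s)
		fmt.Println(objs, fast)
	}
	// prints: [5] true
	//         [1 2 9] false
}
```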

Work Distribution

Original text:

Go’s current garbage collector … each scanner aggressively checks and populates global lists. This frequent mutation … is a significant source of contention …
The prototype implementation has a separate queue dedicated to spans and based on the distributed work-stealing runqueues … fewer items to queue … inherently lower contention.

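Analysis:
[Old GC] Uses a global object stack that many cores contend over frequently.
[New GC] Borrows the design of the goroutine scheduler's distributed work-stealing run queues, using the span as the coarse-grained unit, so contention naturally drops.

Previously, work was fought over one small object at a time; now each unit of work is a whole 8 KiB page of small objects.

It is like picking up a batch of tasks and working through them one by one, instead of grabbing a single task, finishing it, and then grabbing the next. Workers touch the shared queue far less often, which is where much of the speedup comes from.

Also, with or without Green Tea, Go's memory manager divides the heap into 8 KiB spans and keeps per-object mark bits inside each span. The new algorithm simply reuses these existing bitmaps and adds no per-object space, so peak memory usage does not grow, and cache pressure actually goes down.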

Locating Metadata by Address Arithmetic

Original text:

Since small object spans are always 8 KiB large and 8 KiB aligned … simple address arithmetic to find the object’s metadata within the span, thus avoiding indirections and dependent loads.

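Analysis:
[Old GC] Has to fetch mark bits indirectly, through an object header or a side table.
[New GC] Rounding the pointer down to its aligned span and shifting is enough to locate the metadata, removing one dependent load.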
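A sketch of the address arithmetic, assuming (as the quote states) that small-object spans are 8 KiB in size and 8 KiB aligned. spanBase and objIndex are hypothetical helpers, and the pointer value is made up. Rounding down to the span base is a single AND; the object index is one subtraction and one division (a shift when the object size is a power of two), with no memory load in between.

```go
package main

import "fmt"

// Assumption taken from the quoted text: small-object spans are 8 KiB in
// size and 8 KiB aligned. Function names are hypothetical.
const spanSize = 8 << 10 // 8192 bytes

// spanBase rounds p down to the start of its span. Because spans are
// 8 KiB aligned, clearing the low 13 bits is enough: no lookup needed.
func spanBase(p uintptr) uintptr { return p &^ (spanSize - 1) }

// objIndex returns which object slot p falls in, given the span's object
// size. Together with spanBase, this locates per-object metadata kept at a
// fixed place inside the span without chasing another pointer.
func objIndex(p, objSize uintptr) uintptr { return (p - spanBase(p)) / objSize }

func main() {
	p := uintptr(0x40004000 + 3*48 + 10) // interior pointer into object 3
	fmt.Printf("base=%#x index=%d\n", spanBase(p), objIndex(p, 48))
	// prints: base=0x40004000 index=3
}
```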

Queue Ordering Policy

Original text:

FIFO turned out to accumulate the highest average density of objects to scan on a span by the time it was dequeued.

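Analysis:
[Old GC] Has no concept of spans, so there is no such policy.
[New GC] Several orderings were compared explicitly; measurements showed that FIFO lets a span accumulate the most objects to scan while it waits in the queue.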
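To show what "density at dequeue time" measures, here is a toy replay of a hand-made mark trace under FIFO and LIFO ordering. Everything here (ev, avgDensity, the trace) is hypothetical. On this particular trace FIFO scores higher because span 1 keeps collecting marks while it waits behind span 0; a single hand-picked trace proves nothing in general, and the conclusion that FIFO wins comes from the source's own measurements.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ev is one step of a hypothetical trace: span >= 0 means "object obj in
// that span was just marked"; span < 0 means "the scanner dequeues a span".
type ev struct{ span, obj int }

// avgDensity replays trace with a FIFO or LIFO span queue and returns the
// average number of pending objects per dequeued span. Toy model: it keeps
// only pending bits, so each (span, obj) pair in the trace should be unique.
func avgDensity(trace []ev, nspans int, fifo bool) float64 {
	pending := make([]uint64, nspans)
	queued := make([]bool, nspans)
	var q []int
	dequeues, objs := 0, 0
	pop := func() {
		var s int
		if fifo {
			s, q = q[0], q[1:] // oldest span first
		} else {
			s, q = q[len(q)-1], q[:len(q)-1] // newest span first
		}
		queued[s] = false
		dequeues++
		objs += bits.OnesCount64(pending[s]) // density at dequeue time
		pending[s] = 0                       // the whole batch is scanned
	}
	for _, e := range trace {
		if e.span < 0 {
			if len(q) > 0 {
				pop()
			}
			continue
		}
		pending[e.span] |= uint64(1) << e.obj
		if !queued[e.span] {
			queued[e.span] = true
			q = append(q, e.span)
		}
	}
	for len(q) > 0 { // drain what is left at the end of marking
		pop()
	}
	if dequeues == 0 {
		return 0
	}
	return float64(objs) / float64(dequeues)
}

func main() {
	// Spans 0 and 1 each get one mark, one span is scanned, then span 1
	// receives two more marks before the next scan.
	trace := []ev{{0, 0}, {1, 0}, {-1, 0}, {1, 1}, {1, 2}, {-1, 0}}
	fmt.Printf("FIFO %.2f, LIFO %.2f\n",
		avgDensity(trace, 2, true), avgDensity(trace, 2, false))
	// prints: FIFO 2.00, LIFO 1.33
}
```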


Performance and Scalability

Microbenchmarks

Original text:

In select GC-heavy microbenchmarks … we observed anywhere from a 10–50% reduction in GC CPU costs … cache misses was reduced by half.

Analysis:
[Old GC] CPU stalled on memory accounts for 35%.
[New GC] In the same scenarios, GC CPU cost drops by 10–50% and cache misses drop by half.

Scaling With Core Count

Original text:

The improvement generally rose with core count, indicating that the prototype scales better than the existing implementation.

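Analysis:
[Old GC] The global stack becomes the bottleneck on many-core machines.
[New GC] The advantage generally grows with core count.


Old GC vs. New GC

Where Does the Old GC Actually Contend?

Original text: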

Go’s current garbage collector … each scanner aggressively checks and populates global lists. This frequent mutation of the global lists is a significant source of contention in Go programs on many-core systems.

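Breakdown:

  • The marking code runs on some P, yet it pushes to and pops from the same global data structure as every other P.
  • Updating that shared structure takes frequent CAS operations, which behaves like everyone contending for one "global lock"; as core counts double, CAS failures and cache-line ping-pong rise sharply → scalability hits a wall.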
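A toy illustration of why a single shared head word scales badly: a push-only lock-free (Treiber) stack where every push is a CAS on one atomic pointer, with failed CAS attempts counted. It is not the runtime's actual work-list code; globalStack, pushAll and the worker counts are hypothetical, and atomic.Pointer requires Go 1.19 or later. With one worker no CAS ever fails; with more workers the failure count typically grows, though the exact numbers depend on the machine and vary from run to run.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type node struct {
	val  int
	next *node
}

// globalStack: every push is a CAS on the single shared head pointer.
type globalStack struct {
	head    atomic.Pointer[node]
	retries atomic.Int64 // failed CAS attempts, i.e. lost races on head
}

// push only ever pushes (no pops), so the classic ABA problem cannot arise.
func (s *globalStack) push(v int) {
	n := &node{val: v}
	for {
		old := s.head.Load()
		n.next = old
		if s.head.CompareAndSwap(old, n) {
			return
		}
		s.retries.Add(1) // another goroutine moved head first; retry
	}
}

func (s *globalStack) size() int {
	c := 0
	for n := s.head.Load(); n != nil; n = n.next {
		c++
	}
	return c
}

// pushAll has `workers` goroutines each push `per` items concurrently.
func pushAll(workers, per int) *globalStack {
	s := &globalStack{}
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < per; i++ {
				s.push(w*per + i)
			}
		}(w)
	}
	wg.Wait()
	return s
}

func main() {
	for _, w := range []int{1, 4, 16} {
		s := pushAll(w, 10000)
		fmt.Printf("workers=%d items=%d failedCAS=%d\n", w, s.size(), s.retries.Load())
	}
}
```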

What Exactly Did Green Tea Change?

Original text:

The prototype implementation has a separate queue dedicated to spans and based on the distributed work-stealing runqueues … fewer items to queue … inherently lower contention.

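Implementation details:

  • Green Tea adds a per-P span queue dedicated to GC marking work, physically separate from the scheduler's run queues. (The runtime already had per-P queues before, but those only schedule goroutines.)
  • The unit of work changes from "a single object" to "an 8 KiB span", so the number of queued items drops right away to somewhere between 1/16 and 1/several dozen of the original.
  • A mark worker first takes from its own P's local queue and only steals from other Ps once that queue is empty, so shared global state is barely touched.
    → Contention changes from "everyone fighting over one lock" to "everyone working on their own, with occasional stealing".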
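A minimal sketch of per-P span queues with work stealing. spanQueue, worker and run are hypothetical; each queue uses a mutex for brevity, whereas the scheduler's real run queues are lock-free. The owner pops from the head, thieves take from the tail, and a worker only looks at other Ps once its own queue is empty. Scanning itself is left out and no new spans are produced, which keeps termination trivial; a real collector needs proper termination detection.

```go
package main

import (
	"fmt"
	"sync"
)

// spanQueue is a hypothetical per-P queue of span IDs. A mutex per queue
// keeps the sketch short; this is not the runtime's lock-free code.
type spanQueue struct {
	mu    sync.Mutex
	spans []int
}

// popLocal is used by the owning P and takes from the head.
func (q *spanQueue) popLocal() (int, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.spans) == 0 {
		return 0, false
	}
	s := q.spans[0]
	q.spans = q.spans[1:]
	return s, true
}

// stealOne is used by other Ps and takes from the tail, away from the owner.
func (q *spanQueue) stealOne() (int, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	n := len(q.spans)
	if n == 0 {
		return 0, false
	}
	s := q.spans[n-1]
	q.spans = q.spans[:n-1]
	return s, true
}

// worker drains its own queue first and steals only when it runs empty.
// No new spans are produced, so it stops once every queue is empty.
func worker(id int, qs []*spanQueue, local, stolen *int) {
	for {
		if _, ok := qs[id].popLocal(); ok {
			*local++
			continue
		}
		found := false
		for off := 1; off < len(qs) && !found; off++ {
			_, found = qs[(id+off)%len(qs)].stealOne()
		}
		if !found {
			return
		}
		*stolen++
	}
}

// run seeds P p with load[p] spans and returns per-P local/stolen counts.
func run(load []int) (local, stolen []int) {
	qs := make([]*spanQueue, len(load))
	next := 0
	for p := range qs {
		qs[p] = &spanQueue{}
		for i := 0; i < load[p]; i++ {
			qs[p].spans = append(qs[p].spans, next)
			next++
		}
	}
	local, stolen = make([]int, len(load)), make([]int, len(load))
	var wg sync.WaitGroup
	for p := range qs {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			worker(p, qs, &local[p], &stolen[p])
		}(p)
	}
	wg.Wait()
	return local, stolen
}

func main() {
	local, stolen := run([]int{1000, 0, 0, 10})
	fmt.Println("local:", local, "stolen:", stolen)
}
```

The per-P split varies between runs, but every span is processed exactly once, and idle Ps only touch another P's queue when they have nothing of their own.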

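This shows that Green Tea's work-unit design deliberately follows the GMP scheduler's approach: queues control how much work each worker holds, and only when a worker runs short does it steal from others or fall back to the shared area.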


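Summary

At its core, the new Green Tea GC builds on the existing tri-color marking algorithm and adds a hybrid path split by object size (with 32 KiB as the boundary between small and large objects). Previously, a large number of small objects drove up the GC's resource usage and hurt its efficiency, so the small-object case is now split off and handled by the new algorithm, while large objects keep using the original one.

Green Tea takes an 8 KiB span of small objects, rather than a single object, as its basic unit. It is all about batching: work is taken and processed in bulk, which reduces the old contention on global structures. It also prefers to process physically adjacent objects together, so memory is touched in dense runs instead of scattered jumps.

Finally, because each P now has its own span queue with the 8 KiB span as the unit, the rule for which span to dequeue first when several are waiting matters. Testing showed FIFO works best; this is a choice the old GC never had to make.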
