Compile Time Caching in `torch.compile`

Created On: Apr 01, 2025 | Last Updated: Apr 01, 2025 | Last Verified: Nov 05, 2024

Author: Oğuz Ulgen

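Introduction

The PyTorch compiler provides several caching offerings to reduce compilation latency. This recipe explains each of them in detail to help users pick the best option for their use case.

See Compile Time Caching Configurations for how to configure these caches.

Also see our caching benchmark, PT CacheBench Benchmarks.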

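Prerequisites

Before starting this recipe, make sure that you have the following:

A basic understanding of torch.compile (see the torch.compile documentation).
PyTorch 2.4 or later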

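Caching Offerings

torch.compile provides the following caching offerings:

End-to-end caching (also known as Mega-Cache)
Modular caching of TorchDynamo, TorchInductor, and Triton

Note that caching validates that cache artifacts are used with the same PyTorch and Triton versions that produced them, and with the same GPU when the device is set to cuda.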
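Because artifacts are only valid for a matching PyTorch version, Triton version, and GPU, it can help to bake those three values into whatever key you store the artifacts under. The following is a minimal user-side sketch of that idea, not a description of how PyTorch validates caches internally. The function name `artifact_key`, the model name, and the version strings are all hypothetical placeholders.

```python
import hashlib


def artifact_key(model_name, torch_version, triton_version, device_name):
    """Build a storage key that changes whenever the environment changes."""
    # Combine the fields that the cache validation depends on.
    fingerprint = "|".join([torch_version, triton_version, device_name])
    # A short hash keeps the key compact while still separating environments.
    digest = hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()[:16]
    return f"{model_name}-{digest}"


# Placeholder values for illustration only.
key_a = artifact_key("my_model", "2.6.0", "3.2.0", "NVIDIA A100")
key_b = artifact_key("my_model", "2.6.0", "3.2.0", "NVIDIA H100")
print(key_a != key_b)  # different GPUs map to different keys
```

In a real setup the arguments would likely come from `torch.__version__`, `triton.__version__`, and `torch.cuda.get_device_name()` rather than hard-coded strings.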

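End-to-end caching in `torch.compile` (`Mega-Cache`)

End-to-end caching, referred to as Mega-Cache from here on, is the ideal solution for users who want a portable cache that can be stored in a database and fetched later, possibly on a different machine.

Mega-Cache provides two compiler APIs: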

torch.compiler.save_cache_artifacts()
torch.compiler.load_cache_artifacts()

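The intended use case is as follows. After compiling and executing a model, the user calls torch.compiler.save_cache_artifacts(), which returns the compiler artifacts in a portable form. Later, potentially on a different machine, the user calls torch.compiler.load_cache_artifacts() with these artifacts to pre-populate the torch.compile caches and so jump-start compilation.

Consider the following example. First, compile the function and save the cache artifacts.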

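import torch

# Assumed setup: the original snippet left `device` and `dtype` undefined.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float32

@torch.compile
def fn(x, y):
    return x.sin() @ y

a = torch.rand(100, 100, dtype=dtype, device=device)
b = torch.rand(100, 100, dtype=dtype, device=device)

# The first call triggers compilation and fills the local caches
result = fn(a, b)

artifacts = torch.compiler.save_cache_artifacts()

# save_cache_artifacts() may return None, so check before unpacking
assert artifacts is not None
artifact_bytes, cache_info = artifacts

# Now, potentially store artifact_bytes in a database
# You can use cache_info for logging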

Later, you can jump-start the cache as follows:

# Potentially download/fetch the artifacts from the database
torch.compiler.load_cache_artifacts(artifact_bytes)

This operation populates all of the modular caches discussed in the next section, including PGO, AOTAutograd, Inductor, Triton, and Autotuning.
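The "store in a database" step is left open above. As one possible illustration, the sketch below keeps the artifact bytes in a SQLite table using only the Python standard library. The table name `compile_artifacts`, the helper names, and the key `"my-model"` are hypothetical, and a placeholder byte string stands in for the real `artifact_bytes` so the snippet runs without a compiled model.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small key/blob store for compiler artifacts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS compile_artifacts ("
        "key TEXT PRIMARY KEY, payload BLOB NOT NULL)"
    )
    return conn


def put_artifacts(conn, key, artifact_bytes):
    """Insert or overwrite the artifacts stored under `key`."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO compile_artifacts (key, payload) VALUES (?, ?)",
            (key, artifact_bytes),
        )


def get_artifacts(conn, key):
    """Return the stored bytes for `key`, or None if nothing is stored."""
    row = conn.execute(
        "SELECT payload FROM compile_artifacts WHERE key = ?", (key,)
    ).fetchone()
    return None if row is None else bytes(row[0])


conn = open_store()
placeholder = b"\x00stand-in for artifact_bytes"
put_artifacts(conn, "my-model", placeholder)
fetched = get_artifacts(conn, "my-model")
print(fetched == placeholder)
```

On the consuming machine, the bytes returned by `get_artifacts` would be what gets passed to `torch.compiler.load_cache_artifacts()`. A production setup would more likely use a shared database or object store than a local SQLite file.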

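Modular caching of `TorchDynamo`, `TorchInductor`, and `Triton`

The aforementioned Mega-Cache is composed of individual components that can be used without any user intervention. By default, the PyTorch compiler comes with local on-disk caches for TorchDynamo, TorchInductor, and Triton. These caches include:

FXGraphCache: A cache of the graph-based IR components used in compilation.
TritonCache: A cache of Triton compilation results, including the cubin files generated by Triton and other caching artifacts.
InductorCache: A bundle of the FXGraphCache and Triton caches.
AOTAutogradCache: A cache of joint graph artifacts.
PGO-cache: A cache of dynamic shape decisions, used to reduce the number of recompilations.

All of these cache artifacts are written to TORCHINDUCTOR_CACHE_DIR, which by default looks like /tmp/torchinductor_myusername.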
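To make the directory rule concrete, here is a small sketch that mirrors the pattern described above: use `TORCHINDUCTOR_CACHE_DIR` when it is set, otherwise fall back to a per-user folder under the system temp directory. The helper `resolve_cache_dir` is hypothetical and written for illustration; PyTorch's own resolution logic may differ in details such as username sanitization.

```python
import getpass
import os
import tempfile


def resolve_cache_dir(env=None, username=None):
    """Guess where the on-disk compile caches would be written."""
    env = os.environ if env is None else env
    # An explicit override takes priority over the default location.
    override = env.get("TORCHINDUCTOR_CACHE_DIR")
    if override:
        return override
    user = username if username is not None else getpass.getuser()
    return os.path.join(tempfile.gettempdir(), f"torchinductor_{user}")


# With no override, the path follows the temp-dir + username pattern.
default_dir = resolve_cache_dir(env={}, username="myusername")
# With the variable set, that value is used as-is ("/data/compile_cache" is a made-up path).
custom_dir = resolve_cache_dir(env={"TORCHINDUCTOR_CACHE_DIR": "/data/compile_cache"})
print(default_dir)
print(custom_dir)
```

Pointing the variable at a persistent location can be useful when the temp directory is wiped between runs, for example in containers.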

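Remote Caching

We also provide a remote caching option for users who would like to take advantage of a Redis-based cache. See Compile Time Caching Configurations to learn how to enable Redis-based caching.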
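As a rough sketch of what enabling the Redis-based cache might look like from Python, the snippet below sets environment variables before any compilation happens. The variable names are the ones I understand the Compile Time Caching Configurations recipe to describe; treat them as assumptions and confirm them against that recipe for your PyTorch version. The helper `remote_cache_env` and the host and port values are hypothetical.

```python
import os


def remote_cache_env(host, port):
    """Collect the environment settings for a Redis-backed remote cache."""
    return {
        # Assumed switch for the remote FX graph cache.
        "TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE": "1",
        # Assumed location of the Redis server.
        "TORCHINDUCTOR_REDIS_HOST": host,
        "TORCHINDUCTOR_REDIS_PORT": str(port),
    }


settings = remote_cache_env("localhost", 6379)
# Apply the settings before torch is imported so they are seen at startup.
os.environ.update(settings)
print(sorted(settings))
```

Setting the same variables in the shell or in a job launcher would have the same effect and avoids any dependence on import order.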

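Conclusion

In this recipe, we have learned that PyTorch Inductor's caching mechanisms significantly reduce compilation latency by using both local and remote caches, which operate in the background without requiring user intervention.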
