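torch.fx

Overview

FX is a toolkit for developers to transform nn.Module instances. FX consists of three main components: a symbolic tracer, an intermediate representation, and Python code generation. Here is a demonstration of these components in action: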

import torch


# Simple module for demonstration
class MyModule(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.param = torch.nn.Parameter(torch.rand(3, 4))
        self.linear = torch.nn.Linear(4, 5)

    def forward(self, x):
        return self.linear(x + self.param).clamp(min=0.0, max=1.0)


module = MyModule()

from torch.fx import symbolic_trace

# Symbolic tracing frontend - captures the semantics of the module
symbolic_traced: torch.fx.GraphModule = symbolic_trace(module)

# High-level intermediate representation (IR) - Graph representation
print(symbolic_traced.graph)
"""
graph():
    %x : [num_users=1] = placeholder[target=x]
    %param : [num_users=1] = get_attr[target=param]
    %add : [num_users=1] = call_function[target=operator.add](args = (%x, %param), kwargs = {})
    %linear : [num_users=1] = call_module[target=linear](args = (%add,), kwargs = {})
    %clamp : [num_users=1] = call_method[target=clamp](args = (%linear,), kwargs = {min: 0.0, max: 1.0})
    return clamp
"""

# Code generation - valid Python code
print(symbolic_traced.code)
"""
def forward(self, x):
    param = self.param
    add = x + param;  x = param = None
    linear = self.linear(add);  add = None
    clamp = linear.clamp(min = 0.0, max = 1.0);  linear = None
    return clamp
"""

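The symbolic tracer performs "symbolic execution" of the Python code. It feeds fake values, called Proxies, through the code, and the operations performed on these Proxies are recorded. More information about symbolic tracing can be found in the symbolic_trace() and Tracer documentation.

The intermediate representation is the container for the operations that were recorded during symbolic tracing. It consists of a list of Nodes that represent function inputs, callsites (to functions, methods, or torch.nn.Module instances), and return values. More information about the IR can be found in the documentation for Graph. The IR is the format on which transformations are applied.

Python code generation is what makes FX a Python-to-Python (or Module-to-Module) transformation toolkit. For each Graph IR, we can create valid Python code matching the Graph's semantics. This functionality is wrapped up in GraphModule, which is a torch.nn.Module instance that holds a Graph as well as a forward method generated from the Graph.

Taken together, this pipeline of components (symbolic tracing -> intermediate representation -> transforms -> Python code generation) constitutes the Python-to-Python transformation pipeline of FX. In addition, these components can be used separately. For example, symbolic tracing can be used in isolation to capture a form of the code for analysis (and not transformation) purposes. Code generation can be used for programmatically generating models, for example from a config file. There are many uses for FX!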
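As an illustration of using symbolic tracing purely for analysis, the sketch below traces a small module and counts how many nodes of each opcode the Graph contains. TinyNet and op_counts are hypothetical names chosen for this example; nothing is transformed.

```python
import collections

import torch
import torch.fx


class TinyNet(torch.nn.Module):  # hypothetical demo module
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(4, 5)

    def forward(self, x):
        return torch.relu(self.linear(x)).sum(dim=-1)


traced = torch.fx.symbolic_trace(TinyNet())

# Analysis only: tally the opcode of every node in the traced Graph.
op_counts = collections.Counter(node.op for node in traced.graph.nodes)
print(dict(op_counts))
```

The same loop over traced.graph.nodes is the starting point for most analyses, such as listing which submodules or functions a model calls.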

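Several example transformations can be found at the examples repository.

Writing Transformations

What is an FX transform? Essentially, it's a function that looks like this.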

import torch
import torch.fx

def transform(m: torch.nn.Module,
              tracer_class : type = torch.fx.Tracer) -> torch.nn.Module:
    # Step 1: Acquire a Graph representing the code in `m`

    # NOTE: torch.fx.symbolic_trace is a wrapper around a call to
    # fx.Tracer.trace and constructing a GraphModule. We'll
    # split that out in our transform to allow the caller to
    # customize tracing behavior.
    graph : torch.fx.Graph = tracer_class().trace(m)

    # Step 2: Modify this Graph or create a new one
    graph = ...

    # Step 3: Construct a Module to return
    return torch.fx.GraphModule(m, graph)

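Your transform will take in a torch.nn.Module, acquire a Graph from it, do some modifications, and return a new torch.nn.Module. You should think of the torch.nn.Module that your FX transform returns as identical to a regular torch.nn.Module: you can pass it to another FX transform, you can pass it to TorchScript, or you can run it. Ensuring that the inputs and outputs of your FX transform are a torch.nn.Module will allow for composability.

Note

It is also possible to modify an existing GraphModule instead of creating a new one, like so: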

import torch
import torch.fx

def transform(m : torch.nn.Module) -> torch.nn.Module:
    gm : torch.fx.GraphModule = torch.fx.symbolic_trace(m)

    # Modify gm.graph
    # <...>

    # Recompile the forward() method of `gm` from its Graph
    gm.recompile()

    return gm

Note that you MUST call GraphModule.recompile() to bring the generated forward() method on the GraphModule in sync with the modified Graph.
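A minimal sketch of why the recompile() call matters: the generated forward() keeps running the old code until it is regenerated. AddModule, stale, and fresh are hypothetical names used only for this illustration.

```python
import torch
import torch.fx


class AddModule(torch.nn.Module):  # hypothetical demo module
    def forward(self, x, y):
        return torch.add(x, y)


gm = torch.fx.symbolic_trace(AddModule())

# Edit the Graph in place: swap torch.add for torch.mul.
for node in gm.graph.nodes:
    if node.op == "call_function" and node.target == torch.add:
        node.target = torch.mul

x, y = torch.tensor(2.0), torch.tensor(3.0)
stale = gm(x, y)   # forward() was generated before the edit, so this still adds
gm.recompile()     # regenerate forward() from the modified Graph
fresh = gm(x, y)   # now multiplies
print(stale.item(), fresh.item())
```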

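Given that you've passed in a torch.nn.Module that has been traced into a Graph, there are now two primary approaches you can take to building a new Graph.

A Quick Primer on Graphs

Full treatment of the semantics of graphs can be found in the Graph documentation, but we are going to cover the basics here. A Graph is a data structure that represents a method on a GraphModule. The information that this requires is:

- What are the inputs to the method?
- What are the operations that run inside the method?
- What is the output (i.e. return) value from the method?

All three of these concepts are represented with Node instances. Let's see what we mean by that with a short example: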

import torch
import torch.fx

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.param = torch.nn.Parameter(torch.rand(3, 4))
        self.linear = torch.nn.Linear(4, 5)

    def forward(self, x):
        return torch.topk(torch.sum(
            self.linear(x + self.linear.weight).relu(), dim=-1), 3)

m = MyModule()
gm = torch.fx.symbolic_trace(m)

gm.graph.print_tabular()

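Here we define a module MyModule for demonstration purposes, instantiate it, symbolically trace it, then call the Graph.print_tabular() method to print out a table showing the nodes of this Graph:

opcode         name           target                    args                kwargs
-------------  -------------  ------------------------  ------------------  -----------
placeholder    x              x                         ()                  {}
get_attr       linear_weight  linear.weight             ()                  {}
call_function  add_1          <built-in function add>   (x, linear_weight)  {}
call_module    linear_1       linear                    (add_1,)            {}
call_method    relu_1         relu                      (linear_1,)         {}
call_function  sum_1          <built-in method sum ...>   (relu_1,)           {'dim': -1}
call_function  topk_1         <built-in method topk ...>  (sum_1, 3)          {}
output         output         output                    (topk_1,)           {}

We can use this information to answer the questions we posed above.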

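- What are the inputs to the method? In FX, method inputs are specified via special placeholder nodes. In this case, we have a single placeholder node with a target of x, meaning we have a single (non-self) argument named x.
- What are the operations within the method? The get_attr, call_function, call_module, and call_method nodes represent the operations in the method. A full treatment of the semantics of all of these can be found in the Node documentation.
- What is the return (output) value of the method? The return value in a Graph is specified by a special output node.

Given that we now know the basics of how code is represented in FX, we can now explore how we would edit a Graph.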
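To make the three node roles concrete, here is a small sketch that builds a Graph by hand instead of tracing one: a placeholder for the input, two operation nodes, and an output node. The variable names are illustrative, and the empty torch.nn.Module() root is an assumption that works here only because the Graph references no parameters or submodules.

```python
import torch
import torch.fx

graph = torch.fx.Graph()

# Input: one placeholder node named "x".
x = graph.placeholder("x")
# Operations: a free-function call followed by a method call.
relu = graph.call_function(torch.relu, args=(x,))
total = graph.call_method("sum", args=(relu,), kwargs={"dim": -1})
# Return value: the special output node.
graph.output(total)

graph.lint()  # sanity-check that the Graph is well-formed

gm = torch.fx.GraphModule(torch.nn.Module(), graph)
ops = [node.op for node in gm.graph.nodes]
print(ops)
print(gm.code)
```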

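Graph Manipulation

Direct Graph Manipulation

One approach to building this new Graph is to directly manipulate your old one. To aid in this, we can simply take the Graph we obtain from symbolic tracing and modify it. For example, let's say we desire to replace torch.add() calls with torch.mul() calls.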

import torch
from torch import fx  # the code below refers to the package as `fx`

# Sample module
class M(torch.nn.Module):
    def forward(self, x, y):
        return torch.add(x, y)

def transform(m: torch.nn.Module,
              tracer_class : type = fx.Tracer) -> torch.nn.Module:
    graph : fx.Graph = tracer_class().trace(m)
    # FX represents its Graph as an ordered list of
    # nodes, so we can iterate through them.
    for node in graph.nodes:
        # Checks if we're calling a function (i.e:
        # torch.add)
        if node.op == 'call_function':
            # The target attribute is the function
            # that call_function calls.
            if node.target == torch.add:
                node.target = torch.mul

    graph.lint() # Does some checks to make sure the
                 # Graph is well-formed.

    return fx.GraphModule(m, graph)

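We can also do more involved Graph rewrites, such as deleting or appending nodes. To aid in these transformations, FX has utility functions for transforming the graph that can be found in the Graph documentation. An example of using these APIs to append a torch.relu() call can be found below.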

# Specifies the insertion point. Any nodes added to the
# Graph within this scope will be inserted after `node`
with traced.graph.inserting_after(node):
    # Insert a new `call_function` node calling `torch.relu`
    new_node = traced.graph.call_function(
        torch.relu, args=(node,))

    # We want all places that used the value of `node` to
    # now use that value after the `relu` call we've added.
    # We use the `replace_all_uses_with` API to do this.
    node.replace_all_uses_with(new_node)
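The fragment above assumes that traced and node already exist. A self-contained sketch of the same idea might look like the following; AddModule is a hypothetical module, and the extra step of resetting new_node.args is included because replace_all_uses_with also rewrites the use of node inside the freshly inserted relu node.

```python
import torch
import torch.fx


class AddModule(torch.nn.Module):  # hypothetical demo module
    def forward(self, x, y):
        return torch.add(x, y)


traced = torch.fx.symbolic_trace(AddModule())

# Iterate over a copy, since we insert nodes while looping.
for node in list(traced.graph.nodes):
    if node.op == "call_function" and node.target == torch.add:
        with traced.graph.inserting_after(node):
            new_node = traced.graph.call_function(torch.relu, args=(node,))
        # Redirect every user of `node` to the new relu node ...
        node.replace_all_uses_with(new_node)
        # ... then point the relu's own argument back at `node`,
        # because the call above rewrote that use as well.
        new_node.args = (node,)

traced.graph.lint()
traced.recompile()
print(traced.code)
```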

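For simple transformations that only consist of substitutions, you can also make use of the subgraph rewriter.

Subgraph Rewriting With replace_pattern()

FX also provides another level of automation on top of direct graph manipulation. The replace_pattern() API is essentially a "find/replace" tool for editing Graphs. It allows you to specify a pattern function and a replacement function; it then traces through those functions, finds instances of the group of operations in the pattern graph, and replaces those instances with copies of the replacement graph. This can help to greatly automate tedious graph manipulation code, which can get unwieldy as the transformations get more complex.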
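As a rough illustration of the find/replace idea, the sketch below rewrites every torch.add call in a traced module into a torch.mul call. AddThenNeg, pattern, and replacement are hypothetical names for this example.

```python
import torch
import torch.fx
from torch.fx import subgraph_rewriter


class AddThenNeg(torch.nn.Module):  # hypothetical demo module
    def forward(self, x, y):
        return torch.neg(torch.add(x, y))


def pattern(a, b):
    # The group of operations to search for.
    return torch.add(a, b)


def replacement(a, b):
    # What each match is replaced with; same parameters as `pattern`.
    return torch.mul(a, b)


gm = torch.fx.symbolic_trace(AddThenNeg())
matches = subgraph_rewriter.replace_pattern(gm, pattern, replacement)
gm.recompile()  # harmless if the rewriter has already regenerated the code
print(len(matches))
print(gm.code)
```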

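Graph Manipulation Examples

Proxy/Retracing

Another way of manipulating Graphs is by reusing the Proxy machinery used in symbolic tracing. For example, let's imagine that we wanted to write a transformation that decomposed PyTorch functions into smaller operations. It would transform every F.relu(x) call into (x > 0) * x. One possibility would be to perform the requisite graph rewriting to insert the comparison and multiplication after the F.relu, and then clean up the original F.relu. However, we can automate this process by using Proxy objects to automatically record operations into the Graph.

To use this method, we write the operations that we want inserted as regular PyTorch code and invoke that code with Proxy objects as arguments. These Proxy objects will capture the operations that are performed on them and append them to the Graph.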

import torch
import torch.nn.functional as F
from torch import fx

# Note that this decomposition rule can be read as regular Python
def relu_decomposition(x):
    return (x > 0) * x

decomposition_rules = {}
decomposition_rules[F.relu] = relu_decomposition

def decompose(model: torch.nn.Module,
              tracer_class : type = fx.Tracer) -> torch.nn.Module:
    """
    Decompose `model` into smaller constituent operations.
    Currently, this only supports decomposing ReLU into its
    mathematical definition: (x > 0) * x
    """
    graph : fx.Graph = tracer_class().trace(model)
    new_graph = fx.Graph()
    env = {}
    tracer = torch.fx.proxy.GraphAppendingTracer(new_graph)
    for node in graph.nodes:
        if node.op == 'call_function' and node.target in decomposition_rules:
            # By wrapping the arguments with proxies,
            # we can dispatch to the appropriate
            # decomposition rule and implicitly add it
            # to the Graph by symbolically tracing it.
            proxy_args = [
                fx.Proxy(env[x.name], tracer) if isinstance(x, fx.Node) else x for x in node.args]
            output_proxy = decomposition_rules[node.target](*proxy_args)

            # Operations on `Proxy` always yield new `Proxy`s, and the
            # return value of our decomposition rule is no exception.
            # We need to extract the underlying `Node` from the `Proxy`
            # to use it in subsequent iterations of this transform.
            new_node = output_proxy.node
            env[node.name] = new_node
        else:
            # Default case: we don't have a decomposition rule for this
            # node, so just copy the node over into the new graph.
            new_node = new_graph.node_copy(node, lambda x: env[x.name])
            env[node.name] = new_node
    return fx.GraphModule(model, new_graph)

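In addition to avoiding explicit graph manipulation, using Proxys also allows you to specify your rewrite rules as native Python code. For transformations that require a large number of rewrite rules (such as vmap or grad), this can often improve the readability and maintainability of the rules. Note that while calling Proxy we also passed a tracer pointing to the underlying graph. This is done so that, if the operations in the graph are n-ary (e.g. add is a binary operator), the call to Proxy does not create multiple instances of a graph tracer, which can lead to unexpected runtime errors. We recommend this method of using Proxy especially when the underlying operators cannot be safely assumed to be unary.

A worked example of using Proxys for Graph manipulation can be found here.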
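The mechanism can also be seen in isolation. In this minimal sketch, a single Node is wrapped in a Proxy that shares one GraphAppendingTracer, and ordinary Python operators on the Proxy append nodes to the Graph. The variable names are illustrative, and the empty torch.nn.Module() root is an assumption that suffices because the Graph references no attributes.

```python
import torch
from torch import fx

graph = fx.Graph()
x = graph.placeholder("x")

# One tracer bound to `graph`; every Proxy created from it appends here.
tracer = fx.proxy.GraphAppendingTracer(graph)
px = fx.Proxy(x, tracer)

# Plain Python on the Proxy records a comparison node and a multiply node.
out = (px > 0) * px

# Unwrap the resulting Proxy to get the Node that holds the final value.
graph.output(out.node)
graph.lint()

gm = fx.GraphModule(torch.nn.Module(), graph)
print(gm.code)
```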

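The Interpreter Pattern

A useful code organizational pattern in FX is to loop over all the Nodes in a Graph and execute them. This can be used for several things, including runtime analysis of values flowing through the graph, or transformation of code via retracing with Proxys. For example, suppose we want to run a GraphModule and record the torch.Tensor shape and dtype properties on the nodes as we see them at runtime. That might look like: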

import torch
import torch.fx
from torch.fx.node import Node

from typing import Dict

class ShapeProp:
    """
    Shape propagation. This class takes a `GraphModule`.
    Then, its `propagate` method executes the `GraphModule`
    node-by-node with the given arguments. As each operation
    executes, the ShapeProp class stores away the shape and
    element type for the output values of each operation on
    the `shape` and `dtype` attributes of the operation's
    `Node`.
    """
    def __init__(self, mod):
        self.mod = mod
        self.graph = mod.graph
        self.modules = dict(self.mod.named_modules())

    def propagate(self, *args):
        args_iter = iter(args)
        env : Dict[str, Node] = {}

        def load_arg(a):
            return torch.fx.graph.map_arg(a, lambda n: env[n.name])

        def fetch_attr(target : str):
            target_atoms = target.split('.')
            attr_itr = self.mod
            for i, atom in enumerate(target_atoms):
                if not hasattr(attr_itr, atom):
                    raise RuntimeError(f"Node referenced nonexistent target {'.'.join(target_atoms[:i])}")
                attr_itr = getattr(attr_itr, atom)
            return attr_itr

        for node in self.graph.nodes:
            if node.op == 'placeholder':
                result = next(args_iter)
            elif node.op == 'get_attr':
                result = fetch_attr(node.target)
            elif node.op == 'call_function':
                result = node.target(*load_arg(node.args), **load_arg(node.kwargs))
            elif node.op == 'call_method':
                self_obj, *args = load_arg(node.args)
                kwargs = load_arg(node.kwargs)
                result = getattr(self_obj, node.target)(*args, **kwargs)
            elif node.op == 'call_module':
                result = self.modules[node.target](*load_arg(node.args), **load_arg(node.kwargs))

            # This is the only code specific to shape propagation.
            # you can delete this `if` branch and this becomes
            # a generic GraphModule interpreter.
            if isinstance(result, torch.Tensor):
                node.shape = result.shape
                node.dtype = result.dtype

            env[node.name] = result

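        # The `output` node's first argument holds the value(s) returned by the Graph.
        output_node = next(n for n in reversed(self.graph.nodes) if n.op == 'output')
        return load_arg(output_node.args[0])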

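As you can see, a full interpreter for FX is not that complicated, but it can be very useful. To ease using this pattern, we provide the Interpreter class, which encompasses the above logic in a way that lets certain aspects of the interpreter's execution be overridden via method overrides.

In addition to executing operations, we can also generate a new Graph by feeding Proxy values through an interpreter. Similarly, we provide the Transformer class to encompass this pattern. Transformer behaves similarly to Interpreter, but instead of calling the run method to get a concrete output value from the Module, you would call the Transformer.transform() method to return a new GraphModule which was subject to any transformation rules you installed as overridden methods.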
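As a sketch of the method-override style, the shape recording shown earlier can be expressed as a small Interpreter subclass that overrides run_node. ShapeRecorder and TinyNet are hypothetical names; storing shapes in a dictionary (rather than on the Node) is a choice made for this example.

```python
import torch
import torch.fx


class ShapeRecorder(torch.fx.Interpreter):
    """Runs a GraphModule node by node and remembers each tensor's shape."""

    def __init__(self, module):
        super().__init__(module)
        self.shapes = {}

    def run_node(self, n):
        # Let the base class execute the node, then inspect the result.
        result = super().run_node(n)
        if isinstance(result, torch.Tensor):
            self.shapes[n.name] = tuple(result.shape)
        return result


class TinyNet(torch.nn.Module):  # hypothetical demo module
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 5)

    def forward(self, x):
        return torch.relu(self.linear(x))


gm = torch.fx.symbolic_trace(TinyNet())
recorder = ShapeRecorder(gm)
out = recorder.run(torch.randn(2, 4))
print(recorder.shapes)
```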

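Examples of the Interpreter Pattern

Debugging

Introduction

Often in the course of authoring transformations, our code will not be quite right. In this case, we may need to do some debugging. The key is to work backwards: first, check the results of invoking the generated module to prove or disprove correctness. Then, inspect and debug the generated code. Then, debug the process of transformations that led to the generated code.

If you're not familiar with debuggers, please see the auxiliary section Available Debuggers.

Common Pitfalls in Transform Authoring

Nondeterministic set iteration order. In Python, the set datatype is unordered. Using set to contain collections of objects like Nodes, for example, can cause unexpected nondeterminism. An example is iterating over a set of Nodes to insert them into a Graph. Because the set data type is unordered, the ordering of the operations in the output program will be nondeterministic and can change across program invocations. The recommended alternative is to use a dict data type, which is insertion ordered as of Python 3.7 (and as of CPython 3.6). A dict can be used equivalently to a set by storing the values to be deduplicated in the keys of the dict.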
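A minimal sketch of the dict-as-ordered-set idiom, using plain strings in place of Nodes to keep it self-contained (the names are made up for this example):

```python
# Names in the order they were encountered, with duplicates.
names = ["relu", "add", "relu", "mul", "add"]

# dict keys are unique and keep insertion order, so this deduplicates
# while preserving the first-seen order.
ordered_unique = list(dict.fromkeys(names))
print(ordered_unique)
```

Iterating over ordered_unique gives the same order on every run, which a set does not guarantee.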

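Checking Correctness of Modules

Because the output of most deep learning modules consists of floating point torch.Tensor instances, checking for equivalence between the results of two torch.nn.Module is not as straightforward as doing a simple equality check. To motivate this, let's use an example: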

import torch
import torch.fx
import torchvision.models as models

def transform(m : torch.nn.Module) -> torch.nn.Module:
    gm = torch.fx.symbolic_trace(m)

    # Imagine we're doing some transforms here
    # <...>

    gm.recompile()

    return gm

resnet18 = models.resnet18()
transformed_resnet18 = transform(resnet18)

input_image = torch.randn(5, 3, 224, 224)

assert resnet18(input_image) == transformed_resnet18(input_image)
"""
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
"""

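Here, we've tried to check equality of the values of two deep learning models with the == equality operator. However, this is not well-defined, both because that operator returns a tensor and not a bool, and because comparison of floating point values should use a margin of error (or epsilon) to account for the non-associativity of floating point operations (see here for more details). We can use torch.allclose() instead, which will give us an approximate comparison taking into account a relative and absolute tolerance threshold: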

assert torch.allclose(resnet18(input_image), transformed_resnet18(input_image))

This is the first tool in our toolbox to check if transformed modules are behaving as we expect compared to a reference implementation.
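To see why a tolerance is needed, consider this small sketch: floating-point addition is not associative, so two mathematically equal expressions can differ in the last bit. The values are arbitrary, and the explicit rtol and atol shown are the documented defaults of torch.allclose.

```python
import torch

a = torch.tensor([0.1], dtype=torch.float64)
b = torch.tensor([0.2], dtype=torch.float64)
c = torch.tensor([0.3], dtype=torch.float64)

left = (a + b) + c   # one association order
right = a + (b + c)  # the other association order

exactly_equal = torch.equal(left, right)
close = torch.allclose(left, right, rtol=1e-05, atol=1e-08)
print(exactly_equal, close)
```

A transform that merely reorders arithmetic can therefore fail an exact comparison while still being correct, which is why the approximate check is preferred.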

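Debugging the Generated Code

Because FX generates the forward() function on GraphModules, using traditional debugging techniques like print statements or pdb is not as straightforward. Luckily, we have several techniques we can use for debugging the generated code.

Use pdb

Invoke pdb to step into the running program. Although the code that represents the Graph is not in any source file, we can still step into it manually using pdb when the forward pass is invoked.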

import torch
from torch import fx  # the code below refers to the package as `fx`
import torchvision.models as models

def my_pass(inp: torch.nn.Module, tracer_class : type = fx.Tracer) -> torch.nn.Module:
    graph = tracer_class().trace(inp)
    # Transformation logic here
    # <...>

    # Return new Module
    return fx.GraphModule(inp, graph)

my_module = models.resnet18()
my_module_transformed = my_pass(my_module)

input_value = torch.randn(5, 3, 224, 224)

# When this line is executed at runtime, we will be dropped into an
# interactive `pdb` prompt. We can use the `step` or `s` command to
# step into the execution of the next line
import pdb; pdb.set_trace()

my_module_transformed(input_value)

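Print the Generated Code

If you'd like to run the same code multiple times, then it can be a bit tedious to step to the right code with pdb. In that case, one approach is to simply copy-paste the generated forward pass into your code and examine it from there.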

# Assume that `traced` is a GraphModule that has undergone some
# number of transforms

# Copy this code for later
print(traced)
# Print the code generated from symbolic tracing. This outputs:
"""
def forward(self, y):
    x = self.x
    add_1 = x + y;  x = y = None
    return add_1
"""

# Subclass the original Module
class SubclassM(M):
    def __init__(self):
        super().__init__()

    # Paste the generated `forward` function (the one we printed and
    # copied above) here
    def forward(self, y):
        x = self.x
        add_1 = x + y;  x = y = None
        return add_1

# Create an instance of the original, untraced Module. Then, create an
# instance of the Module with the copied `forward` function. We can
# now compare the output of both the original and the traced version.
pre_trace = M()
post_trace = SubclassM()

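Use the to_folder Function From GraphModule

GraphModule.to_folder() is a method in GraphModule that allows you to dump out the generated FX code to a folder. Although copying the forward pass into the code often suffices, as in Print the Generated Code, it may be easier to examine modules and parameters using to_folder.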

m = symbolic_trace(M())
m.to_folder("foo", "Bar")
from foo import Bar
y = Bar()

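After running the above example, we can then look at the code within foo/module.py and modify it as desired (e.g. adding print statements or using pdb) to debug the generated code.

Debugging the Transformation

Now that we've identified that a transformation is creating incorrect code, it's time to debug the transformation itself. First, we'll check the Limitations of Symbolic Tracing section in the documentation. Once we verify that tracing is working as expected, the goal becomes figuring out what went wrong during our GraphModule transformation. There may be a quick answer in Writing Transformations, but, if not, there are several ways to examine our traced module: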

# Sample Module
class M(torch.nn.Module):
    def forward(self, x, y):
        return x + y

# Create an instance of `M`
m = M()

# Symbolically trace an instance of `M` (returns a GraphModule). In
# this example, we'll only be discussing how to inspect a
# GraphModule, so we aren't showing any sample transforms for the
# sake of brevity.
traced = symbolic_trace(m)

# Print the code produced by tracing the module.
print(traced)
# The generated `forward` function is:
"""
def forward(self, x, y):
    add = x + y;  x = y = None
    return add
"""

# Print the internal Graph.
print(traced.graph)
# This print-out returns:
"""
graph():
    %x : [num_users=1] = placeholder[target=x]
    %y : [num_users=1] = placeholder[target=y]
    %add : [num_users=1] = call_function[target=operator.add](args = (%x, %y), kwargs = {})
    return add
"""

# Print a tabular representation of the internal Graph.
traced.graph.print_tabular()
# This gives us:
"""
opcode         name    target                   args    kwargs
-------------  ------  -----------------------  ------  --------
placeholder    x       x                        ()      {}
placeholder    y       y                        ()      {}
call_function  add     <built-in function add>  (x, y)  {}
output         output  output                   (add,)  {}
"""

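Using the utility functions above, we can compare our traced Module before and after we've applied our transformations. Sometimes, a simple visual comparison is enough to trace down a bug. If it's still not clear what's going wrong, a debugger like pdb can be a good next step.

Going off of the example above, consider the following code: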

# Sample user-defined function
def transform_graph(module: torch.nn.Module, tracer_class : type = fx.Tracer) -> torch.nn.Module:
    # Get the Graph from our traced Module
    g = tracer_class().trace(module)

    """
    Transformations on `g` go here
    """

    return fx.GraphModule(module, g)

# Transform the Graph
transformed = transform_graph(traced)

# Print the new code after our transforms. Check to see if it was
# what we expected
print(transformed)

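Using the above example, let's say that the call to print(traced) showed us that there was an error in our transforms. We want to find what goes wrong using a debugger. We start a pdb session. We can see what's happening during the transform by breaking on transform_graph(traced), then pressing s to "step into" the call to transform_graph(traced).

We may also have good luck by editing the print_tabular method to print different attributes of the Nodes in the Graph. (For example, we might want to see the Node's input_nodes and users.)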
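Rather than editing print_tabular itself, a short loop can print the same kind of information. The sketch below assumes the inputs are read from Node.all_input_nodes and the consumers from Node.users; add_fn and rows are hypothetical names.

```python
import torch
import torch.fx


def add_fn(x, y):  # hypothetical function to trace
    return x + y


traced = torch.fx.symbolic_trace(add_fn)

rows = []
for node in traced.graph.nodes:
    inputs = [n.name for n in node.all_input_nodes]  # Nodes this node reads
    users = [n.name for n in node.users]             # Nodes that read this node
    rows.append((node.op, node.name, inputs, users))
    print(f"{node.op:<14} {node.name:<8} inputs={inputs} users={users}")
```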

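Available Debuggers

The most common Python debugger is pdb. You can start your program in "debug mode" with pdb by typing python -m pdb FILENAME.py into the command line, where FILENAME is the name of the file you want to debug. After that, you can use the pdb debugger commands to move through your running program stepwise. It's common to set a breakpoint (b LINE-NUMBER) when you start pdb, then call c to run the program until that point. This prevents you from having to step through each line of execution (using s or n) to get to the part of the code you want to examine. Alternatively, you can write import pdb; pdb.set_trace() before the line you want to break at. If you add pdb.set_trace(), your program will automatically start in debug mode when you run it. (In other words, you can just type python FILENAME.py into the command line instead of python -m pdb FILENAME.py.) Once you're running your file in debug mode, you can step through the code and examine your program's internal state using certain commands. There are many excellent tutorials on pdb online, including RealPython's "Python Debugging With Pdb".

IDEs like PyCharm or VSCode usually have a debugger built in. In your IDE, you can choose to either a) use pdb by pulling up a terminal window in your IDE (e.g. View → Terminal in VSCode), or b) use the built-in debugger (usually a graphical wrapper around pdb).

Limitations of Symbolic Tracing

FX uses a system of symbolic tracing (a.k.a. symbolic execution) to capture the semantics of programs in a transformable/analyzable form. The system is tracing in that it executes the program (really a torch.nn.Module or function) to record operations. It is symbolic in that the data flowing through the program during this execution is not real data, but rather symbols (Proxy in FX parlance).

Although symbolic tracing works for most neural net code, it has some limitations.

Dynamic Control Flow

The main limitation of symbolic tracing is that it does not currently support dynamic control flow. That is, loops or if statements where the condition may depend on the input values of the program.

For example, let's examine the following program: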

def func_to_trace(x):
    if x.sum() > 0:
        return torch.relu(x)
    else:
        return torch.neg(x)

traced = torch.fx.symbolic_trace(func_to_trace)
"""
  <...>
  File "dyn.py", line 6, in func_to_trace
    if x.sum() > 0:
  File "pytorch/torch/fx/proxy.py", line 155, in __bool__
    return self.tracer.to_bool(self)
  File "pytorch/torch/fx/proxy.py", line 85, in to_bool
    raise TraceError('symbolically traced variables cannot be used as inputs to control flow')
torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow
"""

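The condition to the if statement relies on the value of x.sum(), which relies on the value of x, a function input. Since x can change (i.e. if you pass a new input tensor to the traced function), this is dynamic control flow. The traceback walks back up through your code to show you where this situation happens.

Static Control Flow

On the other hand, so-called static control flow is supported. Static control flow is loops or if statements whose value cannot change across invocations. Typically, in PyTorch programs, this control flow arises for code making decisions about a model's architecture based on hyper-parameters. As a concrete example: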

import torch
import torch.fx

class MyModule(torch.nn.Module):
    def __init__(self, do_activation : bool = False):
        super().__init__()
        self.do_activation = do_activation
        self.linear = torch.nn.Linear(512, 512)

    def forward(self, x):
        x = self.linear(x)
        # This if-statement is so-called static control flow.
        # Its condition does not depend on any input values
        if self.do_activation:
            x = torch.relu(x)
        return x

without_activation = MyModule(do_activation=False)
with_activation = MyModule(do_activation=True)

traced_without_activation = torch.fx.symbolic_trace(without_activation)
print(traced_without_activation.code)
"""
def forward(self, x):
    linear_1 = self.linear(x);  x = None
    return linear_1
"""

traced_with_activation = torch.fx.symbolic_trace(with_activation)
print(traced_with_activation.code)
"""
import torch
def forward(self, x):
    linear_1 = self.linear(x);  x = None
    relu_1 = torch.relu(linear_1);  linear_1 = None
    return relu_1
"""

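The if-statement if self.do_activation does not depend on any function inputs, thus it is static. do_activation can be considered to be a hyper-parameter, and the traces of different instances of MyModule with different values for that parameter have different code. This is a valid pattern that is supported by symbolic tracing.

Many instances of dynamic control flow are semantically static control flow. These instances can be made to support symbolic tracing by removing the data dependencies on input values, for example by moving values to Module attributes or by binding concrete values to arguments during symbolic tracing: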

from torch import fx

def f(x, flag):
    if flag: return x
    else: return x*2

fx.symbolic_trace(f) # Fails!

fx.symbolic_trace(f, concrete_args={'flag': True})

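In the case of truly dynamic control flow, the sections of the program that contain this code can be traced as calls to a method (see Customizing Tracing with the Tracer class) or function (see wrap()) rather than tracing through them.

Non-torch Functions

FX uses __torch_function__ as the mechanism by which it intercepts calls (see the technical overview for more information about this). Some functions, such as builtin Python functions or those in the math module, are not covered by __torch_function__, but we would still like to capture them in symbolic tracing. For example: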

import torch
import torch.fx
from math import sqrt

def normalize(x):
    """
    Normalize `x` by the size of the batch dimension
    """
    return x / sqrt(len(x))

# It's valid Python code
normalize(torch.rand(3, 4))

traced = torch.fx.symbolic_trace(normalize)
"""
  <...>
  File "sqrt.py", line 9, in normalize
    return x / sqrt(len(x))
  File "pytorch/torch/fx/proxy.py", line 161, in __len__
    raise RuntimeError("'len' is not supported in symbolic tracing by default. If you want "
RuntimeError: 'len' is not supported in symbolic tracing by default. If you want this call to be recorded, please call torch.fx.wrap('len') at module scope
"""

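The error tells us that the built-in function len is not supported. We can make it so that functions like this are recorded in the trace as direct calls using the wrap() API: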

torch.fx.wrap('len')
torch.fx.wrap('sqrt')

traced = torch.fx.symbolic_trace(normalize)

print(traced.code)
"""
import math
def forward(self, x):
    len_1 = len(x)
    sqrt_1 = math.sqrt(len_1);  len_1 = None
    truediv = x / sqrt_1;  x = sqrt_1 = None
    return truediv
"""

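Customizing Tracing with the Tracer class

The Tracer class is the class that underlies the implementation of symbolic_trace. The behavior of tracing can be customized by subclassing Tracer, like so: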

class MyCustomTracer(torch.fx.Tracer):
    # Inside here you can override various methods
    # to customize tracing. See the `Tracer` API
    # reference
    pass


# Let's use this custom tracer to trace through this module
class MyModule(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + torch.ones(3, 4)

mod = MyModule()

traced_graph = MyCustomTracer().trace(mod)
# trace() returns a Graph. Let's wrap it up in a
# GraphModule to make it runnable
traced = torch.fx.GraphModule(mod, traced_graph)

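Leaf Modules

Leaf Modules are the modules that appear as calls in the symbolic trace rather than being traced through. The default set of leaf modules is the set of standard torch.nn module instances. For example: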

class MySpecialSubmodule(torch.nn.Module):
    def forward(self, x):
        return torch.neg(x)

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(3, 4)
        self.submod = MySpecialSubmodule()

    def forward(self, x):
        return self.submod(self.linear(x))

traced = torch.fx.symbolic_trace(MyModule())
print(traced.code)
# `linear` is preserved as a call, yet `submod` is traced through.
# This is because the default set of "Leaf Modules" includes all
# standard `torch.nn` modules.
"""
import torch
def forward(self, x):
    linear_1 = self.linear(x);  x = None
    neg_1 = torch.neg(linear_1);  linear_1 = None
    return neg_1
"""

The set of leaf modules can be customized by overriding Tracer.is_leaf_module().
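A minimal sketch of such an override, reusing the two modules from the example above. KeepSubmodTracer is a hypothetical name; it marks MySpecialSubmodule as a leaf and defers to the default rule for everything else.

```python
import torch
import torch.fx


class MySpecialSubmodule(torch.nn.Module):
    def forward(self, x):
        return torch.neg(x)


class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(3, 4)
        self.submod = MySpecialSubmodule()

    def forward(self, x):
        return self.submod(self.linear(x))


class KeepSubmodTracer(torch.fx.Tracer):  # hypothetical custom tracer
    def is_leaf_module(self, m, module_qualified_name):
        # Treat our own submodule as a leaf; otherwise use the default rule.
        if isinstance(m, MySpecialSubmodule):
            return True
        return super().is_leaf_module(m, module_qualified_name)


mod = MyModule()
graph = KeepSubmodTracer().trace(mod)
traced = torch.fx.GraphModule(mod, graph)

call_module_targets = [n.target for n in graph.nodes if n.op == "call_module"]
print(call_module_targets)
print(traced.code)
```

With this tracer, submod shows up as a call_module node instead of being expanded into a torch.neg call.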

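Miscellanea

- Tensor constructors (e.g. torch.zeros, torch.ones, torch.rand, torch.randn, torch.sparse_coo_tensor) are currently not traceable.
  - The deterministic constructors (zeros, ones) can be used, and the value they produce will be embedded in the trace as a constant. This is only problematic if the arguments to these constructors refer to dynamic input sizes. In this case, ones_like or zeros_like may be a viable substitute.
  - Nondeterministic constructors (rand, randn) will have a single random value embedded in the trace. This is likely not the intended behavior. One workaround is to wrap torch.randn in a torch.fx.wrap function and call that instead.

@torch.fx.wrap
def torch_randn(x, shape):
    return torch.randn(shape)

def f(x):
    return x + torch_randn(x, 5)

fx.symbolic_trace(f)

  - This behavior may be fixed in a future release.
- Type annotations
  - Python 3-style type annotations (e.g. func(x : torch.Tensor, y : int) -> torch.Tensor) are supported and will be preserved by symbolic tracing.
  - Python 2-style comment type annotations # type: (torch.Tensor, int) -> torch.Tensor are not currently supported.
  - Annotations on local names within a function are not currently supported.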

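Gotcha around the training flag and submodules

When using functionals like torch.nn.functional.dropout, it will be common for the training argument to be passed in as self.training. During FX tracing, this will likely be baked in as a constant value.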

import torch
import torch.fx

class DropoutRepro(torch.nn.Module):
  def forward(self, x):
    return torch.nn.functional.dropout(x, training=self.training)


traced = torch.fx.symbolic_trace(DropoutRepro())
print(traced.code)
"""
def forward(self, x):
  dropout = torch.nn.functional.dropout(x, p = 0.5, training = True, inplace = False);  x = None
  return dropout
"""

traced.eval()

x = torch.randn(5, 3)
torch.testing.assert_close(traced(x), x)
"""
AssertionError: Tensor-likes are not close!

Mismatched elements: 15 / 15 (100.0%)
Greatest absolute difference: 1.6207983493804932 at index (0, 2) (up to 1e-05 allowed)
Greatest relative difference: 1.0 at index (0, 0) (up to 0.0001 allowed)
"""

However, when the standard nn.Dropout() submodule is used, the training flag is encapsulated and, because the nn.Module object model is preserved, can be changed.

class DropoutRepro2(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.drop = torch.nn.Dropout()

  def forward(self, x):
    return self.drop(x)

traced = torch.fx.symbolic_trace(DropoutRepro2())
print(traced.code)
"""
def forward(self, x):
  drop = self.drop(x);  x = None
  return drop
"""

traced.eval()

x = torch.randn(5, 3)
torch.testing.assert_close(traced(x), x)

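Because of this difference, consider marking modules that interact with the training flag dynamically as leaf modules.

API Reference

torch.fx.symbolic_trace(root, concrete_args=None)

Symbolic tracing API

Given an nn.Module or function instance root, this function will return a GraphModule constructed by recording operations seen while tracing through root.

concrete_args allows you to partially specialize your function, whether it's to remove control flow or data structures.

For example: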

def f(a, b):
    if b == True:
        return a
    else:
        return a * 2

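FX can typically not trace through this due to the presence of control flow. However, we can use concrete_args to specialize on the value of b to trace through this: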

f = fx.symbolic_trace(f, concrete_args={"b": False})
assert f(3, False) == 6

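Note that although you can still pass in different values of b, they will be ignored.

We can also use concrete_args to eliminate data-structure handling from our function. This will use pytrees to flatten your input. To avoid overspecializing, pass in fx.PH for values that shouldn't be specialized. For example: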

def f(x):
    out = 0
    for v in x.values():
        out += v
    return out


f = fx.symbolic_trace(f, concrete_args={"x": {"a": fx.PH, "b": fx.PH, "c": fx.PH}})
assert f({"a": 1, "b": 2, "c": 4}) == 7

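Parameters:

root (Union[torch.nn.Module, Callable]) – Module or function to be traced and converted into a Graph representation.
concrete_args (Optional[Dict[str, any]]) – Inputs to be partially specialized

Returns:

A Module created from the recorded operations from root.

Return type:

GraphModule

Note

Backwards-compatibility for this API is guaranteed.

torch.fx.wrap(fn_or_name)

This function can be called at module-level scope to register fn_or_name as a "leaf function". A "leaf function" will be preserved as a CallFunction node in the FX trace instead of being traced through: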

# foo/bar/baz.py
def my_custom_function(x, y):
    return x * x + y * y


torch.fx.wrap("my_custom_function")


def fn_to_be_traced(x, y):
    # When symbolic tracing, the below call to my_custom_function will be inserted into
    # the graph rather than tracing it.
    return my_custom_function(x, y)

This function can also equivalently be used as a decorator:

# foo/bar/baz.py
@torch.fx.wrap
def my_custom_function(x, y):
    return x * x + y * y

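A wrapped function can be thought of as a "leaf function", analogous to the concept of "leaf modules": they are functions that are left as calls in the FX trace rather than traced through.

Parameters: fn_or_name (Union[str, Callable]) – The function or name of the global function to insert into the graph when it's called

Note

Backwards-compatibility for this API is guaranteed.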

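class torch.fx.GraphModule(*args, **kwargs)

GraphModule is an nn.Module generated from an fx.Graph. A GraphModule has a graph attribute, as well as code and forward attributes generated from that graph.

Warning

When graph is reassigned, code and forward will be automatically regenerated. However, if you edit the contents of the graph without reassigning the graph attribute itself, you must call recompile() to update the generated code.

Note

Backwards-compatibility for this API is guaranteed.

__init__(root, graph, class_name='GraphModule')

Construct a GraphModule.

Parameters:

root (Union[torch.nn.Module, Dict[str, Any]]) – root can either be an nn.Module instance or a Dict mapping strings to any attribute type. In the case that root is a Module, any references to Module-based objects (via qualified name) in the Graph's Nodes' target field will be copied over from the respective place within root's Module hierarchy into the GraphModule's module hierarchy. In the case that root is a dict, the qualified name found in a Node's target will be looked up directly in the dict's keys. The object mapped to by the Dict will be copied over into the appropriate place within the GraphModule's module hierarchy.
graph (Graph) – graph contains the nodes this GraphModule should use for code generation
class_name (str) – name denotes the name of this GraphModule for debugging purposes. If it's unset, all error messages will report as originating from GraphModule. It may be helpful to set this to root's original name or a name that makes sense within the context of your transform.

Note

Backwards-compatibility for this API is guaranteed.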

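add_submodule(target, m)

Adds the given submodule to self.

This installs empty Modules where none exist yet if they are subpaths of target.

Parameters:

target (str) – The fully-qualified string name of the new submodule (see the example in nn.Module.get_submodule for how to specify a fully-qualified string.)
m (Module) – The submodule itself; the actual object we want to install in the current Module

Returns:

Whether or not the submodule could be inserted. For this method to return True, each object in the chain denoted by target must either a) not exist yet, or b) reference an nn.Module (not a parameter or other attribute)

Return type:

bool

Note

Backwards-compatibility for this API is guaranteed.

property code: str

Return the Python code generated from the Graph underlying this GraphModule.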

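delete_all_unused_submodules()

Deletes all unused submodules from self.

A Module is considered "used" if any one of the following is true:
1. It has children that are used
2. Its forward is called directly via a call_module node
3. It has a non-Module attribute that is used from a get_attr node

This method can be called to clean up an nn.Module without manually calling delete_submodule on each unused submodule.

Note

Backwards-compatibility for this API is guaranteed.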
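A brief sketch of add_submodule and delete_all_unused_submodules working together. TinyNet and the path "extras.act" are hypothetical; the submodule is added but never referenced by the Graph, so the cleanup removes it again.

```python
import torch
import torch.fx


class TinyNet(torch.nn.Module):  # hypothetical demo module
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 5)

    def forward(self, x):
        return self.linear(x)


gm = torch.fx.symbolic_trace(TinyNet())

# Install a submodule under a nested path; the empty parent Module
# `extras` is created on the way.
added = gm.add_submodule("extras.act", torch.nn.ReLU())
had_extras = hasattr(gm, "extras")

# No node in the Graph refers to `extras` or `extras.act`, so both go away,
# while `linear` (used by a call_module node) is kept.
gm.delete_all_unused_submodules()
print(added, had_extras, hasattr(gm, "extras"), hasattr(gm, "linear"))
```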

delete_submodule(target)

property graph: Graph

Return the Graph underlying this GraphModule

print_readable(print_output=True, include_stride=False, include_device=False, colored=False)

Return the Python code generated for the current GraphModule and its children GraphModules

Warning

This API is experimental and is NOT backward-compatible.

recompile()

Recompile this GraphModule from its graph attribute. This should be called after editing the contained graph, otherwise the generated code of this GraphModule will be out of date.

Note

Backwards-compatibility for this API is guaranteed.

Return type: PythonCode

to_folder(folder, module_name='FxModule')

class torch.fx.Graph(owning_module=None, tracer_cls=None, tracer_extra_args=None)

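class torch.fx.Node(graph, name, op, target, args, kwargs, return_type=None)

class torch.fx.Tracer(autowrap_modules=(math,), autowrap_functions=())

class torch.fx.Proxy(node, tracer=None)

Proxy objects are Node wrappers that flow through the program during symbolic tracing and record all the operations (torch function calls, method calls, operators) that they touch into the growing FX Graph.

If you're doing graph transforms, you can wrap your own Proxy around a raw Node so that you can use the overloaded operators to add additional things to a Graph.

Proxy objects cannot be iterated. In other words, the symbolic tracer will throw an error if a Proxy is used in a loop or as an *args/**kwargs function argument.

There are two main ways around this: 1. Factor out the untraceable logic into a top-level function and use fx.wrap on it. 2. If the control flow is static (i.e. the loop trip count is based on some hyperparameter), the code can be kept in its original position and refactored into something like: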

for i in range(self.some_hyperparameter):
    indexed_item = proxied_value[i]
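Fleshed out into a runnable sketch, the second workaround might look like this. SumFirstRows and num_rows are hypothetical names; the loop bound is a plain Python integer stored on the module, so the loop is unrolled at trace time and only the indexing is recorded.

```python
import torch
import torch.fx


class SumFirstRows(torch.nn.Module):  # hypothetical demo module
    def __init__(self, num_rows: int):
        super().__init__()
        self.num_rows = num_rows  # hyperparameter, fixed per instance

    def forward(self, x):
        out = x[0]
        # Static control flow: the range depends on a hyperparameter,
        # not on the Proxy, so tracing unrolls the loop.
        for i in range(1, self.num_rows):
            out = out + x[i]
        return out


traced = torch.fx.symbolic_trace(SumFirstRows(3))
x = torch.arange(12.0).reshape(4, 3)
result = traced(x)
print(traced.code)
```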

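For a more detailed description of the Proxy internals, check out the "Proxy" section in torch/fx/README.md

Note

Backwards-compatibility for this API is guaranteed.

class torch.fx.Interpreter(module, garbage_collect_values=True, graph=None)

class torch.fx.Transformer(module)

Transformer is a special type of interpreter that produces a new Module. It exposes a transform() method that returns the transformed Module. Transformer does not require arguments to run, as Interpreter does. Transformer works entirely symbolically.

Example

Suppose we want to swap all instances of torch.neg with torch.sigmoid and vice versa (including their Tensor method equivalents). We could subclass Transformer like so: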

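from typing import Any, Dict, Tuple

import torch
from torch.fx import Transformer
from torch.fx.node import Argument, Target

class NegSigmSwapXformer(Transformer):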
    def call_function(
        self, target: "Target", args: Tuple[Argument, ...], kwargs: Dict[str, Any]
    ) -> Any:
        if target == torch.sigmoid:
            return torch.neg(*args, **kwargs)
        return super().call_function(target, args, kwargs)

    def call_method(
        self, target: "Target", args: Tuple[Argument, ...], kwargs: Dict[str, Any]
    ) -> Any:
        if target == "neg":
            call_self, *args_tail = args
            return call_self.sigmoid(*args_tail, **kwargs)
        return super().call_method(target, args, kwargs)


def fn(x):
    return torch.sigmoid(x).neg()


gm = torch.fx.symbolic_trace(fn)

transformed: torch.nn.Module = NegSigmSwapXformer(gm).transform()
input = torch.randn(3, 4)
torch.testing.assert_close(transformed(input), torch.neg(input).sigmoid())

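Parameters: module (GraphModule) – The Module to be transformed.

Note

Backwards-compatibility for this API is guaranteed.

call_function(target, args, kwargs)

Note

Backwards-compatibility for this API is guaranteed.

Return type: Any

call_module(target, args, kwargs)

Note

Backwards-compatibility for this API is guaranteed.

Return type: Any

get_attr(target, args, kwargs)

Execute a get_attr node. In Transformer, this is overridden to insert a new get_attr node into the output graph.

Parameters:

target (Target) – The call target for this node. See Node for details on semantics
args (Tuple) – Tuple of positional args for this invocation
kwargs (Dict) – Dict of keyword arguments for this invocation

Return type:

Proxy

Note

Backwards-compatibility for this API is guaranteed.

placeholder(target, args, kwargs)

Execute a placeholder node. In Transformer, this is overridden to insert a new placeholder into the output graph.

Parameters:

target (Target) – The call target for this node. See Node for details on semantics
args (Tuple) – Tuple of positional args for this invocation
kwargs (Dict) – Dict of keyword arguments for this invocation

Return type:

Proxy

Note

Backwards-compatibility for this API is guaranteed.

transform()

Transform self.module and return the transformed GraphModule.

Note

Backwards-compatibility for this API is guaranteed.

Return type: GraphModule

torch.fx.replace_pattern(gm, pattern, replacement)

Matches all possible non-overlapping sets of operators and their data dependencies (pattern) in the Graph of a GraphModule (gm), then replaces each of these matched subgraphs with another subgraph (replacement).

Parameters:

gm (GraphModule) – The GraphModule that wraps the Graph to operate on
pattern (Union[Callable, GraphModule]) – The subgraph to match in gm for replacement
replacement (Union[Callable, GraphModule]) – The subgraph to replace pattern with

Returns:

A list of Match objects representing the places in the original graph that pattern was matched to. The list is empty if there are no matches. Match is defined as: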

class Match(NamedTuple):
    # Node from which the match was found
    anchor: Node
    # Maps nodes in the pattern subgraph to nodes in the larger graph
    nodes_map: Dict[Node, Node]

Return type:

List[Match]

Examples:

import torch
from torch.fx import symbolic_trace, subgraph_rewriter


class M(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, x, w1, w2):
        m1 = torch.cat([w1, w2]).sum()
        m2 = torch.cat([w1, w2]).sum()
        return x + torch.max(m1) + torch.max(m2)


def pattern(w1, w2):
    return torch.cat([w1, w2])


def replacement(w1, w2):
    return torch.stack([w1, w2])


traced_module = symbolic_trace(M())

subgraph_rewriter.replace_pattern(traced_module, pattern, replacement)

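The above code will first match pattern in the forward method of traced_module. Pattern-matching is done based on use-def relationships, not node names. For example, if you had p = torch.cat([a, b]) in pattern, you could match m = torch.cat([a, b]) in the original forward function, despite the variable names being different (p vs m).

The return statement in pattern is matched based on its value only; it may or may not match to the return statement in the larger graph. In other words, the pattern doesn't have to extend to the end of the larger graph.

When the pattern is matched, it will be removed from the larger function and replaced by replacement. If there are multiple matches for pattern in the larger function, each non-overlapping match will be replaced. In the case of a match overlap, the first found match in the set of overlapping matches will be replaced. ("First" here is defined as the first in a topological ordering of the Nodes' use-def relationships. In most cases, the first Node is the parameter that appears directly after self, while the last Node is whatever the function returns.)

One important thing to note is that the parameters of the pattern Callable must be used in the Callable itself, and the parameters of the replacement Callable must match the pattern. The first rule is why, in the above code block, the forward function has parameters x, w1, w2, but the pattern function only has parameters w1, w2. pattern doesn't use x, so it shouldn't specify x as a parameter. As an example of the second rule, consider replacing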

def pattern(x, y):
    return torch.neg(x) + torch.relu(y)

with

def replacement(x, y):
    return torch.relu(x)

In this case, replacement needs the same number of parameters as pattern (both x and y), even though the parameter y isn't used in replacement.

After calling subgraph_rewriter.replace_pattern, the generated Python code looks like this:

def forward(self, x, w1, w2):
    stack_1 = torch.stack([w1, w2])
    sum_1 = stack_1.sum()
    stack_2 = torch.stack([w1, w2])
    sum_2 = stack_2.sum()
    max_1 = torch.max(sum_1)
    add_1 = x + max_1
    max_2 = torch.max(sum_2)
    add_2 = add_1 + max_2
    return add_2

Note

Backwards-compatibility for this API is guaranteed.
