Understanding TensorFlow's @tf.function Decorator

Introduction
Python decorators and `tf.function()`
How `tf.function()` works
Best practices for `@tf.function`
Summary

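Introduction

When training a machine learning model, a faster training loop saves training time. One way to speed up TensorFlow code is the tf.function() decorator: this single line of code can make a function run significantly faster.

Python decorators and tf.function()

In Python, a decorator is a function that modifies the behavior of another function.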
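As a minimal sketch of that idea in plain Python (no TensorFlow involved), the decorator below wraps a function and records how many times it has been called. The names `count_calls` and `add` are hypothetical and chosen only for illustration; they do not come from the original article.

```python
import functools


def count_calls(func):
    """Return a wrapper that counts how often `func` is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1            # extra behavior added by the decorator
        return func(*args, **kwargs)  # then delegate to the original function
    wrapper.calls = 0
    return wrapper


@count_calls          # same as writing: add = count_calls(add)
def add(a, b):
    return a + b


print(add(1, 2))      # 3
print(add(3, 4))      # 7
print(add.calls)      # 2
```

The decorated `add` still returns the same results; the wrapper only adds bookkeeping around each call, which is the same pattern tf.function() follows on a larger scale.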

import timeit

import tensorflow as tf

# A random 100x100 matrix used as the input for the benchmarks.
x = tf.random.uniform(shape=[100, 100], minval=-1, maxval=1, dtype=tf.dtypes.float32)

def some_costly_computation(x):
    # Accumulates the terms x^i / i! for i = 1..99.
    aux = tf.eye(100, dtype=tf.dtypes.float32)
    result = tf.zeros(100, dtype=tf.dtypes.float32)  # shape [100], broadcasts against [100, 100]
    for i in range(1, 100):
        aux = tf.matmul(x, aux) / i
        result = result + aux
    return result

# Time 5 rounds of 10 calls each, running eagerly.
print("costly origin 10 repeat", timeit.repeat(stmt="some_costly_computation(x)",
                    repeat=5,
                    number=10,
                    globals=globals()))

costly origin 10 repeat [0.126452, 0.12393320000000019, 0.11928129999999992, 0.1145529999999999, 0.1171840000000004]

If we pass the costly function as an argument to tf.function():

quicker_computation = tf.function(some_costly_computation)
print("costly quick 10 repeat", timeit.repeat(stmt="quicker_computation(x)", 
                    repeat=5,
                    number=10,
                    globals=globals()))

we get quicker_computation(), which runs much faster than the original function:

costly quick 10 repeat [0.02621770000000012, 0.02595930000000024, 0.025386300000000084, 0.024983999999999895, 0.026588900000000137]

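tf.function() takes some_costly_computation() and returns quicker_computation(), a new function with modified behavior.
As noted above, a decorator is a function that modifies the behavior of another function, so tf.function() is itself a decorator.

Using the @ decorator syntax has the same effect as calling tf.function(function):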

@tf.function
def quick_computation(x):
  aux = tf.eye(100, dtype=tf.dtypes.float32)
  result = tf.zeros(100, dtype=tf.dtypes.float32)
  for i in range(1, 100):
    aux = tf.matmul(x, aux) / i
    result = result + aux
  return result

# The first round also includes the one-time cost of tracing the function.
print("costly2 10 repeat", timeit.repeat(stmt="quick_computation(x)",
                    repeat=5,
                    number=10,
                    globals=globals()))

costly2 10 repeat [0.31421259999999984, 0.022836499999999926, 0.02309440000000018, 0.02442650000000013, 0.024493899999999957]

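How `tf.function()` works

What makes our function run more efficiently?

TensorFlow code can run in two modes: eager mode and graph mode. Eager mode is the standard, interactive way of running Python code: every time you call a function, its body is executed.

Graph mode is slightly different. In graph mode, before executing the function, TensorFlow builds a computation graph, a data structure that contains the operations needed to execute the function. The graph lets TensorFlow simplify the computation and parallelize it where possible. It also isolates the function from the surrounding Python code, so that it can run efficiently on many different devices.

A function decorated with @tf.function executes in two steps:

1. Tracing: TensorFlow runs the function's Python code and compiles it into a computation graph, deferring execution of the TensorFlow operations.
2. Running the computation graph.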
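The two steps above can be observed with a small sketch. It assumes TensorFlow 2.x with default `tf.function` settings; the names `trace_count` and `double` are hypothetical and used only for this illustration. A Python-side counter is incremented inside the function, so it only changes while the function is being traced.

```python
import tensorflow as tf

trace_count = 0


@tf.function
def double(x):
    global trace_count
    trace_count += 1   # plain Python: runs only while the function is traced
    return x * 2       # TensorFlow operation: recorded in the graph


double(tf.constant([1.0, 2.0]))   # first call: traces, then runs the graph
double(tf.constant([3.0, 4.0]))   # same dtype and shape: reuses the graph
double(tf.constant([1, 2, 3]))    # new dtype and shape: traced again

print(trace_count)  # 2
```

Only two of the three calls execute the Python body, because the second call matches the input signature of the first and goes straight to the existing graph.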

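The first call takes longer because the function has to be compiled into a computation graph. On later calls with tensors of the same dtype and shape, no new graph is needed, so the first step is skipped. This greatly improves performance, but it also means the function does not execute like ordinary Python code, where every executable line runs on every call. For example, let's modify the earlier function: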

@tf.function
def quick_computation(x):
  print('Only prints the first time!')
  aux = tf.eye(100, dtype=tf.dtypes.float32)
  result = tf.zeros(100, dtype = tf.dtypes.float32)
  for i in range(1,100):
    aux = tf.matmul(x,aux)/i
    result = result + aux
  return result

quick_computation(x)
quick_computation(x)

Output:

Only prints the first time!

print() runs only once, during tracing, when the regular Python code is executed. Subsequent calls only run the TensorFlow operations in the computation graph.

But if we use tf.print() instead of print():

@tf.function
def quick_computation_with_print(x):
  tf.print("Prints every time!")
  aux = tf.eye(100, dtype=tf.dtypes.float32)
  result = tf.zeros(100, dtype = tf.dtypes.float32)
  for i in range(1,100):
    aux = tf.matmul(x,aux)/i
    result = result + aux
  return result

quick_computation_with_print(x)
quick_computation_with_print(x)

Prints every time!
Prints every time!

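TensorFlow includes tf.print() in its computation graph because it is a TensorFlow operation rather than an ordinary Python function.

Warning: not all Python code is executed on every call to a function decorated with @tf.function. After tracing, only the operations in the computation graph are run, so we have to write such functions with care.

Best practices for `@tf.function`

Write code with TensorFlow operations

As shown above, some parts of the code are ignored by the computation graph. This makes the behavior of a function hard to predict when it is written in "normal" Python, as we just saw with print(). To avoid unexpected behavior, it is best to write the function using TensorFlow operations.

For example, for loops and while loops are not always converted into equivalent TensorFlow loops. It is therefore better to write a "for" loop as a vectorized operation. This improves the performance of the code and ensures the function is traced correctly.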

x = tf.random.uniform(shape=[100, 100], minval=-1, maxval=1, dtype=tf.dtypes.float32)

@tf.function
def function_with_for(x):
    summ = float(0)
    for row in x:
      summ = summ + tf.reduce_mean(row)
    return summ

@tf.function
def vectorized_function(x):
  result = tf.reduce_mean(x, axis=0)
  return tf.reduce_sum(result)

print(function_with_for(x))
print(vectorized_function(x))

print(timeit.timeit(stmt="function_with_for(x)", 
                    number=10,
                    globals=globals()))
print(timeit.timeit(stmt="vectorized_function(x)", 
                    number=10,
                    globals=globals()))

tf.Tensor(-0.5351627, shape=(), dtype=float32)
tf.Tensor(-0.53516287, shape=(), dtype=float32)
0.009135900000000419
0.001650699999999894

The vectorized version, written with TensorFlow operations, runs faster.

Avoid referencing global variables

Consider the following code:
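Why loops deserve attention can be sketched by looking at the traced graphs. This is an illustration under assumptions, not part of the original article: it assumes TensorFlow 2.x with AutoGraph enabled (the default), and the helpers `make_unrolled`, `looped`, and `op_count` are hypothetical names. A loop over a Python `range` is unrolled during tracing, so the graph grows with the iteration count, while a loop over `tf.range` is converted into a single graph-level loop.

```python
import tensorflow as tf


def make_unrolled(n):
    """Build a traced function whose loop runs over a Python range."""
    @tf.function
    def f(x):
        total = tf.zeros_like(x)
        for _ in range(n):       # Python loop: unrolled while tracing
            total = total + x
        return total
    return f


@tf.function
def looped(x, n):
    total = tf.zeros_like(x)
    for _ in tf.range(n):        # tensor loop: converted to a graph loop
        total = total + x
    return total


def op_count(fn, *args):
    """Number of operations in the graph traced for these arguments."""
    return len(fn.get_concrete_function(*args).graph.get_operations())


x = tf.constant([1.0, 2.0])

# The unrolled graph gets larger as the iteration count grows.
print(op_count(make_unrolled(10), x), op_count(make_unrolled(100), x))

# The tf.range version uses one graph regardless of the value of n.
print(op_count(looped, x, tf.constant(10)), op_count(looped, x, tf.constant(100)))
```

Exact operation counts depend on the TensorFlow version, so none are listed here; the relevant observation is that the first pair differs while the second pair is identical.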

x = tf.Variable(2, dtype=tf.dtypes.float32)
y = 2

@tf.function
def power(x):
  return tf.pow(x,y)

print(power(x))

y = 3

print(power(x))

tf.Tensor(4.0, shape=(), dtype=float32)
tf.Tensor(4.0, shape=(), dtype=float32)

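The first call to the decorated function power() returns the expected value, 4. On the second call, however, the function ignores the fact that y has changed.
This happens because the value of a Python global variable is frozen into the graph at tracing time.

One workaround is to use tf.Variable() for all such values and pass them to the function as arguments.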

x = tf.Variable(2, dtype=tf.dtypes.float32)
y = tf.Variable(2, dtype=tf.dtypes.float32)

@tf.function
def power(x,y):
  return tf.pow(x,y)

print(power(x,y))

y.assign(3)

print(power(x,y))

tf.Tensor(4.0, shape=(), dtype=float32)
tf.Tensor(8.0, shape=(), dtype=float32)
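A related point, sketched here under the assumption of TensorFlow 2.x defaults: passing plain Python numbers as arguments also interacts with tracing, because each new Python value produces a new trace, whereas tensors with the same dtype and shape share one graph. The helper `make_power` and the trace lists are hypothetical names used only for this illustration.

```python
import tensorflow as tf


def make_power():
    """Build a fresh traced function plus a list that records each trace."""
    traces = []

    @tf.function
    def power(x, y):
        traces.append("trace")   # plain Python: runs only during tracing
        return tf.pow(x, y)

    return power, traces


x = tf.constant(2.0)

power_py, traces_py = make_power()
for exponent in (2.0, 3.0, 4.0):
    power_py(x, exponent)                # Python floats: traced once per value

power_tf, traces_tf = make_power()
for exponent in (2.0, 3.0, 4.0):
    power_tf(x, tf.constant(exponent))   # same dtype and shape: one shared graph

print(len(traces_py), len(traces_tf))    # 3 1
```

Passing tensors or variables therefore keeps the number of traces small when the same function is called with many different values.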

Debugging `@tf.function`

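In general, you should debug your functions in eager mode and only decorate them with @tf.function once the code works, because errors are easier to debug in eager mode.
The most common errors are type errors and shape errors.

A type error occurs when the dtypes of the variables involved in an operation do not match: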

x = tf.Variable(1, dtype = tf.dtypes.float32)
y = tf.Variable(1, dtype = tf.dtypes.int32)

z = tf.add(x,y)

InvalidArgumentError: cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a int32 tensor [Op:AddV2]

Type errors are easy to run into and just as easy to fix: cast one of the variables so that the dtypes match.

y = tf.cast(y, tf.dtypes.float32)
z = tf.add(x, y) 
tf.print(z) # 2

A shape error occurs when your tensors do not have the shapes that the operation requires:

x = tf.random.uniform(shape=[100, 100], minval=-1, maxval=1, dtype=tf.dtypes.float32)
y = tf.random.uniform(shape=[1, 100], minval=-1, maxval=1, dtype=tf.dtypes.float32)

z = tf.matmul(x,y)

InvalidArgumentError: Matrix size-incompatible: In[0]: [100,100], In[1]: [1,100] [Op:MatMul]
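One possible fix, sketched under the assumption that `y` was meant to be used as a column vector (the original does not say which shape was intended), is to inspect the shapes and let `tf.matmul` transpose the second operand:

```python
import tensorflow as tf

x = tf.random.uniform(shape=[100, 100], minval=-1, maxval=1, dtype=tf.dtypes.float32)
y = tf.random.uniform(shape=[1, 100], minval=-1, maxval=1, dtype=tf.dtypes.float32)

# tf.matmul needs the inner dimensions to agree, e.g. [100, 100] x [100, 1].
print(x.shape, y.shape)                 # (100, 100) (1, 100)

# Assumption: y is a row that should act as a [100, 1] column.
z = tf.matmul(x, y, transpose_b=True)
print(z.shape)                          # (100, 1)
```

If the row orientation was intended instead, `tf.matmul(y, x)` is also shape-compatible and yields a `[1, 100]` result.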

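A handy tool for both kinds of error is the interactive Python debugger. In a Jupyter Notebook, you can have it invoked automatically with %pdb. With the debugger enabled, you can write your function and test a few common use cases. If an error occurs, an interactive prompt opens. From that prompt you can move around in the code and inspect the values, types, and shapes of your TensorFlow variables.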
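To follow the advice of debugging in eager mode without removing the decorator, one option is `tf.config.run_functions_eagerly`, available in TensorFlow 2.3 and later. The sketch below uses the hypothetical names `python_calls` and `square`; with the switch turned on, the Python body runs on every call, so breakpoints and print() behave as in ordinary Python.

```python
import tensorflow as tf

python_calls = []


@tf.function
def square(x):
    python_calls.append("body ran")   # plain Python, normally skipped after tracing
    return x * x


tf.config.run_functions_eagerly(True)       # run decorated functions eagerly
try:
    square(tf.constant(2.0))
    square(tf.constant(3.0))
finally:
    tf.config.run_functions_eagerly(False)  # restore graph execution

print(len(python_calls))  # 2
```

The try/finally block restores graph execution even if the function under test raises, so the setting does not leak into the rest of the session.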

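Summary

As shown above, TensorFlow's tf.function() can make a function run more efficiently, and the @tf.function decorator has the same effect.
This speedup is most useful for functions that are called many times, such as the custom training step of a machine learning model.

Reference: understanding-tensorflows-tffunction-decorator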
