在本教程中,您将学习如何使用 Python 生成器轻松创建迭代,它与迭代器和常规函数的区别以及为什么要使用它。
Python 中的生成器
在 Python 中构建迭代器的工作很多。 我们必须使用__iter__()
和__next__()
方法实现一个类,跟踪内部状态,并在没有值要返回时提高StopIteration
。
这既冗长又违反直觉。 在这种情况下,发电机将进行救援。
Python 生成器是创建迭代器的简单方法。 我们上面提到的所有工作都由 Python 的生成器自动处理。
简而言之,生成器是一个函数,它返回一个对象(迭代器),我们可以对其进行迭代(一次一个值)。
用 Python 创建生成器
在 Python 中创建生成器非常简单。 它与定义普通函数一样简单,但是使用yield
语句而不是return
语句。
如果一个函数至少包含一个yield
语句(它可能包含其他yield
或return
语句),则它将成为生成器函数。yield
和return
都将从函数返回一些值。
不同之处在于,虽然return
语句完全终止了一个函数,但yield
语句暂停了该函数并保存了所有状态,然后在后续调用中从那里继续。
生成器函数与常规函数之间的区别
这是生成器功函数与常规函数不同的地方。
- 生成器函数包含一个或多个
yield
语句。 - 调用时,它返回一个对象(迭代器),但不会立即开始执行。
__iter__()
和__next__()
之类的方法会自动实现。 因此,我们可以使用next()
遍历项目。- 一旦函数屈服,函数将暂停并将控件转移到调用方。
- 连续调用之间会记住局部变量及其状态。
- 最后,当函数终止时,
StopIteration
在以后的调用中自动加高。
这是一个示例,用于说明上述所有要点。 我们有一个名为my_gen()
的生成器函数,带有几个yield
语句。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
<span class="pl-c"># A simple generator function</span> <span class="pl-k">def</span> <span class="pl-en">my_gen</span>(): <span class="pl-s1">n</span> <span class="pl-c1">=</span> <span class="pl-c1">1</span> <span class="pl-en">print</span>(<span class="pl-s">'This is printed first'</span>) <span class="pl-c"># Generator function contains yield statements</span> <span class="pl-k">yield</span> <span class="pl-s1">n</span> <span class="pl-s1">n</span> <span class="pl-c1">+=</span> <span class="pl-c1">1</span> <span class="pl-en">print</span>(<span class="pl-s">'This is printed second'</span>) <span class="pl-k">yield</span> <span class="pl-s1">n</span> <span class="pl-s1">n</span> <span class="pl-c1">+=</span> <span class="pl-c1">1</span> <span class="pl-en">print</span>(<span class="pl-s">'This is printed at last'</span>) <span class="pl-k">yield</span> <span class="pl-s1">n</span> |
解释器中的交互式运行如下所示。 在 Python Shell 中运行这些命令以查看输出。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
<span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-c"># It returns an object but does not start execution immediately.</span> <span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-s1">a</span> <span class="pl-c1">=</span> <span class="pl-en">my_gen</span>() <span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-c"># We can iterate through the items using next().</span> <span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-en">next</span>(<span class="pl-s1">a</span>) <span class="pl-v">This</span> <span class="pl-c1">is</span> <span class="pl-s1">printed</span> <span class="pl-s1">first</span> <span class="pl-c1">1</span> <span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-c"># Once the function yields, the function is paused and the control is transferred to the caller.</span> <span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-c"># Local variables and theirs states are remembered between successive calls.</span> <span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-en">next</span>(<span class="pl-s1">a</span>) <span class="pl-v">This</span> <span class="pl-c1">is</span> <span class="pl-s1">printed</span> <span class="pl-s1">second</span> <span class="pl-c1">2</span> <span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-en">next</span>(<span class="pl-s1">a</span>) <span class="pl-v">This</span> <span class="pl-c1">is</span> <span class="pl-s1">printed</span> <span class="pl-s1">at</span> <span class="pl-s1">last</span> <span class="pl-c1">3</span> <span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-c"># Finally, when the function terminates, StopIteration is raised automatically on further calls.</span> <span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-en">next</span>(<span class="pl-s1">a</span>) <span class="pl-v">Traceback</span> (<span class="pl-s1">most</span> <span class="pl-s1">recent</span> <span class="pl-s1">call</span> <span class="pl-s1">last</span>): ... <span class="pl-v">StopIteration</span> <span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-en">next</span>(<span class="pl-s1">a</span>) <span class="pl-v">Traceback</span> (<span class="pl-s1">most</span> <span class="pl-s1">recent</span> <span class="pl-s1">call</span> <span class="pl-s1">last</span>): ... <span class="pl-v">StopIteration</span> |
在上面的示例中要注意的一件有趣的事情是,每次调用之间都会记住变量n
的值。
与普通函数不同,函数屈服时不会破坏局部变量。 此外,生成器对象只能被迭代一次。
要重新启动该过程,我们需要使用a = my_gen()
之类的东西来创建另一个生成器对象。
最后要注意的一点是,我们可以将生成器直接用于for
循环。
这是因为for
循环使用一个迭代器,并使用next()
函数对其进行迭代。 抛出StopIteration
时,它自动结束。 检查这里知道在 Python 中实际上是如何实现for
循环的。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
<span class="pl-c"># A simple generator function</span> <span class="pl-k">def</span> <span class="pl-en">my_gen</span>(): <span class="pl-s1">n</span> <span class="pl-c1">=</span> <span class="pl-c1">1</span> <span class="pl-en">print</span>(<span class="pl-s">'This is printed first'</span>) <span class="pl-c"># Generator function contains yield statements</span> <span class="pl-k">yield</span> <span class="pl-s1">n</span> <span class="pl-s1">n</span> <span class="pl-c1">+=</span> <span class="pl-c1">1</span> <span class="pl-en">print</span>(<span class="pl-s">'This is printed second'</span>) <span class="pl-k">yield</span> <span class="pl-s1">n</span> <span class="pl-s1">n</span> <span class="pl-c1">+=</span> <span class="pl-c1">1</span> <span class="pl-en">print</span>(<span class="pl-s">'This is printed at last'</span>) <span class="pl-k">yield</span> <span class="pl-s1">n</span> <span class="pl-c"># Using for loop</span> <span class="pl-k">for</span> <span class="pl-s1">item</span> <span class="pl-c1">in</span> <span class="pl-en">my_gen</span>(): <span class="pl-en">print</span>(<span class="pl-s1">item</span>) |
运行该程序时,输出为:
1 2 3 4 5 6 |
<span class="pl-v">This</span> <span class="pl-c1">is</span> <span class="pl-s1">printed</span> <span class="pl-s1">first</span> <span class="pl-c1">1</span> <span class="pl-v">This</span> <span class="pl-c1">is</span> <span class="pl-s1">printed</span> <span class="pl-s1">second</span> <span class="pl-c1">2</span> <span class="pl-v">This</span> <span class="pl-c1">is</span> <span class="pl-s1">printed</span> <span class="pl-s1">at</span> <span class="pl-s1">last</span> <span class="pl-c1">3</span> |
带有循环的 Python 生成器
上面的例子用处不大,我们研究它只是为了了解背景中发生的事情。
通常,生成器函数通过具有合适终止条件的循环来实现。
让我们以反转字符串的生成器为例。
1 2 3 4 5 6 7 8 |
<span class="pl-k">def</span> <span class="pl-en">rev_str</span>(<span class="pl-s1">my_str</span>): <span class="pl-s1">length</span> <span class="pl-c1">=</span> <span class="pl-en">len</span>(<span class="pl-s1">my_str</span>) <span class="pl-k">for</span> <span class="pl-s1">i</span> <span class="pl-c1">in</span> <span class="pl-en">range</span>(<span class="pl-s1">length</span> <span class="pl-c1">-</span> <span class="pl-c1">1</span>, <span class="pl-c1">-</span><span class="pl-c1">1</span>, <span class="pl-c1">-</span><span class="pl-c1">1</span>): <span class="pl-k">yield</span> <span class="pl-s1">my_str</span>[<span class="pl-s1">i</span>] <span class="pl-c"># For loop to reverse the string</span> <span class="pl-k">for</span> <span class="pl-s1">char</span> <span class="pl-c1">in</span> <span class="pl-en">rev_str</span>(<span class="pl-s">"hello"</span>): <span class="pl-en">print</span>(<span class="pl-s1">char</span>) |
输出
1 2 3 4 5 |
<span class="pl-s1">o</span> <span class="pl-s1">l</span> <span class="pl-s1">l</span> <span class="pl-s1">e</span> <span class="pl-s1">h</span> |
在此示例中,我们使用range()
函数使用for
循环以相反的顺序获取索引。
注意:此生成器函数不仅适用于字符串,还适用于其他种类的可迭代变量,例如列表,元组等。
Python 生成器表达式
使用生成器表达式可以轻松地动态创建简单的生成器。 它使建造发电机变得容易。
类似于创建匿名函数的 lambda 函数,生成器表达式创建匿名生成器函数。
生成器表达式的语法类似于 Python 中的列表推导式语法。 但是方括号将替换为圆括号。
列表推导式与生成器表达式之间的主要区别在于,列表生成器生成整个列表,而生成器表达式一次生成一个项。
他们懒惰的执行(仅在需要时才生成项目)。 因此,生成器表达式比等效的列表推导式具有更高的内存效率。
1 2 3 4 5 6 7 8 9 10 11 12 |
<span class="pl-c"># Initialize the list</span> <span class="pl-s1">my_list</span> <span class="pl-c1">=</span> [<span class="pl-c1">1</span>, <span class="pl-c1">3</span>, <span class="pl-c1">6</span>, <span class="pl-c1">10</span>] <span class="pl-c"># square each term using list comprehension</span> <span class="pl-s1">list_</span> <span class="pl-c1">=</span> [<span class="pl-s1">x</span><span class="pl-c1">**</span><span class="pl-c1">2</span> <span class="pl-k">for</span> <span class="pl-s1">x</span> <span class="pl-c1">in</span> <span class="pl-s1">my_list</span>] <span class="pl-c"># same thing can be done using a generator expression</span> <span class="pl-c"># generator expressions are surrounded by parenthesis ()</span> <span class="pl-s1">generator</span> <span class="pl-c1">=</span> (<span class="pl-s1">x</span><span class="pl-c1">**</span><span class="pl-c1">2</span> <span class="pl-k">for</span> <span class="pl-s1">x</span> <span class="pl-c1">in</span> <span class="pl-s1">my_list</span>) <span class="pl-en">print</span>(<span class="pl-s1">list_</span>) <span class="pl-en">print</span>(<span class="pl-s1">generator</span>) |
输出:
1 2 |
[<span class="pl-c1">1</span>, <span class="pl-c1">9</span>, <span class="pl-c1">36</span>, <span class="pl-c1">100</span>] <span class="pl-c1"><</span><span class="pl-s1">generator</span> <span class="pl-s1">object</span> <span class="pl-c1"><</span><span class="pl-s1">genexpr</span><span class="pl-c1">></span> <span class="pl-s1">at</span> <span class="pl-c1">0x7f5d4eb4bf50</span><span class="pl-c1">></span> |
上面我们可以看到生成器表达式没有立即产生所需的结果。 相反,它返回一个生成器对象,该对象仅按需生成项目。
这是我们如何开始从生成器获取项目的方法:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
<span class="pl-c"># Initialize the list</span> <span class="pl-s1">my_list</span> <span class="pl-c1">=</span> [<span class="pl-c1">1</span>, <span class="pl-c1">3</span>, <span class="pl-c1">6</span>, <span class="pl-c1">10</span>] <span class="pl-s1">a</span> <span class="pl-c1">=</span> (<span class="pl-s1">x</span><span class="pl-c1">**</span><span class="pl-c1">2</span> <span class="pl-k">for</span> <span class="pl-s1">x</span> <span class="pl-c1">in</span> <span class="pl-s1">my_list</span>) <span class="pl-en">print</span>(<span class="pl-en">next</span>(<span class="pl-s1">a</span>)) <span class="pl-en">print</span>(<span class="pl-en">next</span>(<span class="pl-s1">a</span>)) <span class="pl-en">print</span>(<span class="pl-en">next</span>(<span class="pl-s1">a</span>)) <span class="pl-en">print</span>(<span class="pl-en">next</span>(<span class="pl-s1">a</span>)) <span class="pl-en">next</span>(<span class="pl-s1">a</span>) |
当我们运行上面的程序时,我们得到以下输出:
1 2 3 4 5 6 7 |
<span class="pl-c1">1</span> <span class="pl-c1">9</span> <span class="pl-c1">36</span> <span class="pl-c1">100</span> <span class="pl-v">Traceback</span> (<span class="pl-s1">most</span> <span class="pl-s1">recent</span> <span class="pl-s1">call</span> <span class="pl-s1">last</span>): <span class="pl-v">File</span> <span class="pl-s">"<string>"</span>, <span class="pl-s1">line</span> <span class="pl-c1">15</span>, <span class="pl-s1">in</span> <span class="pl-c1"><</span><span class="pl-s1">module</span><span class="pl-c1">></span> <span class="pl-v">StopIteration</span> |
生成器表达式可以用作函数参数。 以这种方式使用时,可以删除圆括号。
1 2 3 4 5 |
<span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-en">sum</span>(<span class="pl-s1">x</span><span class="pl-c1">**</span><span class="pl-c1">2</span> <span class="pl-k">for</span> <span class="pl-s1">x</span> <span class="pl-c1">in</span> <span class="pl-s1">my_list</span>) <span class="pl-c1">146</span> <span class="pl-c1">>></span><span class="pl-c1">></span> <span class="pl-en">max</span>(<span class="pl-s1">x</span><span class="pl-c1">**</span><span class="pl-c1">2</span> <span class="pl-k">for</span> <span class="pl-s1">x</span> <span class="pl-c1">in</span> <span class="pl-s1">my_list</span>) <span class="pl-c1">100</span> |
使用 Python 生成器
有几个原因使生成器成为强大的实现。
1.易于实现
与生成器类的生成器相比,生成器可以以简洁明了的方式实现。 下面是一个使用迭代器类实现 2 的幂序列的示例。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
<span class="pl-k">class</span> <span class="pl-v">PowTwo</span>: <span class="pl-k">def</span> <span class="pl-en">__init__</span>(<span class="pl-s1">self</span>, <span class="pl-s1">max</span><span class="pl-c1">=</span><span class="pl-c1">0</span>): <span class="pl-s1">self</span>.<span class="pl-s1">n</span> <span class="pl-c1">=</span> <span class="pl-c1">0</span> <span class="pl-s1">self</span>.<span class="pl-s1">max</span> <span class="pl-c1">=</span> <span class="pl-s1">max</span> <span class="pl-k">def</span> <span class="pl-en">__iter__</span>(<span class="pl-s1">self</span>): <span class="pl-k">return</span> <span class="pl-s1">self</span> <span class="pl-k">def</span> <span class="pl-en">__next__</span>(<span class="pl-s1">self</span>): <span class="pl-k">if</span> <span class="pl-s1">self</span>.<span class="pl-s1">n</span> <span class="pl-c1">></span> <span class="pl-s1">self</span>.<span class="pl-s1">max</span>: <span class="pl-k">raise</span> <span class="pl-v">StopIteration</span> <span class="pl-s1">result</span> <span class="pl-c1">=</span> <span class="pl-c1">2</span> <span class="pl-c1">**</span> <span class="pl-s1">self</span>.<span class="pl-s1">n</span> <span class="pl-s1">self</span>.<span class="pl-s1">n</span> <span class="pl-c1">+=</span> <span class="pl-c1">1</span> <span class="pl-k">return</span> <span class="pl-s1">result</span> |
上面的程序冗长而令人困惑。 现在,让我们使用生成器函数执行相同的操作。
1 2 3 4 5 |
<span class="pl-k">def</span> <span class="pl-v">PowTwoGen</span>(<span class="pl-s1">max</span><span class="pl-c1">=</span><span class="pl-c1">0</span>): <span class="pl-s1">n</span> <span class="pl-c1">=</span> <span class="pl-c1">0</span> <span class="pl-k">while</span> <span class="pl-s1">n</span> <span class="pl-c1"><</span> <span class="pl-s1">max</span>: <span class="pl-k">yield</span> <span class="pl-c1">2</span> <span class="pl-c1">**</span> <span class="pl-s1">n</span> <span class="pl-s1">n</span> <span class="pl-c1">+=</span> <span class="pl-c1">1</span> |
由于生成器自动跟踪细节,因此实现简洁明了。
2.内存有效
一个普通的返回序列的函数会在返回结果之前在内存中创建整个序列。 如果序列中的项目数量很大,这是一个过大的杀伤力。
这种序列的生成器实现是内存友好的,因此是首选的,因为它一次只能生成一项。
3.表示无限流
生成器是代表无限数据流的绝佳媒介。 无限流不能存储在内存中,并且由于生成器一次只生成一项,因此它们可以表示无限数据流。
以下生成器函数可以生成所有偶数(至少在理论上)。
1 2 3 4 5 |
<span class="pl-k">def</span> <span class="pl-en">all_even</span>(): <span class="pl-s1">n</span> <span class="pl-c1">=</span> <span class="pl-c1">0</span> <span class="pl-k">while</span> <span class="pl-c1">True</span>: <span class="pl-k">yield</span> <span class="pl-s1">n</span> <span class="pl-s1">n</span> <span class="pl-c1">+=</span> <span class="pl-c1">2</span> |
4.像流水线一样连接生成器
可以使用多个生成器来流水线化一系列操作。 最好用一个例子来说明。
假设我们有一个生成斐波那契数列中数字的生成器。 我们还有另一个用于平方的生成器。
如果我们想找出斐波那契数列中数字的平方和,可以通过以下方式将生成器函数的输出流水线化来实现。
1 2 3 4 5 6 7 8 9 10 11 |
<span class="pl-k">def</span> <span class="pl-en">fibonacci_numbers</span>(<span class="pl-s1">nums</span>): <span class="pl-s1">x</span>, <span class="pl-s1">y</span> <span class="pl-c1">=</span> <span class="pl-c1">0</span>, <span class="pl-c1">1</span> <span class="pl-k">for</span> <span class="pl-s1">_</span> <span class="pl-c1">in</span> <span class="pl-en">range</span>(<span class="pl-s1">nums</span>): <span class="pl-s1">x</span>, <span class="pl-s1">y</span> <span class="pl-c1">=</span> <span class="pl-s1">y</span>, <span class="pl-s1">x</span><span class="pl-c1">+</span><span class="pl-s1">y</span> <span class="pl-k">yield</span> <span class="pl-s1">x</span> <span class="pl-k">def</span> <span class="pl-en">square</span>(<span class="pl-s1">nums</span>): <span class="pl-k">for</span> <span class="pl-s1">num</span> <span class="pl-c1">in</span> <span class="pl-s1">nums</span>: <span class="pl-k">yield</span> <span class="pl-s1">num</span><span class="pl-c1">**</span><span class="pl-c1">2</span> <span class="pl-en">print</span>(<span class="pl-en">sum</span>(<span class="pl-en">square</span>(<span class="pl-en">fibonacci_numbers</span>(<span class="pl-c1">10</span>)))) |
输出:
1 |
<span class="pl-c1">4895</span> |
这种流水线高效且易于阅读(是的,非常酷!)。