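Summary

I use a moderately complex regular expression to retrieve data from a website. The problem is that I have to post-process the set of matches.

I have the processing 95+% of the way to where I want it, but I get a simple error message that I cannot reason about, which is odd.

I can work around it, but that is not the point. I am trying to figure out whether this is a bug or whether I am overlooking something about tuple unpacking.

Background

One thing I have to deal with is that I get 4 matches for every "real" match. That means the data for 1 item is spread across 4 matches.

In simple graphical form (slightly oversimplified):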

index |  a    b    c    d    e    f    g    h    i    j 
--------------------------------------------------------
1: | ( ), ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( )
2: | (█), (█), (█), (█), ( ), ( ), ( ), ( ), ( ), ( )
3: | ( ), ( ), ( ), ( ), (█), ( ), ( ), ( ), ( ), ( )
4: | ( ), ( ), ( ), ( ), ( ), ( ), (█), (█), (█), (█)
5: | ( ), ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( )
6: | (▒), (▒), (▒), (▒), ( ), ( ), ( ), ( ), ( ), ( )
7: | ( ), ( ), ( ), ( ), (▒), ( ), ( ), ( ), ( ), ( )
8: | ( ), ( ), ( ), ( ), ( ), ( ), (▒), (▒), (▒), (▒)
9: | ...
...
615: | ...

I can get all the data, but I want to condense it, like this...

index |  a    b    c    d    e    f    g    h    i    j 
--------------------------------------------------------
1: | (█), (█), (█), (█), (█), (█), (█), (█), (█), (█)
2: | (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒), (▒)
3: | ...
...
154: | ...
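The condensing step pictured above can be sketched generically. This is only an illustration, assuming that every group of 4 consecutive rows belongs to one item and that an empty string means "no value"; `merge_groups` and the sample rows are hypothetical names and data, not part of the original code.

```python
def merge_groups(rows, group_size=4):
    """Collapse each run of `group_size` rows into one row.

    For every column, keep the first non-empty value found in the group,
    or "" if the whole column is empty.
    """
    merged = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        merged.append(tuple(
            next((value for value in column if value), "")
            for column in zip(*group)  # zip(*group) iterates column by column
        ))
    return merged


# Hypothetical 4-column sample: two items, each spread over 4 rows.
sample = [
    ("", "", "stable", ""),
    ("A1", "", "", ""),
    ("", "E1", "", ""),
    ("", "", "", "G1"),
    ("", "", "", ""),
    ("A2", "", "", ""),
    ("", "E2", "", ""),
    ("", "", "", "G2"),
]
merged = merge_groups(sample)
print(merged)
```

Unlike the code in the question, this sketch does not care which of the 4 rows carries which column; it only relies on the grouping.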

Code

Works

Note the variables abcd, e, f and ghij, and how I unpack them in the for-loop at the bottom.

matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]
f = [
    f
    for index, (_, _, _, _, _, f, *_)
    in enumerate(matches)
    if index % 4 == 0
]
abcd = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
e = [
    e
    for index, (_, _, _, _, e, *_)
    in enumerate(matches)
    if index % 4 == 2
]
ghij = [
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
]
abcdefghij = zip(abcd, e, f, ghij)
for (a, b, c, d), e, f, (g, h, i, j) in abcdefghij:
    print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)
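As a side note, the `enumerate(...)` plus `index % 4 == k` filter can also be written with extended slicing. The sketch below only demonstrates that the two spellings select the same rows; `rows` is a hypothetical stand-in for `matches`.

```python
# Hypothetical stand-in for `matches`: 12 labelled rows.
rows = [f"row{n}" for n in range(12)]

# Filter by position, as in the question.
by_modulo = [row for index, row in enumerate(rows) if index % 4 == 1]

# Same selection with a slice: start at index 1, take every 4th row.
by_slice = rows[1::4]

print(by_modulo == by_slice)
```

The slice form avoids the unused `index` variable; the modulo form is easier to extend if the filter ever needs a second condition.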

Fails

Note that here I try to unpack the same tuples directly into the variables a, b, c, d, e, f, g, h, i and j.

matches = [('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Beta 1', '3.6', 'Beta', '1', '', '', '', '', '', ''), ('', '', '', '', 'October 10, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.13/android-studio-ide-192.5916306-linux.tar.gz', '3.6.0', '13', '192'), ('', '', '', '', '', 'stable', '', '', '', ''), ('Android Studio 3.5.1', '3.5.1', '', '', '', '', '', '', '', ''), ('', '', '', '', 'October 2, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.5.1.0/android-studio-ide-191.5900203-linux.tar.gz', '3.5.1', '0', '191'), ('', '', '', '', '', '', '', '', '', ''), ('Android Studio 3.6 Canary 12', '3.6', 'Canary', '12', '', '', '', '', '', ''), ('', '', '', '', 'September 18, 2019', '', '', '', '', ''), ('', '', '', '', '', '', 'https://dl.google.com/dl/android/studio/ide-zips/3.6.0.12/android-studio-ide-192.5871855-linux.tar.gz', '3.6.0', '12', '192')]
f = [
    f
    if f == "stable" else "preview"
    for index, (_, _, _, _, _, f, *_)
    in enumerate(matches)
    if index % 4 == 0
]
a, b, c, d = [
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
]
e = [
    e
    for index, (_, _, _, _, e, *_)
    in enumerate(matches)
    if index % 4 == 2
]
g, h, i, j = [
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
]
abcdefghij = zip(a, b, c, d, e, f, g, h, i, j)
for a, b, c, d, e, f, g, h, i, j in abcdefghij:
    print("a", a, "\nb", b, "\nc", c, "\nd", d, "\ne", e, "\nf", f, "\ng", g, "\nh", h, "\ni", i, "\nj", j, "\n", "-" * 100)

With this code, I get the following error message...

... a, b, c, d = [(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1]
ValueError: too many values to unpack (expected 4)
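The error depends only on how many tuples the comprehension yields, not on what is inside them. The sketch below is a hedged illustration: `try_unpack_into_four` is a hypothetical helper, and the row counts are assumptions (154 tuples corresponds to the roughly 615 matches in the table above, 3 tuples to the shortened 12-row sample). The exact wording of the messages may vary between Python versions.

```python
def try_unpack_into_four(items):
    """Try `a, b, c, d = items` and report what happened."""
    try:
        a, b, c, d = items
    except ValueError as error:
        return str(error)
    return "ok"


many = [(n, n, n, n) for n in range(154)]  # like the full data set
few = [(n, n, n, n) for n in range(3)]     # like the 12-row sample
exact = [(n, n, n, n) for n in range(4)]   # unpacks, but into 4 rows, not 4 columns

for label, items in (("many", many), ("few", few), ("exact", exact)):
    print(label, "->", try_unpack_into_four(items))
```

Only a list of exactly 4 tuples unpacks without an exception, and even then each name receives a whole row rather than a column.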

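Expectation

I would have expected both approaches to perform exactly the same logic, with exactly the same end result.

They don't! Why?

@PaulPanzer That seems to work. I will have to verify that everything lines up correctly. But why do I need it?

Assuming q is an iterable, your comprehension produces a list of 26 tuples with 4 items each.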

z = [(a,b,c,d) for i, (a,b,c,d,*e) in enumerate(q)]

In [6]: len(z)
Out[6]: 26
In [7]: len(z[0])
Out[7]: 4
In [17]: z[:3]
Out[17]: [('a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b'), ('c', 'c', 'c', 'c')]

When you try to unpack it, you are trying to stuff 26 items into four names/variables

In [8]: a,b,c,d = z
Traceback (most recent call last):
File "<ipython-input-8-64277b78f273>", line 1, in <module>
a,b,c,d = z
ValueError: too many values to unpack (expected 4)

zip(*list_of_4_item_tuples) transposes list_of_4_item_tuples into 4 tuples of 26 items each

In [9]: a,b,c,d = zip(*z)    # z is the result of the list comprehension shown above
In [11]: len(a),len(b),len(c),len(d)
Out[11]: (26, 26, 26, 26)

Test setup

import string
a = string.ascii_lowercase
b = string.ascii_lowercase
c = string.ascii_lowercase
d = string.ascii_lowercase
e = string.ascii_lowercase
f = string.ascii_lowercase
q = zip (a,b,c,d,e,f)
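One caveat when replaying this setup: `zip` returns a one-shot iterator, so `q` can feed the comprehension only once. A minimal sketch of that behaviour follows (same setup as above, with the hypothetical names `first_pass` and `second_pass`); wrapping the call as `list(zip(...))` would make `q` reusable.

```python
import string

letters = string.ascii_lowercase
q = zip(letters, letters, letters, letters, letters, letters)

# The first pass consumes the iterator and yields 26 four-item tuples.
first_pass = [(a, b, c, d) for i, (a, b, c, d, *e) in enumerate(q)]

# The second pass finds nothing left to iterate over.
second_pass = [(a, b, c, d) for i, (a, b, c, d, *e) in enumerate(q)]

print(len(first_pass), len(second_pass))
```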

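Your list [(a, b, c, d) for index, (a, b, c, d, *_) in enumerate(matches) if index % 4 == 1] does not have 4 elements, which means that trying to unpack it into just four variables fails.

Solution

When a list comprehension creates a list of tuples and you want to unpack those tuples column by column, you need to use zip(*...)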

x, y, z = zip(*list_comprehension)
# To be more clear
x, y, z = zip(*[(i, j, k) for (i, j, k) in tuple_list])

# For my code, this change must be made to this code
a, b, c, d = zip(*[
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
    if index % 4 == 1
])
...
# And to this code
g, h, i, j = zip(*[
    (g, h, i, j)
    for index, (*_, g, h, i, j)
    in enumerate(matches)
    if index % 4 == 3
])
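One edge case worth knowing about: if the filter matches nothing, `zip(*[])` yields no columns at all, and unpacking it into four names raises "not enough values to unpack". The sketch below shows one possible guard; `unzip` and its `width` parameter are hypothetical names introduced here, not something from the original code.

```python
def unzip(rows, width):
    """Transpose a list of equal-length tuples into `width` column tuples.

    Returns `width` empty tuples when `rows` is empty, so the result can
    always be unpacked into `width` names.
    """
    if not rows:
        return tuple(() for _ in range(width))
    return tuple(zip(*rows))


a, b, c, d = unzip([("a1", "b1", "c1", "d1"), ("a2", "b2", "c2", "d2")], 4)
print(a, b, c, d)

# No rows matched: each column is simply empty.
a_empty, b_empty, c_empty, d_empty = unzip([], 4)
print(a_empty, b_empty, c_empty, d_empty)
```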

Why

Let's look at the following code.

matches = [
    ("a1", "b1", "c1", "d1", "e1"),
    ("a2", "b2", "c2", "d2", "e2"),
    ("a3", "b3", "c3", "d3", "e3"),
    ("a4", "b4", "c4", "d4", "e4"),
    ("a5", "b5", "c5", "d5", "e5")
]
# I want a tuple of a's, b's, and c's
abc = [
    (a, b, c)
    for (a, b, c, *_)  # Ignore elements `d` and `e`
    in matches
]
print("abc =", abc)
# abc = [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')]
# NOTE: This is a list of tuples of ones, twos, threes, fours, and fives
#       Not a's, b's, and c's!!
# I want a list of e's
e = [
    e
    for (*_, e)
    in matches
]
print("e =", e)
# e = ['e1', 'e2', 'e3', 'e4', 'e5']
# NOTE: This is a list of e's

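In fact, with abc I get a list of ones, twos, threes, fours and fives, not a's, b's and c's.

Deep dive

The error message ValueError: too many values to unpack appears because the right-hand side holds more items than there are names on the left. (With fewer items, Python raises ValueError: not enough values to unpack instead.)

Remember, you have a list of ones, twos, threes, fours and fives (5 tuples), not a's, b's and c's (3 tuples).

So this will always fail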

a, b, c = [
    (a, b, c)
    for (a, b, c, *_)
    in matches
]
# ERROR
#    Traceback (most recent call last):
#      File "...*.py", line 11, in <module>
#        for (a, b, c, *_) in matches
#    ValueError: too many values to unpack (expected 3)

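You are trying to put these values [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')] into 3 names. You can't! The list holds 5 tuples, so you need 5 names on the left-hand side.

But this will succeed. It will be wrong, but it will not raise an error.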

# This assigns 5 variables, each holding one (a, b, c) tuple taken from the original (a, b, c, d, e) tuples
ones, twos, threes, fours, fives = [
    (a, b, c)
    for (a, b, c, *_) in matches
]
print("ones =", ones)
print("twos =", twos)
print("threes =", threes)
print("fours =", fours)
print("fives =", fives)
# Output
# ones = ('a1', 'b1', 'c1')
# twos = ('a2', 'b2', 'c2')
# threes = ('a3', 'b3', 'c3')
# fours = ('a4', 'b4', 'c4')
# fives = ('a5', 'b5', 'c5')

We want something like ('a1', 'a2', 'a3', 'a4', 'a5'), not ('a1', 'b1', 'c1')

If the list held 20 tuples, you would need ... sixes, sevens, .... , nineteens, twenties = [ ... ]
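To make the counting argument concrete: direct unpacking needs one name per row, while the transposed form needs one name per column, however many rows there are. A small sketch with 20 hypothetical rows:

```python
# 20 hypothetical rows, each with 3 columns.
rows = [(f"a{n}", f"b{n}", f"c{n}") for n in range(1, 21)]

# Direct unpacking would need 20 names, one per row.
# Transposing first needs only 3 names, one per column.
a, b, c = zip(*rows)

print(len(rows), "rows ->", len(a), "items in each of 3 columns")
print(a[:3], c[-1])
```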

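First attempt

Well, we want all the first elements of every tuple grouped together, and the same for the 2nd and 3rd. So zip(...) seems like a good candidate. Let's look at the result.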

result = list(zip(abc))
print(result)
# list(zip(abc)) = [(('a1', 'b1', 'c1'),), (('a2', 'b2', 'c2'),), (('a3', 'b3', 'c3'),), (('a4', 'b4', 'c4'),), (('a5', 'b5', 'c5'),)]
# Let's look at what one element looks like
print(result[0])
# result[0] = (('a1', 'b1', 'c1'),)

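This is wrong!

As you can see, there are a couple of problems.

A strange tuple structure! Tuples inside tuples. That is what you get when you zip a list of tuples as a single argument.
The wrong elements in each tuple! We got the ones, the twos and so on, not a tuple of a's.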
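The nesting has a simple explanation: `zip` pairs up the items of each iterable it receives, and here it receives only one iterable, so every "pairing" is a 1-tuple wrapping a whole row. A minimal sketch using a shortened, hypothetical version of `abc`:

```python
abc = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3")]

# zip with a single argument wraps each item in a 1-tuple.
zipped_once = list(zip(abc))

# The same thing written out by hand.
wrapped_by_hand = [(row,) for row in abc]

print(zipped_once == wrapped_by_hand)
print(zipped_once[0])
```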

Second attempt

OK, zip does not work on a list of tuples (as-is). We have to do something to the list of tuples first.

Let's look at this...

abc = [(a, b, c) for (a, b, c, *_) in matches]
print(abc)
# abc = [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2'), ('a3', 'b3', 'c3'), ('a4', 'b4', 'c4'), ('a5', 'b5', 'c5')]
# Again, we cannot zip these
print(*abc)
# *abc = ('a1', 'b1', 'c1') ('a2', 'b2', 'c2') ('a3', 'b3', 'c3') ('a4', 'b4', 'c4') ('a5', 'b5', 'c5')
# Wait, here we have a sequence of tuples. Not a list of tuples. Just tuple after tuple after tuple.
# What happens when we zip this "sequence" of tuples?
print(list(zip(*abc)))
# list(zip(*abc)) = [('a1', 'a2', 'a3', 'a4', 'a5'), ('b1', 'b2', 'b3', 'b4', 'b5'), ('c1', 'c2', 'c3', 'c4', 'c5')]
# Great, so let's try this
a, b, c = zip(*abc)

This is what we want!!
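The star simply spreads the list into separate positional arguments, so `zip(*abc)` behaves like a `zip` call with each row written out by hand. A short sketch with a hypothetical 3-row `abc`:

```python
abc = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3")]

# zip(*abc) passes each row as its own argument...
starred = list(zip(*abc))

# ...which is the same as spelling the rows out explicitly.
explicit = list(zip(abc[0], abc[1], abc[2]))

print(starred == explicit)
print(starred)
```

Applying `zip(*...)` to the result transposes it back into the original rows.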

Therefore

Because of that, we can do the following.

a, b, c = zip(*abc)  # abc is the list of (a, b, c) tuples built above
print("a =", a)
print("b =", b)
print("c =", c)
# Output
# a = ('a1', 'a2', 'a3', 'a4', 'a5')
# b = ('b1', 'b2', 'b3', 'b4', 'b5')
# c = ('c1', 'c2', 'c3', 'c4', 'c5')

This means we can do this...

a, b, c, d = zip(*[
    (a, b, c, d)
    for index, (a, b, c, d, *_)
    in enumerate(matches)
])
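Putting the pieces together, here is a hedged end-to-end sketch of the corrected "Fails" version. The 8-row `matches` below is made-up placeholder data in the same 10-column layout, and the slices `matches[k::4]` stand in for the `index % 4 == k` filters (they pick rows k, k+4, k+8, ...).

```python
# Hypothetical data: 2 items, each spread over 4 rows of 10 columns.
matches = [
    ("", "", "", "", "", "stable", "", "", "", ""),
    ("A1", "B1", "C1", "D1", "", "", "", "", "", ""),
    ("", "", "", "", "E1", "", "", "", "", ""),
    ("", "", "", "", "", "", "G1", "H1", "I1", "J1"),
    ("", "", "", "", "", "", "", "", "", ""),
    ("A2", "B2", "C2", "D2", "", "", "", "", "", ""),
    ("", "", "", "", "E2", "", "", "", "", ""),
    ("", "", "", "", "", "", "G2", "H2", "I2", "J2"),
]

# Single columns stay plain lists.
f = [row[5] if row[5] == "stable" else "preview" for row in matches[0::4]]
e = [row[4] for row in matches[2::4]]

# Multi-column groups are transposed with zip(*...) before unpacking.
a, b, c, d = zip(*[row[:4] for row in matches[1::4]])
g, h, i, j = zip(*[row[6:] for row in matches[3::4]])

# Every name now holds one column, so zip can rebuild one record per item.
records = list(zip(a, b, c, d, e, f, g, h, i, j))
for record in records:
    print(record)
```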

Tuple unpacking fails with a list comprehension but works with a for-loop
