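I am trying to loop over two columns of a pandas DataFrame to determine a specific value, then add the result to a new column. The code below raises the following error:

raise ValueError('Length of values does not match length of ' 'index')

I don't understand why.

DataFrame: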

TeamID    todayorno
1   sw        True
2   pr        False
3   sw        False
4   pr        True

Code:

team = []
for row in results['TeamID']:
    if row == "sw":
        for r in results['todayorno']:
            if r == True:
                team.append('red')
            else:
                team.append('green')
    else:
        team.append('green')
results['newnew'] = team

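You are iterating over the DataFrame twice, as the two nested for loops show. For every "sw" row, the inner loop appends one item per row of the whole todayorno column, so you end up with 10 items instead of the 4 you need.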
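To see where the 10 items come from, the loop can be run on its own and the appended items counted. This is a minimal sketch: the sample frame is rebuilt by hand, and how the original `results` was created is an assumption.

```python
import pandas as pd

# Sample data rebuilt by hand; the constructor call is an assumption,
# not something shown in the question.
results = pd.DataFrame(
    {"TeamID": ["sw", "pr", "sw", "pr"], "todayorno": [True, False, False, True]},
    index=[1, 2, 3, 4],
)

team = []
for row in results["TeamID"]:
    if row == "sw":
        # The inner loop walks the entire todayorno column again,
        # so each "sw" row appends 4 items instead of 1.
        for r in results["todayorno"]:
            team.append("red" if r else "green")
    else:
        team.append("green")

# 2 "sw" rows * 4 items + 2 "pr" rows * 1 item = 10 items for 4 rows
print(len(team), len(results))
```

Assigning this list as a column fails because its length no longer matches the number of rows.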

No explicit iteration is needed. You can use numpy.select to assign values according to a list of conditions.

import numpy as np
mask = results['TeamID'] == 'sw'
conditions = [~mask, mask & results['todayorno'], mask & ~results['todayorno']]
values = ['green', 'red', 'green']
results['newnew'] = np.select(conditions, values, 'green')
print(results)
TeamID  todayorno newnew
1     sw       True    red
2     pr      False  green
3     sw      False  green
4     pr       True  green
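Since only one combination of values yields 'red', the three conditions can probably be collapsed into a single mask. The following is a sketch of that variant using numpy.where rather than numpy.select; the sample frame is rebuilt by hand and the name `is_red` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample data (an assumption about the original frame).
results = pd.DataFrame(
    {"TeamID": ["sw", "pr", "sw", "pr"], "todayorno": [True, False, False, True]},
    index=[1, 2, 3, 4],
)

# True only where TeamID is "sw" AND todayorno is True.
is_red = (results["TeamID"] == "sw") & results["todayorno"]

# Pick "red" where the mask is True, "green" everywhere else.
results["newnew"] = np.where(is_red, "red", "green")
print(results)
```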

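Quick answer

Don't try to loop.

Instead, create the new column with a default value (i.e. the most common one), then locate the rows you want to change and set them: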

>>> results
TeamID  todayorno
0     sw       True
1     pr      False
2     sw      False
3     pr       True
>>> results['newnew'] = 'green'
>>> results
TeamID  todayorno newnew
0     sw       True  green
1     pr      False  green
2     sw      False  green
3     pr       True  green
>>> results.loc[(results['TeamID'] == 'sw') & (results['todayorno']), 'newnew'] = 'red'
>>> results
TeamID  todayorno newnew
0     sw       True    red
1     pr      False  green
2     sw      False  green
3     pr       True  green

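Alternatively, you can use .apply(..., axis=1) with a function that looks at each row to compute the whole Series, then assign that Series as a column in one go: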

>>> results
TeamID  todayorno
0     sw       True
1     pr      False
2     sw      False
3     pr       True
>>> results['newnew'] = results.apply(
...     lambda s: 'red' if s['TeamID'] == 'sw' and s['todayorno'] else 'green',
...     axis=1,
... )
>>> results
TeamID  todayorno newnew
0     sw       True    red
1     pr      False  green
2     sw      False  green
3     pr       True  green

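Explanation

The problem

As far as I can tell from your code, you are trying to add a column named newnew to the DataFrame.

In rows where the TeamID column contains "sw" and the todayorno column contains True, you want newnew to contain "red".

In all other rows, you want newnew to contain "green".

One rule

To work effectively with pandas, one very important rule is: don't try to loop. Especially over rows.

Instead, let pandas do the work for you.

So the first step is to create the new column. Since you want the value "green" in most cases, you can simply do: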

results['newnew'] = 'green'

The DataFrame now looks like this:

TeamID  todayorno newnew
0     sw       True  green
1     pr      False  green
2     sw      False  green
3     pr       True  green

You will notice that pandas "broadcast" the single value you provided across all rows.
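The difference between a scalar and a list is what triggers the error in the question. Here is a small sketch contrasting the two; the frame is rebuilt by hand and the column name `too_long` is made up for the illustration.

```python
import pandas as pd

# Hand-built copy of the sample data (an assumption about the original frame).
results = pd.DataFrame(
    {"TeamID": ["sw", "pr", "sw", "pr"], "todayorno": [True, False, False, True]}
)

# A scalar is repeated for every row.
results["newnew"] = "green"

# A list is not repeated or trimmed: its length must equal the number of rows.
error_message = None
try:
    results["too_long"] = ["green"] * 10  # 10 values for 4 rows
except ValueError as exc:
    error_message = str(exc)

print(results)
print(error_message)
```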

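Now, to set the sw/True rows to "red", you first need to find them. For that, we need to understand how pandas addressing works.

(A bit of) how pandas addressing works

When you use square brackets after a pandas DataFrame, you are usually addressing the DataFrame's columns. For example: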

>>> results['TeamID']
0    sw
1    pr
2    sw
3    pr
Name: TeamID, dtype: object

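That is, by asking the results DataFrame for the key TeamID, you got back a Series named TeamID that contains only that column's values.

If, on the other hand, you want to address rows, you need to use the .loc attribute.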

>>> results.loc[1]
TeamID          pr
todayorno    False
newnew       green
Name: 1, dtype: object

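Here, we got back a Series containing the values of that row.

If we want to see several rows, we can get a sub-DataFrame by indexing with a list of row labels: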

>>> results.loc[[1,2]]
TeamID  todayorno newnew
1     pr      False  green
2     sw      False  green

Or by using a condition:

>>> results.loc[results['TeamID'] == 'pr']
TeamID  todayorno newnew
1     pr      False  green
3     pr       True  green

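Conditions can contain boolean combinations, but the syntax has special requirements: use & instead of and, and, because & binds more tightly than comparison operators such as ==, carefully wrap each part of the condition in parentheses: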

>>> results.loc[(results['TeamID'] == 'sw') & (results['todayorno'])]
TeamID  todayorno newnew
0     sw       True  green
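The reason `and` cannot be used here is that it asks each operand for a single truth value, which a whole Series does not have. Below is a minimal sketch of the difference; the frame is rebuilt by hand and the names `is_sw`, `is_today` and `and_error` are introduced for illustration.

```python
import pandas as pd

# Hand-built copy of the sample data (an assumption about the original frame).
results = pd.DataFrame(
    {"TeamID": ["sw", "pr", "sw", "pr"], "todayorno": [True, False, False, True]}
)

is_sw = results["TeamID"] == "sw"
is_today = results["todayorno"]

and_error = None
try:
    # `and` needs bool(is_sw), which pandas refuses to compute for a Series.
    is_sw and is_today
except ValueError as exc:
    and_error = str(exc)

# `&` combines the two boolean Series element by element.
mask = is_sw & is_today

print(and_error)
print(mask.tolist())
```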

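The .loc attribute can also address rows and columns at the same time. A comma separates the two parts, with the row addressing first and the column addressing last: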

>>> results.loc[results['TeamID'] == 'pr', 'todayorno']
1    False
3     True
Name: todayorno, dtype: bool

Finishing touches

The .loc attribute can also be used for assignment, by assigning the desired value to the desired "coordinates".

So in your case:

>>> results.loc[
...     (results['TeamID'] == 'sw') & (results['todayorno']),
...     'newnew'
... ] = "red"
>>> results
TeamID  todayorno newnew
0     sw       True    red
1     pr      False  green
2     sw      False  green
3     pr       True  green

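Another solution

A DataFrame's .apply() method applies a single function repeatedly, either column by column or row by row. To apply it row by row, pass the axis=1 argument.

If the function passed to .apply(..., axis=1) returns a single value, the results of all the calls are combined into a Series that has the same addressing as the DataFrame's rows (in pandas parlance, the same index).

So: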

>>> results.apply(
...     lambda s: 'red' if s['TeamID'] == 'sw' and s['todayorno'] else 'green',
...     axis=1,
... )
0      red
1    green
2    green
3    green
dtype: object

This can then be assigned as a column of the DataFrame:

>>> results['newnew'] = results.apply(
...     lambda s: 'red' if s['TeamID'] == 'sw' and s['todayorno'] else 'green',
...     axis=1,
... )
>>> results
TeamID  todayorno newnew
0     sw       True    red
1     pr      False  green
2     sw      False  green
3     pr       True  green
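As a sanity check, the three approaches discussed on this page (numpy.select, .loc assignment, and .apply) can be run side by side on the sample data. This is only a sketch: the frame is rebuilt by hand, and the variable names are chosen for the comparison rather than taken from the answers.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample data (an assumption about the original frame).
results = pd.DataFrame(
    {"TeamID": ["sw", "pr", "sw", "pr"], "todayorno": [True, False, False, True]}
)

is_red = (results["TeamID"] == "sw") & results["todayorno"]

# 1. numpy.select with a single condition and a default value
via_select = np.select([is_red.to_numpy()], ["red"], default="green")

# 2. default value first, then overwrite the matching rows with .loc
via_loc = pd.Series("green", index=results.index)
via_loc.loc[is_red] = "red"

# 3. row-wise function with .apply(..., axis=1)
via_apply = results.apply(
    lambda s: "red" if s["TeamID"] == "sw" and s["todayorno"] else "green",
    axis=1,
)

# Compare as plain lists to sidestep dtype differences between the results.
all_agree = list(via_select) == via_loc.tolist() == via_apply.tolist()
print(all_agree)
```

On larger frames the mask-based variants avoid calling a Python function once per row, which is the main practical reason to prefer them over .apply.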
