Pandas: generate a DataFrame column whose values depend on another column of the DataFrame

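I am trying to generate a pandas DataFrame in which the values of one column are based on the values of a column in another DataFrame. Here is an example: I want to generate a new DataFrame from one column of the DataFrame df_.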

ipdb> df_ = pd.DataFrame({'c1':[False, True, False, True]})
ipdb> df_
      c1
0  False
1   True
2  False
3   True

Using df_, I want to generate another DataFrame df1 with the following columns.

ipdb> df1
   col1  col2
0     0   NaN
1     1     0
2     2   NaN
3     3     1

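Here, "col1" holds the ordinary index values, while "col2" is NaN in the rows where "c1" in df_ is False and holds sequentially increasing values in the rows where "c1" is True.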

To generate this DataFrame, here is what I have tried.

ipdb> df_[df_['c1']==True].reset_index().reset_index()
   level_0  index    c1
0        0      1  True
1        1      3  True

However, I feel there should be a better way to generate a DataFrame with the two columns shown in df1.

I think you need cumsum, then subtract 1 so that the count starts from 0:

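import pandas as pd

df_ = pd.DataFrame({'c1':[False, True, False, True]})
# running count over the True rows only; rows where c1 is False are left as NaN
df_['col2'] = df_.loc[df_['c1'], 'c1'].cumsum().sub(1)
print (df_)
      c1  col2
0  False   NaN
1   True   0.0
2  False   NaN
3   True   1.0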
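To get the exact two-column df1 from the question, the same cumsum idea can be applied to the whole column and then masked. The following is only a sketch: the use of where to blank out the False rows and the cast to the nullable 'Int64' dtype (so the counter prints as 0 and 1 rather than 0.0 and 1.0) are my additions, not part of the answer above.

```python
import pandas as pd

df_ = pd.DataFrame({'c1': [False, True, False, True]})

# Running count of True values, shifted so the first True gets 0.
counter = df_['c1'].cumsum().sub(1)

df1 = pd.DataFrame(
    {
        # col1 mirrors the index of df_
        'col1': df_.index.to_numpy(),
        # keep the counter only where c1 is True; other rows become missing
        'col2': counter.where(df_['c1']).astype('Int64'),
    },
    index=df_.index,
)
print(df1)
```

If a float column with NaN is acceptable, the astype('Int64') call can simply be dropped.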

Another solution is to count the True values with sum, build the sequence with numpy.arange, and assign it back to the filtered DataFrame:

import numpy as np

df_.loc[df_['c1'],'col2']= np.arange(df_['c1'].sum())
print (df_)
      c1  col2
0  False   NaN
1   True   0.0
2  False   NaN
3   True   1.0

Details:

print (df_['c1'].sum())
2
print (np.arange(df_['c1'].sum()))
[0 1]
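As a quick sanity check, the cumsum approach and the arange approach can be compared on a frame whose index is not the default 0..n-1. This is a sketch under my own assumptions: the helper names with_cumsum and with_arange and the letter index are made up for illustration, and col2 is initialised to NaN explicitly before the masked assignment.

```python
import numpy as np
import pandas as pd


def with_cumsum(df):
    """Counter built from a cumulative sum over the True rows."""
    out = df.copy()
    out['col2'] = out.loc[out['c1'], 'c1'].cumsum().sub(1)
    return out


def with_arange(df):
    """Counter built from np.arange and assigned to the True rows."""
    out = df.copy()
    out['col2'] = np.nan  # start with an all-NaN float column
    out.loc[out['c1'], 'col2'] = np.arange(out['c1'].sum())
    return out


# Hypothetical frame with a non-default index
df_ = pd.DataFrame({'c1': [False, True, False, True]}, index=list('abcd'))

a = with_cumsum(df_)
b = with_arange(df_)
# Raises AssertionError if the two results differ
pd.testing.assert_frame_equal(a, b)
print(a)
```

Both helpers work on a copy, so the original df_ is left unchanged.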

Another way to solve this:

df_.loc[df_['c1'],'col2']=range(len(df_[df_['c1']]))

Output:

      c1  col2
0  False   NaN
1   True   0.0
2  False   NaN
3   True   1.0
