Conditionally set values greater than 0 to 1

I have a dataframe that looks like this, with many more date columns:

AUTHOR        2022-07-01  2022-10-14      2022-10-15 .....
0            Kathrine          0.0         7.0              0.0
1            Catherine         0.0         13.0             17.0
2            Amanda Jane       0.0         0.0              0.0
3            Jaqueline         0.0         3.0              0.0
4            Christine         0.0         0.0              0.0

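I want to set the value to 1 in every column after AUTHOR wherever the value is greater than 0, so that the resulting table looks like this: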

AUTHOR        2022-07-01  2022-10-14      2022-10-15 .....
0            Kathrine          0.0         1.0              0.0
1            Catherine         0.0         1.0              1.0
2            Amanda Jane       0.0         0.0              0.0
3            Jaqueline         0.0         1.0              0.0
4            Christine         0.0         0.0              0.0

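I tried the line of code below, but it raised an error, which makes sense: I need to figure out how to apply this code to the date columns only while keeping the AUTHOR column in the table.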

Counts[Counts != 0] = 1

TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

You can select the date columns first, then apply the mask to those columns only:

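# Pick one of the following ways to select the date columns:
cols = df.drop(columns='AUTHOR').columns              # everything except AUTHOR
# or
cols = df.filter(regex=r'\d{4}-\d{2}-\d{2}').columns  # column names that look like YYYY-MM-DD
# or
cols = df.select_dtypes(include='number').columns     # all numeric columns

# Replace every value greater than 0 with 1, in those columns only
df[cols] = df[cols].mask(df[cols] > 0, 1)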

print(df)
AUTHOR  2022-07-01  2022-10-14  2022-10-15
0     Kathrine         0.0         1.0         0.0
1    Catherine         0.0         1.0         1.0
2  Amanda Jane         0.0         0.0         0.0
3    Jaqueline         0.0         1.0         0.0
4    Christine         0.0         0.0         0.0
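
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question (the real table has more date columns), and it assumes the date columns are the only numeric ones, so `select_dtypes` is enough to find them.

```python
import pandas as pd

# Sample data copied from the question; the real table has more date columns.
df = pd.DataFrame({
    "AUTHOR": ["Kathrine", "Catherine", "Amanda Jane", "Jaqueline", "Christine"],
    "2022-07-01": [0.0, 0.0, 0.0, 0.0, 0.0],
    "2022-10-14": [7.0, 13.0, 0.0, 3.0, 0.0],
    "2022-10-15": [0.0, 17.0, 0.0, 0.0, 0.0],
})

# Assumption: every numeric column is a date column, so AUTHOR is skipped.
cols = df.select_dtypes(include="number").columns

# mask() replaces values where the condition is True and keeps the rest.
df[cols] = df[cols].mask(df[cols] > 0, 1)

print(df)
```

Because the mask is computed and assigned on the numeric columns only, the string column never takes part in the boolean setting that caused the TypeError.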

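Since you only want to exclude the first column, you can set it as the index first, then apply the boolean mask, and finally reset the index.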

# The chain returns a new DataFrame, so assign the result back
df = df.set_index('AUTHOR').pipe(lambda g: g.mask(g > 0, 1)).reset_index()
print(df)
AUTHOR  2022-10-14  2022-10-15
0  Kathrine         0.0         1.0
1  Cathrine         1.0         1.0
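
A runnable sketch of the index-based approach, using the sample data from the question rather than the shorter table above. The name `result` is only illustrative. Note that the chain returns a new DataFrame and leaves `df` untouched, so the result has to be assigned to something.

```python
import pandas as pd

# Sample data copied from the question; the real table has more date columns.
df = pd.DataFrame({
    "AUTHOR": ["Kathrine", "Catherine", "Amanda Jane", "Jaqueline", "Christine"],
    "2022-07-01": [0.0, 0.0, 0.0, 0.0, 0.0],
    "2022-10-14": [7.0, 13.0, 0.0, 3.0, 0.0],
    "2022-10-15": [0.0, 17.0, 0.0, 0.0, 0.0],
})

result = (
    df.set_index("AUTHOR")                  # move the string column out of the way
      .pipe(lambda g: g.mask(g > 0, 1))     # every remaining column is numeric
      .reset_index()                        # bring AUTHOR back as the first column
)

print(result)
```

This variant assumes AUTHOR is the only non-numeric column. If there are several, selecting the date columns explicitly is probably the safer choice.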
