Suppose a DataFrame looks like this:
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
A new column "three" can be added like this:
df["three"] = df["one"] * df["two"]
Result:
   one  two  three
a  1.0  1.0    1.0
b  2.0  2.0    4.0
c  3.0  3.0    9.0
d  NaN  4.0    NaN
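But what if a column's values contain a list, or several repeated (nested) lists? I need to create a new column that holds the highest number from each value.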
Example:
   one  two
a  1.0  [12,1]
        [12,1]
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
So I want this:
   one  two        flag
a  1.0  [12,1]     12
        [12,1]
b  2.0  [200,400]  400
c  3.0  3.0        3.0
d  NaN  4.0        4.0
Thanks.

If the values are lists, nested lists, or floats, you can flatten each value and use max:
import pandas as pd
from collections.abc import Iterable

df = pd.DataFrame({"two": [[[12, 1], [12, 1]], [200, 400], 3.0, 4.0]})

# https://stackoverflow.com/a/40857703/2901002
def flatten(items):
    """Yield items from any nested iterable; see Reference."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

# Take the max of the flattened values for lists; keep scalars unchanged
df['new'] = [max(flatten(x)) if isinstance(x, list) else x for x in df['two']]
print(df)
                  two    new
0  [[12, 1], [12, 1]]   12.0
1          [200, 400]  400.0
2                 3.0    3.0
3                 4.0    4.0
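For the frame from the question, the same idea might look like the sketch below. The helper name highest and the NaN fallback for empty lists are my own assumptions, not part of the original answer; the column name flag is taken from the desired output in the question.

```python
import pandas as pd
from collections.abc import Iterable


def flatten(items):
    """Yield scalar items from an arbitrarily nested iterable."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x


def highest(value):
    """Hypothetical helper: largest number in a (nested) list, else the value itself."""
    if isinstance(value, list):
        # Assumption: an empty list should give NaN instead of raising ValueError
        return max(flatten(value), default=float("nan"))
    return value


df = pd.DataFrame(
    {
        "one": [1.0, 2.0, 3.0, float("nan")],
        "two": [[[12, 1], [12, 1]], [200, 400], 3.0, 4.0],
    },
    index=list("abcd"),
)

# Apply the helper element-wise to the mixed list/float column
df["flag"] = df["two"].map(highest)
print(df)
```

Keeping the list check in a named function makes it easier to decide separately what should happen for empty lists or missing values.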
EDIT: To get the max for all columns in a new DataFrame, use the aggregate function max:
df = df_orig.pivot_table(index=['keyword_name','volume'],
                         columns='asin',
                         values='rank',
                         aggfunc=list)
df1 = df_orig.pivot_table(index=['keyword_name','volume'],
                          columns='asin',
                          values='rank',
                          aggfunc='max')
out = pd.concat([df, df1.add_suffix('_max')], axis=1)
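
Since df_orig is not shown here, the following is a minimal sketch with made-up data to illustrate what the two pivots produce. The column names come from the snippet above; the values and the variable names lists and maxes are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data: column names follow the snippet above,
# the values are invented for this illustration.
df_orig = pd.DataFrame(
    {
        "keyword_name": ["kw1", "kw1", "kw1", "kw2"],
        "volume": [100, 100, 100, 50],
        "asin": ["A", "A", "B", "A"],
        "rank": [12, 1, 7, 3],
    }
)

index_cols = ["keyword_name", "volume"]

# One list of ranks per (keyword_name, volume) row and asin column
lists = df_orig.pivot_table(index=index_cols, columns="asin",
                            values="rank", aggfunc=list)

# Highest rank value per cell, computed directly from the long data
maxes = df_orig.pivot_table(index=index_cols, columns="asin",
                            values="rank", aggfunc="max")

# Place the list columns and the suffixed max columns side by side
out = pd.concat([lists, maxes.add_suffix("_max")], axis=1)
print(out)
```

Computing the max from the original long-format data avoids flattening the lists afterwards, because the aggregation happens before the values are collected into lists.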