Append two new columns to a dataframe using values from existing columns, based on a condition

Say I have a dataframe that looks like this:

df.head()
ID  col1    col2    col3    col4    type
1   146       91      Rp    Rp-203  ex
1   146      314      Rp    Rp-203  trans
1   603       91      Rp    Rp-203  CDS
1   910       81      Rp    Rp-203  CDS
1   910       81      Rp    Rp-203  ex
1   202      825      Rp    Rp-203  CDS
1   202      837      Rp    Rp-203  ex
1   200      314      Rp    Rp-203  ex
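
For anyone who wants to run the snippets in this thread, the sample can be rebuilt roughly as follows. This is only a sketch: the dtypes (integers for ID, col1 and col2, strings for the rest) are an assumption based on the printed output.

```python
import pandas as pd

# Rebuild the sample frame shown above so the later snippets can be run as-is.
# The dtypes are assumed from the printed output.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 1, 1, 1, 1],
    "col1": [146, 146, 603, 910, 910, 202, 202, 200],
    "col2": [91, 314, 91, 81, 81, 825, 837, 314],
    "col3": ["Rp"] * 8,
    "col4": ["Rp-203"] * 8,
    "type": ["ex", "trans", "CDS", "CDS", "ex", "CDS", "ex", "ex"],
})
print(df)
```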

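From the dataframe above, I want to generate a new dataframe that is based on the rows where the type column equals ex. In addition, the new dataframe should have two extra columns whose values are comma-separated and built from col1 and col2.

I want to use the values in col1 and col2 to generate two columns, col5 and col6 (named ex_start and ex_end below), for each value in the col4 and type columns. That is, I want to group the values by column col3 for the rows where type is ex.

In the end, I am aiming for a dataframe like this: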

ID  col1    col2    col3    col4    ex_start    ex_end
1   146     314     Rp      Rp-203  091,081,837 910,202,200

I tried the following solution:

df2 = df.loc[df['type']=='ex', ['col3','col1', 'col2',]].groupby(['col3']).agg(
lambda x: ','.join([str(y) for y in x]))

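However, my solution takes the first value of col1 as the first value of ex_start. What I need instead is the col2 value as the first value in the ex_start column of df2, the col1 value as the first value in the ex_end column of df2, and so on. In addition, the col1 and col2 columns of df2 should take their values from the col1 and col2 columns of df in the row where type equals trans.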

Any help or suggestions are greatly appreciated!

Use:

# keep only the rows where type is ex
df3 = df[df['type'] == 'ex'].copy()
# shift col2 down by one row within each group
df3['s'] = df3.groupby(['ID', 'col3', 'col4'])['col2'].shift()
# drop the rows where the shifted value is NaN, then convert the values to int and to strings
df3 = df3.dropna(subset=['s']).assign(ex_start=lambda x: x['s'].astype(int).astype(str),
                                      ex_end=lambda x: x['col1'].astype(str))
print(df3)
   ID  col1  col2 col3    col4 type      s ex_start ex_end
4   1   910    81   Rp  Rp-203   ex   91.0       91    910
6   1   202   837   Rp  Rp-203   ex   81.0       81    202
7   1   200   314   Rp  Rp-203   ex  837.0      837    200
# then aggregate with join (recent pandas needs a list to select several columns after groupby)
df4 = df3.groupby(['ID', 'col3', 'col4'])[['ex_start', 'ex_end']].agg(','.join).reset_index()
print(df4)
   ID col3    col4   ex_start       ex_end
0   1   Rp  Rp-203  91,81,837  910,202,200
# take the first trans row of each group
df5 = df[df['type'] == 'trans'].drop_duplicates(['ID', 'col3', 'col4']).drop(columns='type')
print(df5)
   ID  col1  col2 col3    col4
1   1   146   314   Rp  Rp-203
# and merge df4 into df5
df = df5.merge(df4)
print(df)
   ID  col1  col2 col3    col4   ex_start       ex_end
0   1   146   314   Rp  Rp-203  91,81,837  910,202,200
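
If the same steps are needed more than once, they could be wrapped in a small function. The sketch below is not part of the original answer: the name summarize_ex and the keys parameter are made up for illustration, and it assumes col2 holds integers and that each group has at least one trans row.

```python
import pandas as pd

def summarize_ex(df, keys=("ID", "col3", "col4")):
    """Hypothetical helper: one row per group with comma-joined ex_start/ex_end."""
    keys = list(keys)
    ex = df[df["type"] == "ex"].copy()
    # ex_start is the previous ex row's col2 within the group; ex_end is this row's col1.
    ex["ex_start"] = ex.groupby(keys)["col2"].shift()
    # The first ex row of each group has no previous row, so it is dropped.
    ex = ex.dropna(subset=["ex_start"])
    ex["ex_start"] = ex["ex_start"].astype(int).astype(str)
    ex["ex_end"] = ex["col1"].astype(str)
    joined = ex.groupby(keys)[["ex_start", "ex_end"]].agg(",".join).reset_index()
    # col1/col2 of the result come from the first trans row of each group.
    trans = df[df["type"] == "trans"].drop_duplicates(keys).drop(columns="type")
    return trans.merge(joined, on=keys)
```

Calling summarize_ex(df) on the sample frame from the question should give the same single row as the step-by-step version. Groups without a trans row are silently dropped by the inner merge.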

Here is my approach, using groupby and a function to apply to each group:

def join(group):
    # col1/col2 values of the group as a 2-D array
    ex = group[["col1", "col2"]].copy().values
    # copy the first row so that it can be modified safely
    row = group.iloc[0].copy()
    row[["col1", "col2"]] = (ex[0,0], ex[-1,1])
    row["ex_start"] = ",".join(ex[1:,0].astype(str))
    row["ex_end"] = ",".join(ex[:-1,1].astype(str))
    return row

df.groupby("type").apply(join)

To get only the ex row, you can do:

df.groupby("type").apply(join).loc[["ex"]]

Output (for all types, before selecting ex):

       ID  col1  col2 col3    col4   type     ex_start     ex_end
type
CDS     1   603   825   Rp  Rp-203    CDS      910,202      91,81
ex      1   146   314   Rp  Rp-203     ex  910,202,200  91,81,837
trans   1   146   314   Rp  Rp-203  trans
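
Note that in the output above ex_start holds the col1 values and ex_end the col2 values, which is the reverse of the layout asked for in the question. A possible variant that swaps them is sketched below. The names join_ex and summarize_ex_rows are made up for illustration. Like join, it takes col1 from the first ex row and col2 from the last one, which matches the trans row in this sample but is not guaranteed in general.

```python
import pandas as pd

def join_ex(group):
    """Hypothetical variant of join() with ex_start/ex_end laid out as in the question."""
    ex = group[["col1", "col2"]].to_numpy()
    row = group.iloc[0].drop("type").copy()
    # col1 from the first row and col2 from the last row of the group.
    row["col1"], row["col2"] = ex[0, 0], ex[-1, 1]
    # ex_start: col2 of every row but the last; ex_end: col1 of every row but the first.
    row["ex_start"] = ",".join(ex[:-1, 1].astype(str))
    row["ex_end"] = ",".join(ex[1:, 0].astype(str))
    return row

def summarize_ex_rows(df):
    """Apply join_ex to the ex rows only, giving one output row per col4 value."""
    ex_rows = df[df["type"] == "ex"]
    return pd.DataFrame([join_ex(g) for _, g in ex_rows.groupby("col4")])
```

Iterating over the groups explicitly, rather than calling groupby(...).apply, is a deliberate choice here: it keeps the grouping column available inside the function regardless of the pandas version.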
