Creating new rows with an if condition - Python/Pandas



I am trying to "split/duplicate" a row if a condition is true (e.g., a cell contains a value)...

For example, I have this table:

import pandas as pd

d = {'Invited_guest': ["Max", "Luca", "John", "Brian", "Ian"], 'Age': [19, 21, 32, 45, 34], 'Origin': ['US', 'UK', 'GER', 'ITA', 'FRA'], 'FamilyMember_1': ["Paul", "Anna", "Peter", "Lewis", "Jeremy"], 'FamilyMember_2': ['Rene', 'Ruben', 'Calvin', 'George', 'Silke'], 'FamilyMember_3': ['', 'Olivia', '', '', 'Selina']}
df = pd.DataFrame(data=d)
df
  Invited_guest  Age Origin FamilyMember_1 FamilyMember_2 FamilyMember_3
0           Max   19     US           Paul           Rene               
1          Luca   21     UK           Anna          Ruben         Olivia
2          John   32    GER          Peter         Calvin               
3         Brian   45    ITA          Lewis         George               
4           Ian   34    FRA         Jeremy          Silke         Selina

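You can use a combination of melt, groupby, and agg(list) to turn each row's FamilyMember names into a list (using pipe to drop the empty names), then assign the result back to the dataframe and explode that column: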

exploded = df.assign(names=df.filter(like='FamilyMember_').T.melt().pipe(lambda x: x[x['value'] != '']).groupby('variable')['value'].agg(list)).explode('names').drop(df.filter(like='FamilyMember_'), axis=1).reset_index(drop=True)

Output:

>>> exploded
   Invited_guest  Age Origin   names
0            Max   19     US    Paul
1            Max   19     US    Rene
2           Luca   21     UK    Anna
3           Luca   21     UK   Ruben
4           Luca   21     UK  Olivia
5           John   32    GER   Peter
6           John   32    GER  Calvin
7          Brian   45    ITA   Lewis
8          Brian   45    ITA  George
9            Ian   34    FRA  Jeremy
10           Ian   34    FRA   Silke
11           Ian   34    FRA  Selina
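
For readers who find the one-liner hard to follow, here is a minimal sketch of the same chain split over several lines. The variable names family_cols and names_per_row are my own, and drop(columns=...) is used as an equivalent spelling of drop(..., axis=1); the data is the sample table from the question, with the guest spelled "Brian" as in the output above.

```python
import pandas as pd

# Sample data from the question (guest name spelled "Brian", as in the output).
d = {
    'Invited_guest': ["Max", "Luca", "John", "Brian", "Ian"],
    'Age': [19, 21, 32, 45, 34],
    'Origin': ['US', 'UK', 'GER', 'ITA', 'FRA'],
    'FamilyMember_1': ["Paul", "Anna", "Peter", "Lewis", "Jeremy"],
    'FamilyMember_2': ['Rene', 'Ruben', 'Calvin', 'George', 'Silke'],
    'FamilyMember_3': ['', 'Olivia', '', '', 'Selina'],
}
df = pd.DataFrame(data=d)

# Labels of the columns that hold family member names.
family_cols = df.filter(like='FamilyMember_').columns

# One list of non-empty names per original row, indexed by the row label.
names_per_row = (
    df[family_cols]
    .T
    .melt()
    .pipe(lambda x: x[x['value'] != ''])
    .groupby('variable')['value']
    .agg(list)
)

# Attach the lists, give every name its own row, and tidy up.
exploded = (
    df.assign(names=names_per_row)
    .explode('names')
    .drop(columns=family_cols)
    .reset_index(drop=True)
)
print(exploded)
```

Splitting the chain this way also makes it easy to print names_per_row on its own while debugging.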

Explanation

First, we select the columns whose names contain FamilyMember_ (filter(like=...) does a substring match):

>>> family_members = df.filter(like='FamilyMember_')
>>> family_members
  FamilyMember_1 FamilyMember_2 FamilyMember_3
0           Paul           Rene               
1           Anna          Ruben         Olivia
2          Peter         Calvin               
3          Lewis         George               
4         Jeremy          Silke         Selina

Next, we rotate it 90 degrees (i.e., transpose it) so that it can be used with melt later:

>>> family_members.T
                   0       1       2       3       4
FamilyMember_1  Paul    Anna   Peter   Lewis  Jeremy
FamilyMember_2  Rene   Ruben  Calvin  George   Silke
FamilyMember_3        Olivia                  Selina

Then we melt it:

>>> family_members.T.melt()
    variable   value
0          0    Paul
1          0    Rene
2          0        
3          1    Anna
4          1   Ruben
5          1  Olivia
6          2   Peter
7          2  Calvin
8          2        
9          3   Lewis
10         3  George
11         3        
12         4  Jeremy
13         4   Silke
14         4  Selina
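
To see why the transpose matters, it may help to compare melt with and without it. This is a small sketch on a hypothetical two-row frame (small is not part of the original answer): without .T the variable column holds the column names, while with .T it holds the original row labels, which is what the later groupby relies on.

```python
import pandas as pd

# Hypothetical two-row frame, used only to compare the two melt results.
small = pd.DataFrame({
    'FamilyMember_1': ['Paul', 'Anna'],
    'FamilyMember_2': ['Rene', 'Ruben'],
})

# Without transposing, 'variable' holds the original column names.
melted_plain = small.melt()

# After transposing, 'variable' holds the original row labels (0 and 1),
# so each guest's family members end up next to each other.
melted_transposed = small.T.melt()

print(melted_plain)
print(melted_transposed)
```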

Now we need to remove the empty entries. We could do it like this:

x = family_members.T.melt()
x = x[x['value'] != '']

But that takes multiple lines, so it does not fit in a one-liner. Instead, we can use pipe with a lambda function to do the same thing inline:

>>> family_members.T.melt().pipe(lambda x: x[x['value'] != ''])
    variable   value
0          0    Paul
1          0    Rene
3          1    Anna
4          1   Ruben
5          1  Olivia
6          2   Peter
7          2  Calvin
9          3   Lewis
10         3  George
12         4  Jeremy
13         4   Silke
14         4  Selina
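
As a quick check that pipe really is just an inline form of the two-step filter, here is a sketch on a hypothetical hand-written melted frame. It also shows .loc with a callable, which is another common way to filter inside a chain; that variant is my addition, not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for the melted frame shown above.
melted = pd.DataFrame({
    'variable': [0, 0, 0, 1, 1, 1],
    'value': ['Paul', 'Rene', '', 'Anna', 'Ruben', 'Olivia'],
})

# Two-step version: needs an intermediate name.
x = melted
two_step = x[x['value'] != '']

# Chained version: pipe passes the intermediate frame to the lambda.
chained = melted.pipe(lambda x: x[x['value'] != ''])

# .loc also accepts a callable that receives the frame being indexed.
with_loc = melted.loc[lambda x: x['value'] != '']

print(chained)
```

All three produce the same rows and keep the original index labels, which is why the filtered output above skips index 2.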

Then we can group by the variable column, since it neatly groups together exactly the names that belong together:

>>> g = family_members.T.melt().pipe(lambda x: x[x['value'] != '']).groupby('variable')
>>> g
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x12b131e50>
# That's not very useful, so we can convert it to a list to have a peek at what's inside:
>>> list(g)
[(0,
     variable value
  0         0  Paul
  1         0  Rene),
 (1,
     variable   value
  3         1    Anna
  4         1   Ruben
  5         1  Olivia),
 (2,
     variable   value
  6         2   Peter
  7         2  Calvin),
 (3,
      variable   value
  9          3   Lewis
  10         3  George),
 (4,
      variable   value
  12         4  Jeremy
  13         4   Silke
  14         4  Selina)]

We need to turn each group into a list of the names it contains. That is what agg(list) does:

>>> g['value'].agg(list)
variable
0               [Paul, Rene]
1      [Anna, Ruben, Olivia]
2            [Peter, Calvin]
3            [Lewis, George]
4    [Jeremy, Silke, Selina]
Name: value, dtype: object

Perfect. Now we need to put that column back into the dataframe. We could assign it the usual way:

df['names'] = g['value'].agg(list)

But again, that would make a one-liner impossible. Fortunately, there is the assign function, which is built for exactly this use case:

>>> df.assign(names=g['value'].agg(list))
  Invited_guest  Age Origin FamilyMember_1 FamilyMember_2 FamilyMember_3                    names
0           Max   19     US           Paul           Rene                            [Paul, Rene]
1          Luca   21     UK           Anna          Ruben         Olivia    [Anna, Ruben, Olivia]
2          John   32    GER          Peter         Calvin                         [Peter, Calvin]
3         Brian   45    ITA          Lewis         George                         [Lewis, George]
4           Ian   34    FRA         Jeremy          Silke         Selina  [Jeremy, Silke, Selina]

(Note that assign does not work in place. It returns a modified copy of the dataframe and leaves the original untouched.)
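
Two properties of assign are doing the work here: it matches a Series to the frame by index label rather than by position, and it leaves the original frame unchanged. A minimal sketch with hypothetical data (guests and names are illustrative only) shows both:

```python
import pandas as pd

# Hypothetical two-guest frame.
guests = pd.DataFrame({'Invited_guest': ['Max', 'Luca']})

# A Series whose index is deliberately in a different order than the frame's.
names = pd.Series([['Anna', 'Ruben', 'Olivia'], ['Paul', 'Rene']], index=[1, 0])

# assign aligns on the index labels, so each list lands on the right guest.
with_names = guests.assign(names=names)

print(with_names)
# The original frame still has only its one column.
print(guests.columns.tolist())
```

This index alignment is why grouping by variable (which holds the original row labels) produces a Series that slots straight back into df.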

Finally, we use the magical explode (only available in pandas 0.25 and newer):

>>> df.assign(names=g['value'].agg(list)).explode('names')
  Invited_guest  Age Origin FamilyMember_1 FamilyMember_2 FamilyMember_3   names
0           Max   19     US           Paul           Rene                   Paul
0           Max   19     US           Paul           Rene                   Rene
1          Luca   21     UK           Anna          Ruben         Olivia    Anna
1          Luca   21     UK           Anna          Ruben         Olivia   Ruben
1          Luca   21     UK           Anna          Ruben         Olivia  Olivia
2          John   32    GER          Peter         Calvin                  Peter
2          John   32    GER          Peter         Calvin                 Calvin
3         Brian   45    ITA          Lewis         George                  Lewis
3         Brian   45    ITA          Lewis         George                 George
4           Ian   34    FRA         Jeremy          Silke         Selina  Jeremy
4           Ian   34    FRA         Jeremy          Silke         Selina   Silke
4           Ian   34    FRA         Jeremy          Silke         Selina  Selina
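
Note the repeated index labels in the output above: explode keeps the label of the row each element came from. A small sketch on a hypothetical frame shows this, along with the reset_index(drop=True) call that the full one-liner uses to renumber the rows:

```python
import pandas as pd

# Hypothetical frame with one list-valued column.
frame = pd.DataFrame({
    'Invited_guest': ['Max', 'Luca'],
    'names': [['Paul', 'Rene'], ['Anna', 'Ruben', 'Olivia']],
})

# Each list element gets its own row; the original index label is repeated.
small_exploded = frame.explode('names')

# reset_index(drop=True) replaces the repeated labels with 0..n-1.
renumbered = small_exploded.reset_index(drop=True)

print(small_exploded)
print(renumbered)
```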

And of course, drop the FamilyMember_* columns, which are no longer needed:

>>> family_member_columns = df.filter(like='FamilyMember_').columns
>>> family_member_columns
Index(['FamilyMember_1', 'FamilyMember_2', 'FamilyMember_3'], dtype='object')
>>> df.assign(names=g['value'].agg(list)).explode('names').drop(family_member_columns, axis=1)

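First, we get all the columns that start with FamilyMember_.
Then we can use pandas.melt to get the expected result.
To get clean output, we drop the variable column that melt creates and then drop the rows with no name, since some Invited_guest entries have no FamilyMember_3. In the sample data those cells are empty strings rather than NaN, so they are replaced with NaN before dropna is called. Finally, we sort by Invited_guest and reset the index to get a clean, ordered DataFrame:

>>> keys = [c for c in df if c.startswith('FamilyMember_')]
>>> pd.melt(df, id_vars=['Invited_guest', 'Age', 'Origin'], value_vars=keys, value_name='key').drop('variable', axis=1).replace('', float('nan')).dropna().sort_values('Invited_guest').reset_index(drop=True)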
Invited_guest   Age     Origin  key
0   Brian           45      ITA     Lewis
1   Brian           45      ITA     George
2   Ian             34      FRA     Jeremy
3   Ian             34      FRA     Silke
4   Ian             34      FRA     Selina
5   John            32      GER     Peter
6   John            32      GER     Calvin
7   Luca            21      UK      Anna
8   Luca            21      UK      Ruben
9   Luca            21      UK      Olivia
10  Max             19      US      Paul
11  Max             19      US      Rene
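
One detail worth checking with this approach: in the sample data the missing family members are empty strings rather than NaN, and dropna only removes missing values. A minimal sketch on a hypothetical one-column frame (keys_frame is illustrative only) shows the difference:

```python
import pandas as pd

# Hypothetical column with one real name, one empty string, one missing value.
keys_frame = pd.DataFrame({'key': ['Paul', '', None]})

# dropna removes only the missing value; the empty string stays.
after_dropna = keys_frame.dropna()

# Filtering on the string itself removes the empty entry as well.
non_empty = after_dropna[after_dropna['key'] != '']

print(after_dropna)
print(non_empty)
```

If the real data uses NaN for missing names, dropna alone is enough; if it uses empty strings, they need to be filtered out or converted to NaN first.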
