在Python数据框架中将多列转换/取消透视为行



我有一个数据集,需要将其分解为多行。

例如:

id  cor_id1 mail11 mail12 mail13 cor_id2 mail21 mail22 mail23 cor_id3 mail31 mail32 mail33
1    1      a@123  b@234  c@123    2     a@def  b@fgh   c@asd   3      s@wer  b@ert  e@rty
2    4      e@234  e@234  e@qwe    9     e@dfe  f@jfg   r@ert   10     e@wer  g@wer  e@ert

我需要将它们取消为

id cor_id mail
1   1     a@123
1   1     b@234
1   1     c@123
1   2     a@def
1   2     b@fgh
1   2     c@asd
1   3     s@wer
1   3     b@ert
1   3     e@rty
2   4     e@234
2   4     e@234
2   4     e@qwe
2   9     e@dfe
2   9     r@ert
2   10    e@wer
2   10    g@wer
2   10    e@ert

我试过df.melt,但那只给了一列。

如果数据有多个列要转换为行,该怎么办。

id  cor_id1 ad1 mail11 mail12 mail13 cor_id2 ad2 mail21 mail22 mail23 cor_id3 ad3 mail31 mail32 mail33
1    1     23    a@123  b@234  c@123        2   24  a@def  b@fgh   c@asd      3   25   s@wer  b@ert  e@rty
2    4     33    e@234  e@234  e@qwe        9   34 e@dfe  f@jfg   r@ert      10  35    e@wer  g@wer  e@ert

我想要

id cor_id  ad  mail
1   1      23  a@123
1   1      23 b@234
1   1      23 c@123
1   2      24 a@def
1   2      24  b@fgh
1   2      24 c@asd
1   3      25 s@wer
1   3      25 b@ert
1   3      25 e@rty
2   4      33 e@234
2   4      33 e@234
2   4      33 e@qwe
2   9      34 e@dfe
2   9      34 f@jfg
2   9      34 r@ert
2   10     35 e@wer
2   10     35 g@wer
2   10     35 e@ert

使用wide_to_long,但首先必须更改cor_id列的列名,并添加最后一位数字:

df = df.rename(columns=lambda x: x + x[-1] if x.startswith('cor_id') else x)
df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i')
df['cor_id'] = df['cor_id'].ffill()
df = df.reset_index(level=1, drop=True).reset_index()

另一种选择是添加0并使用dropna:删除缺失的行

df = df.rename(columns=lambda x: x + '0' if x.startswith('cor_id') else x)
df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i')
df['cor_id'] = df['cor_id'].ffill()
df = df.dropna(subset=['mail']).reset_index(level=1, drop=True).reset_index()

print (df)
id  cor_id   mail
0    1     1.0  a@123
1    1     1.0  b@234
2    1     1.0  c@123
3    1     1.0  a@def
4    1     2.0  b@fgh
5    1     2.0  s@wer
6    1     2.0  b@ert
7    1     3.0  e@rty
8    1     3.0  c@asd
9    2     4.0  e@234
10   2     4.0  e@234
11   2     4.0  e@qwe
12   2     4.0  e@dfe
13   2     9.0  f@jfg
14   2     9.0  e@wer
15   2     9.0  g@wer
16   2    10.0  e@ert
17   2    10.0  r@ert

编辑:如果有多个像cor_id这样的列,只需将其添加到元组中由startswith进行测试,然后用ffill:更改list对所有列的正向填充

df = df.rename(columns=lambda x: x + '0' if x.startswith(('cor_id','ad')) else x)
df = pd.wide_to_long(df, ['cor_id', 'ad','mail'], i='id', j='i')
df[['cor_id','ad']] = df[['cor_id','ad']].ffill()
df = df.dropna(subset=['mail']).reset_index(level=1, drop=True).reset_index()
print (df)
id  cor_id    ad   mail
0    1     1.0  23.0  a@123
1    1     1.0  23.0  b@234
2    1     1.0  23.0  c@123
3    1     2.0  24.0  a@def
4    1     2.0  24.0  b@fgh
5    1     2.0  24.0  c@asd
6    1     3.0  25.0  s@wer
7    1     3.0  25.0  b@ert
8    1     3.0  25.0  e@rty
9    2     4.0  33.0  e@234
10   2     4.0  33.0  e@234
11   2     4.0  33.0  e@qwe
12   2     9.0  34.0  e@dfe
13   2     9.0  34.0  f@jfg
14   2     9.0  34.0  r@ert
15   2    10.0  35.0  e@wer
16   2    10.0  35.0  g@wer
17   2    10.0  35.0  e@ert

您也可以使用pd.melt((,例如:

melted = pd.melt(df, id_vars = ['id','cor_id'], value_vars = ['mail11','mail12',...], var_name = 'Attribute', value_name = 'Value')
melted.head()

最新更新