I want to transform this DataFrame:
import pandas as pd

# Each key is a row label; each list holds that row's values
df = pd.DataFrame.from_dict(
    {'a': [13, 'F', 'RD', 0, 0, 1, 0, 1],
     'b': [45, 'M', 'RD', 1, 1, 0, 1, 0],
     'c': [67, 'F', 'AN', 0, 0, 1, 0, 1],
     'd': [23, 'M', 'AN', 1, 0, 0, 1, 1]},
    orient='index',
    columns=['AGE', 'SEX', 'REG', 'A', 'B', 'C', 'D', 'E'])
print(df)
   AGE SEX REG  A  B  C  D  E
a   13   F  RD  0  0  1  0  1
b   45   M  RD  1  1  0  1  0
c   67   F  AN  0  0  1  0  1
d   23   M  AN  1  0  0  1  1
into the following:
   AGE SEX REG PRODUCT  PA
a   13   F  RD       A   0
a   13   F  RD       B   0
a   13   F  RD       C   1
a   13   F  RD       D   0
a   13   F  RD       E   1
b   45   M  RD       A   1
b   45   M  RD       B   1
b   45   M  RD       C   0
b   45   M  RD       D   1
b   45   M  RD       E   0
c   67   F  AN       A   0
c   67   F  AN       B   0
c   67   F  AN       C   1
c   67   F  AN       D   0
c   67   F  AN       E   1
d   23   M  AN       A   1
d   23   M  AN       B   0
d   23   M  AN       C   0
d   23   M  AN       D   1
d   23   M  AN       E   1
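Basically, I want to repeat every product (A, B, C, D, E) for each user (a, b, c, d), together with the value for that user/product pair. The original table has thousands of rows.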
You can use set_index with stack and reset_index, then rename to name the new column PRODUCT:
print(df.set_index(['AGE', 'SEX', 'REG'])
        .stack()
        .reset_index(name='PA')
        .rename(columns={'level_3': 'PRODUCT'}))
    AGE SEX REG PRODUCT  PA
0    13   F  RD       A   0
1    13   F  RD       B   0
2    13   F  RD       C   1
3    13   F  RD       D   0
4    13   F  RD       E   1
5    45   M  RD       A   1
6    45   M  RD       B   1
7    45   M  RD       C   0
8    45   M  RD       D   1
9    45   M  RD       E   0
10   67   F  AN       A   0
11   67   F  AN       B   0
12   67   F  AN       C   1
13   67   F  AN       D   0
14   67   F  AN       E   1
15   23   M  AN       A   1
16   23   M  AN       B   0
17   23   M  AN       C   0
18   23   M  AN       D   1
19   23   M  AN       E   1
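
To see where the column name level_3 comes from, it may help to run the chain one step at a time. The sketch below rebuilds the question's sample frame with from_dict and uses the illustrative names indexed, stacked and long_df, which are not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame.from_dict(
    {'a': [13, 'F', 'RD', 0, 0, 1, 0, 1],
     'b': [45, 'M', 'RD', 1, 1, 0, 1, 0],
     'c': [67, 'F', 'AN', 0, 0, 1, 0, 1],
     'd': [23, 'M', 'AN', 1, 0, 0, 1, 1]},
    orient='index',
    columns=['AGE', 'SEX', 'REG', 'A', 'B', 'C', 'D', 'E'])

# Step 1: move the identifier columns into the index,
# so that only the product columns A..E remain as columns
indexed = df.set_index(['AGE', 'SEX', 'REG'])

# Step 2: stack turns the remaining columns into a new, unnamed
# innermost index level; the result is a Series with a 4-level index
stacked = indexed.stack()

# Step 3: reset_index turns every index level back into a column.
# The unnamed level is at position 3, so its column is called 'level_3';
# name='PA' labels the column holding the Series values
long_df = stacked.reset_index(name='PA')

print(long_df.columns.tolist())
```

Because the placeholder name depends on the position of the unnamed level, it changes if the number of identifier columns changes, which is why the second variant of the answer renames level_4 instead.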
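If you want to keep the original index (a, b, c, d), pass append=True to set_index and reset only levels 1 to 4:
print(df.set_index(['AGE', 'SEX', 'REG'], append=True)
        .stack()
        .reset_index([1, 2, 3, 4], name='PA')
        .rename(columns={'level_4': 'PRODUCT'}))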
   AGE SEX REG PRODUCT  PA
a   13   F  RD       A   0
a   13   F  RD       B   0
a   13   F  RD       C   1
a   13   F  RD       D   0
a   13   F  RD       E   1
b   45   M  RD       A   1
b   45   M  RD       B   1
b   45   M  RD       C   0
b   45   M  RD       D   1
b   45   M  RD       E   0
c   67   F  AN       A   0
c   67   F  AN       B   0
c   67   F  AN       C   1
c   67   F  AN       D   0
c   67   F  AN       E   1
d   23   M  AN       A   1
d   23   M  AN       B   0
d   23   M  AN       C   0
d   23   M  AN       D   1
d   23   M  AN       E   1
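
As an alternative to stack, DataFrame.melt is another way to go from wide to long. This is a different technique from the one used above and is shown only as a sketch; the names id_cols and long_df are illustrative. melt orders the rows by product rather than by user, so a sort is added here to reproduce the layout asked for in the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame.from_dict(
    {'a': [13, 'F', 'RD', 0, 0, 1, 0, 1],
     'b': [45, 'M', 'RD', 1, 1, 0, 1, 0],
     'c': [67, 'F', 'AN', 0, 0, 1, 0, 1],
     'd': [23, 'M', 'AN', 1, 0, 0, 1, 1]},
    orient='index',
    columns=['AGE', 'SEX', 'REG', 'A', 'B', 'C', 'D', 'E'])

id_cols = ['AGE', 'SEX', 'REG']

long_df = (
    df.reset_index()                        # user labels become a column named 'index'
      .melt(id_vars=['index'] + id_cols,    # columns to repeat on every row
            var_name='PRODUCT',             # former column names A..E
            value_name='PA')                # former cell values
      .sort_values(['index', 'PRODUCT'])    # group rows by user, then by product
      .set_index('index')                   # restore the user labels as the index
      .rename_axis(None)                    # drop the index name 'index'
)

print(long_df)
```

With melt the new column names are given directly through var_name and value_name, so no placeholder such as level_3 needs to be renamed afterwards.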