我正在尝试将一些字符串数据转换为列,但由于我没有可以使用的唯一索引或多索引,因此很难利用过去的响应。
示例格式
index location field value
1 location1 firstName A
2 location1 lastName B
3 location1 dob C
4 location1 email D
5 location1 title E
6 location1 address1 F
7 location1 address2 G
8 location1 address3 H
9 location1 firstName I
10 location1 lastName J
11 location1 dob K
12 location1 email L
13 location1 title M
14 location1 address1 N
15 location1 address2 O
16 location1 address3 P
40 location2 firstName Q
41 location2 lastName R
42 location2 dob S
43 location2 email T
44 location2 title U
45 location2 address1 V
46 location2 address2 W
47 location2 address3 X
我想透视的格式:
location firstName lastName dob email title address1 address2 address3
location1 A B C D E F G H
location1 I J K L M N O P
location2 Q R S T U V W X
我最接近实现这一目标的是使用 aggfuc='first',但这我需要每个位置的所有值,而不仅仅是第一个。
我想透视的格式:
df = df.pivot_table(index='location',columns='field',values='value',aggfunc='first')
您需要
使用代理项列进行透视。这是使用 cumsum
+ set_index
+ unstack
的解决方案。
v = df.set_index(['location', 'field', df.field.eq('firstName').cumsum()]).unstack(-2)
v.index = v.index.droplevel(-1)
v.columns = v.columns.droplevel(0)
field address1 address2 address3 dob email firstName
location
location1 F G H C D A
location1 N O P K L I
location2 V W X S T Q
field lastName title
location
location1 B E
location1 J M
location2 R U