旋转熊猫数据帧,没有数值类型,索引不是唯一的



我正在尝试将一些字符串数据转换为列,但由于我没有可以使用的唯一索引或多索引,因此很难利用过去的响应。

示例格式

  index	location	field	        value
1	location1	firstName	A
2	location1	lastName	B
3	location1	dob	        C 
4	location1	email	        D
5	location1	title	        E
6	location1	address1	F
7	location1	address2	G
8	location1	address3	H
9	location1	firstName	I
10	location1	lastName	J
11	location1	dob	        K
12	location1	email	        L
13	location1	title	        M
14	location1	address1	N
15	location1	address2	O
16	location1	address3	P
40	location2	firstName	Q
41	location2	lastName	R
42	location2	dob	        S
43	location2	email	        T
44	location2	title	        U
45	location2	address1	V
46	location2	address2	W
47	location2	address3	X

我想透视的格式:

location	firstName lastName dob email title address1	address2 address3
location1	A	B	C	D	E	F	G	H
location1	I	J	K	L	M	N	O	P
location2	Q	R	S	T	U	V	W	X

我最接近实现这一目标的是使用 aggfuc='first',但这我需要每个位置的所有值,而不仅仅是第一个。

我想透视的格式:

df = df.pivot_table(index='location',columns='field',values='value',aggfunc='first')

您需要

使用代理项列进行透视。这是使用 cumsum + set_index + unstack 的解决方案。

v = df.set_index(['location', 'field', df.field.eq('firstName').cumsum()]).unstack(-2) 
v.index = v.index.droplevel(-1)
v.columns = v.columns.droplevel(0)
field     address1 address2 address3         dob      email firstName  
location                                                                
location1        F        G        H          C           D         A   
location1        N        O        P           K          L         I   
location2        V        W        X           S          T         Q   
field     lastName      title  
location                       
location1        B          E  
location1        J          M  
location2        R          U  

最新更新