将3列excel with K:V转置到列Pandas中



我有一个3列的excel文件,我读取到熊猫基本上有k:v对列

stuff     Unnamed: 1            Unnamed: 2
0                    animal:dog            NaN                   NaN
1                   color:brown            NaN                   NaN
2                        age:12            NaN                   NaN
3                  height: 60cm   doctor: mike  visit date: 01/12/21
4   unique animal id:3597230857   doctor id: 2                   NaN
5                           NaN            NaN                   NaN
6                   animal: cat            NaN                   NaN
7                  color: orage            NaN                   NaN
8                        age:21            NaN                   NaN
9                  height: 40cm   doctor: mike                   NaN
10   unique animal id: 95760203   doctor id: 2   visit date:05/21/20
11                          NaN            NaN                   NaN
12                animal: snake            NaN                   NaN
13                 color: orage            NaN                   NaN
14                        age:3  doctor:sally                    NaN
15     unique animal id: 325125    doctor id:1   visit date:05/21/20
16                          NaN            NaN                   NaN
17                          NaN            NaN                   NaN
18                 animal: bird            NaN                   NaN
19                  color: blue            NaN                   NaN
20                 height: 40cm  doctor:sally                    NaN
21  unique animal id: 496701275    doctor id:1   visit date:05/21/20
Process finished with exit code 0

我不仅需要将信息绑定在unnamed:1 &未命名的:2到唯一的动物ID因为这是我跟踪动物的方式但也调换了这些列在"; "是带有适当值的列标头。

animal   color   age  height(cm)  unique animal id doctor  dictir id visit date
0    dog   brown  12.0        60.0        3597230857   mike          2 2021-01-12
1    cat   orage  21.0        40.0          95760203   mike          2 2020-05-21
2  snake  orange   3.0         NaN            325125  sally          1 2020-05-21
3   bird    blue   NaN        40.0         496701275  sally          1 2020-05-21

当前正在处理

for source in sources:
df = pd.read_excel(source)
df['new_heading'] = df['stuff'].str.split(':').str[0]
df['new_value'] = df['stuff'].str.split(':').str[1]
df.loc[df['new_heading'] == "animal", 'new_aninmal'] = df['new_value']
df.loc[df['new_heading'] == "color", 'new_color'] = df['new_value']
df.loc[df['new_heading'] == "age", 'new_age'] = df['new_value']
df.loc[df['new_heading'] == "height", 'new_height'] = df['new_value']
df.loc[df['new_heading'] == "unique animal id", 'new_uid'] = df['new_value']

但它给出了这个瀑布输出(预期如此)

stuff     Unnamed: 1            Unnamed: 2       new_heading   new_value new_aninmal new_color new_age new_height     new_uid
0                    animal:dog            NaN                   NaN            animal         dog         dog       NaN     NaN        NaN         NaN
1                   color:brown            NaN                   NaN             color       brown         NaN     brown     NaN        NaN         NaN
2                        age:12            NaN                   NaN               age          12         NaN       NaN      12        NaN         NaN
3                  height: 60cm   doctor: mike  visit date: 01/12/21            height        60cm         NaN       NaN     NaN       60cm         NaN
4   unique animal id:3597230857   doctor id: 2                   NaN  unique animal id  3597230857         NaN       NaN     NaN        NaN  3597230857
5                           NaN            NaN                   NaN               NaN         NaN         NaN       NaN     NaN        NaN         NaN
6                   animal: cat            NaN                   NaN            animal         cat         cat       NaN     NaN        NaN         NaN
7                  color: orage            NaN                   NaN             color       orage         NaN     orage     NaN        NaN         NaN
8                        age:21            NaN                   NaN               age          21         NaN       NaN      21        NaN         NaN
9                  height: 40cm   doctor: mike                   NaN            height        40cm         NaN       NaN     NaN       40cm         NaN
10   unique animal id: 95760203   doctor id: 2   visit date:05/21/20  unique animal id    95760203         NaN       NaN     NaN        NaN    95760203
11                          NaN            NaN                   NaN               NaN         NaN         NaN       NaN     NaN        NaN         NaN
12                animal: snake            NaN                   NaN            animal       snake       snake       NaN     NaN        NaN         NaN
13                 color: orage            NaN                   NaN             color       orage         NaN     orage     NaN        NaN         NaN
14                        age:3  doctor:sally                    NaN               age           3         NaN       NaN       3        NaN         NaN
15     unique animal id: 325125    doctor id:1   visit date:05/21/20  unique animal id      325125         NaN       NaN     NaN        NaN      325125
16                          NaN            NaN                   NaN               NaN         NaN         NaN       NaN     NaN        NaN         NaN
17                          NaN            NaN                   NaN               NaN         NaN         NaN       NaN     NaN        NaN         NaN
18                 animal: bird            NaN                   NaN            animal        bird        bird       NaN     NaN        NaN         NaN
19                  color: blue            NaN                   NaN             color        blue         NaN      blue     NaN        NaN         NaN
20                 height: 40cm  doctor:sally                    NaN            height        40cm         NaN       NaN     NaN       40cm         NaN
21  unique animal id: 496701275    doctor id:1   visit date:05/21/20  unique animal id   496701275         NaN       NaN     NaN        NaN   496701275

每个部分都以animal开头,因此创建一个计数器来确认它。计数器将用于数据的旋转:

reshape = (
df.assign(counter=np.where(df.stuff.str.startswith("animal:"), 1, 0).cumsum())
.melt("counter")
.dropna()
.drop(columns="variable")
)

在这里,我们用:分割列并透视数据;计数器确保唯一索引,这是成功的枢轴所必需的:

(
reshape.value.str.split(":", expand=True)
.assign(counter=reshape.counter)
.pivot("counter", 0, 1)
)

age     animal  color   doctor  doctor id   height  unique animal id    visit date
counter                                 
1       12      dog     brown   mike      2        60cm     3597230857  01/12/21
3       21      cat     orage   mike      2        40cm     95760203    05/21/20
5       3       snake   orage   sally     1         NaN     325125  05/21/20
8      NaN      bird    blue    sally     1        40cm     496701275   05/21/20
# find the animal group
df['index'] = df['stuff'].fillna('').str.contains('animal:').cumsum()
# stack to drop na and group animal group
obj = df.set_index('index').stack().droplevel(1)
# print(obj)
# split and expand, then strip blank space
df = obj.str.split(':', expand=True).applymap(str.strip)
# reset_index and unstack
dfn = df.set_index(0, append=True)[1].unstack()
# print(dfn)

输出:

print(obj)
index
1                     animal:dog
1                    color:brown
1                         age:12
1                   height: 60cm
1                   doctor: mike
1           visit date: 01/12/21
1    unique animal id:3597230857
1                   doctor id: 2
2                    animal: cat
2                   color: orage
2                         age:21
2                   height: 40cm
2                   doctor: mike
2     unique animal id: 95760203
2                   doctor id: 2
2            visit date:05/21/20
3                  animal: snake
3                   color: orage
3                          age:3
3                   doctor:sally
3       unique animal id: 325125
3                    doctor id:1
3            visit date:05/21/20
4                   animal: bird
4                    color: blue
4                   height: 40cm
4                   doctor:sally
4    unique animal id: 496701275
4                    doctor id:1
4            visit date:05/21/20
dtype: object
print(dfn)
age animal  color doctor doctor id height unique animal id visit date
1   12    dog  brown   mike         2   60cm       3597230857   01/12/21
2   21    cat  orage   mike         2   40cm         95760203   05/21/20
3    3  snake  orage  sally         1    NaN           325125   05/21/20
4  NaN   bird   blue  sally         1   40cm        496701275   05/21/20

样本数据:

import pandas as pd
import numpy as np
nan = np.nan
data = [{'stuff': 'animal:dog', 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'color:brown', 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'age:12', 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'height: 60cm',
'Unnamed: 1': 'doctor: mike',
'Unnamed: 2': 'visit date: 01/12/21'},
{'stuff': 'unique animal id:3597230857',
'Unnamed: 1': 'doctor id: 2',
'Unnamed: 2': nan},
{'stuff': nan, 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'animal: cat', 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'color: orage', 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'age:21', 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'height: 40cm', 'Unnamed: 1': 'doctor: mike', 'Unnamed: 2': nan},
{'stuff': 'unique animal id: 95760203',
'Unnamed: 1': 'doctor id: 2',
'Unnamed: 2': 'visit date:05/21/20'},
{'stuff': nan, 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'animal: snake', 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'color: orage', 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'age:3', 'Unnamed: 1': 'doctor:sally', 'Unnamed: 2': nan},
{'stuff': 'unique animal id: 325125',
'Unnamed: 1': 'doctor id:1',
'Unnamed: 2': 'visit date:05/21/20'},
{'stuff': nan, 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': nan, 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'animal: bird', 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'color: blue', 'Unnamed: 1': nan, 'Unnamed: 2': nan},
{'stuff': 'height: 40cm', 'Unnamed: 1': 'doctor:sally', 'Unnamed: 2': nan},
{'stuff': 'unique animal id: 496701275',
'Unnamed: 1': 'doctor id:1',
'Unnamed: 2': 'visit date:05/21/20'}]
df = pd.DataFrame(data)

最新更新