


|  Type | ColumnName | Source   |
| 1     | customerID | exp1.txt |
| 2     | name       | NaN      |
| 2     | surname    | NaN      |
| 3     | NaN        | NaN      | ← here I want to split
| 1     | materialID | exp2.txt |
| 2     | weight     | NaN      |
| 2     | dim        | NaN      |
| 3     | NaN        | NaN      | ← here I want to split
| 1     | orderID    | exp3.txt |
Desired output:
|  Type | ColumnName | Source   | 
| 1     | customerID | exp1.txt |
| 2     | name       | NaN      |
| 2     | surname    | NaN      |
|  Type | ColumnName | Source   | 
| 1     | materialID | exp2.txt |
| 2     | weight     | NaN      |
| 2     | dim        | NaN      |
...and so on
Then I want to transpose the ColumnName values into a single row to create a table header.
After that I want to concatenate the actual data from the expX.txt file defined in the Source column.
Desired output for one example:
| CustomerID | name       | surname  | 
| 125        | Max        | Cool     | line 1 in exp1.txt
| 126        | Peter      | Smith    | line 3 in exp1.txt
| 127        | Jon        | Doe      | line 3 in exp1.txt
...and so on                           ...



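Each row with a non-null Source starts a new block, so the cumulative sum of df['Source'].notna() can serve as the group key:

out = [d for _, d in df.groupby(df['Source'].notna().cumsum())]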


[   Type  ColumnName    Source
0     1  customerID  exp1.txt
1     2        name       NaN
2     2     surname       NaN
3     3         NaN       NaN,
    Type  ColumnName    Source
4     1  materialID  exp2.txt
5     2      weight       NaN
6     2         dim       NaN
7     3         NaN       NaN,
    Type ColumnName    Source
8     1    orderID  exp3.txt]
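
Note that these groups still contain the Type 3 separator rows, which the desired output leaves out. A minimal sketch of one way to drop them, assuming the separator is always the row with Type == 3 (the sample df is rebuilt here from the table in the question so the snippet runs on its own; the names `key`, `is_separator` and `blocks` are only illustrative):

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "Type": [1, 2, 2, 3, 1, 2, 2, 3, 1],
    "ColumnName": ["customerID", "name", "surname", np.nan,
                   "materialID", "weight", "dim", np.nan, "orderID"],
    "Source": ["exp1.txt", np.nan, np.nan, np.nan,
               "exp2.txt", np.nan, np.nan, np.nan, "exp3.txt"],
})

# Same grouping key as the one-liner above:
# every non-null Source starts a new block.
key = df["Source"].notna().cumsum()

# Assumption: separator rows are exactly the rows with Type == 3.
is_separator = df["Type"].eq(3)

# Drop the separators first, then split on the matching part of the key.
blocks = [d for _, d in df[~is_separator].groupby(key[~is_separator])]

for block in blocks:
    print(block, end="\n\n")
```

Filtering before grouping keeps the split logic unchanged; only the rows that go into each group differ.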


To print each group with a label:

for i, (_, d) in enumerate(df.groupby(df['Source'].notna().cumsum()), start=1):
    print(f'--- group {i} ---')
    print(d)


--- group 1 ---
   Type  ColumnName    Source
0     1  customerID  exp1.txt
1     2        name       NaN
2     2     surname       NaN
3     3         NaN       NaN
--- group 2 ---
   Type  ColumnName    Source
4     1  materialID  exp2.txt
5     2      weight       NaN
6     2         dim       NaN
7     3         NaN       NaN
--- group 3 ---
   Type ColumnName    Source
8     1    orderID  exp3.txt
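
For the follow-up step in the question (turning each group's ColumnName values into a header and attaching the data from the file named in Source), here is a rough sketch. The helper name `load_block` is hypothetical. It also assumes that each expX.txt is a comma-separated file without a header row and that the ColumnName values list the file's columns in order; the question does not state either, so adjust `sep` to the real format.

```python
import pandas as pd


def load_block(block: pd.DataFrame, sep: str = ",") -> pd.DataFrame:
    """Hypothetical helper: build one data table from one group.

    Assumptions (not confirmed by the question): the group's first
    non-null Source is the path of a delimited text file with no header
    row, and the non-null ColumnName values name its columns in order.
    """
    # Non-null ColumnName values become the table header; this also
    # skips the Type 3 separator row, whose ColumnName is NaN.
    columns = block["ColumnName"].dropna().tolist()

    # The file path sits on the first row of the group.
    source = block["Source"].dropna().iloc[0]

    # Read the raw data and label it with the header built above.
    return pd.read_csv(source, sep=sep, header=None, names=columns)
```

With the list from the first snippet, `tables = [load_block(d) for d in out]` would then give one DataFrame per source file, provided the files exist at the paths given in Source.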
