How to expand a DataFrame that contains a tree structure



I have a DataFrame with a tree-like structure, like this:

df1

Class   Type    Sub-class
0    1        ~       D
1    1        ~       C
2    1        ~       B
3    1        ~       A
4    1        ~       14
5    1        P       NaN       
6    A        ~       C
7    A        ~       D
8    A        ~       7
9    A        ~       B
10    A        P       NaN
11    B        ~       D
12    B        ~       4
13    B        ~       C
14    B        P       NaN
15    C        ~       D
16    C        ~       4
17    C        P       NaN
18    D        ~       18
19    D        ~       9
20    D        P       NaN
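To experiment with the question, df1 can be rebuilt directly. This is a minimal sketch, not the asker's actual data loading. It assumes every label (including the class name `1` and numeric sub-classes such as `14`) is stored as a string, and that the empty Sub-class cells on the `P` rows are `NaN`. The question does not state the original dtypes.

```python
import numpy as np
import pandas as pd

# Assumption: all labels are strings; the "P" rows have NaN in Sub-class.
rows = [
    ("1", "~", "D"), ("1", "~", "C"), ("1", "~", "B"), ("1", "~", "A"),
    ("1", "~", "14"), ("1", "P", np.nan),
    ("A", "~", "C"), ("A", "~", "D"), ("A", "~", "7"), ("A", "~", "B"),
    ("A", "P", np.nan),
    ("B", "~", "D"), ("B", "~", "4"), ("B", "~", "C"), ("B", "P", np.nan),
    ("C", "~", "D"), ("C", "~", "4"), ("C", "P", np.nan),
    ("D", "~", "18"), ("D", "~", "9"), ("D", "P", np.nan),
]
df1 = pd.DataFrame(rows, columns=["Class", "Type", "Sub-class"])
print(df1)
```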

Class D consists only of numeric sub-classes. It is defined as:

D        ~       18
D        ~       9
D        P       NaN
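A class that is already a "leaf" like D can be detected by checking that every non-missing Sub-class in its block is a digit string. A sketch under the same string-label assumption; `is_numeric_label` and `leaf_classes` are hypothetical helper names:

```python
import numpy as np
import pandas as pd

rows = [
    ("1", "~", "D"), ("1", "~", "C"), ("1", "~", "B"), ("1", "~", "A"),
    ("1", "~", "14"), ("1", "P", np.nan),
    ("A", "~", "C"), ("A", "~", "D"), ("A", "~", "7"), ("A", "~", "B"),
    ("A", "P", np.nan),
    ("B", "~", "D"), ("B", "~", "4"), ("B", "~", "C"), ("B", "P", np.nan),
    ("C", "~", "D"), ("C", "~", "4"), ("C", "P", np.nan),
    ("D", "~", "18"), ("D", "~", "9"), ("D", "P", np.nan),
]
df1 = pd.DataFrame(rows, columns=["Class", "Type", "Sub-class"])


def is_numeric_label(value) -> bool:
    """True for numeric sub-classes such as "18"; False for class names and NaN."""
    return isinstance(value, str) and value.isdigit()


def leaf_classes(df: pd.DataFrame) -> list:
    """Classes whose Sub-class values (ignoring NaN) are all numeric."""
    leaves = []
    for cls, block in df.groupby("Class", sort=False):
        if block["Sub-class"].dropna().map(is_numeric_label).all():
            leaves.append(cls)
    return leaves


print(leaf_classes(df1))  # ['D']
```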

Class C consists of 1 numeric sub-class and class D. It is defined as:

C        ~       D
C        ~       4
C        P       NaN

Class B consists of 1 numeric sub-class, class D and class C. It is defined as:

B        ~       D
B        ~       4
B        ~       C
B        P       NaN

Class A consists of 1 numeric sub-class, class D, class C and class B. It is defined as:

A        ~       C
A        ~       D
A        ~       7
A        ~       B
A        P       NaN

Class 1 consists of 1 numeric sub-class, class D, class C, class B and class A. It is defined as:

1        ~       D
1        ~       C
1        ~       B
1        ~       A
1        ~       14
1        P       NaN 
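The definitions above form a dependency graph: a class depends on every class named in its Sub-class column. Ordering the classes so that each one comes after everything it references (here D, C, B, A, 1) gives a safe order in which to expand them. A sketch using the standard-library `graphlib.TopologicalSorter` (Python 3.9+); `class_dependencies` is a hypothetical helper, and it assumes a numeric sub-class never equals a class name (a Sub-class value of `1` would be read as a reference to class 1):

```python
from graphlib import TopologicalSorter

import numpy as np
import pandas as pd

rows = [
    ("1", "~", "D"), ("1", "~", "C"), ("1", "~", "B"), ("1", "~", "A"),
    ("1", "~", "14"), ("1", "P", np.nan),
    ("A", "~", "C"), ("A", "~", "D"), ("A", "~", "7"), ("A", "~", "B"),
    ("A", "P", np.nan),
    ("B", "~", "D"), ("B", "~", "4"), ("B", "~", "C"), ("B", "P", np.nan),
    ("C", "~", "D"), ("C", "~", "4"), ("C", "P", np.nan),
    ("D", "~", "18"), ("D", "~", "9"), ("D", "P", np.nan),
]
df1 = pd.DataFrame(rows, columns=["Class", "Type", "Sub-class"])


def class_dependencies(df: pd.DataFrame) -> dict:
    """Map each class to the set of classes referenced in its Sub-class column."""
    classes = set(df["Class"])
    return {
        cls: {s for s in block["Sub-class"].dropna() if s in classes}
        for cls, block in df.groupby("Class", sort=False)
    }


deps = class_dependencies(df1)
# static_order() yields each class only after all the classes it depends on
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['D', 'C', 'B', 'A', '1']
```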

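The goal is to get a final DataFrame that is the complete concatenation of all classes. For example, whenever class D appears in the Sub-class column of df1, I need to replace that whole row with class D: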
Class   Type    Sub-class
D        ~       18
D        ~       9
D        P       NaN
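Replacing one row with another class's block amounts to slicing the frame around that row and concatenating the block in between. A sketch with a hypothetical `splice_row` helper, assuming df1 keeps its default 0..n-1 index so labels and positions coincide. Applied to row 15 (`C ~ D`) it yields the same 23 rows as the df_final example in the question:

```python
import numpy as np
import pandas as pd

rows = [
    ("1", "~", "D"), ("1", "~", "C"), ("1", "~", "B"), ("1", "~", "A"),
    ("1", "~", "14"), ("1", "P", np.nan),
    ("A", "~", "C"), ("A", "~", "D"), ("A", "~", "7"), ("A", "~", "B"),
    ("A", "P", np.nan),
    ("B", "~", "D"), ("B", "~", "4"), ("B", "~", "C"), ("B", "P", np.nan),
    ("C", "~", "D"), ("C", "~", "4"), ("C", "P", np.nan),
    ("D", "~", "18"), ("D", "~", "9"), ("D", "P", np.nan),
]
df1 = pd.DataFrame(rows, columns=["Class", "Type", "Sub-class"])


def splice_row(df: pd.DataFrame, pos: int) -> pd.DataFrame:
    """Replace the row at position `pos` with every row of the class it references."""
    target = df["Sub-class"].iat[pos]
    block = df[df["Class"] == target]
    # rows before `pos`, then the referenced class's block, then rows after `pos`
    return pd.concat([df.iloc[:pos], block, df.iloc[pos + 1:]], ignore_index=True)


df_final = splice_row(df1, 15)  # row 15 is "C ~ D"
print(df_final)
```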

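This is what the final DataFrame should look like (as an example, I have replaced row 15 of df1 with class D). The goal is for the Sub-class column to contain only numeric classes: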

df_final

Class   Type    Sub-class
0    1        ~       D      <--- Replace this Row with Class D
1    1        ~       C      <--- Replace this Row with Class C
2    1        ~       B      <--- Replace this Row with Class B
3    1        ~       A      <--- Replace this Row with Class A
4    1        ~       14
5    1        P       NaN       
6    A        ~       C      <--- Replace this Row with Class C
7    A        ~       D      <--- Replace this Row with Class D
8    A        ~       7
9    A        ~       B      <--- Replace this Row with Class B
10    A        P       NaN
11    B        ~       D      <--- Replace this Row with Class D
12    B        ~       4
13    B        ~       C      <--- Replace this Row with Class C
14    B        P       NaN
15    D        ~       18
16    D        ~       9
17    D        P       NaN
18    C        ~       4
19    C        P       NaN
20    D        ~       18
21    D        ~       9
22    D        P       NaN
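Repeating that replacement until no Sub-class names a class is a recursive expansion: each class reference is replaced by the referenced class's rows, and any references inside those rows are expanded in turn. A sketch with a hypothetical `expand_all` helper. It assumes every class block in df1 is kept and expanded in place (as the C and D blocks remain in df_final), that each expansion includes the referenced class's `P` row (as in the class D example), and that references contain no cycles, which it checks for:

```python
import numpy as np
import pandas as pd

rows = [
    ("1", "~", "D"), ("1", "~", "C"), ("1", "~", "B"), ("1", "~", "A"),
    ("1", "~", "14"), ("1", "P", np.nan),
    ("A", "~", "C"), ("A", "~", "D"), ("A", "~", "7"), ("A", "~", "B"),
    ("A", "P", np.nan),
    ("B", "~", "D"), ("B", "~", "4"), ("B", "~", "C"), ("B", "P", np.nan),
    ("C", "~", "D"), ("C", "~", "4"), ("C", "P", np.nan),
    ("D", "~", "18"), ("D", "~", "9"), ("D", "P", np.nan),
]
df1 = pd.DataFrame(rows, columns=["Class", "Type", "Sub-class"])


def expand_all(df: pd.DataFrame) -> pd.DataFrame:
    """Recursively replace every class reference in Sub-class with that class's rows."""
    # Rows of each class as plain (Class, Type, Sub-class) tuples, in original order
    blocks = {
        cls: list(block.itertuples(index=False, name=None))
        for cls, block in df.groupby("Class", sort=False)
    }

    def expand_rows(items, path):
        out = []
        for row in items:
            sub = row[2]
            if sub in blocks:  # Sub-class names another class
                if sub in path:
                    raise ValueError(f"cycle through class {sub!r}")
                out.extend(expand_rows(blocks[sub], path | {sub}))
            else:  # numeric sub-class, or NaN on a "P" row
                out.append(row)
        return out

    expanded = expand_rows(df.itertuples(index=False, name=None), frozenset())
    return pd.DataFrame(expanded, columns=df.columns)


df_expanded = expand_all(df1)
print(len(df_expanded))  # 78
print(df_expanded["Sub-class"].dropna().str.isdigit().all())  # True
```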

IIUC, this is what you are looking for:

import numpy as np

# Rows whose Sub-class names another class (A-D) take that name as their Class;
# every other row keeps its original Class.
df['Class'] = np.where(
    df['Sub-class'].isin(['A', 'B', 'C', 'D']),  # condition
    df['Sub-class'],                             # value if True
    df['Class'])                                 # value if False

Result:

Class   Type    Sub-class
0    D        ~       D
1    C        ~       C
2    B        ~       B
3    A        ~       A
4    1        ~       14
5    1        P       NaN
6    C        ~       C
7    D        ~       D
8    A        ~       7
9    B        ~       B
10    A        P       NaN
11    D        ~       D
12    B        ~       4
13    C        ~       C
14    B        P       NaN
15    D        ~       D
16    C        ~       4
17    C        P       NaN
18    D        ~       18
19    D        ~       9
20    D        P       NaN
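A self-contained run of the answer's code on df1, useful for checking the result table. It rebuilds the frame assuming string labels and `NaN` on the `P` rows, and uses `isin` in place of the chained `==` comparisons. Because `np.where` rewrites the Class column row by row, the frame keeps its 21 rows.

```python
import numpy as np
import pandas as pd

# Assumption: all labels are strings; the "P" rows have NaN in Sub-class.
rows = [
    ("1", "~", "D"), ("1", "~", "C"), ("1", "~", "B"), ("1", "~", "A"),
    ("1", "~", "14"), ("1", "P", np.nan),
    ("A", "~", "C"), ("A", "~", "D"), ("A", "~", "7"), ("A", "~", "B"),
    ("A", "P", np.nan),
    ("B", "~", "D"), ("B", "~", "4"), ("B", "~", "C"), ("B", "P", np.nan),
    ("C", "~", "D"), ("C", "~", "4"), ("C", "P", np.nan),
    ("D", "~", "18"), ("D", "~", "9"), ("D", "P", np.nan),
]
df = pd.DataFrame(rows, columns=["Class", "Type", "Sub-class"])

# Rows whose Sub-class names a class take that name as their Class
df["Class"] = np.where(
    df["Sub-class"].isin(["A", "B", "C", "D"]),
    df["Sub-class"],
    df["Class"],
)
print(df)
```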
