我有一个包含前3个原因的表(表1)和另一个包含每个变量所属类别的表(表2)。我正在尝试将类别箱匹配到表3中的原因表中。
Table 1: top 3 reasons, basically which 3 variable was the most important for that Register number
RegNo Reason1 Reason2 Reason3
1111 v3 v2 v6
2222 v2 v3 v5
3333 v3 v2 v6
4444 v3 v6 v2
5555 v3 v2 v5
Table 2: The category bin for each variable
RegNo v2 v3 v4 v5 v6
1111 (0.44, 0.64] (0.0, 60.0] missing (-inf, 3.0] (102.0, 175.0]
2222 (-inf, 0.33] (0.0, 60.0] welldone missing missing
3333 (0.44, 0.64] (0.0, 60.0] rare missing missing
4444 (0.44, 0.64] (0.0, 60.0] missing missing (20.0, 102.0]
5555 (0.64, inf] (0.0, 60.0] missing missing (-inf, 20.0]
Table 3: For each ID, find category bin label and replace reason1,2 & 3 with the labels
RegNo Reason1 Reason2 Reason3
1111 (0.0, 60.0] (0.44, 0.64] (102.0, 175.0]
2222 (-inf, 0.33] (0.0, 60.0] missing
3333 (0.0, 60.0] (0.44, 0.64] missing
4444 (0.0, 60.0] (20.0, 102.0] (0.44, 0.64]
5555 (0.0, 60.0] (0.64, inf] missing
方法
- 以
join()
的方式索引两个数据帧 - 则是join 输出的每个原因的
pd.concat()
df1 = pd.read_csv(io.StringIO(""" RegNo Reason1 Reason2 Reason3
1111 v3 v2 v6
2222 v2 v3 v5
3333 v3 v2 v6
4444 v3 v6 v2
5555 v3 v2 v5
"""), sep="s+")
df2 = pd.read_csv(io.StringIO(""" RegNo v2 v3 v4 v5 v6
1111 (0.44, 0.64] (0.0, 60.0] missing (-inf, 3.0] (102.0, 175.0]
2222 (-inf, 0.33] (0.0, 60.0] welldone missing missing
3333 (0.44, 0.64] (0.0, 60.0] rare missing missing
4444 (0.44, 0.64] (0.0, 60.0] missing missing (20.0, 102.0]
5555 (0.64, inf] (0.0, 60.0] missing missing (-inf, 20.0]
"""),sep="ss+", engine="python")
# index main df
df1 = df1.set_index("RegNo")
# reshape and index reasons
dfm = df2.set_index("RegNo").stack()
df3 = pd.concat([
# add apprpriate column into index, don't want othert columns
(df1.set_index(c, append="True").loc[:,[]]
# now it's a straight forward join
.join(dfm.rename_axis(["RegNo",c]).to_frame())
# cleanup index and rename columns of joined DF
.droplevel(1).rename(columns={0:c}))
for c in ["Reason1","Reason2","Reason3"]], axis=1)
Reason31111 "text-align:左;"(0.0,60.0) "text-align:左;"(0.44,0.64) "text-align:左;"(102.0,175.0) (负无穷,0.33](0.0,60.0) 3333 "text-align:左;"(0.0,60.0) "text-align:左;"(0.44,0.64) 4444 "text-align:左;"(0.0,60.0) "text-align:左;"(20.0,102.0) "text-align:左;"(0.44,0.64)5555 "text-align:左;"(0.0,60.0) "text-align:左;"(0.64,正]