I have the following DataFrame, and when I apply the melt function:
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
df_rm_features_melted = df_rm_features.melt(
    id_vars=['id', 'date'],
    value_vars=df_rm_features.select_dtypes(include=numerics).columns
)
I get the error: InvalidIndexError: Reindexing only valid with uniquely valued Index objects
id date factoring lglc overdue_60 distress max_dpd tr_op
0288 12/1/2018 0 1 1 1 0 0
0288 1/1/2019 0 0 0 0 10 1
0288 2/1/2019 0 0 0 0 2 1
0288 3/1/2019 0 0 0 0 52 1
0288 4/1/2019 0 0 0 1 2 0
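First, the error I get from your code is ValueError: arrays must all be same length. That is probably because id, which is in the id_vars list, is also a numeric column, so it ends up in the value_vars list as well.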
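The overlap can be checked directly. The snippet below is a minimal sketch that rebuilds the sample data from the question, assuming id was read as an integer column (the frame construction is my own reconstruction, not the asker's loading code), and lists which id_vars columns also show up in the numeric selection.

```python
import pandas as pd

# Reconstruction of the sample data; assumes id was parsed as an integer.
df_rm_features = pd.DataFrame({
    "id": [288] * 5,
    "date": ["12/1/2018", "1/1/2019", "2/1/2019", "3/1/2019", "4/1/2019"],
    "factoring": [0, 0, 0, 0, 0],
    "lglc": [1, 0, 0, 0, 0],
    "overdue_60": [1, 0, 0, 0, 0],
    "distress": [1, 0, 0, 0, 1],
    "max_dpd": [0, 10, 2, 52, 2],
    "tr_op": [0, 1, 1, 1, 0],
})

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
id_vars = ['id', 'date']

# Columns picked up by the dtype filter
numeric_cols = df_rm_features.select_dtypes(include=numerics).columns

# id_vars columns that would also be passed as value_vars
overlap = [col for col in numeric_cols if col in id_vars]
print(overlap)  # ['id']
```

If the list is non-empty, the same column is being requested both as an identifier and as a value, which is the conflict described above.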
To remove id, and any other id_vars column, from value_vars without naming them explicitly, use numpy.setdiff1d() together with the select_dtypes call:
import numpy as np

id_vars = ['id', 'date']
wanted_vals = df_rm_features.select_dtypes(include=numerics).columns
# drop the id_vars columns from the numeric selection
canhave_vals = np.setdiff1d(wanted_vals, id_vars)
df_rm_features.melt(id_vars=id_vars,
                    value_vars=canhave_vals)
Output:
id date variable value
0 288 12/1/2018 distress 1
1 288 1/1/2019 distress 0
2 288 2/1/2019 distress 0
3 288 3/1/2019 distress 0
4 288 4/1/2019 distress 1
5 288 12/1/2018 factoring 0
6 288 1/1/2019 factoring 0
7 288 2/1/2019 factoring 0
8 288 3/1/2019 factoring 0
9 288 4/1/2019 factoring 0
10 288 12/1/2018 lglc 1
11 288 1/1/2019 lglc 0
12 288 2/1/2019 lglc 0
13 288 3/1/2019 lglc 0
14 288 4/1/2019 lglc 0
15 288 12/1/2018 max_dpd 0
16 288 1/1/2019 max_dpd 10
17 288 2/1/2019 max_dpd 2
18 288 3/1/2019 max_dpd 52
19 288 4/1/2019 max_dpd 2
20 288 12/1/2018 overdue_60 1
21 288 1/1/2019 overdue_60 0
22 288 2/1/2019 overdue_60 0
23 288 3/1/2019 overdue_60 0
24 288 4/1/2019 overdue_60 0
25 288 12/1/2018 tr_op 0
26 288 1/1/2019 tr_op 1
27 288 2/1/2019 tr_op 1
28 288 3/1/2019 tr_op 1
29 288 4/1/2019 tr_op 0
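Note that np.setdiff1d returns its result sorted, which is why the variable column above is in alphabetical order rather than the original column order. If the original order matters, a plain list comprehension is one possible alternative. The sketch below is an assumption-laden variant, not part of the answer above: it rebuilds the sample data by hand (with id as an integer) and uses select_dtypes(include='number') as a shorthand for the numerics list.

```python
import pandas as pd

# Reconstruction of the sample data; assumes id was parsed as an integer.
df_rm_features = pd.DataFrame({
    "id": [288] * 5,
    "date": ["12/1/2018", "1/1/2019", "2/1/2019", "3/1/2019", "4/1/2019"],
    "factoring": [0, 0, 0, 0, 0],
    "lglc": [1, 0, 0, 0, 0],
    "overdue_60": [1, 0, 0, 0, 0],
    "distress": [1, 0, 0, 0, 1],
    "max_dpd": [0, 10, 2, 52, 2],
    "tr_op": [0, 1, 1, 1, 0],
})

id_vars = ['id', 'date']

# All numeric columns, in their original order
numeric_cols = df_rm_features.select_dtypes(include='number').columns

# Keep the original order while excluding the id_vars columns
value_vars = [col for col in numeric_cols if col not in id_vars]

melted = df_rm_features.melt(id_vars=id_vars, value_vars=value_vars)
print(melted['variable'].unique())
```

The melted frame has the same 30 rows as the output above; only the order of the variable blocks differs.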