I have a DataFrame as shown below. I applied MinMaxScaler on the Price column and used the inverse_transform function to get the original price values back, but it gives wrong results. Please advise me on this.
DataFrame:
Date Customer Price
1/6/2019 A 142404534.13
1/7/2019 A 989.34
1/8/2019 A 45444.57
1/9/2019 A 574343.10
1/10/2019 A 23232.34
1/1/2019 A 923423.00
1/2/2019 A 332.00
1/3/2019 A 2342323.24
1/4/2019 A 232.00
1/5/2019 A 65.70
1/6/2019 B 875.46
1/7/2019 B 142466027340.03
1/8/2019 B 25.17
1/9/2019 B 1.01
1/10/2019 B 1.00
1/10/2019 B 57.61
1/6/2019 B 232232.78
1/7/2019 B 15.20
1/8/2019 B 44.56
1/9/2019 B 2323254.45
1/10/2019 B 395.45
1/10/2019 B 23423454.92
1/6/2019 C 34.12
1/7/2019 C 89.34
1/8/2019 C 44.57
1/9/2019 C 343.10
1/10/2019 C 232.34
MinMaxScaler code on df:
from sklearn.preprocessing import MinMaxScaler
df['Price'] = df['Price'].apply(lambda x: '{:.2f}'.format(x))
scaler=MinMaxScaler()
dff = df.groupby('Customer').Price.transform(lambda s:scaler.fit_transform(s.values.reshape(-1,1)).ravel())
dff = pd.DataFrame(dff)
dff['Price'] = dff['Price'].apply(lambda x: '{:.2f}'.format(x))
dff = pd.concat([dff['Price'] , df['Customer']] , axis=1)
dff output:
Price Customer
0 1.00 A
1 0.00 A
2 0.00 A
3 0.00 A
4 0.00 A
5 0.01 A
6 0.00 A
7 0.02 A
8 0.00 A
9 0.00 A
10 0.00 B
11 1.00 B
12 0.00 B
.
.
.
.
20 0.00 B
21 0.00 B
22 0.00 C
23 0.18 C
24 0.03 C
25 1.00 C
26 0.64 C
Code using the inverse_transform function to get the actual price values back:
dd = dff.groupby('Customer').Price.transform(lambda s: scaler.inverse_transform(s.values.reshape(-1,1)).ravel())
dd = pd.DataFrame(dd)
dd['Price'] = dd['Price'].apply(lambda x: '{:.2f}'.format(x))
dd = pd.concat([dd['Price'] , df['Customer']] , axis=1)
dd output:
Price Customer
0 343.10 A
1 34.12 A
2 34.12 A
3 34.12 A
4 34.12 A
5 37.21 A
6 34.12 A
7 40.30 A
8 34.12 A
9 34.12 A
10 34.12 B
11 343.10 B
12 34.12 B
13 34.12 B
.
.
.
.
.
18 34.12 B
19 34.12 B
20 34.12 B
21 34.12 B
22 34.12 C
23 89.74 C
24 43.39 C
25 343.10 C
26 231.87 C
Please help me with suggestions.
There are several issues in the script:
- Using
df['Price'] = df['Price'].apply(lambda x: '{:.2f}'.format(x))
converts the Price column to strings, so no numeric operation works on that dtype. If you only want to set the float precision for display, use pd.set_option('display.float_format', lambda x: '%.2f' % x) after the pandas import line.
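A minimal sketch of the difference (the sample values are illustrative, not the full data from the question):

```python
import pandas as pd

df = pd.DataFrame({'Price': [142404534.13, 989.34]})

# Formatting the values turns the column into strings (object dtype),
# so arithmetic and scaling no longer work on it.
as_text = df['Price'].apply(lambda x: '{:.2f}'.format(x))
assert as_text.dtype == object

# Setting the display option keeps the data numeric and only changes
# how floats are printed.
pd.set_option('display.float_format', lambda x: '%.2f' % x)
assert df['Price'].dtype == 'float64'
```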
- If I understand correctly, you want 0-1 values per customer. If so, you cannot do it in a single line such as:
df.groupby('Customer').Price.transform(lambda s: scaler.fit_transform(s.values.reshape(-1, 1)).ravel())
because each per-group call to fit overwrites the scaler object's min/max parameters with those of the current group. So at the end, when you call scaler.inverse_transform(), you only have the parameters of the last group.
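As an aside, if you only need the 0-1 values per customer and never need inverse_transform afterwards, one alternative is to do the min-max arithmetic directly with groupby, so no scaler object is involved at all (the data below is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Price': [10.0, 20.0, 30.0, 100.0, 200.0, 400.0],
})

# Per-group min-max scaling: each customer is scaled with its own
# min/max, which is what fitting a fresh MinMaxScaler per group does.
grp = df.groupby('Customer')['Price']
df['Price_scaled'] = (df['Price'] - grp.transform('min')) / (
    grp.transform('max') - grp.transform('min'))
```

The downside is exactly the one discussed above: nothing is stored per group, so you cannot invert the scaling later.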
Solution:
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.7f' % x)
from sklearn.preprocessing import MinMaxScaler

scalers_dict = {}  # Prepare a place to hold one scaler per Customer

df = pd.read_csv('stack_data.csv', parse_dates=['Date'])
df['Price_scaled'] = None
df['Price_inverse_scaled'] = None

# Loop over Customers, scale Price and save each scaler for further use
for customer in pd.unique(df['Customer']):
    scalers_dict[customer] = MinMaxScaler()
    scaled_price = scalers_dict[customer].fit_transform(df[df['Customer']==customer].Price.values.reshape(-1, 1))
    df.loc[df['Customer']==customer, 'Price_scaled'] = scaled_price

# Loop over Customers and inverse the scaled values
for customer in pd.unique(df['Customer']):
    inverse_scale = scalers_dict[customer].inverse_transform(df[df['Customer']==customer].Price_scaled.values.reshape(-1, 1))
    df.loc[df['Customer']==customer, 'Price_inverse_scaled'] = inverse_scale
Output (I set the precision to .7f because the max value is so large that, at two-decimal precision, most of the scaled values would display as 0.00):
Date Customer Price Price_scaled Price_inverse_scaled
0 2019-01-06 A 142404534.1300000 1.0000000 142404534.1300000
1 2019-01-07 A 989.3400000 0.0000065 989.3400000
2 2019-01-08 A 45444.5700000 0.0003187 45444.5700000
3 2019-01-09 A 574343.1000000 0.0040327 574343.1000000
4 2019-01-10 A 23232.3400000 0.0001627 23232.3400000
5 2019-01-01 A 923423.0000000 0.0064840 923423.0000000
6 2019-01-02 A 332.0000000 0.0000019 332.0000000
7 2019-01-03 A 2342323.2400000 0.0164479 2342323.2400000
8 2019-01-04 A 232.0000000 0.0000012 232.0000000
9 2019-01-05 A 65.7000000 0.0000000 65.7000000
10 2019-01-06 B 875.4600000 0.0000000 875.4600000
11 2019-01-07 B 142466027340.0299988 1.0000000 142466027340.0299988
12 2019-01-08 B 25.1700000 0.0000000 25.1700000
13 2019-01-09 B 1.0100000 0.0000000 1.0100000
14 2019-01-10 B 1.0000000 0.0000000 1.0000000
15 2019-01-10 B 57.6100000 0.0000000 57.6100000
16 2019-01-06 B 232232.7800000 0.0000016 232232.7800000
17 2019-01-07 B 15.2000000 0.0000000 15.2000000
18 2019-01-08 B 44.5600000 0.0000000 44.5600000
19 2019-01-09 B 2323254.4500000 0.0000163 2323254.4500000
20 2019-01-10 B 395.4500000 0.0000000 395.4500000
21 2019-01-10 B 23423454.9200000 0.0001644 23423454.9200000
22 2019-01-06 C 34.1200000 0.0000000 34.1200000
23 2019-01-07 C 89.3400000 0.1787171 89.3400000
24 2019-01-08 C 44.5700000 0.0338210 44.5700000
25 2019-01-09 C 343.1000000 1.0000000 343.1000000
26 2019-01-10 C 232.3400000 0.6415302 232.3400000
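The dict of per-customer scalers is the key idea: as long as each customer's own fitted scaler is kept and reused for the inverse step, the round trip recovers the original values up to float precision. A compact sketch with synthetic prices (the customer labels and values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

prices = {'A': np.array([[65.7], [989.34], [142404534.13]]),
          'B': np.array([[1.0], [25.17], [875.46]])}

# One fitted scaler per customer, kept around for the inverse step.
scalers = {c: MinMaxScaler().fit(p) for c, p in prices.items()}
scaled = {c: scalers[c].transform(p) for c, p in prices.items()}
restored = {c: scalers[c].inverse_transform(scaled[c]) for c in prices}

for c in prices:
    assert np.allclose(restored[c], prices[c])
```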
Edit: multi-column version
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.7f' % x)
from sklearn.preprocessing import MinMaxScaler

scalers_dict = {}  # Prepare a place to hold one scaler per Customer

df = pd.read_csv('stack_data.csv', parse_dates=['Date'])
df['Price2'] = range(df.shape[0])
df['Price3'] = df['Price2'] * 2
df = df.join(pd.DataFrame(0, df.index, ['Price1_scaled', 'Price2_scaled', 'Price3_scaled', 'Price1_inverse_scaled', 'Price2_inverse_scaled', 'Price3_inverse_scaled']))

# Loop over Customers, scale the Price columns and save each scaler for further use
for customer in pd.unique(df['Customer']):
    scalers_dict[customer] = MinMaxScaler()
    scaled_price = scalers_dict[customer].fit_transform(df.loc[df['Customer']==customer, 'Price':'Price3'])
    df.loc[df['Customer']==customer, 'Price1_scaled':'Price3_scaled'] = scaled_price

# Loop over Customers and inverse the scaled values
for customer in pd.unique(df['Customer']):
    inverse_scale = scalers_dict[customer].inverse_transform(df.loc[df['Customer']==customer, 'Price1_scaled':'Price3_scaled'])
    df.loc[df['Customer']==customer, 'Price1_inverse_scaled':'Price3_inverse_scaled'] = inverse_scale
Output:
Date Customer Price Price2 Price3 Price1_scaled Price2_scaled Price3_scaled Price1_inverse_scaled Price2_inverse_scaled Price3_inverse_scaled
0 2019-01-06 A 142404534.1300000 0 0 1.0000000 0.0000000 0.0000000 142404534.1300000 0.0000000 0.0000000
1 2019-01-07 A 989.3400000 1 2 0.0000065 0.1111111 0.1111111 989.3400000 1.0000000 2.0000000
2 2019-01-08 A 45444.5700000 2 4 0.0003187 0.2222222 0.2222222 45444.5700000 2.0000000 4.0000000
3 2019-01-09 A 574343.1000000 3 6 0.0040327 0.3333333 0.3333333 574343.1000000 3.0000000 6.0000000
4 2019-01-10 A 23232.3400000 4 8 0.0001627 0.4444444 0.4444444 23232.3400000 4.0000000 8.0000000
5 2019-01-01 A 923423.0000000 5 10 0.0064840 0.5555556 0.5555556 923423.0000000 5.0000000 10.0000000
6 2019-01-02 A 332.0000000 6 12 0.0000019 0.6666667 0.6666667 332.0000000 6.0000000 12.0000000
7 2019-01-03 A 2342323.2400000 7 14 0.0164479 0.7777778 0.7777778 2342323.2400000 7.0000000 14.0000000
8 2019-01-04 A 232.0000000 8 16 0.0000012 0.8888889 0.8888889 232.0000000 8.0000000 16.0000000
9 2019-01-05 A 65.7000000 9 18 0.0000000 1.0000000 1.0000000 65.7000000 9.0000000 18.0000000
10 2019-01-06 B 875.4600000 10 20 0.0000000 0.0000000 0.0000000 875.4600000 10.0000000 20.0000000
11 2019-01-07 B 142466027340.0299988 11 22 1.0000000 0.0909091 0.0909091 142466027340.0299988 11.0000000 22.0000000
12 2019-01-08 B 25.1700000 12 24 0.0000000 0.1818182 0.1818182 25.1700000 12.0000000 24.0000000
13 2019-01-09 B 1.0100000 13 26 0.0000000 0.2727273 0.2727273 1.0100000 13.0000000 26.0000000
14 2019-01-10 B 1.0000000 14 28 0.0000000 0.3636364 0.3636364 1.0000000 14.0000000 28.0000000
15 2019-01-10 B 57.6100000 15 30 0.0000000 0.4545455 0.4545455 57.6100000 15.0000000 30.0000000
16 2019-01-06 B 232232.7800000 16 32 0.0000016 0.5454545 0.5454545 232232.7800000 16.0000000 32.0000000
17 2019-01-07 B 15.2000000 17 34 0.0000000 0.6363636 0.6363636 15.2000000 17.0000000 34.0000000
18 2019-01-08 B 44.5600000 18 36 0.0000000 0.7272727 0.7272727 44.5600000 18.0000000 36.0000000
19 2019-01-09 B 2323254.4500000 19 38 0.0000163 0.8181818 0.8181818 2323254.4500000 19.0000000 38.0000000
20 2019-01-10 B 395.4500000 20 40 0.0000000 0.9090909 0.9090909 395.4500000 20.0000000 40.0000000
21 2019-01-10 B 23423454.9200000 21 42 0.0001644 1.0000000 1.0000000 23423454.9200000 21.0000000 42.0000000
22 2019-01-06 C 34.1200000 22 44 0.0000000 0.0000000 0.0000000 34.1200000 22.0000000 44.0000000
23 2019-01-07 C 89.3400000 23 46 0.1787171 0.2500000 0.2500000 89.3400000 23.0000000 46.0000000
24 2019-01-08 C 44.5700000 24 48 0.0338210 0.5000000 0.5000000 44.5700000 24.0000000 48.0000000
25 2019-01-09 C 343.1000000 25 50 1.0000000 0.7500000 0.7500000 343.1000000 25.0000000 50.0000000
26 2019-01-10 C 232.3400000 26 52 0.6415302 1.0000000 1.0000000 232.3400000 26.0000000 52.0000000
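One caveat in the multi-column version: .loc label slices such as 'Price':'Price3' include both endpoints and follow the DataFrame's current column order, so they only pick up the intended columns when those columns are adjacent. A quick check (the column names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]],
                  columns=['Price', 'Price2', 'Price3', 'Other'])

# Label slices are inclusive on both ends and follow column order,
# so 'Price':'Price3' selects three columns and skips 'Other'.
sub = df.loc[:, 'Price':'Price3']
assert list(sub.columns) == ['Price', 'Price2', 'Price3']
```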