The truth value of a Series is ambiguous. Can't figure it out


I'm new to Python, so I wrote this function, which is supposed to normalize the price values in the "price" column of a DataFrame:

def normalize_price(df):
    for elements in df['price']:
        if (df["price"] >= 1000) and (df['price'] <= 1499):
            df['price'] = 1000
            return
        elif 1500 <= df['price'] <= 2499:
            df['price'] = 1500
            return
        elif 2500 <= df['price'] <= 2999:
            df['price'] = 2500
            return
        elif 3000 <= df['price'] <= 3999:
            df['price'] = 3000
            return

So when I call it, I get this error:

---------------------------------------------------------------------------
<ipython-input-86-1e239d3cbba4> in normalize_price(df)
20 def normalize_price(df):
21     for elements in df['price']:
---> 22         if (df["price"]>= 1000) and (df['price']<= 1499):
23             df['price'] = 1000
24             return
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Since this is driving me crazy, I'd like to know why :( Thanks.

np.select is probably the simplest approach:

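import numpy as np

def normalize_price(df):
    # create a list of conditions; each must be an element-wise boolean mask,
    # so combine comparisons with & (a chained a <= x <= b raises the same ValueError)
    cond = [
        (df['price'] >= 1000) & (df['price'] <= 1499),
        (df['price'] >= 1500) & (df['price'] <= 2499),
        (df['price'] >= 2500) & (df['price'] <= 2999),
        (df['price'] >= 3000) & (df['price'] <= 3999)
    ]
    # create a list of choices based on the conditions above
    choice = [
        1000,
        1500,
        2500,
        3000
    ]
    # use numpy.select and assign array to df['price'];
    # rows matching no condition keep their original price
    df['price'] = np.select(cond, choice, df['price'])
    return df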
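
For context on why the original function raises, here is a minimal sketch (the sample prices are made up for illustration). Python's `and` asks for a single True/False from the whole Series, and a chained comparison such as `1500 <= s <= 2499` expands to an `and`, so both fail. The `&` operator instead combines the comparisons element by element.

```python
import pandas as pd

s = pd.Series([1200, 1600, 3499])  # made-up sample prices

# `and` needs one boolean for the whole Series, so pandas refuses to guess.
try:
    (s >= 1000) and (s <= 1499)
    and_error = None
except ValueError as exc:
    and_error = str(exc)

# A chained comparison expands to `(1500 <= s) and (s <= 2499)` and fails the same way.
try:
    1500 <= s <= 2499
    chained_error = None
except ValueError as exc:
    chained_error = str(exc)

# `&` is element-wise; the parentheses are needed because & binds tighter than >=.
mask = (s >= 1000) & (s <= 1499)

# Series.between is an equivalent shorthand (inclusive on both ends by default).
mask_between = s.between(1000, 1499)
```

Both `mask` and `mask_between` are boolean Series with one entry per row, which is the kind of condition np.select expects.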

Update with an example:

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10000, 50), columns=['price'])

def normalize_price(df):
    # element-wise boolean masks, one per price range
    cond = [
        (df['price'] >= 1000) & (df['price'] <= 1499),
        (df['price'] >= 1500) & (df['price'] <= 2499),
        (df['price'] >= 2500) & (df['price'] <= 2999),
        (df['price'] >= 3000) & (df['price'] <= 3999)
    ]
    choice = [
        1000,
        1500,
        2500,
        3000
    ]
    # write to a new column so the original price stays available for comparison
    df['price_new'] = np.select(cond, choice, df['price'])
    return df

normalize_price(df).head(10)
   price  price_new
0     235        235
1    5192       5192
2     905        905
3    7813       7813
4    2895       2500 <-----
5    5056       5056
6     144        144
7    4225       4225
8    7751       7751
9    3462       3000 <----

You should really avoid for loops and if statements here. If all you want is to round each price down to a multiple of 500, you can do:

import pandas as pd
import numpy as np
df = pd.DataFrame({"price":[1200, 1600, 2100, 3499]})
df["price"] = (df["price"]/500).apply(np.floor)*500
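
As a hedged alternative sketch, the same flooring can be written with integer floor division, which keeps the integer dtype instead of producing floats. The column name `price_floor` is made up here so the original prices stay visible.

```python
import pandas as pd

df = pd.DataFrame({"price": [1200, 1600, 2100, 3499]})

# Floor division drops the remainder, so each price falls to the
# multiple of 500 at or below it.
df["price_floor"] = df["price"] // 500 * 500
```

Note that flooring to a multiple of 500 is not identical to the ranges in the question: 2100 becomes 2000 here, whereas the question maps everything from 1500 to 2499 to 1500.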

Edit: if you are looking for a more general solution:


df = pd.DataFrame({"price":[1200, 1600, 2100, 3499,3600, 140000, 160000]})
df["div"] = 5*10**(df["price"].astype(str).str.len()-2)
(df["price"]/df["div"]).apply(np.floor)*df["div"]
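
To make the arithmetic concrete, here is a small sketch of the same idea wrapped in a function. The helper name `floor_to_step` is hypothetical, and the sketch assumes non-negative integer prices with at least two digits. The step is 5 * 10**(digits - 2), so 4-digit prices use a step of 500 and 6-digit prices use a step of 50000.

```python
import pandas as pd

def floor_to_step(price: pd.Series) -> pd.Series:
    """Hypothetical helper: floor each price to a step derived from its digit count."""
    digits = price.astype(str).str.len()   # number of digits in each price
    step = 5 * 10 ** (digits - 2)          # e.g. 500 for 1200, 50000 for 140000
    return price // step * step            # floor to the multiple of step at or below

prices = pd.Series([1200, 3499, 3600, 140000, 160000])
floored = floor_to_step(prices)
```

With these sample values, 3600 floors to 3500 and 160000 floors to 150000, because each price is rounded down using the step for its own magnitude.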

You can use pandas.cut for this purpose. In your case:

bins=[1000, 1500, 2500, 3000, 4000]
df["bin"]=pd.cut(df["price"], bins, right=False, retbins=False, labels=bins[:-1])

This assumes the bin column is the output column of your function.
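
One thing to be aware of with pandas.cut is that prices outside every bin come back as NaN rather than keeping their original value. The sketch below, with made-up sample prices and a made-up `price_new` column, shows one possible way to fall back to the original price in that case.

```python
import pandas as pd

df = pd.DataFrame({"price": [235, 1200, 2100, 2895, 3462, 5192]})
bins = [1000, 1500, 2500, 3000, 4000]

# right=False makes each interval closed on the left: [1000, 1500), [1500, 2500), ...
binned = pd.cut(df["price"], bins, right=False, labels=bins[:-1])

# Prices below 1000 or at/above 4000 fall in no bin and become NaN,
# so fill those rows with the original price before casting back to int.
df["price_new"] = binned.astype("float").fillna(df["price"]).astype(int)
```

For integer prices, the left-closed intervals match the inclusive ranges in the question (1000 to 1499, 1500 to 2499, and so on).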

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html
