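I'm working with a large dataframe (1M rows) and need to flag each row as a high (0) or low (1) profile. I wrote the function below, but it is quite slow because it goes through every row one at a time. How can I speed it up? I've heard of vectorization but don't know how to set it up.

Thanks a lot.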
```python
# x is one row of the dataframe (apply with axis=1 passes each row as a Series).
# list1 ... list9 and be_holidays are defined elsewhere.
def flag_low(x):
    if x['EAN'] in list1:
        if x['local_weekday'] >= 5:
            return 1
        elif (x['local_hour'] <= 6) or (23 <= x['local_hour']):
            return 1
        elif (x['local_hour'] == 7) and (x['local_minute'] < 30):
            return 1
        elif (x['local_hour'] == 22) and (30 <= x['local_minute']):
            return 1
    elif x['EAN'] in list2:
        if x['local_weekday'] >= 5:
            return 1
        elif (x['local_hour'] <= 6) or (23 <= x['local_hour']):
            return 1
    elif x['EAN'] in list3:
        if x['local_weekday'] >= 5:
            return 1
        elif (x['local_hour'] <= 6) or (22 <= x['local_hour']):
            return 1
    elif x['EAN'] in list4:
        if x['local_weekday'] >= 5:
            return 1
        elif (x['local_hour'] <= 6) or (22 <= x['local_hour']):
            return 1
    elif x['EAN'] in list5:
        if x['local_weekday'] >= 5:
            return 1
        elif (x['local_hour'] <= 6) or (22 <= x['local_hour']):
            return 1
    elif x['EAN'] in list6:
        if x['local_weekday'] >= 5:
            return 1
        elif x['local_time'] in be_holidays:
            return 1
        elif (x['local_hour'] <= 5) or (21 <= x['local_hour']):
            return 1
    elif x['EAN'] in list7:
        if x['local_weekday'] >= 5:
            return 1
        elif x['local_time'] in be_holidays:
            return 1
        elif (x['local_hour'] <= 6) or (22 <= x['local_hour']):
            return 1
    elif x['EAN'] in list8:
        if x['local_weekday'] >= 5:
            return 1
        elif x['local_time'] in be_holidays:
            return 1
        elif (x['local_hour'] <= 6) or (22 <= x['local_hour']):
            return 1
    elif x['EAN'] in list9:
        if x['local_weekday'] >= 5:
            return 1
        elif x['local_time'] in be_holidays:
            return 1
        elif (x['local_hour'] <= 6) or (22 <= x['local_hour']):
            return 1
    # Every row that reached no `return 1` above (including an EAN found in a
    # list whose time rules did not match) is high profile, so return 0
    # instead of falling through and returning None.
    return 0

dataframe['BinLow'] = dataframe.apply(flag_low, axis=1)
```
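The slowness comes from `apply(axis=1)` calling a Python function once per row. Vectorization means writing each test once for the whole column: a comparison such as `df['local_hour'] <= 6` returns a boolean Series, and `&` / `|` combine those element-wise. A minimal sketch on a hypothetical five-row frame, comparing both styles for just the weekend / early-morning part of the rules:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only to compare the two styles.
df = pd.DataFrame({
    "local_weekday": [0, 4, 5, 6, 2],
    "local_hour":    [8, 23, 12, 3, 6],
})

# Row-wise: Python calls the lambda once per row (slow on 1M rows).
rowwise = df.apply(
    lambda r: 1 if (r["local_weekday"] >= 5 or r["local_hour"] <= 6) else 0,
    axis=1,
)

# Vectorized: each comparison runs over the whole column at once and returns
# a boolean Series; `|` combines them element-wise, astype turns True/False
# into 1/0.
mask = (df["local_weekday"] >= 5) | (df["local_hour"] <= 6)
vectorized = mask.astype("int64")

print(vectorized.tolist())  # [0, 0, 1, 1, 1]
```

Both give the same flags; the vectorized version just never leaves pandas/NumPy for each row.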
Update

I followed @Ade_1's explanation, but I can't verify that it works because of this error:
```
TypeError: Cannot convert type '…' to date.
```
raised on the line

```python
(x['local_time'] in be_holidays)
```

How can I fix this?
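The type name in the error message is cut off, so the exact cause is unclear from the post. One workaround, assuming `be_holidays` can be turned into a collection of `datetime.date` values and `local_time` is (or can be parsed as) a datetime column, is to drop the per-row `in` test and compare whole dates column-wise. The holiday dates and the toy frame are hypothetical:

```python
import datetime as dt
import pandas as pd

# Hypothetical holiday dates; substitute the dates from your own be_holidays.
be_holiday_dates = {dt.date(2021, 1, 1), dt.date(2021, 4, 5)}

df = pd.DataFrame({
    "local_time": ["2021-01-01 08:15", "2021-01-04 09:00", "2021-04-05 22:45"],
})

# Make sure the column is datetime64 (here it starts out as text).
df["local_time"] = pd.to_datetime(df["local_time"])

# .dt.normalize() sets the time of day to midnight, so each timestamp can be
# matched against the holiday dates (also at midnight) in one isin call.
holiday_ts = pd.to_datetime(sorted(be_holiday_dates))
is_holiday = df["local_time"].dt.normalize().isin(holiday_ts)

print(is_holiday.tolist())  # [True, False, True]
```

The resulting boolean Series can then be combined with the other conditions using `|` and `&`.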
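Since there are many if/else conditions, you can vectorize this with `np.select()`.
For the nested ifs, you have to chain each inner condition onto its outer condition with `&`.
The syntax is as follows: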
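```python
import numpy as np

# One entry per branch: `&` chains a nested if onto its outer if.
conditions = [
    dataframe['EAN'].isin(list1) & (dataframe['local_weekday'] >= 5),
    # continue the conditions
]
choices = [
    1,
    # continue
]
dataframe['BinLow'] = np.select(conditions, choices, default=0)
```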
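Note: the conditions list and the choices list must line up one-to-one; for each row, `np.select()` returns the choice belonging to the first condition that is true. The `default` value plays the role of the final `else` statement.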
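Putting all nine branches of `flag_low` together, a minimal sketch could look like this. It assumes `local_weekday` counts Monday as 0 (so `>= 5` means Saturday/Sunday) and that the holidays are available as dates; the lists, EAN codes, and `holiday_days` are hypothetical toy values. Instead of chaining `isin(listN) & rule` in each condition, it passes the list membership as the conditions and each list's rule as the choices, so an EAN that appears in several lists is judged by the first list only, exactly like the `elif` chain.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's list1..list9 and be_holidays.
list1, list2, list3, list4, list5 = ["A"], ["B"], ["C"], ["D"], ["E"]
list6, list7, list8, list9 = ["F"], ["G"], ["H"], ["I"]
holiday_days = pd.to_datetime(["2021-01-01"])  # assumed holiday dates

df = pd.DataFrame({
    "EAN": ["A", "A", "B", "F", "F", "G", "Z"],
    "local_time": pd.to_datetime([
        "2021-01-04 07:15",  # Monday
        "2021-01-04 12:00",
        "2021-01-04 23:10",
        "2021-01-01 12:00",  # Friday, listed as a holiday above
        "2021-01-04 06:30",
        "2021-01-04 06:30",
        "2021-01-09 12:00",  # Saturday
    ]),
})
# Assumed to match the question's columns (weekday: Monday=0 ... Sunday=6).
df["local_weekday"] = df["local_time"].dt.weekday
df["local_hour"] = df["local_time"].dt.hour
df["local_minute"] = df["local_time"].dt.minute

wd, h, m = df["local_weekday"], df["local_hour"], df["local_minute"]
weekend = wd >= 5
holiday = df["local_time"].dt.normalize().isin(holiday_days)
night_23 = (h <= 6) | (h >= 23)  # 00:00-06:59 and 23:00-23:59
night_22 = (h <= 6) | (h >= 22)  # 00:00-06:59 and 22:00-23:59

ean = df["EAN"]
# One condition per EAN branch, in the same order as the if/elif chain.
# Consecutive lists with identical rules (3-5 and 7-9) are merged.
conditions = [
    ean.isin(list1),
    ean.isin(list2),
    ean.isin(list3) | ean.isin(list4) | ean.isin(list5),
    ean.isin(list6),
    ean.isin(list7) | ean.isin(list8) | ean.isin(list9),
]
# The rule applied inside each branch (True means low profile).
rules = [
    weekend | night_23 | ((h == 7) & (m < 30)) | ((h == 22) & (m >= 30)),
    weekend | night_23,
    weekend | night_22,
    weekend | holiday | (h <= 5) | (h >= 21),
    weekend | holiday | night_22,
]
# Row by row, np.select takes the rule of the first matching branch;
# rows in no list get the default, like the final `return 0`.
df["BinLow"] = np.select(conditions, rules, default=False).astype("int64")

print(df["BinLow"].tolist())  # [1, 0, 1, 1, 0, 1, 0]
```

On the real data you would drop the toy definitions and reuse your own lists, the existing `local_*` columns, and the holiday dates. Keeping the rules as boolean choices and casting once at the end avoids mixing integer and boolean arrays inside `np.select()`.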