I have the following dataset.
Starting dataset:
ObjectID,Date,Price,Vol,Mx
101,2017-01-01,,145,203
101,2017-01-02,,155,163
101,2017-01-03,67.0,140,234
101,2017-01-04,78.0,130,182
101,2017-01-05,58.0,178,202
101,2017-01-06,53.0,134,204
101,2017-01-07,52.0,134,183
101,2017-01-08,62.0,148,176
101,2017-01-09,42.0,152,193
101,2017-01-10,80.0,137,150
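I first created two new boolean columns, VolPrice and Check, from the values in the starting dataset. I want to add a third column, DoubleCheck, that is True if either VolPrice or Check is True and False otherwise. Initially I got the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I then added .any() after each column in the statement that builds DoubleCheck. That does not work either: it fills the entire DoubleCheck column with True, even in rows where the value should be False, as shown below.
Code: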
import pandas as pd
import numpy as np
Observations = pd.read_csv(r"C:\Users\Observations.csv", parse_dates=['Date'], index_col=['ObjectID', 'Date'])
Observations['VolPrice'] = np.where((Observations['Price']<Observations['Vol']) & (Observations['Vol']<Observations['Mx']), True, False)
Observations['Check'] = np.where(Observations['Vol']<Observations['Price'], True, False)
Observations['DoubleCheck'] = np.where((Observations['Check'].any()==True) or (Observations['VolPrice'].any()==True), True, False)
print(Observations)
Current result:
ObjectID,Date,Price,Vol,Mx,VolPrice,Check,DoubleCheck
101,2017-01-01,,145,203,False,False,True
101,2017-01-02,,155,163,False,False,True
101,2017-01-03,67.0,140,234,True,False,True
101,2017-01-04,78.0,130,182,True,False,True
101,2017-01-05,58.0,178,202,True,False,True
101,2017-01-06,53.0,134,204,True,False,True
101,2017-01-07,52.0,134,183,True,False,True
101,2017-01-08,62.0,148,176,True,False,True
101,2017-01-09,42.0,152,193,True,False,True
101,2017-01-10,80.0,137,150,True,False,True
Desired result:
ObjectID,Date,Price,Vol,Mx,VolPrice,Check,DoubleCheck
101,2017-01-01,,145,203,False,False,False
101,2017-01-02,,155,163,False,False,False
101,2017-01-03,67.0,140,234,True,False,True
101,2017-01-04,78.0,130,182,True,False,True
101,2017-01-05,58.0,178,202,True,False,True
101,2017-01-06,53.0,134,204,True,False,True
101,2017-01-07,52.0,134,183,True,False,True
101,2017-01-08,62.0,148,176,True,False,True
101,2017-01-09,42.0,152,193,True,False,True
101,2017-01-10,80.0,137,150,True,False,True
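Use | for element-wise (bitwise) OR, the same way & is used for element-wise AND. Calling .any() reduces each column to a single boolean, so np.where receives one scalar and that value is broadcast to every row; this is why the whole column came out True.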
Observations['DoubleCheck'] = Observations['Check'] | Observations['VolPrice']
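To make the difference concrete, here is a minimal sketch on two small hand-made Series. The names check, vol_price and double_check are illustrative stand-ins for the columns in the question, not part of the original code.

```python
import pandas as pd

# Illustrative stand-ins for the 'Check' and 'VolPrice' columns
check = pd.Series([False, False, False])
vol_price = pd.Series([False, True, True])

# Series.any() collapses the whole column into one scalar
any_vol_price = bool(vol_price.any())

# Python's `or` needs a single truth value per operand, so it raises
try:
    check or vol_price
    or_raised = False
except ValueError:
    or_raised = True

# `|` compares the two Series row by row
double_check = check | vol_price
print(double_check.tolist())
```

The scalar from .any() carries no per-row information, whereas | keeps one result per row.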
Or select both columns and use DataFrame.any along the rows:
Observations['DoubleCheck'] = Observations[['Check','VolPrice']].any(axis=1)
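The axis argument is what makes this a per-row test. A small sketch, assuming a hand-made frame named flags (hypothetical, not from the question), shows what each axis returns:

```python
import pandas as pd

# Hypothetical frame holding only the two boolean columns
flags = pd.DataFrame({
    "Check": [False, False, False],
    "VolPrice": [False, True, True],
})

# axis=1: one value per row, True if any column in that row is True
row_any = flags[["Check", "VolPrice"]].any(axis=1)

# default axis=0: one value per column, True if any row in that column is True
col_any = flags[["Check", "VolPrice"]].any()

print(row_any.tolist())
print(col_any.to_dict())
```

This form is convenient when more than two boolean columns need to be combined, since the column list can simply be extended.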
All of this is also possible without np.where, because the comparisons already return boolean Series:
Observations['VolPrice'] = (Observations['Price']<Observations['Vol']) & (Observations['Vol']<Observations['Mx'])
Observations['Check'] = Observations['Vol']<Observations['Price']
Observations['DoubleCheck'] = Observations['Check'] | Observations['VolPrice']
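For an end-to-end check, the sketch below loads the starting dataset from an in-memory string instead of the CSV file (an assumption made so the example is self-contained) and applies the three assignments. The names csv_text and obs are illustrative.

```python
import io
import pandas as pd

# The starting dataset from the question, inlined instead of read from disk
csv_text = """ObjectID,Date,Price,Vol,Mx
101,2017-01-01,,145,203
101,2017-01-02,,155,163
101,2017-01-03,67.0,140,234
101,2017-01-04,78.0,130,182
101,2017-01-05,58.0,178,202
101,2017-01-06,53.0,134,204
101,2017-01-07,52.0,134,183
101,2017-01-08,62.0,148,176
101,2017-01-09,42.0,152,193
101,2017-01-10,80.0,137,150
"""

obs = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"],
                  index_col=["ObjectID", "Date"])

# Comparisons involving NaN evaluate to False, so the first two rows stay False
obs["VolPrice"] = (obs["Price"] < obs["Vol"]) & (obs["Vol"] < obs["Mx"])
obs["Check"] = obs["Vol"] < obs["Price"]
obs["DoubleCheck"] = obs["Check"] | obs["VolPrice"]

print(obs)
```

The first two rows have no Price, so both comparisons are False there and DoubleCheck is False, matching the desired result.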