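I am trying to figure out how to filter data in pandas and then assign a value to every row that meets the filter criteria, so that the change shows up in the original DataFrame. This is my closest attempt so far, but it raises a lot of warnings: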
import pandas as pd
df = pd.read_csv('http://www.sharecsv.com/dl/9096d32f98aa0ac671a1cca16fa43be8/SalesJan2009.csv')
df['Zone'] = ''
zone1 = df[(df['Latitude'] > 0) & (df['Latitude'] > 0)]
zone2 = df[(df['Latitude'] < 0) & (df['Latitude'] > 0)]
zone3 = df[(df['Latitude'] > 0) & (df['Latitude'] < 0)]
zone4 = df[(df['Latitude'] < 0) & (df['Latitude'] < 0)]
zone1[['Zone']] = zone1[['Zone']] = 1
zone2[['Zone']] = zone1[['Zone']] = 2
zone3[['Zone']] = zone1[['Zone']] = 3
zone4[['Zone']] = zone1[['Zone']] = 4
df
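This does not affect the original DataFrame at all; it only sets the values in the filtered subsets.
I assume I may need to filter out everything that satisfies each filter, remove it from the original, and then put the changed rows back into the original?
This is a random dataset that illustrates what I am trying to do. My actual dataset also contains rows that meet none of the filter conditions, and I need those to be treated as unknown, because the filters there do not cover every row the way they do in this example.
I am trying to avoid looping over every row and checking the conditions row by row.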
IIUC, do you want to do something like this:
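import numpy as np

# One boolean mask per quadrant, built on the full DataFrame
zone1 = (df['Latitude'] > 0) & (df['Longitude'] > 0)
zone2 = (df['Latitude'] < 0) & (df['Longitude'] > 0)
zone3 = (df['Latitude'] > 0) & (df['Longitude'] < 0)
zone4 = (df['Latitude'] < 0) & (df['Longitude'] < 0)
# 'Unknown' labels rows that match no mask (np.select's default would otherwise be 0)
df['Zone'] = np.select([zone1, zone2, zone3, zone4], ['Zone 1', 'Zone 2', 'Zone 3', 'Zone 4'], default='Unknown')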
Output:
Transaction_date Product Price Payment_Type Name
0 1/2/09 6:17 Product1 1200 Mastercard carolina
1 1/2/09 4:53 Product1 1200 Visa Betina
2 1/2/09 13:08 Product1 1200 Mastercard Federica e Andrea
3 1/3/09 14:44 Product1 1200 Visa Gouya
4 1/4/09 12:56 Product2 3600 Visa Gerd W
City State Country Account_Created
0 Basildon England United Kingdom 1/2/09 6:00
1 Parkville MO United States 1/2/09 4:42
2 Astoria OR United States 1/1/09 16:21
3 Echuca Victoria Australia 9/25/05 21:13
4 Cahaba Heights AL United States 11/15/08 15:47
Last_Login Latitude Longitude Zone
0 1/2/09 6:08 51.500000 -1.116667 Zone 3
1 1/2/09 7:49 39.195000 -94.681940 Zone 3
2 1/3/09 12:32 46.188060 -123.830000 Zone 3
3 1/3/09 14:22 -36.133333 144.750000 Zone 2
4 1/4/09 12:45 33.520560 -86.802500 Zone 3
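To make the handling of rows that match no condition concrete, here is a minimal sketch of the same `np.select` idea on a small made-up frame. The coordinates and the `'Unknown'` label are assumptions for illustration, not values from the CSV; the last row sits on the equator, so it falls into none of the four zones.

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates standing in for the CSV; the last row matches no zone.
df = pd.DataFrame({
    'Latitude':  [51.5, -36.1, 39.2, -33.9, 0.0],
    'Longitude': [-1.1, 144.8, -94.7, -70.6, 10.0],
})

# Conditions are evaluated in order; the first one that is True picks the label.
conditions = [
    (df['Latitude'] > 0) & (df['Longitude'] > 0),
    (df['Latitude'] < 0) & (df['Longitude'] > 0),
    (df['Latitude'] > 0) & (df['Longitude'] < 0),
    (df['Latitude'] < 0) & (df['Longitude'] < 0),
]
labels = ['Zone 1', 'Zone 2', 'Zone 3', 'Zone 4']

# Rows where every condition is False receive the default label.
df['Zone'] = np.select(conditions, labels, default='Unknown')
zones = df['Zone'].tolist()
print(zones)
```

Because the whole column is computed in one vectorized call, there is no need to loop over the rows or to stitch filtered subsets back into the original frame.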
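You missed that both of your conditions check Latitude; the second one should check Longitude. You should also look at .loc, so that you learn the right way to change parts of a DataFrame.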
import pandas as pd
df = pd.read_csv('http://www.sharecsv.com/dl/9096d32f98aa0ac671a1cca16fa43be8/SalesJan2009.csv')
df['Zone'] = ''
zone1 = (df['Latitude'] > 0) & (df['Longitude'] > 0)
zone2 = (df['Latitude'] < 0) & (df['Longitude'] > 0)
zone3 = (df['Latitude'] > 0) & (df['Longitude'] < 0)
zone4 = (df['Latitude'] < 0) & (df['Longitude'] < 0)
df.loc[zone1, 'Zone'] = 1
df.loc[zone2, 'Zone'] = 2
df.loc[zone3, 'Zone'] = 3
df.loc[zone4, 'Zone'] = 4
df
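The difference between writing to a filtered subset and writing through `.loc` can be seen on a tiny made-up frame. This is only a sketch: the three coordinate pairs and the `'Unknown'` placeholder are assumptions chosen for illustration, and the explicit `.copy()` is there to make the copy visible rather than rely on a warning.

```python
import pandas as pd

# Hypothetical data; every row starts out as 'Unknown'.
df = pd.DataFrame({
    'Latitude':  [51.5, -36.1, 0.0],
    'Longitude': [-1.1, 144.8, 10.0],
})
df['Zone'] = 'Unknown'

mask = (df['Latitude'] > 0) & (df['Longitude'] < 0)

# Boolean indexing returns a new DataFrame; writing to it leaves df untouched.
subset = df[mask].copy()
subset['Zone'] = 'Zone 3'
zones_after_subset = df['Zone'].tolist()

# .loc with the same mask writes into df itself.
df.loc[mask, 'Zone'] = 'Zone 3'
zones_after_loc = df['Zone'].tolist()

print(zones_after_subset)
print(zones_after_loc)
```

Starting the column at a placeholder such as `'Unknown'` instead of an empty string also means that rows matching none of the masks keep a meaningful label.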