Here is what I am trying:
import re
new_df = census_df.loc[(census_df['REGION']==1 | census_df['REGION']== 2) & (census_df['CTYNAME'].str.contains('^Washington[a-z]*'))& (census_df['POPESTIMATE2015']>census_df['POPESTIMATE2014'])]
new_df
It raises this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
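In Python, | binds more tightly than ==, so the unparenthesized REGION test is not parsed as two separate comparisons, and pandas ends up being asked for the truth value of a whole Series. You need to put parentheses around each logical expression in filt_1: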
filt_1 = (census_df['REGION'] == 1) | (census_df['REGION'] == 2)
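To see why the parentheses matter, here is a minimal sketch on a hypothetical three-row Series (the name region and its values are made up for illustration, not taken from the census data). It shows the unparenthesized form raising the same ValueError and the parenthesized form producing a boolean mask:

```python
import pandas as pd

# Hypothetical miniature stand-in for census_df['REGION']
region = pd.Series([1, 2, 3], name="REGION")

# Without parentheses, | is evaluated before ==, so Python reads
#     region == 1 | region == 2
# as the chained comparison
#     region == (1 | region) == 2
# A chained comparison needs bool() of the intermediate Series,
# which is what raises the ValueError.
try:
    region == 1 | region == 2
    raised = False
except ValueError:
    raised = True

# With parentheses, each comparison produces a boolean Series first,
# and | then combines the two element-wise.
mask = (region == 1) | (region == 2)

print(raised)         # True
print(mask.tolist())  # [True, True, False]
```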
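Note that my census_df data is semi-fictional, but it is enough to demonstrate the behavior. Everything from the filt_1 assignment onward applies unchanged to the full census_df DataFrame. Here is the complete program: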
import pandas as pd
cols = ['REGION', 'CTYNAME', 'POPESTIMATE2014', 'POPESTIMATE2015']
data = [[1, "Washington", 4846411, 4858979],
        [3, "Autauga County", 55290, 55347]]
census_df = pd.DataFrame(data, columns=cols)
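# Region is 1 or 2 (each comparison parenthesized before combining with |)
filt_1 = (census_df['REGION'] == 1) | (census_df['REGION'] == 2)
# County name starts with "Washington"
filt_2 = census_df['CTYNAME'].str.contains("^Washington[a-z]*")
# Population estimate grew from 2014 to 2015
filt_3 = census_df['POPESTIMATE2015'] > census_df['POPESTIMATE2014']
# A row must satisfy all three conditions
filt = filt_1 & filt_2 & filt_3
new_df = census_df.loc[filt]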
print(new_df)
This returns:
   REGION     CTYNAME  POPESTIMATE2014  POPESTIMATE2015
0       1  Washington          4846411          4858979
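As a possible alternative (not part of the original answer), the REGION test can be written with Series.isin, which avoids the | operator and its precedence pitfall altogether. The sketch below reuses the same semi-fictional sample data; the variable names in_region, starts_washington, grew, and alt_df are illustrative. It also assumes that str.startswith is an acceptable substitute for the regex here, since "^Washington[a-z]*" with str.contains matches any name that begins with "Washington":

```python
import pandas as pd

# Same semi-fictional sample data as in the program above
cols = ['REGION', 'CTYNAME', 'POPESTIMATE2014', 'POPESTIMATE2015']
data = [[1, "Washington", 4846411, 4858979],
        [3, "Autauga County", 55290, 55347]]
census_df = pd.DataFrame(data, columns=cols)

# isin replaces (REGION == 1) | (REGION == 2)
in_region = census_df['REGION'].isin([1, 2])
# Assumption: a plain prefix test is equivalent to the regex for this purpose
starts_washington = census_df['CTYNAME'].str.startswith('Washington')
grew = census_df['POPESTIMATE2015'] > census_df['POPESTIMATE2014']

alt_df = census_df.loc[in_region & starts_washington & grew]
print(alt_df['CTYNAME'].tolist())  # ['Washington']
```

On this sample data it selects the same single row as the filt_1 & filt_2 & filt_3 version.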