Print a specific set of rows from a dataset based on conditions



What I'm trying:

import re
new_df = census_df.loc[(census_df['REGION']==1 | census_df['REGION']== 2) & (census_df['CTYNAME'].str.contains('^Washington[a-z]*'))& (census_df['POPESTIMATE2015']>census_df['POPESTIMATE2014'])]
new_df

It returns this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

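You need to put parentheses around each comparison in filt_1. In Python, `|` binds more tightly than `==`, so without parentheses the expression is parsed as a chained comparison, and evaluating that chain asks for the truth value of a whole Series, which raises the ValueError: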

filt_1 = (census_df['REGION'] == 1)  | (census_df['REGION'] == 2)
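The precedence problem can be reproduced on a small Series. This is a minimal sketch; `region` is a hypothetical stand-in for `census_df['REGION']`:

```python
import pandas as pd

# Hypothetical stand-in for census_df['REGION']
region = pd.Series([1, 3, 2])

# Without parentheses, `region == 1 | region == 2` is parsed as the chained
# comparison `region == (1 | region) == 2`, i.e.
# `(region == (1 | region)) and ((1 | region) == 2)`.
# The implicit `and` calls bool() on a Series, which raises ValueError.
try:
    region == 1 | region == 2
    unparenthesized_error = None
except ValueError as exc:
    unparenthesized_error = exc

# With parentheses, each comparison produces a boolean Series first,
# and `|` then combines the two Series element-wise.
mask = (region == 1) | (region == 2)
print(mask.tolist())  # [True, False, True]
```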

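Note that my census_df data is semi-fictional, but it demonstrates the approach. Everything from the filt_1 assignment onward works unchanged on your full census_df DataFrame. Here is the complete program: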

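import pandas as pd
# Semi-fictional sample data with the same columns as census_df
cols = ['REGION', 'CTYNAME', 'POPESTIMATE2014', 'POPESTIMATE2015']
data = [[1, "Washington", 4846411, 4858979],
        [3, "Autauga County", 55290, 55347]]
census_df = pd.DataFrame(data, columns=cols)
# REGION is 1 or 2: each comparison gets its own parentheses before |
filt_1 = (census_df['REGION'] == 1) | (census_df['REGION'] == 2)
# County name starts with "Washington"
filt_2 = census_df['CTYNAME'].str.contains("^Washington[a-z]*")
# Population grew from 2014 to 2015
filt_3 = census_df['POPESTIMATE2015'] > census_df['POPESTIMATE2014']
# Combine the three boolean masks element-wise with &
filt = filt_1 & filt_2 & filt_3
new_df = census_df.loc[filt]
print(new_df)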

Output:

   REGION     CTYNAME  POPESTIMATE2014  POPESTIMATE2015
0       1  Washington          4846411          4858979
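The same filter can also be written without chaining `|` at all. This is a sketch of an alternative I am swapping in, not part of the answer above: `Series.isin` replaces the two OR-ed equality checks, and `str.startswith` selects the same rows as `str.contains("^Washington[a-z]*")`, because `[a-z]*` may match zero characters. The data is the same semi-fictional sample.

```python
import pandas as pd

cols = ['REGION', 'CTYNAME', 'POPESTIMATE2014', 'POPESTIMATE2015']
data = [[1, "Washington", 4846411, 4858979],
        [3, "Autauga County", 55290, 55347]]
census_df = pd.DataFrame(data, columns=cols)

# isin() tests membership in [1, 2], so no parentheses around == are needed
filt_1 = census_df['REGION'].isin([1, 2])
# startswith() matches the same rows as contains("^Washington[a-z]*")
filt_2 = census_df['CTYNAME'].str.startswith("Washington")
# Population grew from 2014 to 2015
filt_3 = census_df['POPESTIMATE2015'] > census_df['POPESTIMATE2014']

new_df = census_df.loc[filt_1 & filt_2 & filt_3]
print(new_df)
```

This prints the same single Washington row as the program above.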
