如何在Pandas中使用过滤后的数据?



我是熊猫新手。下面是代码的一部分。我试图使用df_filtered,这是具有钼列值=AB123的过滤数据。然而,从第14行开始,如果我使用df_filtered而不是excel_data_df,则不会给出结果。想要的列被正确地选中了。但是值过滤没有发生-在钼中,col值应该是= AB123。但值过滤没有发生,我得到整个excel转换为json与选定的列。请帮助理解如何考虑/使用来自第14行df_filtered数据。

PathLink = os.path.join(path, 'test' + '.json') #Path Name Formation 
excel_data_df = pandas.read_excel('input.xlsx',
sheet_name='input_sheet1',
usecols=[4,5,6,18,19], #index starts from 0
names=['codenum', 'name',
'class',
'school',
'city'],
dtype={'codenum': str,
'name': str,
'school': str,
'city': str})  # Excel Read and Column Filtering
df_filtered = excel_data_df.loc[lambda x: x['codenum'] == 'AB123']    # Row Filtering   -- need to use this further     
excel_data_df.columns = ['Code', 'Student Name', 'Class', 'School Name','City Name'] #renaming columns  -- Line Num 14
cols_to_keep = ['Student Name', 'Class', 'School Name','City Name'] # columns to keep
excel_data_df = excel_data_df[cols_to_keep]  # columns to keep
excel_data_df                                # columns to keep
json_str = excel_data_df.to_json(PathLink,orient='records',indent=2) #json converted file

首先,一个小提示;您可以通过执行

来删除对lambda的使用/需要
df_filtered = excel_data_df.loc[excel_data_df["codenum"]=="AB123"]

如果你想去掉lambda

之后,正如评论中指出的,确保它包含过滤后的样本;

df_filtered = excel_data_df.loc[excel_data_df["codenum"]=="AB123"]
if df_filtered.shape[0]: #it contains samples
cols_to_keep = ['Student Name', 'Class', 'School Name','City Name'] # columns to keep
excel_data_df = excel_data_df[cols_to_keep]  # columns to keep
excel_data_df                                # columns to keep
json_str = excel_data_df.to_json(PathLink,orient='records',indent=2) #json converted file
else: #it does not contain any samples i.e empty dataframe
print("Filtered data does not contain data")

试试下面的代码:

df_filtered = excel_data_df[excel_data_df['codenum'] == 'AB123']

如果仍然不工作,则"代码"可能没有您试图过滤掉的这个值。

感谢大家的意见。最初,它是返回空数据帧,建议在上面的答案和评论。根据您的输入发布编辑后的工作代码,供任何人将来参考。

PathLink = os.path.join(path, 'test' + '.json') #Path Name Formation 
excel_data_df = pandas.read_excel('input.xlsx',
sheet_name='input_sheet1',
usecols=[3,5,6,18,19], #index starts from 0 ##  edit 1: corrected index to right column index
names=['codenum', 'name', 
'class',
'school',
'city'],
dtype={'codenum': str,
'name': str,
'school': str,
'city': str})  # Excel Read and Column Filtering

print(excel_data_df['codenum'].unique())  ##edit 1: returns unique values including AB123

df_filtered = excel_data_df.loc[excel_data_df["codenum"]=='AB123']    # Row Filtering   ##edit 1     
print(df_filtered)  ##edit 1 - to check if expected results are present in filtered data

df_filtered.columns = ['Code', 'Student Name', 'Class', 'School Name','City Name'] #renaming columns  

if df_filtered.shape[0]: #it contains samples ## edit 1 
cols_to_keep = ['Student Name', 'Class', 'School Name','City Name'] # columns to keep
df_filtered = df_filtered[cols_to_keep]  # columns to keep
df_filtered                                # columns to keep
json_str = df_filtered.to_json(PathLink,orient='records',indent=2) #json converted file
else: #it does not contain any samples i.e empty dataframe ##edit 1 
print("Filtered data does not contain data")

熊猫。Loc将返回过滤后的结果。

在您的代码中,您试图创建一个过滤的但是df。loc是不是过滤器制作器

参见示例,df。Loc返回原点df的过滤结果。

import pandas as pd
df = pd.DataFrame([[1, "AB123"], [4, "BC123"], [7, "CD123"]],columns=['A', 'B'])
print(df)
#   A      B
#0  1  AB123
#1  4  BC123
#2  7  CD123
print(df.loc[lambda x: x["B"] == "AB123"])
#   A      B
#0  1  AB123