I'm trying this code, but I need a generic implementation that removes duplicates from a DataFrame:
import pandas as pd
# making data frame from csv file
data = pd.read_csv("C:/Users/gvsph/Downloads/employees.csv")
# sorting by first name
data.sort_values("First Name", inplace=True)
# dropping ALL rows whose "First Name" occurs more than once (keep=False keeps none of them)
data.drop_duplicates(subset="First Name",
                     keep=False, inplace=True)
# displaying data
print(data)
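You can call "drop_duplicates" with no arguments to remove duplicate records from the dataset. With no arguments it compares all columns and keeps the first occurrence of each duplicated row.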
See the pandas documentation.
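As a rough sketch of the difference between the two calls, here is a small example. The DataFrame below is made-up sample data standing in for employees.csv, and the "Team" column is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical sample data; not the contents of the real employees.csv
df = pd.DataFrame({
    "First Name": ["Ann", "Bob", "Ann", "Ann"],
    "Team": ["Sales", "Ops", "Sales", "Ops"],
})

# No arguments: rows are compared on all columns and the first
# occurrence of each duplicated row is kept.
deduped = df.drop_duplicates()

# subset + keep=False: every row whose "First Name" appears more
# than once is dropped, including the first occurrence.
unique_names = df.drop_duplicates(subset="First Name", keep=False)

print(deduped)
print(unique_names)
```

Both calls return a new DataFrame here rather than using inplace=True, so the original df stays available for comparison.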