I have a column that should only contain "a" or "b".
How can I check whether the column contains any other values?
PS: I think in R you would use this:
table(df$column_name)
How can I get similar output in pandas?
I think you could use groupby() first and then size():
import pandas as pd
data = [
{"colA": "John", "colB": "a"},
{"colA": "Jane", "colB": "b"},
{"colA": "Bob", "colB": "c"},
{"colA": "Rob", "colB": "a"},
{"colA": "Hobb", "colB": "b"},
{"colA": "Greg", "colB": "b"},
{"colA": "Jennie", "colB": "a"},
{"colA": "Joe", "colB": "a"},
{"colA": "Howard", "colB": "x"},
{"colA": "Dave", "colB": "a"},
]
dataframe = pd.DataFrame(data)
print(dataframe.groupby("colB").size())
Output:
colB
a 5
b 3
c 1
x 1
dtype: int64
Assuming there are no NaN values in the column:
df["your column name"].value_counts()  # the unique values and how many times each one occurs in the column
or
df["your column name"].nunique()  # only the number of unique values
To check whether your column has NaN values:
df["your column name"].isna().sum()
Hope this helps.
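If the column might contain NaN, the counting and the NaN check can be combined in one call. Here is a minimal sketch, assuming a small made-up DataFrame whose column is named colB as in the question's example. By default value_counts() leaves NaN out of the result; passing dropna=False keeps it as its own row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the original post
df = pd.DataFrame({"colB": ["a", "b", "a", np.nan, "c"]})

# dropna=False keeps NaN as a separate entry in the counts
counts = df["colB"].value_counts(dropna=False)
print(counts)
```

This is the closest equivalent to R's table() with missing values included, so an unexpected "c" and any NaN both show up in the same output.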
You can use:
df['column_name'].isin(['a', 'b']).all()
If this outputs True, all values are a or b.
If you want to see which values are invalid:
df[~df['column_name'].isin(['a', 'b'])]
To do both at once, you can save the mask in a variable:
m = df['column_name'].isin(['a', 'b'])
print(m.all())
df[~m]
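The mask approach can also be wrapped so that it reports the unexpected values together with how often each occurs. This is only a sketch: the helper name unexpected_counts and the sample data are hypothetical, not part of the answers above. Note that isin() returns False for NaN, so missing values are reported as unexpected too.

```python
import pandas as pd

def unexpected_counts(series, allowed):
    """Return counts of the values in `series` that are not in `allowed`."""
    mask = series.isin(allowed)
    # Keep only the rows that failed the check, then count each distinct value
    return series[~mask].value_counts(dropna=False)

# Hypothetical sample data
df = pd.DataFrame({"column_name": ["a", "b", "c", "a", "x", "b"]})

bad = unexpected_counts(df["column_name"], ["a", "b"])
print(bad)
```

An empty result means every value in the column is one of the allowed ones.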