How to ignore NaN when getting the mode of a dataset



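I'm trying to ignore the nan values in my dataset, but I'm not sure what to add to my function for finding multimodal data, or where to add it. Right now, if most of the values in a row are nan, the function reports nan as the mode, which is wrong. How can I ignore them?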

import pandas as pd
#example data
df = {'A':'nan','B':'nan','C':'Blue', 'D':'nan','E':'Blue', 'Index':[0]}
df = pd.DataFrame(df).set_index('Index')
def find_mode(x):
    if len(x) > 1:
        # Creates dictionary of values in x and their count
        d = {}
        for value in x:
            if value not in d:
                d[value] = 1
            else:
                d[value] += 1
        if len(d) == 1:
            return [value]
        else:
            # Finds most common value
            i = 0
            for value in d:
                if i < d[value]:
                    i = d[value]
            # All values with greatest number of occurrences can be a mode if:
            # other values with less number of occurrences exist
            modes = []
            counter = 0
            for value in d:
                if d[value] == i:
                    mode = (value, i)
                    modes.append(mode)
                    counter += mode[1]  # Create the counter that sums the number of most common occurrences
            # Example [1, 2, 2, 3, 3]
            # 2 appears twice, 3 appears twice, [2, 3] are a mode
            # because sum of counter for them: 2+2 != 5
            if counter != len(x):
                return [mode[0] for mode in modes]
            else:
                return 'NA'
    else:
        return x

mode = []
for x in df.itertuples(index = True):
    m = find_mode(x)
    mode.append(m)

That looks like overly complicated code for something pandas can handle natively:

# replace the 'nan' strings with real missing values
df = df.replace('nan', pd.NA)
# get the mode per row (mode() skips missing values by default, dropna=True)
out = df.mode(axis=1)[0]

Output:

Index
0    Blue
dtype: object
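
If you would rather keep a custom function, for example to get every tied mode back as a list, the following is a minimal sketch of one way to skip missing values before counting. The names `is_missing` and `find_modes` are hypothetical, not from the question, and the sketch assumes missing values show up as `None`, a float NaN, or the literal string 'nan' (it does not check for `pd.NA`). Unlike the original `find_mode`, it returns all tied values rather than 'NA' when every value occurs equally often.

```python
from collections import Counter
import math

def is_missing(value):
    """Assumption: None, float NaN, and the string 'nan' all count as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip().lower() == 'nan'

def find_modes(values):
    """Return all most-frequent non-missing values, in first-seen order."""
    # Filter out missing values before counting, so they can never win.
    counts = Counter(v for v in values if not is_missing(v))
    if not counts:
        return []  # every value was missing
    top = max(counts.values())
    return [v for v, n in counts.items() if n == top]

print(find_modes(['nan', 'nan', 'Blue', 'nan', 'Blue']))  # ['Blue']
print(find_modes([1, 2, 2, 3, 3, float('nan')]))          # [2, 3]
```

To apply it per row of the DataFrame, something like `[find_modes(row) for row in df.itertuples(index=False)]` should work; passing `index=False` keeps the index value itself from being counted as data.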
