Why does Apriori use NaN when computing association rules?
How can I exclude NaN from Apriori so that the association rules are computed without it?
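import numpy as np
import pandas as pd
from apyori import apriori  # assumed: the call and result layout below match the apyori package

df = pd.read_csv('Online Retail.csv')
display(df.head())
# Join all stock codes of one invoice into a single comma-separated string
df = df.groupby(['InvoiceNo'])['StockCode'].apply(','.join).reset_index()
display(df.head())
df = df.drop(columns='InvoiceNo')
display(df.head())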
#***Split items list and fill none with NaN***
df = df['StockCode'].str.split(',', expand = True)
df = df.fillna(value=np.nan)
display(df.head())
print(df.shape)
records = []
for i in range(1, 25900):
    # str() also converts every missing cell to the literal string 'nan'
    records.append([str(df.values[i, j]) for j in range(0, 1114)])
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)
for i in range(0, len(association_results)):
    print(association_results[i][0])
for item in association_results:
    # item[0] is the itemset; its first two items are printed as the rule
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])
    # item[1] is the support of the itemset
    print("Support: " + str(item[1]))
    # item[2][0] is the first ordered statistic:
    # index 2 holds the confidence, index 3 the lift
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")
After the computation finished, I got these results, which contain NaN.
Rule: 22917 -> nan  Support: 0.00640951388084482  Confidence: 0.6974789915966386  Lift: 107.532385954381752
Rule: 22917 -> nan  Support: 0.006139233175026063  Confidence: 0.66806672268907563  Lift: 106.8041549533146
Rule: nan -> 22918  Support: 0.006216456233831422  Confidence: 0.676470588235294  Lift: 104.0965128566395
Rule: 22917 -> nan  Support: 0.006177844704428743  Confidence: 0.6639004149377594  Lift: 102.9602063756305
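To see where the nan item comes from, here is a minimal sketch using plain Python only. The row `padded_row` is made-up sample data standing in for one row of the split DataFrame; it is not taken from the actual dataset.

```python
# A padded row like the ones produced by str.split(..., expand=True):
# real stock codes followed by missing cells (hypothetical sample data).
padded_row = ["22917", "22918", float("nan"), float("nan")]

# str() turns every missing cell into the ordinary string "nan".
as_strings = [str(cell) for cell in padded_row]
print(as_strings)  # ['22917', '22918', 'nan', 'nan']

# The algorithm only sees strings, so "nan" is now an item like any other.
nan_is_item = "nan" in set(as_strings)
print(nan_is_item)  # True
```

Every invoice shorter than the longest one is padded this way, so the string "nan" ends up in those transactions and can appear in the generated rules.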
The error in the code is this line:
df = df.fillna(value=np.nan)
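Don't fill the null values with np.nan. That only replaces the previous None values with another missing value, and str() then turns each of them into the string 'nan', which Apriori counts as an ordinary item. Replace them with some other value that you know carries no meaning, such as 0 or a negative number, and leave that value out when you build the transactions.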
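As a variation on the sentinel idea, the missing cells can simply be dropped while the transactions are built, so no placeholder reaches Apriori at all. This is a rough sketch under assumptions: `is_missing` and `to_transactions` are hypothetical helper names, and `rows` is made-up sample data standing in for `df.values.tolist()`.

```python
import math

def is_missing(cell):
    """True for None and for float NaN (how empty cells show up after the split)."""
    return cell is None or (isinstance(cell, float) and math.isnan(cell))

def to_transactions(rows):
    """Drop missing cells from each padded row and keep real items as strings."""
    return [[str(cell) for cell in row if not is_missing(cell)] for row in rows]

# Stand-in for df.values.tolist() from the question (hypothetical sample data).
rows = [
    ["22917", "22918", float("nan")],
    ["22917", None, None],
]
records = to_transactions(rows)
print(records)  # [['22917', '22918'], ['22917']]
```

With the DataFrame from the question, `rows` would be `df.values.tolist()`, and the resulting `records` can be passed to `apriori` with the same parameters as before. This also avoids hard-coding the row and column counts.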