Neo4j collaborative filtering (CF) recommendation query using Pearson



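Hi everyone,

I'd like to understand this query, which uses Pearson.

What are nom and denom?

What is r1: r1, r2: r2?

I don't understand what r.r1.rating and r.r2.rating are.

The query is supposed to recommend movies that other users have rated.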

MATCH (u1:User {id: 3})-[r:RATED]->(m:Movie)
WITH u1, avg(r.rating) AS u1_mean
MATCH (u1)-[r1:RATED]->(m:Movie)<-[r2:RATED]-(u2)
WITH u1, u1_mean, u2, COLLECT({r1: r1, r2: r2}) AS ratings WHERE size(ratings) > 10
MATCH (u2)-[r:RATED]->(m:Movie)
WITH u1, u1_mean, u2, avg(r.rating) AS u2_mean, ratings
UNWIND ratings AS r
WITH sum( (r.r1.rating-u1_mean) * (r.r2.rating-u2_mean) ) AS nom,
sqrt( sum( (r.r1.rating - u1_mean)^2) * sum( (r.r2.rating - u2_mean) ^2)) AS denom,
u1, u2 WHERE denom <> 0
WITH u1, u2, nom/denom AS pearson
ORDER BY pearson DESC LIMIT 10
MATCH (u2)-[r:RATED]->(m:Movie) WHERE NOT EXISTS( (u1)-[:RATED]->(m) )
RETURN m.name, SUM( pearson * r.rating) AS score
ORDER BY score DESC LIMIT 25

The output is as follows:

│"m.name"│"score"│
│"Sleepless in Seattle"│25.859451877376813│
│"Tunnel"│22.652532472101605│
│"Beetlejuice"│22.21835919736008│
│(title missing)│21.935357890253528│
│"Dawn of the Dead"│21.421377433824798│
│"The Prisoner of Zenda"│21.225502683325033│
│"The Talented Mr. Ripley"│20.83938743140176│

Any suggestions would be helpful.

The Pearson formula is given here: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#For_a_sample

nom is simply the numerator of that formula, defined in the query as: "WITH sum( (r.r1.rating-u1_mean) * (r.r2.rating-u2_mean) ) AS nom,"

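Likewise, denom is the denominator, defined as: "sqrt( sum( (r.r1.rating - u1_mean)^2) * sum( (r.r2.rating - u2_mean) ^2)) AS denom"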
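
To make the mapping to the query concrete, here is a rough sketch of the expression that nom/denom evaluates. The symbols are my own notation, not something from the query: M stands for the set of movies both users have rated, r_{1,m} and r_{2,m} for the ratings u1 and u2 gave movie m, and \bar{r}_1, \bar{r}_2 for u1_mean and u2_mean. Note that the query takes each mean over all movies the user has rated, while the sums run only over the shared movies.

```latex
\[
\text{pearson}(u_1, u_2)
  = \frac{\text{nom}}{\text{denom}}
  = \frac{\sum_{m \in M} (r_{1,m} - \bar{r}_1)\,(r_{2,m} - \bar{r}_2)}
         {\sqrt{\sum_{m \in M} (r_{1,m} - \bar{r}_1)^2 \;\cdot\; \sum_{m \in M} (r_{2,m} - \bar{r}_2)^2}}
\]
```

The "WHERE denom <> 0" filter in the query drops the pairs for which this fraction would be undefined.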

I'm not sure about your other two questions, but I hope this helps!
