如何优化以下neo4j Cypher查询



我是cypher的新手,有下面的查询来查找两种源类型之间的错误(例如(。我相信从语法上看,查询看起来不错,但在只有100000个节点的数据集上运行需要1分钟。我现在还不使用关系。有人能帮助优化查询吗?谢谢

MATCH (VW_OXSS41:VW_OrderXStatusSummary4{SourceTypeID: "1"}) 
WHERE apoc.date.parse(VW_OXSS41.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss'))>=apoc.date.parse("2020-02-10",'s',('yyyy-MM-dd')) AND apoc.date.parse(VW_OXSS41.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss'))<=apoc.date.parse("2020-02-16",'s',('yyyy-MM-dd'))
WITH VW_OXSS41.IdentifierValue as X
MATCH (VW_OXSS42:VW_OrderXStatusSummary4{SourceTypeID: "2"}) 
WHERE apoc.date.parse(VW_OXSS42.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss'))>=apoc.date.parse("2020-02-10",'s',('yyyy-MM-dd')) AND apoc.date.parse(VW_OXSS42.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss'))<=apoc.date.parse("2020-02-16",'s',('yyyy-MM-dd'))
WITH apoc.coll.disjunction(COLLECT(X), COLLECT(VW_OXSS42.IdentifierValue)) as XX
UNWIND (XX) as YY

更新的查询和错误:-

WITH apoc.date.parse("2020-02-20",'s',('yyyy-MM-dd')) AS a, apoc.date.parse("2020-02-25",'s',('yyyy-MM-dd')) AS b
MATCH (x:VW_OrderXStatusSummary4 {SourceTypeID: "2"}) 
WHERE a <= apoc.date.parse(x.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
WITH a, b, COLLECT(x.IdentifierValue) AS X
MATCH (y:VW_OrderXStatusSummary4 {SourceTypeID: "1"}) 
WHERE a <= apoc.date.parse(y.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
WITH X, COLLECT(y.IdentifierValue) AS Y
UNWIND apoc.coll.subtract(X,Y) AS XX
MATCH (z:VW_OrderXStatusSummary4 {SourceTypeID: "2"}) 
WHERE a <= apoc.date.parse(z.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
RETURN XX AS MISMATCHES,MAX(z.TimeStamp);
Variable `a` not defined (line 10, column 7 (offset: 551))
"WHERE a <= apoc.date.parse(z.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b"

像这样解决了上述错误:-

WITH apoc.date.parse("2020-02-21",'s',('yyyy-MM-dd')) AS a, apoc.date.parse("2020-02-25",'s',('yyyy-MM-dd')) AS b
MATCH (x:VW_OrderXStatusSummary4 {SourceTypeID: "2"}) 
WHERE a <= apoc.date.parse(x.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
WITH a, b, COLLECT(x.IdentifierValue) AS X
MATCH (y:VW_OrderXStatusSummary4 {SourceTypeID: "1"}) 
WHERE a <= apoc.date.parse(y.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
WITH X, COLLECT(y.IdentifierValue) AS Y
UNWIND apoc.coll.subtract(X,Y) AS XX
WITH XX, apoc.date.parse("2020-02-20",'s',('yyyy-MM-dd')) AS a, apoc.date.parse("2020-02-25",'s',('yyyy-MM-dd')) AS b
MATCH (z:VW_OrderXStatusSummary4 {SourceTypeID: "2"}) 
WHERE a <= apoc.date.parse(z.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
AND XX = z.IdentifierValue
RETURN XX AS MISMATCHES,MAX(z.TimeStamp);

正确的预期输出为:-

+---------------------------------------------+
| MISMATCHES          | TIMESTAMP             |
+---------------------------------------------+
| "W2002201453550218" | "2020-02-21 12:00:16" |
| "W2002201453550222" | "2020-02-21 12:00:16" |
| "W2002201453550223" | "2020-02-21 09:30:36" |
| "W2002201453550224" | "2020-02-21 12:00:16" |
| "W2002201453550226" | "2020-02-21 12:00:16" |
| "W2002201453550227" | "2020-02-21 12:00:16" |
| "W2002201453550237" | "2020-02-21 12:00:16" |
| "3011WOS002978598"  | "2020-02-21 10:00:54" |
| "3011WOS002978595"  | "2020-02-21 13:00:57" |
| "0010000000006183"  | "2020-02-21 16:00:41" |
| "W2002181111547439" | "2020-02-21 04:00:34" |
| "11"                | "2020-02-21 16:00:41" |
| "10112787861P1458"  | "2020-02-21 10:00:54" |
+---------------------------------------------+

想知道是否有更好的方法?

  1. 您需要避免在两个MATCH子句的结果之间生成笛卡尔乘积。假设两个MATCH子句在各自的查询中执行时,通常会分别返回NM节点。因为您的查询以这种方式组合了这两个MATCH子句,所以您的第二个MATCH子句实际上是在执行N*M匹配(并生成N*M结果行(。

  2. 您需要确保已在:VW_OrderXStatusSummary4(SourceTypeID)上创建了索引。这将优化MATCH子句执行的查找。

  3. 您可以简化Cypher代码以避免重复的函数调用。

创建上述索引后,请尝试以下操作:

WITH apoc.date.parse("2020-02-10",'s',('yyyy-MM-dd')) AS a, apoc.date.parse("2020-02-16",'s',('yyyy-MM-dd')) AS b
MATCH (x:VW_OrderXStatusSummary4 {SourceTypeID: "1"}) 
WHERE a <= apoc.date.parse(x.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
WITH a, b, COLLECT(x.IdentifierValue) AS X
MATCH (y:VW_OrderXStatusSummary4 {SourceTypeID: "2"}) 
WHERE a <= apoc.date.parse(y.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
WITH X, COLLECT(y.IdentifierValue) AS Y
UNWIND apoc.coll.disjunction(X, Y) AS YY
...

在第一个WITH子句中执行COLLECT(x.IdentifierValue)操作会使其返回单个结果行(而不是N结果行(中的所有x节点。这允许第二个MATCH避免笛卡尔乘积问题。

最新更新