Unable to apply a fuzzy distance function in a Cypher query that checks the similarity of all node property values?



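I want to find all triples in which the head node has at least one property whose value matches a search term under some fuzzy similarity function, keeping only results above a predefined threshold, say 85%. What is the best practice for doing this? Here is my initial query: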

MATCH (n)-[r]->(k) WHERE ANY(x in keys(n) WHERE round(apoc.text.levenshteinSimilarity(n[x], "syn"), 4) > 0.8) RETURN n, r, k
Before the query above, I used a simpler approach (regex):
MATCH (n)-[r]->(k) WHERE ANY(x in keys(n) WHERE n[x] =~ '(i?){search_expression}.*') RETURN n, r, k

But when I use the first, more advanced query, for some reason I get:

Wrong argument type: Can't coerce `Long(1662902792106)` to String

When I run the following query instead:

MATCH (n)-[r]->(k) WHERE ANY(x in keys(n) WHERE round(apoc.text.levenshteinSimilarity(toString(n[x]), "syn"), 4) > 0.8) RETURN n, r, k

the output is:

Invalid input for function 'toString()': Expected a String, Number, Boolean, Temporal or Duration, got: StringArray[ecr:PutImageTagMutability, ecr:StartImageScan, ecr:DescribeImageReplicationStatus, ecr:ListTagsForResource, ecr:UploadLayerPart, ecr:BatchDeleteImage, ecr:CreatePullThroughCacheRule, ecr:ListImages, ecr:BatchGetRepositoryScanningConfiguration, ecr:DeleteRepository, ecr:GetRegistryScanningConfiguration, ecr:CompleteLayerUpload, ecr:TagResource, ecr:DescribeRepositories, ecr:BatchCheckLayerAvailability, ecr:ReplicateImage, ecr:GetLifecyclePolicy, ecr:GetRegistryPolicy, ecr:PutLifecyclePolicy, ecr:DescribeImageScanFindings, ecr:GetLifecyclePolicyPreview, ecr:CreateRepository, ecr:DescribeRegistry, ecr:PutImageScanningConfiguration, ecr:GetDownloadUrlForLayer, ecr:DescribePullThroughCacheRules, ecr:GetAuthorizationToken, ecr:PutRegistryScanningConfiguration, ecr:DeletePullThroughCacheRule, ecr:DeleteLifecyclePolicy, ecr:PutImage, ecr:BatchImportUpstreamImage, ecr:UntagResource, ecr:BatchGetImage, ecr:DescribeImages, ecr:StartLifecyclePolicyPreview, ecr:InitiateLayerUpload, ecr:GetRepositoryPolicy, ecr:PutReplicationConfiguration]

Please advise.

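You need to look at the value of n[x], where x is a property key of node n. n[x] can be an Integer, Float, String, Boolean, Point, Date, Time, LocalTime, DateTime, LocalDateTime, Duration, or a homogeneous list of simple types. Calling toString() on a list fails, which is what happens with your string array.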
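To see which value types actually occur in your data before choosing a conversion, a quick diagnostic query along these lines may help. This is only a sketch: it assumes the APOC library is installed and that your APOC version provides apoc.meta.cypher.type, and it scans every node, so it can be slow on a large graph.

```cypher
// Sketch: count how often each value type appears across all node properties.
// Assumes APOC is installed and exposes apoc.meta.cypher.type.
MATCH (n)
UNWIND keys(n) AS x
RETURN apoc.meta.cypher.type(n[x]) AS valueType, count(*) AS occurrences
ORDER BY occurrences DESC
```

Any list types in the result are the ones that toString() will reject.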

So you need a way to convert any of these data types to a string before you can apply apoc.text.levenshteinSimilarity to the node property n[x]. For example:

MATCH (n)-[r]->(k)
WHERE ANY(x IN keys(n)
  WHERE round(
    apoc.text.levenshteinSimilarity(
      // join the list items into one space-separated string
      trim(reduce(mergedString = "", item IN n[x] | mergedString + item + " ")),
      "syn"),
    4) > 0.8)
RETURN n, r, k

Here reduce concatenates every item of the list (or array) into a single string, and trim removes the extra space left at the end of that string.
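One thing to watch with the joined-string approach: as far as I understand, the Levenshtein similarity is normalized by the length of the longer string, so a long concatenated array compared against a 3-character term like "syn" will score far below 0.8. A possible alternative is to compare each list element on its own. The sketch below assumes APOC is installed and that apoc.convert.toList wraps a scalar value in a single-element list (worth verifying on your APOC version). It also assumes every property value is of a type that toString() accepts; on older Neo4j versions a Point property may still raise an error.

```cypher
// Sketch: match if ANY single value (scalar or list element) is similar enough.
// Assumes APOC is installed; apoc.convert.toList is expected to turn a scalar
// into a one-element list and leave lists unchanged.
MATCH (n)-[r]->(k)
WHERE ANY(x IN keys(n)
  WHERE ANY(item IN apoc.convert.toList(n[x])
    WHERE apoc.text.levenshteinSimilarity(toString(item), "syn") > 0.8))
RETURN n, r, k
```

With this variant, a node matches as soon as one property value, or one element of a list property, is close enough to the search term.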

References:
https://neo4j.com/docs/cypher-manual/current/syntax/values/
https://neo4j.com/docs/cypher-manual/current/functions/list/#functions-reduce
https://neo4j.com/docs/cypher-manual/current/functions/string/#functions-trim
