Spark GraphFrames problem



I am trying to follow the example at https://docs.databricks.com/spark/latest/graph-analysis/graphframes/user-guide-python.html

However, when I change some of the criteria, the results are not what I expect. See the steps below -

from functools import reduce
from pyspark.sql.functions import col, lit, when
from graphframes import *

vertices = sqlContext.createDataFrame([
("a", "Alice", 34),
("b", "Bob", 36),
("c", "Charlie", 30),
("d", "David", 29),
("e", "Esther", 32),
("f", "Fanny", 36),
("g", "Gabby", 60)], ["id", "name", "age"])
edges = sqlContext.createDataFrame([
("a", "b", "follow"),
("b", "c", "follow"),
("c", "b", "follow"),
("f", "c", "follow"),
("e", "f", "follow"),
("e", "d", "follow"),
("d", "a", "follow"),
("a", "e", "follow")
], ["src", "dst", "relationship"])
g = GraphFrame(vertices, edges)

Now I have made a change in the "relationship" column: all values are "follow" instead of "friend".

Now the following query works fine -

g.bfs(fromExpr ="name = 'Alice'",toExpr = "age < 32", edgeFilter ="relationship != 'friend'" , maxPathLength = 10).show()
+--------------+--------------+---------------+--------------+----------------+
|          from|            e0|             v1|            e1|              to|
+--------------+--------------+---------------+--------------+----------------+
|[a, Alice, 34]|[a, e, follow]|[e, Esther, 32]|[e, d, follow]|  [d, David, 29]|
|[a, Alice, 34]|[a, b, follow]|   [b, Bob, 36]|[b, c, follow]|[c, Charlie, 30]|
+--------------+--------------+---------------+--------------+----------------+

But if I change the filter condition from 32 to 35, I get an unexpected result -

>>> g.bfs(fromExpr ="name = 'Alice'",toExpr = "age < 35", edgeFilter ="relationship != 'friend'" , maxPathLength = 10).show()
+--------------+--------------+
|          from|            to|
+--------------+--------------+
|[a, Alice, 34]|[a, Alice, 34]|
+--------------+--------------+

Ideally it should return the same results as the first query, since those rows still satisfy the new filter condition.

Is there an explanation for this?

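bfs() returns the shortest path(s) to any vertex that matches the toExpr predicate, so the search stops at the first path length where a match is found. Alice is 34, which satisfies the toExpr = "age < 35" predicate, so you get a zero-length path that starts and ends at Alice. Change toExpr to something more specific. For example, toExpr = "name = 'David' or name = 'Charlie'" should give you exactly the same result as the first query.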
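To see why the second query stops at Alice, here is a plain-Python sketch of the search behaviour described above. It is a toy model, not the GraphFrames implementation. It assumes that bfs() checks the start vertices themselves first (path length 0) and then returns every path of the minimum length that reaches a vertex matching toExpr. The edgeFilter is omitted because every edge in this graph is "follow" and passes it. The helper name shortest_paths is hypothetical, and the sketch also skips paths that revisit a vertex.

```python
# Hypothetical toy model of GraphFrames bfs() semantics, not its real implementation.

# Same graph as in the question: id -> (name, age), plus directed edges.
VERTICES = {
    "a": ("Alice", 34), "b": ("Bob", 36), "c": ("Charlie", 30),
    "d": ("David", 29), "e": ("Esther", 32), "f": ("Fanny", 36),
    "g": ("Gabby", 60),
}
EDGES = [("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
         ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e")]


def name(v):
    return VERTICES[v][0]


def age(v):
    return VERTICES[v][1]


def shortest_paths(from_pred, to_pred, max_len=10):
    """Return all minimum-length paths (lists of vertex ids) from a vertex
    matching from_pred to a vertex matching to_pred."""
    adj = {}
    for src, dst in EDGES:
        adj.setdefault(src, []).append(dst)
    # Level 0: each start vertex on its own is a zero-length path.
    frontier = [[v] for v in VERTICES if from_pred(v)]
    for _ in range(max_len + 1):
        hits = [p for p in frontier if to_pred(p[-1])]
        if hits:
            return hits  # stop at the first (shortest) level that has a match
        next_frontier = []
        for path in frontier:
            for nxt in adj.get(path[-1], []):
                if nxt not in path:  # toy simplification: skip cycles
                    next_frontier.append(path + [nxt])
        frontier = next_frontier
        if not frontier:
            break
    return []


is_alice = lambda v: name(v) == "Alice"
print(shortest_paths(is_alice, lambda v: age(v) < 32))  # [['a', 'b', 'c'], ['a', 'e', 'd']]
print(shortest_paths(is_alice, lambda v: age(v) < 35))  # [['a']]  (Alice matches herself)
print(shortest_paths(is_alice, lambda v: name(v) in ("David", "Charlie")))
```

Simply excluding Alice from the target (age < 35 and id != 'a') would not reproduce the first query either: Esther (32) is then one hop away, so the shortest match in this model is a -> e.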
