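I am trying to follow the example at https://docs.databricks.com/spark/latest/graph-analysis/graphframes/user-guide-python.html
but when I change some of the criteria, the result is not what I expect. Here are the steps:
from functools import reduce
from pyspark.sql.functions import col, lit, when
from graphframes import *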
vertices = sqlContext.createDataFrame([
("a", "Alice", 34),
("b", "Bob", 36),
("c", "Charlie", 30),
("d", "David", 29),
("e", "Esther", 32),
("f", "Fanny", 36),
("g", "Gabby", 60)], ["id", "name", "age"])
edges = sqlContext.createDataFrame([
("a", "b", "follow"),
("b", "c", "follow"),
("c", "b", "follow"),
("f", "c", "follow"),
("e", "f", "follow"),
("e", "d", "follow"),
("d", "a", "follow"),
("a", "e", "follow")
], ["src", "dst", "relationship"])
g = GraphFrame(vertices, edges)
I made one change in the "relationship" column: every value is "follow" instead of "friend".
The query below works fine:
g.bfs(fromExpr ="name = 'Alice'",toExpr = "age < 32", edgeFilter ="relationship != 'friend'" , maxPathLength = 10).show()
+--------------+--------------+---------------+--------------+----------------+
|          from|            e0|             v1|            e1|              to|
+--------------+--------------+---------------+--------------+----------------+
|[a, Alice, 34]|[a, e, follow]|[e, Esther, 32]|[e, d, follow]|  [d, David, 29]|
|[a, Alice, 34]|[a, b, follow]|   [b, Bob, 36]|[b, c, follow]|[c, Charlie, 30]|
+--------------+--------------+---------------+--------------+----------------+
But if I change the threshold in the filter condition from 32 to 35, I get the wrong result:
>>> g.bfs(fromExpr ="name = 'Alice'",toExpr = "age < 35", edgeFilter ="relationship != 'friend'" , maxPathLength = 10).show()
+--------------+--------------+
|          from|            to|
+--------------+--------------+
|[a, Alice, 34]|[a, Alice, 34]|
+--------------+--------------+
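Ideally it should return results similar to the first query, since the filter condition is still satisfied for all of those rows.
Is there an explanation for this?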
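bfs() searches for the first result that matches the predicate. Alice is 34, which satisfies the toExpr = "age < 35"
predicate, so you get a zero-length path that starts and ends at Alice. Change toExpr to something more specific. For example, toExpr = "name = 'David' or name = 'Charlie'"
should give you exactly the same result as the first query.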
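
To make the "first match wins" behaviour concrete, here is a minimal pure-Python sketch of a shortest-path search over the same vertices and edges. It is only an illustration of the semantics described above, not the GraphFrames implementation: the helper name shortest_paths and the use of plain Python predicates are assumptions made for this example, and the edge filter is left out because every edge here is "follow".

```python
# Illustrative sketch only: mimics "return the shortest matching paths",
# it is NOT how GraphFrames implements bfs() internally.
vertices = {
    "a": ("Alice", 34), "b": ("Bob", 36), "c": ("Charlie", 30),
    "d": ("David", 29), "e": ("Esther", 32), "f": ("Fanny", 36),
    "g": ("Gabby", 60),
}
edges = [("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
         ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e")]


def shortest_paths(from_pred, to_pred, max_path_length=10):
    """Return every path of minimal length from a 'from' vertex to a 'to' vertex."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    # Length-0 paths: each vertex that satisfies the start predicate.
    paths = [[v] for v, attrs in vertices.items() if from_pred(*attrs)]
    for _ in range(max_path_length + 1):
        # Stop as soon as any current path ends on a matching vertex.
        hits = [p for p in paths if to_pred(*vertices[p[-1]])]
        if hits:
            return hits
        # Otherwise extend every path by one edge, skipping revisits.
        paths = [p + [n] for p in paths
                 for n in adjacency.get(p[-1], []) if n not in p]
    return []


def is_alice(name, age):
    return name == "Alice"


print(shortest_paths(is_alice, lambda name, age: age < 32))
print(shortest_paths(is_alice, lambda name, age: age < 35))
print(shortest_paths(is_alice, lambda name, age: name in ("David", "Charlie")))
```

With "age < 35" the search stops at length zero because Alice herself matches, which is why the second query returns only the from and to columns. Naming the target vertices explicitly forces the search to continue until it reaches David and Charlie.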