I have a project written in Python that uses PySpark and Dagster. We build our documentation with Sphinx, using napoleon to parse Google-style docstrings. We have started including pre-packaged Dagster solids like the following:
from dagster import solid, String
# The SparkDataFrame import is assumed (not shown in the original snippet);
# it may instead be dagster_pyspark.DataFrame in the real project.
from pyspark.sql import DataFrame as SparkDataFrame
import pyspark.sql.functions as sf


@solid(
    config_schema={
        "join_key": String,
        "join_style": String,
        "df1_name": String,
        "df2_name": String,
    }
)
def join_two_dfs_solid(
    context, df1: SparkDataFrame, df2: SparkDataFrame
) -> SparkDataFrame:
"""
Solid to join two DataFrames on the sepcified key.
Args:
context (dict): Dagster Context Dict
df1 (SparkDataFrame): Spark DataFrame with the same schema
df2 (SparkDataFrame): Spark DataFrame with the same schema
Config Parameters:
join_key (str): name of column to join on. Specified column must exist in both columns.
join_style (str): spark join style, e.g., "left", "inner", "outer", etc.; default is "inner"
df1_name (str): alias name for the first dataframe.
df2_name (str): alias name for the second dataframe.
Returns:
DataFrame
"""
    key = context.solid_config["join_key"]
    join_style = context.solid_config.get("join_style", "inner")
    df1_name = context.solid_config["df1_name"]
    df2_name = context.solid_config["df2_name"]
    context.log.info(f"Running join of two dataframes on {key}")
    # check_required_columns is a project helper (defined elsewhere) that
    # raises if the given columns are missing from the DataFrame.
    check_required_columns(df1, [key])
    check_required_columns(df2, [key])
    # Alias both frames so the join condition can reference each side unambiguously.
    output = df1.alias(df1_name).join(
        df2.alias(df2_name),
        sf.col(f"{df1_name}.{key}") == sf.col(f"{df2_name}.{key}"),
        how=join_style,
    )
    return output
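For context, the solid itself runs fine; only the docs are broken. Here is a minimal invocation sketch using the legacy dagster 0.x execute_solid test utility (the config values and toy DataFrames are illustrative only):

# Minimal runtime check; assumes the dagster 0.x testing API.
from dagster import execute_solid
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df1 = spark.createDataFrame([(1, "a")], ["id", "val"])
df2 = spark.createDataFrame([(1, "b")], ["id", "val"])

result = execute_solid(
    join_two_dfs_solid,
    input_values={"df1": df1, "df2": df2},
    run_config={
        "solids": {
            "join_two_dfs_solid": {
                "config": {
                    "join_key": "id",
                    "join_style": "inner",
                    "df1_name": "left_df",
                    "df2_name": "right_df",
                }
            }
        }
    },
)
joined = result.output_value()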
When we build with sphinx-apidoc, I can see that the function's docstring exists by inspecting join_two_dfs_solid.__doc__, and that the _description field Dagster attaches to join_two_dfs_solid is empty, which should mean the docstring is used. However, when the Sphinx docs build, I get a blank .rst page for the module containing this solid. Does anyone know of another configuration setting in Sphinx or Dagster that needs to change for this to build correctly?
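For reference, I suspect the cause is that the decorated object is no longer a plain function, so autodoc's automodule pass skips it even though the docstring is intact. A quick check (our real module path, shown here as a hypothetical my_package.solids):

# @solid replaces the function with a dagster SolidDefinition instance,
# and autodoc only documents plain functions and classes by default.
from my_package.solids import join_two_dfs_solid  # hypothetical module path

print(type(join_two_dfs_solid))    # a SolidDefinition, not a function
print(join_two_dfs_solid.__doc__)  # the docstring itself is still there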
This is an open issue that the team is currently aware of: https://github.com/dagster-io/dagster/issues/2427
If you're still looking for a solution (a year later, I know), I've just added a Sphinx plugin to the dagstd library that solves this problem.
You just need to install the library (install command shown after the snippet) and add it to your conf.py file:
extensions = [
    ...
    'dagstd.sphinx.parser',
]
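Installation is the usual pip route (assuming you pull the package from PyPI under the same name):

pip install dagstd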
By default, this will prefix all op docs with (op). To change this, add the following to your conf.py file:
dagstd_op_prefix = 'My Prefix'
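Putting it together, a minimal conf.py for the setup in the question might look like the sketch below. The napoleon_custom_sections line is my assumption: napoleon only recognizes the standard Google sections, and the docstring above uses a non-standard "Config Parameters" section, so it needs to be registered explicitly to be rendered:

# conf.py -- minimal sketch for this setup
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',   # Google-style docstring parsing
    'dagstd.sphinx.parser',  # picks up dagster ops/solids for autodoc
]

# Register the custom docstring section used in the question's solids
# (assumption: you want it rendered rather than silently dropped).
napoleon_custom_sections = ['Config Parameters']

dagstd_op_prefix = '(solid)'  # replaces the default '(op)' prefix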