R - SparkR: comparing the schemas of SparkDataFrames



For example, suppose we have two identical SparkDataFrames:

library(SparkR)
df1 <- createDataFrame(iris)
df2 <- createDataFrame(iris)

How can we check whether they have the same schema?

sdf1 <- schema(df1)
sdf2 <- schema(df2)
print(sdf1)
print(sdf2)

Printing them shows that the schemas are the same:

StructType
|-name = "Sepal_Length", type = "DoubleType", nullable = TRUE
|-name = "Sepal_Width", type = "DoubleType", nullable = TRUE
|-name = "Petal_Length", type = "DoubleType", nullable = TRUE
|-name = "Petal_Width", type = "DoubleType", nullable = TRUE
|-name = "Species", type = "StringType", nullable = TRUE
StructType
|-name = "Sepal_Length", type = "DoubleType", nullable = TRUE
|-name = "Sepal_Width", type = "DoubleType", nullable = TRUE
|-name = "Petal_Length", type = "DoubleType", nullable = TRUE
|-name = "Petal_Width", type = "DoubleType", nullable = TRUE
|-name = "Species", type = "StringType", nullable = TRUE

However, the following calls:

identical(sdf1, sdf2)
all.equal(sdf1, sdf2)

report that they are not the same. How can we compare the schemas of SparkDataFrames?
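A likely reason, which is an assumption based on how SparkR appears to represent schemas: the object returned by `schema()` is an R list that wraps a handle to a JVM object together with closures such as `fields()`. Each call to `schema()` yields a new handle and new closure environments, and `identical()` compares closure environments by identity, so two schemas with matching columns still compare as different. The sketch below mimics this with a hypothetical `fake_schema()` in plain R; it is not SparkR's actual class.

```r
# Hypothetical stand-in for a SparkR schema object (NOT the real class):
# a list holding a reference id (playing the role of the JVM handle)
# plus a closure, similar in spirit to schema(df)$fields().
fake_schema <- function(ref_id, cols) {
  obj <- list(ref = ref_id, cols = cols)
  obj$fields <- function() cols  # every call creates a fresh closure environment
  obj
}

cols <- c(Sepal_Length = "double", Species = "string")
s1 <- fake_schema("ref-1", cols)
s2 <- fake_schema("ref-2", cols)

identical(s1, s2)                    # FALSE: different refs and closure environments
identical(s1$fields(), s2$fields())  # TRUE: the column information itself matches
```

Even two objects built with the same `ref_id` compare as different here, because `identical()` checks closure environments by identity unless `ignore.environment = TRUE`. Comparing the extracted column information sidesteps the problem.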

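I suggest comparing the schemas with SparkR::dtypes, which returns plain R data (column names and type strings) that identical() can compare directly: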

identical(SparkR::dtypes(df1), SparkR::dtypes(df2))
# TRUE
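If column order should not matter, the output of `dtypes()` can be normalized before comparing. The sketch below assumes `dtypes()` returns a list of `c(column_name, type)` character pairs, such as `list(c("Sepal_Length", "double"), ...)`; the helper `same_dtypes()` is hypothetical and not part of SparkR. Note that `dtypes()` does not report nullability, so this check ignores it.

```r
# Hypothetical helper: compare two dtypes()-style lists,
# i.e. lists of c(column_name, type_string) pairs.
same_dtypes <- function(dt1, dt2, ignore_order = FALSE) {
  # Turn list(c("a", "double"), ...) into a 2-column character matrix;
  # as.character() keeps an empty list from becoming NULL.
  to_mat <- function(dt) matrix(as.character(unlist(dt)), ncol = 2, byrow = TRUE)
  m1 <- to_mat(dt1)
  m2 <- to_mat(dt2)
  if (nrow(m1) != nrow(m2)) return(FALSE)
  if (ignore_order) {
    # Sort rows by name, then type, so column order no longer matters
    m1 <- m1[order(m1[, 1], m1[, 2]), , drop = FALSE]
    m2 <- m2[order(m2[, 1], m2[, 2]), , drop = FALSE]
  }
  identical(m1, m2)
}

# With SparkR loaded, usage would be:
# same_dtypes(SparkR::dtypes(df1), SparkR::dtypes(df2), ignore_order = TRUE)
```

Converting to a matrix keeps the name/type pairing intact while sorting, which a simple `setequal()` on the flattened vector would not.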
