For example, suppose we have two identical SparkDataFrames:
library(SparkR)
df1 <- createDataFrame(iris)
df2 <- createDataFrame(iris)
How can we check whether they have the same schema?
sdf1 <- schema(df1)
sdf2 <- schema(df2)
print(sdf1)
print(sdf2)
The printed output shows that the two schemas are the same:
StructType
|-name = "Sepal_Length", type = "DoubleType", nullable = TRUE
|-name = "Sepal_Width", type = "DoubleType", nullable = TRUE
|-name = "Petal_Length", type = "DoubleType", nullable = TRUE
|-name = "Petal_Width", type = "DoubleType", nullable = TRUE
|-name = "Species", type = "StringType", nullable = TRUE
StructType
|-name = "Sepal_Length", type = "DoubleType", nullable = TRUE
|-name = "Sepal_Width", type = "DoubleType", nullable = TRUE
|-name = "Petal_Length", type = "DoubleType", nullable = TRUE
|-name = "Petal_Width", type = "DoubleType", nullable = TRUE
|-name = "Species", type = "StringType", nullable = TRUE
However, both
identical(sdf1, sdf2)
all.equal(sdf1, sdf2)
report that the two schemas are not the same. How can we compare the schemas of SparkDataFrames?
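I suggest using SparkR::dtypes to compare the schemas. It returns the column names and types as plain R character vectors, so identical() works on the result: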
identical(SparkR::dtypes(df1), SparkR::dtypes(df2))
# TRUE
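The same idea can be wrapped in a small helper. The following is only a sketch: `dtype_vector` and `same_schema` are hypothetical names that are not part of SparkR, and the code assumes a Spark session can be started locally. One plausible reason the direct comparison of `schema()` results fails is that those objects hold references to JVM-side objects rather than plain R values, whereas `dtypes()` gives ordinary R data.

```r
library(SparkR)
sparkR.session()  # assumption: a local Spark installation is available

df1 <- createDataFrame(iris)
df2 <- createDataFrame(iris)

# Hypothetical helper: flatten the list returned by dtypes() into a
# named character vector (names = column names, values = type strings).
dtype_vector <- function(df) {
  pairs <- SparkR::dtypes(df)
  types <- vapply(pairs, function(p) p[[2]], character(1))
  names(types) <- vapply(pairs, function(p) p[[1]], character(1))
  types
}

# Hypothetical helper: TRUE only when column names, column order,
# and column types all match.
same_schema <- function(a, b) {
  identical(dtype_vector(a), dtype_vector(b))
}

same_schema(df1, df2)

# Cast one column to a different type to produce a mismatch.
df3 <- withColumn(df2, "Sepal_Length", cast(df2$Sepal_Length, "string"))
same_schema(df1, df3)

# List the columns whose types differ between df1 and df3.
v1 <- dtype_vector(df1)
v3 <- dtype_vector(df3)
names(v1)[v1 != v3[names(v1)]]
```

Note that dtypes() reports only column names and types, so this check does not compare the nullable flag shown in the printed schema.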