Creating a StructType from a string in Spark Structured Streaming



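In Spark Structured Streaming, I want to create a StructType from a string.

In the example below, Spark's read method only accepts a StructType for the schema. How can I create a StructType from a string? I want to convert the employeeSchema string into a StructType.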

import org.apache.spark.SparkContext;
import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public class EmployeeSchemaLoader {
    public static void main(String[] args) throws AnalysisException {
        String master = "local[*]";
        SparkSession sparkSession = SparkSession
                .builder().appName(EmployeeSchemaLoader.class.getName())
                .master(master).getOrCreate();
        // The schema is only available as a string in StructType.toString() form
        String employeeSchema = "StructType(\n" +
                "StructField(firstName,StringType,true),\n" +
                "StructField(lastName,StringType,true),\n" +
                "StructField(addresses,\n" +
                "ArrayType(\n" +
                "StructType(\n" +
                "StructField(city,StringType,true),\n" +
                "StructField(state,StringType,true)\n" +
                "),\n" +
                "true),\n" +
                "true)\n" +
                ")";
        SparkContext context = sparkSession.sparkContext();
        context.setLogLevel("ERROR");
        SQLContext sqlCtx = sparkSession.sqlContext();
        Dataset<Row> employeeDataset = sparkSession.read()
                //.schema(employeeSchema)  // Accepts only StructType
                .json("simple_employees.json");
        employeeDataset.printSchema();
        employeeDataset.createOrReplaceTempView("employeeView");
        sparkSession.catalog().listTables().show();
        sqlCtx.sql("select * from employeeView").show();
    }
}

I'm not sure why you would want to do this. Instead of making employeeSchema a String, why not make it a StructType? Like this:

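import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

StructType addressType = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("city", DataTypes.StringType, true),
    DataTypes.createStructField("state", DataTypes.StringType, true)
});
StructType employeeSchema = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("firstName", DataTypes.StringType, true),
    DataTypes.createStructField("lastName", DataTypes.StringType, true),
    DataTypes.createStructField("addresses", DataTypes.createArrayType(addressType, true), true)
});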
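
Alternatively, in PySpark you can take the JSON form of an existing schema and rebuild a StructType from it:

from pyspark.sql.types import StructType

# inputdf is an existing DataFrame
schema = inputdf.schema
print(type(schema))
# Just to display all methods available on the schema object
print(dir(schema))
# Round-trip the schema through its JSON (dict) representation
new_schema = StructType.fromJson(schema.jsonValue())
print(type(new_schema))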
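
If the schema really is only available as a string in the StructType(StructField(...)) form from the question, one option is to parse that string into the dict layout that StructType.fromJson accepts. The following is a minimal sketch, not a Spark API: the function names (tokenize, parse_type, parse_field, schema_string_to_json) are hypothetical, the type mapping is deliberately partial, and it only handles simple types, StructType, and ArrayType. It uses the standard library only, so the PySpark call is shown as a comment.

```python
import json
import re

# Partial, assumed mapping from Spark class names to the type names
# used in Spark's JSON schema representation. Extend as needed.
SIMPLE_TYPES = {
    "StringType": "string",
    "IntegerType": "integer",
    "LongType": "long",
    "DoubleType": "double",
    "BooleanType": "boolean",
}

# An identifier or one of "(", ")", ",", optionally preceded by whitespace.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[(),])")


def tokenize(text):
    """Split the schema string into identifiers and punctuation."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def expect(tokens, i, value):
    """Check that tokens[i] is `value` and return the next index."""
    if i >= len(tokens) or tokens[i] != value:
        raise ValueError(f"expected {value!r} at token {i}")
    return i + 1


def parse_type(tokens, i):
    """Parse one data type at tokens[i]; return (json_type, next_index)."""
    name = tokens[i]
    if name == "StructType":
        i = expect(tokens, i + 1, "(")
        fields = []
        while tokens[i] != ")":
            field, i = parse_field(tokens, i)
            fields.append(field)
            if tokens[i] == ",":
                i += 1
        return {"type": "struct", "fields": fields}, i + 1
    if name == "ArrayType":
        i = expect(tokens, i + 1, "(")
        element, i = parse_type(tokens, i)
        i = expect(tokens, i, ",")
        contains_null = tokens[i] == "true"
        i = expect(tokens, i + 1, ")")
        return {"type": "array", "elementType": element,
                "containsNull": contains_null}, i
    if name in SIMPLE_TYPES:
        return SIMPLE_TYPES[name], i + 1
    raise ValueError(f"unsupported type: {name}")


def parse_field(tokens, i):
    """Parse StructField(name,type,nullable); return (json_field, next_index)."""
    i = expect(tokens, i, "StructField")
    i = expect(tokens, i, "(")
    name = tokens[i]
    i = expect(tokens, i + 1, ",")
    data_type, i = parse_type(tokens, i)
    i = expect(tokens, i, ",")
    nullable = tokens[i] == "true"
    i = expect(tokens, i + 1, ")")
    return {"name": name, "type": data_type,
            "nullable": nullable, "metadata": {}}, i


def schema_string_to_json(text):
    """Convert a StructType(...) string into a JSON-style schema dict."""
    tokens = tokenize(text)
    schema, end = parse_type(tokens, 0)
    if end != len(tokens):
        raise ValueError("trailing input after schema")
    return schema


employee_schema = (
    "StructType(\n"
    "StructField(firstName,StringType,true),\n"
    "StructField(lastName,StringType,true),\n"
    "StructField(addresses,\n"
    "ArrayType(\n"
    "StructType(\n"
    "StructField(city,StringType,true),\n"
    "StructField(state,StringType,true)\n"
    "),\n"
    "true),\n"
    "true)\n"
    ")"
)

schema_json = schema_string_to_json(employee_schema)
print(json.dumps(schema_json, indent=2))

# With PySpark installed, the dict can be handed to the call used above:
#   from pyspark.sql.types import StructType
#   schema = StructType.fromJson(schema_json)
```

On the Java side, the same JSON (serialized with json.dumps) should be usable with DataType.fromJson, and recent Spark versions also accept a DDL string such as "firstName STRING, lastName STRING" via StructType.fromDDL or the String overload of schema(), which may remove the need for a custom parser altogether if you control the format of the schema string.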
