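In Spark Structured Streaming, I want to create a StructType from a string.
In the example below, Spark's read method only accepts a StructType as the schema. How can I create a StructType from a string? I want to convert the employeeSchema string into a StructType.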
import org.apache.spark.SparkContext;
import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public class EmployeeSchemaLoader {
    public static void main(String[] args) throws AnalysisException {
        String master = "local[*]";
        SparkSession sparkSession = SparkSession
                .builder().appName(EmployeeSchemaLoader.class.getName())
                .master(master).getOrCreate();
        // Schema held as a plain string; this is what needs converting
        String employeeSchema = "StructType(\n" +
                "StructField(firstName,StringType,true),\n" +
                "StructField(lastName,StringType,true),\n" +
                "StructField(addresses,\n" +
                "ArrayType(\n" +
                "StructType(\n" +
                "StructField(city,StringType,true), \n" +
                "StructField(state,StringType,true)\n" +
                "),\n" +
                "true),\n" +
                "true) \n" +
                ")";
        SparkContext context = sparkSession.sparkContext();
        context.setLogLevel("ERROR");
        SQLContext sqlCtx = sparkSession.sqlContext();
        Dataset<Row> employeeDataset = sparkSession.read()
                //.schema(employeeSchema) // Accepts only StructType
                .json("simple_employees.json");
        employeeDataset.printSchema();
        employeeDataset.createOrReplaceTempView("employeeView");
        sparkSession.catalog().listTables().show();
        sqlCtx.sql("select * from employeeView").show();
    }
}
I'm not sure why you would want to do this. Instead of making employeeSchema a String, why not make it a StructType? Like this:
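import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Build the schema directly; every field is nullable, as in the string version
StructType employeeSchema = new StructType()
        .add("firstName", DataTypes.StringType, true)
        .add("lastName", DataTypes.StringType, true)
        .add("addresses", DataTypes.createArrayType(
                new StructType()
                        .add("city", DataTypes.StringType, true)
                        .add("state", DataTypes.StringType, true),
                true), true);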
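In PySpark, a schema can be rebuilt from its JSON representation:
from pyspark.sql.types import StructType

# inputdf is assumed to be an existing DataFrame
schema = inputdf.schema
print(type(schema))
# just to display all methods available on schema
print(dir(schema))
# round-trip the schema through its JSON form
new_schema = StructType.fromJson(schema.jsonValue())
print(type(new_schema))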
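If the schema really does arrive as a string in the `StructType(StructField(...))` form shown in the question, one option is to parse that string into the JSON layout that `StructType.fromJson` accepts. The following is a minimal sketch in plain Python, not a Spark API: `parse_schema`, `tokenize`, and `PRIMITIVES` are hypothetical names, and it assumes only struct, array, and the few primitive types listed in the table.

```python
import re

# Type names as written in the string form, mapped to Spark's JSON type names.
# Assumption: only these primitives are needed; extend the table as required.
PRIMITIVES = {
    "StringType": "string",
    "IntegerType": "integer",
    "LongType": "long",
    "DoubleType": "double",
    "BooleanType": "boolean",
}

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[(),])")


def tokenize(text):
    """Split the schema string into identifiers and punctuation."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_schema(text):
    """Parse 'StructType(StructField(...),...)' into a JSON-style dict."""
    tokens = tokenize(text)
    pos = 0

    def take(expected=None):
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        if expected is not None and token != expected:
            raise ValueError(f"expected {expected!r}, got {token!r}")
        pos += 1
        return token

    def boolean():
        token = take()
        if token not in ("true", "false"):
            raise ValueError(f"expected true or false, got {token!r}")
        return token == "true"

    def field():
        take("StructField")
        take("(")
        name = take()
        take(",")
        dtype = data_type()
        take(",")
        nullable = boolean()
        take(")")
        return {"name": name, "type": dtype, "nullable": nullable, "metadata": {}}

    def data_type():
        name = take()
        if name == "StructType":
            take("(")
            fields = []
            while pos < len(tokens) and tokens[pos] != ")":
                if fields:
                    take(",")
                fields.append(field())
            take(")")
            return {"type": "struct", "fields": fields}
        if name == "ArrayType":
            take("(")
            element = data_type()
            take(",")
            contains_null = boolean()
            take(")")
            return {"type": "array", "elementType": element,
                    "containsNull": contains_null}
        if name in PRIMITIVES:
            return PRIMITIVES[name]
        raise ValueError(f"unsupported type {name!r}")

    result = data_type()
    if pos != len(tokens):
        raise ValueError("trailing input after schema")
    return result


EMPLOYEE_SCHEMA = """StructType(
StructField(firstName,StringType,true),
StructField(lastName,StringType,true),
StructField(addresses,
ArrayType(StructType(
StructField(city,StringType,true),
StructField(state,StringType,true)
), true), true)
)"""

schema_json = parse_schema(EMPLOYEE_SCHEMA)
```

With PySpark installed, `StructType.fromJson(schema_json)` should then produce the StructType, in the same way as the `fromJson` call above. If you control the string format, it may be simpler to store the schema as JSON or as a DDL string such as `firstName STRING, lastName STRING`, which recent Spark versions can read without a hand-written parser.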