How do I get a nested StructType object out of a StructType in Spark Java?



I am working on a Spark Java application, and I want to access a StructType object that is nested inside another StructType object. For example,

when we take the schema of a Spark DataFrame, it looks like this:

root
|-- name: struct (nullable = true)
|    |-- firstname: string (nullable = true)
|    |-- middlename: string (nullable = true)
|    |-- lastname: string (nullable = true)
|-- language: string (nullable = true)
|-- fee: integer (nullable = true)

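I want to get `name` as a StructType so that I can analyze it further. This would form a kind of chain, descending one level at a time. The problem is that at the root level, or at any other level, a StructType only seems to give us StructField objects, not the nested StructType itself.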

StructType st = df.schema();  // gives us the root-level StructType
st.fields();                  // gives us an array of StructField, but if I take `name` as a StructField I lose all the fields inside it; `name` is a StructType and I want to keep it as one
StructType name = ...         // obtained from st somehow: this is what I want to achieve

You can use the attributes and methods described in the official documentation. In PySpark, for example:

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

schema = StructType([StructField('name', StructType([StructField('firstname', StringType()), StructField('middlename', StringType()), StructField('lastname', StringType())])), StructField('language', StringType()), StructField('fee', IntegerType())])
for f in schema.fields:
    if f.name == "name":
        # f.dataType is the nested StructType itself, with its fields intact
        print(f.dataType)
        for f2 in f.dataType.fields:
            print(f2.name)
[Out]:
StructType([StructField('firstname', StringType(), True), StructField('middlename', StringType(), True), StructField('lastname', StringType(), True)])
firstname
middlename
lastname
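
Since the question is about Java, the same lookup can be sketched with the Java API: a StructField keeps its nested schema in `dataType()`, which can be cast back to StructType when the field is a struct. This is a minimal sketch, not a definitive implementation. The class name `NestedSchemaExample` and the helpers `nestedStruct` and `sampleSchema` are hypothetical names introduced for illustration, and the schema is built by hand (rather than taken from `df.schema()`) so that no SparkSession is needed.

```java
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class NestedSchemaExample {

    // Hypothetical helper: returns the nested StructType stored in the named field.
    public static StructType nestedStruct(StructType parent, String fieldName) {
        // StructType.apply(String) looks up the StructField by name
        // and throws if the parent has no field with that name.
        StructField field = parent.apply(fieldName);
        DataType dataType = field.dataType();
        if (dataType instanceof StructType) {
            return (StructType) dataType;
        }
        throw new IllegalArgumentException(
            fieldName + " is not a struct, it is: " + dataType.typeName());
    }

    // Hypothetical helper: rebuilds the schema shown in the question.
    public static StructType sampleSchema() {
        StructType name = new StructType()
            .add("firstname", DataTypes.StringType)
            .add("middlename", DataTypes.StringType)
            .add("lastname", DataTypes.StringType);
        return new StructType()
            .add("name", name)
            .add("language", DataTypes.StringType)
            .add("fee", DataTypes.IntegerType);
    }

    public static void main(String[] args) {
        StructType st = sampleSchema();          // in the question: df.schema()
        StructType name = nestedStruct(st, "name");
        for (String fieldName : name.fieldNames()) {
            System.out.println(fieldName);
        }
    }
}
```

Because the helper returns a StructType, calls can be chained to descend further, for example `nestedStruct(nestedStruct(st, "a"), "b")` for a hypothetical schema in which `a` contains a struct field `b`.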
