This is the output of my Spark DataFrame dfMainOutput:
4295858898,177,SelfSourcedPublic,INC,Cost of sales,Umsatzkosten,,ECOR,false,,,,,false,False,,,,505096,505074,505074,505096,505096,,505074,False,,3014830,,I|!|
Now I want to replace , with |^| and drop the DataPartition column.
This is what I am doing:
val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))
val headerColumn = df.columns.filter(v => (!v.contains("^") && !v.contains("_c"))).toSeq
val header = headerColumn.dropRight(1).mkString("", "|^|", "|!|")
val dfMainOutputFinalWithoutNull = dfMainOutputFinal.withColumn("concatenated", regexp_replace(col("concatenated"), "null", "")).withColumnRenamed("concatenated", header)
dfMainOutputFinalWithoutNull.repartition(1).write.partitionBy("DataPartition","StatementTypeCode")
.format("csv")
.option("nullValue", "")
.option("header", "true")
.option("codec", "gzip")
.save("s3://trfsmallfffile/FinancialLineItem/output")
This code produces the following output:
4295858898|^|177|^|INC|^|Cost of sales|^|Umsatzkosten|^|ECOR|^|false|^|false|^|False|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|False|^|3014830|^|I|!|
The empty elements are missing. I want it to be:
4295858898|^|177|^|INC|^|Cost of sales|^|Umsatzkosten|^||^|ECOR|^|False|^||^||^||^||^|False|^|False|^||^||^||^|505096|^|505074|^|505074|^|505096|^|505096|^||^|505074|^|False|^||^|3014830|^||^|I|!|
Also, in the DataFrame output I get false where we want False.
Please help me figure out what I am missing.
Here is my schema:
root
|-- LineItem_organizationId: long (nullable = true)
|-- LineItem_lineItemId: integer (nullable = true)
|-- DataPartition: string (nullable = true)
|-- StatementTypeCode: string (nullable = true)
|-- LineItemName: string (nullable = true)
|-- LocalLanguageLabel: string (nullable = true)
|-- FinancialConceptLocal: string (nullable = true)
|-- FinancialConceptGlobal: string (nullable = true)
|-- IsDimensional: boolean (nullable = true)
|-- InstrumentId: string (nullable = true)
|-- LineItemSequence: string (nullable = true)
|-- PhysicalMeasureId: string (nullable = true)
|-- FinancialConceptCodeGlobalSecondary: string (nullable = true)
|-- IsRangeAllowed: boolean (nullable = true)
|-- IsSegmentedByOrigin: string (nullable = true)
|-- SegmentGroupDescription: string (nullable = true)
|-- SegmentChildDescription: string (nullable = true)
|-- SegmentChildLocalLanguageLabel: string (nullable = true)
|-- LocalLanguageLabel_languageId: string (nullable = true)
|-- LineItemName_languageId: string (nullable = true)
|-- SegmentChildDescription_languageId: string (nullable = true)
|-- SegmentChildLocalLanguageLabel_languageId: string (nullable = true)
|-- SegmentGroupDescription_languageId: string (nullable = true)
|-- SegmentMultipleFundbDescription: string (nullable = true)
|-- SegmentMultipleFundbDescription_languageId: string (nullable = true)
|-- IsCredit: string (nullable = true)
|-- FinancialConceptLocalId: string (nullable = true)
|-- FinancialConceptGlobalId: string (nullable = true)
|-- FinancialConceptCodeGlobalSecondaryId: string (nullable = true)
|-- FFAction: string (nullable = true)
Data for DataPartition=SelfSourcedPublic and StatementTypeCode=INC, as output from dfMainOutput:
+-----------------------+-------------------+-----------------+-----------------+------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+---------------------+----------------------+-------------+------------+----------------+-----------------+-----------------------------------+--------------+-------------------+-----------------------+-----------------------+------------------------------+-----------------------------+-----------------------+----------------------------------+-----------------------------------------+----------------------------------+-------------------------------+------------------------------------------+--------+-----------------------+------------------------+-------------------------------------+--------+
|LineItem_organizationId|LineItem_lineItemId|DataPartition |StatementTypeCode|LineItemName |LocalLanguageLabel |FinancialConceptLocal|FinancialConceptGlobal|IsDimensional|InstrumentId|LineItemSequence|PhysicalMeasureId|FinancialConceptCodeGlobalSecondary|IsRangeAllowed|IsSegmentedByOrigin|SegmentGroupDescription|SegmentChildDescription|SegmentChildLocalLanguageLabel|LocalLanguageLabel_languageId|LineItemName_languageId|SegmentChildDescription_languageId|SegmentChildLocalLanguageLabel_languageId|SegmentGroupDescription_languageId|SegmentMultipleFundbDescription|SegmentMultipleFundbDescription_languageId|IsCredit|FinancialConceptLocalId|FinancialConceptGlobalId|FinancialConceptCodeGlobalSecondaryId|FFAction|
+-----------------------+-------------------+-----------------+-----------------+------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+---------------------+----------------------+-------------+------------+----------------+-----------------+-----------------------------------+--------------+-------------------+-----------------------+-----------------------+------------------------------+-----------------------------+-----------------------+----------------------------------+-----------------------------------------+----------------------------------+-------------------------------+------------------------------------------+--------+-----------------------+------------------------+-------------------------------------+--------+
|4295858898 |707 |SelfSourcedPublic|INC |Revenue from long-term construction contracts |Erlöse aus langfristigen Fertigungsaufträgen |null |ROBR |false |null |null |null |null |false |False |null |null |null |505096 |505074 |505074 |505096 |505096 |null |505074 |True |null |3015278 |null |I|!| |
|4295858898 |3289 |SelfSourcedPublic|INC |Balancing Item - Net Income available to Controlling Interest |null |null |IIII |false |null |null |null |null |false |null |null |null |null |505096 |505074 |505074 |505096 |505096 |null |505074 |True |null |3014960 |null |I|!| |
|4295858922 |808 |SelfSourcedPublic|INC |Income Taxes - Total |Ertragsteuern |null |XTAX |false |null |null |null |null |false |False |null |null |null |505096 |505074 |505074 |505096 |505096 |null |505074 |False |null |3019589 |null |I|!| |
|4295858922 |1507 |SelfSourcedPublic|INC |Balancing Item - Operating Expenses |null |null |IIII |false |null |null |null |null |false |null |null |null |null |505096 |505074 |505074 |505096 |505096 |null |505074 |True |null |3014960 |null |I|!| |
|4295858951 |1574 |SelfSourcedPublic|INC |Admin/General Expenses |null |null |ESGA |false |null |null |null |null |false |False |null |null |null |505074 |505074 |505074 |505074 |505074 |null |505074 |False |null |3018991 |null |I|!| |
|4295859007 |1645 |SelfSourcedPublic|INC |Exploration Expenses - Balancing value |null |null |EEXP |false |null |null |null |null |false |null |null |null |null |505074 |505074 |505074 |505074 |505074 |null |505074 |False |null |3018916 |null |I|!| |
|4295859038 |954 |SelfSourcedPublic|INC |Sale Investments |null |null |EGFA |false |null |null |null |null |false |False |null |null |null |505096 |505074 |505074 |505096 |505096 |null |505074 |True |null |3018929 |null |I|!| |
|4295859038 |1967 |SelfSourcedPublic|INC |Restructuring Charges/Provisions |Ergebnis aus Umstrukturierungen |null |ERES |false |null |null |null |null |false |False |null |null |null |505096 |505074 |505074 |505096 |505096 |null |505074 |False |null |3018980 |null |I|!| |
|4295859038 |1996 |SelfSourcedPublic|INC |Diluted Weighted Average Shares on Instrument Level multiplied to its Participation Factor|null |null |DWASEPFI |false |8590926849 |null |null |null |false |null |null |null |null |505096 |505074 |505074 |505096 |505096 |null |505074 |True |null |1002023919 |null |I|!| |
|4295859045 |864 |SelfSourcedPublic|INC |Results of valuation gains/losses and disposals of non-current securities |Ergebnis aus Kursänderungen und Abgängen von Wertpapieren des langfristigen Finanzvermögens („@FVTPL“)|null |EGIT |false |null |null |null |null |false |False |null |null |null |505096 |505074 |505074 |505096 |505096 |null |505074 |True |null |3018932 |null |I|!| |
|4295859045 |1092 |SelfSourcedPublic|INC |Excep. Depreciation |null |null |EGLO |false |null |null |null |null |false |False |null |null |null |505096 |505074 |505074 |505096 |505096 |null |505074 |True |null |3018938 |null |I|!| |
|4295859071 |1840 |SelfSourcedPublic|INC |Other Operating Expense |null |null |EOOE |false |null |null |null |null |false |False |null |null |null |505074 |505074 |505074 |505074 |505074 |null |505074 |False |null |3018974 |null |I|!| |
|4295859078 |914 |SelfSourcedPublic|INC |Balancing Item - Non Operating Income/(Expense), net |null |null |IIII |false |null |null |null |null |false |null |null |null |null |505096 |505074 |505074 |505096 |505096 |null |505074 |True |null |3014960 |null |I|!| |
|4295859106 |514 |SelfSourcedPublic|INC |Personnel Expenses |null |null |ELAS |false |null |null |null |null |false |False |null |null |null |505074 |505074 |505074 |505074 |505074 |null |505074 |False |null |3018944 |null |I|!| |
|4295859106 |903 |SelfSourcedPublic|INC |Balancing Item - Non Operating Income/(Expense), net |null |null |IIII |false |null |null |null |null |false |null |null |null |null |505074 |505074 |505074 |505074 |505074 |null |505074 |True |null |3014960 |null |I|!| |
|4295859216 |499 |SelfSourcedPublic|INC |BC - Depreciation of Fixed Assets |null |null |BCDEP |false |null |null |null |null |false |null |null |null |null |505084 |505074 |505074 |505084 |505084 |null |505074 |False |null |1002023928 |null |I|!| |
|4295859236 |172 |SelfSourcedPublic|INC |Total Revenue |Ventes |null |XTLR |false |null |null |null |null |false |False |null |null |null |505074 |505074 |505074 |505074 |505074 |null |505074 |True |null |3016345 |null |I|!| |
|4295859241 |492 |SelfSourcedPublic|INC |Diluted Net Income excluding Extra Items applicable to Common - (Instrument Level) |null |null |XNCNDI |false |8589989623 |null |null |null |false |null |null |null |null |505074 |505074 |505074 |505074 |505074 |null |505074 |True |null |1001214357 |null |I|!| |
|4295859279 |124 |SelfSourcedPublic|INC |Income Available to Com Excl ExtraOrd |Toerekenbaar aan de aandeelhouders van de moederonderneming |null |XNCN |false |null |null |null |null |false |False |null |null |null |505084 |505074 |505074 |505084 |505084 |null |505074 |True |null |3016316 |null |I|!| |
|4295859298 |488 |SelfSourcedPublic|INC |Other operating income/expenses |Other operating expenses |null |EOIE |false |null |null |null |null |false |null |null |null |null |505074 |505074 |505074 |505074 |505074 |null |505074 |True |null |3018969 |null |I|!| |
+-----------------------+-------------------+-----------------+-----------------+------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+---------------------+----------------------+-------------+------------+----------------+-----------------+-----------------------------------+--------------+-------------------+-----------------------+-----------------------+------------------------------+-----------------------------+-----------------------+----------------------------------+-----------------------------------------+----------------------------------+-------------------------------+------------------------------------------+--------+-----------------------+------------------------+-------------------------------------+--------+
After running this code:
val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))
I get this output:
+-----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|DataPartition |StatementTypeCode|concatenated |
+-----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|SelfSourcedPublic|INC |4295858898|^|707|^|INC|^|Revenue from long-term construction contracts|^|Erlöse aus langfristigen Fertigungsaufträgen|^|ROBR|^|false|^|false|^|False|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|True|^|3015278|^|I|!| |
|SelfSourcedPublic|INC |4295858898|^|3289|^|INC|^|Balancing Item - Net Income available to Controlling Interest|^|IIII|^|false|^|false|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|True|^|3014960|^|I|!| |
|SelfSourcedPublic|INC |4295858922|^|808|^|INC|^|Income Taxes - Total|^|Ertragsteuern|^|XTAX|^|false|^|false|^|False|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|False|^|3019589|^|I|!| |
|SelfSourcedPublic|INC |4295858922|^|1507|^|INC|^|Balancing Item - Operating Expenses|^|IIII|^|false|^|false|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|True|^|3014960|^|I|!| |
|SelfSourcedPublic|INC |4295859236|^|172|^|INC|^|Total Revenue |^|Ventes|^|XTLR|^|false|^|false|^|False|^|505074|^|505074|^|505074|^|505074|^|505074|^|505074|^|True|^|3016345|^|I|!| |
|SelfSourcedPublic|INC |4295859241|^|492|^|INC|^|Diluted Net Income excluding Extra Items applicable to Common - (Instrument Level) |^|XNCNDI|^|false|^|8589989623|^|false|^|505074|^|505074|^|505074|^|505074|^|505074|^|505074|^|True|^|1001214357|^|I|!| |
|SelfSourcedPublic|INC |4295859279|^|124|^|INC|^|Income Available to Com Excl ExtraOrd|^|Toerekenbaar aan de aandeelhouders van de moederonderneming|^|XNCN|^|false|^|false|^|False|^|505084|^|505074|^|505074|^|505084|^|505084|^|505074|^|True|^|3016316|^|I|!| |
|SelfSourcedPublic|INC |4295859298|^|488|^|INC|^|Other operating income/expenses|^|Other operating expenses|^|EOIE|^|false|^|false|^|505074|^|505074|^|505074|^|505074|^|505074|^|505074|^|True|^|3018969|^|I|!| |
+-----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
So, for example, in row 4295858898 | 3289 the null value in the LocalLanguageLabel column has disappeared.
I don't understand how it gets dropped...
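The culprit is the null values in your dataframe: concat_ws skips every null input, so both the value and its separator vanish. The solution is to replace all null values with "" before concatenating, which should fix your problem. That is safe here because every column that contains nulls in your schema is of type string, and na.fill("") only replaces nulls in string columns.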
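A minimal sketch of that behavior, assuming a local SparkSession (in spark-shell, `spark` already exists) and a hypothetical three-column sample named `id`, `label` and `code`: concat_ws drops the null `label` together with its separator, while calling na.fill("") first keeps an empty slot.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat_ws}

// Local session for experimentation; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[1]").appName("concat-ws-nulls").getOrCreate()
import spark.implicits._

// Hypothetical sample row: the middle (string) value is null.
val sample = Seq(("4295858898", null: String, "IIII")).toDF("id", "label", "code")

// concat_ws skips null inputs, so the separator for `label` disappears too.
val withNulls = sample
  .select(concat_ws("|^|", sample.columns.map(col): _*).as("concatenated"))
  .as[String]
  .collect()

// na.fill("") replaces nulls in string columns, so concat_ws keeps an empty field.
val filled = sample.na.fill("")
val withEmpty = filled
  .select(concat_ws("|^|", filled.columns.map(col): _*).as("concatenated"))
  .as[String]
  .collect()

println(withNulls.mkString)  // 4295858898|^|IIII
println(withEmpty.mkString)  // 4295858898|^||^|IIII
```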
So replace this:
val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))
with this:
val dfMainOutputFinal = dfMainOutput.na.fill("").select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))
and that should solve your problem.
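If nulls can also show up in non-string columns (na.fill("") leaves long, integer and boolean columns untouched), one option is to cast every column to string and wrap it in coalesce(..., lit("")) before calling concat_ws. The sketch below also applies initcap to boolean columns so false is written as False, which is what the question asks for. The helper `asField` and the sample data are hypothetical, and initcap is just one way to get that capitalization.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, concat_ws, initcap, lit}
import org.apache.spark.sql.types.BooleanType

val spark = SparkSession.builder().master("local[1]").appName("concat-all-columns").getOrCreate()
import spark.implicits._

// Hypothetical helper: render one column as a string field.
// Booleans become "True"/"False"; a null of any type becomes "" so concat_ws keeps its slot.
def asField(df: DataFrame, name: String): Column = {
  val asString = df.schema(name).dataType match {
    case BooleanType => initcap(col(name).cast("string"))
    case _           => col(name).cast("string")
  }
  coalesce(asString, lit(""))
}

// Hypothetical sample: a null boolean, a null string and a non-null boolean.
val sample = Seq(
  (4295858898L, Option.empty[Boolean], Option.empty[String], Option(false), "SelfSourcedPublic")
).toDF("LineItem_organizationId", "IsRangeAllowed", "InstrumentId", "IsDimensional", "DataPartition")

// Same column filter as in the question: everything except DataPartition.
val fieldCols = sample.columns.filter(_ != "DataPartition").map(asField(sample, _))

val out = sample
  .select(concat_ws("|^|", fieldCols: _*).as("concatenated"))
  .as[String]
  .collect()

println(out.mkString)  // 4295858898|^||^||^|False
```

Because every argument to concat_ws is now non-null, each column always contributes exactly one field, so the number of |^| separators stays constant across rows.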