Hi, I have a DataFrame with a column CODEARTICLE:
|CODEARTICLE| STRUCTURE| DES|TYPEMARK|TYP|IMPLOC|MARQUE|GAMME|TAR|
+-----------+-------------+--------------------+--------+---+------+------+-----+---+
| GENCFFRIST|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 0| Local| | | |
| GENCFFMARC|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 0| Local| | | |
| GENCFFESCO|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 0| Local| | | |
| GENCFFTNA|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 0| Local| | | |
| GENCFFEMBA|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 0| Local| | | |
| 789600010|9999999999998|xxxxxxxxxxxxxxxxx...| 7| 1| Local| | | |
| 799700040|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 1| Local| | | |
| 799701000|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 1| Local| | | |
| 899980490|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 9| Local| | | |
| 429600010|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 1| Local| | | |
| 559970040|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 0| Local| | | |
| 679500010|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 1| Local| | | |
| 679500040|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 1| Local| | | |
| 679500060|9999999999998|xxxxxxxxxxxxxxxxx...| 0| 1| Local| | | |
+-----------+-------------+--------------------+--------+---+------+------+-----+---+
I only want to keep the rows whose CODEARTICLE is numeric.
// connect to the Oracle table TMP_structure
val spark = sparkSession.sqlContext
val articles_Gold = spark.load("jdbc",
Map("url" -> "jdbc:oracle:thin:System/maher@//localhost:1521/XE",
"dbtable" -> "IPTECH.TMP_ARTICLE")).select("CODEARTICLE", "STRUCTURE", "DES", "TYPEMARK", "TYP", "IMPLOC", "MARQUE", "GAMME", "TAR")
val filteredData =articles_Gold.withColumn("test",'CODEARTICLE.cast(IntegerType)).filter($"test"!==null)
Thank you very much.
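Use na.drop, passing the column name as a Seq (a bare string argument is read as the "any"/"all" mode, not as a column):
articles_Gold.withColumn("test",'CODEARTICLE.cast(IntegerType)).na.drop(Seq("test"))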
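You can call .isNotNull on the column inside filter. A comparison such as $"test" !== null never evaluates to true under SQL null semantics, so it filters out every row. You don't even need to create another column for this logic. You can simply do the following: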
val filteredData = articles_Gold.withColumn("CODEARTICLE",'CODEARTICLE.cast(IntegerType)).filter('CODEARTICLE.isNotNull)
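For anyone who wants to try this without the Oracle table, here is a minimal sketch on a small in-memory DataFrame. The object name NumericCodes, the helper names, and the sample values (taken from the table in the question) are my own assumptions, not part of the original code. The cast-based filter relies on a failed cast returning null, which I believe only holds when ANSI mode is off, so the sketch sets spark.sql.ansi.enabled to false explicitly. The regex variant with rlike is an alternative I am swapping in: it does not depend on the cast behavior or on the value fitting in an Int.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

// Hypothetical helper object, for illustration only.
object NumericCodes {

  // Keep rows whose CODEARTICLE casts to an integer.
  // Assumes non-ANSI mode, where a failed cast yields null instead of an error.
  def keepNumericByCast(df: DataFrame): DataFrame =
    df.filter(col("CODEARTICLE").cast(IntegerType).isNotNull)

  // Alternative: keep rows whose CODEARTICLE consists only of digits.
  // No cast involved, so long codes that overflow an Int are kept too.
  def keepNumericByRegex(df: DataFrame): DataFrame =
    df.filter(col("CODEARTICLE").rlike("^[0-9]+$"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("numeric-codes-sketch")
      .config("spark.sql.ansi.enabled", "false") // make failed casts return null
      .getOrCreate()
    import spark.implicits._

    // A few codes copied from the table in the question.
    val sample = Seq("GENCFFRIST", "GENCFFTNA", "789600010", "679500040")
      .toDF("CODEARTICLE")

    keepNumericByCast(sample).show()
    keepNumericByRegex(sample).show()

    spark.stop()
  }
}
```

Unlike the withColumn version above, both helpers leave CODEARTICLE as a string, so the original values are not overwritten by the cast.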
I hope this answer helps.