I need to delete the second row of this dataset. I am new to Apache Spark; can anyone help me with this? Below is the code:
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class DeleteRow {
    public static void main(String[] args) {
        System.setProperty("hadoop.home.dir", "C:\\winutils");
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("JoinFunctions").setMaster("local[*]"));
        SQLContext sqlContext = new SQLContext(sc);
        SparkSession spark = SparkSession.builder().appName("JavaTokenizerExample").getOrCreate();

        List<Row> data = Arrays.asList(
                RowFactory.create(1, "Hi I heard about Spark"),
                RowFactory.create(2, "I wish Java could use case classes"),
                RowFactory.create(3, "Logistic,regression,models,are,neat"));
        StructType schema = new StructType(new StructField[] {
                new StructField("label", DataTypes.IntegerType, false, Metadata.empty()),
                new StructField("sentence", DataTypes.StringType, false, Metadata.empty()) });

        String ins = data.get(1).toString();
        System.out.println(ins);

        Dataset<Row> sentenceDataFrame = spark.createDataFrame(data, schema);
        // My attempt to delete the second row -- this does not work
        sentenceDataFrame.drop(data.get(1).toString());
    }
}
Any help is appreciated.
Use a FilterFunction. (Dataset.drop(String) removes a column by name, not a row, which is why your attempt has no effect.)
I wrote a JUnit test to help you:
@Test
public void sampleDeleteRowTest() throws Exception {
    // Local SparkSession for the test
    SparkSession spark = SparkSession.builder()
            .appName("sampleDeleteRowTest")
            .master("local[*]")
            .getOrCreate();

    List<Row> data = Arrays.asList(
            RowFactory.create(1, "Hi I heard about Spark"),
            RowFactory.create(2, "I wish Java could use case classes"),
            RowFactory.create(3, "Logistic,regression,models,are,neat"));
    StructType schema = new StructType(new StructField[] {
            new StructField("label", DataTypes.IntegerType, false, Metadata.empty()),
            new StructField("sentence", DataTypes.StringType, false, Metadata.empty()) });

    String ins = data.get(1).toString();
    System.out.println(ins);

    Dataset<Row> sentenceDataFrame = spark.createDataFrame(data, schema);
    long size = sentenceDataFrame.count();
    assertTrue("SentenceDataFrame Size = " + size, size == 3);

    // Keep every row whose label is not 2, i.e. delete the second row
    sentenceDataFrame = sentenceDataFrame.filter(new FilterFunction<Row>() {
        @Override
        public boolean call(Row row) throws Exception {
            return row.getInt(0) != 2;
        }
    });

    size = sentenceDataFrame.count();
    assertTrue("SentenceDataFrame Size = " + size, size == 2);
}
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(org.apache.spark.api.java.function.FilterFunction)
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/api/java/function/FilterFunction.html
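The same predicate can also be expressed untyped, with a Column or a SQL expression string instead of a typed FilterFunction. A minimal sketch (the column name "label" comes from the schema above):

import static org.apache.spark.sql.functions.col;

// Column expression: keep only the rows whose label is not 2
Dataset<Row> byColumn = sentenceDataFrame.filter(col("label").notEqual(2));
// Equivalent SQL expression string
Dataset<Row> bySql = sentenceDataFrame.filter("label != 2");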
Alternatively, you can use a lambda to achieve the same result:
sentenceDataFrame = sentenceDataFrame.filter((FilterFunction<Row>) r -> r.getInt(0) != 2);
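Note that all of the above delete the row by its label value, not by its position. If there is no key column and you literally need to drop the row at a given position, one option is to index the rows through the RDD API first. This is only a sketch, not a definitive approach: it assumes the Dataset's row order is meaningful (Spark does not guarantee an ordering in general), and it reuses spark and schema from the test above.

import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

// Pair each row with its 0-based index, drop index 1 (the second row), rebuild the Dataset
JavaRDD<Row> withoutSecond = sentenceDataFrame.javaRDD()
        .zipWithIndex()
        .filter(t -> t._2() != 1L)
        .map(Tuple2::_1);
Dataset<Row> result = spark.createDataFrame(withoutSecond, schema);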