Deleting a row in Apache Spark using Java



I need to delete the second row of this dataset. I am new to Apache Spark, so can anyone help me with this? The code is below:

import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class DeleteRow {
    public static void main(String[] args) {
        // Backslashes must be escaped in Java string literals.
        System.setProperty("hadoop.home.dir", "C:\\winutils");
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("JoinFunctions").setMaster("local[*]"));
        SQLContext sqlContext = new SQLContext(sc); // unused; SparkSession supersedes it
        SparkSession spark = SparkSession.builder().appName("JavaTokenizerExample").getOrCreate();

        List<Row> data = Arrays.asList(
                RowFactory.create(1, "Hi I heard about Spark"),
                RowFactory.create(2, "I wish Java could use case classes"),
                RowFactory.create(3, "Logistic,regression,models,are,neat"));
        StructType schema = new StructType(new StructField[] {
                new StructField("label", DataTypes.IntegerType, false, Metadata.empty()),
                new StructField("sentence", DataTypes.StringType, false, Metadata.empty()) });

        String ins = data.get(1).toString();
        System.out.println(ins);

        Dataset<Row> sentenceDataFrame = spark.createDataFrame(data, schema);
        // This call has no effect on rows: drop(String) removes a *column* by name.
        sentenceDataFrame.drop(data.get(1).toString());
    }
}

Any help is appreciated.

Use a FilterFunction. Dataset.drop(String) removes a column by name, not a row, which is why your drop(...) call has no effect here; to delete a row, filter it out and keep everything else.

I wrote a JUnit test to help you.

@Test
public void sampleDeleteRowTest() throws Exception {
    List<Row> data = Arrays.asList(
            RowFactory.create(1, "Hi I heard about Spark"),
            RowFactory.create(2, "I wish Java could use case classes"),
            RowFactory.create(3, "Logistic,regression,models,are,neat"));
    StructType schema = new StructType(new StructField[]{
            new StructField("label", DataTypes.IntegerType, false,
                    Metadata.empty()),
            new StructField("sentence", DataTypes.StringType, false,
                    Metadata.empty())});
    String ins = data.get(1).toString();
    System.out.println(ins);

    // "spark" is a SparkSession initialized elsewhere in the test class,
    // e.g. in a @Before method.
    Dataset<Row> sentenceDataFrame = spark.createDataFrame(data, schema);
    long size = sentenceDataFrame.count();
    assertTrue("SentenceDataFrame Size = " + size, size == 3);

    // Keep every row whose label is not 2; this deletes the second row.
    sentenceDataFrame = sentenceDataFrame.filter(new FilterFunction<Row>() {
        @Override
        public boolean call(Row row) throws Exception {
            int label = row.getInt(0);
            return label != 2;
        }
    });
    size = sentenceDataFrame.count();
    assertTrue("SentenceDataFrame Size = " + size, size == 2);
}

https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(org.apache.spark.api.java.function.FilterFunction)

https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/api/java/function/FilterFunction.html
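
A side note, not part of the original answer: filtering on label works here because the label values happen to track insertion order. If you genuinely need to delete by position, one common approach is to attach an index with the RDD API's zipWithIndex and filter on that. This is only a sketch, assuming the same sentenceDataFrame, spark, and schema as above, and assuming your row order is well defined (in a distributed Dataset, order is only meaningful if you impose one):

import org.apache.spark.api.java.JavaRDD;

// Attach a 0-based position to every row, drop position 1 (the second row),
// and rebuild the Dataset with the original schema.
JavaRDD<Row> withoutSecondRow = sentenceDataFrame.toJavaRDD()
        .zipWithIndex()                  // (Row, index) pairs
        .filter(pair -> pair._2() != 1L) // keep everything except index 1
        .map(pair -> pair._1());
Dataset<Row> result = spark.createDataFrame(withoutSecondRow, schema);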

Alternatively, you can replace the anonymous FilterFunction class with a lambda to achieve the same result:

sentenceDataFrame = sentenceDataFrame.filter((FilterFunction<Row>) r -> r.getInt(0) != 2);
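
For completeness (an addition, not part of the original answer), the same deletion can also be written with an untyped column expression or a SQL string; both let Catalyst optimize the predicate instead of calling back into a JVM function for each row. A minimal sketch, assuming the same sentenceDataFrame:

import static org.apache.spark.sql.functions.col;

// Column-expression form:
sentenceDataFrame = sentenceDataFrame.filter(col("label").notEqual(2));

// Equivalent SQL-string form:
// sentenceDataFrame = sentenceDataFrame.filter("label != 2");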
