How to save the output of Spark's MatrixFactorizationModel recommendProductsForUsers to HBase



I'm new to Spark, and I want to save the output of recommendProductsForUsers to an HBase table. I found an example (https://sparkkb.wordpress.com/2015/05/04/save-javardd-to-hbase-using-saveasnewapihadoopdataset-spark-api-java-coding/) that saves data using a JavaPairRDD and saveAsNewAPIHadoopDataset.

How do I convert a JavaRDD<Tuple2<Object, Rating[]>> into a JavaPairRDD<ImmutableBytesWritable, Put> so that I can call saveAsNewAPIHadoopDataset?

    // Load the trained model from HDFS
    MatrixFactorizationModel sameModel = MatrixFactorizationModel.load(jsc.sc(), trainedDataPath);
    // Get the top noOfProductsToReturn recommendations for every user
    JavaRDD<Tuple2<Object, Rating[]>> ratings3 = sameModel.recommendProductsForUsers(noOfProductsToReturn).toJavaRDD();

Use mapToPair. Here is the example from the same source you linked, with the types adapted by hand:

    JavaPairRDD<ImmutableBytesWritable, Put> hbasePuts = ratings3.mapToPair(
        new PairFunction<Tuple2<Object, Rating[]>, ImmutableBytesWritable, Put>() {
            @Override
            public Tuple2<ImmutableBytesWritable, Put> call(Tuple2<Object, Rating[]> row) throws Exception {
                // Row key: the user id, i.e. the first element of the tuple
                Put put = new Put(Bytes.toBytes(String.valueOf(row._1())));
                // One column per recommended product: qualifier = product id, value = score
                for (Rating r : row._2()) {
                    put.addColumn(Bytes.toBytes("columnFamily"), Bytes.toBytes(String.valueOf(r.product())),
                            Bytes.toBytes(r.rating()));
                }
                return new Tuple2<ImmutableBytesWritable, Put>(new ImmutableBytesWritable(), put);
            }
        });

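It works like this: you create a new Put instance, passing the row key to its constructor, and then add each column you want to write (addColumn, which replaces the deprecated add). Finally, you return the Put you created.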
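The bytes you pass to the Put constructor decide how rows sort in HBase, which matters if you ever scan ranges of users. The plain-Java sketch below has no HBase dependency; it assumes the user id is a non-negative int (as in MLlib's Rating) and compares a decimal-string key (the UTF-8 bytes that Bytes.toBytes(String.valueOf(user)) produces) with a 4-byte big-endian key (what Bytes.toBytes(int) produces). RowKeyEncoding and its methods are hypothetical names for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: two ways of turning a user id into row-key bytes.
final class RowKeyEncoding {

    // Same bytes as HBase's Bytes.toBytes(String.valueOf(userId)): UTF-8 decimal text.
    static byte[] asDecimalString(int userId) {
        return String.valueOf(userId).getBytes(StandardCharsets.UTF_8);
    }

    // Same bytes as HBase's Bytes.toBytes(int): 4 bytes, big-endian.
    static byte[] asBigEndianInt(int userId) {
        return ByteBuffer.allocate(4).putInt(userId).array();
    }

    // Unsigned lexicographic byte comparison, which is how HBase orders row keys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Text keys: "10" sorts before "9".
        System.out.println(compareKeys(asDecimalString(10), asDecimalString(9)) < 0);
        // Fixed-width int keys: 9 sorts before 10.
        System.out.println(compareKeys(asBigEndianInt(9), asBigEndianInt(10)) < 0);
    }
}
```

For point lookups by user id either encoding works. The fixed-width form keeps numeric order only for non-negative ids; a negative int starts with 0xff and would sort after every positive one.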

This is how I solved the problem above; I hope it helps someone.

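    JavaPairRDD<ImmutableBytesWritable, Put> hbasePuts1 = ratings3
            .mapToPair(new PairFunction<Tuple2<Object, Rating[]>, ImmutableBytesWritable, Put>() {
                @Override
                public Tuple2<ImmutableBytesWritable, Put> call(Tuple2<Object, Rating[]> arg0)
                        throws Exception {
                    // arg0._1() is the user id, arg0._2() holds that user's recommendations
                    Rating[] userAndProducts = arg0._2();
                    System.out.println("***********" + userAndProducts.length + "**************");
                    // Use the user id as the row key (the Put must not be null)
                    Put put = new Put(Bytes.toBytes(String.valueOf(arg0._1())));
                    // Convert the Ratings into a value for the Put; here the product ids
                    // are joined into one comma-separated string
                    StringBuilder recommendedProducts = new StringBuilder();
                    for (Rating r : userAndProducts) {
                        if (recommendedProducts.length() > 0) {
                            recommendedProducts.append(',');
                        }
                        recommendedProducts.append(r.product());
                    }
                    put.addColumn(Bytes.toBytes("recommendation"), Bytes.toBytes("product"),
                            Bytes.toBytes(recommendedProducts.toString()));
                    // TableOutputFormat ignores the key, so an empty one is enough
                    return new Tuple2<ImmutableBytesWritable, Put>(new ImmutableBytesWritable(), put);
                }
            });
    // count() runs a separate Spark job; keep it only for debugging
    System.out.println("*********** Number of records in JavaPairRdd: " + hbasePuts1.count() + "**************");
    // newApiJobConfig (see the linked example) is a Hadoop Job whose output format is
    // TableOutputFormat and whose TableOutputFormat.OUTPUT_TABLE names the target table
    hbasePuts1.saveAsNewAPIHadoopDataset(newApiJobConfig.getConfiguration());
    jsc.stop();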
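The answer above stores all of a user's recommendations in a single recommendation:product cell, so whatever reads the table has to parse that cell again. The plain-Java sketch below, with no HBase or Spark dependency, assumes the cell holds a comma-separated list of product ids encoded as UTF-8 (the same bytes Bytes.toBytes(String) writes) and shows both the write and the read side. ProductListCell, encode and decode are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for one cell holding a user's recommended product ids.
final class ProductListCell {

    // Write side: join the ids with commas and encode as UTF-8,
    // i.e. the bytes Bytes.toBytes(String) would produce for the same text.
    static byte[] encode(int[] productIds) {
        StringBuilder sb = new StringBuilder();
        for (int id : productIds) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(id);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Read side: decode the UTF-8 text and split it back into ids.
    static int[] decode(byte[] cell) {
        String text = new String(cell, StandardCharsets.UTF_8);
        if (text.isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(text.split(",")).mapToInt(Integer::parseInt).toArray();
    }

    public static void main(String[] args) {
        byte[] cell = encode(new int[] {42, 13, 7});
        System.out.println(new String(cell, StandardCharsets.UTF_8)); // prints 42,13,7
        System.out.println(Arrays.toString(decode(cell)));            // prints [42, 13, 7]
    }
}
```

A single cell is read and written as a unit, so fetching or updating one product's entry means parsing or rewriting the whole list; one column per product avoids that at the cost of more cells per row.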

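We just open-sourced Splice Machine, and we have examples of integrating MLlib with querying and storage in Splice Machine. I don't know whether it will help, but I thought I'd let you know.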

http://community.splicemachine.com/use-spark-libraries-splice-machine/

Thanks for the post, very cool.
