Copying a Cassandra table to Hive



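I have been stuck on this problem for several days, so any help would be greatly appreciated.

I'm trying to copy a Cassandra table to Hive (so that I can put it in the Hive metastore and then access it from Tableau). The Hive -> Tableau part works, but the Cassandra -> Hive part does not: the data is not copied into the Hive metastore.

Here are the steps I took:

I followed the instructions in this project's README: https://github.com/tuplejump/cash/tree/master/cassandra-handler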

I built hive-cassandra.jar and copied it, together with cassandra-all-*.jar and cassandra-thrift-*.jar, into the Hive lib folder.

Then I started Hive and tried the following:

hive> add jar /usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar;
Added [/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar] to class path
Added resources: [/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar]
hive> list jars;
/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar
hive> create temporary function tmp as 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
    > ;
FAILED: Class org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler not found

I don't know why Hive can't see CqlStorageHandler...

Thanks!
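
One way to narrow down the "Class ... not found" error is to check whether the jar that was added really contains a class file at that package path. The following is only a minimal sketch: JarClassCheck and containsClass are hypothetical names introduced here, and the jar path and class name are simply the ones from the session above. It only looks for the class entry; it does not check whether the handler's own dependencies are on the classpath. Note also that Hive storage handlers are normally referenced with STORED BY in a CREATE TABLE statement rather than registered as a function.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical diagnostic helper, not part of the cash project or of Hive.
final class JarClassCheck {

    // Returns true if the jar holds a .class entry for the fully qualified class name.
    static boolean containsClass(String jarPath, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Jar path and class name copied from the Hive session in the question.
        String jarPath = "/usr/lib/hive/apache-hive-1.1.0/lib/hive-cassandra-1.2.6.jar";
        String className = "org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler";
        System.out.println(containsClass(jarPath, className) ? "present" : "missing");
    }
}
```

If this reports "missing", the jar was probably built from a version of the project that uses a different package or class name. If it reports "present", the cause is more likely elsewhere, for example a dependency jar that was not added.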

Another option you could consider is to write a simple Java program that writes the data to a file, and then load that file into Hive.

package com.company.cassandra;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Cluster.Builder;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
public class CassandraExport {
    public static Session session;

    public static void connect(String username, String password, String host, int port, String keyspace) {
        Builder builder =  Cluster.builder().addContactPoint(host);
        builder.withPort(port);
        if (username != null && password != null) {
            builder.withCredentials(username, password);
        }
        Cluster cluster = builder.build();
        session = cluster.connect(keyspace);
    }
    public static void main(String[] args) {
        //Prod
        connect("user", "password", "server", 9042, "keyspace");
        ResultSetFuture future = session.executeAsync("SELECT * FROM table;");
        ResultSet results = future.getUninterruptibly();
        for (Row row : results) {
            // Print the columns in the following order, separated by tab characters
            String out = row.getString("col1") + "\t" +
                            String.valueOf(row.getInt("col2")) + "\t" +
                            String.valueOf(row.getLong("col3")) + "\t" +
                            String.valueOf(row.getLong("col4"));
            System.out.println(out);
        }
        session.close();
        session.getCluster().close();
    }

}

Write the output to a file, then load it into Hive.

hive -e "use schema; load data local inpath '/tmp/cassandra-table' overwrite into table mytable;"
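
The file-writing step can also be done inside the program instead of redirecting standard output. The following is a rough sketch under some assumptions: TsvExport, toTsvLine, and writeLines are hypothetical names that are not part of the answer above, the sample values stand in for the row.getString / row.getInt / row.getLong calls, and the target Hive table is assumed to be a text table declared with FIELDS TERMINATED BY '\t'. Nulls are written as \N, which is the default null marker for Hive text tables. Values that themselves contain tabs or newlines are not escaped here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only.
final class TsvExport {

    // Joins column values with tab characters; null becomes \N.
    static String toTsvLine(Object... columns) {
        return Arrays.stream(columns)
                .map(c -> c == null ? "\\N" : String.valueOf(c))
                .collect(Collectors.joining("\t"));
    }

    // Writes one line per row to the target file, replacing any existing content.
    static void writeLines(Path target, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample values; in the real export these would come from each Cassandra Row.
        List<String> lines = Arrays.asList(
                toTsvLine("a", 1, 2L, 3L),
                toTsvLine("b", 4, 5L, 6L));
        // Default path matches the one used in the hive load command above.
        Path target = Paths.get(args.length > 0 ? args[0] : "/tmp/cassandra-table");
        writeLines(target, lines);
    }
}
```

For large tables it would be better to write each line as the rows are iterated rather than collecting them in a list first; the list is used here only to keep the sketch short.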
