I'm trying to write a Spark dataframe into a Kudu DB, but I don't know the Kudu master. The cluster I'm working with is a Cloudera cluster.
How can I find the Kudu master in the cluster?
Below is a Python example using the CM API Python client v3 (https://cloudera.github.io/cm_api/docs/python-client-swagger/):
#!/usr/local/bin/python
import cm_client
# Configure HTTP basic authorization: basic
#configuration = cm_client.Configuration()
cm_client.configuration.username = 'admin'
cm_client.configuration.password = 'admin'
# Create an instance of the API class
api_client = cm_client.ApiClient("http://your-cdh-cluster-cm-host:7180/api/v30")
# create an instance of the ServicesResourceApi class
service_api_instance = cm_client.ServicesResourceApi(api_client)
# create an instance of the HostsResourceApi class
host_api_instance = cm_client.HostsResourceApi(api_client)
# find KUDU_MASTER roles in the CDH cluster
cluster_roles = service_api_instance.read_roles("Cluster 1", "KUDU-1")
for role in cluster_roles.items:
    if role.type == "KUDU_MASTER":
        role_host = host_api_instance.read_host(role.host_ref.host_id, view="full")
        print("Kudu master is located on %s" % role_host.hostname)
And here is a very basic example using the Cloudera Manager Java client (https://cloudera.github.io/cm_api/docs/java-client-swagger/):
package cloudera.kudu_example;
import java.io.IOException;
import com.cloudera.api.swagger.HostsResourceApi;
import com.cloudera.api.swagger.ServicesResourceApi;
import com.cloudera.api.swagger.client.ApiClient;
import com.cloudera.api.swagger.client.ApiException;
import com.cloudera.api.swagger.client.Configuration;
import com.cloudera.api.swagger.model.ApiHost;
import com.cloudera.api.swagger.model.ApiRole;
import com.cloudera.api.swagger.model.ApiRoleList;
public class App {
    public static void main(String[] args) throws IOException {
        ApiClient cmClient = Configuration.getDefaultApiClient();
        cmClient.setBasePath(args[0]);
        cmClient.setUsername(args[1]);
        cmClient.setPassword(args[2]);
        cmClient.setVerifyingSsl(false);
        HostsResourceApi hostsApiInstance = new HostsResourceApi();
        ServicesResourceApi servicesApiInstance = new ServicesResourceApi();
        try {
            // find KUDU_MASTER roles in the CDH cluster
            ApiRoleList apiRoles = servicesApiInstance.readRoles("Cluster 1", "KUDU-1");
            for (ApiRole role : apiRoles.getItems()) {
                if (role.getType().equalsIgnoreCase("KUDU_MASTER")) {
                    ApiHost host = hostsApiInstance.readHost(role.getHostRef().getHostId(), "full");
                    System.out.printf("Kudu master runs at %s. IP: %s, status %s%n",
                            host.getHostname(), host.getIpAddress(), host.getEntityStatus());
                }
            }
        } catch (ApiException e) {
            System.err.println("Exception when calling ServicesResourceApi#readRoles");
            e.printStackTrace();
        }
    }
}
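To run the example, pass the CM API base URL, username, and password as the three command-line arguments, e.g. http://your-cdh-cluster-cm-host:7180/api/v30 admin admin.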
I know this isn't the best way, but it is a quick one. Assuming you already have a Kudu table (if you don't, just create a test/temp table through Impala), simply run DESCRIBE FORMATTED on that table. You will get a bunch of details, including the Kudu master details (hostname), where the port will be 8051. I believe that once you know the host and port details, you can do plenty with Spark dataframes (see the sketch after the syntax below).
Temp table syntax:
CREATE TABLE kudu_no_partition_by_clause (id BIGINT PRIMARY KEY, s STRING, b BOOLEAN) STORED AS KUDU;
Describe syntax: DESCRIBE FORMATTED table_name;
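Once you have the master hostname, the write itself can look like the following PySpark sketch. This is only a sketch: it assumes the kudu-spark connector jar is available to your Spark job, that "kudu-master-host" stands in for the hostname you found, that the master listens on Kudu's default RPC port 7051 (the 8051 port mentioned above is typically the master web UI), and that the table was created through Impala in the default database (hence the impala:: prefix):
# PySpark sketch: write an existing dataframe "df" to the Kudu table created above.
# "kudu-master-host" is a placeholder for the Kudu master hostname you discovered;
# 7051 is Kudu's default master RPC port; change it if your cluster overrides it.
df.write \
    .format("org.apache.kudu.spark.kudu") \
    .option("kudu.master", "kudu-master-host:7051") \
    .option("kudu.table", "impala::default.kudu_no_partition_by_clause") \
    .mode("append") \
    .save()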
For reference:
Kudu web administration details: https://kudu.apache.org/releases/0.6.0/docs/administration.html
Kudu with Spark examples: https://kudu.apache.org/docs/developing.html
Cheers!!