HiveQL查询不返回结果,也不返回错误



我在Ubuntu 14.0上运行Apache Hadoop 2.6.0,我在Hive 0.13.0中创建了一个表:

CREATE TABLE IF NOT EXISTS recipes_hive.cuisine (
ID INT COMMENT 'Cuisine ID.', 
name STRING COMMENT 'Cusine name - primary key.', 
area STRING COMMENT 'Name of the area of origin - foreign key.', 
scope STRING COMMENT 'Either country or area.') 
COMMENT 'Table containing cuisines data.'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

我用语句

填入数据
LOAD DATA LOCAL INPATH 'path_to_file/CUISINE.csv'
OVERWRITE INTO TABLE recipes_hive.cuisine;

我的数据库中有几个这样的表,它们都是用相同的过程创建和填充的。当运行像

这样的简单查询时:
SELECT * FROM cuisine

甚至在WHERE子句中具有某些条件时,我得到了预期的结果,但运行更复杂的查询时却得不到任何结果。例如:

SELECT cuisine.name, SUM(IF (ingredient.category = "fruit",1,2))/count(*) AS      PERC 
FROM cuisine JOIN recipe ON recipe.cuisine = cuisine.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient 
GROUP BY cuisine.name 
ORDER BY PERC DESC

,或者:

SELECT ingredient.id, ingredient.name 
FROM cuisine JOIN recipe ON recipe.cuisine = cuisine.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient 
WHERE ingredient.id IN (
SELECT ingredient.id 
FROM cuisine c JOIN recipe ON recipe.cuisine = c.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient 
WHERE c.name = "Pakistan") AND cuisine.name = "Bangladesh"

第一个例子计算一些百分比,第二个例子检查相互元素。

MapReduce和Hadoop被正确调用,它们没有返回错误。输出以

结尾:
Execution completed successfully
MapredLocal task succeeded
OK
Time taken: 122.119 seconds

我已经检查了网络,人们和我一样有类似的问题。我检查:

Hive Table对所有查询返回空结果集

Simple Hive query为空

但未能解决我的问题。数据实际上是在HDFS中,如前所述,它用于简单的查询。

所以要么是我的Hive实例有问题,要么是我的查询写得不正确。

任何帮助都将非常感激。致以最亲切的问候。

如果我们有一个包含ID ={1,2,3}的Cuisine表和包含ID ={5,6,7}的Recipe表,那么即使这些表不是空的,在执行INNER JOIN Cuisine时仍然没有返回任何行。ID =配方。ID(因为两个表中的ID不同)请你检查一下是否没有这样的情况。

SELECT count(1)
FROM cuisine c JOIN recipe ON recipe.cuisine = c.name WHERE c.name = "Pakistan";
--- must return > 0 
select count(1) from recipe as recipe
JOIN part_of ON part_of.id_recipe = recipe.id ;
--- must return > 0 
select count(1) from part_of as part_of
JOIN ingredient ON ingredient.name = part_of.ingredient ;
--- must return > 0 

所以内部查询返回所有count(*)都非零的行。现在测试外部select:

SELECT ingredient.id, ingredient.name 
FROM cuisine JOIN recipe ON recipe.cuisine = cuisine.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient 
WHERE ingredient.id = <inner query result> and cuisine.name = "Bangladesh";

您确定结果连接将是非空的吗?由于已经实现了内部连接,因此即使一个表缺少记录,整个结果集也为0。尝试添加一个带有"IS NULL"的左连接,以验证所有表都对结果集有贡献。如果所有子表在连接后各自的列中都有非空值,则查询是正确的。

最新更新