How to extract the string after the last occurrence of a word in Hive



I have a string column in a Hive table, for example:

select * from
(
select "edition_xx/news/radio_today_news_xx" as my_column
union all 
select "edition_xx/news/news/television_1.3" as my_column
) A 

I want to extract the part of the string after news/, so the output column should look like this:

radio_today_news_xx
television_1.3

How can I extract this in Hive using a regex? Note that news/ can occur any number of times, and I want the string after its last occurrence.

Use split():

select  split(my_column,'(news/)+')[1] 
from
(
select "edition_xx/news/radio_today_news_xx" as my_column
union all 
select "edition_xx/news/news/television_1.3" as my_column
) A;

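The regex (news/)+ matches one or more consecutive occurrences of news/, so a run of repeated news/ segments is treated as a single delimiter. Note that [1] returns the piece after the first such run, so this works only when all news/ occurrences are adjacent to each other.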

Result:

radio_today_news_xx
television_1.3
Time taken: 37.218 seconds, Fetched: 2 row(s)
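
To illustrate that limitation, here is a small sketch with a made-up input (the archive/ segment and the alias after_first_run are hypothetical, not from the question) in which the news/ occurrences are not adjacent:

```sql
-- Hypothetical input: two news/ occurrences separated by another segment.
select split(my_column, '(news/)+')[1] as after_first_run
from
(
select "edition_xx/news/archive/news/television_1.3" as my_column
) A;
-- The array is ["edition_xx/", "archive/", "television_1.3"],
-- so [1] yields archive/ rather than television_1.3.
```

For inputs like this, the last array element has to be taken instead of element [1].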

Use split and take the last element of the resulting array, which is the text after the last occurrence of news/:

select split(A.my_column,'news/')[size(split(A.my_column,'news/'))-1] lt
from
(
select "edition_xx/news/radio_today_news_xx" as my_column
union all 
select "edition_xx/news/news/television_1.3" as my_column
union all
select "edition_xx/news/radio_today/news_xx" as my_column
) A;

Output:

radio_today_news_xx
television_1.3
radio_today/news_xx
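
Since the question asks specifically for a regex, a variant using Hive's built-in regexp_extract(subject, pattern, index) might look like the following. This is only a sketch and not part of the answers above; the alias last_part is an illustrative name. It relies on the leading .* being greedy, so it consumes everything up to the last news/ and capture group 1 holds what follows.

```sql
-- Greedy .* runs up to the last "news/"; group 1 is the remainder of the string.
select regexp_extract(A.my_column, '.*news/(.*)', 1) as last_part
from
(
select "edition_xx/news/radio_today_news_xx" as my_column
union all
select "edition_xx/news/news/television_1.3" as my_column
union all
select "edition_xx/news/radio_today/news_xx" as my_column
) A;
-- Expected values: radio_today_news_xx, television_1.3, radio_today/news_xx
```

Compared with the split approach, this evaluates the column once instead of splitting it twice, at the cost of a slightly less obvious pattern.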
