How to remove duplicate/repeated words from a string in a Snowflake column

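I'm new to regular expressions. I have the following scenario: a few words are enclosed in square brackets, and the same bracketed word may occur multiple times. I need to keep only one occurrence of the word and remove the square brackets.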

Current format:

|                                              TXT                                       | 
| -------------------------------------------------------------------------------------- | 
| This sentence has [num] [num] [num] [num] and there are [num] [num] in previous string |

The table above has a TXT column, and I need to apply a regexp function to it to get the output below.

Desired output:

|                           TXT                                       | 
| ------------------------------------------------------------------- | 
| This sentence has num and there are num in previous string |

Can you help me with this? It needs to be done in a SQL query. I do not want any answers that use a UDF.

Thanks in Advance.

I have tried the query below. It keeps the first num, but I can't get the result I expect.

Query:

select regexp_replace(
         regexp_replace(regexp_replace(txt, '\\[\\w+\\]', 'REGEX_WORD', 1, 1, 'c'), '\\[\\w+\\]', ''),
         'REGEX_WORD',
         regexp_replace(regexp_substr(txt, '\\[\\w+\\]'), '\\[|\\]', '')) as working_model
from cte;

Output:

This sentence has num and there are in previous string

Do not want any answers using UDFs.

Code offers more flexibility than regular expressions, so here are some possibilities that a regular expression alone can't cover.

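You can wrap your pattern in parentheses (these things: (), for anyone who uses "brackets" to mean something else) and then tell it that the pattern can repeat one or more times with {1,}, which is equivalent to +. If you terminate your strings with single quotes ', remember to use double backslashes \\ in Snowflake. If you terminate your strings with $$, you don't need double backslashes. You can use \\1 as the replacement, which is a backreference to the first capture group. Unfortunately, this means it keeps the surrounding square brackets: []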

set s = (select 'This sentence has [num] [num] [num] [num] and there are [num] [num] in previous string');
select regexp_replace($s, '(\\[\\w+\\]\\s+){1,}', '\\1') as OUTPUT;
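
If the brackets need to go as well, one option is to wrap that call in a second REGEXP_REPLACE that captures only the word inside the brackets. This is a minimal sketch that reuses the session variable s from the statement above. The second statement is the same thing written with $$ quoting, so the backslashes are not doubled. I'm assuming here that every bracketed token is a single \w+ word, as in the sample row.

```sql
-- Inner call: collapse each run of bracketed words into one bracketed word.
-- Outer call: replace every remaining [word] with just word (capture group 1).
select regexp_replace(
         regexp_replace($s, '(\\[\\w+\\]\\s+){1,}', '\\1'),
         '\\[(\\w+)\\]', '\\1') as OUTPUT;

-- Same logic with dollar-quoted strings: single backslashes are enough.
select regexp_replace(
         regexp_replace($s, $$(\[\w+\]\s+){1,}$$, $$\1$$),
         $$\[(\w+)\]$$, $$\1$$) as OUTPUT;
```

Two caveats. The inner pattern collapses any run of bracketed words, not only identical ones, so [a] [b] would also shrink to a single word; as far as I know Snowflake only supports backreferences in the replacement string, not in the pattern, so the pattern itself can't check that the words are equal. Also, because of the \s+, a bracketed word at the very end of the string with nothing after it is not folded into the run before it, although the outer call still strips its brackets.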
