我想把长文本分成多行;还有其他类似的问题,但没有一个对我有用。 我有什么
ID | Message
----------------------------------
1 | Very looooooooooooooooong text
2 | Short text
我想做的是将该字符串每 n 个字符除以 结果如果n = 15
:
Id | Message
------------------------------------------
1 | Very looooooooo
1 | oooooooong text
2 | Short text
如果在 n 个字符后的第一个空格处进行拆分会更好。
我尝试使用string_split和子字符串,但我找不到任何有效的东西。 我想使用类似这样的东西:
SELECT index, element FROM table, CAST(message AS SUPER) AS element AT index;
但它没有考虑到长度,我不喜欢将 varchar 变量转换为超级变量。
您可以使用generate_series()
来完成此操作:
select m.*, gs.posn, substring(m.message, gs.posn, 15) as split_message
from messages m
cross join lateral generate_series(1, length(message), 15) gs(posn);
在长度之后拆分空格有点棘手。 我们必须将message
拆分为单词,然后弄清楚如何将它们分成几组,然后重新聚合。
我不知道如何在没有递归的情况下拆分空间。 我希望你不介意它把所有空格都当作单词边界:
with recursive by_words as (
select m.*, s.n, s.word, length(s.word) as word_len,
max(s.n) over (partition by m.id) as num_words
from messages m
cross join lateral regexp_split_to_table(m.message, 's+')
with ordinality as s(word, n)
), rejoin as (
select id, n, array[word] as words, word_len as cum_word_len,
word_len >= 15 as keep
from by_words
where n = 1
union all
select p.id, c.n,
case
when p.cum_word_len >= 15 then array[c.word]
else p.words||c.word
end as words,
case
when p.cum_word_len >= 15 then c.word_len
else p.cum_word_len + c.word_len + 1
end as cum_word_len,
(p.cum_word_len + c.word_len + 1 >= 15)
or (c.n = c.num_words) as keep
from rejoin p
join by_words c on (c.id, c.n) = (p.id, p.n + 1)
)
select id,
row_number() over (partition by id
order by n) as segnum,
array_to_string(words, ' ') as split_message
from rejoin
where keep
order by 1, 2
;
db<>小提琴在这里
编辑以添加:
你能告诉我以下内容是否适用于 Redshift 吗?
with gs as (
select generate_series as posn
from generate_series(1, 150000, 15)
)
select *, substring(m.message, gs.posn, 15) as split_message
from messages m
join gs
on gs.posn <= greatest(1, length(m.message))
order by m.id, gs.posn
;
感谢@Mike Organek的回答和他的帮助,我找到了也适用于Redshift的解决方案。
Mike 对 Redshift 的回答中的问题与 Redshift 中不支持的generate_series
有关,因此这里有一个解决方法。
with row as (
select t.*, row_number() over () as x
from table t -- big enough table
limit 100
),
result as
(
select (x-1)*15+1 as posn from row --change 15 to a number to split the long text with
)
select * into gs
from result
然后迈克的回答:
select *, substring(m.feedback from gs.posn for 15) as split_message
from messages m
join gs
on gs.posn <= greatest(1, length(m.message))
order by m.id, gs.posn