postgreSQL:将长文本分成多行



我想把长文本分成多行;还有其他类似的问题,但没有一个对我有用。 我有什么

ID | Message
----------------------------------
1  | Very looooooooooooooooong text
2  | Short text

我想做的是将该字符串每 n 个字符除以 结果如果n = 15

Id                    | Message
------------------------------------------
1                     | Very looooooooo
1                     | oooooooong text
2                     | Short text

如果在 n 个字符后的第一个空格处进行拆分会更好。

我尝试使用string_split和子字符串,但我找不到任何有效的东西。 我想使用类似这样的东西:

SELECT index, element FROM table, CAST(message AS SUPER) AS element AT index;

但它没有考虑到长度,我不喜欢将 varchar 变量转换为超级变量。

您可以使用generate_series()来完成此操作:

select m.*, gs.posn, substring(m.message, gs.posn, 15) as split_message 
from messages m
cross join lateral generate_series(1, length(message), 15) gs(posn);

在长度之后拆分空格有点棘手。 我们必须将message拆分为单词,然后弄清楚如何将它们分成几组,然后重新聚合。

我不知道如何在没有递归的情况下拆分空间。 我希望你不介意它把所有空格都当作单词边界:

with recursive by_words as (
select m.*, s.n, s.word, length(s.word) as word_len,
max(s.n) over (partition by m.id) as num_words
from messages m
cross join lateral regexp_split_to_table(m.message, 's+') 
with ordinality as s(word, n)
), rejoin as (
select id, n, array[word] as words, word_len as cum_word_len, 
word_len >= 15 as keep 
from by_words
where n = 1
union all
select p.id, c.n, 
case 
when p.cum_word_len >= 15 then array[c.word]
else p.words||c.word
end as words, 
case 
when p.cum_word_len >= 15 then c.word_len
else p.cum_word_len + c.word_len + 1
end as cum_word_len,
(p.cum_word_len + c.word_len + 1 >= 15)
or (c.n = c.num_words) as keep
from rejoin p
join by_words c on (c.id, c.n) = (p.id, p.n + 1)
)
select id, 
row_number() over (partition by id
order by n) as segnum,
array_to_string(words, ' ') as split_message
from rejoin 
where keep
order by 1, 2
;

db<>小提琴在这里

编辑以添加:

你能告诉我以下内容是否适用于 Redshift 吗?

with gs as (
select generate_series as posn
from generate_series(1, 150000, 15)
)
select *, substring(m.message, gs.posn, 15) as split_message
from messages m
join gs 
on gs.posn <= greatest(1, length(m.message))
order by m.id, gs.posn
;

感谢@Mike Organek的回答和他的帮助,我找到了也适用于Redshift的解决方案。

Mike 对 Redshift 的回答中的问题与 Redshift 中不支持的generate_series有关,因此这里有一个解决方法。

with row as (
select t.*, row_number() over () as x
from table t -- big enough table
limit 100
),
result as
(
select (x-1)*15+1 as posn from row --change 15 to a number to split the long text with
)
select * into gs
from result

然后迈克的回答:

select *, substring(m.feedback from gs.posn for 15) as split_message
from messages m
join gs 
on gs.posn <= greatest(1, length(m.message))
order by m.id, gs.posn

最新更新