
Extract the letters between a pipe and a Japanese character, and replace the spaces with commas

Updated: 2023-09-17

I have some data in a DataFrame that looks like this:

Japanese
--------
明日|Adverb の|Case 天気|Weather は|Case なんですか

Using pandas, I am looking for a way to return it in a new column:

Tag
------
Adverb, Case, Weather

So far, I have been able to use

df['Tag'] = df.iloc[:, 0].str.replace('[^a-zA-Z]', ' ', regex=True)

to get

Tag
------
Adverb Case Weather

But when I run

df['Tag'] = df['Tag'].str.replace(' ', ',')

I get

Tag
------
,,,,Adverb,,,Case,,,,Weather,,,Case,,,,,,

I think I should use str.extract instead of replace, but I also get an error message in that case.
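The run of commas appears because the first replace turns every single non-letter character into its own space, so each of those spaces later becomes a comma. The following is a minimal sketch of one way around that, assuming the sample row shown above; the variable names `spaced` and `tags` are illustrative only. It collapses each run of non-letters into a single separator and trims the ends.

```python
import pandas as pd

# Sample row copied from the question.
df = pd.DataFrame({"Japanese": ["明日|Adverb の|Case 天気|Weather は|Case なんですか"]})

# One space per non-letter character: this is what produces the long runs.
spaced = df["Japanese"].str.replace(r"[^a-zA-Z]", " ", regex=True)

# Replace each whole run of non-letters with one comma, trim the
# leading/trailing commas, then add a space after each remaining comma.
tags = (
    df["Japanese"]
    .str.replace(r"[^a-zA-Z]+", ",", regex=True)
    .str.strip(",")
    .str.replace(",", ", ", regex=False)
)

print(tags.iloc[0])
```

Note that this keeps repeated tags (Case appears twice for the sample row), so it does not by itself produce the deduplicated result asked for.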

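One approach is to collect every run of ASCII letters with str.findall, deduplicate each row's matches with a set, and join them with ', '. Set ordering is arbitrary, so the tags may come out in any order:

s = df.Japanese.str.findall('(?i)[a-z]+')
pd.Series([', '.join({*x}) for x in s], s.index)
0    Adverb, Weather, Case
dtype: object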

To get a stable, alphabetical order, sort the set before joining:

s = df.Japanese.str.findall('(?i)[a-z]+')
pd.Series([', '.join(sorted({*x})) for x in s], s.index)
0    Adverb, Case, Weather
dtype: object
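If the tags should keep the order in which they first appear rather than alphabetical order, a variation along the following lines may work. This is a sketch under assumptions, not part of the answer above: it assumes every tag directly follows a pipe, that the column has no missing values, and it uses `dict.fromkeys` to drop duplicates while preserving first-seen order. The name `found` is illustrative.

```python
import pandas as pd

# Sample row copied from the question.
df = pd.DataFrame({"Japanese": ["明日|Adverb の|Case 天気|Weather は|Case なんですか"]})

# Capture only the letters that directly follow a pipe character.
found = df["Japanese"].str.findall(r"\|([A-Za-z]+)")

# dict.fromkeys removes duplicates and keeps the first occurrence of each tag.
df["Tag"] = found.apply(lambda tags: ", ".join(dict.fromkeys(tags)))

print(df["Tag"].iloc[0])
```

For the sample row, first-seen order happens to match the alphabetical order, so both approaches give the same string; they would differ on rows where the tags are not already sorted.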
