How to split a combined column when the data is inconsistent

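I am preparing for a new job in which I will receive data submissions of varying quality. Often times, dates, characters and so on are combined in ways that make no sense, and they have to be separated before analysis. I am thinking ahead about how to solve this.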

In the made-up example below, I have combined Region, Sales Rep and Product.

file['combine'] = file['Region'] + file['Sales Rep'] + file['Product']

Shift Region Sales Rep  Product  Cost per  Units Sold              combine
0       3   East  Shirlene   Pencil         5          71   EastShirlenePencil
1       3  South  Anderson   Folder        17          69  SouthAndersonFolder
2       3   West    Shelli   Folder        17         185     WestShelliFolder
3       3  South    Damion   Binder        30         159    SouthDamionBinder
4       3   West  Shirlene  Stapler        25          41  WestShirleneStapler

Assuming no other data is available, the question is: how do I split the "combine" column?

Many thanks!

If you want spaces between the strings, you can do:

df["combine"] = df[["Region", "Sales Rep", "Product"]].apply(" ".join, axis=1)
print(df)

Prints:

Shift Region Sales Rep  Product  Cost per  Units Sold                combine
0      3   East  Shirlene   Pencil         5          71   East Shirlene Pencil
1      3  South  Anderson   Folder        17          69  South Anderson Folder
2      3   West    Shelli   Folder        17         185     West Shelli Folder
3      3  South    Damion   Binder        30         159    South Damion Binder
4      3   West  Shirlene  Stapler        25          41  West Shirlene Stapler

Or, if you want to split the already combined strings, you can start a new piece at every uppercase letter:

import re
df["separated"] = df["combine"].apply(lambda x: re.findall(r"[A-Z][^A-Z]*", x))
print(df)

Prints:

Shift Region Sales Rep  Product  Cost per  Units Sold              combine                  separated
0      3   East  Shirlene   Pencil         5          71   EastShirlenePencil   [East, Shirlene, Pencil]
1      3  South  Anderson   Folder        17          69  SouthAndersonFolder  [South, Anderson, Folder]
2      3   West    Shelli   Folder        17         185     WestShelliFolder     [West, Shelli, Folder]
3      3  South    Damion   Binder        30         159    SouthDamionBinder    [South, Damion, Binder]
4      3   West  Shirlene  Stapler        25          41  WestShirleneStapler  [West, Shirlene, Stapler]
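
If separate columns are more useful than a list, the same capital-letter idea can be sketched with `Series.str.extract` and named groups. This is only a sketch under assumptions: it assumes every value is exactly three capitalized words with no internal capitals, spaces or digits, and the output column name `Rep` is a placeholder chosen here because a regex group name cannot contain a space. Values that break the assumption (for example "McDonald" or "NewYork") do not match and come back as NaN, so they can be flagged for manual review instead of being split incorrectly.

```python
import pandas as pd

# Sample values copied from the "combine" column in the question
df = pd.DataFrame({
    "combine": [
        "EastShirlenePencil",
        "SouthAndersonFolder",
        "WestShelliFolder",
        "SouthDamionBinder",
        "WestShirleneStapler",
    ]
})

# Each named group matches one capitalized word: an uppercase letter
# followed by zero or more lowercase letters. The anchors ^ and $ force
# the whole string to consist of exactly three such words.
pattern = (
    r"^(?P<Region>[A-Z][a-z]*)"
    r"(?P<Rep>[A-Z][a-z]*)"      # "Rep" is a placeholder column name
    r"(?P<Product>[A-Z][a-z]*)$"
)

# With named groups, str.extract returns one column per group
parts = df["combine"].str.extract(pattern)

# Rows that do not fit the pattern are all-NaN; mark them for review
bad_rows = parts.isna().any(axis=1)

result = df.join(parts)
print(result)
print("rows needing manual review:", int(bad_rows.sum()))
```

With real submissions, checking `bad_rows` before trusting the split is probably the important part, since a capital-letter rule cannot recover fields that were never distinguishable in the first place.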
