Polars: str.replace with an expression, or str.split with a regex



I have this DataFrame:

import polars as pl

sample = pl.DataFrame({"equip": ['AmuletsMedals', 'Guns, CrossbowsOff-Hands', 'Melee WeaponsShieldsOff-Hands',
'All Armor', 'Chest Armor', 'Shields', 'All WeaponsShieldsOff-Hands']})
print(sample)
shape: (7, 1)
┌───────────────────────────────┐
│ equip                         │
│ ---                           │
│ str                           │
╞═══════════════════════════════╡
│ AmuletsMedals                 │
│ Guns, CrossbowsOff-Hands      │
│ Melee WeaponsShieldsOff-Hands │
│ All Armor                     │
│ Chest Armor                   │
│ Shields                       │
│ All WeaponsShieldsOff-Hands   │
└───────────────────────────────┘

My goal is to put a comma between the words that have been run together:

answer = pl.DataFrame({"equip": ['Amulets, Medals', 'Guns, Crossbows, Off-Hands', 'Melee Weapons, Shields, Off-Hands',
'All Armor', 'Chest Armor', 'Shields', 'All Weapons, Shields, Off-Hands']})
print(answer)
shape: (7, 1)
┌─────────────────────────────────────┐
│ equip                               │
│ ---                                 │
│ str                                 │
╞═════════════════════════════════════╡
│ Amulets, Medals                     │
│ Guns, Crossbows, Off-Hands          │
│ Melee Weapons, Shields, Off-Hand... │
│ All Armor                           │
│ Chest Armor                         │
│ Shields                             │
│ All Weapons, Shields, Off-Hands     │
└─────────────────────────────────────┘

I tried str.replace, but the replacement value does not accept an expression:

sample.with_columns(pl.col("equip").str.replace("[a-z][A-Z]", "[a-z], [A-Z]"))

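I also tried a hint found on the Polars GitHub, but at every match it cuts off the last letter of the preceding word and the first letter of the following word, just as this does:
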
sample.with_columns(pl.col("equip").str.replace("[a-z][A-Z]", ", "))

Any ideas?

Bonus question: I assume the answer to the simple case will also solve the harder case, but in case it does not, here is the harder case:

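I also have another column that needs a slightly harder regex pattern than "[a-z][A-Z]"; it should be something like "[a-z][A-Z]|[a-z]+|[a-z][1-9]" (I have not worked out the exact regex yet). The goal is again to put a comma between the attributes: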

sample2 = pl.DataFrame({"attributes": ['+10% Aether Damage+30 Defensive Ability16% Aether Resistance6% Less Damage from Aetherials6% Less Damage from Aether Corruptions',
'4-6 Aether Damage+25% Aether Damage10% Physical Damage converted to Aether DamageAether Tendril (Granted by Item)',
'2-8 Lightning Damage+25% Lightning Damage+25% Electrocute Damage10% Physical Damage converted to Lightning DamageEmpowered Lightning Nova (Granted by Item)',
'+10 Health Regenerated per Second+24 Armor20% Poison & Acid Resistance',
'+22 Defensive Ability10% Chance to Avoid Projectiles+18 Armor',
'+15 Physique+10% Shield Block ChanceShield Slam (Granted by Item)',
'+10% Chaos Damage+30 Defensive Ability16% Chaos Resistance6% Less Damage from Chthonics']})

You can use capture groups in your pattern and refer to them in the replacement:

sample.with_columns(pl.col("equip").str.replace_all(r"([a-z])([A-Z])", "$1, $2"))
shape: (7, 1)
┌─────────────────────────────────────┐
│ equip                               │
│ ---                                 │
│ str                                 │
╞═════════════════════════════════════╡
│ Amulets, Medals                     │
│ Guns, Crossbows, Off-Hands          │
│ Melee Weapons, Shields, Off-Hand... │
│ All Armor                           │
│ Chest Armor                         │
│ Shields                             │
│ All Weapons, Shields, Off-Hands     │
└─────────────────────────────────────┘

You may also want to use the Unicode classes \p{Lower} and \p{Upper} instead of [a-z] and [A-Z].

The regex syntax supported by Polars is documented at: https://docs.rs/regex/latest/regex/
