polars.read_csv() with German number formatting

Is it possible in Polars to read a CSV file with German number formatting, the way pandas.read_csv() allows with its "decimal" and "thousands" parameters?

Currently, the Polars read_csv method does not expose those parameters.

However, there is an easy workaround to convert them. For example, with this CSV, let Polars read the German-formatted numbers as utf8 (strings).

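from io import StringIO
import polars as pl

# Tab-separated sample data with German-formatted numbers
my_csv = """col1\tcol2\tcol3
1.234,5\tabc\t1.234.567
9.876\tdef\t3,21
"""
# Note: newer Polars releases name this parameter `separator`
df = pl.read_csv(StringIO(my_csv), sep="\t")
print(df)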

shape: (2, 3)
┌─────────┬──────┬───────────┐
│ col1    ┆ col2 ┆ col3      │
│ ---     ┆ ---  ┆ ---       │
│ str     ┆ str  ┆ str       │
╞═════════╪══════╪═══════════╡
│ 1.234,5 ┆ abc  ┆ 1.234.567 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 9.876   ┆ def  ┆ 3,21      │
└─────────┴──────┴───────────┘

From here, the conversion is just a few lines of code:

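df = df.with_column(  # newer Polars releases use `with_columns`
    pl.col(["col1", "col3"])
    .str.replace_all(r"\.", "")  # drop thousands separators (pattern is a regex, so the dot is escaped)
    .str.replace(",", ".")  # decimal comma -> decimal point
    .cast(pl.Float64)  # or whatever datatype needed
)
print(df)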

shape: (2, 3)
┌────────┬──────┬────────────┐
│ col1   ┆ col2 ┆ col3       │
│ ---    ┆ ---  ┆ ---        │
│ f64    ┆ str  ┆ f64        │
╞════════╪══════╪════════════╡
│ 1234.5 ┆ abc  ┆ 1.234567e6 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 9876.0 ┆ def  ┆ 3.21       │
└────────┴──────┴────────────┘
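
To make the order of the two replacements easier to see, here is a minimal plain-Python sketch of the same per-value logic, without Polars. The helper name `german_to_float` is hypothetical and used only for illustration; it is not part of Polars or pandas.

```python
import re


def german_to_float(text: str) -> float:
    """Convert one German-formatted number string to a float (illustrative helper)."""
    # Step 1: drop the thousands separators. The dot is escaped because an
    # unescaped "." in a regular expression matches any character.
    without_thousands = re.sub(r"\.", "", text)
    # Step 2: turn the decimal comma into the decimal point that float() expects.
    return float(without_thousands.replace(",", "."))


for raw in ["1.234,5", "9.876", "1.234.567", "3,21"]:
    print(raw, "->", german_to_float(raw))
```

The thousands separators have to be removed first: if the comma were converted to a dot first, the following dot removal would also strip the freshly created decimal point.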

Just be careful to apply this logic only to numbers encoded in the German locale. It will mangle numbers formatted in other locales.
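
The mangling can be shown with a small plain-Python sketch. The names `naive_convert`, `parse_german_strict`, and the `GERMAN_NUMBER` pattern are assumptions made up for this illustration, and the pattern is only one possible definition of "looks like a German-formatted number".

```python
import re
from typing import Optional

# Assumed shape of a German-formatted number: either dot-grouped digits
# (1.234.567) or plain digits, optionally followed by a decimal comma part.
GERMAN_NUMBER = re.compile(r"-?(\d{1,3}(\.\d{3})+|\d+)(,\d+)?")


def naive_convert(text: str) -> float:
    # Same two replacements as the workaround, applied without any check.
    return float(text.replace(".", "").replace(",", "."))


def parse_german_strict(text: str) -> Optional[float]:
    # Return None instead of a wrong value when the text does not match
    # the assumed German pattern.
    if GERMAN_NUMBER.fullmatch(text) is None:
        return None
    return naive_convert(text)


print(naive_convert("1,234.5"))        # US-formatted input is silently mangled
print(parse_german_strict("1,234.5"))  # rejected by the pattern check
print(parse_german_strict("1.234,5"))  # German-formatted input is converted
```

A check like this cannot resolve every case: a value such as "1.234" is a valid German integer and also a valid US decimal, so knowing the source locale of the column is still necessary.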
