Unable to read a CSV into a Polars DataFrame in Rust using LazyCsvReader



I am trying the Rust version of Polars for the first time. I created a project and added polars to the Cargo.toml file, which looks like this:

[package]
name = "polar_test"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
polars = "0.22.6"

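Then I wrote the following code in the main.rs file. The code is taken directly from the Polars website, but the compiler complains about LazyCsvReader and other items such as col, sort, etc. It looks like ::prelude::* has no effect. Here is the code in main.rs: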

use polars::prelude::*;

fn example() -> Result<DataFrame> {
    LazyCsvReader::new("wine.data".into()).collect()
}

fn main() {
    println!("Hello, world!");
}

Here is the error log:

datapsycho@dataops:~/.../PolarTest$ cargo build
   Compiling polar_test v0.1.0 (/home/datapsycho/IdeaProjects/PolarTest)
error[E0433]: failed to resolve: use of undeclared type `LazyCsvReader`
 --> src/main.rs:4:5
  |
4 |     LazyCsvReader::new("foo.csv".into()).collect()
  |     ^^^^^^^^^^^^^ use of undeclared type `LazyCsvReader`
For more information about this error, try `rustc --explain E0433`.
error: could not compile `polar_test` due to previous error

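My understanding is that using prelude::* does not bring items such as col, groupby, and LazyCsvReader into scope. Can someone give me an example of how to read a CSV file with Polars and perform some operations on it? Here is the corresponding Python code using pandas: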

from pathlib import Path
import pandas as pd

def read_data(path: Path) -> pd.DataFrame:
    columns = [
        "Class label", "Alcohol",
        "Malic acid", "Ash",
        "Alcalinity of ash", "Magnesium",
        "Total phenols", "Flavanoids",
        "Nonflavanoid phenols",
        "Proanthocyanins",
        "Color intensity", "Hue",
        "OD280/OD315 of diluted wines",
        "Proline"
    ]
    _df = pd.read_csv(path, names=columns)
    return _df

def count_classes(df: pd.DataFrame) -> pd.DataFrame:
    _df = df.groupby("Class label").agg(total=("Class label", "count")).reset_index()
    _df.to_csv(Path("datastore").joinpath("data_count.csv"), index=False)
    return _df

def main():
    file_path = Path("datastore").joinpath("wine.data")
    main_df = read_data(file_path)
    class_stat_df = count_classes(main_df)
    print(class_stat_df)

if __name__ == "__main__":
    main()
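
For reference, the group-and-count step of the pandas code above can be sketched in plain Rust using only the standard library, so it compiles regardless of which Polars features are enabled. This is not Polars code, just a minimal illustration of the expected result. It assumes headerless CSV text whose first comma-separated field is the class label. The function name count_classes mirrors the Python one, and the sample rows are made up for illustration.

```rust
use std::collections::BTreeMap;

/// Count how many rows carry each class label.
/// Assumption: headerless CSV text, class label in the first field.
fn count_classes(csv_text: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for line in csv_text.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue; // skip blank lines
        }
        // The first comma-separated field is treated as the class label.
        if let Some(label) = line.split(',').next() {
            *counts.entry(label.trim().to_string()).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    // Made-up rows in the shape of wine.data (label first, then measurements).
    let sample = "1,14.23,1.71\n1,13.2,1.78\n2,12.37,0.94\n3,12.86,1.35\n";
    for (label, total) in count_classes(sample) {
        println!("{},{}", label, total);
    }
}
```

A Polars version should produce the same per-label totals, which makes this a handy cross-check once the crate compiles.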

The data can be downloaded with the following command:

wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

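Can anyone help me write the same transformation pipeline in Rust with Polars? This is the example given on the Polars homepage, which may need some modification: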

use polars::prelude::*;

fn example() -> Result<DataFrame> {
    LazyCsvReader::new("foo.csv".into())
        .finish()
        .filter(col("bar").gt(lit(100)))
        .groupby(vec![col("ham")])
        .agg(vec![col("spam").sum(), col("ham").sort(false).first()])
        .collect()
}

You need to activate the lazy feature. See the documentation for the full list of features:

https://docs.rs/polars/0.22.8/polars/compile-times-and-opt-in-features

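After reading the suggested documentation, I finally arrived at a solution. A Rust beginner may not immediately understand the answer suggested above, but after some research things are now clear to me. My understanding is that, to reduce compile times, some features are disabled by default and the user has to enable them explicitly. To enable them, the Cargo.toml file needs to be updated. For example, to bring in the describe and lazy features, the following configuration can be used in Cargo.toml: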

polars = {version = "0.22.8", features = ["describe", "lazy"]}
