I'm working on a machine learning NLP project and have run into a problem.



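My CSV contains restaurant reviews, each labeled 1 or 0 based on the star rating given. Some reviews contain commas, so when I call pd.read_csv() it treats those commas as field separators and raises an error. How can I fix this?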

I also tried pd.read_csv(path, on_bad_lines='skip') and error_bad_lines=False, but neither solved the problem.

Basically, the 1 or 0 value ends up on the review side and I'm left with a NaN value, which causes problems later in my code.

Here is the CSV:

Review,Liked
Wow... Loved this place.,1
Crust is not good.,0
Not tasty and the texture was just nasty.,0
Stopped by during the late May bank holiday off Rick Steve recommendation and loved it.,1
The selection on the menu was great and so were the prices.,1
Now I am getting angry and I want my damn pho.,0
Honeslty it didn't taste THAT fresh.),0
The potatoes were like rubber and you could tell they had been made up ahead of time being kept under a warmer.,0
The fries were great too.,1
A great touch.,1
Service was very prompt.,1
Would not go back.,0
The cashier had no care what so ever on what I had to say it still ended up being wayyy overpriced.,0
I tried the Cape Cod ravoli, chicken, with cranberry...mmmm!,1
I was disgusted because I was pretty sure that was human hair.,0
I was shocked because no signs indicate cash only.,0
Highly recommended.,1
Waitress was a little slow in service.,0
This place is not worth your time, let alone Vegas.,0
did not like at all.,0
The Burrittos Blah!,0
The food, amazing.,1
Service is also cute.,1
I could care less... The interior is just beautiful.,1
So they performed.,1
That's right....the red velvet cake.....ohhh this stuff is so good.,1
#NAME?,0
This hole in the wall has great Mexican street tacos, and friendly staff.,1
Took an hour to get our food only 4 tables in restaurant my food was Luke warm, Our sever was running around like he was totally overwhelmed.,0
The worst was the salmon sashimi.,0
Also there are combos like a burger, fries, and beer for 23 which is a decent deal.,1
This was like the final blow!,0
I found this place by accident and I could not be happier.,1
seems like a good quick place to grab a bite of some familiar pub food, but do yourself a favor and look elsewhere.,0
Overall, I like this place a lot.,1
The only redeeming quality of the restaurant was that it was very inexpensive.,1
Ample portions and good prices.,1
Poor service, the waiter made me feel like I was stupid every time he came to the table.,0
My first visit to Hiro was a delight!,1

The first line contains the column names.

One option is to use a regex separator:

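# Split on a comma between two word characters (the header) or on a comma right before the final 0/1.
df = pd.read_csv(path, sep=r"\b,\b|,(?=[01]$)", engine="python")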

Output:

print(df.sample(6))
                                                          Review  Liked
26                                                        #NAME?      0
6                          Honeslty it didn't taste THAT fresh.)      0
13  I tried the Cape Cod ravoli, chicken, with cranberry...mmmm!      1
0                                       Wow... Loved this place.      1
10                                      Service was very prompt.      1
15            I was shocked because no signs indicate cash only.      0
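
To check the separator without touching the real file, here is a minimal sketch that feeds a few rows from the question's CSV through io.StringIO. The variable name csv_text and the choice of rows are only for illustration, and the pattern assumes every data line ends in a single 0 or 1.

```python
import io
import pandas as pd

# A few rows copied from the question's CSV, including reviews that contain commas.
csv_text = """Review,Liked
Wow... Loved this place.,1
I tried the Cape Cod ravoli, chicken, with cranberry...mmmm!,1
This place is not worth your time, let alone Vegas.,0
The food, amazing.,1
"""

# \b,\b splits the header ("Review,Liked"); ,(?=[01]$) splits each data line
# only at the comma that sits directly before the trailing label.
df = pd.read_csv(io.StringIO(csv_text), sep=r"\b,\b|,(?=[01]$)", engine="python")

print(df.shape)
print(df["Liked"].tolist())
```

Commas followed by a space inside a review are left alone, because neither alternative of the pattern matches them. A review containing a comma squeezed between two word characters (for example "burger,fries") would still be split by the first alternative, so this is worth verifying against the full file.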

Demo: [Regex]
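
If a regex separator feels too fragile, another possibility is to split each line at its last comma yourself and build the DataFrame from the result. This is only a sketch: read_reviews is a hypothetical helper, not a pandas function, and it assumes the label is always the last field and is an integer.

```python
import io
import pandas as pd

def read_reviews(handle):
    """Hypothetical helper: parse 'Review,Liked' lines by splitting at the last comma."""
    header = handle.readline().strip().split(",")
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # rpartition splits at the LAST comma, so commas inside the review are kept.
        review, _, liked = line.rpartition(",")
        rows.append((review, int(liked)))
    return pd.DataFrame(rows, columns=header)

# In-memory sample using rows from the question; pass open(path) for a real file.
sample = io.StringIO(
    "Review,Liked\n"
    "The food, amazing.,1\n"
    "Also there are combos like a burger, fries, and beer for 23 which is a decent deal.,1\n"
    "Would not go back.,0\n"
)
df = read_reviews(sample)
print(df)
```

The longer-term fix is usually to have whatever produces the file wrap the review text in double quotes, since pd.read_csv then handles embedded commas with its default settings.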
