R - How to open a file with fread by skipping a problematic line



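Basically, I am trying to read a csv with library(data.table), but it gives me an error. I know it gets stuck at line 342637, but I can't figure out how to read the csv or skip this problematic line. I have tried every option I found online and I am still stuck at the same place. Because the data is so large, I can't check what is wrong around line 342637. Is there any other way to read this csv file?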

data.table version: 1.10.4.3

user <- fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8")
Read 13.1% of 1837283 rows
Error in fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8") : 
Expecting 77 cols, but line 342637 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.

user <- fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8", fill=TRUE)
Read 13.6% of 1837284 rows
Error in fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8",  : 
Expecting 77 cols, but line 342637 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.

user <- fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8", sep=",")
Read 13.6% of 1837283 rows
Error in fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8",  : 
Expecting 77 cols, but line 342637 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.

user <- fread( "user.csv", stringsAsFactors = FALSE, encoding = "UTF-8", sep=",", fill=TRUE, blank.lines.skip=TRUE)
Read 14.2% of 1837284 rows
Error in fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8",  : 
Expecting 77 cols, but line 342637 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.

One option is to make two fread() calls, one for the first 342636 rows and one for the remaining rows:

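# Read the first 342636 data rows
user_start <- fread('user.csv', nrows = 342636)
# Skip the first 342637 lines of the file and read the rest
user_end <- fread('user.csv', skip = 342637)
# Stack the two pieces back into one table
user <- rbindlist(list(user_start, user_end))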
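
Whether those two counts line up with the bad line depends on whether the line number in the error counts the header row, and the second call no longer sees the header at all. Below is a minimal sketch of the same split on a small made-up file. It assumes the reported number is a 1-based file line number with the header on line 1; the file contents, `path`, and `bad_line` are hypothetical and only there for illustration.

```r
library(data.table)

# Hypothetical demo file: a header plus five data rows.
# File line 4 carries an extra field, standing in for the problematic line.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,a,10",
             "2,b,20",
             "3,c,30,EXTRA",
             "4,d,40",
             "5,e,50"), path)

# Assumption: this is a file line number, with the header counted as line 1
bad_line <- 4L

# Look at the offending line without parsing it as csv
print(readLines(path, n = bad_line)[bad_line])

# Data rows before the bad line sit on file lines 2 .. bad_line - 1,
# which is bad_line - 2 rows
user_start <- fread(path, header = TRUE, nrows = bad_line - 2L)

# Skip everything up to and including the bad line. The header has been
# skipped too, so declare header = FALSE and reuse the names from the first part.
user_end <- fread(path, skip = bad_line, header = FALSE,
                  col.names = names(user_start))

# Stack the two pieces; the bad row is simply absent
user <- rbindlist(list(user_start, user_end))
print(user)
```

For the file in the question, under the same assumption, this would correspond to `nrows = 342635` and `skip = 342637`. If the number reported by fread does not count the header, both values shift by one, so it is worth printing the line with `readLines()` first to confirm which row is actually being dropped.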
