R loses factor levels when importing a CSV file



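I have a data frame with 3686 rows and 34 columns. When I save this data.frame with write.csv2(data, file = "folder/data.csv2") and then load it back into R with read.csv2("folder/data.csv2"), it has the same number of rows (3686). However, when I check the number of species (a factor) with unique(data$Species), the data frame in my environment has 708 levels, while the imported data frame has only 554 levels.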

str(imported_dataframe$Species)

Output: Factor w/ 554 levels

str(Data_in_Environment$Species)

Output: Factor w/ 708 levels

Can anyone help me?

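The levels attribute is lost when the data is written to CSV. You can export the levels separately and set them again on the imported data.frame.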

# Species is a factor with three levels
all_levels <- levels(iris$Species)
all_levels
# [1] "setosa"     "versicolor" "virginica" 
# export table where not all levels are present
write.csv2(head(iris), file = "iris_tmp.csv", row.names = FALSE)
# also export complete list of levels
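# (one level per line, so level names containing spaces survive)
writeLines(all_levels, "iris_levels_tmp.txt")
# import both levels and data
all_levs <- readLines("iris_levels_tmp.txt")
# stringsAsFactors = TRUE is needed in R >= 4.0.0 to read Species as a factor
iris6 <- read.csv2("iris_tmp.csv", stringsAsFactors = TRUE)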
# unrepresented levels have been lost
levels(iris6$Species)
# [1] "setosa"
# define Species as factor with all levels
iris6$Species <- factor(iris6$Species, levels = all_levs)
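
To see why the two counts differ, it may help to note that a factor can carry levels that no row uses (for example after subsetting), while a CSV file stores only the values that actually occur. The following is a minimal sketch using the built-in iris data; the temporary file and the names sub, tmp, and back are illustrative only and do not come from the question.

```r
# A factor keeps all of its levels after subsetting, even unused ones
sub <- iris[iris$Species == "setosa", ]
nlevels(sub$Species)          # 3 declared levels
length(unique(sub$Species))   # 1 value actually present

# Round trip through a CSV file (temporary file, for illustration only)
tmp <- tempfile(fileext = ".csv")
write.csv2(sub, file = tmp, row.names = FALSE)
back <- read.csv2(tmp, stringsAsFactors = TRUE)

# Only the values present in the file become levels again
nlevels(back$Species)         # 1

# droplevels() removes unused levels explicitly and yields the same set
identical(levels(back$Species), levels(droplevels(sub$Species)))
```

If the same thing happened in the question, it would suggest that 154 of the 708 levels are not used by any row of the data frame.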

Alternatively, you can export the R data object with save/load.

iris5 <- head(iris, n = 5)
save("iris5", file = "iris5.rda")
# load back iris5
load(file = "iris5.rda")
levels(iris5$Species)
# [1] "setosa"     "versicolor" "virginica"
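
A closely related option, not mentioned in the answer above, is saveRDS()/readRDS(), which store a single object and let you choose its name when reading it back. A minimal sketch, assuming a temporary file path; the names rds_path and iris5_restored are illustrative only:

```r
# Keep five rows; Species still declares all three levels
iris5 <- head(iris, n = 5)

# Serialize the single object to an .rds file (temporary path for illustration)
rds_path <- tempfile(fileext = ".rds")
saveRDS(iris5, file = rds_path)

# Read it back under any name; attributes such as factor levels are preserved
iris5_restored <- readRDS(rds_path)
levels(iris5_restored$Species)
```

Unlike load(), readRDS() returns the object instead of restoring it under its original name, so it cannot silently overwrite an existing variable.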

Alternatively, you can use the csvy package to export a CSV file with a YAML header that includes the factor levels:

# library load
library(csvy)
library(dplyr)
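# reorder factor levels
# (relevel() accepts only a single reference level, so factor() sets the full order)
iris_releveled <- iris %>% mutate(Species = factor(Species, levels = c("virginica", "setosa", "versicolor")))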
# write csv file
write.csv2(iris_releveled,"iris_releveled.csv")
# load exported dataset
iris_relevel_loaded <- read.csv2("iris_releveled.csv", stringsAsFactors = TRUE)
# the custom level order is lost: levels come back in alphabetical order
iris_relevel_loaded$Species %>% levels()
# write CSVy file from dataset with releveled factors
write_csvy(iris_releveled,  file = "iris_releveled.csvy")
# read CSVy file with original factor levels
iris_relevel_loaded <- read_csvy("iris_releveled.csvy", stringsAsFactors = TRUE)
# now factor levels are kept
iris_relevel_loaded$Species %>% levels()
