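I have a data frame with 253 columns and more than 10,000 rows.
Problem: I need to keep only the cells that contain a column name (not whole rows) and remove the "NO" values from each row, keyed on the SKU column (which serves as the ID).
Is there a simple way to remove all the "NO" values horizontally, using the SKU column as the ID?
My input: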
SKU Tv y Video Cómputo Tecnología Electrohogar Decohogar Deportes
2003091090002P Tv y Video NO Tecnología NO NO Deportes
2.00E+12 Tv y Video NO Tecnología NO NO Deportes
2003120060006P Tv y Video NO Tecnología NO NO Deportes
2003120060006P NO NO NO NO NO NO
2.00E+12 NO NO NO NO NO NO
2004121460000P NO Cómputo Tecnología NO Decohogar NO
2.00E+12 NO Cómputo Tecnología NO Decohogar NO
2004121440002P NO Cómputo Tecnología NO Decohogar NO
2.00E+12 NO Cómputo Tecnología NO Decohogar NO
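My desired output:
As you can see, the "Deportes" column ends up holding data from both "Deportes" and "Decohogar". I don't mind these being combined, because every row then contains only real data.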
SKU Tv y Video Cómputo Deportes
2003091090002P Tv y Video Tecnología Deportes
2.00E+12 Tv y Video Tecnología Deportes
2003120060006P Tv y Video Tecnología Deportes
2004121460000P Cómputo Tecnología Decohogar
2.00E+12 Cómputo Tecnología Decohogar
2004121440002P Cómputo Tecnología Decohogar
2.00E+12 Cómputo Tecnología Decohogar
Here is a sample of my data:
structure(list(SKU = structure(c(4L, 1L, 5L, 5L, 2L, 7L, 3L,
6L, 3L), .Label = c("2.00309E+12", "2.00312E+12", "2.00412E+12",
"2003091090002P", "2003120060006P", "2004121440002P", "2004121460000P"
), class = "factor"), Tv.y.Video = structure(c(2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("NO", "Tv y Video"), class = "factor"),
Cómputo = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Cómputo",
"NO"), class = "factor"), Tecnología = structure(c(2L, 2L,
2L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("NO", "Tecnología"
), class = "factor"), Electrohogar = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "NO", class = "factor"),
Decohogar = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L
), .Label = c("Decohogar", "NO"), class = "factor"), Deportes = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Deportes", "NO"
), class = "factor")), .Names = c("SKU", "Tv.y.Video", "Cómputo",
"Tecnología", "Electrohogar", "Decohogar", "Deportes"), class = "data.frame", row.names = c(NA,
-9L))
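It sounds like there are two problems here: a data munging problem (how to turn all the "NO" values into NA) and a subsetting problem (how to select only the rows that have no missing values).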
Here is a reproducible example that uses mutate_each, a column-wise function from dplyr, to clean every column:
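library(nycflights13)
library(dplyr)

# flights has NAs in several columns
summary(flights)

# replace every NA in a vector with -1
munge_me <- function(x) {
  ifelse(is.na(x), -1, x)
}

# apply munge_me to every column
# note: mutate_each() and funs() are deprecated in current dplyr;
# mutate(across(everything(), munge_me)) is the modern equivalent
flights_prepped <- flights %>%
  mutate_each(
    funs(munge_me)
  )

# the NAs are gone, and you have -1 instead
summary(flights_prepped)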
Once your values are properly marked as NA, you can use complete.cases() to subset the data frame to only the rows that contain no NAs.
nrow(flights)
nrow(flights[complete.cases(flights), ])
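One caveat for the question's data: complete.cases() drops any row that has at least one NA, and in the sample the Electrohogar column is "NO" in every row, so every row would be dropped. The desired output looks more like "drop rows where every value is missing, then shift the remaining values left". The sketch below is one possible way to do that in base R, assuming the "NO" values have already been converted to NA. The data frame `df` and the output column names `cat1`, `cat2`, ... are hypothetical.

```r
# Hypothetical data in which "NO" has already been turned into NA
df <- data.frame(
  SKU = c("2003091090002P", "2003120060006P", "2004121460000P"),
  c1  = c("Tv y Video", NA, NA),
  c2  = c(NA, NA, "Cómputo"),
  c3  = c("Deportes", NA, "Decohogar"),
  stringsAsFactors = FALSE
)
value_cols <- setdiff(names(df), "SKU")

# Keep only rows that have at least one non-missing value
keep <- rowSums(!is.na(df[value_cols])) > 0
df <- df[keep, ]

# Within each row, move the non-NA values to the left and pad with NA
shifted <- t(apply(df[value_cols], 1, function(r) {
  vals <- unname(r[!is.na(r)])
  c(vals, rep(NA, length(r) - length(vals)))
}))
colnames(shifted) <- paste0("cat", seq_len(ncol(shifted)))

out <- data.frame(SKU = df$SKU, shifted, stringsAsFactors = FALSE)

# Drop columns that ended up entirely NA after the shift
out <- out[, colSums(!is.na(out)) > 0, drop = FALSE]

out
```

After the shift, the original column headers no longer describe the cell contents, which matches the question's remark that combining categories under one header is acceptable.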