R: Removing "NO" horizontally from a data frame

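I have a data frame with 253 columns and more than 10,000 rows.

Problem: I need to keep only the cells that hold a column name (not whole rows) and remove the "NO" values from each row, using the SKU column as the ID.

Is there a simpler way to remove all the "NO" values horizontally, with the SKU column as the ID?

My input: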

SKU             Tv y Video    Cómputo    Tecnología Electrohogar     Decohogar  Deportes
2003091090002P  Tv y Video      NO       Tecnología      NO             NO      Deportes
2.00E+12        Tv y Video      NO       Tecnología      NO           NO        Deportes
2003120060006P  Tv y Video      NO       Tecnología      NO           NO        Deportes
2003120060006P  NO              NO            NO         NO           NO         NO
2.00E+12        NO              NO            NO         NO           NO         NO
2004121460000P  NO            Cómputo     Tecnología     NO          Decohogar          NO
2.00E+12        NO            Cómputo     Tecnología     NO          Decohogar          NO
2004121440002P  NO            Cómputo     Tecnología     NO          Decohogar          NO
2.00E+12        NO            Cómputo     Tecnología     NO          Decohogar          NO

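My desired output:

As you can see, the "Deportes" column ends up holding data from both "Deportes" and "Decohogar". I don't mind those two being combined, because every row still contains real data.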

SKU             Tv y Video  Cómputo     Deportes
2003091090002P  Tv y Video  Tecnología  Deportes
2.00E+12        Tv y Video  Tecnología  Deportes
2003120060006P  Tv y Video  Tecnología  Deportes
2004121460000P  Cómputo     Tecnología  Decohogar
2.00E+12        Cómputo     Tecnología  Decohogar
2004121440002P  Cómputo     Tecnología  Decohogar
2.00E+12        Cómputo     Tecnología  Decohogar

Here is a sample of my data:

structure(list(SKU = structure(c(4L, 1L, 5L, 5L, 2L, 7L, 3L, 
6L, 3L), .Label = c("2.00309E+12", "2.00312E+12", "2.00412E+12", 
"2003091090002P", "2003120060006P", "2004121440002P", "2004121460000P"
), class = "factor"), Tv.y.Video = structure(c(2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("NO", "Tv y Video"), class = "factor"), 
    Cómputo = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Cómputo", 
    "NO"), class = "factor"), Tecnología = structure(c(2L, 2L, 
    2L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("NO", "Tecnología"
    ), class = "factor"), Electrohogar = structure(c(1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "NO", class = "factor"), 
    Decohogar = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L
    ), .Label = c("Decohogar", "NO"), class = "factor"), Deportes = structure(c(1L, 
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Deportes", "NO"
    ), class = "factor")), .Names = c("SKU", "Tv.y.Video", "Cómputo", 
"Tecnología", "Electrohogar", "Decohogar", "Deportes"), class = "data.frame", row.names = c(NA, 
-9L))

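It sounds like there are two problems here: a data-munging problem (how to recode every "NO" value as NA) and a subsetting problem (how to select only the rows that have no missing values).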

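Here is a reproducible example that applies a function to every column with dplyr (originally written with mutate_each, which later dplyr versions deprecate in favor of across()):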

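library(nycflights13)
library(dplyr)

# flights contains NAs
summary(flights)

# Replace every NA in a vector with -1
munge_me <- function(x) {
  ifelse(is.na(x), -1, x)
}

# Apply munge_me to every column; across() is the current replacement
# for the deprecated mutate_each()/funs() pair
flights_prepped <- flights %>%
  mutate(across(everything(), munge_me))

# The NAs are gone and -1 appears in their place
summary(flights_prepped)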
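
For the data in the question the recoding runs in the opposite direction: "NO" becomes NA. The following is a minimal base-R sketch rather than the answerer's own code. The three-row data frame `df` is a hypothetical, cut-down stand-in for the sample data, and `value_cols` and `df_na` are names chosen only for illustration.

```r
# Hypothetical, cut-down stand-in for the question's data (assumption)
df <- data.frame(
  SKU          = c("2003091090002P", "2003120060006P", "2004121460000P"),
  Tv.y.Video   = c("Tv y Video", "NO", "NO"),
  Electrohogar = c("NO", "NO", "NO"),
  Decohogar    = c("NO", "NO", "Decohogar"),
  Deportes     = c("Deportes", "NO", "NO"),
  stringsAsFactors = FALSE
)

# Every column except the ID column gets recoded
value_cols <- setdiff(names(df), "SKU")

df_na <- df
df_na[value_cols] <- lapply(df_na[value_cols], function(x) {
  x <- as.character(x)   # also covers factor columns, as in the dput() output
  x[x %in% "NO"] <- NA   # %in% never yields NA, so existing NAs are safe
  x
})

df_na
```

Converting to character first avoids leaving an unused "NO" level behind when the columns are factors.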

Once your values are correctly marked as NA, you can use complete.cases() to subset the data frame to the rows that contain no NAs.

nrow(flights)
nrow(flights[complete.cases(flights), ])
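
One caveat for the question's sample: the Electrohogar column is "NO" in every row, so after recoding, every row would contain at least one NA and complete.cases() would keep nothing. If the goal is the left-shifted layout shown in the desired output, one possible approach is to collect the non-"NO" values of each row and pad them to a common width. This is a sketch under assumptions, not the answerer's method: `df` is a hypothetical, cut-down version of the sample data, and the output column names `value_1`, `value_2` are invented placeholders.

```r
# Hypothetical, cut-down stand-in for the question's data (assumption)
df <- data.frame(
  SKU          = c("2003091090002P", "2003120060006P", "2004121460000P"),
  Tv.y.Video   = c("Tv y Video", "NO", "NO"),
  Electrohogar = c("NO", "NO", "NO"),
  Decohogar    = c("NO", "NO", "Decohogar"),
  Deportes     = c("Deportes", "NO", "NO"),
  stringsAsFactors = FALSE
)

value_cols <- setdiff(names(df), "SKU")

# Character matrix of the value columns (factors are converted to text)
m <- as.matrix(df[value_cols])

# For each row, keep the cells that are not "NO", in their original order
kept <- lapply(seq_len(nrow(m)), function(i) {
  cells <- m[i, ]
  unname(cells[cells != "NO"])
})

n_kept <- lengths(kept)
width  <- max(n_kept)

# Pad shorter rows with NA so that all rows have the same number of cells
padded <- do.call(rbind, lapply(kept, function(v) {
  c(v, rep(NA_character_, width - length(v)))
}))
colnames(padded) <- paste0("value_", seq_len(width))  # placeholder names

# Re-attach the ID and drop rows that had nothing but "NO"
result <- data.frame(SKU = df$SKU, padded, stringsAsFactors = FALSE)
result <- result[n_kept > 0, , drop = FALSE]

result
```

Generic column names are used because, once values are shifted left, a column no longer corresponds to a single original category.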
