How to remove empty columns in R

I have a CSV file like this:

Identity,AdvertiserName,CampaignName,AdGroupName,Keyword,DestURL,KeystoneKW,,CampaignDuplicate,AdGroupDuplicate,CampaignLocation,,,,,,,,,
666,Bro Pest Control,cat|home & garden|pest control,kw|entry,Bro Pest Control,http://www.ci.com/profile/66/ab/brrd_pest_control.html,Pest Control,,NO,NO,"Ablle,Louna,United States",,,,,,,,,
447,Dist Tire Ctr Inc,cat|automotive sales & services|automotive repair,kw|entry,DisTire Ctr Inc,http://www.cit.com/profile/44/abbeville_la/discoutire_ctr_inc.html,Autepair,,NO,NO,"Abblle,Louana,United States",,,,,,,,,
6665,Best Control,geo|la|abbe la area,home & garden|pest control,Br Pest Control,http://www.cit.com/profile/66/abbee_la/broud_pest_control.html,Pest Control,,NO,NO,"A,Louisiana,United States",,,,,,,,,

The output I want is:

Identity,AdvertiserName,CampaignName,AdGroupName,Keyword,DestURL,KeystoneKW,CampaignDuplicate,AdGroupDuplicate,CampaignLocation
666,Broud Pest Control,cat|home & garden|pest control,kw|entry,Bssad Pest Control,http://www.cit.com/profile/666/abbeville_la/brrd_pest_control.html,Pest Control,NO,NO,"Abbe,Louiana,United States"
44,DiscTire Ctr Inc,cat|automotive sales & services|automotive repair,kw|entry,Discount Tire Ctr Inc,http://www.cit.com/profile/44/ab/discouctr_inc.html,Automotive Repair,NO,NO,"Abbe,Loua,United States"

The code snippet I am using is:

mydf <- read.csv("C:/Users/Administrator/Downloads/FinalLocationList1.csv", header=FALSE, skip=1)
d <- setNames(mydf[,sapply(mydf, function(x) all(!is.na(x)))],names(n))
z <- mydf <- Filter(function(x)!all(is.na(x)), mydf)

Credit: Thomas

However, the above does not solve the header problem. How can I fix it? I am new to R, and any help is appreciated.

Edit: output of dput(mydf)

structure(list(V1 = c(666L, 447L, 6665L), V2 = structure(c(2L, 
3L, 1L), .Label = c("Best Control", "Bro Pest Control", "Dist Tire Ctr Inc"
), class = "factor"), V3 = structure(c(2L, 1L, 3L), .Label = c("cat|automotive sales &   services|automotive repair", 
"cat|home & garden|pest control", "geo|la|abbe la area"), class = "factor"), 
V4 = structure(c(2L, 2L, 1L), .Label = c("home & garden|pest control", 
"kw|entry"), class = "factor"), V5 = structure(c(2L, 3L, 
1L), .Label = c("Br Pest Control", "Bro Pest Control", "DisTire Ctr Inc"
), class = "factor"), V6 = structure(1:3, .Label = c("http://www.ci.com/profile/66/ab /brrd_pest_control.html", 
"http://www.cit.com/profile/44/abbeville_la/discoutire_ctr_inc.html", 
"http://www.cit.com/profile/66/abbee_la/broud_pest_control.html"
), class = "factor"), V7 = structure(c(2L, 1L, 2L), .Label = c("Autepair", 
"Pest Control"), class = "factor"), V8 = c(NA, NA, NA), V9 = structure(c(1L, 
1L, 1L), .Label = "NO", class = "factor"), V10 = structure(c(1L, 
1L, 1L), .Label = "NO", class = "factor"), V11 = structure(c(3L, 
2L, 1L), .Label = c("A,Louisiana,United States", "Abblle,Louana,United States", 
"Ablle,Louna,United States"), class = "factor"), V12 = c(NA, 
NA, NA), V13 = c(NA, NA, NA), V14 = c(NA, NA, NA), V15 = c(NA, 
NA, NA), V16 = c(NA, NA, NA), V17 = c(NA, NA, NA), V18 = c(NA, 
NA, NA), V19 = c(NA, NA, NA), V20 = c(NA, NA, NA)), .Names = c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", 
"V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20"
), class = "data.frame", row.names = c(NA, -3L))

Error:

Error in setNames(mydf[, sapply(mydf, function(x) all(!is.na(x)))], names(n)) : 
'names' attribute [20] must be the same length as the vector [10]

Use colClasses="NULL" for the columns you want to drop. In this case those are the last nine, so rep("NULL", 9):

tx <- 'Identity,AdvertiserName,CampaignName,AdGroupName,Keyword,DestURL,KeystoneKW,,CampaignDuplicate,AdGroupDuplicate,CampaignLocation,,,,,,,,,
666,Broud Pest Control,cat|home & garden|pest control,kw|entry,Bssad Pest Control,http://www.cit.com/profile/666/abbeville_la/brrd_pest_control.html,Pest Control,,NO,NO,"Abbe,Louiana,United States",,,,,,,,,
44,DiscTire Ctr Inc,cat|automotive sales & services|automotive repair,kw|entry,Discount Tire Ctr Inc,http://www.cit.com/profile/44/ab/discouctr_inc.html,Automotive Repair,,NO,NO,"Abbe,Loua,United States",,,,,,,,,'
df <- read.table(text=tx, sep=",", 
                 colClasses=c("numeric", rep("character",10), rep("NULL",9)), 
                 header=TRUE)
> str(df)
'data.frame':   2 obs. of  11 variables:
 $ Identity         : num  666 44
 $ AdvertiserName   : chr  "Broud Pest Control" "DiscTire Ctr Inc"
 $ CampaignName     : chr  "cat|home & garden|pest control" "cat|automotive sales & services|automotive repair"
 $ AdGroupName      : chr  "kw|entry" "kw|entry"
 $ Keyword          : chr  "Bssad Pest Control" "Discount Tire Ctr Inc"
 $ DestURL          : chr  "http://www.cit.com/profile/666/abbeville_la/brrd_pest_control.html" "http://www.cit.com/profile/44/ab/discouctr_inc.html"
 $ KeystoneKW       : chr  "Pest Control" "Automotive Repair"
 $ X                : chr  "" ""
 $ CampaignDuplicate: chr  "NO" "NO"
 $ AdGroupDuplicate : chr  "NO" "NO"
 $ CampaignLocation : chr  "Abbe,Louiana,United States" "Abbe,Loua,United States"
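
In the str() output above, the blank eighth column is still present as X because it was read as "character". A minimal sketch of the same colClasses idea that also skips a blank column in the middle is shown here; the shortened CSV text and the names csv_text and df2 are made up for illustration, and the positions of the "NULL" entries are assumed to be known in advance.

```r
# Hypothetical, shortened CSV: one blank column in the middle, three at the end
csv_text <- 'Identity,Name,Keyword,,Duplicate,Location,,,
666,Broud Pest Control,Pest Control,,NO,"Abbe,Louiana,United States",,,
44,DiscTire Ctr Inc,Automotive Repair,,NO,"Abbe,Loua,United States",,,'

# One colClasses entry per column in the file; "NULL" means "do not read it"
classes <- c("numeric", "character", "character",
             "NULL",                      # blank 4th column
             "character", "character",
             rep("NULL", 3))              # trailing blank columns

df2 <- read.csv(text = csv_text, colClasses = classes)

str(df2)
```

Because the skipped columns are never read, the remaining columns keep their header names and no renaming step is needed. The drawback is that the column positions are hard-coded, so this only fits files whose layout is fixed.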

You can try:

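    # Keep the columns that are not entirely NA, then reapply the original
    # names minus the placeholders generated for blank header fields
    setNames(
      Filter(function(x) !all(is.na(x)), mydf),
      names(mydf)[-grep("^X(\\.[0-9]+)?$", names(mydf))]
    )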

This produces:

  Identity Number Data Result Add
1        1      4   55     92  62
2        3      7   43     12  74
3        7      3   58     52  64
4        0      6   10     22  96
5        3      8   13     92  22

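Filter keeps every column that is not entirely NA. The grep part then relies on the names that read.csv generates for blank header fields (X, X.1, etc.) to filter out the bad names. This assumes mydf was read with header=TRUE, so that the real column names are present. It should work in general.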
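
A self-contained sketch of this filtering approach follows, for cases where the positions of the blank columns are not known in advance. The sample CSV text and the names sample_csv, raw, is_blank and clean are hypothetical. It also treats columns of empty strings as blank, which is an assumption beyond the all-NA check described above, and it subsets the data frame directly so the surviving columns keep their names without a separate renaming step.

```r
# Hypothetical sample with blank header fields and entirely empty columns
sample_csv <- 'Identity,Name,Keyword,,Duplicate,Location,,,
666,Broud Pest Control,Pest Control,,NO,"Abbe,Louiana,United States",,,
44,DiscTire Ctr Inc,Automotive Repair,,NO,"Abbe,Loua,United States",,,'

# header = TRUE is the read.csv default; blank header fields become X, X.1, ...
raw <- read.csv(text = sample_csv, stringsAsFactors = FALSE)

# A column counts as blank if every value is NA or an empty/whitespace string
is_blank <- function(x) all(is.na(x) | trimws(as.character(x)) == "")

# Subsetting keeps each remaining column paired with its own name
clean <- raw[, !vapply(raw, is_blank, logical(1)), drop = FALSE]

str(clean)
```

One caveat with the name-based variant: if no column name matches the pattern, grep returns an empty vector and negative indexing with it selects no names at all, so a logical index built with grepl is the safer choice there.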


Edit: running this on the updated CSV produces:

> str(setNames(Filter(function(x) !all(is.na(x)), mydf), names(mydf)[-grep("^X(\\.[0-9]+)?", names(mydf))]))
'data.frame': 2 obs. of  10 variables:
 $ Identity         : int  666 44
 $ AdvertiserName   : Factor w/ 2 levels "Broud Pest Control",..: 1 2
 $ CampaignName     : Factor w/ 2 levels "cat|automotive sales & services|automotive repair",..: 2 1
 $ AdGroupName      : Factor w/ 1 level "kw|entry": 1 1
 $ Keyword          : Factor w/ 2 levels "Bssad Pest Control",..: 1 2
 $ DestURL          : Factor w/ 2 levels "http://www.cit.com/profile/44/ab/discouctr_inc.html",..: 2 1
 $ KeystoneKW       : Factor w/ 2 levels "Automotive Repair",..: 2 1
 $ CampaignDuplicate: Factor w/ 1 level "NO": 1 1
 $ AdGroupDuplicate : Factor w/ 1 level "NO": 1 1
 $ CampaignLocation : Factor w/ 2 levels "Abbe,Loua,United States",..: 2 1
