我有一个像这样的CSV文件
Identity,AdvertiserName,CampaignName,AdGroupName,Keyword,DestURL,KeystoneKW,,CampaignDuplicate,AdGroupDuplicate,CampaignLocation,,,,,,,,,
666,Bro Pest Control,cat|home & garden|pest control,kw|entry,Bro Pest Control,http://www.ci.com/profile/66/ab/brrd_pest_control.html,Pest Control,,NO,NO,"Ablle,Louna,United States",,,,,,,,,
447,Dist Tire Ctr Inc,cat|automotive sales & services|automotive repair,kw|entry,DisTire Ctr Inc,http://www.cit.com/profile/44/abbeville_la/discoutire_ctr_inc.html,Autepair,,NO,NO,"Abblle,Louana,United States",,,,,,,,,
6665,Best Control,geo|la|abbe la area,home & garden|pest control,Br Pest Control,http://www.cit.com/profile/66/abbee_la/broud_pest_control.html,Pest Control,,NO,NO,"A,Louisiana,United States",,,,,,,,,
我想要的输出是
Identity,AdvertiserName,CampaignName,AdGroupName,Keyword,DestURL,KeystoneKW,,CampaignDuplicate,AdGroupDuplicate,CampaignLocation
666,Broud Pest Control,cat|home & garden|pest control,kw|entry,Bssad Pest Control,http://www.cit.com/profile/666/abbeville_la/brrd_pest_control.html,Pest Control,NO,NO,"Abbe,Louiana,United States"
44,DiscTire Ctr Inc,cat|automotive sales & services|automotive repair,kw|entry,Discount Tire Ctr Inc,http://www.cit.com/profile/44/ab/discouctr_inc.html,Automotive Repair,NO,NO,"Abbe,Loua,United States"
我正在使用的代码段是
mydf <- read.csv("C:/Users/Administrator/Downloads/FinalLocationList1.csv", header=FALSE, skip=1)
d <- setNames(mydf[,sapply(mydf, function(x) all(!is.na(x)))],names(n))
z <- mydf <- Filter(function(x)!all(is.na(x)), mydf)
信用 - 托马斯
但是以上不是在解决标题问题吗?如何解决?R的新手。任何帮助,不胜感激。
编辑 : 输出 dput(mydf)
structure(list(V1 = c(666L, 447L, 6665L), V2 = structure(c(2L,
3L, 1L), .Label = c("Best Control", "Bro Pest Control", "Dist Tire Ctr Inc"
), class = "factor"), V3 = structure(c(2L, 1L, 3L), .Label = c("cat|automotive sales & services|automotive repair",
"cat|home & garden|pest control", "geo|la|abbe la area"), class = "factor"),
V4 = structure(c(2L, 2L, 1L), .Label = c("home & garden|pest control",
"kw|entry"), class = "factor"), V5 = structure(c(2L, 3L,
1L), .Label = c("Br Pest Control", "Bro Pest Control", "DisTire Ctr Inc"
), class = "factor"), V6 = structure(1:3, .Label = c("http://www.ci.com/profile/66/ab /brrd_pest_control.html",
"http://www.cit.com/profile/44/abbeville_la/discoutire_ctr_inc.html",
"http://www.cit.com/profile/66/abbee_la/broud_pest_control.html"
), class = "factor"), V7 = structure(c(2L, 1L, 2L), .Label = c("Autepair",
"Pest Control"), class = "factor"), V8 = c(NA, NA, NA), V9 = structure(c(1L,
1L, 1L), .Label = "NO", class = "factor"), V10 = structure(c(1L,
1L, 1L), .Label = "NO", class = "factor"), V11 = structure(c(3L,
2L, 1L), .Label = c("A,Louisiana,United States", "Abblle,Louana,United States",
"Ablle,Louna,United States"), class = "factor"), V12 = c(NA,
NA, NA), V13 = c(NA, NA, NA), V14 = c(NA, NA, NA), V15 = c(NA,
NA, NA), V16 = c(NA, NA, NA), V17 = c(NA, NA, NA), V18 = c(NA,
NA, NA), V19 = c(NA, NA, NA), V20 = c(NA, NA, NA)), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11",
"V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20"
), class = "data.frame", row.names = c(NA, -3L))
错误
Error in setNames(mydf[, sapply(mydf, function(x) all(!is.na(x)))], names(n)) :
'names' attribute [20] must be the same length as the vector [10]
对要
删除的列使用 colClasses="NULL"
...在这种情况下的最后九个,所以rep("NULL",9)
tx <- 'Identity,AdvertiserName,CampaignName,AdGroupName,Keyword,DestURL,KeystoneKW,,CampaignDuplicate,AdGroupDuplicate,CampaignLocation,,,,,,,,,
666,Broud Pest Control,cat|home & garden|pest control,kw|entry,Bssad Pest Control,http://www.cit.com/profile/666/abbeville_la/brrd_pest_control.html,Pest Control,,NO,NO,"Abbe,Louiana,United States",,,,,,,,,
44,DiscTire Ctr Inc,cat|automotive sales & services|automotive repair,kw|entry,Discount Tire Ctr Inc,http://www.cit.com/profile/44/ab/discouctr_inc.html,Automotive Repair,,NO,NO,"Abbe,Loua,United States",,,,,,,,,'
df <- read.table(text=tx, sep=",",
colClasses=c("numeric", rep("character",10), rep("NULL",9)),
header=TRUE)
> str(df)
'data.frame': 2 obs. of 11 variables:
$ Identity : num 666 44
$ AdvertiserName : chr "Broud Pest Control" "DiscTire Ctr Inc"
$ CampaignName : chr "cat|home & garden|pest control" "cat|automotive sales & services|automotive repair"
$ AdGroupName : chr "kw|entry" "kw|entry"
$ Keyword : chr "Bssad Pest Control" "Discount Tire Ctr Inc"
$ DestURL : chr "http://www.cit.com/profile/666/abbeville_la/brrd_pest_control.html" "http://www.cit.com/profile/44/ab/discouctr_inc.html"
$ KeystoneKW : chr "Pest Control" "Automotive Repair"
$ X : chr "" ""
$ CampaignDuplicate: chr "NO" "NO"
$ AdGroupDuplicate : chr "NO" "NO"
$ CampaignLocation : chr "Abbe,Louiana,United States" "Abbe,Loua,United States"
你可以试试:
setNames(
Filter(function(x) !all(is.na(x)), mydf),
names(mydf)[-grep("^X(\.[0-9]+)?$", names(mydf))]
)
生产:
Identity Number Data Result Add
1 1 4 55 92 62
2 3 7 43 12 74
3 7 3 58 52 64
4 0 6 10 22 96
5 3 8 13 92 22
Filter
将保留所有不全NA
列。 然后 grep 部分依赖于 read 产生的名称。CSV 用于空白列(X、X.1 等)以过滤掉错误的名称。 这应该一般有效。
编辑:使用更新的CSV运行会产生:
> str(setNames(Filter(function(x) !all(is.na(x)), mydf), names(mydf)[-grep("^X(\.[0-9]+)?", names(mydf))]))
'data.frame': 2 obs. of 10 variables:
$ Identity : int 666 44
$ AdvertiserName : Factor w/ 2 levels "Broud Pest Control",..: 1 2
$ CampaignName : Factor w/ 2 levels "cat|automotive sales & services|automotive repair",..: 2 1
$ AdGroupName : Factor w/ 1 level "kw|entry": 1 1
$ Keyword : Factor w/ 2 levels "Bssad Pest Control",..: 1 2
$ DestURL : Factor w/ 2 levels "http://www.cit.com/profile/44/ab/discouctr_inc.html",..: 2 1
$ KeystoneKW : Factor w/ 2 levels "Automotive Repair",..: 2 1
$ CampaignDuplicate: Factor w/ 1 level "NO": 1 1
$ AdGroupDuplicate : Factor w/ 1 level "NO": 1 1
$ CampaignLocation : Factor w/ 2 levels "Abbe,Loua,United States",..: 2 1