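I have a large dataset, 3000x400. I need to create new columns that are the means of existing columns, computed within subsets defined by the variable Constituency.
I have a list of new column names that I want to use for the new columns, called newNames below.
However, I can only figure out how to name the columns when I type the new name directly into the call.
What I currently do: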
library(data.table)
set.seed(1)
dataTest = data.table(turnout_avg = rnorm(20), urban_avg = rnorm(20,5,2), Constituency = c("A","B","C","D"), key = "Constituency")
oldColumnNames = c( "turnout_avg" , "urban_avg")
newNames = c( "turnout" , "urban")
# Here's my problem, naming these new columns
comm_means_by_district = cbind(
dataTest[,list(Const_turnout = mean(na.omit(get(oldColumnNames[[1]])))), by= Constituency],
dataTest[,list(Const_urban = mean(na.omit(get(oldColumnNames[[2]])))),by= Constituency])
In practice, I want to create more than two new columns, so I cannot feasibly type Const_turnout, Const_urban, etc. for every new column.
I have tried two ideas, but neither works. 1.
dataTest[,list(paste("district", newNames[1], sep="_") = mean(na.omit(get(oldColumnNames[[1]])))), by= Constituency]
or 2.
dataTest[,list(paste(oldColumnNames[1], "constMean", sep="_") = mean(na.omit(get(oldColumnNames[[1]])))), by= Constituency]
First get the means of all the columns:
DT <- dataTest[,lapply(.SD,function(x) mean(na.omit(x))), by= Constituency]
Then change the column names afterwards:
setnames(DT,colnames(DT),vector_of_newnames)
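To make the two steps concrete, here is a minimal sketch on the question's sample data. It assumes that only the columns listed in oldColumnNames should be averaged, so it restricts .SD with .SDcols, and it builds the new names with the Const_ prefix from the question. Both choices are assumptions for illustration, not part of the approach as stated above.

```r
library(data.table)

# Sample data from the question
set.seed(1)
dataTest <- data.table(
  turnout_avg  = rnorm(20),
  urban_avg    = rnorm(20, 5, 2),
  Constituency = c("A", "B", "C", "D"),
  key = "Constituency"
)
oldColumnNames <- c("turnout_avg", "urban_avg")
newNames <- c("turnout", "urban")

# Step 1: average only the selected columns within each constituency
DT <- dataTest[, lapply(.SD, function(x) mean(x, na.rm = TRUE)),
               by = Constituency, .SDcols = oldColumnNames]

# Step 2: rename the aggregated columns by reference;
# the grouping column Constituency keeps its name
setnames(DT, oldColumnNames, paste("Const", newNames, sep = "_"))
DT
```

Using the old/new form of setnames() means the grouping column does not have to be listed in the vector of new names.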
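Why is it important to change the names on the same line where you apply the function? I would just compute the means by constituency first and then set the column names afterwards. Here is what that looks like: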
dt <- dataTest[, lapply(oldColumnNames, function(x) mean(na.omit(get(x)))),
by=Constituency]
setnames(dt, c("Constituency", paste("Const", newNames, sep="_")))
dt
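If the names really do have to be set in the same call, one possible sketch is to build the named list inside j with setNames(). The attempts in the question fail because the left-hand side of `=` inside list() must be a literal name, so an expression such as paste(...) is a syntax error there; setNames() attaches computed names to the list instead. This is an alternative I am suggesting, and the use of .SDcols here is my own assumption about which columns to include.

```r
library(data.table)

# Sample data from the question
set.seed(1)
dataTest <- data.table(
  turnout_avg  = rnorm(20),
  urban_avg    = rnorm(20, 5, 2),
  Constituency = c("A", "B", "C", "D"),
  key = "Constituency"
)
oldColumnNames <- c("turnout_avg", "urban_avg")
newNames <- c("turnout", "urban")

# Compute the group means and attach the computed names in one expression:
# lapply() returns a list of means, setNames() gives that list its names,
# and data.table uses those names for the result columns
dt <- dataTest[, setNames(lapply(.SD, mean, na.rm = TRUE),
                          paste("Const", newNames, sep = "_")),
               by = Constituency, .SDcols = oldColumnNames]
dt
```

The result has the same columns as the two-step version; whether the single call is easier to read is a matter of taste.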