R: Combine attributes in two columns and sum the values in duplicated rows


This question is a slightly modified version of an earlier question.

I have a data frame in long format, as shown below:

df1 <- data.frame(ID=c(1,1,1,1,1,1,2,2),
name=c("a","c","a","c","a","c","a","c"),
value=c("broad",50,"mangrove",50,"mangrove",50,"coniferous",50))
ID name       value
1    a        broad
1    c           50
1    a     mangrove
1    c           50
1    a     mangrove
1    c           50
2    a   coniferous
2    c           50

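About the data: the value 50 in the second row corresponds to the value broad in the first row. Similarly, the value 50 in the fourth row corresponds to mangrove in the third row, and so on. In short, each row with name c belongs to the row with name a directly above it.

I want to combine the values so that each name gets its corresponding value, while also aggregating the values that share the same name: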

df2 <- data.frame(ID=c(1,1,2),
name=c("c_broad","c_mangrove","c_coniferous"),
value=c(50,100,50))

The result should look like this:

ID         name    value
1       c_broad       50
1    c_mangrove      100
2  c_coniferous       50

Using reshape2:

library(reshape2)
df1$grp = cumsum(df1$name == "a")
df2 = dcast(df1, ID + grp ~ name)
df2$c = as.numeric(df2$c)
aggregate(c ~ ID + a, df2, sum)
ID          a   c
1  1      broad  50
2  2 coniferous  50
3  1   mangrove 100

The column names can be changed as needed, and "c" can be added to the names with paste.

Using tidyverse:
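As a minimal sketch of that renaming step, the snippet below re-enters the aggregated result printed above by hand and builds the `name` column with `paste0`. The `"c_"` prefix and the final column names are taken from the desired output in the question; the object names `agg` and `out` are only illustrative.

```r
# Aggregated result as printed above (columns ID, a, c), re-entered by hand
agg <- data.frame(ID = c(1, 2, 1),
                  a  = c("broad", "coniferous", "mangrove"),
                  c  = c(50, 50, 100))

# Prefix each category with "c_" and rename the summed column to `value`
out <- data.frame(ID    = agg$ID,
                  name  = paste0("c_", agg$a),
                  value = agg$c)

# Sort by ID so the rows follow the layout asked for in the question
out <- out[order(out$ID), ]
rownames(out) <- NULL
out
```

Only base R is used here, so the step does not depend on reshape2 once the aggregation is done.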

library(dplyr)  # provides %>% and the verbs used below
value_a <- df1 %>% dplyr::filter(name=="a") %>% dplyr::pull(value)
df1 %>%
dplyr::filter(name=="c") %>% #Modify into a sensible data frame from here
dplyr::mutate(a = value_a,
name = stringr::str_c(name, "_" ,a)) %>%
dplyr::select(-a) %>% # to here
dplyr::group_by(ID, name) %>%
dplyr::summarise(value=sum(as.numeric(value)))
# A tibble: 3 x 3
# Groups:   ID [2]
ID name         value
<dbl> <chr>        <dbl>
1     1 c_broad         50
2     1 c_mangrove     100
3     2 c_coniferous    50

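The main problem with your data frame is that a single column holds both names and values, and that is the first thing you should fix. My advice is to always reshape the raw data frame into a tidy format (https://tidyr.tidyverse.org/articles/tidy-data.html) and from there make use of the full power of the tidyverse, data.table, or whichever framework you choose.

Note that the temporary variable value_a could be included directly in the pipe; I kept it separate for clarity. The main idea is to separate values and kinds into different columns, which is what the first three calls in the pipe do, and then to apply the usual tidyverse operations.

Probably not the most elegant, but it works: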
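For illustration, here is one way the temporary variable could be folded into the pipe. This is only a sketch: it assumes, as in the example data, that rows strictly alternate between name a and name c, so the a values line up one-to-one with the c rows. The object name `res` and the `.groups = "drop"` argument are additions that do not appear in the code above.

```r
library(dplyr)

df1 <- data.frame(ID = c(1, 1, 1, 1, 1, 1, 2, 2),
                  name = c("a", "c", "a", "c", "a", "c", "a", "c"),
                  value = c("broad", 50, "mangrove", 50,
                            "mangrove", 50, "coniferous", 50))

res <- df1 %>%
  # keep only the numeric rows
  filter(name == "c") %>%
  # attach the matching category; assumes strictly alternating a/c rows
  mutate(name = paste0(name, "_", df1$value[df1$name == "a"])) %>%
  group_by(ID, name) %>%
  summarise(value = sum(as.numeric(value)), .groups = "drop")

res
```

Inlining saves one assignment, at the cost of referring back to `df1` inside the pipe, which is why keeping `value_a` separate can be easier to read.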

df1 <- data.frame(ID=c(1,1,1,1,1,1,2,2),
name=c("a","c","a","c","a","c","a","c"),
value=c("broad",50,"mangrove",50,"mangrove",50,"coniferous",50)
)
library(dplyr)
df1 %>% group_by( 1+floor((1:n()-1)/2) ) %>%
summarize(
ID = ID[1],
name = paste0( name[2], "_", value[1] ),
value = as.numeric(value[2])
) %>% ungroup %>% select( -1 ) %>% group_by(name) %>%
mutate( value = sum(value) ) %>%
unique

Here is an improved version that is actually human-readable:


i <- seq( 1, nrow(df1), 2 )
df1 %>% summarise(
ID = ID[i],
name = paste0( name[i+1], "_", value[i] ),
value = as.numeric(value[i+1])
) %>% group_by(name) %>%
summarize(
ID=ID[1], value = sum( value )
) %>% arrange(ID)

Base R solution:

# Nullify numeric values belonging to a grouping category: grps => character vector
grps <- gsub("\\d+", NA, df1$value)
# Interpolate NA values using prior string value: a => character vector
df1$a <- na.omit(grps)[cumsum(!(is.na(grps)))]
# Split-Apply-Combine aggregation: data.frame => stdout(console)
data.frame(do.call(rbind, lapply(with(df1, split(df1, a)), function(x){
y <- transform(subset(x, !grepl("\\D+", value)), value = as.numeric(value))
setNames(
aggregate(value ~ ID + a, y, FUN = function(z){sum(z, na.rm = TRUE)}),
c("ID", "a", "c")
)
}
)
), 
row.names = NULL
)

Another option:

df1 <- data.frame(ID=c(1,1,1,1,1,1,2,2),
name=c("a","c","a","c","a","c","a","c"),
value=c("broad",50,"mangrove",50,"mangrove",50,"coniferous",50))
library(tidyverse)
df1 %>%
pivot_wider(id_cols = ID, names_from = name, values_from = value) %>%
unnest(c("a", "c")) %>%
group_by(ID, name = a) %>%
summarise(value = sum(as.numeric(c), na.rm = TRUE), .groups = "drop")
#> # A tibble: 3 x 3
#>      ID name       value
#>   <dbl> <chr>      <dbl>
#> 1     1 broad         50
#> 2     1 mangrove     100
#> 3     2 coniferous    50

Created on 2021-04-12 by the reprex package (v2.0.0)