R - Set column names based on other/previous column names, possibly using apply() or a for loop



I have a large dataset in which every 5 columns hold the measurements for 1 artery, but only the first of each 5 columns is named. An example:


df <- structure(list("agatston", "area", "volume", "density", "mass", 
"agatston", "area", "volume", "density", "mass", "agatston", 
"area", "volume", "density", "mass"), 
.Names = c("Artery_1", NA, NA, NA, NA, "Artery_2", NA, NA, NA, NA, "Artery_3", NA, NA, NA, NA), 
row.names = c(NA, -1L), 
class = c("tbl_df", "tbl", "data.frame"))

It looks like this:

df
# A tibble: 1 x 15
Artery_1 ``    ``     ``      ``    Artery_2 ``    ``     ``      ``    Artery_3 ``    ``     ``      ``   
<chr>    <chr> <chr>  <chr>   <chr> <chr>    <chr> <chr>  <chr>   <chr> <chr>    <chr> <chr>  <chr>   <chr>
1 agatston area  volume density mass  agatston area  volume density mass  agatston area  volume density mass 

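I am trying to use a for loop or apply() to fill each missing column name with the most recent non-missing column name. What I want to achieve is this: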

# A tibble: 1 x 15
Artery_1 Artery_1 Artery_1 Artery_1 Artery_1 Artery_2 Artery_2 Artery_2 Artery_2 Artery_2 Artery_3 Artery_3 Artery_3 Artery_3 Artery_3
<chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>   
1 agatston area     volume   density  mass     agatston area     volume   density  mass     agatston area     volume   density  mass 

Any help?

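Edit: as a next step, I would like to make the column names unique by combining each name with the value in the row beneath it, producing the following output: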

# A tibble: 1 x 15
Artery_1_agatston Artery_1_area Artery_1_volume Artery_1_density Artery_1_mass Artery_2_agatston Artery_2_area Artery_2_volume
<chr>             <chr>         <chr>           <chr>            <chr>         <chr>             <chr>         <chr>
1 agatston          area          volume          density          mass          agatston          area          volume
# ... with 7 more variables: Artery_2_density <chr>, Artery_2_mass <chr>, Artery_3_agatston <chr>, Artery_3_area <chr>,
#   Artery_3_volume <chr>, Artery_3_density <chr>, Artery_3_mass <chr>

You can use zoo::na.locf to replace each NA name with the last non-missing name before it:

names(df) <- zoo::na.locf(names(df))
names(df)
# [1] "Artery_1" "Artery_1" "Artery_1" "Artery_1" "Artery_1" "Artery_2"
# [7] "Artery_2" "Artery_2" "Artery_2" "Artery_2" "Artery_3" "Artery_3"
#[13] "Artery_3" "Artery_3" "Artery_3"
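Since the question asked about a for loop, the same forward fill can also be sketched in base R without zoo. The vector `nms` below is hypothetical toy data with the same pattern as the question's names, and the check for `""` is an assumption added in case blank names were imported as empty strings rather than NA.

```r
# Toy names vector with the same pattern as in the question (hypothetical data)
nms <- c("Artery_1", NA, NA, "Artery_2", NA, NA)

# Walk left to right and carry the last non-missing name forward.
# Starting at position 2 means a missing first name is left as it is.
filled <- nms
for (i in seq_along(filled)[-1]) {
  if (is.na(filled[i]) || filled[i] == "") {
    filled[i] <- filled[i - 1]
  }
}

filled
# [1] "Artery_1" "Artery_1" "Artery_1" "Artery_2" "Artery_2" "Artery_2"
```

On a real data frame the result would be assigned back with `names(df) <- filled`.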

However, having identical column names is not good practice, so you can use make.unique to make the column names unique.

names(df) <- make.unique(zoo::na.locf(names(df)))
names(df)
# [1] "Artery_1"   "Artery_1.1" "Artery_1.2" "Artery_1.3" "Artery_1.4"
# [6] "Artery_2"   "Artery_2.1" "Artery_2.2" "Artery_2.3" "Artery_2.4"
#[11] "Artery_3"   "Artery_3.1" "Artery_3.2" "Artery_3.3" "Artery_3.4"
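A small illustration of why duplicated names are awkward, using a hypothetical two-column data frame `dup`: selecting by name only ever reaches the first matching column, and make.unique makes the second one addressable.

```r
# Hypothetical two-column data frame with a duplicated name
dup <- data.frame(1:2, 3:4)
names(dup) <- c("Artery_1", "Artery_1")

# Selecting by name returns only the first matching column
first <- dup[["Artery_1"]]
first
# [1] 1 2

# After make.unique the second column can be reached by its own name
names(dup) <- make.unique(names(dup))
second <- dup[["Artery_1.1"]]
second
# [1] 3 4
```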

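To combine the column names with the values in the first row instead (starting again from the original df, before make.unique has been applied), you can use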

names(df) <- paste(zoo::na.locf(names(df)), df[1, ], sep = '_')
names(df)
# [1] "Artery_1_agatston" "Artery_1_area"     "Artery_1_volume"  
# [4] "Artery_1_density"  "Artery_1_mass"     "Artery_2_agatston"
# [7] "Artery_2_area"     "Artery_2_volume"   "Artery_2_density" 
#[10] "Artery_2_mass"     "Artery_3_agatston" "Artery_3_area"    
#[13] "Artery_3_volume"   "Artery_3_density"  "Artery_3_mass"    

You can then drop the first row with df <- df[-1, ].
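Putting the steps together on a dataset that also has measurement rows, a minimal base R sketch might look like the following. The data frame `raw` and its values are invented for illustration, and the final `type.convert()` call is an extra step not mentioned above, added on the assumption that the remaining rows hold numbers stored as text.

```r
# Hypothetical raw import: header fragments in the names and the first row,
# measurements (stored as text) in the remaining rows
raw <- data.frame(
  c("agatston", "12", "30"),
  c("area", "1.5", "2.0"),
  c("agatston", "0", "7"),
  c("area", "0.4", "0.9"),
  stringsAsFactors = FALSE
)
names(raw) <- c("Artery_1", NA, "Artery_2", NA)

# Forward-fill the artery names, then append the first-row labels
nms <- names(raw)
filled <- nms[!is.na(nms)][cumsum(!is.na(nms))]
first_row <- vapply(raw, function(col) col[1], character(1))
names(raw) <- paste(filled, first_row, sep = "_")

# Drop the label row and let R re-detect the column types
clean <- raw[-1, ]
rownames(clean) <- NULL
clean <- type.convert(clean, as.is = TRUE)

names(clean)
# [1] "Artery_1_agatston" "Artery_1_area"     "Artery_2_agatston" "Artery_2_area"
```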

Base R solution:

names(df) <- paste(
  na.omit(names(df))[cumsum(!is.na(names(df)))],
  df[1, , drop = TRUE],
  sep = "_"
)

Or, dropping the first row at the same time:

clean_df <- setNames(
  df[-1, ],
  paste(
    na.omit(names(df))[cumsum(!is.na(names(df)))],
    df[1, , drop = TRUE],
    sep = "_"
  )
)
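The indexing expression in the base R solution is compact, so here is a step-by-step breakdown on a hypothetical names vector `nms`. The intermediate variables `present`, `group` and `labels` are names introduced only for this illustration.

```r
nms <- c("Artery_1", NA, NA, "Artery_2", NA, NA)  # hypothetical names

present <- !is.na(nms)      # TRUE FALSE FALSE TRUE FALSE FALSE
group   <- cumsum(present)  # 1 1 1 2 2 2: counts the names seen so far
labels  <- nms[present]     # "Artery_1" "Artery_2"

# Indexing the non-missing names by the running count repeats each name
# across the positions that follow it
labels[group]
# [1] "Artery_1" "Artery_1" "Artery_1" "Artery_2" "Artery_2" "Artery_2"
```

One caveat: if the very first name is missing, `group` starts at 0, and indexing with 0 silently drops those positions, so the result would be shorter than the original vector.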
