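R: How to use an entire row as column names

For the data frame below, I want to replace the column names with the row whose first column starts with one or more words. Here that is the second row, and the word is Company. However, the row can differ between data frames (it might be row 1, 5 or 10), and the word can differ too, for example Investment or something else.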

structure(list(X1 = c("", "Company #", "Investments:"
), X2 = c("", "Type", ""), X3 = c("", "Reference", 
""), X4 = c(NA_real_, NA_real_, NA_real_), X5= c("", "Footnotes", 
""), X6 = c(NA_character_, NA_character_, NA_character_)), row.names = c(NA, 
3L), class = "data.frame")
X1             X2       X3        X4       X5       X6
<chr>              <chr>    <chr>      <dbl>   <chr>    <chr>
1                                           NA               NA
2   Company #           Type   Reference    NA   Footnotes   NA
3   Investments:                            NA               NA

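I would like to first get the number of the row whose first column starts with a word and then use that row number to set the column names, unless there is a better way.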

names(my_df)<- my_df[row_number,]
my_df <- my_df[-row_number,]

Desired output

Company #   Type   Reference    NA   Footnotes    NA
<chr>           <chr>   <chr>      <dbl>   <chr>    <chr>
3   Investments:                         NA              NA

#row number of the first word in the first column
row_n <- min(which(nzchar(my_df[[1]])))
janitor::row_to_names(my_df, row_n)

Output

#     Company # Type Reference NA Footnotes   NA
#3 Investments:                NA           <NA>

Note that doing this leaves you with non-unique column names (NA). You can fix that quickly with janitor::clean_names.
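As a rough sketch of that follow-up step, the snippet below promotes the header row and then passes the result through clean_names. It assumes the janitor package is installed, so the calls are wrapped in a requireNamespace guard; the variable names promoted and cleaned are only illustrative. The exact names clean_names produces are not shown here because they depend on janitor's cleaning rules.

```r
# Sample data from the question
my_df <- data.frame(
  X1 = c("", "Company #", "Investments:"),
  X2 = c("", "Type", ""),
  X3 = c("", "Reference", ""),
  X4 = c(NA_real_, NA_real_, NA_real_),
  X5 = c("", "Footnotes", ""),
  X6 = c(NA_character_, NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Assumption: janitor is installed; otherwise this block is skipped
if (requireNamespace("janitor", quietly = TRUE)) {
  # first row with a non-empty first column
  row_n <- min(which(nzchar(my_df[[1]])))
  # use that row as the header and drop it (and the rows above it)
  promoted <- janitor::row_to_names(my_df, row_n)
  # rewrite the names so they are syntactic and unique
  cleaned <- janitor::clean_names(promoted)
  print(names(cleaned))
}
```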

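You can try:

# first row whose first column starts with a letter
idx <- which(grepl('^[A-Za-z]', my_df$X1))[1]
colnames(my_df) <- my_df[idx, ]
# keep only the rows after the header row (also safe if idx is the last row)
my_df <- my_df[seq_len(nrow(my_df)) > idx, ]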

Output:

Company # Type Reference NA Footnotes   NA
3 Investments:                NA           <NA>

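This checks whether the first column of any row starts with a letter, uses the first such row as the column names, and keeps only the rows after it.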
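Since the question mentions that the leading word can vary (Company, Investment, ...), the same idea can be wrapped in a small function that takes the pattern as an argument. This is only a sketch: promote_header and its pattern argument are hypothetical names, not part of any package. It subsets the rows before renaming, so the row subsetting runs on the original, well-formed column names.

```r
# Hypothetical helper: use the first row whose first column matches
# `pattern` as the header and keep only the rows below it.
promote_header <- function(d, pattern = "^[A-Za-z]") {
  first_col <- as.character(d[[1]])
  hits <- which(grepl(pattern, first_col))  # grepl() returns FALSE for NA
  if (length(hits) == 0L) stop("no row matches 'pattern'")
  idx <- hits[1]
  # header row as a plain character vector (NA cells stay NA)
  header <- vapply(d[idx, ], as.character, character(1), USE.NAMES = FALSE)
  # rows strictly below the header; gives zero rows if idx is the last row
  out <- d[seq_len(nrow(d)) > idx, , drop = FALSE]
  names(out) <- header
  out
}

my_df <- data.frame(
  X1 = c("", "Company #", "Investments:"),
  X2 = c("", "Type", ""),
  X3 = c("", "Reference", ""),
  X4 = c(NA_real_, NA_real_, NA_real_),
  X5 = c("", "Footnotes", ""),
  X6 = c(NA_character_, NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

res     <- promote_header(my_df)                           # header taken from row 2
res_inv <- promote_header(my_df, pattern = "^Investment")  # header taken from row 3
```

With the sample data, res keeps only the Investments: row, while res_inv has no rows left because the matching row is the last one.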

Base R:

colnames(df) <- df[2,]
df <- df[-2,]
df

Company # Type Reference NA Footnotes   NA
1                             NA           <NA>
3 Investments:                NA           <NA>

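You can use which to get the row that contains the given string:

# arr.ind = TRUE returns row/column positions rather than a linear index
idx <- which(my_df == "Company #", arr.ind = TRUE)[1, "row"]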
names(my_df) <- my_df[idx, ]
my_df <- my_df[-idx,]
my_df
#>      Company # Type Reference NA Footnotes   NA
#> 1                             NA           <NA>
#> 3 Investments:                NA           <NA>

Created on 2023-01-05 with reprex v2.0.2
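One detail worth keeping in mind with this approach: comparing a string against the whole data frame gives a logical matrix, and plain which() on a matrix returns column-major positions, which equal the row number only when the match is in the first column. The sketch below searches for "Type" (second column) to show the difference; the variable names are illustrative.

```r
my_df <- data.frame(
  X1 = c("", "Company #", "Investments:"),
  X2 = c("", "Type", ""),
  X3 = c("", "Reference", ""),
  X4 = c(NA_real_, NA_real_, NA_real_),
  X5 = c("", "Footnotes", ""),
  X6 = c(NA_character_, NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

hit <- my_df == "Type"   # logical matrix, NA where the cell is NA

# position counted down the columns: 3 cells of X1, then row 2 of X2
linear_idx <- which(hit)
# actual row number of the first match
row_idx <- unname(which(hit, arr.ind = TRUE)[1, "row"])

linear_idx  # 5
row_idx     # 2
```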

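You can use which on the first column to find the row that holds the names, assign that row to the column names with names, and finally drop the row. All of this in base R: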

df <- structure(list(X1 = c("", "Company #", "Investments:"
), X2 = c("", "Type", ""), X3 = c("", "Reference",""), X4 = c(NA_real_, NA_real_, NA_real_), X5= c("", "Footnotes", ""), 
X6 = c(NA_character_, NA_character_, NA_character_)), row.names = c(NA, 3L), class = "data.frame")
row_names <- which(nchar(data.frame(df)[, "X1"]) > 1)[1]
names(df) <- df[row_names, ]
df[-c(row_names),]

Output:

Company # Type Reference NA Footnotes   NA
1                             NA           <NA>
3 Investments:                NA           <NA>

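This uses a keyword in the first column to find the row, then makes sure the names contain no duplicates with make.unique and no NAs (character or numeric) with replace.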

key <- "Company #"
str <- as.character(dat[dat[,1] == key,][1,])
colnames(dat) <- make.unique(
replace(str, str %in% "NA" | is.na(str), "Missing"), sep="_")

Result

dat
Company # Type Reference Missing Footnotes Missing_1
1                                  NA                <NA>
2    Company # Type Reference      NA Footnotes      <NA>
3 Investments:                     NA                <NA>
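The result above still contains the header row as data. If that row should also be dropped, a variant along the following lines could be used. This is a sketch under the same assumptions as the answer (a keyword in the first column); hdr and out are illustrative names.

```r
dat <- data.frame(
  X1 = c("", "Company #", "Investments:"),
  X2 = c("", "Type", ""),
  X3 = c("", "Reference", ""),
  X4 = c(NA_real_, NA_real_, NA_real_),
  X5 = c("", "Footnotes", ""),
  X6 = c(NA_character_, NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

key <- "Company #"
idx <- which(dat[[1]] == key)[1]          # first row whose first cell equals key

# header row as character, one entry per column
hdr <- vapply(dat[idx, ], as.character, character(1), USE.NAMES = FALSE)
hdr[is.na(hdr) | hdr == "NA"] <- "Missing"   # replace missing names
hdr <- make.unique(hdr, sep = "_")            # "Missing", "Missing_1", ...

out <- dat[-idx, , drop = FALSE]          # drop the header row from the data
names(out) <- hdr

names(out)
# "Company #" "Type" "Reference" "Missing" "Footnotes" "Missing_1"
```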

If the first non-empty cell should be picked up instead, use this:

str <- as.character(dat[nchar(dat[,1]) != 0, ][1,])
colnames(dat) <- make.unique(
replace(str, str %in% "NA" | is.na(str), "Missing"), sep="_")

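For the sample data the result is the same as above, because row 2 is also the first row with a non-empty first cell.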
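The sample data has no NA in the first column, but if real data does, "non-empty" checks need some care: nzchar() treats NA as non-empty by default, and a logical row index containing NA selects an all-NA row. A minimal illustration on a made-up vector x (not from the question):

```r
# Hypothetical first column with a leading NA
x <- c(NA, "", "Company #", "Investments:")

nzchar(x)        # TRUE FALSE TRUE TRUE: NA counts as non-empty by default
nchar(x) != 0    # NA FALSE TRUE TRUE

naive_idx <- min(which(nzchar(x)))             # 1, points at the NA cell
safe_idx  <- which(!is.na(x) & nzchar(x))[1]   # 3, first truly non-empty cell
```

Using an integer index such as safe_idx for the row lookup avoids both problems.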
