Remove columns based on a condition in a row whose name varies, using R

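I am extracting tables from Wikipedia, and all my extraction is driven by one row of the first column. That row is sometimes named total costs and sometimes total current costs.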

df1 <- data.frame(col1 = c("", "total costs", "liability"),
                  col2 = c("", 1000, 500),
                  col3 = c("", "$", ""),
                  col4 = c("", "$", 500))
df1
col1            col2    col3    col4
total costs     1000    $       $
liability       500             500

df2
col1                    col2    col3    col4
total current costs     1000    $       $
liability               500             500
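The question constructs `df1` but only prints `df2`. A minimal reproducible setup for both, assuming `df2` differs from `df1` only in the label of the totals row, could look like this:

```r
# Example table from the question; stringsAsFactors = FALSE keeps the
# columns as plain character vectors on older R versions as well.
df1 <- data.frame(col1 = c("", "total costs", "liability"),
                  col2 = c("", 1000, 500),
                  col3 = c("", "$", ""),
                  col4 = c("", "$", 500),
                  stringsAsFactors = FALSE)

# Assumed from the printed output: df2 is df1 with a different label
# for the totals row.
df2 <- df1
df2$col1[2] <- "total current costs"
```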

I want to remove the columns that contain a $ sign in one specific row: the one named total costs or total current costs.

I'm trying to remove the columns with the following script, suggested by Sotos:

row_num <- which(df$col1 == 'total costs') # or `total current costs`
df_final <- df[-which(df[row_num, ] == '$')]
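One caveat with this `-which()` idiom: if the chosen row contains no `$` at all, `which()` returns `integer(0)`, and `df[-integer(0)]` selects zero columns instead of all of them. A sketch that uses a logical mask instead (same example data, recreated so it runs on its own) avoids that edge case:

```r
df1 <- data.frame(col1 = c("", "total costs", "liability"),
                  col2 = c("", 1000, 500),
                  col3 = c("", "$", ""),
                  col4 = c("", "$", 500),
                  stringsAsFactors = FALSE)

row_num <- which(df1$col1 == "total costs")

# TRUE for each column whose cell in the chosen row is "$".
# %in% treats NA as a non-match, so NA cells never cause a drop.
is_dollar <- vapply(df1, function(col) any(col[row_num] %in% "$"), logical(1))
df_final <- df1[, !is_dollar, drop = FALSE]   # keeps col1 and col2

# With no "$" in the row the mask is all FALSE and every column is kept,
# whereas the negative-which() form would return zero columns.
no_dollar <- df1[, c("col1", "col2")]
keep <- !vapply(no_dollar, function(col) any(col[row_num] %in% "$"), logical(1))
kept_all <- no_dollar[, keep, drop = FALSE]
```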

However, I have to enter the row name by hand for each table. How can I turn this into a function that picks up the row automatically, whether it is named total costs or total current costs?

Expected output

col1            col2
total costs     1000 # or total current costs
liability       500

Any suggestions would be much appreciated. Thanks!

You can do this by first defining a vector that holds the different versions of the name:

all_names = c("total costs","total current costs")

Then you can use it in a single line, for example:

df1[,-which(df1[which(df1$col1 %in% all_names),]=="$")]
col1 col2
1                 
2 total costs 1000
3   liability  500
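The one-liner packs three steps together. Unpacked with the example data (recreated so the snippet runs on its own), the intermediate values are:

```r
all_names <- c("total costs", "total current costs")
df1 <- data.frame(col1 = c("", "total costs", "liability"),
                  col2 = c("", 1000, 500),
                  col3 = c("", "$", ""),
                  col4 = c("", "$", 500),
                  stringsAsFactors = FALSE)

row_idx <- which(df1$col1 %in% all_names)  # index of the totals row: 2
dollar  <- df1[row_idx, ] == "$"           # 1 x 4 logical matrix, one entry per column
cols    <- which(dollar)                   # positions of the TRUE entries: 3 and 4
result  <- df1[, -cols]                    # drop col3 and col4
```

`which()` on the comparison matrix returns column-major positions. They coincide with column numbers here only because exactly one row matched; if both name variants appeared in the same table, the positions would no longer correspond to columns.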

You can wrap the code above in a small function and apply it to all your data frames:

clean_df = function(data){
  # drop every column whose cell in the 'total' row equals "$"
  data[,-which(data[which(data$col1 %in% all_names),]=="$")]
}

lapply(list(df1,df2), clean_df)

[[1]]
col1 col2
1                 
2 total costs 1000
3   liability  500
[[2]]
col1 col2
1                         
2 total current costs 1000
3           liability  500
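A variant of `clean_df()` that takes the accepted labels as an argument, rather than reading the global `all_names`, lets you pass extra names through `lapply()`. This is a sketch; the `row_names` argument, the logical mask (which keeps every column when the row has no `$`) and the no-match warning are assumptions added for illustration:

```r
# Hypothetical variant: accepted row labels are an argument with a default.
clean_df <- function(data, row_names = c("total costs", "total current costs")) {
  row_idx <- which(data$col1 %in% row_names)
  if (length(row_idx) == 0) {
    warning("no row matches row_names; returning data unchanged")
    return(data)
  }
  # TRUE for each column that has "$" in any matching row
  is_dollar <- vapply(data, function(col) any(col[row_idx] %in% "$"), logical(1))
  data[, !is_dollar, drop = FALSE]
}

df1 <- data.frame(col1 = c("", "total costs", "liability"),
                  col2 = c("", 1000, 500),
                  col3 = c("", "$", ""),
                  col4 = c("", "$", 500),
                  stringsAsFactors = FALSE)
df2 <- df1
df2$col1[2] <- "total current costs"

cleaned <- lapply(list(df1, df2), clean_df)

# Extra labels are passed through lapply() instead of editing a global vector
df3 <- df1
df3$col1[2] <- "Total Costs"
cleaned3 <- lapply(list(df3), clean_df, row_names = c("total costs", "Total Costs"))
```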

If you come across another variant or a misspelling of 'total', you can simply add it to the initial vector.
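If the variants follow a predictable pattern, a regular expression can stand in for the hand-maintained vector. The pattern below is an assumption covering 'total costs' and 'total current costs' in any letter case with surrounding spaces trimmed, and `find_total_row()` is a hypothetical helper name:

```r
df1 <- data.frame(col1 = c("", "total costs", "liability"),
                  col2 = c("", 1000, 500),
                  col3 = c("", "$", ""),
                  col4 = c("", "$", 500),
                  stringsAsFactors = FALSE)
df2 <- df1
df2$col1[2] <- "total current costs"

# Assumed pattern: "total", an optional "current", then "costs", nothing else.
total_pattern <- "^total( current)? costs$"

# Hypothetical helper: index of the row(s) whose first column matches the
# pattern, ignoring letter case and leading/trailing whitespace.
find_total_row <- function(data, pattern = total_pattern) {
  which(grepl(pattern, trimws(data$col1), ignore.case = TRUE))
}

find_total_row(df1)  # 2
find_total_row(df2)  # 2
```

Its result can take the place of `which(data$col1 %in% all_names)` inside `clean_df()`. A looser pattern catches more spellings but also risks matching unrelated rows, which is the trade-off against the explicit vector.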
