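I have a CSV file with two columns of interest: "Taxon" in column A and "Tip" in column C. I want to compare column A against column C. If a string in column A matches any string in column C, I want "Y" (or something similar) printed in column B next to that string; if not, I want "N" or an equivalent. Here is the start of my data: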
Taxon                                   B   Tip
Nitrosotalea devanaterra                    Methanothermobacter thermautotrophicus
Nitrososphaera gargensis                    Methanobacterium beijingense
Nitrososphaera sca5445                      Methanobacterium bryantii
Nitrososphaera sca2170                      Methanosarcina mazei
Methanobacterium beijingense                Persephonella marina
Methanobacterium bryantii                   Sulfurihydrogenibium azorense
Methanothermobacter thermautotrophicus      Balnearium lithotrophicum
Methanosarcina mazei                        Isosphaera pallida
Koribacter versatilis                       Methanobacterium beijingense
Acidicapsa borealis                         Parachlamydia acanthamoebae
Acidobacterium capsulatum                   Leptospira biflexa
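This is only a small part of the data, but the idea is that all of the bacteria get an "N" in column B except "Methanobacterium beijingense" and "Methanobacterium bryantii": those appear in the "Tip" column, so a "Y" is printed there. These could also be "1" and "0".
I know dplyr has functions for filtering and joining data, but I couldn't find anything that exactly matches what I need. An alternative way of doing this in Excel would also be welcome.
Thanks.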
For Excel, use the following formula in B2:
=if(isnumber(match(a2, c:c, 0)), "y", "n")
Fill down, or double-click the fill handle.
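To make the formula's logic explicit, here is a rough sketch of the same lookup in base R. The vectors `col_a` and `col_c` are hypothetical stand-ins for worksheet columns A and C, not part of the original data file. Note one difference: Excel's MATCH ignores case, whereas R's `match()` is case-sensitive.

```r
# Hypothetical stand-ins for worksheet columns A and C
col_a <- c("Nitrososphaera gargensis", "Methanobacterium bryantii")
col_c <- c("Methanobacterium beijingense", "Methanobacterium bryantii")

# match() returns the position of the first exact match, or NA when there is
# none (Excel's MATCH(..., 0) gives #N/A in that case)
pos <- match(col_a, col_c)

# ISNUMBER(...) corresponds to "a position was found"
col_b <- ifelse(!is.na(pos), "Y", "N")
print(col_b)
```

The first value has no counterpart in `col_c` and is flagged "N"; the second is found and is flagged "Y".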
Using R and dplyr:
# create example data
x = read.table(header = TRUE, stringsAsFactors = FALSE, text =
"Taxon B Tip
Nitrosotalea_devanaterra 1 Methanothermobacter_thermautotrophicus
Nitrososphaera_gargensis 1 Methanobacterium_beijingense
Nitrososphaera_sca5445 1 Methanobacterium_bryantii
Nitrososphaera_sca2170 1 Methanosarcina_mazei
Methanobacterium_beijingense 1 Persephonella_marina
Methanobacterium_bryantii 1 Sulfurihydrogenibium_azorense
Methanothermobacter_thermautotrophicus 1 Balnearium_lithotrophicum
Methanosarcina_mazei 1 Isosphaera_pallida
Koribacter_versatilis 1 Methanobacterium_beijingense
Acidicapsa_borealis 1 Parachlamydia_acanthamoebae
Acidobacterium_capsulatum 1 Leptospira_biflexa")
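# Data management part
library(dplyr)
# put Taxon and Tip under a common column name so they can be joined
x1 = data.frame(A = x$Taxon, B = x$B)
x2 = data.frame(A = x$Tip, B = x$B)
# anti_join() keeps the rows of x1 whose A value has no match in x2,
# i.e. the taxa that never appear in the Tip column
unmatched = anti_join(x1, x2, by = "A")
# set B to 0 for those taxa; taxa found in Tip keep B = 1
x$B[x$Taxon %in% unmatched$A] = 0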
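If a join is not specifically needed, the same flag can probably be produced more directly with `mutate()` and `%in%`. The following is a minimal sketch on a small made-up subset of the data; the data frame here is a stand-in for the real CSV, which would typically be loaded with something like `read.csv()` using your own file name.

```r
library(dplyr)

# Small stand-in for the CSV contents (a subset of the rows, rearranged)
x <- data.frame(
  Taxon = c("Nitrosotalea devanaterra", "Methanobacterium beijingense",
            "Methanosarcina mazei", "Koribacter versatilis"),
  Tip   = c("Methanothermobacter thermautotrophicus", "Methanobacterium beijingense",
            "Methanobacterium bryantii", "Methanosarcina mazei"),
  stringsAsFactors = FALSE
)

# Flag each Taxon by whether it occurs anywhere in the Tip column
result <- x %>%
  mutate(B = if_else(Taxon %in% Tip, "Y", "N")) %>%
  select(Taxon, B, Tip)

print(result)
```

Swapping `"Y"` and `"N"` for `1L` and `0L` gives the numeric variant mentioned in the question.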