Click here to view the table
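I thought this would be a simple task, but I am a biologist who knows only a little code, and after several days of trying I am at my wits' end.
I am using Terminal on a Mac. I have a CSV file that I want to split by row (162 rows) into separate files, and I want to name each file after the contents of its first and second columns (genus_species). I then need to save all 162 genus_species files as HTML files.
So far I have only attempted the "split" part, in Ruby (based on suggestions from StackExchange/Stack Overflow). Some of my attempts are below. They are Frankensteins stitched together from helpful forum posts, and after each one I have added a comment on why it did not work.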
Example HTML
<!DOCTYPE html>
<html><head>
<meta charset="UTF-8">
<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script></head>
<body>
<h1><em><!-- Species name --></em> - <!-- Common name --></h1>
<h2>Status</h2>
<p></p>
<h2>Info</h2>
<p></p>
<h2>Time of year this bee is seen</h2>
<p></p>
<h2>Identification</h2>
<p></p>
<h3>Similar Species</h3>
<p></p>
<h2>Flowers</h2>
<p></p>
<h2>Sociality</h2>
<p></p>
<h2>Nest</h2>
<p></p>
<div id="refs" class="references">
--<br>More information:<br> <!-- <a href="https://bugguide.net/node/view/70932">Bug Guide</a> --></div>
</body></html>
More information based on comments
Here are some lines copied from the text file:
Genus,species,Common name,Status,Info,Time of year this bee is seen,Identification,Similar Species,Flowers,Sociality,Nest,Bug Guide,Discover Life,Other,
Agapostemon,melliventris,Honey-tailed Striped-Sweat bee,Secure G5,Excavates into deep burrows in ground nests,March-December,Agapostemon males have black and yellow stripes on the abdomen. Females have a yellow band on the lower margin of the clypeus.,All other Agapostemon species,Wide variety of plants,Solitary,"Deep, underground excavation",https://bugguide.net/node/view/70932,https://www.discoverlife.org/20/q?search=Agapostemon+melliventris,https://explorer.natureserve.org/Taxon/ELEMENT_GLOBAL.2.928401/Agapostemon_melliventris,
Agapostemon,sericeus,Silky Striped Sweat Bee,Secure G5,"Not choosy about lawn, as long as flowers are present",April-October,Agapostemon males have black and yellow stripes on the abdomen. A. sericeus males have a tooth on its hind femur. Female has metallic green abdomen.,All other Agapostemon species,Wide variety of plants,Solitary,Ground-nester in loamy soils,https://bugguide.net/node/view/83023,https://www.discoverlife.org/mp/20q?search=Agapostemon+sericeus,https://www.sharpeatmanguides.com/sweat-bees,
Agapostemon,splendens,Brown-winged Striped-Sweat Bee,Secure G5,This is the most common Agapostemon found in the southeast region,April-October,Agapostemon males have black and yellow stripes on the abdomen. A. splendens have brown wings. The female abdomen is often somewhat bluish.,All other Agapostemon species,"Jacquemontia reclinata, wide variety of plants",Solitary,Ground-nester in sandy soils,https://bugguide.net/node/view/74478,https://www.discoverlife.org/mp/20q?search=Agapostemon+splendens,,
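Updated based on comments with code I have tried. This works, and I think it is heading in the direction I want, but it is hard to tell in the Terminal window: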
f = File.new("bee_key_fact_sheet.csv")
f.each_line { |line| puts line }
Currently playing with some kind of File.write line to add here and then close?
Attempt #1
file = File.open("bee_key_fact_sheet.csv")
awk
'(NR==1){header=$0;next}
(NR%l==2) {
close(file);
file=sprintf("%s.%0.5d.csv",FILENAME,++c)
sub(/csv[.]/,"",file)
print header > file
}
{f.write}'
File.close
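# awk was not recognized; I got "Display all possibilities? (y/n)". I tried answering "y" and "yes", and both times it said my answer was not recognized.
Attempt #2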
file_data = File.read("bee_key_fact_sheet.csv").split
# This works, but it splits at every comma
Attempt #3
file_data = File.foreach("bee_key_fact_sheet.csv") { |line| puts line}.split
# This returned something slightly less messy than splitting at every comma, but I got this error message: "undefined method `split' for nil:NilClass"
Attempt #4
bee_key_fact_sheet.csv.foreach('so1.csv', :headers => true, :col_sep => ",", :skip_blanks => true) do |row|
id, name = row[0], row[1]
unless (id =~ /#/)
names = name.split
end
# This returned nothing
Example CSV input (bee_key_fact_sheet.csv):
Genus,species,Common name,Status,Info,Time of year this bee is seen,Identification,Similar Species,Flowers,Sociality,Nest,Bug Guide,Discover Life,Other,
Agapostemon,melliventris,Honey-tailed Striped-Sweat bee,Secure G5,Excavates into deep burrows in ground nests,March-December,Agapostemon males have black and yellow stripes on the abdomen. Females have a yellow band on the lower margin of the clypeus.,All other Agapostemon species,Wide variety of plants,Solitary,"Deep, underground excavation",https://bugguide.net/node/view/70932,https://www.discoverlife.org/20/q?search=Agapostemon+melliventris,https://explorer.natureserve.org/Taxon/ELEMENT_GLOBAL.2.928401/Agapostemon_melliventris,
Agapostemon,sericeus,Silky Striped Sweat Bee,Secure G5,"Not choosy about lawn, as long as flowers are present",April-October,Agapostemon males have black and yellow stripes on the abdomen. A. sericeus males have a tooth on its hind femur. Female has metallic green abdomen.,All other Agapostemon species,Wide variety of plants,Solitary,Ground-nester in loamy soils,https://bugguide.net/node/view/83023,https://www.discoverlife.org/mp/20q?search=Agapostemon+sericeus,https://www.sharpeatmanguides.com/sweat-bees,
Agapostemon,splendens,Brown-winged Striped-Sweat Bee,Secure G5,This is the most common Agapostemon found in the southeast region,April-October,Agapostemon males have black and yellow stripes on the abdomen. A. splendens have brown wings. The female abdomen is often somewhat bluish.,All other Agapostemon species,"Jacquemontia reclinata, wide variety of plants",Solitary,Ground-nester in sandy soils,https://bugguide.net/node/view/74478,https://www.discoverlife.org/mp/20q?search=Agapostemon+splendens,,
In this CSV every row (including the header) ends with a comma, so the last column is most likely meaningless and will be discarded.
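To see what the trailing comma does, here is a minimal sketch. `HEADER_LINE` is a shortened, illustrative version of the real header, and `fields_without_trailing_empty` is a hypothetical helper name, not something from the question.

```ruby
require 'csv'

# Shortened, illustrative header; the real one has more columns
# but ends with the same trailing comma.
HEADER_LINE = "Genus,species,Common name,Other,"

# Hypothetical helper: parse one CSV line and drop the final field,
# which exists only because of the trailing comma.
def fields_without_trailing_empty(line)
  CSV.parse_line(line)[0...-1]
end

p CSV.parse_line(HEADER_LINE).last           # => nil (the empty trailing field)
p fields_without_trailing_empty(HEADER_LINE) # => ["Genus", "species", "Common name", "Other"]
```

Dropping the last element is only safe here because every row in the sample ends with a comma. A file without trailing commas would lose a real column.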
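Also, the data itself contains commas (inside double-quoted fields), so you need a real CSV parser to read the file. BTW, you were right to choose Ruby for this task, because it ships with a CSV parser in its standard library.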
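As a rough illustration of why splitting on commas is not enough, the sketch below compares a plain `String#split` with `CSV.parse_line`. `SAMPLE_ROW` is a shortened excerpt of the first data row, and `naive_fields` and `parsed_fields` are hypothetical helper names used only for this comparison.

```ruby
require 'csv'

# Shortened excerpt of the first data row (only four of its columns).
SAMPLE_ROW = 'Agapostemon,melliventris,Solitary,"Deep, underground excavation"'

# Naive approach: split on every comma, including the one inside the quotes.
def naive_fields(line)
  line.split(',')
end

# CSV-aware approach: a quoted field is kept as a single value.
def parsed_fields(line)
  CSV.parse_line(line)
end

p naive_fields(SAMPLE_ROW).length  # => 5 (the quoted field is cut in two)
p parsed_fields(SAMPLE_ROW).length # => 4
p parsed_fields(SAMPLE_ROW).last   # => "Deep, underground excavation"
```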
Here is one way to read the CSV (edit: fixed the CSV::Row conversion for older Rubies):
require 'csv'
filepath = 'bee_key_fact_sheet.csv'
CSV.foreach(filepath, headers: true) do |row|
genus, species = row[0], row[1]
#data = row[0...-1] # NOTE: not sure about the Ruby version compatibility
data = row.to_hash.values[0...-1]
filename = "#{genus}_#{species}.txt".tr("