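Suppose I have a dynamic number of input strings (barcodes) read from a file. I want to split a huge 111 GB text file according to which input string each record matches, and write the matches to files.
I don't know in advance how many inputs there will be.
I have all the file input and string matching working, but I'm stuck on the output step.
Ideally I would open one file for each entry in the input vector barcodes, which contains only strings. Is there a way to open a dynamic number of output files?
A suboptimal approach would be to pass a single barcode string as an input argument and search for it, but that means reading the huge file over and over.
The barcode input vector contains only strings, e.g. "TAGAGTAT", "TAGAGTAG".
Ideally, if the first two strings are the input, the output should look like this:
file1 -> TAGAGTAT.txt
file2 -> TAGAGTAG.txt
Thanks for your help.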
extern crate needletail;
use needletail::{parse_fastx_file, Sequence, FastxReader};
use std::str;
use std::fs::File;
use std::io::prelude::*;
use std::path::Path;
fn read_barcodes () -> Vec<String> {
    // TODO - can replace this with file reading code (OR move to an arguments based model, parse and demultiplex only one oligomer at a time..... )
    // The `vec!` macro can be used to initialize a vector of strings
    let barcodes = vec![
        "TCTCAAAG".to_string(),
        "AACTCCGC".into(),
        "TAAACGCG".into()
    ];
    println!("Initial vector: {:?}", barcodes);
    return barcodes
}
fn main() {
    //let filename = "test5m.fastq";
    let filename = "Undetermined_S0_R1.fastq";
    println!("Fastq filename: {} ", filename);
    //println!("Barcodes filename: {} ", barcodes_filename);
    let barcodes_vector: Vec<String> = read_barcodes();
    let mut counts_vector: [i32; 30] = [0; 30];
    let mut n_bases = 0;
    let mut n_valid_kmers = 0;
    let mut reader = parse_fastx_file(&filename).expect("Not a valid path/file");
    while let Some(record) = reader.next() {
        let seqrec = record.expect("invalid record");
        // get sequence
        let sequenceBytes = seqrec.normalize(false);
        let sequenceText = str::from_utf8(&sequenceBytes).unwrap();
        //println!("Seq: {} ", &sequenceText);
        // get first 8 chars (byte range 0..8; one byte per character for ASCII bases)
        let sequenceOligo = &sequenceText[0..8];
        //println!("barcode vector {}, seqOligo {} ", &barcodes_vector[0], sequenceOligo);
        if sequenceOligo == barcodes_vector[0]{
            //println!("Hit ! Barcode vector {}, seqOligo {} ", &barcodes_vector[0], sequenceOligo);
            counts_vector[0] = counts_vector[0] + 1;
        }
You probably want a HashMap<String, File>. You can build it from your barcode vector like this:
use std::collections::HashMap;
use std::fs::File;
use std::path::Path;
fn build_file_map(barcodes: &[String]) -> HashMap<String, File> {
    let mut files = HashMap::new();
    for barcode in barcodes {
        let filename = Path::new(barcode).with_extension("txt");
        let file = File::create(filename).expect("failed to create output file");
        files.insert(barcode.clone(), file);
    }
    files
}
You can call it like this:
let barcodes = vec!["TCTCAAAG".to_string(), "AACTCCGC".into(), "TAAACGCG".into()];
let file_map = build_file_map(&barcodes);
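You can then get a file to write to like this:
// borrow the barcode; a String cannot be moved out of the vector by indexing
let barcode = &barcodes[0];
// `&File` implements `Write`, so a mutable binding to the shared reference is enough
let mut file = file_map.get(barcode).expect("barcode not in file map");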
// write to file
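To make the write step concrete, here is a minimal sketch of a helper that looks up the file for a barcode and appends one line to it. The name `write_record` and its return convention are my own assumptions for illustration, not part of the code above.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, Write};

/// Hypothetical helper: appends `line` plus a newline to the file
/// registered for `barcode`. Returns Ok(false) when the barcode has
/// no entry in the map, so the caller can decide what to do with it.
fn write_record(
    files: &mut HashMap<String, File>,
    barcode: &str,
    line: &str,
) -> io::Result<bool> {
    match files.get_mut(barcode) {
        Some(file) => {
            // `?` propagates any I/O error instead of silently dropping it
            writeln!(file, "{}", line)?;
            Ok(true)
        }
        None => Ok(false),
    }
}
```

With the map from build_file_map bound as `let mut file_map`, a call would look like `write_record(&mut file_map, "TCTCAAAG", record_text)?`. Returning a bool rather than panicking seemed more useful here, since most reads in an undetermined FASTQ file will probably not match any barcode.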
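I just need an example of a) how to correctly instantiate a vector of files named after the relevant strings, b) how to set up the output file objects correctly, and c) how to write to those files.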
Here is a commented example:
use std::io::Write;
use std::fs::File;
use std::io;
fn read_barcodes() -> Vec<String> {
    // read barcodes here
    todo!()
}
fn process_barcode(barcode: &str) -> String {
    // process barcodes here
    todo!()
}
fn main() -> io::Result<()> {
    let barcodes = read_barcodes();
    for barcode in barcodes {
        // process barcode to get output
        let output = process_barcode(&barcode);
        // create file for barcode with {barcode}.txt name
        let mut file = File::create(format!("{}.txt", barcode))?;
        // write output to created file, propagating any I/O error
        file.write_all(output.as_bytes())?;
    }
    Ok(())
}
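If the aim is to read the large input only once, the map of open files and the per-barcode file naming can be combined into a single pass. The following is only a sketch under some assumptions: it treats the input as plain lines rather than parsed FASTQ records, assumes a fixed 8-character barcode at the start of each line, and the names `demultiplex` and `BARCODE_LEN` are made up for the example. With needletail, the same routing would presumably go inside the `while let Some(record)` loop.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, BufRead, BufWriter, Write};
use std::path::Path;

/// Assumed barcode length, matching the `[0..8]` slice in the question.
const BARCODE_LEN: usize = 8;

/// Reads `input` once and copies every line whose first BARCODE_LEN bytes
/// equal a known barcode into `<out_dir>/<barcode>.txt`.
/// Returns how many lines were written per barcode.
fn demultiplex<R: BufRead>(
    input: R,
    barcodes: &[String],
    out_dir: &Path,
) -> io::Result<HashMap<String, u64>> {
    // Open one buffered writer per barcode up front.
    let mut writers: HashMap<String, BufWriter<File>> = HashMap::new();
    let mut counts: HashMap<String, u64> = HashMap::new();
    for barcode in barcodes {
        let file = File::create(out_dir.join(format!("{}.txt", barcode)))?;
        writers.insert(barcode.clone(), BufWriter::new(file));
        counts.insert(barcode.clone(), 0);
    }

    for line in input.lines() {
        let line = line?;
        // `get` returns None for lines shorter than the barcode,
        // so short lines are skipped instead of causing a panic.
        if let Some(prefix) = line.get(..BARCODE_LEN) {
            if let Some(writer) = writers.get_mut(prefix) {
                writeln!(writer, "{}", line)?;
                if let Some(count) = counts.get_mut(prefix) {
                    *count += 1;
                }
            }
        }
    }

    // Flush explicitly so write errors are reported rather than lost on drop.
    for writer in writers.values_mut() {
        writer.flush()?;
    }
    Ok(counts)
}
```

A call might look like `demultiplex(BufReader::new(File::open("input.txt")?), &barcodes, Path::new("out"))?`, where the paths are placeholders. Keeping every file open works for a modest number of barcodes; with thousands of them you may run into the operating system's limit on open files per process, in which case the limit would need raising or the writers would need to be opened on demand.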