R: opening .mol files and compiling their information



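I am trying to write a program that opens a large number of .mol files, copies specific information from each one, and saves it to a spreadsheet (a tab-separated file, "\t").

I have 10,000 .mol files on my computer, named SN00000001, SN00000002, SN00000003, ..., SN00010000.

(Download link => http://bioinf-applied.charite.de/supernatural_new/src/download_mol.php?sn_id=SN00000001)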
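Because the file names follow a fixed pattern, the list of IDs (and the matching download URLs) can be generated rather than typed by hand. This is only a minimal sketch: it assumes every ID is `SN` followed by an eight-digit, zero-padded number, as in the examples above, and that `base_url` (the link quoted above with the ID stripped off) works the same way for every molecule.

```r
# Build the IDs SN00000001 ... SN00010000 (zero-padded to 8 digits)
sn_ids <- sprintf("SN%08d", 1:10000)

# Assumption: every molecule is served by the same URL, differing only in sn_id
base_url <- "http://bioinf-applied.charite.de/supernatural_new/src/download_mol.php?sn_id="
mol_urls <- paste0(base_url, sn_ids)

# Inspect the first few IDs
head(sn_ids, 3)
```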

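I have two questions:

  1. I have tried the function load.molecules (rcdk) and ChemmineR (loadsdf), but I have not managed to open the .mol files in R.

  2. Is it possible to open each .mol file with R and save specific information, such as "ID", "Name" and "Molecular Formula", to a single spreadsheet?

OK, here is my code: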

library(tidyverse)
library(ChemmineR)
# get the full path of your mol files
# the folder is already an absolute path, so it must not be combined with getwd()
mol_files <- list.files(path = "/Users/189919604/Desktop/Download SuperNatural II/SN00000001", # specify your folder here
                    pattern = "\\.mol$", # regular expression: file names ending in .mol
                    full.names = TRUE)
# create tibble, with filenames (incl. the full path)
df <- tibble(filenames = mol_files)
# create function to extract all the information 
extract_info <- function(sdfset) {
  # function to extract information from a sdfset (ChemmineR)
  # this only works if there is one molecule in the sdfset
  ID <- sdfset@SDF[[1]]@datablock["SNID"]
  Name <- sdfset@SDF[[1]]@header["Molecule_Name"]
  Molecular_Formula <- sdfset@SDF[[1]]@datablock["Molecular Formula"]
  sdf_info <- tibble(SNID = ID,
                 Name = Name,
                 MolFormula = Molecular_Formula)
  return(sdf_info)
}
# read all files and extract info
df <- df %>% 
  mutate(sdf_data = map(.x = filenames,
                        .f = ~ read.SDFset(sdfstr = .x)),
         info = map(.x = sdf_data,
                    .f = ~ extract_info(sdfset = .x)))
# make a nice tibble with only the info you want
# (the list column created above is called info; there is no column named molecule)
all_info <- df %>% 
  select(info) %>% 
  unnest(info)
# write to file, using a tab ("\t") as the delimiter
# (readr >= 1.4 names this argument file; older versions call it path)
write_delim(x = all_info,
            file = file.path(getwd(), "test.tsv"),
            delim = "\t")

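This works, although I have only tested it with 2 mol files. I use read.SDFset from the ChemmineR package to read all the mol files, and the tidyverse package to work with tibbles. Tibbles are essentially data frames with some extra properties and functionality.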

library(tidyverse)
library(ChemmineR)
# get the full path of your mol files
# specify your folder here; on Windows also add your drive letter,
# e.g.: "c:/users/path/to/my/mol_files"
mol_files <- list.files(path = "/home/rico/r-stuff/temp",
                        pattern = "\\.mol$", # regular expression: file names ending in .mol
                        full.names = TRUE)
# create tibble, with filenames (incl. the full path)
df <- tibble(filenames = mol_files)
# create function to extract all the information 
extract_info <- function(sdfset) {
  # function to extract information from a sdfset (ChemmineR)
  # this only works if there is one molecule in the sdfset
  ID <- sdfset@SDF[[1]]@datablock["SNID"]
  Name <- sdfset@SDF[[1]]@header["Molecule_Name"]
  Molecular_Formula <- sdfset@SDF[[1]]@datablock["Molecular Formula"]
  sdf_info <- tibble(SNID = ID,
                     Name = Name,
                     MolFormula = Molecular_Formula)
  return(sdf_info)
}
# read all files and extract info
df <- df %>% 
  mutate(sdf_data = map(.x = filenames,
                        .f = ~ read.SDFset(sdfstr = .x)),
         info = map(.x = sdf_data,
                    .f = ~ extract_info(sdfset = .x)))
# make a nice tibble with only the info you want
all_info <- df %>% 
  select(info) %>% 
  unnest(info)
# write to file, using a tab ("\t") as the delimiter
# (readr >= 1.4 names this argument file; older versions call it path)
write_delim(x = all_info,
            file = file.path(getwd(), "temp", "test.tsv"),
            delim = "\t")
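
If a file cannot be read, or a field comes back empty, it can help to look at the file as plain text and check which data fields it actually contains. The sketch below uses base R only and is not part of ChemmineR. It assumes the files follow the usual MOL/SDF layout (molecule name on the first line, data items written as `> <FIELD>` with the value on the next line). `parse_mol_lines` is a hypothetical helper, and the example record is made up; only the field names SNID and Molecular Formula are taken from the code above.

```r
# Hypothetical helper: pull the molecule name and the data fields out of the
# text lines of a single-molecule MOL/SDF record (base R only).
parse_mol_lines <- function(lines) {
  # The first line of the header block holds the molecule name
  name <- trimws(lines[1])
  # Data items look like "> <FIELD>"; the value is assumed to be on the next line
  tag_idx <- grep("^>[[:space:]]*<[^>]+>", lines)
  fields <- sub("^>[[:space:]]*<([^>]+)>.*$", "\\1", lines[tag_idx])
  values <- trimws(lines[tag_idx + 1])
  # Return one named character vector: Name first, then every data field
  c(Name = name, setNames(values, fields))
}

# Tiny made-up record (no atoms or bonds), only to exercise the parser
example <- c(
  "Example molecule",
  "  hypothetical-program",
  "",
  "  0  0  0  0  0  0  0  0  0  0999 V2000",
  "M  END",
  "> <SNID>",
  "SN00000001",
  "",
  "> <Molecular Formula>",
  "C6H6",
  "",
  "$$$$"
)

info <- parse_mol_lines(example)
names(info)
```

For a real file, the same helper could be called as parse_mol_lines(readLines(path)), where path is one of the entries in mol_files. Comparing names(info) with the field names used in extract_info shows quickly whether "SNID" and "Molecular Formula" are spelled the same way in the files.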
