Selecting specific values from pandas columns and summing them



I have 8 population columns in a CSV file, as well as a FORMAT column: (image: pop)
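In a VCF file the FORMAT column (for example `GT:AD:DP:GQ:PL`) names the colon-separated fields stored in each sample column. Here is a minimal sketch of looking fields up by name instead of by position. It assumes that layout and uses hypothetical sample values; `parse_sample` is an illustrative helper, not part of the original code.

```python
# Minimal sketch: pair FORMAT keys with one sample's colon-separated values.
# The strings below are hypothetical, not taken from the asker's file.

def parse_sample(format_str: str, sample_str: str) -> dict:
    """Map each FORMAT key (e.g. GT, AD, DP) to the matching sample value."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


fmt = "GT:AD:DP:GQ:PL"
sample = "0/1:12,8:20:99:200,0,300"

fields = parse_sample(fmt, sample)
ad_ref = int(fields["AD"].split(",")[0])  # first AD entry: reference allele depth
dp = int(fields["DP"])                    # total read depth as an int

print(fields["AD"], ad_ref, dp)
```

Looking up `fields["DP"]` keeps working even if the FORMAT order differs between rows, which fixed positions like `[1]` and `[2]` do not.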

I'm using this code to extract only the AD and DP values:

import io
import os
import pandas as pd

def read_vcf(path1):
    # Skip the '##' meta-information lines but keep the '#CHROM' header line
    with open(path1, 'r') as f:
        lines = [l for l in f if not l.startswith('##')]
    return pd.read_csv(
        io.StringIO(''.join(lines)),
        dtype={'#CHROM': str, 'POS': int, 'ID': str, 'REF': str, 'ALT': str,
               'QUAL': str, 'FILTER': str, 'INFO': str},
        sep='\t'  # VCF columns are tab-separated
    ).rename(columns={'#CHROM': 'CHROM'})

def extract_AD(info):
    # Second FORMAT field (AD), first comma-separated value, as an int
    AD = int((info.split(':')[1]).split(',')[0])
    return AD

path1 = "C://Users//USER//Desktop//Anas/VCFs_1/test_1.vcf"
file = read_vcf(path1)

pop1 = file[["FORMAT","NEN_001","NEN_003","NEN_200","NEN_300","LAB_004","LAB_300","LAB_400","LAB_500"]]
cols_to_apply = ["NEN_001","NEN_003","NEN_200","NEN_300","LAB_004","LAB_300","LAB_400","LAB_500"]
tst1pop1 = pd.DataFrame(pop1)
AD = tst1pop1[cols_to_apply].applymap(extract_AD)
#AD = pop1["NEN_001"].apply(extract_AD)

def extract_DP(info):
    # Slice holding the third FORMAT field (DP)
    DP = info.split(':')[2:3]
    return DP
print("AD Values:" + "\n", AD)

DP = tst1pop1[cols_to_apply].applymap(extract_DP)
print("DP Values:\n", DP)

# Row-wise sum of AD across the sample columns, then the grand total
Sum1 = AD.sum(axis=1)
print(Sum1)
SumAD = sum(Sum1)
print(SumAD)

But it gives me the DP values inside lists, so I can't sum them.
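The lists come from `info.split(':')[2:3]` in `extract_DP`: slicing a list always returns a list, even when the slice holds a single element, whereas indexing returns the element itself. A small illustration, using a hypothetical sample string in the `GT:AD:DP:GQ:PL` layout the question's code appears to assume:

```python
# Slicing returns a (one-element) list; indexing returns the element itself.
info = "0/1:12,8:20:99:200,0,300"  # hypothetical sample field

parts = info.split(":")
sliced = parts[2:3]   # one-element list: ['20']
indexed = parts[2]    # plain string: '20'
dp = int(indexed)     # int that can be summed: 20

print(sliced, indexed, dp)
```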

Output: (image)

How can I get the DP values out of the lists as integers so that I can sum them row-wise?

This should do it; see "How to remove square brackets from a pandas DataFrame":

df['value'] = df['value'].str[0]
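A self-contained sketch of what `.str[0]` does on a column whose cells are lists. The column name `value` and its contents are hypothetical stand-ins for the DP output in the question:

```python
import pandas as pd

# Hypothetical column where every cell is a one-element list of strings,
# similar to the output of extract_DP in the question.
df = pd.DataFrame({"value": [["20"], ["15"], ["7"]]})

# .str[0] takes the first element of each list; astype(int) turns '20' into 20.
df["value"] = df["value"].str[0].astype(int)

print(df["value"].tolist())
print(df["value"].sum())
```

Without `.astype(int)` the column would hold strings, and `sum()` would concatenate them instead of adding.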

If you want to fix all of the DP columns, a simple approach is to loop over the columns and convert the values, like this:

DP = tst1pop1[cols_to_apply].applymap(extract_DP)
for column in DP:
    # Take the single element of each list and convert it to int
    DP[column] = DP[column].str[0].astype(int)
print("DP Values:\n", DP)
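Alternatively, the lists can be avoided at the source by indexing instead of slicing inside `extract_DP`. Below is a minimal sketch. It assumes, as the question's code does, that DP is the third FORMAT field, and it uses two hypothetical sample columns with made-up values. It applies `Series.map` per column in place of `applymap`, which pandas 2.1 deprecated in favour of `DataFrame.map`.

```python
import pandas as pd

def extract_DP(info: str) -> int:
    # Index (not slice) the third FORMAT field, assumed to be DP, and cast to int.
    return int(info.split(":")[2])

# Hypothetical data: two variants, two sample columns, FORMAT GT:AD:DP:GQ:PL.
samples = pd.DataFrame({
    "NEN_001": ["0/1:12,8:20:99:200,0,300", "0/0:30,0:30:90:0,90,900"],
    "LAB_004": ["1/1:0,15:15:45:450,45,0", "0/1:5,5:10:60:100,0,100"],
})

# Apply extract_DP cell by cell; Series.map per column works in old and new pandas.
DP = samples.apply(lambda col: col.map(extract_DP))

row_sums = DP.sum(axis=1)    # DP summed across samples for each variant
total = int(row_sums.sum())  # grand total over all variants

print(row_sums.tolist(), total)
```

With the question's data, the same call would be `tst1pop1[cols_to_apply].apply(lambda col: col.map(extract_DP))`.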
