Trouble matching strings with Python and CSV



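To summarize, I am trying to write a Python script that reads the symptoms from a medical task given on the terminal, compares them with the symptoms in dataset.csv, and then reports which disease the patient described in the task most likely has.

My problem is that it does not seem to read dataset.csv and just gives me: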

The patient is likely suffering from d.

dataset.csv looks like this:

Asthma, Wheezing, coughing, chest tightness, and shortness of breath
Atelectasis, Shortness of breath, chest pain or discomfort, and a cough
Atypical pneumonia, Fever, chills, chest pain or discomfort, and shortness of breath
Basal cell carcinoma, Flat, pale, or yellowish patch of skin
Bell's palsy, Facial droop or weakness, numbness, pain around the jaw
Biliary colic, Pain in the upper abdomen that may spread to the shoulder or back
Bladder cancer, Blood in the urine, pain or burning with urination, and frequent urination
Brain abscess, Headache, fever, confusion, drowsiness, seizures, and weakness

And my script is as follows:

#!/usr/bin/env python3
import argparse
import csv
# Parse the command line arguments
parser = argparse.ArgumentParser()
parser.add_argument('-t', '--task', help='The symptoms to search for in the dataset')
parser.add_argument('-d', '--dataset', help='The dataset to search in')
args = parser.parse_args()
# Get the task symptoms
task_symptoms = args.task.split(', ')
# Initialize a dictionary to store disease counts
disease_counts = {}
# Open the dataset
try:
    # Open the dataset
    with open(args.dataset, 'r') as csv_file:
        csv_reader = csv.reader('dataset.csv')
    # Iterate through each row
    for row in csv_reader:

        # Get the disease and symptoms
        disease = row[0].strip()
        symptoms = row[1:]

        # Initialize the count
        count = 0

        # Iterate through each symptom in the task
        for task_symptom in task_symptoms:

            # Iterate through each symptom in the dataset
            for symptom in symptoms:
                # If the symptom matches a symptom in the task
                if task_symptom == symptom:

                    # Increment the count
                    count += 1
        # Store the disease name and count in the dictionary
        disease_counts[disease] = count
    # Get the maximum count
    max_count = max(disease_counts.values())
    # Get the most probable disease from the counts
    most_probable_disease = [k for k, v in disease_counts.items() if v == max_count][0]
    print(f'The patient is likely suffering from {most_probable_disease}.')
except FileNotFoundError:
    print("Error: Could not open the file.")

What am I doing wrong?

An example of the output I expect (depending on the symptoms) is:

The patient is likely suffering from Asthma 

It has been three weeks and I still cannot figure it out.

Thanks for your help.

I think the problem is the format of the CSV file.

Asthma, Wheezing, coughing, chest tightness, and shortness of breath

Because there is a space after every comma, this line of the CSV file produces these fields:

row[0] = "Asthma"
row[1] = " Wheezing"
row[2] = " coughing"
row[3] = " chest tightness"
row[4] = " and shortness of breath"

Notice that every field after the first one starts with a space. The string " coughing" does not match the string "coughing".
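To see this for yourself, here is a minimal sketch that parses one line of the dataset with the default csv.reader() settings. It uses io.StringIO in place of the real file, which is only an assumption made so the snippet runs without dataset.csv on disk.

```python
import csv
import io

# One line copied from dataset.csv, with a space after every comma.
line = "Asthma, Wheezing, coughing, chest tightness, and shortness of breath\n"

# io.StringIO stands in for the opened file (assumption for this demo).
row = next(csv.reader(io.StringIO(line)))

for index, field in enumerate(row):
    # repr() makes the leading space in each field visible.
    print(index, repr(field))

# The exact comparison used in the script fails because of the leading space.
exact_match = "coughing" == row[2]
print(exact_match)
```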

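By default, when you read a CSV file with csv.reader(), each value is split on the comma alone. Your CSV contains extra spaces, and these end up inside the values. To check this, you could test with a CSV file that has no spaces after the commas: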

Asthma,Wheezing,coughing,chest tightness,and shortness of breath

You can pass the skipinitialspace=True parameter to csv.reader(). This ensures that no symptom starts with a space character.

For example:

csv_reader = csv.reader(csv_file, skipinitialspace=True)
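As a quick comparison, the sketch below parses the same line twice, once with the defaults and once with skipinitialspace=True. As before, io.StringIO is just a stand-in for the opened file.

```python
import csv
import io

line = "Asthma, Wheezing, coughing, chest tightness, and shortness of breath\n"

# Default behaviour: the space after each comma stays in the field.
default_row = next(csv.reader(io.StringIO(line)))

# skipinitialspace=True drops whitespace that directly follows a delimiter.
trimmed_row = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(default_row)
print(trimmed_row)
```

Note that skipinitialspace only removes whitespace right after the delimiter. Trailing whitespace inside a field would still need .strip().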

Alternatively, you can call .strip() on each symptom to make sure there is no extra whitespace:

if task_symptom == symptom.strip():

You may also want to make the comparison case-insensitive by converting both operands to lowercase:

if task_symptom.lower() == symptom.strip().lower():
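If you prefer to keep the comparison in one place, something like the following sketch could work. The helper names normalize and symptoms_match are made up for this illustration and are not part of your script.

```python
def normalize(text):
    """Trim surrounding whitespace and lowercase (hypothetical helper)."""
    return text.strip().lower()


def symptoms_match(task_symptom, symptom):
    """Compare two symptoms, ignoring case and surrounding whitespace."""
    return normalize(task_symptom) == normalize(symptom)


print(symptoms_match("Coughing", " coughing"))   # differs only in case and spacing
print(symptoms_match("fever", " coughing"))      # genuinely different symptoms
```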

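A few other issues:

  1. The call to csv.reader() should be passed the file handle csv_file rather than the file name. That is why you get d as the likely disease. If you print(row), you will see the problem.

  2. Indentation error. You need to iterate over the rows inside the with statement. Your loop is outside the with statement, so by the time it runs the file has already been closed automatically.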
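Both points can be reproduced in isolation. The sketch below first passes the string 'dataset.csv' to csv.reader() to show where the d comes from, then reads from a reader whose file has already been closed. The temporary file and the variable names are assumptions made only for this demonstration.

```python
import csv
import os
import tempfile

# Point 1: a string is iterable, so csv.reader walks it one character at a
# time and treats every character as a separate line.
char_rows = list(csv.reader('dataset.csv'))
print(char_rows[:3])

# Same bookkeeping as the script: each "row" has no symptoms, so every count
# is 0 and the first key inserted into the dictionary wins.
disease_counts = {row[0].strip(): 0 for row in char_rows}
max_count = max(disease_counts.values())
first_best = [k for k, v in disease_counts.items() if v == max_count][0]
print(first_best)

# Point 2: write a tiny sample file so the demo does not need dataset.csv.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('Asthma, Wheezing, coughing\n')
    sample_path = tmp.name

try:
    # Reading inside the with block works.
    with open(sample_path, 'r', newline='') as csv_file:
        csv_reader = csv.reader(csv_file, skipinitialspace=True)
        rows_inside = list(csv_reader)

    # Reading after the with block fails because the file is closed.
    with open(sample_path, 'r', newline='') as csv_file:
        late_reader = csv.reader(csv_file, skipinitialspace=True)
    try:
        list(late_reader)
        closed_error = None
    except ValueError as exc:
        closed_error = exc
    print(type(closed_error).__name__)
finally:
    os.remove(sample_path)
```

In the original script the closed file went unnoticed because the reader was built from the string rather than from csv_file. Once the reader uses csv_file, the loop has to move inside the with block.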

Try the following:

import argparse
import csv
# Parse the command line arguments
parser = argparse.ArgumentParser()
parser.add_argument('-t', '--task', help='The symptoms to search for in the dataset')
parser.add_argument('-d', '--dataset', help='The dataset to search in')
args = parser.parse_args()
# Get the task symptoms
task_symptoms = args.task.split(', ')
# Initialize a dictionary to store disease counts
disease_counts = {}
# Open the dataset
try:
    # Open the dataset
    with open(args.dataset, 'r') as csv_file:
        csv_reader = csv.reader(csv_file, skipinitialspace=True)
        # Iterate through each row
        for row in csv_reader:
            # Get the disease and symptoms
            disease = row[0].strip()
            symptoms = row[1:]

            # Initialize the count
            count = 0

            # Iterate through each symptom in the task
            for task_symptom in task_symptoms:

                # Iterate through each symptom in the dataset
                for symptom in symptoms:
                    # If the symptom matches a symptom in the task
                    if task_symptom.lower() == symptom.lower():

                        # Increment the count
                        count += 1
            # Store the disease name and count in the dictionary
            disease_counts[disease] = count

    print(disease_counts)
    # Get the maximum count
    max_count = max(disease_counts.values())
    # Get the most probable disease from the counts
    most_probable_disease = [k for k, v in disease_counts.items() if v == max_count][0]
    print(f'The patient is likely suffering from {most_probable_disease}.')
except FileNotFoundError:
    print("Error: Could not open the file.")
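One thing to watch for with this dataset: the last field of most rows starts with "and " (for example "and shortness of breath"), so it will not equal "shortness of breath" even after the fixes above. The following is only a sketch of one way to handle that, assuming you want to drop the leading "and". The names clean_symptom and count_matches are hypothetical, and io.StringIO stands in for the real file.

```python
import csv
import io


def clean_symptom(symptom):
    """Lowercase, trim, and drop a leading 'and ' (assumption about the data)."""
    text = symptom.strip().lower()
    if text.startswith("and "):
        text = text[len("and "):]
    return text.strip()


def count_matches(csv_file, task_symptoms):
    """Return {disease: number of task symptoms found in that disease's row}."""
    wanted = {clean_symptom(s) for s in task_symptoms}
    counts = {}
    for row in csv.reader(csv_file, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        disease = row[0].strip()
        counts[disease] = len(wanted & {clean_symptom(s) for s in row[1:]})
    return counts


sample = io.StringIO(
    "Asthma, Wheezing, coughing, chest tightness, and shortness of breath\n"
    "Atelectasis, Shortness of breath, chest pain or discomfort, and a cough\n"
)
counts = count_matches(sample, "wheezing, shortness of breath".split(", "))
best = max(counts, key=counts.get)
print(counts)
print(best)
```

This version counts with set intersection instead of nested loops, so a symptom repeated within a row is counted once. Whether that is what you want depends on your data.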
