Extracting data for a user-specified column from a CSV file (without pandas)



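I need help with code that takes a user-entered column name from a large CSV file I have. After typing the column they want, the user must also enter an integer. That integer sets how many of the lowest-count results for that column are shown. For example, if they enter hospital_name and 5, it shows 5 different hospitals (the column contains at least 50 different hospital names) that have the lowest counts. Here is a sample input and output: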

Type the column you want: hospital_name
Enter how many lowest results you want: 3

The output might look like this:

400 births are tied to Gains Hospital                                                                            
347 births are tied to Petri Hospital 
200 births are tied to Brit Hospital 

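The whole CSV is a report on births, so you have to count how many times each item appears in the chosen column and report the counts (mainly the lowest ones).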

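I have already read my CSV file using "with".

I am having trouble writing the loop that ties all of this together. I know the user input itself will be input() and int(input()), but that does not connect me back to the CSV file.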

Code

import csv
column_name = input('Which column: ').upper()
number_lowest = int(input('How many lowest: '))
# Calculate births by specified column name
with open("data.csv", "r") as f:
    reader = csv.DictReader(f, skipinitialspace=True, delimiter=",")
    births_count = {}
    for d in reader:
        # Use column_name as key
        # accumulate births for this key
        if d[column_name] not in births_count:
            births_count[d[column_name]] = 0
        births_count[d[column_name]] += 1  # since each row is a different birth
# Find number_lowest lowest births
lowest_births = {}
for i in range(number_lowest):
    # By looping number_lowest times,
    # we find this many lowest values
    if len(births_count) > 0:
        # find lowest births
        lowest_val = 1e37  # just use a large number
                           # that we know actual
                           # count will be less than
        lowest_name = ""
        for name, value in births_count.items():
            if value < lowest_val:
                lowest_val = value
                lowest_name = name
        # Add to lowest births
        lowest_births[lowest_name] = lowest_val
        # remove from births_count
        # this reduces count of items in dictionary
        del births_count[lowest_name]
    else:
        break  # births_count is empty
# Output results
for name, births in lowest_births.items():
    print(f"{births} births are tied to {name} {column_name.title()}")
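For comparison, the counting and the "n lowest" selection can probably be written more compactly with the standard library. This is only a sketch, not the method used in the code above: it swaps in collections.Counter for the counting and a stable sorted() call for the selection. The helper name lowest_counts is made up for illustration, and a few rows are inlined through io.StringIO so it runs without data.csv.

```python
import csv
import io
from collections import Counter


def lowest_counts(rows, column_name, n):
    """Return up to n (value, count) pairs with the smallest counts.

    Hypothetical helper: rows is any iterable of dicts, e.g. a csv.DictReader.
    """
    # Each row is one birth, so counting values counts births per value
    counts = Counter(row[column_name] for row in rows)
    # sorted() is stable, so values with equal counts keep first-seen order
    return sorted(counts.items(), key=lambda item: item[1])[:n]


# Inlined sample rows (an assumption, used instead of opening data.csv)
sample = io.StringIO(
    "HOSPITAL_NAME,BIRTH_DAY, BIRTH_YEAR, BIRTH_WEIGHT\n"
    "Gains,1/14,2015,8.5 lbs\n"
    "Mayo Clinic,2/11,2018,6.5 lbs\n"
    "Gains,1/15,2016,8.9 lbs\n"
    "Petri,11/21,2003,7.5 lbs\n"
)
reader = csv.DictReader(sample, skipinitialspace=True)
result = lowest_counts(reader, "HOSPITAL_NAME", 2)
for name, births in result:
    print(f"{births} births are tied to {name}")
```

Sorting the whole dictionary costs more than the repeated minimum search when only a few results are wanted from a very large number of distinct values, but for a column with a few dozen hospital names the difference should not matter.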

Test

The test file is comma-separated CSV data with four columns: hospital name, birth day, birth year, and birth weight.

File: data.csv
HOSPITAL_NAME,BIRTH_DAY, BIRTH_YEAR, BIRTH_WEIGHT
Gains,1/14,2015,8.5 lbs
Mayo Clinic,2/11,2018,6.5 lbs
Gains,1/15,2016,8.9 lbs
Stanford Health Care,2/15,2016,7.4 lbs
Mayo Clinic,11/10,2018,7.3 lbs
Gains,1/09,2011,7.5 lbs
John Hopkins,12/23,2012,6.9 lbs
Massachusetts General,9/14,2001,8.3 lbs
Stanford Health Care,8/17,2005,7.6 lbs
Massachusetts General,7/18,2016,8.7 lbs
John Hopkins,3/11,2017,7.2 lbs
Massachusetts General,4/16,2014,7.4 lbs
Northwestern Memorial,10/12,2012,8.3 lbs
UCLA Medical Center,9/19,2011,8.1 lbs
Petri,11/21,2003,7.5 lbs
UCSF Medical Center,2/15,2004,7.9 lbs
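Note that the header row of this file has a space after some of the commas. That appears to be why the reader is created with skipinitialspace=True. The sketch below, which inlines the header and one row through io.StringIO as an assumption so it runs on its own, shows how the field names differ with and without that option.

```python
import csv
import io

# Header and first data row copied from the test file above
header_and_row = (
    "HOSPITAL_NAME,BIRTH_DAY, BIRTH_YEAR, BIRTH_WEIGHT\n"
    "Gains,1/14,2015,8.5 lbs\n"
)

# Default dialect: a space after the delimiter stays in the field
plain = csv.DictReader(io.StringIO(header_and_row))
# skipinitialspace=True: spaces right after the delimiter are ignored
stripped = csv.DictReader(io.StringIO(header_and_row), skipinitialspace=True)

plain_fields = plain.fieldnames
stripped_fields = stripped.fieldnames
print(plain_fields)     # includes ' BIRTH_YEAR' with a leading space
print(stripped_fields)  # includes 'BIRTH_YEAR' without it
```

Without the option, a user who types birth_year would get a KeyError, because the actual key would be ' BIRTH_YEAR'.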

Sample run:

Which column: hospital_name
How many lowest: 5
HOSPITAL_NAME
1 births are tied to Northwestern Memorial Hospital_Name
1 births are tied to UCLA Medical Center Hospital_Name
1 births are tied to Petri Hospital_Name
1 births are tied to UCSF Medical Center Hospital_Name
2 births are tied to Mayo Clinic Hospital_Name
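In this run the user typed the column in lower case and the code upper-cased it to match the header. That only works when every header in the file is upper case, and a mistyped name raises a KeyError inside the loop. One possible way to guard against both is sketched below, assuming a hypothetical helper resolve_column that matches the typed text against the reader's fieldnames without regard to case. The header line is inlined here so the example runs without data.csv.

```python
import csv
import io


def resolve_column(fieldnames, user_text):
    """Return the header that matches user_text ignoring case, or None.

    Hypothetical helper, not part of the code shown earlier.
    """
    wanted = user_text.strip().upper()
    for field in fieldnames:
        if field.upper() == wanted:
            return field
    return None


# Header copied from the test file (inlined as an assumption)
sample = io.StringIO("HOSPITAL_NAME,BIRTH_DAY, BIRTH_YEAR, BIRTH_WEIGHT\n")
reader = csv.DictReader(sample, skipinitialspace=True)

found = resolve_column(reader.fieldnames, "hospital_name")
missing = resolve_column(reader.fieldnames, "city")
print(found)    # the real header name, usable as a key into each row
print(missing)  # None, so the program can ask the user again
```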

Update: finding the maximum values using insertion sort

import csv
# Source: https://www.geeksforgeeks.org/python-program-for-insertion-sort/
def insertionSort(arr):
    """In-place insertion sort."""
    # Traverse through 1 to len(arr)
    for i in range(1, len(arr)):
        key = arr[i]
        # Move elements of arr[0..i-1], that are
        # greater than key, to one position ahead
        # of their current position
        j = i - 1
        while j >= 0 and key < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
def find_maxs_by_sort(data, number):
    """Finds the `number` largest counts in data
    (a dictionary mapping key -> count).
    """
    # Get list of key, value pairs as tuples of (value, key)
    tuple_list = []
    for k, v in data.items():
        tuple_list.append((v, k))
    # Sort will be in ascending order
    # Does an in-place sort
    # insertionSort also works on a list of tuples
    # Will sort by v since it's first in each tuple
    insertionSort(tuple_list)
    # Place sorted tuples back as a dictionary
    # tuples are sorted by [(v1, k1), (v2, k2), ...]
    # We start at the end and work backwards since sort is
    # in ascending order
    n = len(tuple_list)
    results = {}
    # max(...) stops the loop at index 0 when number exceeds n
    for i in range(n - 1, max(n - number, 0) - 1, -1):
        v, k = tuple_list[i]
        results[k] = v
    return results
for i in range(3):
    # To do this 3 times
    column_name = input('Which column: ').upper()
    number = int(input('How many maxs: '))
    with open("data.csv", "r") as f:
        reader = csv.DictReader(f, skipinitialspace=True, delimiter=",")
        births_count = {}
        for d in reader:
            # Use column_name as key
            # accumulate births for this key
            if d[column_name] not in births_count:
                births_count[d[column_name]] = 0
            births_count[d[column_name]] += 1  # since each row is a different birth
    # find max
    max_births = find_maxs_by_sort(births_count, number)
    # Output results
    for name, births in max_births.items():
        print(f"\t{births} births are tied to {name} {column_name.title()}")
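If writing the sort by hand is not a requirement, the largest counts can likely be obtained with collections.Counter.most_common instead. The sketch below is an alternative to the insertion sort above, not a restatement of it. The list of hospital names is inlined as an assumption in place of reading data.csv.

```python
from collections import Counter

# One entry per birth; inlined sample values instead of reading data.csv
hospital_per_birth = [
    "Gains",
    "Mayo Clinic",
    "Gains",
    "Massachusetts General",
    "Gains",
    "Massachusetts General",
    "Petri",
]

births_count = Counter(hospital_per_birth)
# most_common(n) returns the n highest counts, largest first
top_two = births_count.most_common(2)
for name, births in top_two:
    print(f"\t{births} births are tied to {name}")
```

One difference to keep in mind: the insertion sort version orders equal counts by name, because it compares (count, name) tuples, while most_common keeps equal counts in the order they were first seen.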
