Looping over three lists to create a combined output



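I'm working on a MapReduce project. My input is (day, station, temperature), and my goal is to output each station's maximum and minimum temperature for each day. So for this input, I'd expect output that looks like this:

Input: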

20200101, station1, 35
20200101, station1, 44
20200101, station1, 77
20200101, station3, 66
20200101, station3, 99
20200102, station1, 54
20200102, station2, 55

Output:

20200101, station1, max(77) min(35)
20200101, station3, max(99) min(66)
20200102, station1, max(54) min(..)
20200102, station2, max(55) min(..)

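What I've tried so far only works with two lists, not three: for each day, find every weather station, and for each weather station, find every temperature...

Here is the code I've tried so far: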

# Read the text file in
file1 = open('bigdatatemp.txt', 'r')
Lines = file1.readlines()
Lines output (the important variables are Wban Number = station, YearMonthDay = day, Dry Bulb Temp = temperature):
['Wban Number, YearMonthDay, Time, Station Type, Maintenance Indicator, Sky Conditions, Visibility, Weather Type, Dry Bulb Temp, Dew Point Temp, Wet Bulb Temp, % Relative Humidity, Wind Speed (kt), Wind Direction, Wind Char. Gusts (kt), Val for Wind Char., Station Pressure, Pressure Tendency, Sea Level Pressure, Record Type, Precip. Total\n',
'03011,20070401,0050,AO2 ,-,SCT055                                       ,10SM   ,-,32,23,28,69  , 4   ,130,-,0  ,30.13,-,-,AA,-\n',
'03011,20070401,0150,AO2 ,-,BKN055                                       ,10SM   ,-,32,23,28,69  , 4   ,140,-,0  ,30.12,-,-,AA,-\n',
'03011,20070401,0250,AO2 ,-,OVC050                                       ,10SM   ,-,32,23,28,69  , 3   ,130,-,0  ,30.12,-,-,AA,-\n',
'03011,20070401,0350,AO2 ,-,OVC050                                       ,10SM   ,-,34,23,30,64  , 3   ,120,-,0  ,30.12,-,-,AA,-\n',
'03011,20070401,0450,AO2 ,-,BKN050                                       ,10SM   ,-,34,23,30,64  , 4   ,130,-,0  ,30.11,-,-,AA,-\n',
'03011,20070401,0550,AO2 ,-,SCT050 SCT070                                ,10SM   ,-,32,25,28,75  , 3   ,150,-,0  ,30.10,-,-,AA,-\n',
'03011,20070401,0650,AO2 ,-,SCT070                                       ,10SM   ,-,34,25,30,70  , 3   ,130,-,0  ,30.12,-,-,AA,-\n',
'03012,20070401,0750,AO2 ,-,CLR                                          ,10SM   ,-,37,27,34,67  , 4   ,140,-,0  ,30.12,-,-,AA,-\n',
'03011,20070401,0850,AO2 ,-,SCT060 BKN075                                ,10SM   ,-,41,27,36,58  , 0   ,000,-,0  ,30.13,-,-,AA,-\n',
'03011,20070401,0950,AO2 ,-,SCT060 OVC075                                ,10SM   ,-,45,23,37,42  , 0   ,000,-,0  ,30.14,-,-,AA,-\n',

Then I created a dictionary and three lists holding the variables I need (station, year, temperature):

# Create a dictionary
# Iterate each line
# If the key doesn't exist, create one equal to empty list
# Otherwise, append temperature to list
# This also uses an interim dictionary (tmp).
years = []
stations = []
temps = []
for line in Lines:
    (station, year, ac, ad, af, ag, ah, aj, temp, al, ae, ar, at, ay, au, ai, alc, ap, ax, av, an) = line.split(',')
    stations.append(station)
    years.append(year)
    temps.append(temp)

Last but not least, here is where I'm stuck. I built a loop over two of the lists and iterated over them:

dayTemps = {d:[] for d in stations}
for d,t in zip(stations,temps): dayTemps[d].append(t)
print(dayTemps)
output:
{'Wban Number': [' Dry Bulb Temp'], '03011': ['32', '32', '32', '34', '34', '32', '34', '41', '45', '55', '54', '54', '52', '46', '43', '43', '43'], '03012': ['37', '46', '54', '46', '45', '43'], '03013': ['50', '52', '50', '46', '45'], '03014': ['45']}

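But I actually need the day variable as well, and I can't work out how to include it. Should it be a dictionary keyed by day whose values are the dictionary above? Also, how would I structure it so that I get the maximum and minimum temperature for each weather station? Is that one step, or two or more?

Something like the below: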

data = {}
MIN = 0
MAX = 1
DATE = 0
STATION = 1
VALUE = 2
with open('in.txt') as f:
    lines = [line.strip() for line in f.readlines()]
    for line in lines:
        fields = [f.strip() for f in line.split(',')]
        if data.get(fields[DATE]) is None:
            data[fields[DATE]] = {}
        if fields[STATION] not in data[fields[DATE]]:
            data[fields[DATE]][fields[STATION]] = [None, None]
        if data[fields[DATE]][fields[STATION]][MIN] is None:
            data[fields[DATE]][fields[STATION]][MIN] = (int(fields[VALUE]))
        else:
            if data[fields[DATE]][fields[STATION]][MIN] > int(fields[VALUE]):
                data[fields[DATE]][fields[STATION]][MIN] = (int(fields[VALUE]))
        if data[fields[DATE]][fields[STATION]][MAX] is None:
            data[fields[DATE]][fields[STATION]][MAX] = (int(fields[VALUE]))
        else:
            if data[fields[DATE]][fields[STATION]][MAX] < int(fields[VALUE]):
                data[fields[DATE]][fields[STATION]][MAX] = (int(fields[VALUE]))

for date, stations in data.items():
    for station, values in stations.items():
        print(f'{date} {station} {values}')

in.txt

20200101, station1, 35
20200101, station1, 44
20200101, station1, 77
20200101, station3, 66
20200101, station3, 99
20200102, station1, 54
20200102, station2, 55

Output

20200101 station1 [35, 77]
20200101 station3 [66, 99]
20200102 station1 [54, 54]
20200102 station2 [55, 55]
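
A more compact variant of the same idea, offered as a minimal sketch rather than a definitive implementation: instead of nesting one dictionary per day, it keys a single dictionary on the `(date, station)` pair and updates a `[min, max]` pair as each line is read. The function name `min_max_by_day_station` and the `sample` list are hypothetical names introduced here for illustration. The sketch assumes the three-field `date, station, temperature` layout of `in.txt` and integer temperatures.

```python
from collections import defaultdict


def min_max_by_day_station(lines):
    """Return {(date, station): [min_temp, max_temp]} for 'date, station, temp' lines."""
    result = defaultdict(lambda: [None, None])
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Only the first three fields are used, so a stray trailing comma is harmless.
        date, station, value = [field.strip() for field in line.split(',')[:3]]
        temp = int(value)
        low_high = result[(date, station)]
        low_high[0] = temp if low_high[0] is None else min(low_high[0], temp)
        low_high[1] = temp if low_high[1] is None else max(low_high[1], temp)
    return dict(result)


# Hypothetical in-memory copy of the in.txt sample shown above.
sample = [
    '20200101, station1, 35',
    '20200101, station1, 44',
    '20200101, station1, 77',
    '20200101, station3, 66',
    '20200101, station3, 99',
    '20200102, station1, 54',
    '20200102, station2, 55',
]

for (date, station), (low, high) in sorted(min_max_by_day_station(sample).items()):
    print(f'{date}, {station}, max({high}) min({low})')
```

Using a tuple key keeps the lookup to a single level, and a station with only one reading on a given day simply reports the same value as both maximum and minimum.
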
lines = ['Wban Number, YearMonthDay, Time, Station Type, Maintenance Indicator, Sky Conditions, Visibility, Weather Type, Dry Bulb Temp, Dew Point Temp, Wet Bulb Temp, % Relative Humidity, Wind Speed (kt), Wind Direction, Wind Char. Gusts (kt), Val for Wind Char., Station Pressure, Pressure Tendency, Sea Level Pressure, Record Type, Precip. Total\n',
'03011,20070401,0050,AO2 ,-,SCT055                                       ,10SM   ,-,32,23,28,69  , 4   ,130,-,0  ,30.13,-,-,AA,-\n',
'03011,20070401,0150,AO2 ,-,BKN055                                       ,10SM   ,-,32,23,28,69  , 4   ,140,-,0  ,30.12,-,-,AA,-\n',
'03011,20070401,0250,AO2 ,-,OVC050                                       ,10SM   ,-,32,23,28,69  , 3   ,130,-,0  ,30.12,-,-,AA,-\n',
'03011,20070401,0350,AO2 ,-,OVC050                                       ,10SM   ,-,34,23,30,64  , 3   ,120,-,0  ,30.12,-,-,AA,-\n',
'03011,20070401,0450,AO2 ,-,BKN050                                       ,10SM   ,-,34,23,30,64  , 4   ,130,-,0  ,30.11,-,-,AA,-\n',
'03011,20070401,0550,AO2 ,-,SCT050 SCT070                                ,10SM   ,-,32,25,28,75  , 3   ,150,-,0  ,30.10,-,-,AA,-\n',
'03011,20070401,0650,AO2 ,-,SCT070                                       ,10SM   ,-,34,25,30,70  , 3   ,130,-,0  ,30.12,-,-,AA,-\n',
'03012,20070401,0750,AO2 ,-,CLR                                          ,10SM   ,-,37,27,34,67  , 4   ,140,-,0  ,30.12,-,-,AA,-\n',
'03011,20070401,0850,AO2 ,-,SCT060 BKN075                                ,10SM   ,-,41,27,36,58  , 0   ,000,-,0  ,30.13,-,-,AA,-\n',
'03011,20070401,0950,AO2 ,-,SCT060 OVC075                                ,10SM   ,-,45,23,37,42  , 0   ,000,-,0  ,30.14,-,-,AA,-\n',]

# Keep only station, date and dry-bulb temperature (columns 0, 1 and 8); skip the header.
lst = [[f.strip() for f in i.split(',')] for i in lines[1:]]
# Convert the temperature to int so that max/min compare numbers, not strings.
lst = [[f[0], f[1], int(f[8])] for f in lst]
# Group by (date, station) so that each day is reported separately.
keys = sorted(set((i[1], i[0]) for i in lst))
for date, station in keys:
    temps = [i[2] for i in lst if (i[1], i[0]) == (date, station)]
    print(date, station, ' max(', max(temps), ')', ' min(', min(temps), ')')

Output:
20070401 03011  max( 45 )  min( 32 )
20070401 03012  max( 37 )  min( 37 )

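Create a list of sublists.

Then create another list of sublists, one for each distinct station number.

Then iterate over each sublist to get the maximum and minimum.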
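
The three steps above map naturally onto a map/group/reduce shape, which may be a useful way to think about it for a MapReduce project. The following is a minimal sketch under stated assumptions, not the original answer's code: `map_rows`, `reduce_groups`, and `rows` are hypothetical names; the rows are shortened, made-up variants of the sample data (the last one has an invented `-` reading on a second day to show how non-numeric values are skipped); and the grouping key is `(day, station)` rather than station alone, so that each day is reported separately.

```python
import csv
from itertools import groupby


def map_rows(lines):
    """Step 1: turn each CSV line into a (day, station, temp) record."""
    for row in csv.reader(lines):
        if len(row) < 9:
            continue  # too short to contain the Dry Bulb Temp column
        station, day, temp = row[0].strip(), row[1].strip(), row[8].strip()
        try:
            yield (day, station, int(temp))
        except ValueError:
            continue  # header line or a non-numeric reading


def reduce_groups(records):
    """Steps 2 and 3: group by (day, station), then take max and min per group."""
    def key(record):
        return (record[0], record[1])

    # groupby only merges adjacent items, so the records are sorted by key first.
    for (day, station), group in groupby(sorted(records, key=key), key=key):
        temps = [record[2] for record in group]
        yield (day, station, max(temps), min(temps))


# Hypothetical, shortened rows: only the first nine columns are kept.
rows = [
    'Wban Number, YearMonthDay, Time, Station Type, Maintenance Indicator, Sky Conditions, Visibility, Weather Type, Dry Bulb Temp',
    '03011,20070401,0050,AO2 ,-,SCT055,10SM   ,-,32',
    '03011,20070401,0850,AO2 ,-,SCT060 BKN075,10SM   ,-,41',
    '03012,20070401,0750,AO2 ,-,CLR,10SM   ,-,37',
    '03011,20070402,0050,AO2 ,-,CLR,10SM   ,-,-',
]

results = list(reduce_groups(map_rows(rows)))
for day, station, high, low in results:
    print(f'{day}, {station}, max({high}) min({low})')
```

Sorting before `groupby` plays the role of the shuffle phase: all records with the same key end up next to each other, so each group can be reduced independently.
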
