Python/Teradata SQL:从一个表中提取数据,然后进行一些分析,将它们放入一个表,然后将相关数据放入另一个



我有一个包含所有列的主表

student id  name    subject    marks    class
123         John    Maths      56       7
234         Meth    Maths      78       8
456         Jes     Science    67       7
678         Tom     Science    89       8
879         Ria     Maths      92       7
890         Carl    Science    88       8
654         Gerd    Science    77       8

从上表中,我想创建两个表,其中student_id作为主键,在第一个表中,我们想提取student_id,class=8的所有学生的名字,在第二个表中我想输入表1中学生的student_id、名字、主题和分数。

表1

student id  name        
234         Meth        
678         Tom         
890         Carl        
654         Gerd       

表2

student id  name    subject    marks    
234         Meth    Maths      78       
678         Tom     Science    89       
890         Carl    Science    88       
654         Gerd    Science    77      

如何在Teradata SQL或Python中做到这一点?以上数据只是一个例子。我正在同时处理大量的数据。所以我想找到最好的方法。

谢谢:(

create table students as 
(select unique student_id, "name" from main_table where "class"=8) 
with data primary index (student_id);
create table marks as 
(select s.student_id, s."name", subject, marks 
from main_table mt, students s 
where s.student_id=mt.student_id) with data; 

class和name都是Teradata中的保留字,这就是双引号的原因。

p.s.不管怎样,Meth是一个什么样的学生名字?这是化学课还是数学课?

SQL:

CREATE MULTISET TABLE table1 AS (
SELECT
student_id
,name
FROM maintable
WHERE class = 8
)WITH DATA PRIMARY INDEX(student_id)
CREATE MULTISET TABLE table2 AS (
SELECT
student_id
,name
,subject
,marks
FROM maintable
WHERE student_id IN (SELECT student_id FROM table1)
)WITH DATA PRIMARY INDEX(student_id)

Python:

data = {'student_id': [123,234,456,678,879,890,654], 
'name':['John', 'Meth', 'Jes', 'tom', 'Ria', 'Carl', 'Gerd'],
'subject':['Maths', 'Maths', 'Science', 'Science', 'Maths', 'Science','Science'],
'marks':[56,78,67,89,92,88,77],
'class':[7,8,7,8,7,8,8]
}
maintable = pd.DataFrame(data=data)

table1 = maintable[['student_id', 'name']].where(maintable['class'] == 8).dropna().astype({"student_id": int})

student_id  name
1   234         Meth
3   678         tom
5   890         Carl
6   654         Gerd

student_ids = table1['student_id']
table2 = maintable.loc[:, 'student_id':'marks'].where(maintable['student_id'].isin(student_ids)).dropna().astype({"student_id": int, 'marks':int})
student_id  name    subject  marks
1   234         Meth    Maths    78
3   678         tom     Science  89
5   890         Carl    Science  88
6   654         Gerd    Science  77

最新更新