我有一个包含所有列的主表
student id name subject marks class
123 John Maths 56 7
234 Meth Maths 78 8
456 Jes Science 67 7
678 Tom Science 89 8
879 Ria Maths 92 7
890 Carl Science 88 8
654 Gerd Science 77 8
从上表中,我想创建两个表,其中student_id作为主键,在第一个表中,我们想提取student_id,class=8的所有学生的名字,在第二个表中我想输入表1中学生的student_id、名字、主题和分数。
表1
student id name
234 Meth
678 Tom
890 Carl
654 Gerd
表2
student id name subject marks
234 Meth Maths 78
678 Tom Science 89
890 Carl Science 88
654 Gerd Science 77
如何在Teradata SQL或Python中做到这一点?以上数据只是一个例子。我正在同时处理大量的数据。所以我想找到最好的方法。
谢谢:(
create table students as
(select unique student_id, "name" from main_table where "class"=8)
with data primary index (student_id);
create table marks as
(select s.student_id, s."name", subject, marks
from main_table mt, students s
where s.student_id=mt.student_id) with data;
class和name都是Teradata中的保留字,这就是双引号的原因。
p.s.不管怎样,Meth是一个什么样的学生名字?这是化学课还是数学课?
SQL:
CREATE MULTISET TABLE table1 AS (
SELECT
student_id
,name
FROM maintable
WHERE class = 8
)WITH DATA PRIMARY INDEX(student_id)
CREATE MULTISET TABLE table2 AS (
SELECT
student_id
,name
,subject
,marks
FROM maintable
WHERE student_id IN (SELECT student_id FROM table1)
)WITH DATA PRIMARY INDEX(student_id)
Python:
data = {'student_id': [123,234,456,678,879,890,654],
'name':['John', 'Meth', 'Jes', 'tom', 'Ria', 'Carl', 'Gerd'],
'subject':['Maths', 'Maths', 'Science', 'Science', 'Maths', 'Science','Science'],
'marks':[56,78,67,89,92,88,77],
'class':[7,8,7,8,7,8,8]
}
maintable = pd.DataFrame(data=data)
table1 = maintable[['student_id', 'name']].where(maintable['class'] == 8).dropna().astype({"student_id": int})
student_id name
1 234 Meth
3 678 tom
5 890 Carl
6 654 Gerd
student_ids = table1['student_id']
table2 = maintable.loc[:, 'student_id':'marks'].where(maintable['student_id'].isin(student_ids)).dropna().astype({"student_id": int, 'marks':int})
student_id name subject marks
1 234 Meth Maths 78
3 678 tom Science 89
5 890 Carl Science 88
6 654 Gerd Science 77