SQL查询多次返回结果



我对SQL很陌生,我试图在SQL中加入一些表。我使用SQLite3Pandas,并具有以下表结构:

User
|
Measurement - Environment - meas_device - Device
|                          | 
Data                 Unit_of_Measurement

为什么我得到以下sql查询多次(4x)的结果?

query = """
SELECT User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data 
FROM Measurement
JOIN Data ON Measurement.id = Data.measurement_id
JOIN User ON Measurement.user_id = user.id
JOIN Environment ON Measurement.Environment_id = Environment.id
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE User.name = 'nicola'
"""
pd.read_sql_query(query, conn)

我猜我在连接上做错了什么,但我看不出是什么。我希望能够在某个地方保存一个JOIN语句,用于每个可能的查询,这就是为什么要为这个查询连接更多的表。

我认为问题出在环境表上。每当我连接这个表,结果就会相乘。由于环境是meas_devices的集合,因此有多个具有相同环境id的条目。(我可以将Environment表与相应的meas_device_id保存为列表,但是我认为不可能将Environment表与meas_device表链接起来。)

id | meas_device_id
1  |      1
1  |      2
1  |      5
2  |      3
2  |      4

到目前为止,我用pandas DataFrame.to_sql()创建了表,因此id没有标记为主键或类似的东西。这就是我的问题的原因吗?

更新2我找到了问题所在。我不认为这对将来的人有什么帮助。但为了完整起见,这里给出解释。这其实不是一个如何连接表格的问题,但我忽略了一个关键的链接。因为环境有多个索引,它创建了相同的值"开放结束"这导致了结果的倍增。我需要在环境之间添加交叉检查。subset_id和Data.subset_id。下面的查询可以正常工作:

query = f""" SELECT {SELECT}
FROM Data
JOIN Measurement ON Data.measurement_id = Measurement.id
JOIN User ON Measurement.user_id = User.id
JOIN Environment ON Measurement.Environment_id = Environment.id
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.Device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE {WHERE} AND Environment.subset_id = Data.subset_id
"""

如果需要对在连接结果中产生额外行的表进行过滤,则不要连接它们,而是将它们包含在WHERE子句中的子查询中。

SELECT User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data
FROM
Measurement
JOIN Data ON Measurement.id = Data.measurement_id
JOIN User ON Measurement.user_id = user.id
WHERE
Measurement.Environment_id IN (
SELECT Environment.id
FROM
Environment
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE Device.name = 'xy'
)

在此子查询中,您可以连接多个表而无需生成额外的记录。

如果因为想要从其他表中选择条目而没有这个选项,您可以简单地在原始查询中添加一个DISTINCT。

SELECT DISTINCT
User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data 
FROM
Measurement
JOIN Data ON Measurement.id = Data.measurement_id
JOIN User ON Measurement.user_id = user.id
JOIN Environment ON Measurement.Environment_id = Environment.id
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE
User.name = 'nicola'

最新更新