从 T SQL 中的 varchar 列生成唯一 ID



我有两个表要连接。这两个表都包含一些我用来连接它们的 varchar 列。但是,运行查询以使用 varchar 列进行计算以便连接是一个缓慢的过程。因此,我想将这些 varchar 列转换为唯一的整数 id,以便比较更快。

SELECT /*do calculations*/
FROM   [dbo].[messages]  m WITH (NOLOCK)
JOIN   [dbo].[jointable] j ON j.address = m.orig OR j.address = m.recip

地址、orig 和 recip 是包含字符串的列,最好有 id 以提高性能。我意识到该部分ON j.address = m.orig OR j.address = m.recip会降低性能。

我要联接的表具有以下结构:

CREATE TABLE [dbo].[jointable](
    [displayname] [nvarchar](256) NULL,
    [alias] [nvarchar](129) NULL,
    [firstname] [nvarchar](129) NULL,
    [lastname] [nvarchar](129) NULL,
    [address] [nvarchar](256) NULL,
    [company] [nvarchar](129) NULL,
    [department] [nvarchar](129) NULL,
    [office] [nvarchar](129) NULL) 

CREATE TABLE [dbo].[messages](
    [messageid] [bigint] NOT NULL,
    [message] [varchar](150) NULL,
    [orig] [nvarchar](256) NULL,
    [recip] [nvarchar](256) NULL)

我该怎么做?有什么函数可以将 id 从 varchar 转换为整数吗?提前谢谢你。

  1. 规范化表中的数据
  2. 将 int 主键添加到第一个表
  3. 将 int 外键添加到第二个表。从第一个表中设置相应的值
  4. 通过整数键加入

您可以从 VARCHAR 生成 GUID,但我怀疑您对此感到满意(因此您需要其他答案中建议的某种映射表(。只是为了说明原则:

如果您的字符串在 16 个字节内很短且唯一,这可能对您有用:

DECLARE @tbl TABLE(SomeString VARCHAR(100),TheGUID UNIQUEIDENTIFIER);
--a GUID is a 16-Byte(128 bit) sized type
INSERT INTO @tbl(SomeString) VALUES
 ('test1')
,('Some short text')
,('Some very very very very long text')
,('Some very very very very long text which is the same as the other one in the first 16 bytes');
UPDATE @tbl SET TheGUID=CAST(CAST(SomeString AS VARBINARY(16)) AS UNIQUEIDENTIFIER);
SELECT SomeString
      ,TheGUID
      ,CAST(CAST(TheGUID AS VARBINARY(16)) AS VARCHAR(16))
FROM @tbl;

结果(滚动到一边(

+---------------------------------------------------------------------------------------------+--------------------------------------+--------------------+
| SomeString                                                                                  | TheGUID                              | (Kein Spaltenname) |
+---------------------------------------------------------------------------------------------+--------------------------------------+--------------------+
| test1                                                                                       | 74736574-0031-0000-0000-000000000000 | test1              |
+---------------------------------------------------------------------------------------------+--------------------------------------+--------------------+
| Some short text                                                                             | 656D6F53-7320-6F68-7274-207465787400 | Some short text    |
+---------------------------------------------------------------------------------------------+--------------------------------------+--------------------+
| Some very very very very long text                                                          | 656D6F53-7620-7265-7920-766572792076 | Some very very v   |
+---------------------------------------------------------------------------------------------+--------------------------------------+--------------------+
| Some very very very very long text which is the same as the other one in the first 16 bytes | 656D6F53-7620-7265-7920-766572792076 | Some very very v   |
+---------------------------------------------------------------------------------------------+--------------------------------------+--------------------+

首先尝试在列上添加索引(即使它们是VARCHAR(。如果您仍在为性能而苦苦挣扎,则可以使用以下方法按整数值连接。

-- Create a table to link a varchar with an integer
CREATE TABLE WordIndex(
    WordID INT IDENTITY PRIMARY KEY,
    Word VARCHAR(500))
CREATE NONCLUSTERED INDEX NCI_WordIndex_Word ON WordIndex (Word)
GO
-- Load the table with all available words
INSERT INTO WordIndex (
    Word)
SELECT DISTINCT
    YourVarcharColumn
FROM
    YourTable
UNION
SELECT DISTINCT
    YourOtherVarcharColumn
FROM
    YourSecondTable
GO

-- Add the integer ID to your tables
ALTER TABLE YourTable ADD WordID INT
ALTER TABLE YourSecondTable ADD WordID INT
ALTER TABLE YourTable ADD FOREIGN KEY (WordID) REFERENCES WordIndex (WordID)
ALTER TABLE YourSecondTable ADD FOREIGN KEY (WordID) REFERENCES WordIndex (WordID)
GO
-- Optionally (but recommended) add indexes on the ID
CREATE NONCLUSTERED INDEX NCI_YourTable_WordID ON YourTable (WordID)
CREATE NONCLUSTERED INDEX NCI_YourSecondTable_WordID ON YourSecondTable (WordID)
GO

-- Update the integer ID
UPDATE T SET
    WordID = W.WordID
FROM
    YourTable AS T
    INNER JOIN WordIndex AS W ON T.Word = W.Word
UPDATE T SET
    WordID = W.WordID
FROM
    YourSecondTable AS T
    INNER JOIN WordIndex AS W ON T.Word = W.Word
GO

-- Join by integer
SELECT
    1
FROM
    YourTable AS T
    INNER JOIN YourSecondTable AS N ON T.WordID = N.WordID

使用此方法需要维护单词索引表。

需要查看表格设计才能正确回答。

如果您只是为关系添加 int 并正确分配 int,那么问题是如果您稍后更改数据,那么 int 关系是错误的。

如果索引这些列,varchar 连接并不比 int 快多少。

可能是相同的查询计划,但

SELECT /*do calculations*/
FROM   [dbo].[messages]  m 
JOIN   [dbo].[jointable] j ON j.address in (m.orig, m.recip)

最新更新