获取mysql中unicode字符的主键



我正试图编写一个查询,将分析跨多个表中存储在主键中的数据,以查找它们是否包含unicode字符。

下面是我的表的模式:

mysql> SHOW CREATE TABLE employee_plain;
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table          | Create Table                                                                                                                                                                                                                                                                |
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| employee_plain | CREATE TABLE `employee_plain` (
`emp_id` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`emp_name` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`age` int(3) DEFAULT NULL,
PRIMARY KEY (`emp_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SHOW CREATE TABLE employee_unicode;
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table            | Create Table                                                                                                                                                                                                                                                                  |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| employee_unicode | CREATE TABLE `employee_unicode` (
`emp_id` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`emp_name` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`age` int(3) DEFAULT NULL,
PRIMARY KEY (`emp_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

下面是两个表上的数据存储。表employee_unicode包含PRIMARY KEY列中的unicode值:

mysql> select * from employee_plain;
+------------------------------+----------+------+
| emp_id                       | emp_name | age  |
+------------------------------+----------+------+
| asdasd123                    | abcsd    |   12 |
| fsoiuioujvsdf4               | abvkd    |   13 |
| sdfgjshgjshdfljsfklju4532489 | sdfsdff  |   11 |
+------------------------------+----------+------+
3 rows in set (0.00 sec)
mysql> select * from employee_unicode;
+--------------------------------------------------------------+----------+------+
| emp_id                                                       | emp_name | age  |
+--------------------------------------------------------------+----------+------+
| A ΠΛΦΟΙΚ ΑΕ#1420000000000000000                              | sdfsf    |   11 |
| sdfsdfsf234                                                  | fsdfsd   |   12 |
| ΑΣΕΛ - ΑΦΟΙ. ΣΕΛΙΔΗ Α.Ε.#000000000000000                     | sdfsd    |   13 |
| ΦΩΤΗΣ#10000000000                                            | sdfsdfd  |   14 |
+--------------------------------------------------------------+----------+------+
4 rows in set (0.00 sec)

我尝试了各种查询使用ASCII,二进制和REGEX:

mysql> SELECT
->   TABLE_NAME,
->   COLUMN_NAME,
->   COLUMN_TYPE,
->   IF( COLUMN_NAME REGEXP '[^x00-x7F]', 'Contains Unicode', 'No Unicode') AS Unicode_validation
-> FROM
->   information_schema.columns
-> WHERE
->   table_schema = 'amv_testdb' AND
->   COLUMN_KEY = 'PRI'
-> ORDER BY
->   TABLE_NAME,
->   ORDINAL_POSITION;
+------------------+-------------+--------------+--------------------+
| TABLE_NAME       | COLUMN_NAME | COLUMN_TYPE  | Unicode_validation |
+------------------+-------------+--------------+--------------------+
| employee_plain   | emp_id      | varchar(100) | No Unicode         |
| employee_unicode | emp_id      | varchar(100) | No Unicode         |
+------------------+-------------+--------------+--------------------+
2 rows in set (0.00 sec)
mysql> SELECT
->   TABLE_NAME,
->   COLUMN_NAME,
->   COLUMN_TYPE,
->   IF( COLUMN_NAME <> CONVERT( COLUMN_NAME USING ASCII), 'No Unicode', 'Contains Unicode') AS Unicode_validation
-> FROM
->   information_schema.columns
-> WHERE
->   table_schema = 'amv_testdb' AND
->   COLUMN_KEY = 'PRI'
-> ORDER BY
->   TABLE_NAME,
->   ORDINAL_POSITION;
+------------------+-------------+--------------+--------------------+
| TABLE_NAME       | COLUMN_NAME | COLUMN_TYPE  | Unicode_validation |
+------------------+-------------+--------------+--------------------+
| employee_plain   | emp_id      | varchar(100) | Contains Unicode   |
| employee_unicode | emp_id      | varchar(100) | Contains Unicode   |
+------------------+-------------+--------------+--------------------+
2 rows in set (0.00 sec)
mysql> SELECT
->   TABLE_NAME,
->   COLUMN_NAME,
->   COLUMN_TYPE,
->   IF(CONVERT(COLUMN_NAME USING BINARY) <> COLUMN_NAME, 'Contains Unicode', 'No Unicode') AS Unicode_validation
-> FROM
->   information_schema.columns
-> WHERE
->   table_schema = 'amv_testdb' AND
->   COLUMN_KEY = 'PRI'
-> ORDER BY
->   TABLE_NAME,
->   ORDINAL_POSITION;
+------------------+-------------+--------------+--------------------+
| TABLE_NAME       | COLUMN_NAME | COLUMN_TYPE  | Unicode_validation |
+------------------+-------------+--------------+--------------------+
| employee_plain   | emp_id      | varchar(100) | No Unicode         |
| employee_unicode | emp_id      | varchar(100) | No Unicode         |
+------------------+-------------+--------------+--------------------+
2 rows in set (0.00 sec)

请帮助我理解为什么我得到错误的结果。

如果查找包含非ASCII字符的数据,则将存储值与使用ASCII转换的值进行比较,例如WHERE emp_id <> CONVERT(emp_id USING ASCII)

CREATE TABLE employees (
emp_id VARCHAR(255),
emp_name VARCHAR(255),
age INT
);
INSERT INTO employees (emp_id, emp_name, age)
VALUES
('A ΠΛΦΟΙΚ ΑΕ#1420000000000000000', 'sdfsf', 11),
('sdfsdfsf234', 'fsdfsd', 12),
('ΑΣΕΛ - ΑΦΟΙ. ΣΕΛΙΔΗ Α.Ε.#000000000000000', 'sdfsd', 13),
('ΦΩΤΗΣ#10000000000', 'sdfsdfd', 14);

SELECT *
FROM employees
WHERE emp_id <> CONVERT(emp_id USING ASCII)
emp_id一个ΠΛΦΟΙΚΑΕ# 1420000000000000000ΑΣΕΛ - ΑΦΟΙ。ΣΕΛΙΔΗΑΕ。#000000000000000ΦΩΤΗΣ# 10000000000
WHERE HEX(emp_id) REGEXP '^(..)*[CDEFcdef]'

将发现任何至少有一个UTF-8字符的emp_id。(即CHARACTER SET utf8utf8mb4)

最新更新