Need a well-performing MySQL query to update and import data, and to move rows that do not qualify to another table



I have a database that contains:

  • loaddata_temp table (my main table)
  • error_data table
  • company_category table
  • company_industry table
  • company_level table
  • company_type table

Here is my main table (loaddata_temp):

company_id  company_name                         company_parent_id  company_type_id  company_category_id  company_industry_id  company_level_id  
----------  -----------------------------------  -----------------  ---------------  -------------------  -------------------  ----------------  
2  A Plus Lawn Care                                     0          Partner                   PT                  ATL       Head office  
3  A. L. Price                                          0          Partner                   CV                  ATL       Head office  
4  A.J. August Fashion Wear                             0          Partner                   UD                  ATL       Head office  
5  A+ Electronics                                       0          Partner             KOPERASI                  LAT       Head office  
6  A+ Investments                                       0         Customer               Warung                  AAA       CITY OFFICE  
7  Aaronson Furniture                                   0            OTHER                   PT                  ATL       Head office  
8  ABC Markets                                          0             Test                   CV                  ATL       Head office  

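The main table has 8 primary key columns: company_parent_id, company_category_id, etc. (every column whose name contains _id is a primary_key).

Here is my reference table (company_type):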

company_type_id  company_type_description  
---------------  ------------------------  
1  Customer                  
2  Partner                   
3  Other                     
18  Competitor   

Background: I need to import data from CSV into MySQL.

I imported thousands of rows from a CSV file by following the code at this link:

http://www.softwareprojects.com/resources/programming/t-how-to-use-mysql-fast-load-data-for-updates-1753.html

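The link above really helped me improve my queries. (Thanks, Dawn Rossi.)

Before inserting into the real table (company), I need to validate the key columns in loaddata_temp, or convert them to the id references of the other tables (company_category, company_industry, etc.).

So I wrote the code below.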

// Flag staging rows whose company_name already exists in company.
$sql_updates[]="UPDATE loaddata_temp 
INNER JOIN  company 
ON      company.company_name = loaddata_temp.company_name
SET     loaddata_temp.company_name = concat('Error Found Duplicate ',loaddata_temp.company_name)";
$sql_updates[]="UPDATE loaddata_temp 
LEFT JOIN   demography_country 
ON  demography_country.demography_country_name = loaddata_temp.demography_country_id
SET     loaddata_temp.demography_country_id = COALESCE(demography_country.demography_country_id, concat('Error ',loaddata_temp.demography_country_id,' Your Country Not In the List'))";
$sql_updates[]="UPDATE loaddata_temp 
LEFT JOIN   demography_city 
ON  demography_city.demography_city_name = loaddata_temp.demography_city_id
SET     loaddata_temp.demography_city_id = COALESCE(demography_city.demography_city_id,concat('Error ',loaddata_temp.demography_city_id,' : Your city Not In the List'))";
$sql_updates[]="UPDATE loaddata_temp 
LEFT JOIN   demography_provinces 
ON  demography_provinces.demography_province_name = loaddata_temp.demography_province_id
SET     loaddata_temp.demography_province_id = COALESCE(demography_provinces.demography_province_id, concat('Error ',loaddata_temp.demography_province_id,' : Your Province Not In the List'))";
$sql_updates[]="UPDATE loaddata_temp 
LEFT JOIN   company_type 
ON  company_type.company_type_description = loaddata_temp.company_type_id
SET     loaddata_temp.company_type_id = COALESCE(company_type.company_type_id, concat('Error ',loaddata_temp.company_type_id,' : Your company Type Not In the List'))";
$sql_updates[]="UPDATE  loaddata_temp lt
LEFT JOIN   company_category cc
ON  cc.company_category_description = lt.company_category_id
SET     lt.company_category_id = COALESCE(cc.company_category_id, concat('Error ',lt.company_category_id,' : Your Company Category Not In the List'))";
$sql_updates[]="UPDATE loaddata_temp 
LEFT JOIN   company_level 
ON  company_level.company_level_description = loaddata_temp.company_level_id
SET     loaddata_temp.company_level_id = COALESCE(company_level.company_level_id,  concat('Error ',loaddata_temp.company_level_id,' : Your Company Level Not In the List'))";
$sql_updates[]="UPDATE loaddata_temp 
LEFT JOIN   company_industry 
ON  company_industry.company_industry_short_description = loaddata_temp.company_industry_id
SET     loaddata_temp.company_industry_id = COALESCE(company_industry.company_industry_id, concat('Error ',loaddata_temp.company_industry_id,' : Your Company Industry Not In the List'))";

With the code above, the result is:

company_id  company_name                         company_parent_id  company_type_id  company_category_id  company_industry_id  company_level_id  
----------  -----------------------------------  -----------------  ---------------  -------------------  -------------------  ----------------  
2  A Plus Lawn Care                                     0                2                    3                    1                 1  
3  A. L. Price                                          0                1                    4                    5                 1  
4  A.J. August Fashion Wear                             0                2                    5                    7                 1  
5  A+ Electronics                                       0                2                   23                Error                 1  
6  A+ Investments                                       0                1                Error                Error            Errror  
7  Aaronson Furniture                                   0                3                    3                    1                 1  
8  ABC Markets                                          0            ERROR                    4                    1                 1  

Based on the result above, I need to separate (move) the rows that contain errors into the error_data table, using the code below.

INSERT INTO error_data 
SELECT * FROM loaddata_temp 
WHERE SUBSTRING_INDEX(company_name,' ',1)='Error' or
company_category_id REGEXP '^[A-Za-z :]+$' OR 
company_type_id REGEXP '^[A-Za-z :]+$' OR
company_industry_id REGEXP '^[A-Za-z :]+$' OR
company_level_id REGEXP '^[A-Za-z :]+$' OR
demography_city_id REGEXP '^[A-Za-z :]+$' OR
demography_country_id REGEXP '^[A-Za-z :]+$' OR
demography_province_id REGEXP '^[A-Za-z :]+$';
DELETE FROM loaddata_temp  
WHERE SUBSTRING_INDEX(company_name,' ',1)='Error' or
company_category_id REGEXP '^[A-Za-z :]+$' OR 
company_type_id REGEXP '^[A-Za-z :]+$' OR
company_industry_id REGEXP '^[A-Za-z :]+$' OR
company_level_id REGEXP '^[A-Za-z :]+$' OR
demography_city_id REGEXP '^[A-Za-z :]+$' OR
demography_country_id REGEXP '^[A-Za-z :]+$' OR
demography_province_id REGEXP '^[A-Za-z :]+$';

Now loaddata_temp contains no rows with errors, because they have been moved to the error_data table (see the table below):

company_id  company_name                         company_parent_id  company_type_id  company_category_id  company_industry_id  company_level_id  
----------  -----------------------------------  -----------------  ---------------  -------------------  -------------------  ----------------  
5  A+ Electronics                                       0                2                   23                Error                 1  
6  A+ Investments                                       0                1                Error                Error            Errror  
8  ABC Markets                                          0            ERROR                    4                    1                 1  
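
The move step repeats the same WHERE clause in two statements, and the pattern '^[A-Za-z :]+$' would not match an error message built from a value that contains a digit. A minimal sketch of one alternative follows. It assumes that company_id is unique in loaddata_temp, that both tables use a transactional engine such as InnoDB, and that every error message written by the UPDATE statements starts with 'Error '. The temporary table name bad_rows is hypothetical.

```sql
-- Collect the ids of flagged rows once (bad_rows is a hypothetical name).
CREATE TEMPORARY TABLE bad_rows AS
SELECT company_id
FROM loaddata_temp
WHERE company_name           LIKE 'Error %'
   OR company_type_id        LIKE 'Error %'
   OR company_category_id    LIKE 'Error %'
   OR company_industry_id    LIKE 'Error %'
   OR company_level_id       LIKE 'Error %'
   OR demography_country_id  LIKE 'Error %'
   OR demography_city_id     LIKE 'Error %'
   OR demography_province_id LIKE 'Error %';

-- Copy and delete inside one transaction, so the caller can ROLLBACK
-- if either statement fails.
START TRANSACTION;

INSERT INTO error_data
SELECT lt.*
FROM loaddata_temp AS lt
JOIN bad_rows AS b ON b.company_id = lt.company_id;

DELETE lt
FROM loaddata_temp AS lt
JOIN bad_rows AS b ON b.company_id = lt.company_id;

COMMIT;

DROP TEMPORARY TABLE bad_rows;
```

Like the original, this also flags a legitimate company whose name happens to begin with the word Error.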

Question

  • I need a query that rolls the error_data table back to the reference-table values, i.e. the original data without ids (see below).

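    company_id  company_name                         company_parent_id  company_type_id  company_category_id  company_industry_id  company_level_id  
    ----------  -----------------------------------  -----------------  ---------------  -------------------  -------------------  ----------------  
    5  A+ Electronics                                       0          partner                   23                Error                 1  
    6  A+ Investments                                       0         customer                Error                Error            Errror  
    8  ABC Markets                                          0            ERROR                    4                    1                 1  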
    
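  • I need suggestions on all of the code above, to make it cleaner and to give it good performance.

  • If there are any reference links related to my problem, please let me know.

    Thank you very much for your help.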

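Let me see if I can rephrase the question... You have a CSV file containing strings (such as "Head office"), and you want to normalize those values against normalization tables, turning each string into a number (such as "3")?

Also, the CSV file may contain new strings, so you need to INSERT new string/number pairs into the normalization table, right?

Then you want to build (or add to) a table that has only the ids (such as "3") and not the strings ("Head office").

I discuss a very efficient 2-SQL solution in my blog. The first SQL statement finds and inserts any new rows (INSERT ... SELECT ... LEFT JOIN ...). The second looks up all the ids (UPDATE ... JOIN ... SET ...).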
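
As a rough illustration of those two statements for a single column (a sketch under assumptions, not the code from the blog post): it assumes that company_level.company_level_id is AUTO_INCREMENT, that the CSV text stays in loaddata_temp.company_level_id, and that a hypothetical extra column company_level_ref receives the numeric id.

```sql
-- Hypothetical column for the numeric id; the CSV text stays untouched
-- in loaddata_temp.company_level_id.
ALTER TABLE loaddata_temp ADD COLUMN company_level_ref INT NULL;

-- Statement 1: insert strings that the normalization table does not have yet.
-- Assumes company_level_id is AUTO_INCREMENT, so only the description is supplied.
INSERT INTO company_level (company_level_description)
SELECT DISTINCT lt.company_level_id
FROM loaddata_temp AS lt
LEFT JOIN company_level AS cl
       ON cl.company_level_description = lt.company_level_id
WHERE cl.company_level_id IS NULL
  AND lt.company_level_id IS NOT NULL;

-- Statement 2: look up the id for every staging row.
UPDATE loaddata_temp AS lt
JOIN company_level AS cl
  ON cl.company_level_description = lt.company_level_id
SET lt.company_level_ref = cl.company_level_id;
```

If unknown strings should be rejected rather than added, the first statement can be skipped. Rows whose company_level_ref is still NULL after the second statement are then the ones to report, and their original text is still available in company_level_id. An index on company_level_description helps both joins avoid scanning the reference table.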

(If that is not the point of your question, then I have to complain about the lack of simplicity/clarity.)
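
For the first bullet of the question (turning ids in error_data back into the original text), one possible sketch for a single column is shown below. It is not taken from the blog post. It assumes that error_data.company_type_id is a string column holding either an id such as '2' or an error message, and that the same pattern would be repeated for the other reference tables.

```sql
-- Replace a numeric id stored as text with its description.
-- CAST keeps the comparison string-to-string, so error messages are
-- compared as text and simply do not match any id.
UPDATE error_data AS e
JOIN company_type AS ct
  ON e.company_type_id = CAST(ct.company_type_id AS CHAR)
SET e.company_type_id = ct.company_type_description;
```

Rows that already hold an error message are left as they are, which matches the desired output shown in the question.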
