大佬教程收集整理的这篇文章主要介绍了拉链表的创建、查询和回滚,大佬教程大佬觉得挺不错的,现在分享给大家,也给大家做个参考。
-- 源系统中订单表 CREATE TABLE@H_801_41@ orders ( orderid INT@H_801_41@, createtime StriNG, modifiedtime StriNG, status StriNG ) stored AS@H_801_41@ textfile; -- 订单的增量数据表,按天分区,存放每天的增量数据,保留半年左右 CREATE TABLE@H_801_41@ t_ods_orders_inc ( orderid INT@H_801_41@, createtime StriNG, modifiedtime StriNG, status StriNG ) PARTITIONED BY (day@H_801_41@ StriNG) stored AS@H_801_41@ textfile; -- 订单的历史数据拉链表 CREATE TABLE@H_801_41@ t_dw_orders_his ( orderid INT@H_801_41@, createtime StriNG, modifiedtime StriNG, status StriNG, dw_start_date StriNG, dw_END_DATE StriNG ) stored AS textfile;
-- 1. 源表orders,假设此表为21日状态 insert into@H_801_41@ orders values@H_801_41@ (1,'2015-08-18','2015-08-18','创建'@H_801_41@), (2,'2015-08-18','2015-08-18','创建'@H_801_41@), (3,'2015-08-19','2015-08-21','支付'@H_801_41@), (4,'2015-08-19','2015-08-21','完成'@H_801_41@), (5,'2015-08-19','2015-08-20','支付'@H_801_41@), (6,'2015-08-20','2015-08-20','创建'@H_801_41@), (7,'2015-08-20','2015-08-21','支付'@H_801_41@), (8,'2015-08-21','2015-08-21','创建'@H_801_41@); -- 2. 初始化ODS增量表-21日数据(全量初始化,将21号前的累加到此分区) INSERT overwrite TABLE t_ods_orders_inc PARTITION (day = '2015-08-21'@H_801_41@) SELECT@H_801_41@ orderid, createtime, modifiedtime, status FROM@H_801_41@ orders WHERE createtime <= '2015-08-21'@H_801_41@; -- 3. 初始化ODS增量表-22日数据 insert overwrite table t_ods_orders_inc partition(day='2015-08-22'@H_801_41@) values@H_801_41@ (1,'2015-08-18','2015-08-22','支付'@H_801_41@), (2,'2015-08-18','2015-08-22','完成'@H_801_41@), (6,'2015-08-20','2015-08-22','支付'@H_801_41@), (8,'2015-08-21','2015-08-22','支付'@H_801_41@), (9,'2015-08-22','2015-08-22','创建'@H_801_41@), (10,'2015-08-22','2015-08-22','支付'@H_801_41@); -- 4. 初始化ODS增量表-23日数据 insert overwrite table t_ods_orders_inc partition(day='2015-08-23'@H_801_41@) values@H_801_41@ (1,'2015-08-18','2015-08-23','完成'@H_801_41@), (3,'2015-08-19','2015-08-23','完成'@H_801_41@), (5,'2015-08-19','2015-08-23','完成'@H_801_41@), (8,'2015-08-21','2015-08-23','完成'@H_801_41@), (11,'2015-08-23','2015-08-23','创建'@H_801_41@), (12,'2015-08-23','2015-08-23','创建'@H_801_41@), (13,'2015-08-23','2015-08-23','支付');
-- 初始化拉链表,假设21号的就是原始数据 INSERT overwrite TABLE@H_801_41@ t_dw_orders_his SELECT@H_801_41@ orderid, createtime, modifiedtime, status, createtime AS@H_801_41@ dw_start_date, '9999-12-31' AS@H_801_41@ dw_END_DATE FROM@H_801_41@ t_ods_orders_inc WHERE day = '2015-08-21';
-- 22号增量数据进来后,与21号的状态数据(t_dw_orders_his)比对,更新拉链表 -- ${dt1} = '2015-08-22' DROP TABLE IF EXISTS@H_801_41@ t_dw_orders_his_tmp; CREATE TABLE t_dw_orders_his_tmp AS SELECT@H_801_41@ orderid, createtime, modifiedtime, status, dw_start_date, dw_END_DATE FROM@H_801_41@ ( -- 22号前需更新状态的数据 SELECT@H_801_41@ a.orderid, a.createtime, a.modifiedtime, a.status, a.dw_start_date, CASE WHEN b.orderid IS NOT NULL AND a.dw_END_DATE = '9999-12-31' THEN date_add('${dt1}',-1) -- 把22号前有效的数据失效日期置为dt1的上一日 ELSE a.dw_END_DATE END AS@H_801_41@ dw_END_DATE FROM@H_801_41@ t_dw_orders_his a left join@H_801_41@ t_ods_orders_inc b ON a.orderid =@H_801_41@ b.orderid and b.day = '${dt1}' UNION ALL -- 22号最新状态数据 SELECT@H_801_41@ orderid, createtime, modifiedtime, status, modifiedtime AS@H_801_41@ dw_start_date, '9999-12-31' AS@H_801_41@ dw_END_DATE FROM@H_801_41@ t_ods_orders_inc WHERE day = '${dt1}'@H_801_41@ ) x ORDER BY@H_801_41@ orderid,dw_start_date; -- 临时数据正式入库 INSERT overwrite TABLE@H_801_41@ t_dw_orders_his SELECT * FROM@H_801_41@ t_dw_orders_his_tmp; -- 23号数据入t_dw_orders_his表请重复上述22号的刷新步骤 -- ${dt1} = '2015-08-23' -- ... ...
--假设查询22号数据状态 --'${dt1}' = '2021-08-22' SELEct * from@H_801_41@ t_dw_orders_his where dw_start_date<='${dt1}' and dw_END_DATE>='${dt1}';
-- 1. 绿色,保留 DROP TABLE@H_801_41@ t_dw_orders_his_tmp1; CREATE TABLE@H_801_41@ t_dw_orders_his_tmp1 AS SELECT@H_801_41@ orderid, createtime, modifiedtime, status, dw_start_date, dw_END_DATE FROM@H_801_41@ t_dw_orders_his WHERE@H_801_41@ dw_END_DATE < '2015-08-22'@H_801_41@; -- 2. 黄色,更新-当日生效的数据 DROP TABLE@H_801_41@ t_dw_orders_his_tmp2; CREATE TABLE@H_801_41@ t_dw_orders_his_tmp2 AS SELECT@H_801_41@ orderid, createtime, modifiedtime, status, dw_start_date, '9999-12-31' AS@H_801_41@ dw_END_DATE FROM@H_801_41@ t_dw_orders_his WHERE@H_801_41@ dw_END_DATE = '2015-08-22'@H_801_41@; -- 2. 蓝色,更新-22号前到22号后还生效的数据 DROP TABLE@H_801_41@ t_dw_orders_his_tmp3; CREATE TABLE@H_801_41@ t_dw_orders_his_tmp3 AS SELECT@H_801_41@ orderid, createtime, modifiedtime, status, dw_start_date, '9999-12-31'@H_801_41@ dw_END_DATE FROM@H_801_41@ t_dw_orders_his WHERE@H_801_41@ dw_start_date <= '2015-08-22' AND dw_END_DATE > '2015-08-22'@H_801_41@; -- 4. 数据插入到新表 CREATE TABLE@H_801_41@ t_dw_orders_his_new AS SELEct * from@H_801_41@ ( SELECT a.* ,'绿色,保留,号前的数据' FROM@H_801_41@ t_dw_orders_his_tmp1 a UNION ALL SELECT b.*,'黄色,更新-当日生效的数据' FROM@H_801_41@ t_dw_orders_his_tmp2 b UNION ALL SELECT c.*,'蓝色,更新-22号前到22号后还生效的数据' FROM@H_801_41@ t_dw_orders_his_tmp3 c ) a ORDER BY a.orderid,a.dw_start_date;
拉链表的出现是为了压缩存储和记录数,针对频繁更新的数据会很有效。但是回滚操作不方便,查询的时候也必须要指定时间才能正确取数,操作成本高,得权衡利弊后再确认是否合适自己使用。
以上是大佬教程为你收集整理的拉链表的创建、查询和回滚全部内容,希望文章能够帮你解决拉链表的创建、查询和回滚所遇到的程序开发问题。
如果觉得大佬教程网站内容还不错,欢迎将大佬教程推荐给程序员好友。
本图文内容来源于网友网络收集整理提供,作为学习参考使用,版权属于原作者。
如您有任何意见或建议可联系处理。小编QQ:384754419,请注明来意。