Programming notes   Published: 2022-07-20   Source site: 大佬教程  code.js-code.com
This article introduces Hive SerDe: what serialization and deserialization are, and how to specify a SerDe for a table.

1 What is SerDe

SerDe is a contraction of two words: Serializer and Deserializer. So what are serialization and deserialization?

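When processes communicate remotely they can send each other data of any type, but whatever the type, the data travels over the network as a binary sequence. The sender must convert an object into a byte sequence before it can be transmitted, which is called object serialization; the receiver must restore the byte sequence to an object, which is called object deserialization.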

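In Hive, deserialization turns each key/value record into the column values of a table row. Because the SerDe is applied when the data is read, Hive can load data files into a table without converting them first, which saves a great deal of time when processing very large datasets.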

Hive SerDe

What is a SerDe?

  • SerDe is a short name for "Serializer and Deserializer."
  • Hive uses SerDe (and FileFormat) to read and write table rows.
  • HDFS files --> InputFileFormat --> <key, value> --> Deserializer --> Row object     (read path)
  • Row object --> Serializer --> <key, value> --> OutputFileFormat --> HDFS files      (write path)

---- See the official documentation: https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-HiveSerDe
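
As a minimal sketch of the read and write paths above, the statements below exercise both directions. The table `serde_demo` and the local file `/tmp/serde_demo.csv` are hypothetical names chosen for illustration, not taken from the original notes. `ROW FORMAT DELIMITED` relies on Hive's default LazySimpleSerDe, and `INSERT ... VALUES` assumes Hive 0.14 or later.

```sql
-- Hypothetical table; ROW FORMAT DELIMITED uses the default LazySimpleSerDe.
CREATE TABLE serde_demo (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- LOAD DATA only copies the file into the table directory; the data is not converted.
LOAD DATA LOCAL INPATH '/tmp/serde_demo.csv' INTO TABLE serde_demo;

-- Read path: the InputFormat yields <key, value> records and the Deserializer builds each row.
SELECT id, name FROM serde_demo;

-- Write path: the Serializer turns each row into a <key, value> record for the OutputFormat.
INSERT INTO TABLE serde_demo VALUES (3, 'carol');

-- Shows the SerDe library, InputFormat, and OutputFormat recorded for the table.
DESCRIBE FORMATTED serde_demo;
```

Because the file is parsed only when the SELECT runs, a malformed file is not rejected at load time; problems tend to show up later as NULL columns in query results.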

2 Built-in SerDes

  • Avro (Hive 0.9.1 and later)
  • ORC (Hive 0.11 and later)
  • RegEx
  • Thrift
  • Parquet (Hive 0.13 and later)
  • CSV (Hive 0.14 and later)
  • JsonSerDe (Hive 0.12 and later in hcatalog-core)

---- See the official documentation: https://cwiki.apache.org/confluence/display/Hive/SerDe

 

3 Using SerDes

3.1 Specifying the SerDe when creating a table

· RegexSerDe

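-- user and time are backtick-quoted because newer Hive versions may treat them as reserved words.
CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  `user` STRING,
  `time` STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- One capture group per column; backslashes and double quotes are escaped inside the string literal.
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?"
)
STORED AS TEXTFILE;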
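
A short query sketch against the table above, assuming access-log files have already been loaded into it (the data itself is hypothetical). With RegexSerDe, a line that does not match `input.regex` is returned with every column NULL, so filtering on a column such as `host` is one way to skip unparsable lines.

```sql
-- Count requests per HTTP status, ignoring lines the regex could not parse.
SELECT status, count(*) AS hits
FROM apachelog
WHERE host IS NOT NULL
GROUP BY status;
```

Since every column is declared STRING, `status` is compared and grouped as text here; cast it if numeric ordering matters.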

· JsonSerDe

ADD JAR /usr/lib/hive-hcatalog/lib/hive-hcatalog-core.jar;

CREATE TABLE my_table(a STRING, b BIGINT, ...)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

· CSVSerDe

CREATE TABLE my_table(a STRING, b STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",
  "quoteChar"     = "'",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;
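
The CSV SerDe reads every column as a string, whatever type the table declares. One common way to get typed columns is a view that casts them. The sketch below assumes the `my_table` definition above; the view name `my_table_typed` and the choice of BIGINT for column `b` are hypothetical.

```sql
-- Hypothetical view that exposes column b as a number instead of a string.
CREATE VIEW my_table_typed AS
SELECT a,
       CAST(b AS BIGINT) AS b
FROM my_table;
```

Values of `b` that cannot be parsed as a number come back as NULL from the cast rather than failing the query.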

 

· ORCSerDe

ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

 

From the official documentation:

Registration of Native SerDes

As of Hive 0.14 a registration mechanism has been introduced for native Hive SerDes.  This allows dynamic binding between a "STORED AS" keyword in place of a triplet of {SerDe, InputFormat, and OutputFormat} specification, in CreateTable statements.

The following mappings have been added through this registration mechanism:

Syntax:
  STORED AS AVRO / STORED AS AVROFILE
Equivalent:
  ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'

Syntax:
  STORED AS ORC / STORED AS ORCFILE
Equivalent:
  ROW FORMAT SERDE
    'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
  STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
  OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

Syntax:
  STORED AS PARQUET / STORED AS PARQUETFILE
Equivalent:
  ROW FORMAT SERDE
    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
  STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'

Syntax:
  STORED AS RCFILE
Equivalent:
  STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
  OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'

Syntax:
  STORED AS SEQUENCEFILE
Equivalent:
  STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT
    'org.apache.hadoop.mapred.SequenceFileOutputFormat'

Syntax:
  STORED AS TEXTFILE
Equivalent:
  STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
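
To make the ORC mapping concrete, here is a sketch of two table definitions that should end up with the same storage settings. The table names `orc_short` and `orc_explicit` and their columns are hypothetical.

```sql
-- Shorthand form using the STORED AS keyword.
CREATE TABLE orc_short (id INT, name STRING)
STORED AS ORC;

-- Explicit SerDe / InputFormat / OutputFormat triplet from the ORC mapping.
CREATE TABLE orc_explicit (id INT, name STRING)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
```

Running `DESCRIBE FORMATTED` on each table is a simple way to compare the SerDe library and the input and output formats that Hive recorded.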
