PostgreSQL · Published: 2022-05-20 · Site: 大佬教程 (code.js-code.com)
Configuring PostgreSQL for read performance
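Our system writes a lot of data (a kind of Big Data system). Write performance is good enough for our needs, but read performance is really too slow.

The primary key (constraint) structure is similar for all our tables:

timestamp(timestamp) ; index(smallint) ; key(integer).

A table can have millions or even billions of rows, and a read request is usually for a specific period (timestamp/index) and tag. It is common for a query to return around 200k rows. Currently we can read about 15k rows per second, but we need to be 10 times faster. Is this possible and, if so, how?

Note: PostgreSQL is packaged with our software, so the hardware differs from one client to another.

The machine is a VM used for testing. The VM's host is Windows Server 2008 R2 x64 with 24.0 GB of RAM.

Server spec (VMware virtual machine)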

Server 2008 R2 x64
2.00 GB of memory
Intel Xeon W3520 @ 2.67GHz (2 cores)

postgresql.conf optimizations

shared_buffers = 512MB (default: 32MB)
effective_cache_size = 1024MB (default: 128MB)
checkpoint_segments = 32 (default: 3)
checkpoint_completion_target = 0.9 (default: 0.5)
default_statistics_target = 1000 (default: 100)
work_mem = 100MB (default: 1MB)
maintenance_work_mem = 256MB (default: 16MB)
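
When tuning postgresql.conf it helps to confirm which values the running server actually picked up, since a misspelled parameter name or a missing reload leaves the old value in place. A minimal sketch using the pg_settings system view; the parameter list simply mirrors the settings above:

```sql
-- List the values the running server actually uses for the parameters above.
-- "source" shows where each value came from (default, configuration file, ...).
SELECT name, setting, unit, source
FROM   pg_settings
WHERE  name IN ('shared_buffers',
                'effective_cache_size',
                'checkpoint_segments',   -- only exists before PostgreSQL 9.5
                'checkpoint_completion_target',
                'default_statistics_target',
                'work_mem',
                'maintenance_work_mem')
ORDER  BY name;
```

A parameter that does not exist in the server's version is simply absent from the result.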

Table definition

CREATE TABLE "AnalogTransition"
(
  "KeyTag" integer NOT NULL,
  "timestamp" timestamp with time zone NOT NULL,
  "timestampQuality" smallint,
  "timestampIndex" smallint NOT NULL,
  "Value" numeric,
  "Quality" boolean,
  "QualityFlags" smallint,
  "updatetimestamp" timestamp without time zone, -- (UTC)
  CONSTRAINT "PK_AnalogTransition" PRIMARY KEY ("timestamp","timestampIndex","KeyTag"),
  CONSTRAINT "FK_AnalogTransition_Tag" FOREIGN KEY ("KeyTag")
      REFERENCES "Tag" ("Key") MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE,
  autovacuum_enabled=true
);

Query

The query takes about 30 seconds to execute in pgAdmin3, but we would like to get the same result in under 5 seconds, if possible.

SELECT
    "AnalogTransition"."KeyTag",
    "AnalogTransition"."timestamp" AT TIME ZONE 'UTC',
    "AnalogTransition"."timestampQuality",
    "AnalogTransition"."timestampIndex",
    "AnalogTransition"."Value",
    "AnalogTransition"."Quality",
    "AnalogTransition"."QualityFlags",
    "AnalogTransition"."updatetimestamp"
FROM "AnalogTransition"
WHERE "AnalogTransition"."timestamp" >= '2013-05-16 00:00:00.000'
  AND "AnalogTransition"."timestamp" <= '2013-05-17 00:00:00.00'
  AND ("AnalogTransition"."KeyTag" = 56 OR "AnalogTransition"."KeyTag" = 57 OR "AnalogTransition"."KeyTag" = 58 OR "AnalogTransition"."KeyTag" = 59 OR "AnalogTransition"."KeyTag" = 60)
ORDER BY "AnalogTransition"."timestamp" DESC, "AnalogTransition"."timestampIndex" DESC
LIMIT 500000;
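
Plans with actual times and "Buffers:" lines, like the ones quoted in this question, are what EXPLAIN prints when run with the ANALYZE and BUFFERS options. A minimal sketch, assuming PostgreSQL 9.0 or later (where the parenthesized option syntax exists) and a shortened form of the query above. The IN list is shorthand for the OR chain and the planner may display it differently. Note that ANALYZE really executes the statement:

```sql
-- ANALYZE runs the query and reports actual row counts and timings.
-- BUFFERS adds shared buffer hits and reads for each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "KeyTag", "timestamp", "timestampIndex", "Value"
FROM   "AnalogTransition"
WHERE  "timestamp" >= '2013-05-16 00:00:00.000'
AND    "timestamp" <= '2013-05-17 00:00:00.00'
AND    "KeyTag" IN (56, 57, 58, 59, 60)
ORDER  BY "timestamp" DESC, "timestampIndex" DESC
LIMIT  500000;
```

Running it twice in a row gives a rough idea of cold versus warm cache behavior, since the second run usually finds more blocks in shared buffers.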

Explain 1

"Limit  (cost=0.00..125668.31 rows=500000 width=33) (actual time=2.193..3241.319 rows=500000 loops=1)"
"  Buffers: shared hit=190147"
"  ->  Index Scan Backward using "PK_AnalogTransition" on "AnalogTransition"  (cost=0.00..389244.53 rows=1548698 width=33) (actual time=2.187..1893.283 rows=500000 loops=1)"
"        Index Cond: (("timestamp" >= '2013-05-16 01:00:00-04'::timestamp with time zone) AND ("timestamp" <= '2013-05-16 15:00:00-04'::timestamp with time zone))"
"        Filter: (("KeyTag" = 56) OR ("KeyTag" = 57) OR ("KeyTag" = 58) OR ("KeyTag" = 59) OR ("KeyTag" = 60))"
"        Buffers: shared hit=190147"
"Total runtime: 3863.028 ms"

Explain 2

In my latest test, it took 7 minutes to select my data! See below:

"Limit  (cost=0.00..313554.08 rows=250001 width=35) (actual time=0.040..410721.033 rows=250001 loops=1)"
"  ->  Index Scan using "PK_AnalogTransition" on "AnalogTransition"  (cost=0.00..971400.46 rows=774511 width=35) (actual time=0.037..410088.960 rows=250001 loops=1)"
"        Index Cond: (("timestamp" >= '2013-05-22 20:00:00-04'::timestamp with time zone) AND ("timestamp" <= '2013-05-24 20:00:00-04'::timestamp with time zone) AND ("KeyTag" = 16))"
"Total runtime: 411044.175 ms"
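
Data alignment and storage size

Actually, the overhead per tuple is 24 bytes for the tuple header plus 4 bytes for the item pointer.
More details on the calculation in this related answer:

> Use GIN to index bit strings

Basics of data alignment and padding in this related answer on SO:

> Calculating and saving space in PostgreSQL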

We have three columns for the primary key:

PRIMARY KEY ("timestamp","timestampIndex","KeyTag")

"timestamp"      timestamp (8 bytes)
"timestampIndex" smallint  (2 bytes)
"KeyTag"         integer   (4 bytes)

This results in:

 4 bytes item pointer in the page header (not counting towards the multiple of 8 bytes)
---
23 bytes for the tuple header
 1 byte  padding for data alignment (or NULL bitmap)
 8 bytes "timestamp"
 2 bytes "timestampIndex"
 2 bytes padding for data alignment
 4 bytes "KeyTag"
 0 bytes padding to the nearest multiple of 8 bytes
-----
44 bytes per tuple
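
The breakdown above can be checked with plain arithmetic. This sketch only adds up the numbers listed in the answer and measures nothing on a real table:

```sql
-- Item pointer: stored in the page header, outside the 8-byte-aligned tuple.
-- Tuple: header + padding + the three key columns (with alignment padding).
SELECT 4                              AS item_pointer_bytes,
       23 + 1 + 8 + 2 + 2 + 4 + 0     AS tuple_bytes,      -- 40, already a multiple of 8
       4 + 23 + 1 + 8 + 2 + 2 + 4 + 0 AS bytes_per_tuple;  -- 44
```

The 2 bytes of padding after "timestampIndex" exist only because the following integer must start on a 4-byte boundary.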

More on measuring object sizes in this related answer:

> Measure the size of a PostgreSQL table row

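As a rough illustration of measuring sizes on the table from the question, the built-in size functions can be applied to rows, the table and its index. A minimal sketch; the object names come from the question and the row sample size of 5 is arbitrary:

```sql
-- Bytes used by a few sample rows when treated as a row value
-- (includes the row header, so it approximates on-disk tuple size).
SELECT pg_column_size(t) AS row_bytes
FROM   "AnalogTransition" t
LIMIT  5;

-- Size of the table alone, its primary key index, and everything together.
SELECT pg_size_pretty(pg_relation_size('"AnalogTransition"'))       AS table_size,
       pg_size_pretty(pg_relation_size('"PK_AnalogTransition"'))    AS pk_index_size,
       pg_size_pretty(pg_total_relation_size('"AnalogTransition"')) AS total_size;
```

The inner double quotes are needed because the table and index names are mixed case.
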
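Order of columns in a multicolumn index

Read these two questions and answers to understand:

> Is a composite index also good for queries on the first field?
> Working of indexes in PostgreSQL

The way you have your index (primary key), you can retrieve rows without a sorting step, which is appealing, especially with LIMIT. But retrieving the rows seems extremely expensive.

Generally, in a multi-column index, "equality" columns should go first and "range" columns last:

> Multicolumn index and performance

Therefore, try an additional index with reversed column order: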

CREATE INDEX analogransition_mult_idx1
   ON "AnalogTransition" ("KeyTag","timestampIndex","timestamp");

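It depends on data distribution. But with millions of rows this might be substantially faster.

The tuple size is 8 bytes bigger than necessary due to data alignment and padding. If you use this as a plain index, you might try dropping the third column "timestamp". It may be a bit faster or not (since the column might help with sorting).

You may want to keep both indexes. Depending on a number of factors, your original index may be preferable, in particular with a small LIMIT.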
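
One way to decide which index earns its keep is to look at how often each one is actually scanned and how much space it takes. A minimal sketch using the pg_stat_user_indexes statistics view. The counters only reflect activity since statistics were last reset, so treat the numbers as a hint rather than proof:

```sql
-- One row per index on the table: how often it was scanned and how big it is.
SELECT indexrelname                                 AS index_name,
       idx_scan                                     AS scans,
       idx_tup_read                                 AS index_entries_read,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname = 'AnalogTransition'
ORDER  BY idx_scan DESC;
```

An index that stays at zero scans under a representative workload is a candidate for removal, with the primary key excepted since it also enforces uniqueness.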

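autovacuum and table statistics

Your table statistics need to be up to date. I trust you have autovacuum running.

Since your table seems to be huge and statistics are important for the right query plan, I would substantially increase the statistics target for relevant columns:

ALTER TABLE "AnalogTransition" ALTER "timestamp" SET STATISTICS 1000;

... or even higher with billions of rows. The maximum is 10000, the default is 100.

Do that for all columns involved in WHERE or ORDER BY clauses. Then run ANALYZE.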
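
Applied to the query in the question, the remaining columns in WHERE and ORDER BY are "KeyTag" and "timestampIndex". A sketch of the full sequence, reusing the target of 1000 from above (the value is the answer's suggestion, not a measured optimum), followed by a check that autovacuum and analyze have actually run:

```sql
-- Raise the per-column statistics target for the other columns used in
-- WHERE / ORDER BY (1000 mirrors the value suggested above).
ALTER TABLE "AnalogTransition" ALTER "KeyTag"         SET STATISTICS 1000;
ALTER TABLE "AnalogTransition" ALTER "timestampIndex" SET STATISTICS 1000;

-- Collect fresh statistics with the new targets.
ANALYZE "AnalogTransition";

-- When did (auto)vacuum and (auto)analyze last touch the table?
SELECT last_autovacuum, last_autoanalyze, last_analyze
FROM   pg_stat_user_tables
WHERE  relname = 'AnalogTransition';
```

NULL in the last_auto* columns means autovacuum has not processed the table since statistics were last reset.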

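Table layout

While you are at it, if you apply what you have learned about data alignment and padding, this optimized table layout should save some disk space and help performance a little (ignoring pk & fk):

CREATE TABLE "AnalogTransition"(
  "timestamp" timestamp with time zone NOT NULL,
  "KeyTag" integer NOT NULL,
  "timestampIndex" smallint NOT NULL,
  "timestampQuality" smallint,
  "updatetimestamp" timestamp without time zone, -- (UTC)
  "QualityFlags" smallint,
  "Quality" boolean,
  "Value" numeric
);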
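
To see why a column order packs well or badly, the storage length and alignment requirement of each column can be read from the system catalog. A minimal sketch against pg_attribute; it works for whichever version of the table currently exists:

```sql
SELECT a.attnum,
       a.attname,
       a.atttypid::regtype AS data_type,
       a.attlen,    -- storage size in bytes; -1 means variable length
       a.attalign   -- c = 1 byte, s = 2 bytes, i = 4 bytes, d = 8 bytes
FROM   pg_attribute a
WHERE  a.attrelid = '"AnalogTransition"'::regclass
AND    a.attnum > 0          -- skip system columns
AND    NOT a.attisdropped
ORDER  BY a.attnum;
```

Padding appears wherever a column with a stricter alignment follows a column that ends off that boundary, which is why grouping the 8-byte columns and then the smaller ones tends to save space.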

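CLUSTER / pg_repack

To optimize read performance for queries that use a certain index (be it your original one or my suggested alternative), you can rewrite the table in the physical order of the index. CLUSTER does that, but it is rather invasive and requires an exclusive lock for the duration of the operation. pg_repack is a more sophisticated alternative that can do the same without an exclusive lock on the table.
This can help substantially with huge tables, since far fewer blocks of the table have to be read.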
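
A minimal sketch of the CLUSTER variant, assuming the original primary key index is the one most queries use. Substitute another index name if the alternative index turns out to serve the workload better. pg_repack is a separate extension with its own command-line client and is not shown here:

```sql
-- Rewrite the table in the physical order of the primary key index.
-- Takes an ACCESS EXCLUSIVE lock on the table until it finishes.
CLUSTER "AnalogTransition" USING "PK_AnalogTransition";

-- Refresh planner statistics after the rewrite.
ANALYZE "AnalogTransition";
```

CLUSTER is a one-time operation: rows written afterwards are not kept in index order, so the effect fades on a write-heavy table and the command has to be repeated from time to time.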

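RAM

Generally, 2 GB of physical RAM is just not enough to deal with billions of rows quickly. More RAM might go a long way, accompanied by adapted settings: obviously a bigger effective_cache_size to begin with.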
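
Since effective_cache_size is only a planner estimate and allocates no memory, its effect on plan choice can be tried out per session before editing postgresql.conf. A minimal sketch; the value of 4GB is purely hypothetical and should reflect the memory really available for caching on the target machine:

```sql
-- Current value used by the planner.
SHOW effective_cache_size;

-- Hypothetical value for a machine with more RAM; applies to this session only.
SET effective_cache_size = '4GB';

-- Re-run EXPLAIN for the query of interest here to see whether the plan changes.

-- Return to the configured value.
RESET effective_cache_size;
```

Setting the value higher than the memory actually available can mislead the planner into expecting cache hits that never happen.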
