
Original article: http://mysql.rjweb.org/doc.php/ricksrots

brought to you by Rick James

Here are 160+ tips, tricks, suggestions, etc. They come from a decade of improving performance in MySQL in thousands of situations. There are exceptions to the statements below, but they should help guide you toward a better understanding of how to use MySQL effectively.

SELECTs -- do's and don'ts

Rules of Thumb:

⚈  Do not hide an indexed column inside a function call: DATE(x) = '...', LCASE(col) = 'foo'
⚈  LCASE() is usually unnecessary because the collation will compare 'correctly' without it.
⚈  #1: Subqueries perform poorly
⚈  Never use "IN (SELECT ...)" -- it optimizes poorly. Turn it into a JOIN. (5.6.5 improves this)
⚈  A subquery that condenses data (GROUP BY, LIMIT, etc.) may perform well
⚈  OR may be very inefficient; turn it into UNION.
⚈  A coding pattern: dt >= '2010-02-01' AND dt < '2010-02-01' + INTERVAL 7 DAY
⚈  ORDER BY NULL -- a little-known trick to avoid GROUP BY doing a sort (if there is another way).
⚈  WHERE (a,b) > (7,8) is poorly optimized
⚈  Gather these to study a slow query: SHOW CREATE TABLE, SHOW TABLE STATUS, EXPLAIN.
⚈  Do not use OFFSET for pagination -- continue where you "left off"
⚈  Don't mix DISTINCT and GROUP BY
⚈  Be explicit about UNION ALL vs UNION DISTINCT -- it makes you think about which to use
⚈  Do not use SELECT * except for debugging or when fetching into a hash.
⚈  VIEWs are poorly optimized
⚈  A subquery in the FROM clause may be useful for retrieving BLOBs without sorting them: speed up a query by first finding the ids, then self-JOIN to fetch the rest.

Discussion: Subqueries came to MySQL rather late in the game. They have not been well optimized, so it is usually better to turn a subquery into an equivalent JOIN. This is especially true for "IN ( SELECT ... )", though that is better optimized in 5.6.5 and MariaDB 5.5. Sometimes a subquery really is the best way to optimize a SELECT. The common thread of these "good" subqueries seems to be that the subquery scans a lot of rows but boils the intermediate result set down to a small number of rows. This is likely to happen with GROUP BY or LIMIT in the subquery.
Index hints (FORCE INDEX, etc.) may help you today, but may be the wrong thing for tomorrow -- different constants in the WHERE clause may lead FORCE to do the "wrong" thing. For analyzing a slow query, SHOW CREATE TABLE provides the datatypes, indexes, and ENGINE. (DESCRIBE provides much less info.) SHOW TABLE STATUS tells how big the table is. EXPLAIN says how the query optimizer decided to perform the query. It is so tempting to use ORDER BY id LIMIT 30,10 to find the 4th page of 10 items. But it is so inefficient, especially when you have thousands of pages. The thousandth page has to read (at some level) all the pages before it. "Left off" refers to having the "Next" button on one page give the id (or other sequencing info) of where the next page starts. Then that page simply does WHERE id > $leftoff ORDER BY id LIMIT 10.
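The OFFSET-vs-"left off" point can be sketched as follows; the table and columns (`items`, `id`, `title`) and the remembered id 12345 are hypothetical:

```sql
-- Inefficient: the 1000th page must step over 9990 rows before returning 10.
SELECT id, title FROM items ORDER BY id LIMIT 9990, 10;

-- "Remember where you left off" instead: the previous page's last id
-- (here 12345, carried by the "Next" button/link) seeds the next page.
SELECT id, title
  FROM items
 WHERE id > 12345
 ORDER BY id
 LIMIT 10;
```

The second form reads only the 10 rows it returns (plus a BTree descent), regardless of how deep into the result you are.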


INDEXing

Rules of Thumb:

⚈  An INDEX can speed up a SELECT by orders of magnitude and will slow down INSERTs a little. (A fair tradeoff?)
⚈  Adding indexes is not a panacea.
⚈  BTree is an excellent all-around indexing mechanism
⚈  A BTree index node contains ~100 items. (1M rows = 3 levels; 1B rows = 5 levels)
⚈  Flags, and other fields with few values, should not be alone in an index -- the index won't be used.
⚈  MySQL rarely uses two INDEXes in one SELECT. Main exceptions: subqueries, UNION.
⚈  A "prefix" index -- INDEX(name(10)) -- is rarely useful. Exception: TEXT
⚈  A UNIQUE "prefix" is probably wrong -- UNIQUE(name(10)) forces just the first 10 chars to be unique.
⚈  It is OK to have Index_length > Data_length
⚈  5 fields in a compound index seems "too many"
⚈  Having no compound indexes is a clue that you do not understand their power. INDEX(a,b) may be much better than INDEX(a), INDEX(b)
⚈  INDEX(a,b) covers for INDEX(a), so drop the latter.
⚈  2x speedup when "Using index" (a "covering" index)
⚈  Akiban (3rd party) "groups" tables together, interleaved, to improve JOIN performance.
⚈  FULLTEXT (MyISAM) -- watch out for ft_min_word_len=4, stopwords, and the 50% rule
⚈  A FULLTEXT index will be used before any other index.
⚈  FULLTEXT -- consider Sphinx, Lucene, etc. (3rd party)

Discussion: Indexes are important to any database. Getting the "right" index can make a query run orders of magnitude faster. So, how to do that? Often "compound indexes" (multiple columns in a single INDEX(...)) are better than single-column indexes. A WHERE clause with column = constant begs for an index that starts with that column. If the WHERE clause has multiple fields AND'd together, the "=" parts should come first in the index. Indexes are structured as BTrees. The "root" node (a block) of a BTree has pointers to child blocks; this goes as deep as necessary, but really not very deep (see the ~100-items RoT). MyISAM uses 1KB blocks; InnoDB uses 16KB blocks. Each INDEX is its own BTree.
A PRIMARY KEY is a UNIQUE key is an INDEX. INDEX and KEY are synonymous. In InnoDB, the data is included in the BTree for the PRIMARY KEY. In MyISAM, the data is in a separate file (.MYD). A "covering" index is one where all the fields needed in a SELECT are included in the INDEX.
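A minimal sketch of a compound, covering index, using a hypothetical `orders` table:

```sql
CREATE TABLE orders (
  id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  cust_id   INT UNSIGNED NOT NULL,
  placed_at DATETIME NOT NULL,
  total     DECIMAL(10,2) NOT NULL,
  INDEX cust_date_total (cust_id, placed_at, total)   -- the "=" column first
) ENGINE=InnoDB;

-- EXPLAIN should show "Using index": all three columns come straight from
-- the index BTree, so the data rows are never touched (a "covering" index).
EXPLAIN SELECT placed_at, total
          FROM orders
         WHERE cust_id = 42
           AND placed_at >= '2013-01-01';
```

Note that INDEX(cust_id, placed_at, total) also covers any query needing just INDEX(cust_id), so a separate single-column index on cust_id would be redundant.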



ENGINE Differences


Optimizations, and not

Discussion: If you can arrange for rows to be "adjacent" to each other, then one disk fetch will bring in many rows (a 10x speedup). "Batched" INSERTs are where one INSERT statement has multiple rows. Nearly all of the performance benefit is in the first 100 rows; going beyond 1000 really gets into diminishing returns. Furthermore, in a replication environment, a huge INSERT would cause the Slave to get "behind".
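A batched INSERT, per the discussion above (hypothetical `log` table):

```sql
-- One statement, one round trip, many rows: near-optimal at ~100 rows,
-- diminishing returns past ~1000.
INSERT INTO log (ts, msg) VALUES
  ('2013-05-01 00:00:01', 'a'),
  ('2013-05-01 00:00:02', 'b'),
  ('2013-05-01 00:00:03', 'c');
```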


PARTITIONing

Rules of Thumb:

⚈  Don't bother with PARTITIONing unless the table will exceed ~1M rows
⚈  No more than 50 PARTITIONs on a table (OPEN, SHOW TABLE STATUS are impacted) (fixed in 5.6.6?)
⚈  PARTITION BY RANGE is the only useful method.
⚈  SUBPARTITIONs are not useful.
⚈  The partition field should not be the first field in any key.
⚈  It is OK to have an AUTO_INCREMENT as the first part of a compound key, or in a non-UNIQUE index.

Discussion: It is tempting to believe that PARTITIONing will solve performance problems. But that is so often wrong. PARTITIONing splits one table into several smaller tables. But table size is rarely a performance issue; instead, I/O time and indexes are the issues. Perhaps the most common use case where PARTITIONing shines is a dataset from which "old" data is periodically deleted. RANGE PARTITIONing by day (or some other unit of time) lets you do a nearly instantaneous DROP PARTITION plus REORGANIZE PARTITION instead of a much slower DELETE. An AUTO_INCREMENT column must be the first column in some index. (That lets the engine find the 'next' value when opening the table.) It does not have to be the only field, nor does it have to be PRIMARY or UNIQUE. If it is not UNIQUE, you could INSERT a duplicate id if you explicitly provide the number.
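A sketch of RANGE partitioning by day with fast purging, using a hypothetical `log` table (note the AUTO_INCREMENT leading a compound PRIMARY KEY, per the RoTs above):

```sql
CREATE TABLE log (
  id  BIGINT UNSIGNED AUTO_INCREMENT NOT NULL,
  dt  DATETIME NOT NULL,
  msg VARCHAR(255),
  PRIMARY KEY (id, dt)          -- dt must appear in every unique key
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(dt)) (
  PARTITION p20130501 VALUES LESS THAN (TO_DAYS('2013-05-02')),
  PARTITION p20130502 VALUES LESS THAN (TO_DAYS('2013-05-03')),
  PARTITION future    VALUES LESS THAN MAXVALUE
);

-- Purging a day is nearly instantaneous, unlike a big DELETE:
ALTER TABLE log DROP PARTITION p20130501;
```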


Memory Usage

Rules of Thumb:

⚈  If Opened_tables grows faster than 1/sec, increase table_open_cache.
⚈  Turn off the Query Cache: query_cache_type=OFF and query_cache_size=0

Discussion: MySQL performance depends on being in control of its use of RAM. The biggest pieces are the caches for MyISAM or InnoDB; these caches should be tuned to use a large chunk of RAM. Other tunables rarely matter much, and the default values in my.cnf (my.ini) tend to be "good enough". The "Query Cache" is totally distinct from the key_buffer and the buffer_pool. ALL QC entries for one table are purged when ANY change to that table occurs. Hence, if a table is frequently modified, the QC is virtually useless.
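A my.cnf fragment matching the RoTs above; the sizes are purely illustrative and depend on how much RAM the machine has:

```ini
# my.cnf (my.ini) -- illustrative values only
[mysqld]
query_cache_type        = OFF
query_cache_size        = 0
innodb_buffer_pool_size = 8G    # InnoDB: give it a large chunk of RAM
# key_buffer_size       = 2G    # MyISAM equivalent, only if you use MyISAM
table_open_cache        = 2000
```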


Character Sets

Rules of Thumb:

⚈  utf8_general_ci > utf8_bin
⚈  Debug stored data via HEX(col), LENGTH(col), CHAR_LENGTH(col)
⚈  Do not use utf8 for hex or ascii strings (GUID, MD5, IP address, country code, postal code, etc.)
⚈  LENGTH(col) vs CHAR_LENGTH(col): with European text, '=' for latin1, '>' for utf8.
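A sketch of the HEX()/LENGTH()/CHAR_LENGTH() debugging technique on a hypothetical table `t`; the hex values shown are for the character 'é':

```sql
SELECT col,
       HEX(col),          -- C3A9 = proper utf8 'é'; C383C2A9 = double-encoded
       LENGTH(col),       -- bytes
       CHAR_LENGTH(col)   -- characters; equal for latin1 text,
  FROM t                  --   LENGTH > CHAR_LENGTH for multi-byte utf8
 WHERE id = 1;
```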


Datatypes - Directly supported

Rules of Thumb:

⚈  INT(5) is not what you think. (See SMALLINT, etc.)
⚈  FLOAT(7,2) -- no; just say FLOAT
⚈  Learn the sizes: INT and FLOAT are 4 bytes, etc.
⚈  Before using BIGINT (8 bytes), ask whether you really need such a big range.
⚈  Almost never use CHAR instead of VARCHAR.
⚈  Do not have separate DATE and TIME columns, nor separate YEAR, MONTH, etc.
⚈  Most INTs should be UNSIGNED
⚈  Most columns should be NOT NULL
⚈  TIMESTAMP is half the size of DATETIME (changing in 5.6)
⚈  TIMESTAMP DEFAULT changes in 5.6.5
⚈  VARCHAR(255) has some drawbacks over VARCHAR(11)
⚈  Overlapping time ranges: WHERE a.start < b.end AND a.end > b.start
⚈  Don't be surprised by AUTO_INCREMENT values after uncommon actions.

Discussion: Smaller --> more cacheable --> faster. An AUTO_INCREMENT is very non-random, at least for inserting. Each new row will be at the 'end' of the table; that is, the last block is the "hot spot". Thanks to caching, very little I/O is needed for an AUTO_INCREMENT index. VARCHAR(255) for everything is tempting, and for "small" tables it won't hurt. For large tables one needs to consider what happens during the execution of complex SELECTs. If a "temporary" table is implicitly generated, the VARCHAR will take 767 bytes in the temp table (2 + 3*255 bytes: 2 = VAR overhead, 3 = utf8 expansion, 255 = your limit). A DELETE of the last row may or may not burn that AUTO_INCREMENT id. INSERT IGNORE burns ids because it allocates values before checking for duplicate keys. A Slave may see InnoDB ids arriving out of order (because transactions arrive in COMMIT order). A ROLLBACK (explicit or implicit) will burn any ids already allocated to INSERTs. REPLACE = DELETE + INSERT, so the INSERT comments apply to REPLACE. After a crash, the next id to be assigned may or may not be what you expect; this varies with ENGINE.
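The overlapping-time-ranges pattern, sketched on a hypothetical `reservations` table (columns named `starts`/`ends` to avoid the reserved word END):

```sql
-- Which pairs of reservations overlap in time?
-- Two ranges overlap exactly when each starts before the other ends.
SELECT a.id, b.id
  FROM reservations a
  JOIN reservations b ON a.id < b.id   -- avoid self-pairs and duplicates
 WHERE a.starts < b.ends
   AND a.ends   > b.starts;
```

With half-open intervals [starts, ends), this single two-condition test replaces the error-prone four-case enumeration of partial and full overlaps.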


Datatypes - Implicit

Discussion: Since GUID, UUID, MD5, and SHA1 values are fixed length, VAR is not needed. If they are in hex, don't bother with utf8; use BINARY or CHAR CHARSET ascii. Images could be stored in a BLOB (not TEXT). That better assures referential integrity (you can't accidentally delete the metadata but keep the image, or vice versa). On the other hand, it is clumsy. With files, an img tag can point directly to the image on disk.
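A sketch of storing fixed-length hex values compactly; the table and column names are hypothetical:

```sql
CREATE TABLE files (
  md5  BINARY(16) NOT NULL,                    -- not CHAR(32) CHARSET utf8 (96 bytes!)
  uuid BINARY(16) NOT NULL,
  cc   CHAR(2) CHARACTER SET ascii NOT NULL,   -- country code: fixed length, ascii
  PRIMARY KEY (md5)
) ENGINE=InnoDB;

-- Pack with UNHEX() on the way in, unpack with HEX() on the way out:
INSERT INTO files (md5, uuid, cc)
VALUES (UNHEX(MD5('hello')), UNHEX(REPLACE(UUID(), '-', '')), 'US');

SELECT HEX(md5), HEX(uuid), cc FROM files;
```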


Hardware

Rules of Thumb:

⚈  More than 8 cores may degrade performance. (Changes coming in XtraDB, 5.6, MariaDB, etc.) (5.6 claims to be good to 48 cores -- YMMV; 5.7 claims 64)
⚈  A single connection will not use more than one core. Not even with UNION or PARTITION.
⚈  Don't put a cache in front of a cache
⚈  10x speedup when disk blocks are cached, so... time a query twice -- the first run will get things cached; the second will do no I/O
⚈  Benchmark with "SELECT SQL_NO_CACHE ..." (to avoid the Query Cache)


PXC / Galera

Rules of Thumb:

⚈  InnoDB only; always have a PRIMARY KEY
⚈  Check for errors, even after COMMIT
⚈  For optimal performance, use 'medium-sized' transactions
⚈  Cross-colo replication may be faster or slower than traditional replication
⚈  AUTO_INCREMENT values won't be consecutive
⚈  Handle "critical reads" using wsrep_causal_reads
⚈  ALTERs need to be handled differently (see RSU vs TOI)
⚈  Lots of tricks are based on: remove from cluster + do stuff + add back to cluster
⚈  Minimal HA: 1 node in each of 3 datacenters; one could be just a garbd

Discussion: These RoTs apply to Galera-based systems, such as Percona XtraDB Cluster. Since there is one inter-node action per transaction, medium-sized transactions are a good tradeoff between inter-node delays and prompt replication. Trust nodes to heal themselves (via SST or IST); this leads to significantly less manual intervention for dead nodes, etc. Critical reads are no longer a problem, beyond the minor code change.
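The "critical read" code change can be sketched as follows; the table and column are hypothetical (newer Galera releases replace this variable with wsrep_sync_wait):

```sql
-- After a write on node A, make the next read on node B wait until
-- replication has caught up, so the user sees their own write:
SET SESSION wsrep_causal_reads = ON;
SELECT balance FROM accounts WHERE id = 123;
SET SESSION wsrep_causal_reads = OFF;   -- don't pay the latency on every read
```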


Data Warehouse

Rules of Thumb:

⚈  #1: Create summary table(s) ("materialized views")
⚈  Look into 3rd party solutions: InfoBright, TokuDB (in MariaDB)
⚈  Normalize, but don't over-normalize.
⚈  Do not normalize "continuous" values -- dates, floats, etc. -- especially if you will do range queries
⚈  The average of averages is (usually) mathematically incorrect.
⚈  InfoBright (3rd party) -- 10:1 compression; all columns automatically indexed
⚈  TokuDB (3rd party) -- 3:1 compression; faster loading ("fractal" technology)
⚈  RANGE PARTITION the Fact table on a unit of time (to make DELETEs efficient).
⚈  Use the smallest practical datatype for each field (to shrink the 'Fact' table).
⚈  Use InnoDB. That way, recovery from a power failure will be fast and painless.
⚈  Don't have any indexes other than an AUTO_INCREMENT PRIMARY KEY for the Fact table. That way, INSERTs into it will be fast. Periodic augmentation of the summary table(s) can use that id to keep track of where they "left off".
⚈  "Sharding" (data spread across multiple machines) is mostly do-it-yourself. Or... Clustrix, Spider, Fabric

Discussion: Reports are typically broken down by hours/days/weeks, together with some other "dimensions" like department, country, product, etc. Doing such reports against the raw ("Fact") table is costly because of the I/O to read lots of that table. Creating and maintaining "summary tables" is a technique for generating reports much more efficiently (typically 10x-100x faster). A summary table usually has, say, PRIMARY KEY(product, day), plus other columns that are COUNTs and SUMs of metrics for the given product+day. A report reads the summary table, not the Fact table, and finishes any further arithmetic. A summary table based on days can be used to generate a weekly report with suitable SUMs and a GROUP BY. AVERAGEs should be done as SUM(sum)/SUM(count). Normalizing dates runs afoul of the 20% rule, plus making it impossible to do range scans.
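A sketch of a summary table and its periodic augmentation; all table and column names (`fact_sales`, `sales_summary`, etc.) are hypothetical:

```sql
CREATE TABLE sales_summary (
  product_id INT UNSIGNED NOT NULL,
  day        DATE NOT NULL,
  cnt        INT UNSIGNED NOT NULL,     -- COUNT(*) for that product+day
  amt_sum    DECIMAL(14,2) NOT NULL,    -- SUM(amount) for that product+day
  PRIMARY KEY (product_id, day)
) ENGINE=InnoDB;

-- Periodic augmentation from the Fact table, resuming at the id where the
-- previous run "left off" (@leftoff is persisted between runs):
INSERT INTO sales_summary (product_id, day, cnt, amt_sum)
SELECT product_id, DATE(ts), COUNT(*), SUM(amount)
  FROM fact_sales
 WHERE id > @leftoff
 GROUP BY product_id, DATE(ts)
ON DUPLICATE KEY UPDATE cnt     = cnt + VALUES(cnt),
                        amt_sum = amt_sum + VALUES(amt_sum);

-- Weekly average done right: SUM(sum)/SUM(count), never AVG of daily AVGs.
SELECT product_id, SUM(amt_sum) / SUM(cnt) AS avg_sale
  FROM sales_summary
 WHERE day >= '2013-04-29' AND day < '2013-05-06'
 GROUP BY product_id;
```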


Miscellany

Rules of Thumb:

⚈  MySQL can run 1000 qps. (Just a RoT; YMMV)
⚈  The SlowLog is the main clue into performance problems. Keep it on. Use long_query_time=2.
⚈  1000+ databases or tables is a clue of poor schema design
⚈  10,000+ databases or tables will run very slowly because of OS overhead
⚈  < 10% improvement -- don't bother. Exception: shrink the datatypes before deploying
⚈  Beware of SQL injection
⚈  If you can't finish an InnoDB transaction in 5 seconds, redesign it.
⚈  MySQL has many builtin 'hard' limits; you will not hit any of them.
⚈  An excessive MaxClients (Apache) can cause trouble with max_connections
⚈  Connection pooling is generally not worth the effort. (Not to be confused with 5.7's thread pooling.)
⚈  SBR vs RBR -- too many variables to make a call
⚈  A Slave can have only one Master. (Exceptions: 5.7, ~Galera)
⚈  Do all ALTERs in a single statement. Exceptions: PARTITION, NDB, some cases in 5.6+
⚈  ALTER to add an ENUM option is efficient. (This has not always been the case.)
⚈  "Load average" often raises false alarms.
⚈  Pick carefully between REPLACE (== DELETE + INSERT), INSERT IGNORE, and INSERT ... ON DUPLICATE KEY UPDATE.
⚈  When Threads_running > 10, you may be in serious trouble.
⚈  SHOW PROCESSLIST with some threads "Locked" -- some other thread is hogging something.
⚈  SHOW PROCESSLIST may fail to show the locking thread -- it is Sleeping, but not yet COMMITted.
⚈  >90% CPU --> investigate queries/indexes. (The SlowLog also catches such.)
⚈  >90% of one core -- since MySQL won't use multiple cores in a single connection, this indicates an inefficient query. (E.g., 12% overall on an 8-core box is probably one saturated core.)
⚈  >90% I/O -- tuning, overall schema design, missing index, etc.
⚈  "NoSQL" is a catchy phrase looking for a definition. By the time NoSQL gets a definition, it will look a lot like an RDBMS solution.

Discussion: MySQL can run thousands of trivial queries per second on modern hardware. Some special benchmarks have driven InnoDB past 100K qps. At the other extreme, I have seen a query run for a month. 1000 qps is simply a RoT that applies to a lot of systems; your mileage can really vary a lot. Over-normalization can lead to inefficiencies. Why have a 4-byte INT as an id for the 200 countries in the world? Simply use a 2-byte CHAR(2) CHARSET ascii. Don't normalize dates -- see Data Warehouse, above. SQL injection is where you take user input (say, from an HTML form) and insert it verbatim into a SQL statement; some hacker will soon find that your site is not protecting itself and have his way with your data. "SELECT *" will break your code tomorrow when you add another field; it is better to spell out the fields explicitly. (There is no noticeable performance difference.) ALTER, in most situations, completely copies the table and rebuilds all the indexes. For a huge table, this can take days. Doing two ALTERs means twice the work; use a single ALTER statement with several operations in it. OPTIMIZE is similarly costly, and may not provide much benefit. MariaDB 5.3's "dynamic columns" eats into a big excuse for "NoSQL".
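The single-ALTER advice can be sketched on a hypothetical table `t`:

```sql
-- One table copy and index rebuild instead of three:
ALTER TABLE t
  ADD COLUMN note VARCHAR(100) NULL,
  ADD INDEX cust_date (cust_id, placed_at),
  MODIFY COLUMN qty SMALLINT UNSIGNED NOT NULL;
```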

