Optimizing MySQL Performance – An Introduction

What can a badly designed database do?

Anyone whose bread and butter comes from database seldom underestimates database performance. But Paulo Cohelo said in veronica decides to die that,

People never learn anything by being told, they have to find out for themselves.

Echoing the similar sentiments, I was happily having breakfast, lunch and dinner until one of the applications that I look after started stopping abruptly. We restarted it only to find it periodically stopping with time period decreasing continuously. The winter came and what followed was days and nights of debugging and that is going to be the crux of this post.

To summarise, a badly designed database will give you sleepless nights and more prayers to God. With this, let’s see how we can optimize sql queries.

Slow queries with where clause may need an index. We can speed up retrieval by creating index on columns used in where clause. We should make index which helps speeding up other queries, if present, to reduce the disk usage. We should also avoid overdoing creating indexes as it increases INSERT/UPDATE/DELETE operations.
Reduce the full table scan for the big tables.
SQL optimizers optimize the queries for faster execution. Therefore, we should avoid writing queries which are hard to understand by the optimizers.
Locking issue may slow down our query when table or row is locked by some other session. Using a incorrect locking strategy can slow down significantly.
COUNT(*) without an where clause is returned without scanning the table. This data is stored in table-info table.
MySQL reads data directly from index tree when the column in question is numeric.
Among multiple indexes used in query, fastest one (according to the optimizer) is used.
If we can, we should INSERT statements with multiple VALUES lists to insert several rows at a time. This speeds up query by virtue of removing overhead that comes with individual insert by operations like: connecting to database for each query, sending query to server, inserting row, inserting indexes and closing connection
If we need to insert data from a file in table, we should use LOAD DATA, this is at least twenty times faster than the equivalent INSERT statements.
If there are multiple update statement, we should perform them together.
User Defined Functions can better the performance when we need to generate single result set based on large volume of data or multiple columns.
If the data can not be structured into row and column, we should use BLOB data type as that would reduce I/O timing.
We should design the database to minimize the space on disk. It decreases the I/O data. Declare the column with most efficient data types.
Declare the column non null if possible. This reduce the overhead of testing each value is null. This also saves storage by not needing to maintain one bit per column.
We should create a single composite index which is better than creating separate index for each column. The first part of the index should be the column most used.
We should prefer numeric columns to string columns for unique IDs or values that can be represented as either strings or numbers. It is faster and takes less memory to transfer and compare the numeric value than string.
We should prefix randomly generated value primary ID with ascending value such as current date and time stamp, this makes insertion and retrieval faster.
We should store hash of a long string and then use equality over the equality against the long string column. Please keep in mind that hash function can produce same output hash for different input, so an additional check against long string can be used to safeguard.
We should use the OPTIMIZE TABLE statement to reorganize the table and compact any wasted space when data size reaches hundreds of megabytes. The reorganized tables require less disk I/O to perform full table scans.

What can a badly designed database do?

Share this:

Related

Leave a Reply Cancel reply