MYSQL: Query Execution Basics

1. The client sends the SQL statement to the server.
-> The protocol is halfduplex, which means that at any given time the MySQL server can be either sending or receiving messages, but not both. It also means there is no way to cut a message short.
The client sends a query to the server as a single packet of data. This is why the max_packet_size configuration variable is important if you have large queries. Once the client sends the query, it doesn’t have the ball anymore; it can only wait for results. The response from the server usually consists of many packets of data.When the server responds, the client has to receive the entire result set. It cannot simply fetch a few rows and then ask the server not to bother sending the rest. If the client needs only the first few rows that are returned, it either has to wait for all of the server’s packets to arrive and then discard the ones it doesn’t need, or disconnect ungracefully. Neither is a good idea, which is why appropriate LIMIT clauses are so important.Here’s another way to think about this: when a client fetches rows from the server, it thinks it’s pulling them. But the truth is, the MySQL server is pushing the rows as it generates them. The client is only receiving the pushed rows; there is no way for it to tell the server to stop sending rows. The client is “drinking from the fire hose,” so to speak.



2. The server checks the query cache. If there’s a hit, it returns the stored result from the cache; otherwise, it passes the SQL statement to the next step.
-> Before even parsing a query, MySQL checks for it in the query cache, if the cache is enabled. This operation is a case sensitive hash lookup. If the query differs from a similar query in the cache by even a single byte, it won’t match, and the query processing will go to the next stage.
If MySQL does find a match in the query cache, it must check privileges before
returning the cached query. This is possible without parsing the query, because
MySQL stores table information with the cached query. If the privileges are OK,
MySQL retrieves the stored result from the query cache and sends it to the client,
bypassing every other stage in query execution. The query is never parsed, optimized,
or executed.

3. The server parses, preprocesses, and optimizes the SQL into a query execution
plan.

-> MySQL’s parser breaks the query into tokens and builds a “parse tree”
from them. The parser uses MySQL’s SQL grammar to interpret and validate the
query. For instance, it ensures that the tokens in the query are valid and in the proper order, and it checks for mistakes such as quoted strings that aren’t terminated. The preprocessor then checks the resulting parse tree for additional semantics that the parser can’t resolve. For example, it checks that tables and columns exist, and it resolves names and aliases to ensure that column references aren’t ambiguous.Next, the preprocessor checks privileges. This is normally very fast unless your server has large numbers of privileges.
-> The parse tree is now valid and ready for the optimizer to turn it into a query execution plan. A query can often be executed many different ways and produce the same result. The optimizer’s job is to find the best option. MySQL uses a cost-based optimizer, which means it tries to predict the cost of various execution plans and choose the least expensive. The unit of cost is a single random four-kilobyte data page read.

4. The query execution engine executes the plan by making calls to the storage engine API.
-> The parsing and optimizing stage outputs a query execution plan, which MySQL’s
query execution engine uses to process the query. The plan is a data structure; it is
not executable byte-code, which is how many other databases execute queries. In contrast to the optimization stage, the execution stage is usually not all that complex: MySQL simply follows the instructions given in the query execution plan.
Many of the operations in the plan invoke methods implemented by the storage
engine interface, also known as the handler API. Each table in the query is represented by an instance of a handler. If a table appears three times in the query, for example, the server creates three handler instances. Though we glossed over this before, MySQL actually creates the handler instances early in the optimization stage. The optimizer uses them to get information about the tables, such as their column names and index statistics.

5. The server sends the result to the client.
-> The final step in executing a query is to reply to the client. Even queries that don’t return a result set still reply to the client connection with information about the query, such as how many rows it affected.

Comments

Keith said…
Thanks for this very elegant and clear article. I stumbled it, so hopefully others will get to see it.
Unknown said…
privilege check should be done before checking in cache.

Popular posts from this blog

How do I Use the Linux Top Command?

IOPS measurement