
MySQL Basics

Original text address🔗

  • Tuple: In a two-dimensional table, a row is called a tuple. In a relational database, a relation is a table, and each row (i.e., each record in the database) is a tuple, while each column is an attribute.
  • Key (码): An attribute that can uniquely identify an entity; it corresponds to a column in the table.
  • Candidate Key: If a certain attribute or set of attributes in a relation can uniquely identify a tuple, and no subset of these attributes can do so, then this set of attributes is a candidate key. For example, in a student table, {student ID} and {name, class} are both candidate keys.
  • Primary Key: A primary key is selected from the candidate keys.
  • Foreign Key: If an attribute in one relation is the primary key of another relation, then this attribute is a foreign key.
  • Primary Attribute: Attributes that appear in the candidate keys are called primary attributes.
  • Non-Primary Attribute: Attributes that are not included in any candidate key.
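A minimal sketch of these concepts, assuming hypothetical student and score tables (table and column names are illustrative only):

CREATE TABLE student (
    student_id INT PRIMARY KEY,        # primary key, chosen from the candidate keys
    name VARCHAR(50) NOT NULL,
    class VARCHAR(20) NOT NULL         # {name, class} could be another candidate key
);

CREATE TABLE score (
    student_id INT,                    # foreign key: the primary key of the student relation
    course VARCHAR(50),
    score DECIMAL(5, 2),
    FOREIGN KEY (student_id) REFERENCES student (student_id)
);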

drop, delete, and truncate#

Different Usage#

Different Types#

  • DML (Data Manipulation Language): Used to insert, delete, update, and query the data in tables.
  • DDL (Data Definition Language): Used to create, drop, and alter database objects; it operates on the database structure.

delete is a DML statement; the deleted rows are placed in the rollback segment, the deletion takes effect only after the transaction is committed, and it can be rolled back before that.

drop and truncate are DDL statements; they take effect immediately after execution, and the data cannot be rolled back.

Different Execution Speeds#

drop > truncate > delete
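A small comparison sketch, assuming a hypothetical user table:

DELETE FROM user WHERE id = 1;   # DML: removes matching rows; can be rolled back before COMMIT
TRUNCATE TABLE user;             # DDL: empties the whole table; takes effect immediately
DROP TABLE user;                 # DDL: removes the table structure together with its data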

utf8 and utf8mb4#

  • utf8:

    • The utf8 encoding supports 1-3 bytes.
    • Chinese characters occupy 3 bytes, while other numbers, English letters, and symbols occupy one byte.
    • Emoji symbols occupy 4 bytes, so utf8 cannot store emoji symbols.
  • utf8mb4: A complete implementation of UTF-8. It supports up to 4 bytes to represent characters and can store emoji symbols.
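A minimal sketch of storing emoji with utf8mb4, assuming a hypothetical message table:

CREATE TABLE message (
    id INT PRIMARY KEY AUTO_INCREMENT,
    content VARCHAR(255)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

INSERT INTO message (content) VALUES ('hello 😀');   # the 4-byte emoji would be rejected under utf8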

Database Normalization#

First Normal Form (1NF)#

Atomicity: Attributes (columns in the data table) in the database cannot be further divided.

Second Normal Form (2NF)#

Based on the first normal form, every column in the table should be associated with the primary key.

For example, storing basic student information and student grades in the same table violates the second normal form.

Non-primary attributes in each table must fully depend on the candidate key.

Third Normal Form (3NF)#

Non-primary attributes in the table cannot depend on each other.

Building on the second normal form, eliminate dependencies between non-primary attributes, keeping only the dependencies between non-primary attributes and the candidate key.
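A rough sketch of a normalized design, assuming hypothetical student, class, and score tables, where student data, class data, and grades live in separate tables instead of one wide table:

CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name VARCHAR(50),
    class_id INT                          # references class, so class details are not repeated here (3NF)
);

CREATE TABLE class (
    class_id INT PRIMARY KEY,
    class_name VARCHAR(50)
);

CREATE TABLE score (
    student_id INT,
    course_id INT,
    score DECIMAL(5, 2),
    PRIMARY KEY (student_id, course_id)   # grades depend on the whole key, not just part of it (2NF)
);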

MySQL Data Types#

Integer Types#

tinyint, smallint, mediumint, int, bigint, occupying 8, 16, 24, 32, and 64 bits of storage respectively.

Floating Point Numbers#

  • float
  • double
  • decimal: High-precision decimal type.

All three types can specify a column precision; for example, decimal(18,9) means 18 digits in total, with 9 digits after the decimal point and the rest for the integer part.

Strings#

  • char: Fixed length, automatically removes trailing spaces.
  • varchar: Variable length, does not remove trailing spaces.
  • blob: Stored in binary format.
  • text: Stored in string format.
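A small sketch of the trailing-space difference between char and varchar (hypothetical table, for illustration only):

CREATE TABLE str_demo (c CHAR(5), v VARCHAR(5));
INSERT INTO str_demo VALUES ('ab ', 'ab ');

SELECT CONCAT('[', c, ']') AS c_val,   # returns [ab]  : char drops trailing spaces on retrieval
       CONCAT('[', v, ']') AS v_val    # returns [ab ] : varchar keeps them
FROM str_demo;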

Time#

  • datetime: Stores dates and times from the year 1000 to 9999 with second precision, uses 8 bytes of storage, and carries no time zone information.
  • timestamp: Stores seconds elapsed since 1970-01-01 in 4 bytes, so it can only represent dates up to 2038; values are converted according to the time zone setting.

MySQL Architecture#

[Image: MySQL architecture diagram]

MySQL Storage Engines#

Use the show engines command to view MySQL storage engines.

[Image: output of the show engines command]

Commonly used engines include InnoDB and MyISAM.

The default engine is InnoDB.
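The engine can be chosen per table; a minimal sketch (hypothetical table names):

SHOW ENGINES;                                                    # list available engines

CREATE TABLE demo_innodb (id INT PRIMARY KEY) ENGINE = InnoDB;
CREATE TABLE demo_myisam (id INT PRIMARY KEY) ENGINE = MyISAM;

ALTER TABLE demo_myisam ENGINE = InnoDB;                         # convert an existing table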

MyISAM#

  • Does not support transactions.
  • Does not support row-level locks, only table-level locks. Read operations use shared locks, while write operations use exclusive locks.
  • Risk of data loss.

InnoDB#

  • Supports transactions, implementing four isolation levels, with the default level being repeatable read, preventing phantom reads through MVCC (Multi-Version Concurrency Control) + Next-Key Locking (gap locks).
  • The primary index is a clustered index.

MyISAM vs. InnoDB#

  • Locks: MyISAM uses table-level locks, while InnoDB uses row-level locks.
  • Transactions: MyISAM does not support transactions, while InnoDB does.
  • Foreign Keys: MyISAM does not support foreign keys, while InnoDB does.
  • Crash Recovery: MyISAM does not support safe recovery after a crash, while InnoDB does.
  • Index Implementation: Both use B+ tree data structures. In MyISAM, index files and data files are separate, while in InnoDB, the data file is also the index file.

MySQL Transactions#

A transaction is a logical group of operations that either all execute or none execute.

Transaction Characteristics#

  1. Atomicity: All operations within a transaction must either all succeed or all fail.
  2. Consistency: The state before and after the transaction must be consistent. For example, in a transfer, the total amount of money remains unchanged after the transfer.
  3. Isolation: Concurrent transactions are isolated from one another; the uncommitted changes of one transaction are not visible to other transactions.
  4. Durability: Changes to the database after a transaction is committed are permanent.

Abbreviated as ACID.
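A minimal transfer sketch, assuming a hypothetical account table, showing how a transaction groups the two updates:

START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
COMMIT;        # or ROLLBACK; to undo both updates together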

Transaction Issues#

  • Dirty Read: A transaction reads data that another transaction has modified but not yet committed.
  • Non-Repeatable Read: A transaction reads the same record twice and gets different results, because another transaction updated or deleted the record in between; this mainly concerns update and delete operations.
  • Phantom Read: The same query returns a different number of rows when executed twice within one transaction, mainly caused by insert operations from other transactions.

Transaction Isolation Levels#

  • READ-UNCOMMITTED: The lowest level; a transaction may read uncommitted changes made by other transactions, so dirty reads, non-repeatable reads, and phantom reads can all occur.
  • READ-COMMITTED: Only committed data can be read; this prevents dirty reads but not non-repeatable reads or phantom reads.
  • REPEATABLE-READ: Multiple reads of the same data within a transaction return the same result; this prevents dirty reads and non-repeatable reads, but phantom reads are still possible.
  • SERIALIZABLE: The highest level; all transactions are executed serially, preventing all of the above issues.

Transaction Isolation Level Implementation#

Implemented based on locks and MVCC mechanisms.

The serializable isolation level is implemented through locks, while the other isolation levels are based on MVCC.

Default Isolation Level#

Check the default isolation level by executing SELECT @@transaction_isolation;.

[Image: query result showing the default isolation level]

The default isolation level is repeatable read.
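The isolation level can also be changed for the current session; a sketch, for illustration only:

SELECT @@transaction_isolation;                            # check the current level
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;    # change it for this session only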

MySQL Locks#

Table-Level Locks and Row-Level Locks#

  • Table-Level Lock: The coarsest-grained lock; it locks the entire table (used when the operation does not hit an indexed field). It is simple to implement, consumes few resources, locks quickly, and does not cause deadlocks.
  • Row-Level Lock: The finest-grained lock; it locks only the rows being operated on (requires hitting an indexed field). Locking is slower and may cause deadlocks.

Row-Level Lock Usage#

InnoDB uses row-level locks for locking index fields.

When executing update or delete, if the where condition does not hit an index or the index fails, a full table scan occurs and all rows in the table are locked.

Common Lock Types#

  • Shared Lock (S Lock): Also known as a read lock, a transaction obtains a shared lock when reading data (select), allowing multiple transactions to acquire it simultaneously (lock compatible).

  • Exclusive Lock (X Lock): Also known as a write lock. A transaction obtains an exclusive lock when modifying records (update); only one transaction can hold it at a time. If a record is already locked with an exclusive lock, other transactions cannot acquire any lock on it (lock incompatible).

    By default, no locks are applied, but they can be explicitly declared.

    SELECT ... LOCK IN SHARE MODE; # Shared lock
    SELECT ... FOR UPDATE; # Exclusive lock
    
  • Intent Lock: A table-level lock used to quickly determine if there are row locks in the table.

    • Intent Shared Lock: If a transaction intends to apply a shared lock on data in the table, it must first apply an intent shared lock.
    • Intent Exclusive Lock: If a transaction intends to apply an exclusive lock on data in the table, it must first apply an intent exclusive lock.

Locks in InnoDB#

InnoDB has three types of row locks:

  • Record Lock: A lock on a single row record.
  • Gap Lock: Locks a range, excluding the record itself.
  • Next-Key Lock: A combination of a record lock and a gap lock; it locks a range together with the record itself.

Current Read and Snapshot Read#

InnoDB's default isolation level (repeatable read) can solve phantom read issues through current read and snapshot read.

  • Current Read (Locking Read): Uses next-key locks to ensure that phantom reads do not occur.
  • Snapshot Read (Consistent Non-Locking Read): Ensured by the MVCC mechanism to prevent phantom reads.

Differences between the two

Snapshot Read: When using select normally, snapshot read is used. If the data being read is currently undergoing update/delete operations, the read operation will not wait for the exclusive lock on the record to be released but will read the previous snapshot of the row.

Current Read: A locking read that always reads the latest version of a record. Under the RC (read committed) and RR (repeatable read) isolation levels, InnoDB defines the contents of a snapshot differently:

  • Under the RC level: snapshot reads return the latest committed snapshot of the locked row.
  • Under the RR level: snapshot reads return the version of the row data as of the start of the transaction.

Snapshot refers to the historical version of records, where each row record may have multiple historical versions.

MySQL Performance Optimization#

File Storage#

It is not recommended to use MySQL to store binary file data.

It is recommended to use object storage to save files, with MySQL storing the file addresses.

IP Address Storage#

IP addresses can be converted to integers for storage.

  • INET_ATON(): Converts an IPv4 address string to an unsigned integer, which fits in a 4-byte INT UNSIGNED column.
  • INET_NTOA(): Converts the integer back to the dotted IP address string.
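For example, storing the address in an INT UNSIGNED column instead of a VARCHAR(15):

SELECT INET_ATON('192.168.1.1');    # 3232235777
SELECT INET_NTOA(3232235777);       # '192.168.1.1'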

Define Columns as NOT NULL Whenever Possible#

Unless there is a specific reason to use NULL values, fields should always be kept as NOT NULL.

  • Indexing NULL columns requires extra space to store, thus occupying more space.
  • Special handling is required for NULL values during comparisons and calculations.

MySQL Indexes#

What is an Index#

An index is a data structure used for fast searching and retrieving data, with common index structures including B-trees, B+ trees, and hash tables.

Index Data Structures#

Hash Table#

A hash table is a collection of key-value pairs that allows a value to be retrieved by its key in O(1) time. It computes an index from the key with a hash function and locates the value through that index.

Disadvantages:

  1. Hash collisions.
  2. Hash indexes do not support sequential and range queries.

B-Tree#

Binary Search Tree, Balanced Binary Tree, Red-Black Tree🔗

B-Tree Recommended Reading🔗

A detailed explanation of B-trees and related operations such as insertion and deletion has been provided above.

My personal understanding of B-trees: the drawbacks of an ordinary binary tree are that each node can have only two children and hold only one piece of data, which leads to two issues: 1. the tree becomes very tall; 2. logically adjacent nodes may be stored far apart. For purely in-memory data, these issues are not significant.

However, in databases, data is primarily stored on disk rather than in memory. A tree that is too high requires multiple lookups to retrieve results, compare, and then look up again, leading to more disk accesses as the height of the tree increases. This is not a good choice for disks. If the "locality principle" is satisfied, meaning that logically adjacent data is physically stored together, it can reduce disk access times.

B-trees solve this problem with the following characteristics: 1. they are multi-way trees, so a node is no longer limited to two children; 2. all leaf nodes are at the same depth.

B-trees can accommodate more data in each node, thereby reducing the height of the tree, while logically adjacent data is stored in adjacent disk spaces. This reduces the number of disk interactions during queries.

B-tree:

[Image: B-tree structure]

In the above image, leaf nodes contain multiple data entries, and searching will read the data from the leaf nodes into memory at once, allowing for in-memory traversal and comparison to find the required data.

A B-tree node is typically sized to match a disk block (for example 4K), since disks read and write data in blocks; each read brings one node into memory.

B+ Tree#

In a B+ tree, non-leaf (internal) nodes do not store data; they store only the keys used as an index, and all data is stored in the leaf nodes. In addition, the leaf nodes are linked together to form an ordered linked list.

[Image: B+ tree structure]

Comparison of B+ Tree and B Tree#

  1. B+ trees have more stable query speeds. Since all data is stored in the leaf nodes of a B+ tree, and the heights of the leaf nodes are the same, the number of searches is consistent, leading to stable query speeds.
  2. Leaf nodes of B+ trees are ordered. The data in all leaf nodes of a B+ tree forms an ordered linked list, making it faster to query data in a range.
  3. Full node traversal is faster in B+ trees. B+ trees only need to traverse leaf nodes, while B trees need to traverse all nodes.
  4. In a B-tree, if the data being accessed is close to the root node, the search may be faster than in a B+ tree, because B-tree internal nodes also hold data.

MyISAM and InnoDB Indexes#

Both use B+ trees as their index structure.

  • MyISAM uses non-clustered indexes: The data field in B+ tree nodes stores the address of the data, so during a search, it first finds the node and then retrieves the data using the address in the node.

  • InnoDB uses clustered indexes: The data field in B+ tree nodes stores the complete data, with the index key being the primary key of the database.

Index Types#

Primary Key Index#

The primary key column of a data table uses a primary key index.

If a table does not have a primary key set, InnoDB will first check if there is a unique and non-nullable field in the table; if so, it will use that field as the primary key. If not, InnoDB will create a 6-byte auto-increment primary key.

Secondary Index (Auxiliary Index)#

The leaf nodes of a secondary index store the primary key value; a lookup through a secondary index therefore finds the primary key first and then uses it to locate the row.

Secondary indexes include unique indexes, normal indexes, prefix indexes, and full-text indexes.

  • Unique Index (UNIQUE): A unique field is set in the data table, disallowing duplicates but allowing nulls.
  • Normal Index (INDEX): An index field manually set in the data table, allowing duplicates and nulls.
  • Prefix Index: For string-type data, an index is created by truncating the first few characters of the string.
  • Full-Text Index (FULLTEXT): Searches for keyword information in large text data.
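A minimal sketch of creating these index types, assuming a hypothetical user table with email, name, address, and bio columns:

CREATE UNIQUE INDEX uk_email ON user (email);        # unique index
CREATE INDEX idx_name ON user (name);                # normal index
CREATE INDEX idx_addr ON user (address(10));         # prefix index on the first 10 characters
CREATE FULLTEXT INDEX ft_bio ON user (bio);          # full-text index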

Clustered Index and Non-Clustered Index#

Clustered Index#

A clustered index is an index structure where the index and data are stored together. The primary key index is a clustered index. Characteristics: fast queries, slow modifications.

  • Advantages: Fast query speed, as the leaf nodes store data and the data is ordered, so finding the leaf node means finding the data.
  • Disadvantages:
    • Depends on ordered data: Queries depend on the physical storage addresses of the data being arranged in the order of the index.
    • High update costs: Since the physical storage addresses of the data are arranged in the order of the index, every insertion or deletion requires moving data to maintain physical order and continuity.

Non-Clustered Index#

The index structure and data are stored separately. Secondary indexes are non-clustered indexes. The leaf nodes of secondary indexes store primary keys, which are then used to look up data.

  • Advantages: Fast updates, as leaf nodes do not store data.
  • Disadvantages:
    • Depends on ordered data.
    • May require a second query to retrieve data.

Covering Index#

A covering index means that the fields being queried are exactly the fields of the index, allowing data to be retrieved directly from the index without needing to look up the table.
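A small sketch, assuming a hypothetical employee table with a composite index on (name, age):

CREATE INDEX idx_name_age ON employee (name, age);

# both selected columns live in the index, so no table lookup is needed;
# EXPLAIN shows "Using index" in the Extra column
EXPLAIN SELECT name, age FROM employee WHERE name = 'Alice';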

Composite Index#

An index created using multiple fields from the table.

Leftmost Prefix Matching Principle#

When using a composite index, query conditions are matched against the index fields from left to right, in the order in which the fields were defined in the index.
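A sketch of the leftmost prefix rule, assuming a hypothetical table t with a composite index on (a, b, c):

CREATE INDEX idx_a_b_c ON t (a, b, c);

SELECT * FROM t WHERE a = 1 AND b = 2;   # can use the index: leading column a is present
SELECT * FROM t WHERE b = 2 AND c = 3;   # cannot use the index: leading column a is missing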

Principles for Creating Index Fields#

  1. Fields that are frequently queried, used as query conditions, or involved in join queries.
  2. Fields that are not null.
  3. Fields that are frequently sorted.
  4. Fields that are not frequently updated.
  5. Prefer using composite indexes over single-column indexes.

MySQL Logs#

MySQL logs mainly consist of binary logs (binlog, archive logs), transaction logs (redo log), and rollback logs (undo log).

Redo Log#

The redo log is the logging system used by InnoDB, recording modifications made to each data page. The redo log can be used to recover data from errors, ensuring data durability and integrity. If the MySQL service crashes, it will recover data using the redo log upon restart.

When querying data in MySQL, data is read from disk in pages, loading one page of data (16KB) into the Buffer Pool at a time. Subsequent queries will prioritize searching in the Buffer Pool. When executing update operations, data in the Buffer Pool is updated, while the redo log is recorded in the redo log buffer, which is then flushed to the redo log file at an appropriate time.

Flushing Timing#

Flushing: Writing data from memory to disk.

  1. The innodb_flush_log_at_trx_commit parameter controls the flushing timing:
    • 0: No flushing operation is performed at each transaction commit (risk of data loss).
    • 1: Flushing is performed at each transaction commit (the default); no committed data is lost, only uncommitted transactions may be lost.
    • 2: At each transaction commit, the contents of the redo log buffer are written to the operating system page cache.
      In the default case, every time a transaction is committed, fsync is called to flush the redo log.
  2. Additionally, InnoDB automatically writes the contents of the redo log buffer to the system cache page cache every 1 second, followed by calling fsync to flush.
  3. When the space occupied by the redo log buffer exceeds half of innodb_log_buffer_size, it triggers a flush.
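These parameters can be inspected and adjusted; a sketch for illustration only (changing the flush policy trades durability for speed):

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW VARIABLES LIKE 'innodb_log_buffer_size';

SET GLOBAL innodb_flush_log_at_trx_commit = 1;   # default: fsync the redo log on every commit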

Log File Group#

The redo log is stored in a group of files: InnoDB writes to one file until it is full, then moves on to the next, and wraps around to the first file once the last one is full.

Binlog#

Binlog records logical database operations, capturing every update to data tables.

MySQL uses binlog for data synchronization in master-slave setups.

Recording Formats#

The recording format is specified by the binlog_format parameter.

  • Statement: Records the original SQL statement; however, statements such as update_time = now() may produce different times when replayed.
  • Row: Records the actual row changes with concrete values, so the recorded times are fixed.
  • Mixed: MySQL decides automatically whether a statement could cause data inconsistency; if so, the row format is used, otherwise the statement format is used.
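The format can be checked and changed through the binlog_format variable; for example:

SHOW VARIABLES LIKE 'binlog_format';
SET GLOBAL binlog_format = 'ROW';    # requires sufficient privileges; new sessions pick it up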

Writing Mechanism#

During transaction execution, logs are first written to the binlog cache, and after the transaction is committed, the binlog cache is written to the binlog.

Two-Phase Commit#

The redo log and binlog work together to ensure data recovery capabilities and prevent data loss.

The order in which the two logs are written is shown below.

[Image: two-phase commit flow of the redo log and binlog]

When executing update operations, the redo log is written first, and the status is set to prepare. After the transaction is committed, the update operations are written to the binlog, and the redo log status is set to commit.

If MySQL crashes while the redo log status is in the prepare phase, during data recovery, since the transaction has not been committed and there is no binlog, MySQL will choose to roll back this transaction.

If the transaction is committed and there is a binlog, MySQL will choose to recover data through the binlog and set the redo log status to commit.

Undo Log#

The undo log ensures the atomicity of transactions and is used to roll back operations in case of transaction exceptions.

All transaction operations first record rollback logs before executing the actual operations. Additionally, rollback logs are recorded before data is persisted to disk.

Functions of the Three Logs#

  • Use redo logs to ensure transaction durability.
  • Use undo logs to ensure transaction atomicity.
  • Use redo logs and undo logs to prevent data loss.
  • Use binlogs for complete data backup, master-slave synchronization, and to ensure data consistency.

MVCC#

MVCC#

Multi-Version Concurrency Control. It is a concurrency control method that allows concurrent access to the database in database management systems.

Implementation Principles#

Implicit Fields#

  • DB_ROW_ID: An implicit auto-increment primary key. If a data table does not have a primary key, InnoDB automatically generates a clustered index using fields like DB_ROW_ID.
  • DB_TRX_ID: The ID of the transaction that last inserted or modified the record.
  • DB_ROLL_PTR: A rollback pointer that points to the previous version of this record.
  • DELETED_BIT: A deletion flag that marks whether the record has been deleted.

Undo Log#

ReadView#

A read view is generated when a transaction performs a snapshot read. At that moment it takes a snapshot of the database system, recording and maintaining the IDs of the currently active (uncommitted) transactions.

The ReadView has three attributes:

  • trx_list: A list of uncommitted transaction IDs.
  • up_limit_id: Records the smallest transaction ID in the trx_list.
  • low_limit_id: The next transaction ID, which is the current maximum ID + 1.

SQL Execution Flow#

  • Connector: Database connection, identity authentication, permission verification.
  • Analyzer: Lexical analysis (extracting keywords, select, table names, query conditions, etc.), syntax analysis (checking if SQL conforms to MySQL specifications).
  • Optimizer: Uses the index with the smallest (IO cost + CPU cost) to execute the query.
  • Executor: Calls the storage engine interface to execute SQL.
    • Checks if the required data is in the BufferPool; if not, queries the disk and then places the data into the BufferPool.
    • If the operation is an update, records the undo log and updates data in the BufferPool (the data in the BufferPool may not be consistent with the data in the database).
    • The data in the BufferPool is updated, and the redo log is recorded (first written to the redo log buffer, then flushed to disk at an appropriate time).
    • Records the binlog (written to the binlog cache and flushed on commit).
    • Writes the binlog file name and the location of the current statement into the redo log.
    • Adds a commit mark to the redo log (two-phase commit).

Index Failure#

When the data types on both sides of the operator (=) are different, implicit conversion occurs.

Situations of Failure#

  • When the column on the left side of the comparison is a numeric type, the string value is implicitly converted to a number; the index is still used, and the impact on efficiency is small.
  • When the column on the left side of the comparison is a character type, every row's value must be implicitly converted to a number for the comparison, so the index fails and a full table scan is performed, which is inefficient.

String to Numeric Conversion Methods:#

  • All strings not starting with a number are converted to 0. (e.g., abc, abc123 => 0)
  • Strings starting with a number only take the first numeric segment. (e.g., 123abc4 => 123, 012ab4 => 12)
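These conversion rules can be checked directly; adding 0 forces a string-to-number conversion:

SELECT 'abc' + 0;        # 0
SELECT 'abc123' + 0;     # 0
SELECT '123abc4' + 0;    # 123
SELECT '012ab4' + 0;     # 12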

Failure Case#

Index failure case: num2 is of varchar type.

SELECT * FROM `test1` WHERE num2 = 10000;

Executing this SQL causes the index to fail and results in a full table scan: because '10000a', '010000', '10000', and so on all convert to the number 10000, MySQL has to convert every row's num2 value to a number before comparing, so the index on num2 cannot be used.

Common Causes of Index Failure#

  1. In the above example, index failure occurs due to type conversion.
  2. Using operators such as !=, >, and < may lead to index failure.
  3. Fuzzy queries like %S% will lead to index failure, while S% will not lead to index failure.
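A small sketch, assuming a hypothetical index on user.name:

SELECT * FROM user WHERE name LIKE '%S%';   # leading wildcard: the index cannot be used
SELECT * FROM user WHERE name LIKE 'S%';    # prefix match: the index can be used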

SQL Execution Priority in MySQL#

Single Table#

from > where > group by > select > order by > limit

Multiple Tables#

from > join > on > where > group by > AVG, SUM > having > select > distinct > order by > limit

Slow Query Optimization#

Use the EXPLAIN keyword to analyze query statements.

Query Using Index#

  • When using LIKE for fuzzy queries, if the first character is %, the index will not be used.
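A minimal sketch of analyzing a query with EXPLAIN (hypothetical table and index):

EXPLAIN SELECT * FROM user WHERE name = 'Alice';
# key columns to read: type (access method), key (index chosen),
# rows (estimated rows scanned), Extra (e.g. Using index, Using filesort)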

Optimize Database Structure#

Split tables with many fields and create new tables for infrequently queried data.
