MySQL Large Scale Database Design: ออกแบบฐานข้อมูลขนาดใหญ่หลายล้านเรคคอร์ด

บทนำ: ทำไมต้องออกแบบฐานข้อมูลขนาดใหญ่?

เมื่อตาราง MySQL มีขนาดใหญ่ขึ้นเรื่อยๆ จนถึงหลายล้านเรคคอร์ด คุณจะเริ่มเจอปัญหา Performance ที่ลดลงอย่างมาก ไม่ว่าจะเป็น Query ที่ช้าลง, Insert ที่ใช้เวลานาน, หรือแม้แต่ระบบที่ Down เนื่องจาก Load เกินขีดจำกัด บทความนี้จะสอนวิธีออกแบบและแก้ไขปัญหาเหล่านี้อย่างเป็นระบบ

1M - 10M

เรคคอร์ด: ใช้ Indexing และ Query Optimization

10M - 100M

เรคคอร์ด: ใช้ Partitioning

100M - 1B+

เรคคอร์ด: ใช้ Sharding

1B+

เรคคอร์ด: ใช้ Distributed Database

สิ่งที่ต้องเตรียม (Prerequisites)

Hardware Requirements

CPU: 4+ Cores (แนะนำ 8+ Cores สำหรับ Production)
RAM: 8GB+ (แนะนำ 16-32GB สำหรับ Buffer Pool)
Storage: NVMe SSD (ไม่แนะนำ HDD สำหรับ Large Tables)
Network: 1Gbps+ (สำหรับ Replication)

Software Requirements

MySQL: 8.0+ (แนะนำ 8.0.35+ สำหรับ Partitioning ที่ดีขึ้น)
InnoDB: Storage Engine เริ่มต้น (รองรับ Partitioning)
ProxySQL: สำหรับ Load Balancing และ Query Routing
Monitoring: Prometheus + Grafana หรือ PMM

ปัญหาที่พบบ่อยในตารางขนาดใหญ่

ปัญหาที่ 1: Query ช้าลงเมื่อตารางใหญ่ขึ้น

เมื่อตารางมีขนาดเกิน 10 ล้านเรคคอร์ด การ Query ที่เคยเร็ว (เช่น 50ms) อาจช้าลงเป็น 5-10 วินาที เนื่องจาก MySQL ต้องสแกนข้อมูลจำนวนมาก (Full Table Scan)

-- Query ที่ช้าเมื่อตารางใหญ่
SELECT * FROM orders 
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY created_at DESC
LIMIT 100;

-- EXPLAIN แสดงว่าใช้ filesort และ scan ทั้งตาราง
EXPLAIN SELECT ... -- rows: 50,000,000 (Full Table Scan!)

ทางแก้ไข:

• สร้าง Index ที่ถูกต้อง: Composite Index บน (created_at, id)
• ใช้ Covering Index: รวม columns ที่ต้องการใน index
• ใช้ Partitioning: แบ่งตารางตามวันที่ ทำให้ Query สแกนเฉพาะ partition ที่จำเป็น

ปัญหาที่ 2: INSERT ช้าลงเรื่อยๆ

เมื่อตารางมี Index หลายตัว การ Insert จะช้าลงเพราะ MySQL ต้องอัปเดตทุก Index นอกจากนี้ B+Tree ที่เข้าถึง Depth มากขึ้นก็ทำให้การเขียนช้าลงด้วย

Insert Speed (rows/sec):

1M rows: ~10,000/sec 10M rows: ~5,000/sec 100M rows: ~1,000/sec

ทางแก้ไข:

• Batch Insert: ใช้ INSERT INTO ... VALUES (...), (...), (...) แทนทีละแถว
• ใช้ Partitioning: ช่วยลด B+Tree depth ในแต่ละ partition
• ปิด Index ชั่วคราว: สำหรับ Bulk Load (ไม่แนะนำบน Production)

ปัญหาที่ 3: Lock Contention และ Deadlocks

เมื่อมี concurrent operations มากขึ้น การ Lock ทั้งตารางหรือหลายแถวพร้อมกัน จะทำให้เกิด Deadlock และระบบช้าลงอย่างมาก

-- Deadlock scenario
-- Transaction 1
UPDATE orders SET status = 'processing' WHERE id = 1;
UPDATE orders SET status = 'processing' WHERE id = 2;

-- Transaction 2 (พร้อมกัน)
UPDATE orders SET status = 'processing' WHERE id = 2;
UPDATE orders SET status = 'processing' WHERE id = 1;
-- ERROR 1213 (40001): Deadlock found

ทางแก้ไข:

• Always lock in same order: เรียงลำดับ ID ก่อน UPDATE
• ใช้ Optimistic Locking: เช็ค version ก่อน update
• ลด Transaction duration: ทำให้ transaction สั้นที่สุดเท่าที่จะเป็นไปได้
• ใช้ Partitioning: ช่วยลดการ lock ข้าม partition

ปัญหาที่ 4: Maintenance Operations ใช้เวลานาน

การทำ ALTER TABLE, BACKUP, หรือ ANALYZE บนตารางขนาดใหญ่อาจใช้เวลาหลายชั่วโมง และอาจทำให้ระบบต้องหยุดทำงาน (Downtime)

ทางแก้ไข:

• ใช้ pt-online-schema-change: เปลี่ยนโครงสร้างตารางโดยไม่ lock
• ใช้ Partitioning: drop/truncate partition เก่าได้ทันที (Fast DELETE)
• Parallel Backup: ใช้ mydumper แทน mysqldump

กลยุทธ์การสร้าง Index สำหรับตารางใหญ่

Indexing Best Practices

ควรทำ

✓ สร้าง index บน columns ที่ใช้ใน WHERE, JOIN, ORDER BY
✓ ใช้ Composite Index สำหรับ multiple conditions
✓ ใช้ Covering Index เพื่อลด table lookup
✅ Analyze query patterns ก่อนสร้าง index

ไม่ควรทำ

✗ สร้าง index ทุก column (ช้าตอน insert)
✗ สร้าง index บน low-cardinality columns (เช่น gender)
✗ ใช้ index ที่ไม่ตรงกับ query pattern
✗ ลืม analyze table หลัง bulk insert

ตัวอย่าง: Composite Index (Index รวมหลาย Column)

Composite Index เป็นการสร้าง Index ที่รวมหลาย Column เข้าด้วยกัน โดย MySQL จะใช้ index ได้ก็ต่อเมื่อ query ใช้ columns จากซ้ายไปขวา (Leftmost Prefix Rule)

-- สร้าง Composite Index สำหรับ orders table
CREATE INDEX idx_orders_user_status_date 
ON orders(user_id, status, created_at);

-- Query ที่ใช้ Index ได้ (Leftmost Prefix)
SELECT * FROM orders WHERE user_id = 123;
SELECT * FROM orders WHERE user_id = 123 AND status = 'completed';
SELECT * FROM orders WHERE user_id = 123 AND status = 'completed' AND created_at > '2024-01-01';

-- Query ที่ใช้ Index ไม่ได้ (ข้าม user_id)
SELECT * FROM orders WHERE status = 'completed'; -- ❌ ไม่ใช้ index
SELECT * FROM orders WHERE created_at > '2024-01-01'; -- ❌ ไม่ใช้ index

ตัวอย่าง: Covering Index (Index ครอบคลุมทุก Column ที่ต้องการ)

Covering Index คือ Index ที่มีทุก column ที่ query ต้องการ ทำให้ MySQL ไม่ต้อง lookup ไปที่ตารางหลัก (Table Lookup) แต่ดึงข้อมูลจาก Index โดยตรง

-- Query ที่ต้องการเฉพาะ user_id, status, total_amount
SELECT user_id, status, total_amount 
FROM orders 
WHERE user_id = 123 AND status = 'completed';

-- สร้าง Covering Index (รวม columns ที่ SELECT ด้วย)
CREATE INDEX idx_orders_covering 
ON orders(user_id, status, total_amount);

-- EXPLAIN แสดง "Using index" = Covering Index ✅
EXPLAIN SELECT user_id, status, total_amount 
FROM orders WHERE user_id = 123 AND status = 'completed';
-- Extra: Using index (ไม่มี "Using filesort" หรือ "Using temporary")

กรณีศึกษา: เพิ่ม Performance จาก 30 วินาที เหลือ 50ms

ระบบ E-commerce ตาราง orders มี 50 ล้านเรคคอร์ด Query หาออเดอร์ของลูกค้าใช้เวลา 30 วินาที

-- Query เดิม (ช้า)
SELECT o.id, o.total, o.status, o.created_at,
       u.name, u.email
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.created_at BETWEEN '2024-01-01' AND '2024-12-31'
  AND o.status = 'completed'
ORDER BY o.created_at DESC
LIMIT 100;
-- Execution time: 30+ seconds 😱

-- Solution: Composite + Covering Index
CREATE INDEX idx_orders_status_date_user 
ON orders(status, created_at, user_id, id, total);

-- หลังสร้าง Index
-- Execution time: 50ms 🚀 (600x เร็วขึ้น)

MySQL Partitioning: แบ่งตารางใหญ่เป็นส่วนย่อย

Partitioning คืออะไร?

Partitioning เป็นการแบ่งตารางใหญ่ 1 ตาราง ออกเป็นหลายๆ ส่วน (Physical Storage) แต่ยังคงเป็นตารางเดียวในมุมมองของ Application (Logical View) ทำให้ MySQL สามารถ Query เฉพาะ partition ที่จำเป็น แทนที่จะสแกนทั้งตาราง

ข้อดี

✓ Query เร็วขึ้น (Partition Pruning)
✓ Delete เร็ว (DROP PARTITION)
✓ Maintenance ง่ายขึ้น
✅ ลด Index Size ในแต่ละ partition

ข้อจำกัด

• Partition Key ต้องอยู่ใน Primary Key
• Foreign Keys ไม่รองรับ
• Unique Key ต้องรวม Partition Key
• บาง Query อาจช้าลงถ้าไม่ใช้ Partition Key

ประเภทของ Partitioning

1. RANGE Partitioning (แบ่งตามช่วง)

แบ่งข้อมูลตามช่วงของค่า เช่น แบ่งตามวันที่, ID range เหมาะสำหรับข้อมูลที่มีลักษณะ Time-series

-- สร้างตารางที่มี RANGE Partitioning (แบ่งตามปี)
CREATE TABLE orders (
    id BIGINT AUTO_INCREMENT,
    user_id INT NOT NULL,
    total_amount DECIMAL(12,2),
    status VARCHAR(50),
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id, created_at)  -- created_at ต้องอยู่ใน PK
) PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026),
    PARTITION pfuture VALUES LESS THAN MAXVALUE
);

-- Query ที่ใช้ Partition Pruning (สแกนเฉพาะ partition 2024)
SELECT * FROM orders 
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';
-- Partitions: p2024 (แทนที่จะสแกนทั้งหมด)

-- เพิ่ม Partition ใหม่
ALTER TABLE orders ADD PARTITION (
    PARTITION p2026 VALUES LESS THAN (2027)
);

-- ลบข้อมูลเก่าได้เร็วมาก (ไม่ต้อง DELETE)
ALTER TABLE orders DROP PARTITION p2022; -- Instant! 🚀

2. LIST Partitioning (แบ่งตามค่าที่กำหนด)

แบ่งข้อมูลตามค่าที่กำหนด เช่น แบ่งตาม region, category เหมาะสำหรับข้อมูลที่มีค่าคงที่และรู้ล่วงหน้า

-- สร้างตารางที่มี LIST Partitioning (แบ่งตามภูมิภาค)
CREATE TABLE sales (
    id BIGINT AUTO_INCREMENT,
    region_id INT NOT NULL,
    amount DECIMAL(12,2),
    sale_date DATE,
    PRIMARY KEY (id, region_id)
) PARTITION BY LIST (region_id) (
    PARTITION p_bangkok VALUES IN (1, 2, 3),     -- กรุงเทพและปริมณฑล
    PARTITION p_central VALUES IN (4, 5, 6, 7),  -- ภาคกลาง
    PARTITION p_north VALUES IN (8, 9, 10),    -- ภาคเหนือ
    PARTITION p_northeast VALUES IN (11, 12, 13), -- ภาคอีสาน
    PARTITION p_south VALUES IN (14, 15, 16)   -- ภาคใต้
);

-- Query ที่ใช้ Partition Pruning
SELECT * FROM sales WHERE region_id = 1;
-- Partitions: p_bangkok เท่านั้น

3. HASH Partitioning (แบ่งตาม Hash Function)

แบ่งข้อมูลโดยใช้ Hash Function บน column ที่กำหนด ทำให้ข้อมูลกระจายเท่าๆ กัน เหมาะสำหรับกรณีที่ไม่มี Pattern ชัดเจน

-- สร้างตารางที่มี HASH Partitioning
CREATE TABLE user_logs (
    id BIGINT AUTO_INCREMENT,
    user_id INT NOT NULL,
    action VARCHAR(100),
    log_time DATETIME,
    PRIMARY KEY (id, user_id)
) PARTITION BY HASH(user_id)
  PARTITIONS 16;  -- แบ่งเป็น 16 partitions

-- MySQL จะคำนวณ partition จาก: MOD(user_id, 16)
-- user_id = 100 → partition 4
-- user_id = 200 → partition 8

-- Query ที่ใช้ Partition Pruning
SELECT * FROM user_logs WHERE user_id = 100;
-- สแกนเฉพาะ partition ที่ user_id=100 อยู่

4. KEY Partitioning (แบ่งตาม MySQL's Internal Hash)

คล้าย HASH แต่ใช้ MySQL's internal hashing function รองรับ multiple columns และ column ไม่จำเป็นต้องเป็น integer

-- สร้างตารางที่มี KEY Partitioning
CREATE TABLE sessions (
    session_id VARCHAR(64) NOT NULL,
    user_id INT,
    data JSON,
    expires_at DATETIME,
    PRIMARY KEY (session_id)
) PARTITION BY KEY(session_id)
  PARTITIONS 32;  -- แบ่งเป็น 32 partitions

-- เหมาะสำหรับ session_id ที่เป็น UUID/Random String

ตัวอย่างการใช้งานจริง: ระบบ Logging

ระบบ Log ที่เก็บข้อมูลทุกวัน ประมาณ 1 ล้านเรคคอร์ด/วัน = 365 ล้านเรคคอร์ด/ปี

-- สร้างตาราง logs ที่มี RANGE Partitioning (แบ่งตามเดือน)
CREATE TABLE application_logs (
    id BIGINT AUTO_INCREMENT,
    log_level VARCHAR(10),
    message TEXT,
    source VARCHAR(100),
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id, created_at),
    INDEX idx_level_time (log_level, created_at)
) PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p_2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p_2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION p_2024_03 VALUES LESS THAN (TO_DAYS('2024-04-01')),
    PARTITION p_2024_04 VALUES LESS THAN (TO_DAYS('2024-05-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Query logs ของเดือนมกราคม 2024 (สแกนเฉพาะ p_2024_01)
SELECT * FROM application_logs
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'
  AND log_level = 'ERROR';

-- ลบ logs เก่า (เก็บไว้ 3 เดือน)
ALTER TABLE application_logs DROP PARTITION p_2024_01;
-- Execution time: < 1 second (สำหรับ 30 ล้าน rows!)

📊 Performance Comparison

30 วินาที

DELETE แบบปกติ (30M rows)

< 1 วินาที

DROP PARTITION

30x

เร็วขึ้น

MySQL Sharding: กระจายข้อมูลข้ามหลายเซิร์ฟเวอร์

Sharding คืออะไร?

Sharding เป็นการแบ่งข้อมูลออกเป็นหลายๆ เซิร์ฟเวอร์ (Physical Database) โดยแต่ละเซิร์ฟเวอร์เก็บข้อมูลเพียงส่วนหนึ่ง (Horizontal Scaling) ต่างจาก Partitioning ที่ยังอยู่ในเซิร์ฟเวอร์เดียวกัน

Feature	Partitioning	Sharding
Physical Location	เซิร์ฟเวอร์เดียว	หลายเซิร์ฟเวอร์
Complexity	ต่ำ (MySQL จัดการเอง)	สูง (ต้องจัดการเอง)
Cross-shard Query	รองรับ	ซับซ้อนมาก
Scalability	จำกัด (Vertical)	ไม่จำกัด (Horizontal)
Data Size Limit	~100M - 500M rows	Unlimited

Sharding Strategies

1. Hash-based Sharding

ใช้ Hash Function กับ shard key (เช่น user_id) เพื่อหา shard ที่จะเก็บข้อมูล ข้อมูลจะกระจายเท่าๆ กัน

# Application-level Sharding Logic (Node.js Example)

function getShardForUser(userId, totalShards = 4) {
    // Simple hash function
    const shardNumber = userId % totalShards;
    return `shard_${shardNumber}`;
}

// Example
getShardForUser(12345);  // → "shard_1"
getShardForUser(12346);  // → "shard_2"
getShardForUser(12347);  // → "shard_3"
getShardForUser(12348);  // → "shard_0"

// Connection Pool for Multiple Shards
const shards = {
    shard_0: mysql.createPool({ host: 'db-shard-1.example.com', ... }),
    shard_1: mysql.createPool({ host: 'db-shard-2.example.com', ... }),
    shard_2: mysql.createPool({ host: 'db-shard-3.example.com', ... }),
    shard_3: mysql.createPool({ host: 'db-shard-4.example.com', ... }),
};

async function getUser(userId) {
    const shardName = getShardForUser(userId);
    const connection = shards[shardName];
    return await connection.query('SELECT * FROM users WHERE id = ?', [userId]);
}

2. Range-based Sharding

แบ่งข้อมูลตามช่วงของค่า เช่น User ID 1-1M ไป Shard 1, 1M-2M ไป Shard 2 เหมาะสำหรับข้อมูลที่มี pattern ชัดเจน

# Range-based Sharding Logic

const shardRanges = [
    { min: 1, max: 1000000, shard: 'shard_users_1' },
    { min: 1000001, max: 2000000, shard: 'shard_users_2' },
    { min: 2000001, max: 3000000, shard: 'shard_users_3' },
    { min: 3000001, max: Infinity, shard: 'shard_users_default' },
];

function getShardForUser(userId) {
    for (const range of shardRanges) {
        if (userId >= range.min && userId <= range.max) {
            return range.shard;
        }
    }
    return 'shard_users_default';
}

// Example
getShardForUser(500000);   // → "shard_users_1"
getShardForUser(1500000);  // → "shard_users_2"
getShardForUser(2500000);  // → "shard_users_3"

3. Geography-based Sharding

แบ่งข้อมูลตามตำแหน่งทางภูมิศาสตร์ เช่น ข้อมูลลูกค้าไทยไป Shard ในไทย ข้อมูลลูกค้าอเมริกาไป Shard ในอเมริกา

# Geography-based Sharding

const geoShards = {
    'TH': 'shard_thailand',      // Thailand → Singapore Data Center
    'US': 'shard_us_east',       // USA → Virginia Data Center
    'EU': 'shard_eu_west',       // Europe → Frankfurt Data Center
    'JP': 'shard_asia_east',     // Japan → Tokyo Data Center
    'default': 'shard_global',   // Others → Global Shard
};

function getShardByCountry(countryCode) {
    return geoShards[countryCode] || geoShards['default'];
}

// Example: ลูกค้าจากประเทศต่างๆ
getShardByCountry('TH');  // → "shard_thailand" (Singapore DC)
getShardByCountry('US');  // → "shard_us_east" (Virginia DC)
getShardByCountry('DE');  // → "shard_eu_west" (Frankfurt DC)

MySQL Performance Tuning สำหรับตารางใหญ่

MySQL Configuration (my.cnf)

# /etc/mysql/mysql.conf.d/mysqld.cnf
# Optimized for Large Tables (32GB RAM Server)

[mysqld]
# === InnoDB Settings ===
# Buffer Pool: 70-80% of RAM (สำหรับเก็บ data + index ใน memory)
innodb_buffer_pool_size = 24G
innodb_buffer_pool_instances = 8

# Log File Settings
innodb_log_file_size = 1G
innodb_log_buffer_size = 64M

# Flush Settings (Performance vs Durability)
innodb_flush_log_at_trx_commit = 1  # 1 = Safe, 2 = Faster
innodb_flush_method = O_DIRECT

# Thread Concurrency
innodb_thread_concurrency = 0  # Auto
innodb_read_io_threads = 8
innodb_write_io_threads = 8

# === Query Cache (Disable สำหรับ Large Tables) ===
query_cache_type = 0
query_cache_size = 0

# === Connection Settings ===
max_connections = 500
max_connect_errors = 1000
wait_timeout = 600
interactive_timeout = 600

# === Buffer Sizes ===
sort_buffer_size = 4M
join_buffer_size = 4M
read_buffer_size = 2M
read_rnd_buffer_size = 4M

# === Table Settings ===
table_open_cache = 4000
table_definition_cache = 2000
open_files_limit = 65535

# === Slow Query Log ===
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 2
log_queries_not_using_indexes = 1

Query Optimization Tips

1. ใช้ EXPLAIN ทุกครั้ง

EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 123
AND created_at > '2024-01-01';

-- ตรวจสอบ:
-- type: ref/eq_ref = Good
-- type: ALL = Bad (Full Scan)
-- rows: ตัวเลขน้อย = Good
-- Extra: Using index = Great!

2. หลีกเลี่ยง SELECT *

-- ❌ Bad: SELECT * ดึงทุก column
SELECT * FROM orders WHERE ...;

-- ✅ Good: เลือกเฉพาะที่ต้องการ
SELECT id, user_id, total, status
FROM orders WHERE ...;

-- ✅ Better: ใช้ Covering Index
CREATE INDEX idx_covering
ON orders(user_id, id, total, status);

3. ใช้ Batch Operations

-- ❌ Bad: Insert ทีละ row
INSERT INTO logs VALUES (...);
INSERT INTO logs VALUES (...);

-- ✅ Good: Batch Insert
INSERT INTO logs VALUES
 (...), (...), (...), (...),
 (...), (...), (...), (...);
-- 1,000 rows per batch = Optimal

4. Pagination ที่ถูกต้อง

-- ❌ Bad: OFFSET ใหญ่มาก
SELECT * FROM orders
LIMIT 1000000, 100;

-- ✅ Good: Cursor-based Pagination
SELECT * FROM orders
WHERE id > 1000000
ORDER BY id LIMIT 100;
-- Uses index, no offset scan!

Query สำหรับ Monitoring

-- 1. ดู Top 10 Slow Queries
SELECT query_time, lock_time, rows_sent, sql_text
FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 10;

-- 2. ดู Table Sizes
SELECT 
    table_name,
    ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb,
    table_rows
FROM information_schema.tables
WHERE table_schema = 'your_database'
ORDER BY (data_length + index_length) DESC;

-- 3. ดู Index Usage
SELECT 
    table_name,
    index_name,
    ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = 'your_database'
AND stat_name = 'size'
ORDER BY stat_value DESC;

-- 4. ดู Running Queries
SHOW PROCESSLIST;

-- 5. ดู Partition Information
SELECT 
    partition_name,
    partition_expression,
    partition_description,
    table_rows,
    ROUND(data_length / 1024 / 1024, 2) AS data_mb
FROM information_schema.partitions
WHERE table_name = 'orders'
ORDER BY partition_ordinal_position;

Checklist: เมื่อไหร่ต้องใช้เทคนิคไหน?

ใช้ Indexing Optimization เมื่อ:

ตารางมี 1-10 ล้านเรคคอร์ด
Query ช้าแต่ยังไม่ถึงขั้นต้องแบ่งตาราง
เซิร์ฟเวอร์ยังมีทรัพยากรเพียงพอ
ต้องการ solution ที่ง่ายและเร็ว

ใช้ Partitioning เมื่อ:

ตารางมี 10-500 ล้านเรคคอร์ด
มี Time-series data (logs, transactions)
ต้องการลบข้อมูลเก่าเป็น batch
ต้องการเก็บข้อมูลในเซิร์ฟเวอร์เดียว

ใช้ Sharding เมื่อ:

ตารางมี 500 ล้าน+ เรคคอร์ด
เซิร์ฟเวอร์เดียวไม่พอ (Vertical Scaling ไม่ไหว)
ต้องการ High Availability
มีทีมพร้อมรับความซับซ้อน

ใช้ Distributed Database เมื่อ:

ตารางมี 1 พันล้าน+ เรคคอร์ด
ต้องการ Global Distribution
Sharding แบบ manual ใช้ไม่ไหว
พิจารณา: TiDB, CockroachDB, PlanetScale

คำถามที่พบบ่อย (FAQ)

Q: Partitioning vs Sharding เลือกอะไรดี?

A: เริ่มจาก Partitioning ก่อนเสมอ เพราะง่ายกว่าและ MySQL จัดการเอง ใช้ Sharding เมื่อเซิร์ฟเวอร์เดียวไม่พอแล้ว (เช่น RAM ไม่พอสำหรับ Buffer Pool)

Q: MySQL รองรับตารางใหญ่สุดเท่าไหร่?

A: InnoDB รองรับตารางสูงสุด 64TB และ ~1 ล้านล้าน rows แต่ในทางปฏิบัติ ควรใช้ Partitioning เมื่อเกิน 100 ล้าน rows เพื่อ performance ที่ดี

Q: ต้องเปลี่ยน Application มั้ยถ้าใช้ Partitioning?

A: ไม่ต้อง! Partitioning โปร่งใสต่อ Application แต่ควรแน่ใจว่า Query ใช้ Partition Key เพื่อให้ได้ประโยชน์จาก Partition Pruning

Q: Primary Key ต้องรวม Partition Key ทำไม?

A: เพราะแต่ละ Partition มี B+Tree Index ของตัวเอง หาก Primary Key ไม่รวม Partition Key MySQL จะไม่รู้ว่า row อยู่ partition ไหน

สรุป

การออกแบบ MySQL สำหรับฐานข้อมูลขนาดใหญ่ต้องพิจารณาหลายปัจจัย:

📊

ระดับข้อมูล

1-10M: Indexing
10-100M: Partitioning
100M+: Sharding
1B+: Distributed DB

⚡

Performance Tips

Composite + Covering Index
Batch Operations
Cursor Pagination
EXPLAIN ทุก Query

🔧

Configuration

Buffer Pool = 70% RAM
NVMe SSD Storage
Monitor Slow Queries
Regular ANALYZE

โครงสร้าง MySQL Large Scale Architecture

บทนำ: ทำไมต้องออกแบบฐานข้อมูลขนาดใหญ่?

สิ่งที่ต้องเตรียม (Prerequisites)

Hardware Requirements

Software Requirements

ปัญหาที่พบบ่อยในตารางขนาดใหญ่

ปัญหาที่ 1: Query ช้าลงเมื่อตารางใหญ่ขึ้น

ทางแก้ไข:

ปัญหาที่ 2: INSERT ช้าลงเรื่อยๆ

ทางแก้ไข:

ปัญหาที่ 3: Lock Contention และ Deadlocks

ทางแก้ไข:

ปัญหาที่ 4: Maintenance Operations ใช้เวลานาน

ทางแก้ไข:

กลยุทธ์การสร้าง Index สำหรับตารางใหญ่

Indexing Best Practices

ควรทำ

ไม่ควรทำ

ตัวอย่าง: Composite Index (Index รวมหลาย Column)

ตัวอย่าง: Covering Index (Index ครอบคลุมทุก Column ที่ต้องการ)

กรณีศึกษา: เพิ่ม Performance จาก 30 วินาที เหลือ 50ms

MySQL Partitioning: แบ่งตารางใหญ่เป็นส่วนย่อย

Partitioning คืออะไร?

ประเภทของ Partitioning

1. RANGE Partitioning (แบ่งตามช่วง)

2. LIST Partitioning (แบ่งตามค่าที่กำหนด)

3. HASH Partitioning (แบ่งตาม Hash Function)

4. KEY Partitioning (แบ่งตาม MySQL's Internal Hash)

ตัวอย่างการใช้งานจริง: ระบบ Logging

📊 Performance Comparison

MySQL Sharding: กระจายข้อมูลข้ามหลายเซิร์ฟเวอร์

Sharding คืออะไร?

Sharding Strategies

1. Hash-based Sharding

2. Range-based Sharding

3. Geography-based Sharding

MySQL Performance Tuning สำหรับตารางใหญ่

MySQL Configuration (my.cnf)

Query Optimization Tips

1. ใช้ EXPLAIN ทุกครั้ง

2. หลีกเลี่ยง SELECT *

3. ใช้ Batch Operations

4. Pagination ที่ถูกต้อง

Query สำหรับ Monitoring

Checklist: เมื่อไหร่ต้องใช้เทคนิคไหน?

ใช้ Indexing Optimization เมื่อ:

ใช้ Partitioning เมื่อ:

ใช้ Sharding เมื่อ:

ใช้ Distributed Database เมื่อ:

คำถามที่พบบ่อย (FAQ)

Q: Partitioning vs Sharding เลือกอะไรดี?

Q: MySQL รองรับตารางใหญ่สุดเท่าไหร่?

Q: ต้องเปลี่ยน Application มั้ยถ้าใช้ Partitioning?

Q: Primary Key ต้องรวม Partition Key ทำไม?

สรุป

ระดับข้อมูล

Performance Tips

Configuration