Linux Server + DevOps: Essential Commands, Finding Errors, and Troubleshooting

Table of Contents

1 Frequently Used Commands

2 System Monitoring

3 Log Analysis

4 Network Troubleshooting

5 Disk/Storage Management

6 Troubleshooting Common Problems

1. Frequently Used Commands

These are the basic commands that DevOps engineers and system administrators use every day. Knowing them well makes routine work and incident response noticeably faster.

1.1 Process Management

ps - Show running processes

ps aux | grep nginx

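Find the running nginx processes (the grep command itself also appears in the list; pgrep avoids that)

Example output (shortened to USER, PID, %CPU, %MEM and COMMAND):
root 1234 0.0 0.5 nginx: master process
www 1235 0.0 0.3 nginx: worker process

top / htop - Monitor processes in real time

top -c

Shows processes with their full command lines. Press q to quit.

Tip: htop (usually installed separately) offers an easier-to-read interface than top.

kill - Stop a process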

kill -9 1234

killall nginx

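-9 = SIGKILL (force stop; the process cannot catch it or clean up)
-15 = SIGTERM (normal stop that lets the process shut down cleanly; the default for kill and killall)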
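Because SIGKILL gives a process no chance to flush data or remove lock files, a common pattern is to send SIGTERM first and escalate to SIGKILL only after a timeout. A minimal POSIX sh sketch of that pattern; the function name stop_gracefully and the 10-second default are illustrative assumptions, not part of any standard tool:

```shell
#!/bin/sh
# stop_gracefully PID [TIMEOUT]  (hypothetical helper, not a standard command)
# Sends SIGTERM, waits up to TIMEOUT seconds (default 10) for the process to
# exit, and only then falls back to SIGKILL.
stop_gracefully() {
  pid=$1
  timeout=${2:-10}
  # If SIGTERM cannot be delivered, the process is already gone.
  kill -TERM "$pid" 2>/dev/null || return 0
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # kill -0 sends no signal; it only checks that the PID still exists.
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 1
    waited=$((waited + 1))
  done
  echo "PID $pid ignored SIGTERM for ${timeout}s, sending SIGKILL" >&2
  kill -KILL "$pid" 2>/dev/null
  return 0
}

# Usage (1234 is a placeholder PID):
# stop_gracefully 1234 15
```

Most well-behaved daemons exit within a few seconds of SIGTERM, so the SIGKILL branch should be the exception rather than the rule.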

pgrep / pkill - Find and stop processes by name

pgrep -f "python script.py"

pkill -f "python script.py"

Find/stop processes whose full command line matches the pattern (-f matches the whole command line, not just the process name)

1.2 File Management

find - Search for files

find /var/log -name "*.log" -mtime +7

Find .log files last modified more than 7 days ago

find /tmp -type f -size +100M

Find files larger than 100 MB

tar - Create and extract archives

tar -czvf backup.tar.gz /var/www

Compress the /var/www directory into backup.tar.gz (-c = create, -z = gzip, -v = verbose, -f = archive file name)

tar -xzvf backup.tar.gz -C /restore/

Extract the archive into /restore/ (-x = extract; the target directory must already exist)
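The two tar commands above can be combined into a small backup routine: archive a directory under a dated file name, then list the archive with tar -tzf as a quick readability check. A sketch; the function name backup_dir and the NAME-YYYY-MM-DD.tar.gz naming scheme are assumptions for illustration:

```shell
#!/bin/sh
# backup_dir SRC DEST  (hypothetical helper)
# Creates DEST/<basename of SRC>-YYYY-MM-DD.tar.gz and prints its path.
backup_dir() {
  src=$1
  dest=$2
  archive="$dest/$(basename "$src")-$(date +%F).tar.gz"
  # -C makes tar store paths relative to SRC's parent ("www/..." instead of
  # "/var/www/..."), so the archive extracts cleanly anywhere.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Listing every entry makes tar read the whole compressed stream, so a
  # truncated or corrupt archive usually makes this step fail.
  tar -tzf "$archive" >/dev/null || return 1
  printf '%s\n' "$archive"
}

# Usage:   backup_dir /var/www /backup
# Restore: tar -xzf /backup/www-2026-02-13.tar.gz -C /restore/
```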

rsync - Synchronize files (backups)

rsync -avz --progress /var/www/ user@backup-server:/backup/www/

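-a = archive mode (recursive, preserves permissions, ownership and timestamps)
-v = verbose
-z = compress during transfer
--progress = show transfer progress
The trailing slash in /var/www/ copies the contents of the directory; without it, rsync creates a www directory inside the destination.

chmod / chown - Manage permissions and ownership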

chmod 755 /var/www/html

chown -R www-data:www-data /var/www

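755 = rwxr-xr-x (owner: read, write, execute; group and others: read and execute)
644 = rw-r--r-- (owner: read and write; group and others: read only)
chown -R = change owner and group recursively for everything under the directory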
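To check a permission mapping without touching real files, you can apply a mode to a throwaway temp file and let stat print it in both notations. A small sketch; show_mode is a hypothetical helper name and stat -c assumes GNU coreutils:

```shell
#!/bin/sh
# show_mode MODE  (hypothetical helper)
# Applies MODE to a temporary file and prints "octal symbolic",
# e.g. "755 -rwxr-xr-x" (the leading "-" means a regular file).
show_mode() {
  tmp=$(mktemp) || return 1
  chmod "$1" "$tmp"
  # %a = octal permissions, %A = symbolic permissions
  stat -c '%a %A' "$tmp"
  rm -f "$tmp"
}

show_mode 755          # 755 -rwxr-xr-x
show_mode 644          # 644 -rw-r--r--
show_mode u=rw,go=r    # symbolic modes work too: 644 -rw-r--r--
```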

1.3 Text Processing

grep - Search for text in files

grep -r "error" /var/log/

grep -i "ERROR\|FATAL" /var/log/syslog

grep -A 5 -B 5 "exception" app.log

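-i = ignore case
-A 5 = also show 5 lines after each match
-B 5 = also show 5 lines before each match
-r = search directories recursively
\| = OR in grep's basic regular expressions (GNU grep)

sed - Edit text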

sed -i 's/old/new/g' config.txt

sed -n '10,20p' file.txt

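-i = edit the file in place (GNU sed also accepts a suffix, e.g. -i.bak, to keep a backup)
s/old/new/g = replace every match on each line (without g, only the first match per line)
-n '10,20p' = print only lines 10-20

awk - Process column-based data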

awk '{print $1, $3}' access.log

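df -hP | awk 'NR > 1 && $5 + 0 > 80 {print $0}'

$1, $2, ... = column 1, 2, ... (split on whitespace by default)
$0 = the whole line
NR > 1 skips df's header line, and $5 + 0 turns "85%" into the number 85 so the comparison is numeric (comparing the raw string "85%" with 80 falls back to a text comparison). -P keeps each filesystem on one line.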
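Wrapping a numeric Use% filter in a function that reads df output on standard input makes it reusable in cron jobs and easy to test against saved output. A sketch; usage_over is a hypothetical name, and mount points containing spaces are not handled:

```shell
#!/bin/sh
# usage_over THRESHOLD  (hypothetical helper)
# Reads `df -P`-style output on stdin and prints "MOUNTPOINT USE%" for every
# filesystem whose Use% is greater than THRESHOLD.
usage_over() {
  # NR > 1 skips the header; $5 + 0 converts "90%" to the number 90.
  awk -v limit="$1" 'NR > 1 && $5 + 0 > limit { print $6, $5 }'
}

# -P keeps each filesystem on one line even with long device names.
df -hP | usage_over 80
```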

tail / head - Show the end/beginning of a file

tail -f /var/log/nginx/error.log

tail -n 100 /var/log/syslog | grep error

head -n 20 config.yaml

-f = follow (keep printing new lines as they are written)
-n = number of lines

2. System Monitoring

Monitoring is at the heart of DevOps: knowing where a system is under strain lets you fix problems faster.

2.1 Checking the CPU

mpstat - Per-core CPU statistics (part of the sysstat package)

mpstat -P ALL 1 5

Reports statistics for every core at 1-second intervals, 5 times. Example output (columns shortened):

CPU %usr %nice %sys %iowait %idle
all 15.2 0.0 3.5 0.8 80.5
0 12.1 0.0 2.3 0.5 85.1
1 18.3 0.0 4.7 1.1 75.9

lscpu - CPU information

lscpu

Architecture: x86_64
CPU(s): 8
Thread(s) per core: 2
Core(s) per socket: 4
Model name: Intel Xeon

Load Average - System load

uptime

load average: 0.50, 0.75, 1.20

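What the values mean:
• 0.50 = average load over the last 1 minute
• 0.75 = average load over the last 5 minutes
• 1.20 = average load over the last 15 minutes

Warning: if the load average stays above the number of CPU cores (nproc shows the count), the system is overloaded. On Linux the load also counts processes waiting on disk I/O, so a high load with low CPU usage often points to an I/O bottleneck.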
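The rule of thumb above can be checked automatically by comparing the 1-minute load with the number of logical CPUs. A sketch assuming a Linux system (/proc/loadavg) and GNU coreutils nproc; load_status is a hypothetical helper name:

```shell
#!/bin/sh
# load_status LOAD CPUS  (hypothetical helper)
# Prints the load per CPU and "ok" or "overloaded" (load above CPU count).
load_status() {
  awk -v load="$1" -v cores="$2" 'BEGIN {
    printf "%.2f per core: %s\n", load / cores, (load > cores ? "overloaded" : "ok")
  }'
}

# Live check: the first field of /proc/loadavg is the 1-minute average;
# nproc prints the number of CPUs available to this process.
if [ -r /proc/loadavg ]; then
  load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
fi
```

A single spike in the 1-minute value is normal; it is the 5- and 15-minute values staying high that indicate sustained overload.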

2.2 Checking Memory

free - Show memory usage

free -h

total used free shared buff/cache available
Mem: 15Gi 8.2Gi 1.1Gi 512Mi 5.7Gi 6.3Gi
Swap: 2.0Gi 512Mi 1.5Gi

Tip: look at available, not free. Linux uses otherwise idle memory as cache and gives it back when applications need it, so free is usually low even on a healthy system.
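The available column comes from the MemAvailable line in /proc/meminfo (present on kernels 3.14 and later), so a script can read it directly. A sketch; mem_available_pct is a hypothetical helper name:

```shell
#!/bin/sh
# mem_available_pct  (hypothetical helper)
# Reads /proc/meminfo-format text on stdin and prints MemAvailable as a
# whole-number percentage of MemTotal.
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

if [ -r /proc/meminfo ]; then
  echo "available: $(mem_available_pct < /proc/meminfo)%"
fi
```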

vmstat - Virtual memory statistics

vmstat 1 10

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 524288 1114112 524288 6029312 0 0 15 30 100 200 10 5 85 0 0

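si = swap in, so = swap out (memory swapped in/out per second)
If these stay high, the system is swapping heavily, which means it does not have enough RAM. The first line is an average since boot, so judge by the lines that follow.

smem - Memory usage per process (must be installed)

smem -k -r -s rss | head -n 11

Shows the 10 processes using the most memory (header line plus 10 rows)
-k = show units (K/M/G), -r = reverse the sort (largest first), -s rss = sort by RSS

2.3 Checking Disk I/O

iostat - Disk I/O statistics (part of the sysstat package)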

iostat -x 1 5

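Device rrqm/s wrqm/s r/s w/s rMB/s wMB/s await %util
sda 0.0 5.0 10.0 50.0 0.5 2.5 8.50 45.2
sdb 0.0 2.0 5.0 25.0 0.2 1.0 5.20 22.1

%util = percentage of time the device was busy; values near 100% suggest saturation (less meaningful on SSD/NVMe devices, which serve many requests in parallel)
await = average time per I/O request in ms, including time spent waiting in the queue (newer sysstat versions report r_await and w_await separately)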
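Since the column order of iostat -x differs between sysstat versions, a script that flags busy disks is safer if it finds the %util column by its header name instead of by position. A sketch; busy_disks and the 80% threshold in the usage line are assumptions:

```shell
#!/bin/sh
# busy_disks THRESHOLD  (hypothetical helper)
# Reads `iostat -x` output on stdin and prints "DEVICE %UTIL" for devices
# whose %util is above THRESHOLD. The column is located by its header name.
busy_disks() {
  awk -v limit="$1" '
    NF == 0 { col = 0; next }                # a blank line ends a section
    $1 ~ /^Device:?$/ {                      # "Device" or older "Device:"
      col = 0
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    col > 0 && NF >= col && $col + 0 > limit { print $1, $col }
  '
}

# Two reports 1 second apart; the first one covers the time since boot.
if command -v iostat >/dev/null 2>&1; then
  iostat -x 1 2 | busy_disks 80
fi
```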

iotop - Show processes doing the most disk I/O

sudo iotop -o

Show only processes that are currently reading or writing

Install with: sudo apt install iotop

3. Log Analysis

Logs are the most important source of information when hunting for errors; knowing how to read and filter them speeds up troubleshooting.

3.1 journalctl - Systemd Logs

Frequently used journalctl commands

journalctl -u nginx -f

Follow the nginx service's log in real time

journalctl --since "1 hour ago" -p err

Show entries of priority err or more severe from the last hour

journalctl --since "2026-02-13 10:00" --until "2026-02-13 12:00"

Show entries within a specific time window

journalctl -u docker --no-pager -n 100

Show the last 100 entries of the docker service without a pager

journalctl -u mysql --since today | grep -i error

Search today's mysql log for errors (the unit is named mysqld or mariadb on some distributions)

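Priority levels (-p), from most to least severe

0 emerg - system is unusable
1 alert - action must be taken immediately
2 crit - critical conditions
3 err - error conditions
4 warning - warning conditions
5 notice - normal but significant events
6 info - informational messages
7 debug - debug messages

A single level such as -p err shows that level and every more severe one (emerg, alert, crit, err).

3.2 Important Log Files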

File	Description	Example
/var/log/syslog	System messages (Ubuntu/Debian)	`tail -f /var/log/syslog`
/var/log/messages	System messages (RHEL/CentOS)	`tail -f /var/log/messages`
/var/log/auth.log	Authentication logs (Ubuntu/Debian; /var/log/secure on RHEL)	`grep "Failed password" /var/log/auth.log`
/var/log/nginx/error.log	Nginx errors	`tail -100 /var/log/nginx/error.log`
/var/log/mysql/error.log	MySQL errors	`grep -i error /var/log/mysql/error.log`
/var/log/dmesg	Kernel messages saved at boot (the live ring buffer is read with dmesg)	`dmesg | grep -i error`
/var/log/kern.log	Kernel logs	`grep -i "out of memory" /var/log/kern.log`

3.3 Log Analysis Techniques

Search for several error patterns at once

grep -E "error|fail|critical|fatal" /var/log/syslog | tail -50

-E enables extended regular expressions, in which | means OR

Count errors by type

grep "error" /var/log/nginx/error.log | awk '{print $NF}' | sort | uniq -c | sort -nr | head -10

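Counts how often each value of the last field ($NF) occurs and sorts the counts from most to least frequent; which field identifies the error type depends on the log format, so adjust $NF accordingly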
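Picking a field by position depends on the log format. For nginx specifically, every error-log line carries a severity tag in square brackets, so counting by that tag is a stable alternative. A sketch assuming nginx's default error-log format; count_levels is a hypothetical helper name:

```shell
#!/bin/sh
# count_levels  (hypothetical helper)
# Reads nginx error-log lines on stdin and prints "COUNT [level]",
# most frequent first.
count_levels() {
  grep -oE '\[(debug|info|notice|warn|error|crit|alert|emerg)\]' |
    sort | uniq -c | sort -nr |
    awk '{ print $1, $2 }'    # strip the padding that uniq -c adds
}

if [ -r /var/log/nginx/error.log ]; then
  count_levels < /var/log/nginx/error.log
fi
```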

Check failed logins

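grep "Failed password" /var/log/auth.log | grep -oE 'from [0-9a-fA-F.:]+' | awk '{print $2}' | sort | uniq -c | sort -nr

Shows each IP address with failed login attempts and how many times it failed. Taking the address after "from" is more reliable than a fixed column such as $11, which shifts on "invalid user" lines.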

Check for OOM (Out of Memory) kills

dmesg | grep -i "out of memory" | tail -20

Shows processes the kernel killed because memory ran out (recent distributions may require sudo to read dmesg)

4. Network Troubleshooting

Network problems are among the most common issues in DevOps work; checking them systematically speeds up the fix.

4.1 Basic Network Commands

ip - Show network interface information

ip addr show

ip route show

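ip link set eth0 up

ip link set eth0 down

ip replaces the deprecated ifconfig command (net-tools). link set ... up/down enables or disables an interface; interface names vary (eth0, ens3, enp0s3, ...), so check yours with ip addr show.

ss - Socket statistics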

ss -tulpn

ss -s

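-t = TCP, -u = UDP
-l = listening sockets, -p = owning process (root is needed to see other users' processes), -n = numeric addresses and ports
ss -s = summary of socket counts by protocol and state

ping / mtr - Test connectivity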

ping -c 5 google.com

mtr -r -c 10 google.com

mtr = traceroute and ping combined (must be installed); -r prints a report after -c 10 rounds

curl / wget - Test HTTP

curl -I https://example.com

curl -v --connect-timeout 5 https://api.example.com

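-I = fetch headers only, -v = verbose (shows connection, TLS and request/response details), --connect-timeout 5 = give up if no connection is made within 5 seconds

4.2 Troubleshooting Ports and Connections

Check whether a port is open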

ss -tulpn | grep :80

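Check port 80 (":80" also matches ports such as 8080, so read the output carefully)

netstat -tulpn | grep :3306

Check MySQL port 3306 (netstat belongs to the older net-tools package and may need to be installed)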

lsof -i :8080

See which process is using port 8080

fuser 80/tcp

Show the PIDs using TCP port 80

Check for problematic connections

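ss -tan | awk 'NR > 1 {print $1}' | sort | uniq -c

Count connections per state (ss prints them as ESTAB, TIME-WAIT, CLOSE-WAIT, ...); NR > 1 skips the header line

ss -tan state time-wait | tail -n +2 | wc -l

Count TIME-WAIT connections (tail -n +2 drops the header line so it is not counted)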
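The per-state count can also be wrapped in a function that reads ss output on standard input, which makes it easy to run against a saved snapshot taken during an incident. A sketch; conn_states is a hypothetical helper name:

```shell
#!/bin/sh
# conn_states  (hypothetical helper)
# Reads `ss -tan` output on stdin and prints "COUNT STATE", most common first.
conn_states() {
  # NR > 1 skips the "State Recv-Q ..." header; awk arrays have no defined
  # iteration order, hence the final sort.
  awk 'NR > 1 { count[$1]++ } END { for (s in count) print count[s], s }' |
    sort -nr
}

if command -v ss >/dev/null 2>&1; then
  ss -tan | conn_states
fi
```

A large and growing CLOSE-WAIT count usually means the local application is not closing its sockets, while many TIME-WAIT entries are normal on busy servers that open short-lived connections.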

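netstat -an | grep ':80 ' | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10

Show the 10 IP addresses with the most connections involving port 80 (IPv4 only; ':80 ' with a trailing space avoids matching ports such as 8080)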

tcpdump -i eth0 port 80 -c 100

Capture 100 packets on port 80 of interface eth0 (requires root)

4.3 Troubleshooting DNS

dig - DNS lookup

dig google.com +short

dig @8.8.8.8 example.com

dig example.com MX

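+short = print only the answer; @8.8.8.8 = ask a specific nameserver (Google Public DNS) instead of the system resolver; MX = query the mail exchanger records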

nslookup / host

nslookup google.com

host -t A example.com

cat /etc/resolv.conf

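Check the DNS resolver configuration (on systems using systemd-resolved this usually shows the local stub 127.0.0.53; resolvectl status lists the actual upstream servers)

5. Disk/Storage Management

5.1 Checking Disk Space

df - Show disk space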

df -hT

Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 100G 45G 50G 48% /
/dev/sdb1 xfs 500G 200G 300G 40% /data
tmpfs tmpfs 7.8G 0 7.8G 0% /dev/shm

du - Show file/directory sizes

du -sh /* | sort -rh | head -10

Show the 10 largest entries directly under / (add 2>/dev/null to hide permission errors)

du -sh /var/log/* | sort -rh | head -5

Show the 5 largest files or directories in /var/log
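du reports directory totals; when you need the individual files behind them, find can print each file's size so the list can be sorted directly. A sketch that assumes GNU find (for -printf); largest_files is a hypothetical helper name:

```shell
#!/bin/sh
# largest_files DIR [N]  (hypothetical helper, needs GNU find for -printf)
# Prints the N (default 10) largest regular files under DIR as "BYTES PATH",
# staying on DIR's filesystem (-xdev) so /proc and other mounts are skipped.
largest_files() {
  find "$1" -xdev -type f -printf '%s %p\n' 2>/dev/null |
    sort -nr |
    head -n "${2:-10}"
}

# Usage: largest_files /var/log 5
```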

ncdu - Interactive disk usage analyzer

ncdu /

An interactive tool for browsing disk usage (install with: apt install ncdu)

Tip: use ncdu to find large files you no longer need and delete them to free up space

5.2 Checking Disk Health

smartctl - SMART data (part of the smartmontools package)

sudo smartctl -a /dev/sda

sudo smartctl -H /dev/sda

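-a = show all SMART information, -H = show only the overall health assessment
Watch Reallocated_Sector_Ct and Current_Pending_Sector: non-zero values that keep growing usually indicate a failing disk (NVMe drives report different attributes)

lsblk - Show block devices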

lsblk -f

NAME FSTYPE LABEL UUID MOUNTPOINT
sda
├─sda1 ext4 root a1b2c3d4-e5f6-7890-abcd-ef1234567890 /
└─sda2 swap 12345678-90ab-cdef-1234-567890abcdef [SWAP]

6. Troubleshooting Common Problems

This section collects problems that come up often in real work, with steps to diagnose and fix each one.

Problem 1: High CPU usage

Symptoms:

The server becomes slow or unresponsive
Load average is higher than the number of CPU cores
Some processes use 100% CPU (in top, 100% equals one full core)

Diagnosis:

1. Find the processes with the highest CPU usage:

top -o %CPU

2. Show details of a process (replace PID with the process ID):

ps -p PID -o pid,ppid,cmd,%cpu,%mem --sort -%cpu

3. Show which of its threads are using CPU:

top -H -p PID

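Fixes:

✅ Pause the process temporarily (resume it with kill -CONT PID):

kill -STOP PID

✅ Stop the process for good (try plain kill PID, which sends SIGTERM, first; use -9 only if it does not exit):

kill -9 PID

✅ Limit its CPU usage (cpulimit must be installed; -l 50 = 50%):

cpulimit -p PID -l 50

✅ Lower its priority (a higher nice value means a lower priority):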

renice +10 -p PID
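To confirm that kill -STOP really paused a process (and that kill -CONT resumed it), you can read the state letter the kernel reports in /proc/PID/stat, the same letter ps shows in its STAT column. A sketch; proc_state is a hypothetical helper name:

```shell
#!/bin/sh
# proc_state PID  (hypothetical helper)
# Prints the one-letter process state from /proc/PID/stat:
# R = running, S = sleeping, D = uninterruptible (I/O), T = stopped, Z = zombie.
proc_state() {
  # Field 2 is the command name in parentheses and may contain spaces,
  # so strip everything up to the last ") " before taking the state field.
  sed 's/.*) //' "/proc/$1/stat" | cut -d' ' -f1
}

# Usage (1234 is a placeholder PID):
# kill -STOP 1234; proc_state 1234   # T
# kill -CONT 1234; proc_state 1234   # back to R or S
```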

Problem 2: Out of memory (OOM)

Symptoms:

Processes are killed automatically (by the OOM killer)
The system swaps heavily and becomes slow
Available memory is very low

Diagnosis:

1. Check memory:

free -h && swapon --show

2. Find the processes using the most memory:

ps aux --sort=-%mem | head -10

3. Check for OOM events:

dmesg | grep -i "out of memory" | tail -20

4. Check a process's OOM score (higher = more likely to be killed):

cat /proc/PID/oom_score
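Checking oom_score one PID at a time is slow during an incident; looping over /proc lists the processes the kernel currently considers the best candidates to kill. A sketch that reads only /proc; oom_candidates is a hypothetical helper name:

```shell
#!/bin/sh
# oom_candidates [N]  (hypothetical helper)
# Prints "OOM_SCORE PID COMMAND" for the N (default 10) processes with the
# highest oom_score, i.e. the ones the OOM killer would pick first.
oom_candidates() {
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    # Processes can exit while we loop; skip any whose files have vanished.
    score=$(cat "$dir/oom_score" 2>/dev/null) || continue
    comm=$(cat "$dir/comm" 2>/dev/null) || continue
    printf '%s %s %s\n' "$score" "$pid" "$comm"
  done | sort -nr | head -n "${1:-10}"
}

oom_candidates 5
```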

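Fixes:

✅ Add a swap file:

sudo fallocate -l 2G /swapfile

sudo chmod 600 /swapfile

sudo mkswap /swapfile && sudo swapon /swapfile

To keep it after a reboot, add the line /swapfile none swap sw 0 0 to /etc/fstab. If swapon rejects the file (some filesystems do not support swap files created with fallocate), create it with dd instead.

✅ Protect an important process from being killed (a systemd service can set OOMScoreAdjust=-1000 in its unit file instead):

echo -1000 | sudo tee /proc/PID/oom_score_adj

✅ Drop the page cache (this frees only cache the kernel would reclaim anyway, so it rarely fixes a real OOM problem):

sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

✅ Limit a service's memory (the Apache unit is apache2.service on Debian/Ubuntu and httpd.service on RHEL):

systemctl set-property apache2.service MemoryMax=2G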

Problem 3: Disk full

Symptoms:

New files cannot be created
Services crash or misbehave
df -h shows Use% = 100%

Diagnosis:

1. Find the largest directories:

du -sh /* 2>/dev/null | sort -rh | head -10

2. Find large log files:

find /var/log -type f -size +100M -exec ls -lh {} \;

3. Find files that were deleted but still use space because a process keeps them open (restarting that process frees the space):

lsof +L1
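If lsof is not installed, the same information is available from /proc: each open descriptor is a symlink whose target ends in "(deleted)" once the file has been removed. A sketch assuming GNU find (for -lname and -printf); deleted_open_files is a hypothetical helper name, and it needs root to see other users' processes:

```shell
#!/bin/sh
# deleted_open_files  (hypothetical helper)
# Lists "/proc/PID/fd/N -> TARGET (deleted)" for descriptors that still point
# at deleted files; that space is only freed when the descriptor is closed.
deleted_open_files() {
  # find exits non-zero when some /proc entries are unreadable or vanish
  # mid-scan; that is expected here, so it is not treated as a failure.
  find /proc/[0-9]*/fd -lname '*(deleted)' -printf '%p -> %l\n' 2>/dev/null || true
}

deleted_open_files
# To release the space without restarting the process, you can truncate the
# file through its descriptor (as root), e.g. for PID 1234, fd 5:
#   : > /proc/1234/fd/5
# Restarting the owning service is the cleaner fix when that is acceptable.
```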

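Fixes:

✅ Delete old logs (run it without -delete first to review what would be removed):

find /var/log -name "*.log" -mtime +30 -delete

✅ Compress a rotated log:

gzip /var/log/syslog.1

✅ Clean the package cache and remove packages that are no longer needed:

apt clean

apt autoremove

✅ Delete old journal entries (keep the last 7 days):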

journalctl --vacuum-time=7d
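Running a find ... -delete directly against /var/log leaves no chance to spot a mistaken pattern. One way to build in the review step from the log-cleanup fix is a wrapper that only lists by default. A sketch; old_logs and its --delete switch are hypothetical:

```shell
#!/bin/sh
# old_logs DIR DAYS [--delete]  (hypothetical helper)
# Lists *.log files under DIR last modified more than DAYS days ago.
# Nothing is deleted unless --delete is given, so review the list first.
old_logs() {
  if [ "${3:-}" = "--delete" ]; then
    find "$1" -type f -name '*.log' -mtime +"$2" -print -delete
  else
    find "$1" -type f -name '*.log' -mtime +"$2" -print
  fi
}

# Usage:
#   old_logs /var/log 30            # dry run: review what would go
#   old_logs /var/log 30 --delete   # then delete the same set
```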

Problem 4: A service fails to start

Symptoms:

systemctl start SERVICE fails
The service starts and then crashes immediately
Status shows exited or failed

Diagnosis:

1. Check the status:

systemctl status nginx

2. Read the service's log:

journalctl -u nginx -n 50 --no-pager

3. Validate the configuration:

nginx -t

4. Check whether another process is already using the port:

ss -tulpn | grep :80

Fixes:

✅ Restart the service:

systemctl restart nginx

✅ Reset the failed state:

systemctl reset-failed nginx

✅ Enable the service (start it automatically at boot):

systemctl enable nginx

✅ Kill the process holding the port (check which process it is with ss -tulpn first):

fuser -k 80/tcp

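Problem 5: Network problems

Symptoms:

Cannot connect to the server
DNS does not resolve
A port is not reachable

Diagnosis:

1. Test connectivity (pinging an IP address bypasses DNS):

ping -c 4 8.8.8.8

2. Test DNS:

nslookup google.com

3. Check the firewall (requires root; use ufw status or nft list ruleset on systems managed by those tools):

iptables -L -n

4. Check routes: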

ip route show

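Fixes:

✅ Open a port in the firewall (ufw):

ufw allow 80/tcp

✅ Restart networking (the service name depends on the distribution: networking on Debian with ifupdown, systemd-networkd or NetworkManager on others):

systemctl restart networking

✅ Flush the DNS cache (systemd-resolved; older versions use systemd-resolve --flush-caches):

resolvectl flush-caches

✅ Reset the interface (do not do this over SSH through the same interface, or the session drops and you cannot run the second command):

ip link set eth0 down

ip link set eth0 up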

Summary

Essential commands

• ps, top, htop - Processes
• grep, awk, sed - Text
• find, rsync, tar - Files
• journalctl - Logs

Monitoring

• top, mpstat - CPU
• free, vmstat - Memory
• df, du, iostat - Disk
• ss, netstat - Network

Troubleshooting

• High CPU → top, kill
• OOM → free, dmesg
• Disk full → du, ncdu
• Network → ping, ss, dig

Key tips

✓ Always read the logs before changing anything - journalctl -u service

✓ Check resources before restarting - top, df -h, free -h

✓ Use grep -E "error|fail" to find problems

✓ Check the firewall if you cannot connect
