Ceph back to normal

  • Category: Computer-related
  • Last Updated: Friday, 30 June 2017 15:35
  • Published: Thursday, 29 June 2017 10:52
  • Written by sam

Following up on the previous post, where the VM guests were moved to the new machine, it is now time to check on Ceph.

My initial estimate is that we had been spinning up new guests without any limit, which pushed usage badly over the line; before any adjustment the pools were at 220% usage.

So far it has only come down to 148%.
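As a side note, the same over-provisioning picture can be pulled from the CLI. A minimal sketch, using the pool name ceph-vm that appears later in this post:

# raw cluster usage plus per-pool usage
ceph df
# provisioned versus actually used size of every RBD image (can take a while)
rbd du --pool ceph-vm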

Let's start with this basic problem (it also happens to be the main one).

root@uat157:~# systemctl status ceph-mon@0.service
ceph-mon@0.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled)
  Drop-In: /lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: failed (Result: start-limit) since Tue 2017-06-20 17:10:33 CST; 1 weeks 0 days ago
  Process: 28325 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=28)
 Main PID: 28325 (code=exited, status=28)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

root@uat157:~# systemctl status ceph-osd@0.service
ceph-osd@0.service - Ceph object storage daemon
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled)
  Drop-In: /lib/systemd/system/ceph-osd@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Wed 2017-01-25 17:14:01 CST; 5 months 1 days ago
 Main PID: 6634 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
           └─6634 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

From the above we can see that the OSD side is healthy, but the mon side is not: the unit keeps failing to start and exits with status 28 (ENOSPC, no space left on device), and since the journal has already been rotated there is no log output left to look at.
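To confirm that the mon's data partition really is the problem, its free space can be checked directly. A small sketch, using the default mon data path shown later in this post:

# free space on the filesystem holding the mon store
df -h /var/lib/ceph/mon/ceph-0
# size of the mon store itself
du -sh /var/lib/ceph/mon/ceph-0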

Next, look at Ceph's own view. In the output below, the quorum list matches what the UI showed earlier: mon 0 is no longer in it.

root@uat163:~# ceph
ceph> health
HEALTH_ERR 1 full osd(s); 3 near full osd(s); nearfull,full,sortbitwise,require_jewel_osds flag(s) set; 1 mons down, quorum 1,2,3,4,5,6 1,2,3,4,5,6; mon.4 low disk space

Going straight into the ceph shell and checking status shows the same picture: one mon down, one full OSD and three near-full OSDs.

ceph> status
    cluster 25654002-7cca-4ad3-89c5-a837e99796a8
     health HEALTH_ERR
            1 full osd(s)
            3 near full osd(s)
            nearfull,full,sortbitwise,require_jewel_osds flag(s) set
            1 mons down, quorum 1,2,3,4,5,6 1,2,3,4,5,6
            mon.4 low disk space
     monmap e7: 7 mons at {0=10.56.56.157:6789/0,1=10.56.56.158:6789/0,2=10.56.56.159:6789/0,3=10.56.56.160:6789/0,4=10.56.56.161:6789/0,5=10.56.56.162:6789/0,6=10.56.56.163:6789/0}
            election epoch 98, quorum 1,2,3,4,5,6 1,2,3,4,5,6
     osdmap e99: 7 osds: 7 up, 7 in
            flags nearfull,full,sortbitwise,require_jewel_osds
      pgmap v8444950: 512 pgs, 2 pools, 1110 GB data, 278 kobjects
            3341 GB used, 530 GB / 3871 GB avail
                 512 active+clean

While we are at it, also check the disks' latency. In the output below, osd.6 stands out as very high.

root@uat163:~# ceph osd perf
osd fs_commit_latency(ms) fs_apply_latency(ms)
  6                   130                  276
  5                    31                   51
  4                    26                   41
  3                    27                   44
  2                    28                   44
  1                    28                   46
  0                    24                   34
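To see which host osd.6 sits on and how full each OSD actually is, the standard commands below help; a sketch:

# CRUSH tree: which host each OSD belongs to and whether it is up/in
ceph osd tree
# per-OSD utilisation; an almost-full OSD often also shows the worst latency
ceph osd df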

First, start mon.0 by hand:

root@uat157:~# /usr/bin/ceph-mon -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
starting mon.0 rank 0 at 10.56.56.157:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 25654002-7cca-4ad3-89c5-a837e99796a8

If it refuses to start and complains about being too full, the following settings can be used as a stopgap (these are the default values, so set them lower; cleaning up the mon's system disk is the other way to go):

mon data avail warn = 5
mon data avail crit = 5
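These options go under the [mon] section of ceph.conf (on a Proxmox node that file is normally a symlink to /etc/pve/ceph.conf). Once the thresholds have been adjusted or space freed, systemd's start-limit state from the earlier failed attempts also has to be cleared before it will start the unit again; a sketch, assuming mon.0 as above:

# clear the start-limit failure state recorded by systemd
systemctl reset-failed ceph-mon@0
# start the mon through systemd again and verify
systemctl start ceph-mon@0
systemctl status ceph-mon@0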

With the mon running properly again, let's see which image to delete.

root@uat163:~# rbd ls ceph-vm -l
NAME                         SIZE PARENT FMT PROT LOCK
base-8889-disk-1           61440M          2
base-8889-disk-1@__base__  61440M          2 yes
vm-100-disk-1              10240M          2      excl
vm-101-disk-1                110G          2      excl
vm-101250-disk-1           32768M          2      excl
vm-102-disk-1             102400M          2      excl
vm-103-disk-1             102400M          2      excl
vm-104-disk-1             102400M          2      excl
vm-104002-disk-1           61440M          2      excl
vm-104002-disk-2          102400M          2      excl
vm-104003-disk-1           61440M          2      excl
vm-104004-disk-1           61440M          2      excl
vm-104005-disk-1           61440M          2      excl
vm-104006-disk-1           61440M          2      excl
vm-104007-disk-1           61440M          2      excl
vm-104008-disk-1           61440M          2      excl
vm-104009-disk-1           61440M          2      excl
vm-104010-disk-1           61440M          2      excl
vm-104011-disk-1           61440M          2      excl
vm-104012-disk-1           61440M          2      excl
vm-104013-disk-1           61440M          2      excl
vm-104014-disk-1           40960M          2      excl
vm-104014-disk-2             200G          2
vm-104014-disk-3             200G          2      excl
vm-104015-disk-1           40960M          2      excl
vm-104015-disk-2             200G          2      excl
vm-104016-disk-1           40960M          2      excl
vm-104017-disk-1           61440M          2      excl
vm-104018-disk-1           61440M          2      excl
vm-105-disk-1             102400M          2      excl
vm-106-disk-1             102400M          2      excl
vm-10601-disk-1              300G          2
vm-107-disk-1             102400M          2      excl
vm-108-disk-1              10240M          2      excl
vm-109-disk-1              51200M          2      excl
vm-110-disk-1              51200M          2      excl
vm-111-disk-1              51200M          2      excl
vm-112-disk-1              51200M          2      excl
vm-113-disk-1              51200M          2      excl
vm-114-disk-1              51200M          2      excl
vm-115-disk-1                200G          2      excl
vm-117882-disk-1           61440M          2      excl
vm-117888-disk-1          102400M          2      excl
vm-204200-disk-1           32768M          2      excl
vm-391-disk-1              20480M          2
vm-821-disk-1              32768M          2

Pick the least important test VM to start with. Of course, it throws an error: the cluster is flagged full, so the operation just stalls.

root@uat163:~# rbd remove vm-112-disk-1 --pool ceph-vm
2017-06-28 11:23:50.359616 7fbbfdb2a700  0 client.19374307.objecter  FULL, paused modify 0x7fbbe4007bd0 tid 6

Check how full the OSDs currently are.

root@uat163:/var/lib/ceph/osd/ceph-3/temp/dump/dump# ceph health detail
HEALTH_ERR 1 full osd(s); 3 near full osd(s); nearfull,full,sortbitwise,require_jewel_osds flag(s) set; mon.0 low disk space, shutdown imminent; mon.4 low disk space
osd.6 is full at 95%
osd.2 is near full at 89%
osd.3 is near full at 91%
osd.5 is near full at 92%
nearfull,full,sortbitwise,require_jewel_osds flag(s) set
mon.0 low disk space, shutdown imminent -- 3% avail
mon.4 low disk space -- 21% avail

Bump up the full ratio, then run the command again and it goes through (from this point things can also be done from the Proxmox VE web UI).
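On this Jewel cluster the ratio can be raised from the ceph CLI; a sketch with 0.97 as an example value (remember to drop it back to the 0.95 default once enough space has been freed):

# temporarily raise the cluster full threshold so the delete can proceed
ceph pg set_full_ratio 0.97
# after cleanup, restore the default
ceph pg set_full_ratio 0.95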

root@uat163:/var/lib/ceph/osd/ceph-3/temp/dump/dump# rbd remove vm-112-disk-1 --pool ceph-vm
Removing image: 39% complete

Once things look normal, check again: everything seems fine, and even the latencies have dropped.

root@uat163:~# ceph osd perf
osd fs_commit_latency(ms) fs_apply_latency(ms)
  6                     9                   17
  5                     7                    8
  4                     6                    7
  3                     4                    5
  2                     4                    4
  1                     5                    6
  0                     8                    9
root@uat163:~# ceph
ceph> status
    cluster 25654002-7cca-4ad3-89c5-a837e99796a8
     health HEALTH_ERR
            1 pgs inconsistent
            3 near full osd(s)
            1 scrub errors
            mon.0 low disk space, shutdown imminent
            mon.4 low disk space
     monmap e7: 7 mons at {0=10.56.56.157:6789/0,1=10.56.56.158:6789/0,2=10.56.56.159:6789/0,3=10.56.56.160:6789/0,4=10.56.56.161:6789/0,5=10.56.56.162:6789/0,6=10.56.56.163:6789/0}
            election epoch 110, quorum 0,1,2,3,4,5,6 0,1,2,3,4,5,6
     osdmap e118: 7 osds: 7 up, 7 in
            flags nearfull,sortbitwise,require_jewel_osds
      pgmap v8493948: 512 pgs, 2 pools, 1051 GB data, 263 kobjects
            3165 GB used, 706 GB / 3871 GB avail
                 511 active+clean
                   1 active+clean+inconsistent
  client io 191 kB/s wr, 0 op/s rd, 16 op/s wr

The output above lists an error; check it in detail and repair it.

root@uat163:~# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 3 near full osd(s); 1 scrub errors; mon.0 low disk space, shutdown imminent; mon.4 low disk space
pg 1.64 is active+clean+inconsistent, acting [5,2,4]
osd.3 is near full at 86%
osd.5 is near full at 87%
osd.6 is near full at 90%
1 scrub errors
mon.0 low disk space, shutdown imminent -- 3% avail
mon.4 low disk space -- 21% avail
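Before repairing, the objects that failed the scrub can be inspected; a sketch (this reads the results of the PG's last deep-scrub):

# list the inconsistencies found in pg 1.64
rados list-inconsistent-obj 1.64 --format=json-pretty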

Repair it with the following command:

root@uat163:~# ceph pg repair 1.64
instructing pg 1.64 on osd.5 to repair

Back to normal, apart from the space problem.

root@uat163:~# ceph health detail
HEALTH_ERR 3 near full osd(s); mon.0 low disk space, shutdown imminent; mon.4 low disk space
osd.3 is near full at 86%
osd.5 is near full at 87%
osd.6 is near full at 90%
mon.0 low disk space, shutdown imminent -- 3% avail
mon.4 low disk space -- 21% avail