Proxmox Add node to cluster and HA

  • Category: Computer-related
  • Last Updated: Thursday, 06 July 2017 17:21
  • Published: Thursday, 06 July 2017 16:58
  • Written by sam

Current status

root@px157:/etc/pve# pvecm status
Quorum information
------------------
Date:             Thu Jul  6 13:49:32 2017
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1/144
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.252.157 (local)
0x00000002          1 10.0.252.158
0x00000003          1 10.0.252.159

Install and set up the new node

root@px160:~# apt update && apt dist-upgrade

Network

root@px160:~# vi /etc/network/interfaces

auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 10.0.252.160
        netmask 255.255.255.0
        gateway 10.0.252.253
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0

iface eno2 inet manual
auto vmbr1
iface vmbr1 inet static
        address 10.56.56.160
        netmask 255.255.255.0
        bridge_ports eno2
        bridge_stp off
        bridge_fd 0

root@px160:~# ifup vmbr1
root@px160:~# ping 10.56.56.157
PING 10.56.56.157 (10.56.56.157) 56(84) bytes of data.
64 bytes from 10.56.56.157: icmp_seq=1 ttl=64 time=0.098 ms
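
The single ping above can be extended to check every existing peer on the Ceph network before joining. A minimal sketch (the peer IPs are the ones from this cluster; adjust for yours):

```shell
#!/bin/sh
# Check that each existing cluster node is reachable on the ceph
# network (vmbr1, 10.56.56.0/24). IPs taken from this article's setup.
for ip in 10.56.56.157 10.56.56.158 10.56.56.159; do
    if ping -c 1 -W 1 "$ip" > /dev/null 2>&1; then
        echo "$ip reachable"
    else
        echo "$ip UNREACHABLE"
    fi
done
```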

Add to cluster

root@px160:~# pvecm add 10.0.252.157
The authenticity of host '10.0.252.157 (10.0.252.157)' can't be established.
ECDSA key fingerprint is SHA256:PJRC6MdQfYMlD6IN4u+Wa7JeVJshKFm2okN9XG9Zu1c.
Are you sure you want to continue connecting (yes/no)? yes
root@10.0.252.157's password:
copy corosync auth key
stopping pve-cluster service
backup old database
waiting for quorum...OK
generating node certificates
merge known_hosts file
restart services
successfully added node 'px160' to cluster.

Check pvecm status

root@px157:/etc/pve# pvecm status
Quorum information
------------------
Date:             Thu Jul  6 14:04:23 2017
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1/148
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.252.157 (local)
0x00000002          1 10.0.252.158
0x00000003          1 10.0.252.159
0x00000004          1 10.0.252.160

Install pveceph to add a new OSD and mon

root@px160:~# pveceph install
root@px160:~# pveceph createmon
ceph-mon: set fsid to 698c4b1b-9010-4dae-ae9e-1d70d43d48e9
ceph-mon: created monfs at /var/lib/ceph/mon/ceph-3 for mon.3
Created symlink /etc/systemd/system/ceph-mon.target.wants/ceph-mon@3.service -> /lib/systemd/system/ceph-mon@.service.
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'electing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'electing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'electing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'electing'
INFO:ceph-create-keys:Talking to monitor...
exported keyring for client.admin
updated caps for client.admin
INFO:ceph-create-keys:Talking to monitor...
INFO:ceph-create-keys:Talking to monitor...
INFO:ceph-create-keys:Talking to monitor...
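
Once ceph-create-keys finishes, it is worth confirming that the new monitor actually joined the quorum. A small sketch of parsing the JSON from `ceph quorum_status` (assumes python3 is available on the node; the monitor names follow this cluster's numeric naming):

```shell
#!/bin/sh
# Print the comma-separated quorum member names from the JSON that
# `ceph quorum_status --format json` emits.
quorum_names() {
    python3 -c 'import json,sys; print(",".join(json.load(sys.stdin)["quorum_names"]))'
}

# On a live node:
#   ceph quorum_status --format json | quorum_names
# and expect the new mon ("3" here) to appear in the list.
```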

Then add a new OSD

root@px160:~# fdisk -l /dev/sdb
Disk /dev/sdb: 558.4 GiB, 599550590976 bytes, 1170997248 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 36C7DB48-6E50-47EF-986E-E1A1A075B83B

Device      Start        End    Sectors   Size Type
/dev/sdb1    2048     206847     204800   100M Ceph OSD
/dev/sdb2  206848 1170997214 1170790367 558.3G unknown

My /dev/sdb was used before, so the old partitions need to be deleted first.

root@px160:~# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.29.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): d
Partition number (1,2, default 2):

Partition 2 has been deleted.

Command (m for help): d
Selected partition 1
Partition 1 has been deleted.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
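
The interactive fdisk session above can also be replaced by a one-shot wipe. A sketch using wipefs from util-linux (destructive, obviously; run it only on the disk you intend to reuse):

```shell
# Non-interactive alternative: erase all partition-table and filesystem
# signatures on the disk in one step. DESTRUCTIVE.
wipefs --all /dev/sdb
```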

Create the OSD

root@px160:~# pveceph createosd /dev/sdb
The operation has completed successfully.

Check ceph status (we just added a new OSD to the pool, so wait for rebalancing to finish):

root@px160:~# ceph status
  cluster:
    id:     698c4b1b-9010-4dae-ae9e-1d70d43d48e9
    health: HEALTH_WARN
            57 pgs backfill_wait
            26 pgs degraded
            26 pgs recovery_wait
            5 pgs stuck unclean
            recovery 3307/30432 objects degraded (10.867%)
            recovery 4552/30432 objects misplaced (14.958%)

  services:
    mon: 4 daemons, quorum 0,1,2,3
    mgr: 0(active), standbys: 1, 2
    osd: 4 osds: 4 up, 4 in; 57 remapped pgs

  data:
    pools:   1 pools, 128 pgs
    objects: 10144 objects, 39366 MB
    usage:   118 GB used, 2114 GB / 2233 GB avail
    pgs:     3307/30432 objects degraded (10.867%)
             4552/30432 objects misplaced (14.958%)
             57 active+remapped+backfill_wait
             45 active+clean
             26 active+recovery_wait+degraded

  io:
    client:   253 kB/s wr, 0 op/s rd, 42 op/s wr
    recovery: 24191 kB/s, 6 objects/s

From ceph health detail:

recovery 1857/30432 objects degraded (6.102%)
recovery 4472/30432 objects misplaced (14.695%)
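
Rather than re-running ceph status by hand, you can poll until the cluster reports healthy again. A minimal sketch (the 10-second interval is arbitrary):

```shell
#!/bin/sh
# Block until `ceph health` reports HEALTH_OK, polling every 10 seconds.
until ceph health | grep -q '^HEALTH_OK'; do
    sleep 10
done
echo "cluster is healthy"
```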

Keep going: copy the admin keyring and add the new monitor IP to the storage config.

root@px160:~# cp /etc/pve/priv/ceph.client.admin.keyring /etc/pve/priv/ceph/ceph.keyring
root@px160:~# vi /etc/pve/storage.cfg   ### add the new node's IP to monhost
dir: local
        path /var/lib/vz
        content vztmpl,backup,iso

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

rbd: ceph
        monhost 10.56.56.157;10.56.56.158;10.56.56.159;10.56.56.160
        content rootdir,images
        krbd 1
        pool ceph
        username admin

nfs: abc
        export /mnt/DATA
        path /mnt/pve/abc
        server 10.0.252.231
        content images,vztmpl,backup,iso,rootdir
        maxfiles 365
        options vers=3

And that's all for adding the new Ceph OSD and mon to the existing cluster.

Once done, go to the next step.

root@px160:~# ceph status
  cluster:
    id:     698c4b1b-9010-4dae-ae9e-1d70d43d48e9
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum 0,1,2,3
    mgr: 0(active), standbys: 1, 2
    osd: 4 osds: 4 up, 4 in

  data:
    pools:   1 pools, 128 pgs
    objects: 10260 objects, 39850 MB
    usage:   119 GB used, 2113 GB / 2233 GB avail
    pgs:     128 active+clean

  io:
    client:   182 kB/s wr, 0 op/s rd, 16 op/s wr

To test HA, prepare a VM and a node that you are willing to kill.

Here is mine:

root@px159:/# qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
      6543 wanttodie            running    512               10.00 31358

Add to HA

root@px159:~# ha-manager add vm:6543 --group sam
root@px159:~# ha-manager set vm:6543 --state started
root@px159:~# ha-manager config
vm:6543
  state started
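
The `--group sam` used above refers to an HA group that must already exist; its creation is not shown in this article. A sketch of creating it (the node list is assumed to be this cluster's four nodes; priorities and the restricted flag are left at their defaults):

```shell
# Create the HA group "sam" covering all four nodes of this cluster
# (assumed node list; adjust to your own cluster).
ha-manager groupadd sam --nodes px157,px158,px159,px160
```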

root@px159:~# ha-manager status
quorum OK
master px157 (active, Thu Jul  6 15:21:40 2017)
lrm px157 (active, Thu Jul  6 15:21:48 2017)
lrm px158 (active, Thu Jul  6 15:21:41 2017)
lrm px159 (active, Thu Jul  6 15:21:41 2017)
lrm px160 (active, Thu Jul  6 15:21:46 2017)
service vm:109 (px158, started)
service vm:111 (px158, started)
service vm:113 (px157, started)
service vm:114 (px157, started)
service vm:6543 (px160, started)

Now I will make node px159 lose power and then come back.

root@px160:~# pvecm status
Quorum information
------------------
Date:             Thu Jul  6 15:33:57 2017
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000004
Ring ID:          1/152
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.252.157
0x00000002          1 10.0.252.158
0x00000004          1 10.0.252.160 (local)

Now wait for Ceph to become ready again.

root@px160:~# ceph health
HEALTH_ERR 1 host (1 osds) down; 1 osds down; 1 mons down, quorum 0,1,3 0,1,3; 103 pgs are stuck inactive for more than 300 seconds; 103 pgs degraded; 103 pgs stuck degraded; 103 pgs stuck inactive; 103 pgs stuck unclean; 103 pgs stuck undersized; 103 pgs undersized; 158 requests are blocked > 32 sec; 3 osds have slow requests; recovery 8383/31032 objects degraded (27.014%)

OK, recovery is done.

root@px160:~# ceph health
HEALTH_WARN 1 mons down, quorum 0,1,3 0,1,3

And our VM 6543 is up, automatically moved to px157.

Bring px159 back up; everything is fine.

root@px160:~# ceph health
HEALTH_OK

Let's do one last test: what if px160 dies and never comes back?

root@px159:~# pvecm status
Quorum information
------------------
Date:             Thu Jul  6 16:25:32 2017
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000003
Ring ID:          1/156
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.252.157
0x00000002          1 10.0.252.158
0x00000003          1 10.0.252.159 (local)
0x00000004          1 10.0.252.160

Power off px160.

root@px159:~# pvecm status
Quorum information
------------------
Date:             Thu Jul  6 16:27:13 2017
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000003
Ring ID:          1/160
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.252.157
0x00000002          1 10.0.252.158
0x00000003          1 10.0.252.159 (local)


root@px159:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 px157
         2          1 px158
         3          1 px159 (local)

Delete it

root@px159:~# pvecm delnode px160
Killing node 4

root@px159:~# pvecm status
Quorum information
------------------
Date:             Thu Jul  6 16:29:29 2017
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000003
Ring ID:          1/160
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.252.157
0x00000002          1 10.0.252.158
0x00000003          1 10.0.252.159 (local)

My px160 was mon.3 and osd.3. Remove the OSD first:

root@px159:~# ceph osd out osd.3
marked out osd.3.
root@px159:~# ceph osd crush remove osd.3
removed item id 3 name 'osd.3' from crush map
root@px159:~# ceph auth del osd.3
updated
root@px159:~# ceph osd rm osd.3
removed osd.3
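
The four steps above (out, crush remove, auth del, rm) are the standard removal sequence; when more than one dead OSD needs cleaning up, they can be wrapped in a small helper. A sketch using exactly the commands from this article:

```shell
#!/bin/sh
# Remove one OSD by id, using the same four commands shown above.
remove_osd() {
    id="$1"
    ceph osd out "osd.$id"
    ceph osd crush remove "osd.$id"
    ceph auth del "osd.$id"
    ceph osd rm "osd.$id"
}

# Usage: remove_osd 3
```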

Then the mon:

root@px159:~# ceph mon remove 3
removing mon.3 at 10.56.56.160:6789/0, there will be 3 monitors

Delete the remaining references to px160:

root@px159:~# vi /etc/pve/storage.cfg
root@px159:/etc/pve# vi ceph.conf
root@px159:/etc/pve/ha# vi groups.cfg
root@px159:/etc/pve# vi storage.cfg
root@px159:/etc/pve/nodes# rm -rf px160/
root@px159:/etc/pve/priv# vi authorized_keys
root@px159:/etc/pve# ceph -w
  cluster:
    id:     698c4b1b-9010-4dae-ae9e-1d70d43d48e9
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: 0(active), standbys: 1, 2
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   1 pools, 128 pgs
    objects: 10345 objects, 40188 MB
    usage:   119 GB used, 1555 GB / 1674 GB avail
    pgs:     128 active+clean

  io:
    client:   283 kB/s wr, 0 op/s rd, 39 op/s wr

Now we can reinstall the node and add it back to the cluster, just as at the start of this article.