Ceph¶
Node reboot¶
Disable rebalancing temporarily
$ ceph osd set noout
noout is set
$ ceph osd set norebalance
norebalance is set
$ ceph -s
  cluster:
    id:     xxx
    health: HEALTH_WARN
            noout,norebalance flag(s) set
[...]
Reboot the node
$ sudo reboot
When the reboot is complete, enable cluster rebalancing again
$ ceph osd unset noout
noout is unset
$ ceph osd unset norebalance
norebalance is unset
$ ceph -s
  cluster:
    id:     xxx
    health: HEALTH_OK
[...]
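When rebooting several nodes one after the other, wait until the cluster has fully recovered before proceeding to the next node. A minimal sketch that polls the health status:
$ while ! ceph health | grep -q HEALTH_OK; do sleep 10; done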
Cluster start and stop¶
http://docs.ceph.com/docs/mimic/rados/operations/operating/#running-ceph-with-systemd
https://www.openattic.org/posts/how-to-do-a-ceph-cluster-maintenanceshutdown/
Stop¶
Ensure that any services/clients using Ceph are stopped and that the cluster is in a healthy state.
Set OSD flags
$ ceph osd set noout
$ ceph osd set nobackfill
$ ceph osd set norecover
$ ceph osd set norebalance
$ ceph osd set nodown
$ ceph osd set pause
$ ceph -s
  cluster:
    [...]
    health: HEALTH_WARN
            pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set
  services:
    [...]
    osd: x osds: y up, z in
         flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover
Stop the management services (manager, mds, ...) (node by node)
$ sudo systemctl stop ceph-mgr\*.service
Stop the OSD services (node by node)
$ sudo systemctl stop ceph-osd\*.service
Stop the monitor service (node by node)
$ sudo systemctl stop ceph-mon\*.service
Start¶
Start the monitor services (node by node)
$ sudo systemctl start ceph-mon\*.service
Start the OSD services (node by node)
$ sudo systemctl start ceph-osd@DEVICE.service
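DEVICE is a placeholder for the OSD unit instance. The unit names available on a node can be listed, for example, with:
$ sudo systemctl list-units 'ceph-osd@*'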
Start the management services (manager, mds, ...) (node by node)
$ sudo systemctl start ceph-mgr\*.service
Unset OSD flags
$ ceph osd unset pause
$ ceph osd unset nodown
$ ceph osd unset norebalance
$ ceph osd unset norecover
$ ceph osd unset nobackfill
$ ceph osd unset noout
Check¶
$ sudo systemctl status ceph\*.service
$ ceph -s
cluster:
id: x
health: HEALTH_OK
services:
mon: 3 daemons, quorum A,B,C
mgr: A(active), standbys: B, C
mds: cephfs-0/0/1 up
osd: x osds: y up, z in
data:
pools: 7 pools, 176 pgs
objects: 2816 objects, 18856 MB
usage: 69132 MB used, 44643 GB / 44711 GB avail
pgs: 176 active+clean
Deep scrub distribution¶
Distribution per weekday:
$ for date in $(ceph pg dump | grep active | awk '{ print $20 }'); do date +%A -d $date; done | sort | uniq -c
Distribution per hour:
$ for date in $(ceph pg dump | grep active | awk '{ print $21 }'); do date +%H -d $date; done | sort | uniq -c
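If the timestamps cluster around a few days or hours, single PGs can be deep scrubbed manually to spread the distribution over time (the PG ID is only an example):
$ ceph pg deep-scrub 54.76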
Set the number of placement groups¶
$ ceph osd pool set {pool-name} pg_num {pg_num}
set pool x pg_num to {pg_num}
$ ceph osd pool set {pool-name} pgp_num {pgp_num}
set pool x pgp_num to {pgp_num}
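For example, to raise the volumes pool to 256 PGs (the numbers are examples):
$ ceph osd pool set volumes pg_num 256
$ ceph osd pool set volumes pgp_num 256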
The new number of PGs should also be updated in environments/ceph/configuration.yml.
1 pools have many more objects per pg than average¶
Set mon pg warn max object skew = 0
$ ceph tell mon.* injectargs '--mon_pg_warn_max_object_skew 0'
Restart the active manager service
$ sudo systemctl restart ceph-mgr\*.service
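Whether the new value is active can be checked via the admin socket inside the MON container (the container and daemon names are examples):
$ docker exec ceph-mon-ceph01 ceph daemon mon.ceph01 config get mon_pg_warn_max_object_skew
To make the change persistent, set it in the configuration: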
environments/ceph/configuration.yml
##########################
# custom
ceph_conf_overrides:
  global:
    mon pg warn max object skew: 0
Logging¶
Ceph daemons are configured to log to the console instead of log files. OSDs are configured to log to MONs.
$ docker logs ceph-mon-ceph01
Logs can become very large. docker logs provides some useful parameters to show only the newest log lines and to follow new log messages as they appear.
$ docker logs --tail 100 --follow ceph-mon-ceph01
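docker logs --since is also useful to limit the output to a time range, e.g. the last hour:
$ docker logs --since 1h ceph-mon-ceph01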
Add new OSD¶
Add the new device to the devices list in the inventory of the corresponding host (see the sketch below).
Execute osism-ceph osds -l HOST on the manager node.
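A minimal sketch of such a devices list (the host_vars path and the device names are assumptions and depend on the environment):
$ cat inventory/host_vars/HOST.yml
devices:
  - /dev/sdb
  - /dev/sdc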
Remove OSD¶
Determine the OSD ID for the OSD to be removed
dragon@testbed-manager:~$ ceph osd tree
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
-1       0.03918 root default
-3       0.01959     host testbed-node-0
 1   hdd 0.00980         osd.1               up  1.00000 1.00000
 3   hdd 0.00980         osd.3               up  1.00000 1.00000
-5       0.01959     host testbed-node-1
 0   hdd 0.00980         osd.0               up  1.00000 1.00000
 2   hdd 0.00980         osd.2               up  1.00000 1.00000
Determine the block device served by the OSD
dragon@testbed-node-0:~$ docker exec -it ceph-osd-3 ls -la /var/lib/ceph/osd/ceph-3/block
lrwxrwxrwx 1 ceph ceph 92 Apr 2 15:10 /var/lib/ceph/osd/ceph-3/block -> /dev/ceph-f27fa071-baa4-4ee5-ba26-3b8a5d7231ec/osd-data-e5d0fe7f-c7dd-443d-9630-bf54ffba443e
dragon@testbed-node-0:~$ docker exec -it ceph-osd-3 lvs -o +devices
  LV                                            VG                                        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  osd-data-c5c106dd-7461-40ad-b5cc-28137fb639fc ceph-01de26c3-61fb-4f6c-9fb9-1f3cdfcba444 -wi-ao---- <10.00g                                                     /dev/sdb(0)
  osd-data-e5d0fe7f-c7dd-443d-9630-bf54ffba443e ceph-f27fa071-baa4-4ee5-ba26-3b8a5d7231ec -wi-ao---- <10.00g                                                     /dev/sdc(0)
Remove the device from the devices list in the inventory of the corresponding host.
Mark the OSD as out
dragon@testbed-manager:~$ ceph osd out osd.3
marked out osd.3.
Stop the ceph-osd service on the storage node where it is running
dragon@testbed-node-0:~$ sudo systemctl disable --now ceph-osd@3
Make sure it is safe to remove the OSD
dragon@testbed-manager:~$ ceph osd safe-to-destroy osd.3
OSD(s) 3 are safe to destroy without reducing data durability.
Purge the OSD
dragon@testbed-manager:~$ ceph osd purge osd.3 --yes-i-really-mean-it
purged osd.3
Verify the OSD is removed from the node in the CRUSH map
dragon@testbed-manager:~$ ceph osd tree
ID CLASS WEIGHT  TYPE NAME               STATUS REWEIGHT PRI-AFF
-1       0.02939 root default
-3       0.00980     host testbed-node-0
 1   hdd 0.00980         osd.1               up  1.00000 1.00000
-5       0.01959     host testbed-node-1
 0   hdd 0.00980         osd.0               up  1.00000 1.00000
 2   hdd 0.00980         osd.2               up  1.00000 1.00000
Zap the block device
dragon@testbed-node-0:~$ sudo sgdisk --zap-all /dev/sdc
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or other utilities.
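To check that no file system or other signatures are left on the device (a sketch; device name as above, wipefs without options only lists signatures):
$ lsblk /dev/sdc
$ sudo wipefs /dev/sdc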
Replace defective OSD¶
Locate the defective OSD
$ ceph osd metadata osd.22
    "bluefs_slow_dev_node": "sdk",
    "hostname": "ceph04",
$ ssh ceph04
$ dmesg -T | grep sdk | grep -i error
...
blk_update_request: I/O error, dev sdk, sector 7501476358
Buffer I/O error on dev sdk1, logical block 7470017030, async page read
blk_update_request: I/O error, dev sdk, sector 7501476359
Buffer I/O error on dev sdk1, logical block 7470017031, async page read
Find and replace the actual hardware
$ sudo udevadm info --query=all --name=/dev/sdk
$ sudo hdparm -I /dev/sdk
Disable the defective OSD/disk
$ ceph osd out 22
$ sudo systemctl stop ceph-osd@sdk.service
$ ceph osd purge osd.22
Prepare new OSD
$ docker start -ai ceph-osd-prepare-ceph04-sdk
$ sudo systemctl start ceph-osd@sdk.service
Add OSD to tree
$ ceph osd df tree
CLASS WEIGHT REWEIGHT SIZE  USE   AVAIL %USE  VAR  TYPE NAME
      7.4    -        3709G 2422G 1287G 65.30 1.06 hdd ceph04-hdd
hdd   3.7    0        0     0     0     0     0        osd.22
hdd   3.7    1.00000  3709G 2422G 1287G 65.30 1.08     osd.6
...
hdd   0.0    0        0     0     0     0     0        osd.27
$ ceph osd crush create-or-move osd.22 3.7 hdd=ceph04-hdd
$ ceph osd df tree
CLASS WEIGHT REWEIGHT SIZE  USE   AVAIL %USE  VAR  TYPE NAME
      7.4    -        3709G 2422G 1287G 65.30 1.06 hdd ceph04-hdd
hdd   3.7    1.00000  3709G 0     3709G 0     0        osd.22
hdd   3.7    1.00000  3709G 2422G 1287G 65.30 1.08     osd.6
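The subsequent backfill onto the new OSD can be monitored, for example, with:
$ watch -n 10 ceph -s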
Add new pool¶
$ ceph osd pool create sample 32 32
pool 'sample' created
$ ceph osd pool application enable sample rbd
enabled application 'rbd' on pool 'sample'
$ ceph auth get client.cinder
[client.cinder]
key = ...
caps mon = "allow r"
caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images"
exported keyring for client.cinder
$ ceph auth caps client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=vms, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=sample'
updated caps for client.cinder
$ ceph auth get client.nova
[client.nova]
key = ...
caps mon = "allow r"
caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=vms, allow rwx pool=volumes, allow rwx pool=backups"
exported keyring for client.nova
$ ceph auth caps client.nova mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=vms, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=sample'
updated caps for client.nova
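To verify the new pool and the updated capabilities:
$ ceph osd pool ls detail
$ ceph auth get client.cinder
$ ceph auth get client.nova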
Export image¶
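The provisioned size of the image, and thus the size of the exported file, can be checked first with rbd info:
$ rbd info --pool=volumes volume-035f3636-ad68-4562-88f5-11d7e295d03e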
$ rbd export --pool=volumes volume-035f3636-ad68-4562-88f5-11d7e295d03e /home/dragon/035f3636-ad68-4562-88f5-11d7e295d03e.img
$ docker cp cephclient_cephclient_1:/home/dragon/035f3636-ad68-4562-88f5-11d7e295d03e.img /tmp
$ docker exec -it cephclient_cephclient_1 rm -f /home/dragon/035f3636-ad68-4562-88f5-11d7e295d03e.img
$ rm -f /tmp/035f3636-ad68-4562-88f5-11d7e295d03e.img
Repair PGs¶
Health of Ceph cluster
$ sudo ceph status
cluster:
id: 0155072f-6a71-4f5c-8967-f86e5307033f
health: HEALTH_ERR
4 scrub errors
Possible data damage: 1 pg inconsistent
$ sudo ceph health detail
HEALTH_ERR 4 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 4 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 54.76 is active+clean+inconsistent, acting [39,6,15]
Repair the PG
$ sudo ceph pg repair 54.76
instructing pg 54.76 on osd.39 to repair
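While the repair is running, its progress shows up in the cluster log, which can be followed with:
$ sudo ceph -w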
Give the Ceph cluster some time to repair, then check the health again
$ sudo ceph health detail
HEALTH_OK
$ sudo ceph status
cluster:
id: 0155072f-6a71-4f5c-8967-f86e5307033f
health: HEALTH_OK
Rebalance the cluster¶
Test which OSDs would be affected by the reweight
$ sudo ceph osd test-reweight-by-utilization
no change
moved 6 / 4352 (0.137868%)
avg 51.8095
stddev 12.3727 -> 12.3621 (expected baseline 7.15491)
min osd.10 with 30 -> 30 pgs (0.579044 -> 0.579044 * mean)
max osd.68 with 92 -> 92 pgs (1.77574 -> 1.77574 * mean)
oload 120
max_change 0.05
max_change_osds 4
average_utilization 0.4187
overload_utilization 0.5025
osd.14 weight 0.9500 -> 0.9000
osd.27 weight 0.9500 -> 0.9000
osd.37 weight 0.9500 -> 0.9000
osd.29 weight 1.0000 -> 0.9500
If the OSDs match your “fullest” OSDs, execute the reweight
$ sudo ceph osd reweight-by-utilization
no change
moved 6 / 4352 (0.137868%)
avg 51.8095
stddev 12.3727 -> 12.3621 (expected baseline 7.15491)
min osd.10 with 30 -> 30 pgs (0.579044 -> 0.579044 * mean)
max osd.68 with 92 -> 92 pgs (1.77574 -> 1.77574 * mean)
oload 120
max_change 0.05
max_change_osds 4
average_utilization 0.4187
overload_utilization 0.5025
osd.14 weight 0.9500 -> 0.9000
osd.27 weight 0.9500 -> 0.9000
osd.37 weight 0.9500 -> 0.9000
osd.29 weight 1.0000 -> 0.9500
Wait for the cluster to rebalance itself and check disk usage again. Repeat the above steps if necessary.
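Disk usage per OSD can be checked with:
$ sudo ceph osd df tree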
HEALTH_WARN application not enabled on 1 pool(s)¶
$ ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool 'default.rgw.log'
use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.
$ ceph osd pool application enable default.rgw.log rgw
enabled application 'rgw' on pool 'default.rgw.log'
3 monitors have not enabled msgr2¶
Normal during upgrade from Luminous to Nautilus.
cluster:
id: 11111111-1111-1111-1111-111111111111
health: HEALTH_WARN
3 monitors have not enabled msgr2
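Once all monitors have been upgraded to Nautilus, msgr2 can be enabled (see the Nautilus upgrade notes):
$ ceph mon enable-msgr2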
HEALTH_WARN mons are allowing insecure global_id reclaim¶
$ ceph health
HEALTH_WARN mons are allowing insecure global_id reclaim
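ceph health detail additionally lists any clients that still connect with an insecure global_id; only disable the reclaim once no such clients remain:
$ ceph health detail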
$ ceph config set mon auth_allow_insecure_global_id_reclaim false
$ ceph health
HEALTH_OK
Migrate Ceph network¶
Change the Ceph network ranges in the configuration, e.g.:
-ceph_public_network: 10.0.5.0/24
+ceph_public_network: 10.0.6.0/24
-public_network: 10.0.5.0/24
-cluster_network: 10.0.5.0/24
+public_network: 10.0.6.0/24
+cluster_network: 10.0.6.0/24
Then run the Ceph plays:
osism-ceph mons
osism-ceph mgrs
osism-ceph osds
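Whether the MONs listen on the new network can be verified with:
$ ceph mon dump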
To publish this Ceph network change, you have to reconfigure the OpenStack services. The cephclient also has to have access to the new network, since it uses the rbd and, in part, the ceph commands.
osism-kolla reconfigure glance
osism-kolla reconfigure cinder
osism-kolla reconfigure nova
Monitor Ceph OSD restart¶
$ watch ceph osd tree
$ watch ceph osd df tree