Ceph Monitor
Introduction
Ceph Monitors maintain a “master copy” of the cluster map, which means a Ceph Client can determine the location of all Ceph Monitors, Ceph OSD Daemons, and Ceph Metadata Servers just by connecting to one Ceph Monitor and retrieving a current cluster map. Before Ceph Clients can read from or write to Ceph OSD Daemons or Ceph Metadata Servers, they must connect to a Ceph Monitor first. With a current copy of the cluster map and the CRUSH algorithm, a Ceph Client can compute the location for any object. The ability to compute object locations allows a Ceph Client to talk directly to Ceph OSD Daemons, which is a very important aspect of Ceph’s high scalability and performance. See Scalability and High Availability for additional details.
The first step in setting up a Ceph cluster is to create the Monitors (MONs); in general, at least 3 MONs are needed to guarantee the reliability of the whole system.
NOTE: MONs should be spread across different hosts, and the total number of MONs must be odd.
The main functions of the Ceph Monitor daemon are:
- Track the state of the entire Ceph cluster through the cluster map
- Maintain a master copy of the cluster map
- Keep the data consistent across monitors (via quorum and the Paxos algorithm; see the example below)
- Authenticate requests from clients (the data path, however, never goes through the monitors)
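Assuming a running cluster and a usable admin keyring, the monitor quorum can be inspected with the ceph CLI, for example:
ceph mon stat                             # one-line summary: monmap epoch and quorum members
ceph quorum_status --format json-pretty   # election epoch, quorum and the full monitor map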
The cluster map consists of the following 5 maps:
- Monitor map
- OSD map
- Placement Group (PG) map
- CRUSH map
- MDS map (only present when the Ceph File System is used)
Cluster Maps
Monitor Map
The monitor map contains the host name and IP address of each monitor; every monitor daemon listens on port 6789 by default.
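A dump such as the one below can be produced with the ceph CLI (the exact fields vary by release):
ceph mon dump -f json-pretty        # the monitor map itself
ceph quorum_status -f json-pretty   # monitor map plus election epoch and quorum members, closest to the output below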
{ "election_epoch": 10,
"quorum": [
0,
1,
2],
"monmap": { "epoch": 1,
"fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
"modified": "2011-12-12 13:28:27.505520",
"created": "2011-12-12 13:28:27.505520",
"mons": [
{ "rank": 0,
"name": "monitor-1",
"addr": "172.17.100.1:6789\/0"},
{ "rank": 1,
"name": "monitor-2",
"addr": "172.17.100.2:6789\/0"},
{ "rank": 2,
"name": "monitor-3",
"addr": "172.17.100.3:6789\/0"},
]
}
}
OSD Map
Command to dump the OSD map: ceph osd dump -f json | python -m json.tool
The OSD map contains the state of the pools and of each OSD; the per-OSD state includes the host IP, UUID, weight, etc.
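In addition to the raw dump shown below, ceph osd tree prints a condensed, human-readable view of the same information (OSD up/down, in/out, weight and the CRUSH hierarchy).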
{
...........
"flags": "",
"fsid": "f0af2fce-313f-4162-9d49-d3a6535a0c24",
"max_osd": 27,
"modified": "2016-01-19 17:13:27.301319",
"osd_xinfo": [
{
"down_stamp": "2016-01-19 17:12:25.032264",
"features": 37154696925806591,
"laggy_interval": 17,
"laggy_probability": 0.3,
"old_weight": 0,
"osd": 0
},
{
"down_stamp": "2016-01-19 17:12:25.032264",
"features": 37154696925806591,
"laggy_interval": 16,
"laggy_probability": 0.3,
"old_weight": 0,
"osd": 1
}],
"osds": [
{
"cluster_addr": "192.168.10.101:6827/1005839",
"down_at": 327,
"heartbeat_back_addr": "192.168.10.101:6830/1005839",
"heartbeat_front_addr": "192.168.10.101:6831/1005839",
"in": 1,
"last_clean_begin": 323,
"last_clean_end": 336,
"lost_at": 0,
"osd": 0,
"primary_affinity": 1.0,
"public_addr": "192.168.10.101:6844/5839",
"state": [
"exists",
"up"
],
"up": 1,
"up_from": 338,
"up_thru": 338,
"uuid": "70152bc9-36c4-425d-b326-1d7b7ee72f86",
"weight": 1.0
},
{
"cluster_addr": "192.168.10.101:6803/1003816",
"down_at": 327,
"heartbeat_back_addr": "192.168.10.101:6807/1003816",
"heartbeat_front_addr": "192.168.10.101:6809/1003816",
"in": 1,
"last_clean_begin": 321,
"last_clean_end": 335,
"lost_at": 0,
"osd": 1,
"primary_affinity": 1.0,
"public_addr": "192.168.10.101:6820/3816",
"state": [
"exists",
"up"
],
"up": 1,
"up_from": 336,
"up_thru": 336,
"uuid": "88296e4c-26b9-4c7a-b3eb-8a7f3d18920b",
"weight": 1.0
}]
}
Placement Group (PG) map
Records the state of every PG and the OSDs it maps to; for example, pgid 0.17d below maps to OSD.4 and OSD.19 (with OSD.19 as the primary OSD).
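The PG map and the mapping of a single PG can be queried with the ceph CLI, for example:
ceph pg dump -f json | python -m json.tool   # full PG map, as in the dump below
ceph pg map 0.17d                            # only the up/acting OSD set of one PG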
{
"pg_stats": [
{
"acting": [
19,
4
],
"acting_primary": 19,
"blocked_by": [],
"created": 1,
"last_active": "2016-01-19 19:40:39.360266",
"last_became_active": "0.000000",
"last_became_peered": "0.000000",
"last_change": "2016-01-19 19:40:39.360266",
"last_clean": "2016-01-19 19:40:39.360266",
"last_clean_scrub_stamp": "2016-01-19 19:40:39.360201",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2016-01-13 19:22:46.541455",
"last_epoch_clean": 333,
"last_fresh": "2016-01-19 19:40:39.360266",
"last_fullsized": "2016-01-19 19:40:39.360266",
"last_peered": "2016-01-19 19:40:39.360266",
"last_scrub": "0'0",
"last_scrub_stamp": "2016-01-19 19:40:39.360201",
"last_undegraded": "2016-01-19 19:40:39.360266",
"last_unstale": "2016-01-19 19:40:39.360266",
"log_size": 0,
"log_start": "0'0",
"mapping_epoch": 327,
"ondisk_log_size": 0,
"ondisk_log_start": "0'0",
"parent": "0.0",
"parent_split_bits": 0,
"pgid": "0.17d",
"reported_epoch": "340",
"reported_seq": "137",
"state": "active+clean",
"stats_invalid": "0",
"up": [
19,
4
],
"up_primary": 19,
"version": "0'0"
}]
}
CRUSH Map
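The CRUSH map is stored by the cluster in compiled (binary) form; the listing below is a decompiled text version showing the devices (OSDs), the bucket types and hierarchy, and the placement rules. A text dump like this can be obtained with (output file names are arbitrary):
ceph osd getcrushmap -o crushmap.bin        # fetch the compiled CRUSH map
crushtool -d crushmap.bin -o crushmap.txt   # decompile it into editable text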
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host controller-1 {
id -2 # do not change unnecessarily
# weight 21.720
alg straw
hash 0 # rjenkins1
item osd.0 weight 1.0
item osd.1 weight 1.0
item osd.2 weight 1.0
}
host controller-2 {
id -3 # do not change unnecessarily
# weight 25.340
alg straw
hash 0 # rjenkins1
item osd.3 weight 1.0
}
root default {
id -1 # do not change unnecessarily
# weight 47.060
alg straw
hash 0 # rjenkins1
item controller-1 weight 3.0
item controller-2 weight 1.0
}
# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
# end crush map
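After editing the text version, it can be recompiled and injected back into the cluster (reusing the file names from above):
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new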
Create Monitor
Procedure (a minimal command sketch follows the list):
- The admin user prepares ceph.conf, which specifies the Ceph monitor's host name and host IP
- Use the ceph-authtool command to create the administrator keyring and the monitor keyring
  Monitor Keyring: Monitors communicate with each other via a secret key. You must generate a keyring with a monitor secret and provide it when bootstrapping the initial monitor(s).
  Administrator Keyring: To use the ceph CLI tools, you must have a client.admin user. So you must generate the admin user and keyring, and you must also add the client.admin user to the monitor keyring.
- Import the client.admin user from the administrator keyring into the monitor keyring
  Because later user authentication is handled by the monitors, the admin user has to be added up front; only then will a user holding the admin keyring pass authentication against the monitors and be allowed to perform admin-level operations (for example: create/delete RBD, add monitor, ...)
- Use the monmaptool command to create the "first" monitor map (at this point it contains only one monitor's data)
- The monitor daemon is not running yet, so use the ceph-mon command to create the first monitor daemon, feeding it the monitor keyring and monitor map that were just created
- Start the monitor daemon
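A minimal sketch of these steps, roughly following the Ceph manual-deployment procedure; the host name mon-1, the IP 172.17.100.1 and the fsid are placeholders, and the data-directory layout and start-up command depend on the distribution and Ceph release:
# /etc/ceph/ceph.conf (minimal example)
#   [global]
#   fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993      # e.g. output of uuidgen
#   mon initial members = mon-1
#   mon host = 172.17.100.1

# 1. Create the monitor keyring and the administrator keyring
ceph-authtool --create-keyring /tmp/ceph.mon.keyring \
    --gen-key -n mon. --cap mon 'allow *'
ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring \
    --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'

# 2. Import the client.admin key into the monitor keyring
ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring

# 3. Create the initial monitor map containing a single monitor
monmaptool --create --add mon-1 172.17.100.1 \
    --fsid a7f64266-0894-4f1e-a635-d0aeaca0e993 /tmp/monmap

# 4. Create the monitor's data directory and populate it with the monmap and keyring
mkdir -p /var/lib/ceph/mon/ceph-mon-1
ceph-mon --mkfs -i mon-1 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring

# 5. Start the monitor daemon (how exactly depends on the init system;
#    running ceph-mon directly also works)
ceph-mon -i mon-1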
As soon as the MON has been created, the other maps (OSD map, PG map, CRUSH map) are generated as well. Apart from the OSD map, which is still empty, the other two maps already have content, because Ceph creates some default pools and a default CRUSH rule.
Ceph default pools:
- RBD pool (used by RBD)
- Data pool (used by Ceph FS)
- Metadata pool (used by Ceph FS)
Ceph default CRUSH rule:
- ruleset 0
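These defaults can be verified from the command line, for example:
ceph osd lspools           # lists the default pools (which pools exist by default depends on the Ceph release)
ceph osd crush rule dump   # shows the default CRUSH rule, e.g. replicated_ruleset with ruleset 0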
At this point the Ceph cluster status is "HEALTH_WARN", because no OSDs have been added yet.
(cluster status figure)
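The status can also be checked on the command line:
ceph -s               # overall cluster status; shows HEALTH_WARN while there are no OSDs
ceph health detail    # explains the reason for the warning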
Add Monitor
Delete Monitor
TBD