Tags: #Gentoo Linux #Pacemaker #Corosync #Heartbeat #cluster
Gentoo Linux: Active/Passive Cluster
Linux cluster configuration and management in Gentoo Linux…
Heartbeat is a deprecated cluster messaging layer that was historically used with Pacemaker. Although it is available in portage, today Corosync is the preferred messaging layer and Heartbeat is not recommended for new deployments.
Prerequisites
Pacemaker is based on Python version 2. Therefore, /etc/portage/make.conf needs to be modified accordingly:
USE_PYTHON="2.7 3.2"
For Apache it is necessary to set python3, and for corosync, python2. To list the available options, use:
$ eselect python list
Available Python interpreters:
[1] python2.7
[2] python3.2 *
and to select, use:
$ eselect python set 1
Set up the required USE flags in /etc/portage/make.conf (do not use the heartbeat flag):
USE="-X -gtk -gnome -qt4 -kde -dvd -alsa -cdr -heartbeat
bindist snmp pkcs11 gnutls smtp"
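If any of these flags affect packages that are already installed, the affected packages can be rebuilt against the new USE settings first (a general Gentoo step, not specific to this guide):
$ emerge -avuDN @world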
And modify /etc/portage/package.keywords accordingly (replace ~ARCH with the keyword of your architecture, e.g. ~amd64):
sys-cluster/corosync ~ARCH
sys-cluster/pacemaker ~ARCH
sys-cluster/libqb ~ARCH
sys-cluster/crmsh ~ARCH
Software Installation
Now, it’s time to install pacemaker and corosync:
$ emerge -av sys-cluster/pacemaker sys-cluster/corosync
And, as requested, add the root user to the haclient group:
$ usermod -a -G haclient root
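Group membership can be verified afterwards with the standard groups utility:
$ groups root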
Software Configuration
Generate the cluster authentication key (written to /etc/corosync/authkey):
$ corosync-keygen
Copy the initial configuration of Corosync:
$ cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
Edit the /etc/corosync/corosync.conf file and adjust it as needed, especially bindnetaddr, which should be set to the network address of the cluster interface (e.g. 192.168.0.0 for nodes on 192.168.0.0/24), and mcastaddr, to avoid disrupting other multicast services:
totem {
    version: 2
    token: 5000
    token_retransmits_before_loss_const: 20
    join: 1000
    consensus: 7500
    vsftype: none
    max_messages: 20
    secauth: on
    threads: 0

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.0.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

amf {
    mode: disabled
}
Replicate the configuration from the main appliance to all cluster members (MEMBER-IP):
$ scp /etc/corosync/authkey MEMBER-IP:/etc/corosync/authkey
$ scp /etc/corosync/service.d/pcmk MEMBER-IP:/etc/corosync/service.d/pcmk
$ scp /etc/corosync/corosync.conf MEMBER-IP:/etc/corosync/corosync.conf
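The /etc/corosync/service.d/pcmk file copied above instructs corosync to load the Pacemaker plugin (matching the "classic openais (with plugin)" stack reported by crm_mon below). If it does not exist yet, it commonly contains a declaration along these lines, where ver: 1 means that pacemaker is started by its own init script as shown later:
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
}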
If DNS is not used, it is necessary to update the /etc/hosts file with the respective host names:
192.168.0.1 node1.cluster
192.168.0.2 node2.cluster
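Name resolution can then be checked from each node, for example:
$ getent hosts node1.cluster node2.cluster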
Software Start
Now, it is time to start corosync and pacemaker. Start them on the main appliance first and, after a few seconds (~30 s), on the remaining nodes:
$ /etc/init.d/corosync start
$ /etc/init.d/pacemaker start
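To have both services start automatically at boot, they can also be added to the default OpenRC runlevel:
$ rc-update add corosync default
$ rc-update add pacemaker default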
Verify that all of the nodes are running correctly and have joined the cluster successfully:
$ crm_mon
It is important that Current DC is elected and all nodes are Online.
Last updated: Wed Apr 3 10:03:57 2013
Last change: Wed Apr 3 09:17:54 2013 by hacluster via crmd on Node1
Stack: classic openais (with plugin)
Current DC: Node1 - partition with quorum
Version: 1.1.9-2a917dd
2 Nodes configured, 2 expected votes
0 Resources configured.
Online: [ Node1 Node2 ]
The status of all rings can also be checked briefly with the following command:
$ corosync-cfgtool -s
Result:
Printing ring status.
Local node ID -921392960
RING ID 0
id = 192.168.0.1
status = ring 0 active with no faults
Since the members are not visible in the previous output, the member list can be queried as follows:
$ corosync-objctl | grep member
If runtime.totem.pg.mrp.srp.members.CLUSTERNODEID.status=joined for all clustered nodes, all of them have successfully joined the cluster.
runtime.totem.pg.mrp.srp.members.-921392960.ip=r(0) ip(192.168.0.1)
runtime.totem.pg.mrp.srp.members.-921392960.join_count=1
runtime.totem.pg.mrp.srp.members.-921392960.status=joined
runtime.totem.pg.mrp.srp.members.-904615744.ip=r(0) ip(192.168.0.2)
runtime.totem.pg.mrp.srp.members.-904615744.join_count=1
runtime.totem.pg.mrp.srp.members.-904615744.status=joined
High-Availability Communication Setup
The following commands can be saved in a script or executed directly on the command line. Either way, since all nodes have successfully joined the cluster, only the main appliance needs to be configured; the configuration is replicated automatically to the other cluster members.
#!/bin/bash
# Setting up Active/Passive Cluster
HAIP=192.168.0.9
HAMASK=255.255.255.0
HAIF=eth0
HAIFINT=20s
# Delete everything
crm configure erase
# Disable Stonith
crm configure property stonith-enabled="false"
# With 2 nodes we cannot attain a quorum
crm configure property no-quorum-policy="ignore"
# Configure Virtual IP resource for nodes in one cluster:
crm configure primitive P_VIP ocf:heartbeat:IPaddr2 \
    params ip="$HAIP" cidr_netmask="$HAMASK" nic="$HAIF" \
    op monitor interval="$HAIFINT"
# Configure Apache server:
APACHECONF=/etc/apache2/httpd.conf
crm configure primitive P_APACHE ocf:heartbeat:apache \
    params configfile="$APACHECONF" \
    op start interval="0s" timeout="60s" \
    op monitor interval="5s" timeout="20s" \
    op stop interval="0s" timeout="60s"
# Rule for OpenVPN server
#crm configure primitive P_OPENVPN ocf:heartbeat:anything \
#    params binfile="/usr/sbin/openvpn" \
#    cmdline_options="--writepid /var/run/openvpn.pid --config /etc/openvpn/openvpn.conf --cd /etc/openvpn --daemon" \
#    pidfile="/var/run/openvpn.pid" \
#    op start timeout="20" op stop timeout="30" op monitor interval="20"
# All services running on the main server
crm configure colocation C_ALL_IN_ONE_PLACE inf: P_VIP P_APACHE
# The order of application startup (Apache only after the virtual IP is up)
crm configure order O_ORDER inf: P_VIP P_APACHE
# Only needed when entering these commands in the interactive crm configure shell
#crm configure commit
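Once the script has run, the resulting configuration and the resource status can be reviewed with:
$ crm configure show
$ crm_mon -r1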
Troubleshooting
To stop a resource, use:
$ crm resource stop P_VIP
To see the running setup, use:
$ crm_mon -r1
A problem with crmsh compiled against the wrong Python version (python3):
abort: No module named crmsh
(check your install and PYTHONPATH)
can easily be resolved by switching the active Python version and recompiling sys-cluster/crmsh (see the Prerequisites section).
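For example, assuming python2.7 is still entry 1 in the eselect list (as in the Prerequisites section), the switch and a one-shot rebuild would look like:
$ eselect python set 1
$ emerge -v1 sys-cluster/crmsh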