HOWTO: Setup a Linux HA cluster for Recital

In this article Barry Mavin explains step by step how to set up a Linux HA (High Availability) cluster for running Recital applications on Red Hat/CentOS 5.3, although the general configuration should work for other Linux versions with a few minor changes.

Our cluster will consist of:

  • Two loadbalancers (loadbalancer1, loadbalancer2)
    The Linux Virtual Server (LVS) is an advanced load-balancing solution that can be used to build highly scalable and highly available network services. It will be used to spread the workload between our application servers running Recital applications. The loadbalancer will schedule connections (ssh terminal and web) to the least-utilized application server.
  • Two application servers (appserver1, appserver2)
    As the number of interactive users grows, we can incrementally add low-cost application servers.
  • Two data servers (dataserver1, dataserver2)
    The dataservers are configured in a master/slave architecture. Data is automatically replicated in real time between the master and slave. They are configured as an HA (High Availability) cluster using heartbeat and DRBD.

To setup and configure our Linux HA cluster for Recital we will be using the following:

  • Piranha
    This will be used to spread the workload between our application servers that will be running Recital applications. All requests for ssh or web connections from user workstations will be redirected to the least loaded application server.
  • Heartbeat
    Heartbeat is a cluster manager that is used with most DRBD installations. In our cluster it will monitor the back-end dataservers and designate one as primary (master) and one as secondary (slave). When switching between master and slave it will stop the services under its control and start them up on the (new) master. This provides our cluster with a high degree of redundancy, fault tolerance and high availability.
  • DRBD
    DRBD refers to block devices designed as a building block to form high availability (HA) clusters. This is done by mirroring a whole block device via a dedicated network link to a backup server. DRBD can be understood as network-based RAID-1.
  • glusterFS
    GlusterFS is a cluster file system capable of scaling to several petabytes. It aggregates various storage bricks over InfiniBand RDMA or TCP/IP interconnect into one large parallel network file system. The application servers (appservers) access shared data on the dataservers by mounting a gluster network file system. The IP address that the gluster server runs on is specified as a Virtual IP address under the control of heartbeat.
  • Samba
    Samba is an Open Source/Free Software suite that provides seamless file and print services to SMB/CIFS clients. Samba is freely available, unlike other SMB/CIFS implementations, and allows for interoperability between Linux/Unix servers and Windows-based clients.
  • Recital Cluster Edition
    The Recital Cluster Edition is a cluster-aware version of Recital that keeps track of open files and active locks internally. When a database I/O operation fails, it will automatically reconnect to the (new master) dataserver, re-open its open files, re-apply its active locks and continue executing the application in its current state and context. This operation is transparent to the end user and provides a high degree of fault tolerance for Recital applications.
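
For reference, the dataserver addresses used throughout the examples that follow can be recorded in /etc/hosts on each node. This is only a sketch assembled from the addresses used later in this article; the datastore and *-drbd alias names are illustrative, and the appserver and loadbalancer addresses are site-specific and so are omitted.

## /etc/hosts (excerpt)
192.168.2.40    datastore                              # Virtual IP managed by heartbeat
192.168.2.41    dataserver1.localdomain  dataserver1   # eth0, file serving network
192.168.2.42    dataserver2.localdomain  dataserver2
192.168.4.1     dataserver1-drbd                       # eth1, dedicated drbd replication link
192.168.4.2     dataserver2-drbd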

Step 1 - Install the required packages

# cd /root
# yum install gcc kernel-devel bison flex
# wget ...drbd-url
# wget ...heartbeat-url
# wget ...gluster-url
#
# cd drbd*
# ./configure
# make
# make install
# cd ..
# cd heart*
# ./configure
# make
# make install
# cd ..
# cd gluster*
# ./configure
# make
# make install

Step 2 - Configure DRBD

DRBD refers to block devices designed as a building block to form high availability (HA) clusters. This is done by mirroring a whole block device via a dedicated network link. DRBD can be understood as network-based RAID-1.

DRBD works on top of block devices, i.e., hard disk partitions or LVM's logical volumes. It mirrors each data block that is written to disk to the peer node.

Synchronous operation

Mirroring can be done tightly coupled (synchronous). That means that the file system on the active node is notified that the write of a block has finished only when the block has made it to both disks of the cluster.

Synchronous mirroring (called protocol C in DRBD speak) is the right choice for HA clusters where you dare not lose a single transaction in case of the complete crash of the active (primary in DRBD speak) node.

Asynchronous operation

The other option is asynchronous mirroring. That means that the entity that issued the write requests is informed about completion as soon as the data is written to the local disk.

Asynchronous mirroring is necessary to build mirrors over long distances, i.e., the interconnecting network's round trip time is higher than the write latency you can tolerate for your application. (Note: The amount of data the peer node may fall behind is limited by bandwidth-delay product and the TCP send buffer.)

The configuration file for DRBD is in /etc/drbd.conf.

#
# drbd.conf
#
global {
usage-count no;
}

resource drbd0
{
protocol C;

handlers
{
pri-on-incon-degr "echo 'DRBD: primary requested but inconsistent!' | \
wall; service heartbeat stop"; #"halt -f";
pri-lost-after-sb "echo 'DRBD: primary requested but lost!' | \
wall; service heartbeat stop"; #"halt -f";
}

startup
{
degr-wfc-timeout 120; # 2 minutes.
}

disk {
on-io-error detach;
}

syncer
{
rate 100M;
al-extents 257;
}

net
{
timeout 120;
connect-int 20;
ping-int 20;
max-buffers 2048;
max-epoch-size 2048;
ko-count 30;
# Split brain has just been detected, but at this time
# the resource is not in the Primary role on any host.
after-sb-0pri discard-zero-changes;
# Split brain has just been detected, and at this time
# the resource is in the Primary role on one host.
after-sb-1pri discard-secondary;
# Split brain has just been detected, and at this time
# the resource is in the Primary role on both hosts.
after-sb-2pri disconnect;
}

on dataserver1.localdomain
{
device /dev/drbd0;
disk /dev/mapper/VolGroup00-LogVol02;
address 192.168.4.1:7789;
meta-disk internal;
}

on dataserver2.localdomain
{
device /dev/drbd0;
disk /dev/mapper/VolGroup00-LogVol02;
address 192.168.4.2:7789;
meta-disk internal;
}
}
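
With /etc/drbd.conf in place on both dataservers, the DRBD device can be initialised. The following is a minimal sketch for the DRBD 8.x series used on CentOS 5.3; the exact promotion syntax may differ between DRBD releases.

On both dataservers, create the metadata and start the service:

# drbdadm create-md drbd0
# service drbd start

On dataserver1 only, promote the node to primary, then create the ext3 filesystem and the mount point used later in the haresources file:

# drbdadm -- --overwrite-data-of-peer primary drbd0
# mkfs.ext3 /dev/drbd0
# mkdir /replicated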

Step 3 - Configure glusterFS

The glusterFS server volume file is located in /etc/glusterfs/glusterfs-server.vol

### file: /etc/glusterfs/glusterfs-server.vol

#####################################
### GlusterFS Server Volume File ##
#####################################

### Export volume "brick"
volume posix
type storage/posix # POSIX FS translator
option directory /replicated # Export the replicated directory from drbd
end-volume

volume locks
type features/locks
subvolumes posix
end-volume

volume iothreads
type performance/io-threads
option thread-count 16 # default is 1
subvolumes locks
end-volume

volume brick
type performance/write-behind
option aggregate-size 1MB # default is 0bytes
option window-size 3MB # default is 0bytes
option flush-behind on # default is 'off'
subvolumes iothreads
end-volume

### Add network serving capability to above brick.
volume server
type protocol/server
option transport-type tcp
# option transport-type ib-sdp
# option transport-type ib-verbs
# option transport-type unix
# option ib-verbs-work-request-send-size 131072
# option ib-verbs-work-request-send-count 64
# option ib-verbs-work-request-recv-size 131072
# option ib-verbs-work-request-recv-count 64
# option bind-address 192.168.1.125 # Default is to listen on all interfaces
# option listen-port 6996 # Default is 6996
# option client-volume-filename /etc/glusterfs/glusterfs-client.vol
subvolumes brick
option auth.addr.brick.allow * # Allow access to "brick" volume
end-volume

The corresponding glusterFS client file is located in /etc/glusterfs/glusterfs-client.vol on the appservers.

### file: /etc/glusterfs/glusterfs-client.vol

volume client
type protocol/client
option transport-type tcp/client
option remote-host 192.168.2.40 # Virtual IP address specified
# in the /etc/ha.d/haresources file on the dataservers
option remote-subvolume brick
end-volume
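
On each appserver the client volume can then be mounted against the Virtual IP. A minimal sketch, assuming /data as the mount point (the mount point name is an assumption; use whatever path your Recital applications expect):

# mkdir -p /data
# glusterfs -f /etc/glusterfs/glusterfs-client.vol /data

To mount it automatically at boot, an /etc/fstab entry along these lines can be used instead:

/etc/glusterfs/glusterfs-client.vol  /data  glusterfs  defaults  0  0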

Step 4 - Configure the Heartbeat cluster manager

Heartbeat is designed to monitor your servers, and if the master server fails, it will start up specified services on the slave server, turning it into the master. To configure it, we need to specify which servers it should monitor and which services it should manage.

1. Setup the Heartbeat configuration file

The heartbeat configuration file is located in /etc/ha.d/ha.cf. On my systems I have two NICs: one is used for drbd replication (eth1) and the other is used for networked file system access (eth0).

## /etc/ha.d/ha.cf on dataserver1
## This configuration is to be the same on both machines
crm no

# heartbeat logging information
logfile /var/log/ha-log.txt
logfacility local0

# heartbeat communication timing
keepalive 1
deadtime 5
warntime 3
initdead 20

# heartbeat communication path
udpport 694
ucast eth0 192.168.2.41
ucast eth0 192.168.2.42
ucast eth1 192.168.4.1
ucast eth1 192.168.4.2

# fail back automatically
auto_failback on

# heartbeat cluster members
node dataserver1.localdomain
node dataserver2.localdomain

# monitoring of network connection to default gateway (optional)
#ping 192.168.2.50
#respawn hacluster /usr/lib64/heartbeat/ipfail
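
Heartbeat also requires an /etc/ha.d/authkeys file, identical on both dataservers and readable only by root, so that the nodes can authenticate each other's heartbeat packets. A minimal sketch (substitute your own shared secret):

## /etc/ha.d/authkeys on both dataservers
auth 1
1 sha1 ReplaceWithYourSharedSecret

# chmod 600 /etc/ha.d/authkeys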

2. Configure the Heartbeat services

Go to the directory /etc/ha.d/resource.d. This directory holds all the startup scripts for the services Heartbeat will manage.

Add a symlink from /etc/init.d/recital to /etc/ha.d/resource.d (Recital Server).

Add a symlink from /etc/init.d/glusterfsd to /etc/ha.d/resource.d (GlusterFS).

Add a symlink from /etc/init.d/smb to /etc/ha.d/resource.d (Samba).

# ln -s /etc/init.d/recital /etc/ha.d/resource.d/recital
# ln -s /etc/init.d/glusterfsd /etc/ha.d/resource.d/glusterfsd
# ln -s /etc/init.d/smb /etc/ha.d/resource.d/smb

If you installed gluster from source as detailed previously in this article, then the service script in /etc/init.d/glusterfsd needs to be edited to change the GSERVER line to reference /usr/local/sbin:

GSERVER="/usr/local/sbin/$BASE -f /etc/glusterfs/glusterfs-server.vol"

3. Setup Heartbeat to manage our cluster services

Heartbeat can automatically start services on the master server and fail them over to the slave if the master fails. The /etc/ha.d/haresources file, shown below, controls this: it contains a single line listing the resources to be started on the given server, separated by spaces.
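
The haresources file itself is one line, identical on both dataservers; each of its elements is described below:

## /etc/ha.d/haresources on both dataservers
dataserver1.localdomain drbddisk::drbd0 IPaddr::192.168.2.40/24 Filesystem::/dev/drbd0::/replicated::ext3 glusterfsd smb recital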

 

Elements of the haresources file:

  • dataserver1.localdomain
    Defines which server is the master server.
  • drbddisk::drbd0
    Tells Heartbeat to bring up DRBD data replication for resource drbd0 (specified in /etc/drbd.conf), making this node the primary.
  • IPaddr::192.168.2.40/24
    Tells Heartbeat to configure this as an additional IP address on the master server (the Virtual IP address that the gluster clients will mount).
  • Filesystem::/dev/drbd0::/replicated::ext3
    Causes /dev/drbd0, the device being replicated to the standby system by DRBD, to be mounted on /replicated as an ext3 filesystem.
  • glusterfsd
    Tells Heartbeat to start the glusterFS server.
  • smb
    Tells Heartbeat to start the Samba server.
  • recital
    Tells Heartbeat to start the Recital Server.

Note: these names should be the same as the filename of their associated script in /etc/ha.d/resource.d. If you need to control other services on the system, just create symbolic links in this directory and add the names of the services to the haresources file.

Step 5 - Configure the HA cluster to start up on system boot

# chkconfig drbd on
# chkconfig heartbeat on
# chkconfig glusterfsd off
# chkconfig smb off
# chkconfig recital off

Note that glusterfsd, smb, and recital are disabled from starting automatically on system boot as these services are managed by heartbeat.

Cluster Performance tuning and optimization

  1. Turn off avahi-daemon to optimize NIC throughput
    The Avahi daemon is installed by default and automatically discovers network resources and connects to them. Avahi is an implementation of the Zeroconf protocol, compatible with Apple's Bonjour services, and provides the following:
    • Assigns an IP address automatically, even without the presence of a DHCP server.
    • Acts as DNS (each machine is accessible by the name machinename.local).
    • Publishes services and facilitates access to them (machines on the local network are notified when a service comes up or shuts down, facilitating the sharing of files, printers, etc.).
    None of this is needed on the cluster nodes and it decreases network performance, so you can disable it:
    # service avahi-daemon stop
    # chkconfig avahi-daemon off
  2. Mount options
    TODO
  3. Filesytem journaling
    TODO
  4. DRBD journaling
    TODO
  5. DRBD Existing file systems can be integrated into new DRBD setups without the need of copying
    TODO

Monitoring the operation of the cluster

You can monitor the state of the cluster on both the primary and secondary servers. Log in to a terminal window and issue the following command:

# watch service drbd status
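
The raw replication state can also be checked at any time with:

# cat /proc/drbd

which shows the Primary/Secondary role of each node and whether both disks are UpToDate.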

Now in another terminal window, issue the following command on the primary server:

# service heartbeat stop

Looking at the secondary (backup) server, you will notice that heartbeat has failed the cluster services over to it.

Switching back to the master server and issuing a service heartbeat start command causes the master to take control again, and the secondary to revert to operating as a backup.

On performing a failover or failback operation, heartbeat will assign the Virtual IP address (VIP) specified in the /etc/ha.d/haresources file to the primary (master) server. This Virtual IP address is what the gluster (or samba) networked file system servers bind to, so the client computers (appservers) should mount the VIP, not the IP address of the actual data servers. This allows the appservers to perform file I/O operations independent of which dataserver is currently acting as master.

Installing and configuring the loadbalancer
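
Only a sketch of the loadbalancer setup is given here, based on the Piranha packages shipped with Red Hat/CentOS 5.x. On both loadbalancers, install Piranha and the LVS tools, set a password for the Piranha configuration GUI and enable the services:

# yum install piranha ipvsadm
# piranha-passwd
# service piranha-gui start
# chkconfig piranha-gui on
# chkconfig pulse on

The Piranha web GUI listens on port 3636. Use it to define virtual servers for the ssh and web services with appserver1 and appserver2 as the real servers, using a least-connection scheduler so that connections are sent to the least loaded appserver. The resulting configuration is written to /etc/sysconfig/ha/lvs.cf and should be copied to the second loadbalancer before starting the pulse service on both.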



