Difference between revisions of "Recital Replication Configuration"

From Recital Documentation Wiki
Jump to: navigation, search
(Replication Services Configuration)
(Initial data setup)
Line 392: Line 392:
  
 
== Initial data setup  ==
 
== Initial data setup  ==
Before starting the replication services, the best way to initially load the data from the master server onto the peers or slaves is to use the Recital [[recitaldump]] and [[recitalrestore]] commands:
+
Before starting the replication services, the best way to initially load the data from the master server onto the slaves is to use the Recital [[recitaldump]] and [[recitalrestore]] commands:
  
  

Revision as of 14:53, 8 June 2010

Template:YLM remove peer to peer

Recital Engine Configuration

Queue Table

In Recital replication works by storing transactions for publication in the queue table which is stored in the replication database. The replication queue table meta data is defined below.

Queue Tables
Column Name Type Width Description
PUBLISHER Character 20 Publisher name
PUBLICATION Character 60 Publication number
TRANID Integer 8 Transaction ID
DATA Object 8 Transaction XML file
CMDTYPE Integer 4 Command type number
INSERTDATE DateTime 8 Insert date
CMDSTATE Character 1 Current transaction state
TRANKEY Character 73 Transaction key
TRANTYPE Character 1 Transaction type


Publisher column

This is the node name of the system that the transaction was processed on.


Publication column

This is the publication ID of the system that the transaction was processed on. This ID is specified during installation and is unique between all the systems that the replication services are installed on. It can be found in the recital.conf file and is defined by the DB_PUBLICATIONID variable. This ID should not be changed once the replication service has been started on the system as it is used to determine which system the transaction was processed on.


TranID column

The transaction ID is unique sequence number of each transaction in the queue table.


Data column

Each transaction is contained in a XML file which is stored in this column. The XML format of the transaction is explained in the XML format section.


CMDType column

This is the numeric value of the command for this transaction. A full list of supported commands and their numeric values can be found in the commands section.


InsertDate column

This contains the date and time that the transaction was performed.


CMDState column

The current state of the transaction. When a transaction is first inserted into the queue table the state is set to W. This indicates that this transaction is in a wait state. When new transactions are performed on the same row from the same table this flag is checked to see if the transaction can be optimized. When this transaction has been processed by a Slave service the state is set to P. Once a transaction's state has been set to Processed no more optimization can be performed on this row. See the section on optimization for more information.

TranKey column

A unique key to identify transactions performed on a row in a table. This key is made up of the table name and any open single index files plus the sequence of the row that this transaction was performed on.


TranType column

This is used to specify the type of transaction in the queue table. This value is always set to Q for the replication service.

Queue table indexes

The queue table contains the following indexes

Queue Tables Indexes
Expression Tag Name
CMDSTATE+TRANKEY+SYNCNUM CMDSTATE
SYNCNUM SYNCNUM

XML format

Each transaction is stored in a XML file in the data column which is a Large binary object type. The XML file is defined by the following elements and attributes.

_rep_tn attribute

The RECITAL_DATAEXCHANGE element has an attribute called _rep_tn which will contain the table name for the transaction. The target name is defined when the table being replicated is opened. The format of the target attribute is dependent of how the table is opened. Tables with the same name on the same system can be replicated because the table name is prefixed to either the database name or directory path. The following table explains more about the target attribute format

Target Attribute
Format Description
databasename ! tablename This is the format if the table is in a database. Tables being replicated this way must be located in the same database on each system.
directorypath tablename This is the format if the table is in a directory. Tables being replicated this way must be located in the same directory in each system. Directory links can be used to match the directory paths.
tablename,indexname,... If single index files are open on the table, then their name(s) are sorted in ascending order before being affixed in a comma separated list to the end of the table name.

RECORD content element

This element contains the record contents.


_rep_tt attribute

Each record element must contain a transaction type attribute. Transactions attributes can be one of the following types;

Transaction Types
Transaction Type
INSERT
PACK
DELETE
RECALL
ZAP
UPDATE
ALTER


_rep_wc attribute

If the _rep_tn attribute is an UPDATE then it must also contain a _rep_wc attribute. The format of this attribute is SYNCNUM=syncnum. The where condition is used when the slave replication service is performing the update on the target table.

_rep_id text element

Each record element must contain a text element called _rep_id. The text contained in this element is the unique publication ID from the system that the transaction originated from.

This ID is specified during installation and is unique between all the systems that the replication services are installed on. It can be found in the recital.conf and is defined by the DB_PUBLICATIONID variable. This ID should not be changed once the replication service has been started on the system as it is used to determine which system the transaction was processed on.

_rep_sn text element

Each record must contain a text element called _rep_sn. This value can be used to identify the row on which to perform the transaction.


Field name text elements

If the transaction type in the record element is an INSERT or UPDATE then an element for each column that is being changed will be included. Only columns that have been changed will be added, see the optimization section for more information. Each field name text element will contain a pair of data separated by a vertical bar |. For example a change to a column named balance from 350 to 125 would be formated like this;

<BALANCE>125|350</BALANCE>

All text stored in field name elements is converted to hex so that if will not conflict with any XML directives.


The following is an example XML file for the SQL command;

INSERT INTO accounts (last_name, first_name) VALUES (''Jefferson'', ''Frank'');
<_rep_tn>=''finance!accounts''</_rep_tn>
<_rep_tt>=''INSERT''</_rep_tt>
    <_rep_id>01</_rep_id>
    <_rep_sn>01000000000000A2</_rep_sn>
    <LAST_NAME>|02A9E02</LAST_NAME>
    <FIRST_NAME>|AB075F1</FIRST_NAME>

SYNCNUM

When replication is enable in the Recital Engine a unique sequence number for each row in every table being replicated must be added. The first time a table is flagged for replication all rows must be updated with a sequence number, if the table contains a large number of row this process may take some time. You can also issue the recitalconvert replicate command at the OS prompt to convert all the tables in the current directory. Once it's been converted, the table is flagged so this process will not be done again. All new rows inserted after this point will automatically contain their own unique sequence number. For the life of a table these numbers will never be duplicated.

Optimization

There are several specific optimization procedures enabled for the replication process.


Record buffer

When replication is enabled on a table, a before image record of the row is kept. After each update on the row, this image is compared with the new data and if they are both the same the transaction is not added to the queue. This prevents multiple updates on the same row with the same data being added to the replication queue.


XML attributes

When replication is enabled on a table, a before image record of the row is kept. If the transaction type in the record element is an INSERT or UPDATE than an element for the columns are included in the XML file. Only columns that are different between the before image record and the new data are added to the transaction.


The XML file that contains the replication transaction will only contain data being changed, so generally it will be smaller. This will help with storage size of the queue tables and network traffic from the slave subscription service.

Multiple transactions

The queue table contains a column used to store the current transaction state. When the first transaction is added for a row on a table, this state is set to wait. It will remain this way until a slave service processes the transaction and sets the state of the transaction to Processed. When new UPDATE transactions are added to the queue, existing transactions for that row which are in a wait state are searched for. If a transaction is found for that row in a wait state and the transaction is an INSERT or UPDATE then the new transaction is merged in. If the column is not found in the current XML file it will be added. If the column is already in the XML then the data is updated with the new data.

If the transaction that is found is an INSERT then it will remain as an INSERT transaction and the data from the UPDATE transaction will be merge into it. In this case the transactions queue will have no record of an UPDATE being done.

Statements

Transactions that are performed by the Recital Database Engine are stored in the queue table for publication. The following table is a list of statements and commands that currently supported for replication.

Supported Statements
Statement Type Number
APPEND 5042
DELETE 5022
EDIT 5082
INSERT 5042
MERGE 5042
PACK 33
ZAP 82
REPLACE 5082
RECALL 36
READ 5082
BROWSE 5082
SQL INSERT 4042
SQL UPDATE 5082
SQL DELETE 5022

Enabling Replication

IMPORTANT You must add the directory path or database name that contains the tables you want to replicate into the replication.allow file. Only tables that match in this file and are not matched in the replication.deny file will be replicated.

In order for transactions to be published in the queue table, replication must be set on first. This can be done by adding the SET REPLICATION ON command into a local or systemwide configuration file. The config.db stored in the conf directory, which is in the root recital installation directory, is used for systemwide settings, or you can update the config.db in the local directory. You may also add the command to any 4GL program file.


When replication is turned on all tables already open and all tables opened afterwards are flagged for replication. You may turn replication off to disable replication, but if you want to filter out tables from replication it's better to use the allow and deny access control lists. Index files associated with the tables are also updated. Multiple index tag files are handled automatically. Single index files are sorted by name and then added onto the table name when stored with the transaction.


To turn replication on you must have installed the Replication service on the system already. If this hasn't been done then a You must configure the replication service first with 'recitalreplication rrs' error will be returned.


The first time a table is flagged for replication all rows must be updated with a sequence number, if the table contains a large number of row this process may take some time. Once it's been the table is flagged so this process will not be done again. All new rows inserted after this point will automatically contain their own unique sequence number. For the life of a table these numbers will never be duplicated.

Access Control Lists

You must add the directory path or database name that contains the tables you want to replicate into the replication.allow file. Only tables that match in this file and are not matched in the replication.deny file will be replicated.

If you wish to limit which tables are replicated you may do this with access control lists. Access control lists are defined in two files stored in the conf directory which is in the root recital installation directory. Comment lines can be added to these file by starting the line with # symbol. The access control specification is "directory path" or [database name!]tablename, where an optional database name may be specified. Wild cards may also be specified with ''*'' for all text matches and ''%'' for single character matches. Name expansion from environment variables can also be used.

Example controls
Example Description
southwind!* This would include all tables in the southwind database.
/tmp/* This would include all tables in the directory "/tmp/"
${DB_TMPDIR}/* This would include all tables in the directory expanded from the environment variable DB_TMPDIR


replication.allow

This file describes the names of the tables which will be replicated if REPLICATION is SET ON. If this list is not empty then only tables matching will be replicated.

If this file is empty then no tables will be replicated even if SET REPLICATION is ON.

replication.deny

This file describes the names of the tables which will NOT be replicated if REPLICATION is SET ON. If this list is not empty then any tables matching will be not be replicated.

Replication Services Configuration

Subscriber

The subscriber service is used by any system that needs to update its databases from published data. Each slave needs to start a subscriber service to connect to the published data on the Master server.

Recital Database Server

In order for either of these services to work the Recital Database Server must be installed and running on the system publishing the data.

ODBC Data Source

In order for the replication services to connect to the Recital Database Server you must define a ODBC data source in the odbc.ini file. The Data Source name format is Recital Replication Service on plus the node name. The Driver name is Recital and the Database format is ODBC:RECITAL: The following options can also be added;

ServerName

This is the publisher's server name, which has the Recital Database Server running.

UserName

This is the username to used to login on the publisher server. This is optional here as it can be specified in the argument command line when starting the service.

Password

This is the password for the username used to login on the publisher server. This is optional here as it can be specified in the argument command line when starting the service.


Database

This is the default data database that will opened when the service has connected to the Recital Database Server. This should be set to replication

Exclusive

This specifies how the replication database is opened and should be set to false so it's opened shared.


odbc.ini example

[Recital Replication Service on jazz]
Driver = Recital
DATABASE=ODBC:RECITAL:SERVERNAME=jazz.recital.local;USERNAME=replication;PASSWORD=recital;DATABASE=replication;EXCLUSIVE=FALSE

Initial data setup

Before starting the replication services, the best way to initially load the data from the master server onto the slaves is to use the Recital recitaldump and recitalrestore commands:


# recitaldump -d /directory/path

recitalreplication

The replication services are administrated with the recitalreplication command. See recitalreplication for full details.

Master Slave Configuration

A Master Slave configuration in Recital works with the published data stored in the queue table of the replication database. The Recital engine on the master system (the source of the database changes) writes all updates, inserts, deletes, recalls, packs and zaps into the queue table of the replication database. These transactions are stored in a XML format in the queue table.


Slaves are configured to subscribe to the master and to execute the published events on the slave's local databases. The Master is dumb in this scenario. Once replication has been enabled, all statements are published in the replication queue. If required, you can configure the Master to only publish events that apply to particular databases or tables


Each subscribed slave will get a copy of the changes published since the last time it connected. Slaves keep a record of the position within the published queue that they have read and processed. This means that multiple slaves can be connected to the master and executing different parts of the same published data. Because the slaves control this process, individual slaves can be connected and disconnected from the server without affecting the master's operation. Also, because each slave remembers its position within the published queue, it is possible for slaves to be disconnected, reconnect and then 'catch up' by continuing from the recorded position.


Both the master and each slave must be configured with a unique id. In addition, the slave must be configured with information about the master host name, RTQ database name and position within that file.


Starting MSR

You should start the Master Slave Replication processes in the following order:

Starting Master

On the system specified as the Master you must start the Recital Database Server first with the command;

# recitalserver start

Starting Slaves

On the systems specified as Slaves you must start the Subscriber Service with the command:

# recitalreplication start

Checking status

Once you have started the subscriber services on the Slaves you can check their current status with the command:

# recitalreplication status

An example out put for the Slave Subscriber Service would be:

Subscriber service 'port058' is running on a 60 second delay.
Connected to publisher 'jazz' in a Master Slave configuration.
Current transaction queue size is 0
Last transaction processed was 0 at
Remaining transactions in queue None

Processing Transactions

The subscriber service will sleep the specified number of seconds between each set of transaction. Then it will connect to the master publisher and retrieve all published data since it last connected. For each transaction that it processes it will attempt to open the table in the required mode. For ZAP and PACK transactions it must open the table exclusively, for all others it will open it shared. If it can't find the table specified via the path or can't open it in the required mode an error will be returned.


When the subscriber has to perform changes to existing rows in tables it will use the sync number specified in the transaction to locate the row in the specified table. As all sync number as ascending in order it uses a algorithm to locate the row by truncating the search into sections.


Maintenance

Clearing the data in the queue that has been replicated.