Sunday 1 November 2015

CLUSTER DE-CONFIGURING AND RE-CONFIGURING




It's Easy with AJEET,

In the section below I have provided the complete steps to de-configure and re-configure the cluster on a 3-node RAC.
This was done after the host name and IP address of all 3 nodes were changed.
When the host names and IP addresses change, the existing cluster stops working, so it is very important to get the cluster back up so that your database runs as well as it did before.

So this can be done in two ways.

1. Delete and add node (which I am going to post in another section)
2. De-configuring and Re-configuring cluster.

I prefer doing it the 2nd way. You may ask why?

The reason is very simple: think of a RAC environment where you have more nodes (maybe 5 to 10), and how much time it would take to delete each node and add it back again.


Grid Infrastructure Cluster - Entire Cluster

De-configuring and re-configuring the entire cluster rebuilds the OCR and voting disks; user resources (database, instance, service, listener, etc.) will need to be added back to the cluster manually after the re-configuration finishes.
Why is a de-configure needed?
De-configure is needed when:
  • OCR is corrupted without any good backup
  • Or the GI stack will not come up on any node due to missing Oracle Clusterware related files in /etc or /var/opt/oracle (e.g. init.ohasd missing). If GI is able to come up on at least one node, refer to the next section "B. Grid Infrastructure Cluster - One or Partial Nodes". A quick per-node check is shown after this list.
  • $GRID_HOME should be intact, as de-configure will NOT fix $GRID_HOME corruption
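To decide between the two procedures, a quick per-node check of whether the GI stack comes up at all (a standard crsctl command, run on each node):

$GRID_HOME/bin/crsctl check crs

If the stack reports online on at least one node, the partial-node procedure is enough; otherwise continue with the full de-configure below.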


In the case below we assume that the IP addresses and host names of all the nodes have been changed.

PRECHECK BEFORE PERFORMING DE-CONFIGURE OF CLUSTER
1.       Before de-configuring a node, ensure it is not pinned:
$GI_HOME/bin/olsnodes -s -t
node1 Active Unpinned
node2 Active Unpinned
node3 Active Unpinned
2.       If a node is pinned, unpin it first, as the root user:
/oracle/grid/product/11.2.0/grid/bin/crsctl unpin css -n <node_name>



3.      Before de-configuring, collect the following as the grid user if possible, to generate a list of the user resources that must be added back to the cluster after the re-configuration finishes (a small collection sketch follows this list):
$GRID_HOME/bin/crsctl stat res -t
$GRID_HOME/bin/crsctl stat res -p
$GRID_HOME/bin/crsctl query css votedisk
$GRID_HOME/bin/ocrcheck
$GRID_HOME/bin/oifcfg getif
$GRID_HOME/bin/srvctl config nodeapps -a
$GRID_HOME/bin/srvctl config scan
$GRID_HOME/bin/srvctl config asm -a
$GRID_HOME/bin/srvctl config listener -l <listener-name> -a
$DB_HOME/bin/srvctl config database -d <dbname> -a
$DB_HOME/bin/srvctl config service -d <dbname> -s <service-name> -v
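If you want all of this in one file, below is a minimal collection sketch (the output file name and the use of /tmp are my own choice; the srvctl database/service/listener commands still need your actual <dbname>, <service-name> and <listener-name>, so run those separately):

OUT=/tmp/pre_deconfig_$(hostname)_$(date +%Y%m%d%H%M).txt
{
  $GRID_HOME/bin/crsctl stat res -t
  $GRID_HOME/bin/crsctl stat res -p
  $GRID_HOME/bin/crsctl query css votedisk
  $GRID_HOME/bin/ocrcheck
  $GRID_HOME/bin/oifcfg getif
  $GRID_HOME/bin/srvctl config nodeapps -a
  $GRID_HOME/bin/srvctl config scan
  $GRID_HOME/bin/srvctl config asm -a
} > "$OUT" 2>&1
echo "Pre-deconfig configuration saved to $OUT"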

TO DECONFIG THE CLUSTERWARE
  • If OCR and Voting Disks are NOT on ASM, or if OCR and Voting Disks are on ASM but there is NO user data in the OCR/Voting Disk ASM diskgroup:
On all remote nodes, as root execute:
/oracle/grid/product/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force -verbose

Once the above command finishes on all remote nodes, on the local node, as root execute:
/oracle/grid/product/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force -verbose -lastnode

If there is user data in the OCR/Voting Disk ASM diskgroup:

# $GRID_HOME/crs/install/rootcrs.pl -deconfig -force -verbose -keepdg -lastnode

We did not have any user data in the OCR/Voting Disk diskgroup, so we used:
$GRID_HOME/crs/install/rootcrs.pl -deconfig -force -verbose -lastnode
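Once rootcrs.pl -deconfig finishes on a node, a quick sanity check (my own suggestion, not part of the official procedure) is to confirm that no Clusterware daemons are left running on that node; the command below should return nothing:

ps -ef | egrep 'ohasd|crsd|ocssd|evmd' | grep -v grep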


Once the de-configuration has completed, follow the steps below before re-configuring the cluster.


Clean up the profile.xml files
The profile.xml files still contain the old IP addresses, so to get the new IP addresses into the GPnP profile we have to clean up the old profile.xml files under:
$GRID_HOME/gpnp/node1/profiles/peer/
If the profile.xml files are not cleaned up, you may hit the following issue while executing the root.sh script.
ERROR:
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
Start of resource "ora.cluster_interconnect.haip" failed
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted. For details refer to "(:CLSN00107:)" in "/oracle/grid/product/11.2.0/grid/log/node1/agent/ohasd/orarootagent_root/orarootagent_root.log".
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'node1' failed
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'node1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'node1' succeeded
CRS-4000: Command Start failed, or completed with errors.
HAIP startup failure considered fatal, terminating at /oracle/grid/product/11.2.0/grid/crs/install/crsconfig_lib.pm line 1330.
/oracle/grid/product/11.2.0/grid/perl/bin/perl -I/oracle/grid/product/11.2.0/grid/perl/lib -I/oracle/grid/product/11.2.0/grid/crs/install /oracle/grid/product/11.2.0/grid/crs/install/rootcrs.pl execution failed
****************************************************************************
TO CLEAN PROFILE.XML AND CHECKPOINT FILE USE THE FOLLOWING COMMANDS
    a. The de-configure steps above can be skipped on node(s) where root.sh has not been executed this time.

    b. The de-configure steps should also remove the root.sh checkpoint file. To verify:

          ls -l /oracle/grid/oracle_base/Clusterware/ckptGridHA_<node_name>.xml

    If it is still there, remove it manually with the "rm" command on all nodes.

    c. If the GPnP profile is different between nodes/setups, clean it up on all nodes as the grid user:

          $ find /oracle/grid/product/11.2.0/grid/gpnp/* -type f -exec rm -rf {} \;
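After the cleanup, a quick verification on every node (my own check, using the same paths as above); both commands should return no files:

ls -l /oracle/grid/oracle_base/Clusterware/ckptGridHA_*.xml 2>/dev/null
find /oracle/grid/product/11.2.0/grid/gpnp -type f 2>/dev/null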


Clean up the OCR_VOTE disks, i.e. the disks that contain the OCR and voting files.

If the OCR_VOTE disks are not cleaned, you may hit the error below while executing root.sh.

ERROR:
bash: /root/.bashrc: Permission denied

Disk Group OCR_VOTE mounted successfully.

Existing OCR configuration found, aborting the configuration. Rerun configuration setup after deinstall at /oracle/grid/product/11.2.0/grid/crs/install/crsconfig_lib.pm line 10302.
/oracle/grid/product/11.2.0/grid/perl/bin/perl -I/oracle/grid/product/11.2.0/grid/perl/lib -I/oracle/grid/product/11.2.0/grid/crs/install /oracle/grid/product/11.2.0/grid/crs/install/rootcrs.pl execution failed

To clear the ASM OCR_VOTE disks, follow the steps below.

NOTE:
Before performing the steps below, make sure you are using the proper disk/device names.
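One way to double-check which block device backs each ASMLib label before wiping it (assuming your ASMLib version supports the querydisk -p option):

/etc/init.d/oracleasm querydisk -p OCR_VOTE1
/etc/init.d/oracleasm querydisk -p OCR_VOTE2
/etc/init.d/oracleasm querydisk -p OCR_VOTE3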
Clear the ASM disk headers on the devices with dd:
dd if=/dev/zero of=/dev/xvdd1 bs=1048576 count=10
dd if=/dev/zero of=/dev/xvde1 bs=1048576 count=10
dd if=/dev/zero of=/dev/xvdf1 bs=1048576 count=10
Delete the OCR_VOTE DISKS
oracleasm deletedisk OCR_VOTE3
oracleasm deletedisk OCR_VOTE2
oracleasm deletedisk OCR_VOTE1

Create the OCR_VOTE DISKS
/etc/init.d/oracleasm createdisk OCR_VOTE1 /dev/xvdd1
/etc/init.d/oracleasm createdisk OCR_VOTE2 /dev/xvde1
/etc/init.d/oracleasm createdisk OCR_VOTE3 /dev/xvdf1
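After the disks have been recreated on one node, rescan and verify them on ALL nodes as root (standard oracleasm sub-commands; the labels should match the ones created above):

/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks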


TO CONFIGURE CLUSTER

export DISPLAY=<hostname>:0.0
$GRID_HOME/crs/config/config.sh

Follow the instructions in the GUI.
Run root.sh on NEW_NODE1 first; after it completes successfully there, execute it on NEW_NODE2 and NEW_NODE3.
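After root.sh completes on each node, you can confirm the stack is up before moving on to the next node (standard crsctl command, run from any node where the stack is already running):

$GRID_HOME/bin/crsctl check cluster -all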


ISSUES FACED AT THE GUI
*****************
If you face errors in the GUI as below:





Just stop at this step, execute the command below, and click RETRY:
$GRID_HOME/oui/bin/runInstaller -nowait -noconsole -waitforcompletion -ignoreSysPrereqs -updateNodeList -silent CRS=true "CLUSTER_NODES={node1,node2,node3}" ORACLE_HOME=GRID_HOME_path
If the error still exists even after executing the above command, click IGNORE and then NEXT.
Once the configuration has completed, execute the commands below to update the oraInventory.
1. Remove the old, incorrect CRS home entry from inventory.xml:
$GRID_HOME/oui/bin/runInstaller -detachHome -local ORACLE_HOME=GRID_HOME_path
2. Rerun the failed "attachHome" command:
$GRID_HOME/oui/bin/runInstaller -attachHome -noClusterEnabled ORACLE_HOME=GRID_HOME_path ORACLE_HOME_NAME=Ora11g_gridinfrahome1 "CLUSTER_NODES={node1,node2,node3}" "INVENTORY_LOCATION=/oracle/app/oraInventory" LOCAL_NODE=node1

$GRID_HOME/oui/bin/runInstaller -attachHome -noClusterEnabled ORACLE_HOME=GRID_HOME_path ORACLE_HOME_NAME=Ora11g_gridinfrahome1 "CLUSTER_NODES={node1,node2,node3}" "INVENTORY_LOCATION=/oracle/app/oraInventory" LOCAL_NODE=node2

$GRID_HOME/oui/bin/runInstaller -attachHome -noClusterEnabled ORACLE_HOME=GRID_HOME_path ORACLE_HOME_NAME=Ora11g_gridinfrahome1 "CLUSTER_NODES={node1,node2,node3}" "INVENTORY_LOCATION=/oracle/app/oraInventory" LOCAL_NODE=node3

3. Mark the new home as the CRS home (CRS=true):
$GRID_HOME/oui/bin/runInstaller -local -updateNodeList ORACLE_HOME=GRID_HOME_path "CLUSTER_NODES={node1,node2,node3}" CRS="true"
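To confirm the central inventory now carries the Grid home as the CRS home, a simple check (my own, using the inventory location given above):

grep -i 'CRS="true"' /oracle/app/oraInventory/ContentsXML/inventory.xml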




If you face an issue in the GUI like the one below:


The cluvfy (cluster verification) failure can be ignored, as it is only a verification report on the state of the cluster.


Mount the diskgroups

. oraenv
+ASM1
sqlplus / as sysasm
SQL> desc v$asm_diskgroup
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 GROUP_NUMBER                                       NUMBER
 NAME                                               VARCHAR2(30)
 SECTOR_SIZE                                        NUMBER
 BLOCK_SIZE                                         NUMBER
 ALLOCATION_UNIT_SIZE                               NUMBER
 STATE                                              VARCHAR2(11)
 TYPE                                               VARCHAR2(6)
 TOTAL_MB                                           NUMBER
 FREE_MB                                            NUMBER
 HOT_USED_MB                                        NUMBER
 COLD_USED_MB                                       NUMBER
 REQUIRED_MIRROR_FREE_MB                            NUMBER
 USABLE_FILE_MB                                     NUMBER
 OFFLINE_DISKS                                      NUMBER
 COMPATIBILITY                                      VARCHAR2(60)
 DATABASE_COMPATIBILITY                             VARCHAR2(60)
 VOTING_FILES                                       VARCHAR2(1)

SQL> select NAME,STATE from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
OCR_VOTE                       MOUNTED
DG_ARCHIVE1                    DISMOUNTED
DG_DATA1                       DISMOUNTED
DG_REDO1                       DISMOUNTED

SQL> alter diskgroup DG_ARCHIVE1 mount;

Diskgroup altered.

SQL> select NAME,STATE from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
OCR_VOTE                       MOUNTED
DG_ARCHIVE1                    MOUNTED
DG_DATA1                       DISMOUNTED
DG_REDO1                       DISMOUNTED

SQL> alter diskgroup DG_DATA1 mount;

Diskgroup altered.

SQL> alter diskgroup DG_REDO1 mount;

Diskgroup altered.

SQL> select NAME,STATE from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
OCR_VOTE                       MOUNTED
DG_ARCHIVE1                    MOUNTED
DG_DATA1                       MOUNTED
DG_REDO1                       MOUNTED
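Alternatively, the remaining diskgroups can be mounted in one statement (valid ASM syntax; diskgroups that are already mounted are simply reported as errors and can be ignored):

SQL> alter diskgroup all mount;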


Try to start up the database:

root@NEW_node1:$ORACLE_HOME/bin# ./srvctl start database -d accup11
PRCD-1120 : The resource for database accup11 could not be found.
PRCR-1001 : Resource ora.accup11.db does not exist

If you face an issue like the one above,

check whether the database resources have been added to the cluster:

crsctl stat res -t

If no database resources are registered, try starting the database by connecting through SQL*Plus.

Start the instance on each node (set the right SID with oraenv on each node):

. oraenv
NEW_NODE1   ACCUP111
NEW_NODE2   ACCUP112
NEW_NODE3   ACCUP113

startup
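For example, on NEW_NODE1 (instance name as listed above; repeat on the other nodes with their own SID):

. oraenv            # enter ACCUP111 when prompted
sqlplus / as sysdba
SQL> startup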

If it starts up without any issue,
then just add the database resource to the cluster with the following command:
srvctl add database -d accup11 -o $ORACLE_HOME -p +DG_DATA1/accup11/spfileaccup11.ora -a DG_ARCHIVE1,DG_DATA1,DG_REDO1
                                                          ($ORACLE_HOME is the path of your Oracle home)
crsctl stat res -t
      1        ONLINE  ONLINE       node1
ora.dbname.db
      1        OFFLINE OFFLINE
If the instances are not added in the cluster, add them using the commands below.

1. srvctl add instance -d dbname -i instance1 -n node1
2. srvctl add instance -d dbname -i instance2 -n node2
3. srvctl add instance -d dbname -i instance3 -n node3

srvctl config database -d dbname

Check again:

crsctl stat res -t
      1        ONLINE  ONLINE       node1
ora.dbname.db
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
      3        OFFLINE OFFLINE

Now shut down the database from SQL*Plus on all the nodes,
and start it using the following command:

./srvctl start database -d dbname


oracle@new_node1:/oracle/grid/product/11.2.0/grid/bin$ ps -aef |grep smon
root     18023     1  2 04:31 ?        00:01:34 /oracle/grid/product/11.2.0/grid/bin/osysmond.bin
grid     18162     1  0 04:31 ?        00:00:00 asm_smon_+ASM1
oracle   26572     1  0 05:29 ?        00:00:00 ora_smon_dbname
oracle   26835 25608  0 05:29 pts/0    00:00:00 grep smon

Check again

oracle@new_node1:$GRID_HOME/bin$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DG_ARCHIVE1.dg
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.DG_DATA1.dg
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.DG_REDO1.dg
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.LISTENER.lsnr
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.OCR_VOTE.dg
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.asm
               ONLINE  ONLINE       new_node1                Started
               ONLINE  ONLINE       new_node2                Started
               ONLINE  ONLINE       new_node3                Started
ora.gsd
               OFFLINE OFFLINE      new_node1
               OFFLINE OFFLINE      new_node2
               OFFLINE OFFLINE      new_node3
ora.net1.network
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.ons
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.registry.acfs
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       new_node2
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       new_node3
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       new_node1
ora.accup11.db
      1        ONLINE  ONLINE       new_node1                Open
      2        ONLINE  ONLINE       new_node2                Open
      3        ONLINE  ONLINE       new_node3                Open
ora.cvu
      1        ONLINE  ONLINE       new_node1
ora.new_node1.vip
      1        ONLINE  ONLINE       new_node1
ora.new_node2.vip
      1        ONLINE  ONLINE       new_node2
ora.new_node3.vip
      1        ONLINE  ONLINE       new_node3
ora.oc4j
      1        ONLINE  ONLINE       new_node1
ora.scan1.vip
      1        ONLINE  ONLINE       new_node2
ora.scan2.vip
      1        ONLINE  ONLINE       new_node3
ora.scan3.vip
      1        ONLINE  ONLINE       new_node1

oracle@new_node1:$GRID_HOME/bin$
To stop the database and the cluster resources on RAC:
$ORACLE_HOME/bin/srvctl stop database -d DBNAME
$GRID_HOME/bin/crsctl stop cluster -all
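And to bring everything back up afterwards, the same tools in the reverse order (standard commands):

$GRID_HOME/bin/crsctl start cluster -all
$ORACLE_HOME/bin/srvctl start database -d DBNAME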



That's It 


There you go, you are good to go.................................................










Please subscribe for latest updates.
