Sunday 1 November 2015

CLUSTER DE-CONFIGURING AND RE-CONFIGURING




It's Easy with AJEET,

In the section below I have provided the complete steps to de-configure and re-configure the cluster on a 3-node RAC.
This was done after the host name and IP address of all 3 nodes were changed.
When the host names and IP addresses change, the existing cluster stops working, so it is very important to get the cluster back up so that your database runs as well as it did before.

So this can be done in two ways.

1. Delete and add node (which I am going to post in another section)
2. De-configuring and Re-configuring cluster.

I prefer doing it the 2nd way. You may ask why?

The reason is very simple: think of a RAC environment where you have more nodes (maybe 5 to 10), and how much time it would take to delete each node and add it back again.


Grid Infrastructure Cluster - Entire Cluster

De-configuring and re-configuring the entire cluster rebuilds the OCR and voting disks; user resources (database, instance, service, listener, etc.) will need to be added back to the cluster manually after the re-configuration finishes.
Why is a de-configure needed?
De-configure is needed when:
  • OCR is corrupted without any good backup
  • Or the GI stack will not come up on any node due to missing Oracle Clusterware related files in /etc or /var/opt/oracle (e.g. init.ohasd missing). If GI is able to come up on at least one node, refer to the next section "B. Grid Infrastructure Cluster - One or Partial Nodes". A quick per-node check is shown after this list.
  • $GRID_HOME should be intact, as de-configure will NOT fix $GRID_HOME corruption
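To decide between the two procedures, a quick per-node check of whether the GI stack comes up at all (a standard crsctl command, run on each node):

$GRID_HOME/bin/crsctl check crs

If the stack reports online on at least one node, the partial-node procedure is enough; otherwise continue with the full de-configure below.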


In the case below we assume that the IP addresses and host names of all the nodes have been changed.

PRECHECK BEFORE PERFORMING DE-CONFIGURE OF CLUSTER
1.       Before de-configuring a node, ensure it is not pinned:
$GI_HOME/bin/olsnodes -s -t
node1 Active Unpinned
node2 Active Unpinned
node3 Active Unpinned
2.       If a node is pinned, unpin it first, as the root user:
/oracle/grid/product/11.2.0/grid/bin/crsctl unpin css -n <node_name>



3.      Before de-configuring, collect the following as the grid user if possible, to generate a list of the user resources that must be added back to the cluster after the re-configuration finishes (a small collection sketch follows this list):
$GRID_HOME/bin/crsctl stat res -t
$GRID_HOME/bin/crsctl stat res -p
$GRID_HOME/bin/crsctl query css votedisk
$GRID_HOME/bin/ocrcheck
$GRID_HOME/bin/oifcfg getif
$GRID_HOME/bin/srvctl config nodeapps -a
$GRID_HOME/bin/srvctl config scan
$GRID_HOME/bin/srvctl config asm -a
$GRID_HOME/bin/srvctl config listener -l <listener-name> -a
$DB_HOME/bin/srvctl config database -d <dbname> -a
$DB_HOME/bin/srvctl config service -d <dbname> -s <service-name> -v
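If you want all of this in one file, below is a minimal collection sketch (the output file name and the use of /tmp are my own choice; the srvctl database/service/listener commands still need your actual <dbname>, <service-name> and <listener-name>, so run those separately):

OUT=/tmp/pre_deconfig_$(hostname)_$(date +%Y%m%d%H%M).txt
{
  $GRID_HOME/bin/crsctl stat res -t
  $GRID_HOME/bin/crsctl stat res -p
  $GRID_HOME/bin/crsctl query css votedisk
  $GRID_HOME/bin/ocrcheck
  $GRID_HOME/bin/oifcfg getif
  $GRID_HOME/bin/srvctl config nodeapps -a
  $GRID_HOME/bin/srvctl config scan
  $GRID_HOME/bin/srvctl config asm -a
} > "$OUT" 2>&1
echo "Pre-deconfig configuration saved to $OUT"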

TO DECONFIG THE CLUSTERWARE
  • If OCR and Voting Disks are NOT on ASM, or if OCR and Voting Disks are on ASM but there is NO user data in the OCR/Voting Disk ASM diskgroup:
On all remote nodes, as root execute:
/oracle/grid/product/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force -verbose

Once the above command finishes on all remote nodes, on the local node, as root execute:
/oracle/grid/product/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force -verbose -lastnode

If there is user data in the OCR/Voting Disk ASM diskgroup:

# $GRID_HOME/crs/install/rootcrs.pl -deconfig -force -verbose -keepdg -lastnode

We did not have any user data in the OCR/Voting Disk diskgroup, so we used:
$GRID_HOME/crs/install/rootcrs.pl -deconfig -force -verbose -lastnode
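Once rootcrs.pl -deconfig finishes on a node, a quick sanity check (my own suggestion, not part of the official procedure) is to confirm that no Clusterware daemons are left running on that node; the command below should return nothing:

ps -ef | egrep 'ohasd|crsd|ocssd|evmd' | grep -v grep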


Once the de-configuration has completed, follow the steps below before re-configuring the cluster.


Clean up the profile.xml files
The profile.xml files still contain the old IP addresses, so to get the new IP addresses into the GPnP profile we have to clean up the old profile.xml files under:
$GRID_HOME/gpnp/node1/profiles/peer/
If the profile.xml files are not cleaned up, you may hit the following issue while executing the root.sh script.
ERROR:
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
Start of resource "ora.cluster_interconnect.haip" failed
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted. For details refer to "(:CLSN00107:)" in "/oracle/grid/product/11.2.0/grid/log/node1/agent/ohasd/orarootagent_root/orarootagent_root.log".
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'node1' failed
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'node1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'node1' succeeded
CRS-4000: Command Start failed, or completed with errors.
HAIP startup failure considered fatal, terminating at /oracle/grid/product/11.2.0/grid/crs/install/crsconfig_lib.pm line 1330.
/oracle/grid/product/11.2.0/grid/perl/bin/perl -I/oracle/grid/product/11.2.0/grid/perl/lib -I/oracle/grid/product/11.2.0/grid/crs/install /oracle/grid/product/11.2.0/grid/crs/install/rootcrs.pl execution failed
****************************************************************************
TO CLEAN PROFILE.XML AND CHECKPOINT FILE USE THE FOLLOWING COMMANDS
    a. The de-configure steps above can be skipped on node(s) where root.sh has not been executed this time.

    b. The de-configure steps should also remove the root.sh checkpoint file. To verify:

          ls -l /oracle/grid/oracle_base/Clusterware/ckptGridHA_<node_name>.xml

    If it is still there, remove it manually with the "rm" command on all nodes.

    c. If the GPnP profile is different between nodes/setups, clean it up on all nodes as the grid user:

          $ find /oracle/grid/product/11.2.0/grid/gpnp/* -type f -exec rm -rf {} \;
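After the cleanup, a quick verification on every node (my own check, using the same paths as above); both commands should return no files:

ls -l /oracle/grid/oracle_base/Clusterware/ckptGridHA_*.xml 2>/dev/null
find /oracle/grid/product/11.2.0/grid/gpnp -type f 2>/dev/null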


Clean up the OCR_VOTE disks, i.e. the disks that contain the OCR and voting files.

If the OCR_VOTE disks are not cleaned, you may hit the error below while executing root.sh.

ERROR:
bash: /root/.bashrc: Permission denied

Disk Group OCR_VOTE mounted successfully.

Existing OCR configuration found, aborting the configuration. Rerun configuration setup after deinstall at /oracle/grid/product/11.2.0/grid/crs/install/crsconfig_lib.pm line 10302.
/oracle/grid/product/11.2.0/grid/perl/bin/perl -I/oracle/grid/product/11.2.0/grid/perl/lib -I/oracle/grid/product/11.2.0/grid/crs/install /oracle/grid/product/11.2.0/grid/crs/install/rootcrs.pl execution failed

To clear the ASM OCR_VOTE disks, follow the steps below.

NOTE:
Before performing the steps below, make sure you are using the proper disk/device names.
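One way to double-check which block device backs each ASMLib label before wiping it (assuming your ASMLib version supports the querydisk -p option):

/etc/init.d/oracleasm querydisk -p OCR_VOTE1
/etc/init.d/oracleasm querydisk -p OCR_VOTE2
/etc/init.d/oracleasm querydisk -p OCR_VOTE3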
Clear the ASM disk headers on the devices with dd:
dd if=/dev/zero of=/dev/xvdd1 bs=1048576 count=10
dd if=/dev/zero of=/dev/xvde1 bs=1048576 count=10
dd if=/dev/zero of=/dev/xvdf1 bs=1048576 count=10
Delete the OCR_VOTE DISKS
oracleasm deletedisk OCR_VOTE3
oracleasm deletedisk OCR_VOTE2
oracleasm deletedisk OCR_VOTE1

Create the OCR_VOTE DISKS
/etc/init.d/oracleasm createdisk OCR_VOTE1 /dev/xvdd1
/etc/init.d/oracleasm createdisk OCR_VOTE2 /dev/xvde1
/etc/init.d/oracleasm createdisk OCR_VOTE3 /dev/xvdf1
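After the disks have been recreated on one node, rescan and verify them on ALL nodes as root (standard oracleasm sub-commands; the labels should match the ones created above):

/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks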


TO CONFIGURE CLUSTER

export DISPLAY=<hostname>:0.0
$GRID_HOME/crs/config/config.sh

Follow the instructions in the GUI.
Run root.sh on NEW_NODE1 first; after it completes successfully there, execute it on NEW_NODE2 and NEW_NODE3.
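After root.sh completes on each node, you can confirm the stack is up before moving on to the next node (standard crsctl command, run from any node where the stack is already running):

$GRID_HOME/bin/crsctl check cluster -all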


ISSUES FACED AT THE GUI
*****************
If you face errors in the GUI as below:





Just stop at this step, execute the command below, and click RETRY:
$GRID_HOME/oui/bin/runInstaller -nowait -noconsole -waitforcompletion -ignoreSysPrereqs -updateNodeList -silent CRS=true "CLUSTER_NODES={node1,node2,node3}" ORACLE_HOME=GRID_HOME_path
If the error still exists even after executing the above command, click IGNORE and then NEXT.
Once the configuration has completed, execute the commands below to update the oraInventory.
1. Remove the old, incorrect CRS home entry from inventory.xml:
$GRID_HOME/oui/bin/runInstaller -detachHome -local ORACLE_HOME=GRID_HOME_path
2. Rerun the failed "attachHome" command:
$GRID_HOME/oui/bin/runInstaller -attachHome -noClusterEnabled ORACLE_HOME=GRID_HOME_path ORACLE_HOME_NAME=Ora11g_gridinfrahome1 "CLUSTER_NODES={node1,node2,node3}" "INVENTORY_LOCATION=/oracle/app/oraInventory" LOCAL_NODE=node1

$GRID_HOME/oui/bin/runInstaller -attachHome -noClusterEnabled ORACLE_HOME=GRID_HOME_path ORACLE_HOME_NAME=Ora11g_gridinfrahome1 "CLUSTER_NODES={node1,node2,node3}" "INVENTORY_LOCATION=/oracle/app/oraInventory" LOCAL_NODE=node2

$GRID_HOME/oui/bin/runInstaller -attachHome -noClusterEnabled ORACLE_HOME=GRID_HOME_path ORACLE_HOME_NAME=Ora11g_gridinfrahome1 "CLUSTER_NODES={node1,node2,node3}" "INVENTORY_LOCATION=/oracle/app/oraInventory" LOCAL_NODE=node3

3. Mark the new home as the CRS home (CRS=true):
$GRID_HOME/oui/bin/runInstaller -local -updateNodeList ORACLE_HOME=GRID_HOME_path "CLUSTER_NODES={node1,node2,node3}" CRS="true"
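To confirm the central inventory now carries the Grid home as the CRS home, a simple check (my own, using the inventory location given above):

grep -i 'CRS="true"' /oracle/app/oraInventory/ContentsXML/inventory.xml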




If you face an issue in the GUI like the one below:


The cluvfy (cluster verification) failure can be ignored, as it is only a verification report on the state of the cluster.


Mount the diskgroups

. oraenv
+ASM1
sqlplus / as sysasm
SQL> desc v$asm_diskgroup
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 GROUP_NUMBER                                       NUMBER
 NAME                                               VARCHAR2(30)
 SECTOR_SIZE                                        NUMBER
 BLOCK_SIZE                                         NUMBER
 ALLOCATION_UNIT_SIZE                               NUMBER
 STATE                                              VARCHAR2(11)
 TYPE                                               VARCHAR2(6)
 TOTAL_MB                                           NUMBER
 FREE_MB                                            NUMBER
 HOT_USED_MB                                        NUMBER
 COLD_USED_MB                                       NUMBER
 REQUIRED_MIRROR_FREE_MB                            NUMBER
 USABLE_FILE_MB                                     NUMBER
 OFFLINE_DISKS                                      NUMBER
 COMPATIBILITY                                      VARCHAR2(60)
 DATABASE_COMPATIBILITY                             VARCHAR2(60)
 VOTING_FILES                                       VARCHAR2(1)

SQL> select NAME,STATE from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
OCR_VOTE                       MOUNTED
DG_ARCHIVE1                    DISMOUNTED
DG_DATA1                       DISMOUNTED
DG_REDO1                       DISMOUNTED

SQL> alter diskgroup DG_ARCHIVE1 mount;

Diskgroup altered.

SQL> select NAME,STATE from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
OCR_VOTE                       MOUNTED
DG_ARCHIVE1                    MOUNTED
DG_DATA1                       DISMOUNTED
DG_REDO1                       DISMOUNTED

SQL> alter diskgroup DG_DATA1 mount;

Diskgroup altered.

SQL> alter diskgroup DG_REDO1 mount;

Diskgroup altered.

SQL> select NAME,STATE from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
OCR_VOTE                       MOUNTED
DG_ARCHIVE1                    MOUNTED
DG_DATA1                       MOUNTED
DG_REDO1                       MOUNTED
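Alternatively, the remaining diskgroups can be mounted in one statement (valid ASM syntax; diskgroups that are already mounted are simply reported as errors and can be ignored):

SQL> alter diskgroup all mount;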


Try to start up the database:

root@NEW_node1:$ORACLE_HOME/bin# ./srvctl start database -d accup11
PRCD-1120 : The resource for database accup11 could not be found.
PRCR-1001 : Resource ora.accup11.db does not exist

If you face an issue like the one above,

check whether the database resources have been added to the cluster:

crsctl stat res -t

If no database resources are registered, try starting the database by connecting through SQL*Plus.

Start the instance on each node (set the right SID with oraenv on each node):

. oraenv
NEW_NODE1   ACCUP111
NEW_NODE2   ACCUP112
NEW_NODE3   ACCUP113

startup
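For example, on NEW_NODE1 (instance name as listed above; repeat on the other nodes with their own SID):

. oraenv            # enter ACCUP111 when prompted
sqlplus / as sysdba
SQL> startup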

If it starts up without any issue,
then just add the database resource to the cluster with the following command:
srvctl add database -d accup11 -o $ORACLE_HOME -p +DG_DATA1/accup11/spfileaccup11.ora -a DG_ARCHIVE1,DG_DATA1,DG_REDO1
                                                          ($ORACLE_HOME is the path of your Oracle home)
crsctl stat res -t
      1        ONLINE  ONLINE       node1
ora.dbname.db
      1        OFFLINE OFFLINE
If the instances are not added in the cluster, add them using the commands below.

1. srvctl add instance -d dbname -i instance1 -n node1
2. srvctl add instance -d dbname -i instance2 -n node2
3. srvctl add instance -d dbname -i instance3 -n node3

srvctl config database -d dbname

Check again:

crsctl stat res -t
      1        ONLINE  ONLINE       node1
ora.dbname.db
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
      3        OFFLINE OFFLINE

Now shut down the database from SQL*Plus on all the nodes,
and start it using the following command:

./srvctl start database -d dbname


oracle@new_node1:/oracle/grid/product/11.2.0/grid/bin$ ps -aef |grep smon
root     18023     1  2 04:31 ?        00:01:34 /oracle/grid/product/11.2.0/grid/bin/osysmond.bin
grid     18162     1  0 04:31 ?        00:00:00 asm_smon_+ASM1
oracle   26572     1  0 05:29 ?        00:00:00 ora_smon_dbname
oracle   26835 25608  0 05:29 pts/0    00:00:00 grep smon

Check again

oracle@new_node1:$GRID_HOME/bin$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DG_ARCHIVE1.dg
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.DG_DATA1.dg
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.DG_REDO1.dg
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.LISTENER.lsnr
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.OCR_VOTE.dg
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.asm
               ONLINE  ONLINE       new_node1                Started
               ONLINE  ONLINE       new_node2                Started
               ONLINE  ONLINE       new_node3                Started
ora.gsd
               OFFLINE OFFLINE      new_node1
               OFFLINE OFFLINE      new_node2
               OFFLINE OFFLINE      new_node3
ora.net1.network
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.ons
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
ora.registry.acfs
               ONLINE  ONLINE       new_node1
               ONLINE  ONLINE       new_node2
               ONLINE  ONLINE       new_node3
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       new_node2
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       new_node3
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       new_node1
ora.accup11.db
      1        ONLINE  ONLINE       new_node1                Open
      2        ONLINE  ONLINE       new_node2                Open
      3        ONLINE  ONLINE       new_node3                Open
ora.cvu
      1        ONLINE  ONLINE       new_node1
ora.new_node1.vip
      1        ONLINE  ONLINE       new_node1
ora.new_node2.vip
      1        ONLINE  ONLINE       new_node2
ora.new_node3.vip
      1        ONLINE  ONLINE       new_node3
ora.oc4j
      1        ONLINE  ONLINE       new_node1
ora.scan1.vip
      1        ONLINE  ONLINE       new_node2
ora.scan2.vip
      1        ONLINE  ONLINE       new_node3
ora.scan3.vip
      1        ONLINE  ONLINE       new_node1

oracle@new_node1:$GRID_HOME/bin$
To stop the database and the cluster resources on RAC:
$ORACLE_HOME/bin/srvctl stop database -d DBNAME
$GRID_HOME/bin/crsctl stop cluster -all
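And to bring everything back up afterwards, the same tools in the reverse order (standard commands):

$GRID_HOME/bin/crsctl start cluster -all
$ORACLE_HOME/bin/srvctl start database -d DBNAME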



That's It 


There you go, you are good to go.................................................










Please subscribe for latest updates.
