Oracle RAC: Removing a Node


Applies to:

Oracle Database Backup Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database Cloud Service - Version N/A and later
Oracle Database - Enterprise Edition - Version 10.2.0.1 to 11.1.0.6 [Release 10.2 to 11.1]
Oracle Database - Standard Edition - Version 10.2.0.1 to 11.1.0.6 [Release 10.2 to 11.1]
Information in this document applies to any platform.
Oracle Server Enterprise Edition - Version: 10.2.0.1 to 11.1.0.6
Oracle Clusterware



 

Goal

This document provides the steps to remove a node from an Oracle cluster when the node itself is unavailable due to an OS or hardware issue that prevents it from starting up. Removing such a node allows it to be added back to the cluster after it has been fixed.

The steps to remove a node from a cluster are already documented in the Oracle documentation:

Version   Documentation Link
10gR2     https://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/adddelunix.htm#BEIFDCAF
11gR1     https://download.oracle.com/docs/cd/B28359_01/rac.111/b28255/adddelclusterware.htm#BEIFDCAF

This note is different because the documentation covers the scenario where the node is accessible and the removal is a planned procedure. This note covers the scenario where the node is unable to boot up, so it is not possible to run the clusterware commands from that node.

For 11gR2, refer to Note 1262925.1 (the 11.2 procedure is also covered in the second Solution section below).

Solution

Summary

All the steps documented in the Oracle Clusterware Administration and Deployment Guide must be followed. The difference here is that we skip the steps that would be executed on the unavailable node, and we run some extra commands on a node that is going to remain in the cluster in order to remove the resources belonging to the node being removed.

Example Configuration

 All steps outlined in this document were executed on a cluster with the following configuration:

Item                          Value
Node Names                    <HOSTNAME1>, <HOSTNAME2>, <HOSTNAME3>
Operating System              Oracle Enterprise Linux 5 Update 4
Oracle Clusterware Release    10.2.0.5.0
ASM and Database Release      10.2.0.5.0
Clusterware Home              $CRS_HOME
ASM Home                      <ASM_HOME>
Database Home                 <DB_HOME>
Cluster Name                  <CLUSTER_NAME>

Assume that node <HOSTNAME3> is down due to a hardware failure and cannot even boot up. The plan is to remove it from the Clusterware, fix the issue and then add it back to the Clusterware. In this document, we cover the steps to remove the node from the Clusterware.

Please note that, for better readability, the sample script 'crsstat' from Doc ID 259301.1 (CRS and 10g/11.1 Real Application Clusters) was used instead of 'crs_stat -t' to query the state of the CRS resources. This script is not part of a standard CRS installation.
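
For reference, the lines below are a minimal, illustrative approximation of such a wrapper (it is not the script from Doc ID 259301.1). It simply reformats the output of 'crs_stat' into Name/Target/State columns and assumes $CRS_HOME is set in the environment:

#!/bin/sh
# Illustrative crsstat-style wrapper (not the script from Doc ID 259301.1).
# Parses the full 'crs_stat' output and prints one line per resource
# with its Target and State (the STATE field already includes the hosting node).
$CRS_HOME/bin/crs_stat | awk -F= '
  /^NAME=/   { name = $2 }
  /^TARGET=/ { target = $2 }
  /^STATE=/  { state = $2; printf "%-55s %-10s %s\n", name, target, state }
'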

Initial Stage

At this stage, the Oracle Clusterware is up and running on nodes <HOSTNAME1> and <HOSTNAME2> (the good nodes). Node <HOSTNAME3> is down and cannot be accessed. Note that the Virtual IP of <HOSTNAME3> is running on node <HOSTNAME1>; the rest of the <HOSTNAME3> resources are OFFLINE:

[oracle@<HOSTNAME1> ~]$ crsstat
Name                                     Target     State      Host      
-------------------------------------------------------------------------------
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME><DB_NAME1>.inst                  ONLINE     ONLINE     <HOSTNAME1>     
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME><DB_NAME2>.inst                  ONLINE     ONLINE     <HOSTNAME2>     
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME><DB_NAME3>.inst                  ONLINE     OFFLINE              
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME>DB_NAME1_SRV1.<CLUSTER_NAME><DB_NAME1>.srv       ONLINE     ONLINE     <HOSTNAME1>     
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME>DB_NAME1_SRV1.<CLUSTER_NAME><DB_NAME2>.srv       ONLINE     ONLINE     <HOSTNAME2>     
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME>DB_NAME1_SRV1.<CLUSTER_NAME><DB_NAME3>.srv       ONLINE     OFFLINE              
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME>DB_NAME1_SRV1.cs                ONLINE     ONLINE     <HOSTNAME1>     
ora.<CLUSTER_NAME>DB_NAME1.db                            ONLINE     ONLINE     <HOSTNAME2>     
ora.<HOSTNAME1>.ASM1.asm                       ONLINE     ONLINE     <HOSTNAME1>     
ora.<HOSTNAME1>.LISTENER_<CLUSTER_NAME>N1.lsnr            ONLINE     ONLINE     <HOSTNAME1>     
ora.<HOSTNAME1>.gsd                            ONLINE     ONLINE     <HOSTNAME1>     
ora.<HOSTNAME1>.ons                            ONLINE     ONLINE     <HOSTNAME1>     
ora.<HOSTNAME1>.vip                            ONLINE     ONLINE     <HOSTNAME1>     
ora.<HOSTNAME2>.ASM2.asm                       ONLINE     ONLINE     <HOSTNAME2>     
ora.<HOSTNAME2>.LISTENER_<CLUSTER_NAME>N2.lsnr            ONLINE     ONLINE     <HOSTNAME2>     
ora.<HOSTNAME2>.gsd                            ONLINE     ONLINE     <HOSTNAME2>     
ora.<HOSTNAME2>.ons                            ONLINE     ONLINE     <HOSTNAME2>     
ora.<HOSTNAME2>.vip                            ONLINE     ONLINE     <HOSTNAME2>     
ora.<HOSTNAME3>.ASM3.asm                       ONLINE     OFFLINE              
ora.<HOSTNAME3>.LISTENER_<CLUSTER_NAME>N3.lsnr            ONLINE     OFFLINE              
ora.<HOSTNAME3>.gsd                            ONLINE     OFFLINE              
ora.<HOSTNAME3>.ons                            ONLINE     OFFLINE              
ora.<HOSTNAME3>.vip                            ONLINE     ONLINE     <HOSTNAME1>     
[oracle@<HOSTNAME1> ~]$

Step 1 Remove oifcfg information for the failed node

Most installations use the global flag of the oifcfg command and can therefore skip this step. This can be confirmed using:

[oracle@<HOSTNAME1> bin]$ $CRS_HOME/bin/oifcfg getif
eth0  <PUBLIC_IP_SUBNET>  global  public
eth1  <PRIVATE_IP_SUBNET>  global  cluster_interconnect

If the output of the command returns 'global' as shown above, you can skip the following step (executing the command below against a global definition returns an error, as shown below).

If the output of the 'oifcfg getif' command does not return 'global', use the following command:

[oracle@<HOSTNAME1> bin]$ $CRS_HOME/bin/oifcfg delif -node <HOSTNAME3>
PROC-4: The cluster registry key to be operated on does not exist.
PRIF-11: cluster registry error

Step 2 Remove ONS information

Execute the following command to find out the remote port number to be used:

[oracle@<HOSTNAME1> bin]$ cat $CRS_HOME/opmn/conf/ons.config
localport=6113
remoteport=6200
loglevel=3
useocr=on

Then remove the information pertaining to the node to be deleted using:

[oracle@<HOSTNAME1> bin]$ $CRS_HOME/bin/racgons remove_config <HOSTNAME3>:6200

Step 3 Remove resources

In this step, the resources that were defined on this node have to be removed. These resources include the database instance, ASM, listener and nodeapps resources. A list of these can be acquired by running the crsstat (crs_stat -t) command from any node:

[oracle@<HOSTNAME1> ~]$ crsstat |grep OFFLINE
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME><DB_NAME3>.inst                  ONLINE     OFFLINE              
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME>DB_NAME1_SRV1.<CLUSTER_NAME><DB_NAME3>.srv       ONLINE     OFFLINE              
ora.<HOSTNAME3>.ASM3.asm                       ONLINE     OFFLINE              
ora.<HOSTNAME3>.LISTENER_<CLUSTER_NAME>N3.lsnr            ONLINE     OFFLINE              
ora.<HOSTNAME3>.gsd                            ONLINE     OFFLINE              
ora.<HOSTNAME3>.ons                            ONLINE     OFFLINE             

 Before removing any resource it is recommended to take a backup of the OCR:

[root@<HOSTNAME1> ~]# cd $CRS_HOME/cdata/<CLUSTER_NAME>
[root@<HOSTNAME1> <CLUSTER_NAME>]# $CRS_HOME/bin/ocrconfig -export ocr_before_node_removal.exp
[root@<HOSTNAME1> <CLUSTER_NAME>]# ls -l ocr_before_node_removal.exp
-rw-r--r-- 1 root root 151946 Nov 15 15:24 ocr_before_node_removal.exp
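
As an optional sanity check (not part of the documented removal steps), 'ocrcheck' can be run as root to confirm the integrity of the OCR before any resources are removed:

[root@<HOSTNAME1> <CLUSTER_NAME>]# $CRS_HOME/bin/ocrcheck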

 Use 'srvctl' from the database home to delete the database instance on node 3:

[oracle@<HOSTNAME1> ~]$ . oraenv
ORACLE_SID = [oracle] ? <CLUSTER_NAME>DB_NAME1
[oracle@<HOSTNAME1> ~]$ $ORACLE_HOME/bin/srvctl remove instance -d <CLUSTER_NAME>DB_NAME1 -i <CLUSTER_NAME><DB_NAME3>
Remove instance <CLUSTER_NAME><DB_NAME3> from the database <CLUSTER_NAME>DB_NAME1? (y/[n]) y

 Use 'srvctl' from the ASM home to delete the ASM instance on node 3:

[oracle@<HOSTNAME1> ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
[oracle@<HOSTNAME1> ~]$ $ORACLE_HOME/bin/srvctl remove asm -n <HOSTNAME3>

Next remove the listener resource.

Please note that there is no 'srvctl remove listener' subcommand prior to 11.1, so this command will not work in 10.2. Using 'netca' to delete the listener from a down node is not an option either, because netca needs to remove the listener configuration from the listener.ora on that node.

10.2 only:

The only way to remove the listener resource is to use the 'crs_unregister' command. Please use this command only in this particular scenario:

[oracle@<HOSTNAME1> <CLUSTER_NAME>]$ $CRS_HOME/bin/crs_unregister ora.<HOSTNAME3>.LISTENER_<CLUSTER_NAME>N3.lsnr

 11.1 only:

 Set the environment to the home from which the listener runs (ASM or database):

[oracle@<HOSTNAME1> ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
[oracle@<HOSTNAME1> <CLUSTER_NAME>]$ $ORACLE_HOME/bin/srvctl remove listener -n <HOSTNAME3>

As user root, stop the nodeapps resources:

[root@<HOSTNAME1> oracle]# $CRS_HOME/bin/srvctl stop nodeapps -n <HOSTNAME3>
[root@<HOSTNAME1> oracle]# crsstat |grep OFFLINE
ora.<HOSTNAME3>.LISTENER_<CLUSTER_NAME>N3.lsnr            OFFLINE    OFFLINE              
ora.<HOSTNAME3>.gsd                            OFFLINE    OFFLINE              
ora.<HOSTNAME3>.ons                            OFFLINE    OFFLINE              
ora.<HOSTNAME3>.vip                            OFFLINE    OFFLINE        

 Now remove them:

[root@<HOSTNAME1> oracle]#  $CRS_HOME/bin/srvctl remove nodeapps -n <HOSTNAME3>
Please confirm that you intend to remove the node-level applications on node <HOSTNAME3> (y/[n]) y

 At this point all resources from the bad node should be gone:

[oracle@<HOSTNAME1> ~]$ crsstat
Name                                     Target     State      Host      
-------------------------------------------------------------------------------
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME><DB_NAME1>.inst                  ONLINE     ONLINE     <HOSTNAME1>     
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME><DB_NAME2>.inst                  ONLINE     ONLINE     <HOSTNAME2>     
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME>DB_NAME1_SRV1.<CLUSTER_NAME><DB_NAME1>.srv       ONLINE     ONLINE     <HOSTNAME1>     
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME>DB_NAME1_SRV1.<CLUSTER_NAME><DB_NAME2>.srv       ONLINE     ONLINE     <HOSTNAME2>     
ora.<CLUSTER_NAME>DB_NAME1.<CLUSTER_NAME>DB_NAME1_SRV1.cs                ONLINE     ONLINE     <HOSTNAME1>     
ora.<CLUSTER_NAME>DB_NAME1.db                            ONLINE     ONLINE     <HOSTNAME2>     
ora.<HOSTNAME1>.ASM1.asm                       ONLINE     ONLINE     <HOSTNAME1>     
ora.<HOSTNAME1>.LISTENER_<CLUSTER_NAME>N1.lsnr            ONLINE     ONLINE     <HOSTNAME1>     
ora.<HOSTNAME1>.gsd                            ONLINE     ONLINE     <HOSTNAME1>     
ora.<HOSTNAME1>.ons                            ONLINE     ONLINE     <HOSTNAME1>     
ora.<HOSTNAME1>.vip                            ONLINE     ONLINE     <HOSTNAME1>     
ora.<HOSTNAME2>.ASM2.asm                       ONLINE     ONLINE     <HOSTNAME2>     
ora.<HOSTNAME2>.LISTENER_<CLUSTER_NAME>N2.lsnr            ONLINE     ONLINE     <HOSTNAME2>     
ora.<HOSTNAME2>.gsd                            ONLINE     ONLINE     <HOSTNAME2>     
ora.<HOSTNAME2>.ons                            ONLINE     ONLINE     <HOSTNAME2>     
ora.<HOSTNAME2>.vip                            ONLINE     ONLINE     <HOSTNAME2>  

Step 4 Execute rootdeletenode.sh

From a node that you are not deleting, execute the following command to find the node number of the node that you want to delete:

[oracle@<HOSTNAME1> ~]$ $CRS_HOME/bin/olsnodes -n
<HOSTNAME1>   1
<HOSTNAME2>   2
<HOSTNAME3>   3

This number can be passed to the rootdeletenode.sh command, which is to be executed as root from any node that is going to remain in the cluster:

[root@<HOSTNAME1> ~]# cd $CRS_HOME/install
[root@<HOSTNAME1> install]# ./rootdeletenode.sh <HOSTNAME3>,3
CRS-0210: Could not find resource 'ora.<HOSTNAME3>.ons'.
CRS-0210: Could not find resource 'ora.<HOSTNAME3>.vip'.
CRS-0210: Could not find resource 'ora.<HOSTNAME3>.gsd'.
CRS-0210: Could not find resource ora.<HOSTNAME3>.vip.
CRS nodeapps are deleted successfully
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully deleted 14 values from OCR.
Key SYSTEM.css.interfaces.node<CLUSTER_NAME>n3 marked for deletion is not there. Ignoring.
Successfully deleted 5 keys from OCR.
Node deletion operation successful.
'<HOSTNAME3>,3' deleted successfully
[root@<HOSTNAME1> install]# $CRS_HOME/bin/olsnodes -n
<HOSTNAME1>   1
<HOSTNAME2>   2

Step 5 Update the Inventory

From one of the remaining cluster nodes, run the following command as the owner of the CRS_HOME. The argument passed to CLUSTER_NODES is a comma-separated list of the node names that are going to remain in the cluster. This step needs to be performed once per home (Clusterware, ASM and RDBMS homes).

[oracle@<HOSTNAME1> install]$ $CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$CRS_HOME "CLUSTER_NODES={<HOSTNAME1>,<HOSTNAME2>}" CRS=TRUE
Starting Oracle Universal Installer...

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
'UpdateNodeList' was successful.

[oracle@<HOSTNAME1> install]$ $CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=<ASM_HOME> "CLUSTER_NODES={<HOSTNAME1>,<HOSTNAME2>}"
Starting Oracle Universal Installer...

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
'UpdateNodeList' was successful.
[oracle@<HOSTNAME1> install]$ $CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=<DB_HOME> "CLUSTER_NODES={<HOSTNAME1>,<HOSTNAME2>}"
Starting Oracle Universal Installer...

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
'UpdateNodeList' was successful.
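
As an optional, hedged check (the inventory path is the one printed by runInstaller above and may differ on your system), the node list recorded for each home can be inspected in the central inventory:

[oracle@<HOSTNAME1> install]$ grep -i "NODE NAME" /u01/app/oracle/oraInventory/ContentsXML/inventory.xml

Only <HOSTNAME1> and <HOSTNAME2> should now be listed for each of the three homes.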

Solution

Removing a Node from an 11.2 Cluster where the node no longer exists.


The following steps demonstrate the process for removing a node from a Grid Infrastructure Cluster:

Please note that this demonstrates the process in a test environment, using the conditions specified in the environment section below, which may not necessarily match your environment.


The environment
------------------------

2 Node 11.2 Grid Cluster, Nodes *Node1* and *Node2*
ASM based Storage
2 Node RAC Database where the Instances are Administrator Managed (not policy managed), *RAC INST 1* and *RAC INST 2*
1 Service *RAC SERVICE1* running as preferred on both RAC Instances


The Issue
-----------------

*Node2* is lost, cannot be recovered and needs to be removed from the cluster.


The process for performing the removal of a failed node has been based on the node deletion processes documented in the Grid and RAC administration guides.

Documentation Reference:

https://docs.oracle.com/cd/E11882_01/rac.112/e41960/adddelunix.htm#BEIEEAFC

Oracle Real Application Clusters Administration and Deployment Guide
11g Release 2 (11.2)
Part Number E16795-08

10 Adding and Deleting Oracle RAC from Nodes on Linux and UNIX Systems

- Deleting Oracle RAC from a Cluster Node

1. Reconfigure the RDBMS services in the cluster to take into account that node *Node2* is gone.

1.1 Reconfigure the Service *RAC SERVICE1* so that it is only running on the remaining instance.

[oracle@*Node1* ~]$ srvctl modify service -d *RAC DB* -s *RAC SERVICE1* -n -i *RAC INST 1* -f


1.2 Examine the configuration to ensure the service is removed from instance *RAC INST 2* and node *Node2*.

[oracle@*Node1* ~]$ srvctl status service -d *RAC DB* -s *RAC SERVICE1*
Service *RAC SERVICE1* is running on instance(s) *RAC INST 1*

[root@*Node1* ~]# /opt/app/oracle/product/grid/bin/crsctl stat res -t
..
ora.*RAC DB*.*RAC SERVICE1*.svc
1 ONLINE ONLINE *Node1*
..

2. Reconfigure the RDBMS instances in the cluster to take into account that node *Node2* is gone.

2.1 Remove the database instance. As this is an administrator-managed database, this can be performed through dbca: from the RAC Instance Management section in dbca, follow the wizard to remove the instance *RAC INST 2* from *Node2* (a silent-mode alternative is sketched after the dbca invocation below).

[oracle@*Node1* ~]$ dbca
[oracle@*Node1* ~]$
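
For reference, dbca can also remove an instance in silent mode. The following is only an illustrative sketch using the placeholder names from this example; verify the exact options against 'dbca -help' on your 11.2 installation before relying on it:

[oracle@*Node1* ~]$ dbca -silent -deleteInstance -nodeList *Node2* \
      -gdbName *RAC DB* -instanceName *RAC INST 2* \
      -sysDBAUserName sys -sysDBAPassword <sys_password>

Once the instance has been removed, the database configuration and the cluster resources should show only the remaining instance: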

[oracle@*Node1* ~]$ srvctl config database -d *RAC DB*
Database unique name: *RAC DB*
Database name: *RAC DB*
Oracle home: /opt/app/oracle/database/11.2/db_1
Oracle user: oracle
Spfile: +DATA1/*RAC DB*/spfile*RAC DB*.ora
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: *RAC DB*
Database instances: *RAC INST 1*
Disk Groups: DATA1
Services: *RAC SERVICE1*
Database is administrator managed

[root@*Node1* ~]# /opt/app/oracle/product/grid/bin/crsctl stat res -t
..
ora.*RAC DB*.db
1 ONLINE ONLINE *Node1* Open
ora.*RAC DB*.*RAC SERVICE1*.svc
1 ONLINE ONLINE *Node1*
..

3. Remove the Node from the RAC Cluster


3.1 Using the Installer, remove the failed node from the Inventory of the Remaining Node(s)

[oracle@*Node1* ~]$ cd $ORACLE_HOME/oui/bin
[oracle@*Node1* bin]$ ./runInstaller -updateNodeList ORACLE_HOME=/opt/app/oracle/database/11.2/db_1 "CLUSTER_NODES={*Node1*}"
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB. Actual 2601 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /opt/app/oracle/oraInventory
'UpdateNodeList' was successful.

4. Remove the Node from the Grid Cluster

The process for performing the removal of a failed node has been based on the node deletion processes documented in the Grid and RAC administration guides.

Documentation Reference:

Oracle Clusterware Administration and Deployment Guide
11g Release 2 (11.2)
Part Number E16794-08

4 Adding and Deleting Cluster Nodes

https://docs.oracle.com/cd/E11882_01/rac.112/e41959/adddelclusterware.htm#BEIFDCAF


From any node that you are not deleting, run the following commands from the Grid_home/bin directory as root to delete the node from the cluster:

4.1 Stop the VIP resource for the node *Node2*

[root@*Node1* bin]# ./srvctl stop vip -i *Node2-vip*


4.2 Remove the VIP for the node *Node2*


[root@*Node1* bin]# ./srvctl remove vip -i *Node2-vip* -f


4.3 Check the state of the environment and ensure the VIP for node *Node2* is removed.

[root@*Node1* bin]# ./crsctl stat res -t

..
ora.*Node1-vip*.vip
1 ONLINE ONLINE *Node1*
..

4.4 Remove *Node2* from the Grid Infrastructure/clusterware

If the node is pinned (as reported by 'olsnodes -s -t'), unpin it first with 'crsctl unpin css', then delete the node. As root from the Grid_home/bin directory:

[root@*Node1* bin]# ./olsnodes -s -t
[root@*Node1* bin]# ./crsctl unpin css -n *Node2*
[root@*Node1* bin]# ./crsctl delete node -n *Node2*

4.5 As the owner of the Grid Infrastructure installation, perform the following to clean up the Grid Infrastructure inventory on the remaining nodes (in this case, *Node1*).

[root@*Node1* bin]# su - oracle
[oracle@*Node1* ~]$ . oraenv *RAC INST 1*

[oracle@*Node1* ~]$ cd $ORACLE_HOME/oui/bin

[oracle@*Node1* ~]$ ./runInstaller -updateNodeList ORACLE_HOME=/opt/app/oracle/product/grid "CLUSTER_NODES={*Node1*}" CRS=TRUE -silent


4.6 As root, list the nodes that are part of the cluster to confirm that *Node2* has been removed successfully and that the only remaining node in this case is *Node1*.

At the end of this process only the node *Node1* remains as a part of the cluster.

[root@*Node1* bin]# ./olsnodes
*Node1*
