Oracle RAC Service Failover Failures: A Collection of Problems

Published: 2022-11-28

Preface:

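        An Oracle RAC service is one of the cluster's high-availability features. A service controls which nodes users connect to and how connections are balanced across the cluster, and when the cluster suffers a failure the service can fail over automatically to a surviving node. The failover itself can fail, however, so this article walks through the failure scenarios the author has run into and how each one was resolved, in the hope that it helps readers.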

Problem scenario 1: caused by service configuration

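        The service specifies only a preferred (primary) instance and no available (standby) instance, so during failover the service is not brought online on the standby node.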

---Check the service configuration with srvctl
srvctl config service -d <db_name> -s <service_name>

        With this configuration, a failover cannot bring the service online on the standby node.

        The fix is to modify the service and add an available (standby) instance:

srvctl modify service -d dbocs -s ocsdbsrv -n -i dbocs2 -a dbocs1
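
        A missing standby can be spotted before a failover is ever attempted. The sketch below is a hypothetical helper (the name `has_available_instance` is not part of srvctl) that reads the output of `srvctl config service` on standard input and succeeds only if the `Available instances:` line names at least one instance. It assumes the output contains a single service and uses the `Available instances:` label at the start of a line.

```shell
# Hypothetical helper, not part of srvctl.
# Reads `srvctl config service` output on stdin; exit status 0 means the
# "Available instances:" line names at least one instance.
has_available_instance() {
  awk -F': *' '
    /^Available instances:/ { found = ($2 != "") }
    END { exit (found ? 0 : 1) }
  '
}

# Intended use (requires a RAC environment):
#   srvctl config service -d dbocs -s ocsdbsrv | has_available_instance \
#     || echo "ocsdbsrv has no available instance configured"
```

        Reading from standard input keeps the check separate from the srvctl call, so it can also be run against saved output.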

Problem scenario 2: SYSDBA login disabled in the database

        During failover the service fails to start and reports error ORA-01017.

        Note: the error can be found in the crsd oraagent_oracle log.

        Checking sqlnet.ora showed that it had been configured to disable SYSDBA login.
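
        The text does not say which sqlnet.ora parameter was responsible. One setting that disables OS-authenticated `/ as sysdba` logins is `SQLNET.AUTHENTICATION_SERVICES = (NONE)`; the sketch below assumes that was the culprit and checks a given sqlnet.ora file for it. The function name `os_auth_disabled` is hypothetical.

```shell
# Hypothetical helper. Assumption: the setting in question is
# SQLNET.AUTHENTICATION_SERVICES = (NONE); the source does not name it.
# $1: path to a sqlnet.ora file. Exit status 0 means the setting is present
# on an uncommented line.
os_auth_disabled() {
  grep -Eiq '^[[:space:]]*SQLNET\.AUTHENTICATION_SERVICES[[:space:]]*=[[:space:]]*\(?[[:space:]]*NONE[[:space:]]*\)?[[:space:]]*$' "$1"
}

# Intended use (default file location; TNS_ADMIN may point elsewhere):
#   os_auth_disabled "$ORACLE_HOME/network/admin/sqlnet.ora" \
#     && echo "OS authentication is disabled in sqlnet.ora"
```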

Problem scenario 3: service added with srvctl as the grid user

        When adding a service with srvctl add service, run the command as the oracle user, not as the cluster user grid; otherwise the command reports an error.
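
        A script that adds services can refuse to run under the wrong account. The guard below is a minimal sketch: `require_os_user` is a hypothetical name, and it assumes the database software owner is the OS user `oracle`, as in the text above.

```shell
# Hypothetical guard, not part of srvctl.
# $1: expected OS user name. Exit status 0 if the current user matches.
require_os_user() {
  current=$(id -un)
  if [ "$current" != "$1" ]; then
    echo "run this as $1, not $current" >&2
    return 1
  fi
}

# Intended use, placed before any `srvctl add service` call:
#   require_os_user oracle || exit 1
```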

Problem scenario 4: services do not fail over after srvctl stop instance

        When an instance is shut down with srvctl stop instance, its services do not automatically fail over to another node.

srvctl stop instance -d dbocs -i dbocs1

        Oracle's official document 1324574.1 confirms that this is expected behavior.

        It can be worked around as follows.

        11G

---11G
Use -f option with srvctl to have services on going down instance fail over to available instance.

srvctl stop instance -d <RAC> -i <SID1> -f
srvctl status service -d <RAC>
==>
Service <RAC>_test01 is running on instance(s) <SID2>

        12C and later

---12C
In 12c, the syntax/behaviour is changed.
1) If stopping instance without -force or -failover option while you have service running on the stopping instance, errors (PRCD-1315, PRCR-1014, PRCR-1065, CRS-2529) are reported

ie)
$ srvctl config service -d <RAC> -s <RAC>_test01

Service name: <RAC>_test01
...
....
Preferred instances: <SID1>
Available instances: <SID2>

$ srvctl status service -d <RAC>
Service <RAC>_test01 is running on instance(s) <SID1>

$ srvctl stop instance -d <RAC> -i <SID1>
PRCD-1315 : failed to stop instances for database <RAC>
PRCR-1014 : Failed to stop resource ora.<RAC>.db
PRCR-1065 : Failed to stop resource ora.<RAC>.db
CRS-2529: Unable to act on 'ora.<RAC>.db' because that would require stopping or relocating 'ora.<RAC>.<RAC>_test01.svc', but the force option was not specified

2) If you want to stop instance and failover the services to another instance, you need to use '-failover' option and '-force'

ie)
$ srvctl status service -d <RAC>
Service <RAC>_test01 is running on instance(s) <SID1>
$ srvctl stop instance -d <RAC> -i <SID1> -failover -f
$ srvctl status service -d <RAC>
Service <RAC>_test01 is running on instance(s) <SID2>

3) If you want to stop both instance AND services running on the instance, use '-force' option

$ srvctl status service -d <RAC> -s <RAC>_test01
Service <RAC>_test01 is running on instance(s) <SID1>
$ srvctl stop instance -d <RAC> -i <SID1> -force
$ srvctl status service -d <RAC> -s <RAC>_test01
Service <RAC>_test01 is not running.
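
        After stopping an instance with the failover options above, it is worth confirming where the service ended up. The sketch below is a hypothetical helper (`service_running_on` is not part of srvctl) that parses `srvctl status service` output in the format shown above and succeeds only if the named service is reported as running on the named instance. It assumes multiple instances, if any, are separated by commas or spaces.

```shell
# Hypothetical helper, not part of srvctl.
# Reads `srvctl status service` output on stdin.
# $1: service name, $2: instance name.
# Exit status 0 if the service is reported as running on that instance.
service_running_on() {
  awk -v svc="$1" -v inst="$2" '
    $1 == "Service" && $2 == svc && /is running on instance\(s\)/ {
      # Everything after "instance(s)" is treated as the instance list
      # (assumed to be comma- or space-separated).
      list = $0
      sub(/^.*instance\(s\) */, "", list)
      n = split(list, names, /[ ,]+/)
      for (i = 1; i <= n; i++) if (names[i] == inst) ok = 1
    }
    END { exit (ok ? 0 : 1) }
  '
}

# Intended use (requires a RAC environment; names are placeholders):
#   srvctl status service -d RACDB | service_running_on RACDB_test01 RACDB2 \
#     || echo "RACDB_test01 did not fail over to RACDB2"
```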

Problem scenario 5: caused by bugs

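        Oracle has published quite a few bugs related to service failover failures. Compare the failure scenario and the database version against them to determine whether a known bug has been hit.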