Database instance cannot startup automatically when crs start
database September 9th, 2010事情比较蹊跷,数据库是oracle 10.2.0.4的版本,CRS版本也一样,运行平台是IBM P570(power5+), 节点数为四个。
问题:由于业务的需要,客户每周定时重启数据库,有时就发现某个数据库实例(非ASM实例)无法自动启动,然后用SQL*PLUS启动或者srvctl启动都可以成功启动。
本次遇到的问题是swdb2实例没有自动启动。
首先检查oracle alert log,发现其中根本没有记录启动的信息(也就是说,似乎database根本没被启动)。
检查crsd.log
2010-08-28 09:41:48.640: [ CRSRES][1]32 startup = 1 2010-08-28 09:41:48.755: [ CRSRES][11338]32startRunnable: setting CLI values 2010-08-28 09:41:48.768: [ CRSRES][11081]32Attempting to start `ora.sisu.ASM1.asm` on member `sisu` 2010-08-28 09:41:48.779: [ CRSRES][11852]32startRunnable: setting CLI values 2010-08-28 09:41:48.781: [ CRSRES][11338]32Attempting to start `ora.sisusz.ASM2.asm` on member `sisusz` 2010-08-28 09:41:48.799: [ CRSRES][11852]32Attempting to start `ora.swdb.swdb2.inst` on member `sisusz` 2010-08-28 09:41:48.821: [ CRSRES][11595]32Attempting to start `ora.swdb.swdb1.inst` on member `sisu` 2010-08-28 09:41:54.830: [ CRSAPP][11852]32StartResource error for ora.swdb.swdb2.inst error code = 1 2010-08-28 09:41:56.904: [ CRSRES][11852]32Start of `ora.swdb.swdb2.inst` on member `sisusz` failed. 2010-08-28 09:41:57.801: [ CRSRES][11595]32Start of `ora.swdb.swdb1.inst` on member `sisu` failed. 2010-08-28 09:42:10.353: [ CRSRES][11338]32Start of `ora.sisusz.ASM2.asm` on member `sisusz` succeeded. 2010-08-28 09:42:12.511: [ CRSRES][11081]32Start of `ora.sisu.ASM1.asm` on member `sisu` succeeded. 2010-08-28 09:42:12.514: [ CRSRES][11359]32Skip online resource: ora.sisu.ons 2010-08-28 09:42:12.548: [ CRSRES][11102]32Attempting to start `ora.sisu.gsd` on member `sisu` 2010-08-28 09:42:13.044: [ CRSRES][11102]32Start of `ora.sisu.gsd` on member `sisu` succeeded. 2010-08-28 09:48:45.660: [ CRSRES][11374]32startRunnable: setting CLI values 2010-08-28 09:48:45.670: [ CRSRES][11374]32Attempting to start `ora.swdb.swdb2.inst` on member `sisusz` 2010-08-28 09:48:47.655: [ CRSRES][11374]32Start of `ora.swdb.swdb2.inst` on member `sisusz` succeeded. 2010-08-28 10:54:38.278: [ CRSRES][11376]32ora.sisusz.LISTENER_SISUSZ.lsnr target set to OFFLINE before stop action 2010-08-28 10:54:38.278: [ CRSRES][11376]32StopResource: setting CLI values 2010-08-28 10:54:38.293: [ CRSRES][11376]32Attempting to stop `ora.sisusz.LISTENER_SISUSZ.lsnr` on member `sisusz` 2010-08-28 10:54:38.512: [ CRSRES][11376]32Stop of `ora.sisusz.LISTENER_SISUSZ.lsnr` on member `sisusz` succeeded.
从crsd log里,我们看到了启动失败的信息。当然,如果很熟悉启动流程,在这里也能找到问题的端倪,但是很可惜,最开始我没有发现。
进一步查看在$ORACLE_HOME/log/*/imon_swdb.log中的日志,我们可以看到更详细的原因了:
2010-08-28 09:29:18.792: [ RACG][1] [120726][1][ora.swdb.swdb2.inst]: racgimon exiting 2010-08-28 09:41:49.819: [ RACG][1] [107422][1][ora.swdb.swdb2.inst]: racgimon started 2010-08-28 09:41:54.800: [ RACG][1286] [107422][1286][ora.swdb.swdb2.inst]: SQL*Plus: Release 10.2.0.4.0 – Production on Sat Aug 28 09:41:52 2010 Copyright (c) 1982, 2007, Oracle. All Rights Reserved. Enter user-name: Connected to an idle instance. SQL> ORA-01565: error in identifying file ‘+DG3/swdb/spfileswdb.ora’ 2010-08-28 09:41:54.800: [ RACG][1286] [107422][1286][ora.swdb.swdb2.inst]: ORA-17503: ksfdopn:2 Failed to open file +DG3/swdb/spfileswdb.ora ORA-15077: could not locate ASM instance serving a required diskgroup ORA-01078: failure in processing system parameters SQL> Disconnected
结合前面的crsd.log,我们会看的更清楚一些:
1. 09:42:10.353: [ CRSRES][11338]32Start of `ora.sisusz.ASM2.asm` on member `sisusz` succeeded
2. 09:41:54.830: [ CRSAPP][11852]32StartResource error for ora.swdb.swdb2.inst error code = 1
3. 09:41:56.904: [ CRSRES][11852]32Start of `ora.swdb.swdb2.inst` on member `sisusz` failed
ASM实例在9:42:10才启动成功,而database instance启动请求在9:41:54就发起了。在9:41的时候,ASM instance自身还没启动, disk group没有mount,自然spfile也就找不到了,database instance也就启动不了了。
为什么没有在alert log中记录呢? 没找到启动所需要的参数文件,这个时候是不会写alert的。 那怎样让它能够在ASM instance启动之后再去启动Database instance呢? Oracle在OCR里有定义这个启动顺序的,难道是顺序乱了?
使用crs_stat –p,我们可以找到这部分内容:
NAME=ora.swdb.swdb2.inst TYPE=application ACTION_SCRIPT=/u01/app/oracle/product/10.2.0/db/bin/racgwrap ACTIVE_PLACEMENT=0 AUTO_START=1 CHECK_INTERVAL=600 DESCRIPTION=CRS application for Instance FAILOVER_DELAY=0 FAILURE_INTERVAL=0 FAILURE_THRESHOLD=0 HOSTING_MEMBERS=sisusz OPTIONAL_RESOURCES= PLACEMENT=restricted REQUIRED_RESOURCES= RESTART_ATTEMPTS=5 SCRIPT_TIMEOUT=600 START_TIMEOUT=0 STOP_TIMEOUT=0 UPTIME_THRESHOLD=7d USR_ORA_ALERT_NAME= USR_ORA_CHECK_TIMEOUT=0 USR_ORA_CONNECT_STR=/ as sysdba USR_ORA_DEBUG=0 USR_ORA_DISCONNECT=false USR_ORA_FLAGS= USR_ORA_IF= USR_ORA_INST_NOT_SHUTDOWN= USR_ORA_LANG= USR_ORA_NETMASK= USR_ORA_OPEN_MODE= USR_ORA_OPI=false USR_ORA_PFILE= USR_ORA_PRECONNECT=none USR_ORA_SRV= USR_ORA_START_TIMEOUT=0 USR_ORA_STOP_MODE=immediate USR_ORA_STOP_TIMEOUT=0 USR_ORA_VIP=
我们看到,required_resources这一项为空。 要重新添加这种依赖关系的命令如下: srvctl modify instance -d -i -s

Recent Comments