This bug is, it's fair to say, quite a famous one.

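I had been going to bed after midnight almost every night this week and was hoping for an early night tonight. Ha, then the phone rang (one of many such stories; by now I can look on all beings as bodhisattvas, as the Buddha would). A database that had been running fine for two years started bringing the system down after a new version of DSG was put into use: every full synchronization crashed the instance outright.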

First, the alert log:

Sat Jun 13 20:25:39 2009
Errors in file /data/app/oracle/admin/hnwx5/bdump/hnwx5_smon_5488928.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01115: IO error reading block from file 1 (block # 59633)
ORA-01110: data file 1: '/data1/hnwx5/system01.dbf'
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 3
Sat Jun 13 20:25:39 2009
Errors in file /data/app/oracle/admin/hnwx5/bdump/hnwx5_ckpt_3711122.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/data/hnwx5/control03.ctl'
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 22: Invalid argument
Additional information: 8
Sat Jun 13 20:25:39 2009
Errors in file /data/app/oracle/admin/hnwx5/bdump/hnwx5_ckpt_3711122.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/data/hnwx5/control03.ctl'
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 22: Invalid argument
Additional information: 8
Sat Jun 13 20:25:39 2009
CKPT: terminating instance due to error 221
Termination issued to instance processes. Waiting for the processes to exit
Sat Jun 13 20:25:49 2009
Instance termination failed to kill one or more processes
Sat Jun 13 20:26:18 2009
Instance terminated by CKPT, pid = 3711122
Sat Jun 13 20:27:28 2009
Starting ORACLE instance (normal)
Sat Jun 13 20:27:41 2009
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0

The I/O failed as soon as it was queued (ORA-27091: unable to queue I/O). The trace file that followed offered no more useful clues:

*** 2009-06-13 20:25:39.376
*** SERVICE NAME:(SYS$BACKGROUND) 2009-06-13 20:25:39.154
*** SESSION ID:(3297.1) 2009-06-13 20:25:39.154
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/data/hnwx5/control03.ctl'
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 22: Invalid argument
Additional information: 8
error 221 detected in background process
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/data/hnwx5/control03.ctl'
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 22: Invalid argument
Additional information: 8
*** 2009-06-13 20:25:49.779
Instance termination failed to kill one or more processes
ksuitm_check: OS PID=4629400 is still alive
*** 2009-06-13 20:25:49.780
Dumping diagnostic information for oracle@query (TNS V1-V3):
OS pid = 4629400
loadavg : 1.10 1.71 1.93
swap info: free_mem = 495.89M rsv = 64.00M
           alloc = 1560.44M avail = 16384.00M swap_free = 14823.56M
       F S      UID     PID    PPID   C PRI NI ADDR    SZ    WCHAN    STIME    TTY  TIME CMD
  250004 Z      dsg 4629400 4063938   1  60 20                                      0:00 <defunct>
open: The file access permissions do not allow the specified action.
procstack: 4629400 is not a process
*** 2009-06-13 20:26:18.632
ksuitm_check: OS PID=5574928 is still alive
*** 2009-06-13 20:26:18.632
Dumping diagnostic information for oracle@query (DBW3):
OS pid = 5574928
loadavg : 1.54 1.78 1.95
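When logs like these run long, it can help to tally the error codes instead of reading line by line. The following is a hypothetical helper, not part of the original troubleshooting: a minimal Python sketch that counts ORA- codes and AIX errno lines in alert or trace text, assuming the line formats shown above. The names summarize_alert and SAMPLE are illustrative only.

```python
import re
from collections import Counter

# Oracle errors appear as e.g. "ORA-27091: unable to queue I/O".
ORA_RE = re.compile(r"\b(ORA-\d{5})\b")
# AIX errno lines appear as e.g. "IBM AIX RISC System/6000 Error: 22: Invalid argument".
AIX_RE = re.compile(r"IBM AIX RISC System/6000 Error: (\d+): (.+)")


def summarize_alert(text):
    """Return (ORA code counts, (errno, message) counts) found in log text.

    Hypothetical helper: assumes the line formats shown in the logs above.
    """
    ora = Counter(ORA_RE.findall(text))
    aix = Counter(
        (int(m.group(1)), m.group(2).strip()) for m in AIX_RE.finditer(text)
    )
    return ora, aix


# A few lines copied from the alert log above.
SAMPLE = """\
ORA-01115: IO error reading block from file 1 (block # 59633)
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 22: Invalid argument
"""

if __name__ == "__main__":
    ora, aix = summarize_alert(SAMPLE)
    print(ora.most_common())
    print(sorted(aix.items()))
```

To scan a real log, pass in the file's contents, e.g. summarize_alert(open(path).read()). In this incident the pattern that stands out is ORA-27072 paired with AIX errno 22 (Invalid argument) on the control file write.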

It looks very much like the bug I ran into before on AIX 5.3 ML05 (5305), but this system is already at AIX 5.3 TL07 (5307):

[hnsec:root:/data/dsg/bin] oslevel -s
5300-07-04-0818
[hnsec:root:/data/dsg/bin]
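The oslevel -s output packs several fields into one token. A small parser makes checks such as "TL06 and above" explicit. This is a minimal sketch assuming the usual release-TL-SP-build layout (the build field is commonly read as year and week); AixLevel and parse_oslevel_s are hypothetical names for illustration.

```python
from typing import NamedTuple


class AixLevel(NamedTuple):
    release: str  # e.g. "5300" = AIX 5.3
    tl: int       # Technology Level (called ML, Maintenance Level, in older docs)
    sp: int       # Service Pack
    build: str    # build identifier, commonly read as YYWW, e.g. "0818"


def parse_oslevel_s(s: str) -> AixLevel:
    """Parse `oslevel -s` output such as '5300-07-04-0818'."""
    parts = s.strip().split("-")
    if len(parts) != 4:
        raise ValueError(f"unexpected oslevel -s output: {s!r}")
    release, tl, sp, build = parts
    return AixLevel(release, int(tl), int(sp), build)


if __name__ == "__main__":
    print(parse_oslevel_s("5300-07-04-0818"))
```

For the output above, parse_oslevel_s("5300-07-04-0818") yields AixLevel(release='5300', tl=7, sp=4, build='0818'), i.e. AIX 5.3 TL07 SP04.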

Searching MetaLink again showed that the notes do cover this; I had simply overlooked it before:

Doc ID 443944.1: Is Patch 5496862 applicable on AIX 5.3 TL 06 / TL 07 / TL 08 / TL 09?
Doc ID 418105.1: Is Patch 5496862 Mandatory for 10.2.0.3?
Doc ID 390656.1: IO Interoperability Issue between IBM ML05 and Oracle Databases

An excerpt:

For systems running ML06 and above, you only need to install Oracle patch.
For the Oracle fix :

1. Navigate to http://metalink.oracle.com/ and sign into MetaLink
2. Select the "Patches & Updates" tab
3. Select the "Simple Search" link and enter Patch Number 5496862, select Platform AIX Based Systems (64-bit) and desired Oracle version
4. Proceed to download the desired patch.

Nothing else for it: apply one more patch.

For any new Oracle installation on AIX, I strongly recommend applying this patch (10.2.0.4 apparently does not need it).
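The decision above can be summarized as a small function. This is a hypothetical sketch, not an Oracle or IBM tool, assuming a 10.2.0.x database on AIX 5.3: per the excerpt, ML06/TL06 and above need only the Oracle patch; the 10.2.0.4 branch encodes the impression stated above rather than a confirmed statement from Oracle; levels below TL06 are simply pointed back at Doc ID 390656.1. The version numbers in the usage line are illustrative.

```python
from typing import Tuple


def patch_5496862_advice(aix_tl: int, oracle_version: Tuple[int, ...]) -> str:
    """Hypothetical summary of this post's notes.

    aix_tl: AIX 5.3 Technology Level, e.g. 7 for 5300-07-04-0818.
    oracle_version: database version as a tuple, e.g. (10, 2, 0, 3).
    """
    if oracle_version >= (10, 2, 0, 4):
        # Impression only: 10.2.0.4 apparently does not need the patch.
        return "probably not needed; verify on MetaLink"
    if aix_tl >= 6:
        # Per the excerpt: on ML06 and above, only the Oracle patch is needed.
        return "apply Oracle patch 5496862"
    # The excerpt does not cover levels below ML06; see the IBM ML05 note.
    return "see Doc ID 390656.1"


if __name__ == "__main__":
    print(patch_5496862_advice(7, (10, 2, 0, 3)))  # → apply Oracle patch 5496862
```

Versions are compared as tuples so that, for example, (10, 2, 0, 3) sorts below (10, 2, 0, 4) without any string parsing.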