Exalogic connection to Exadata with SDP Protocol over Virtual Infiniband IPs

Last week some developers asked me how to connect our WebLogic servers (running on Exalogic) to Exadata, so I quickly read through the ACS document.
In brief:

First of all, you need to define a new listener named LISTENER_IB on the Exadata database server.

[oracle@sba5db01 ~]$ srvctl config listener
Network: 1, Owner: oracle
Home: <CRS home>
End points: TCP:1521
Network: 2, Owner: oracle
Home: <CRS home>
End points: TCP:1522/SDP:1522
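
If you want to check this from a script rather than by eye, the output above can be parsed. This is only a minimal sketch: the function name parse_listener_endpoints is mine, and it assumes the output keeps exactly the "Network:" / "End points:" layout shown above.

```python
import re

def parse_listener_endpoints(text):
    """Map each network number to its list of (protocol, port) endpoints."""
    endpoints = {}
    network = None
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"Network:\s*(\d+)", line)
        if match:
            network = int(match.group(1))
            endpoints[network] = []
        elif line.startswith("End points:") and network is not None:
            # "End points: TCP:1522/SDP:1522" -> ["TCP:1522", "SDP:1522"]
            for item in line.split(":", 1)[1].split("/"):
                proto, port = item.strip().split(":")
                endpoints[network].append((proto, int(port)))
    return endpoints

# Sample copied from the srvctl output above.
SAMPLE = """\
Network: 1, Owner: oracle
Home: <CRS home>
End points: TCP:1521
Network: 2, Owner: oracle
Home: <CRS home>
End points: TCP:1522/SDP:1522
"""

endpoints = parse_listener_endpoints(SAMPLE)
# Network 2 is the InfiniBand network, so it should expose an SDP endpoint.
has_sdp = any(proto == "SDP" for proto, _ in endpoints.get(2, []))
print(endpoints, has_sdp)
```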

This listener has to serve our service. Remember that a single service can register with more than one listener.

Alias                     LISTENER_IB
Version                   TNSLSNR for Linux: Version - Production
Start Date                10-APR-2012 12:06:25
Uptime                    2 days 10 hr. 30 min. 27 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/
Listener Log File         /u01/app/
Listening Endpoints Summary...
Services Summary...
Service "SERVICE_NAME.domain.lokal" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
Service "SYS$SRCUSER_ORA.AV$SRC_QUEUE_44.SERVICE_NAME.domain.LOKAL" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
Service "SYS$SRCUSR_RAC.AV$SRC_QUEUE_21.SERVICE_NAME.domain.LOKAL" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
The command completed successfully
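
To confirm that the service really registered with LISTENER_IB, the "Services Summary" part of the status output can be scanned. A rough sketch, assuming the 'Service "..." has N instance(s).' line format shown above; registered_services is a name I made up for illustration.

```python
import re

def registered_services(status_text):
    """Return the service names listed in a listener status output."""
    pattern = r'^Service "([^"]+)" has \d+ instance\(s\)\.'
    return re.findall(pattern, status_text, re.MULTILINE)

# Sample copied from the status output above.
SAMPLE = """\
Services Summary...
Service "SERVICE_NAME.domain.lokal" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
Service "SYS$SRCUSER_ORA.AV$SRC_QUEUE_44.SERVICE_NAME.domain.LOKAL" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
The command completed successfully
"""

services = registered_services(SAMPLE)
is_registered = "SERVICE_NAME.domain.lokal" in services
print(services, is_registered)
```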

The IP address that LISTENER_IB listens on at port 1522 is a virtual InfiniBand (IB) IP.

In the hosts file:

### CELL Node Private Interface details
sba5cel01-priv.domain.lokal     sba5cel01-priv
sba5cel02-priv.domain.lokal     sba5cel02-priv
sba5cel03-priv.domain.lokal     sba5cel03-priv
sba5cel04-priv.domain.lokal     sba5cel04-priv
sba5cel05-priv.domain.lokal     sba5cel05-priv
sba5cel06-priv.domain.lokal     sba5cel06-priv
sba5cel07-priv.domain.lokal     sba5cel07-priv
sba5cel08-priv.domain.lokal     sba5cel08-priv
sba5cel09-priv.domain.lokal     sba5cel09-priv
sba5cel10-priv.domain.lokal     sba5cel10-priv
sba5cel11-priv.domain.lokal     sba5cel11-priv
sba5cel12-priv.domain.lokal     sba5cel12-priv
sba5cel13-priv.domain.lokal     sba5cel13-priv
sba5cel14-priv.domain.lokal     sba5cel14-priv

### SDP IB IPs
sba5db01-ibvip.domain.lokal     sba5db01-ibvip
sba5db02-ibvip.domain.lokal     sba5db02-ibvip
sba5db03-ibvip.domain.lokal     sba5db03-ibvip
sba5db04-ibvip.domain.lokal     sba5db04-ibvip
sba5db05-ibvip.domain.lokal     sba5db05-ibvip
sba5db06-ibvip.domain.lokal     sba5db06-ibvip
sba5db07-ibvip.domain.lokal     sba5db07-ibvip
sba5db08-ibvip.domain.lokal     sba5db08-ibvip

On the database side, check the parameter file.


These four entries, LISTENER_IBLOCAL, LISTENER_IBREMOTE, LISTENER_IPLOCAL and LISTENER_IPREMOTE, have to be defined in the TNSNAMES.ORA file of the database home (not the grid home).

(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db02-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db03-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db04-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db05-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db06-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db07-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db08-ibvip.domain.lokal)(PORT = 1522))

(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db01-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = SDP)(HOST = sba5db01-ibvip.domain.lokal)(PORT = 1522))

(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db01-ibvip.domain.lokal)(PORT = 1521))

(ADDRESS = (PROTOCOL = TCP)(HOST = sba5-scan)(PORT = 1521))

Finally, your connection string looks like this:

SDP connection
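
The original connection string is not reproduced here, so the following is only a sketch of how such a descriptor could be assembled, not the exact string from my environment. It assumes one SDP address per IB virtual IP on port 1522 and the service name from the listener output; sdp_descriptor is a hypothetical helper, and options such as load balancing or failover are left out.

```python
def sdp_descriptor(hosts, port, service_name):
    """Build an Oracle Net connect descriptor with one SDP address per host."""
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=SDP)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        f"(DESCRIPTION=(ADDRESS_LIST={addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )

# The eight IB virtual IP host names from the hosts file (sba5db01..sba5db08).
hosts = [f"sba5db{n:02d}-ibvip.domain.lokal" for n in range(1, 9)]
descriptor = sdp_descriptor(hosts, 1522, "SERVICE_NAME.domain.lokal")

# A JDBC thin URL is the same descriptor behind the usual prefix.
jdbc_url = "jdbc:oracle:thin:@" + descriptor
print(jdbc_url)
```

The descriptor is generated from the host list so that the eight addresses cannot drift apart through copy and paste.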

ORA-00700: soft internal error, arguments: [main_6a], [3], [Invalid IP addresses in cellinit.ora file]

Last weekend I successfully upgraded the Exadata software on a quarter rack.
Before upgrade:

Version :
Image activation date : 2010-07-19 15:01:17 +0300
Imaging mode : patch
Imaging status : success

After upgrade:
Active image version:
Active image activated: 2012-04-07 12:52:12 +0300
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

On the cell side, however, celld could not start successfully.
The MS and RS services started, but CELLSRV did not.

# service celld status/restart
Getting the state of RS services…
Starting CELLSRV services…
The STARTUP of CELLSRV services was not successful. Error: Start Failed
Meanwhile, the log file showed:

Incident 9 created, dump file: /opt/oracle/cell11.
ORA-00700: soft internal error, arguments: [main_6a], [3], [Invalid IP addresses in cellinit.ora file], [], [], [], [], [], [], [], [], []

Each storage cell also sent this ORA-00700 error by email.
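
With 14 cells mailing the same alert, it can help to pull the error arguments out of the log line automatically. A small sketch, assuming the "ORA-nnnnn: message, arguments: [..], [..]" layout shown above; parse_ora_error is a name I chose for illustration.

```python
import re

def parse_ora_error(line):
    """Split an ORA- line into code, message and its non-empty arguments."""
    match = re.search(r"(ORA-\d{5}): ([^,]+), arguments: (.*)", line)
    if not match:
        return None
    code, message, raw_args = match.groups()
    # Keep only the bracketed arguments that actually carry a value.
    args = [arg for arg in re.findall(r"\[([^\]]*)\]", raw_args) if arg]
    return {"code": code, "message": message, "arguments": args}

# Line copied from the cell log above.
LINE = (
    "ORA-00700: soft internal error, arguments: [main_6a], [3], "
    "[Invalid IP addresses in cellinit.ora file], [], [], [], [], [], [], [], [], []"
)

error = parse_ora_error(LINE)
print(error)
```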

I checked each cell:


CELL-02653: Cell configuration check encountered the following issues:
Check Exadata configuration via ipconf utility
Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
Loopback rds ping for bondib0( : FAILED
Error. Overall status of verification of Exadata configuration file: FAILED
[INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available.
[INFO] You may ignore this alert, if the NTP or DNS servers are valid and available.
[INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations.

[root@exa1cel02 config]# /usr/local/bin/ipconf -verify -semantic
Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
Loopback rds ping for bondib0( : FAILED
Error. Overall status of verification of Exadata configuration file: FAILED

Something was wrong with the cell configuration.
I compared the cell IPs in cellinit.ora with the output of ifconfig -a.
There was no inconsistency between the IPs.
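
The comparison can also be scripted. The sketch below makes several assumptions: that cellinit.ora holds lines of the form ipaddressN=a.b.c.d/prefix, that ifconfig prints "inet addr:a.b.c.d", and the sample addresses are invented because the real ones are not shown in this post.

```python
import re

IPV4 = r"(\d+(?:\.\d+){3})"

def cellinit_ips(text):
    """IPs from lines of the assumed form 'ipaddressN=a.b.c.d/prefix'."""
    return set(re.findall(r"^ipaddress\d+=" + IPV4, text, re.MULTILINE))

def ifconfig_ips(text):
    """IPs from 'inet addr:a.b.c.d' entries in ifconfig -a output."""
    return set(re.findall(r"inet addr:" + IPV4, text))

# Hypothetical sample data, not taken from the real cell.
CELLINIT = "ipaddress1=192.168.10.5/22\n"
IFCONFIG = """\
bondib0   Link encap:InfiniBand
          inet addr:192.168.10.5  Bcast:192.168.11.255  Mask:255.255.252.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
"""

# Any address in cellinit.ora that no interface carries is an inconsistency.
missing = cellinit_ips(CELLINIT) - ifconfig_ips(IFCONFIG)
print(missing)
```

An empty result matches what I saw: the addresses were consistent, so the cause had to be elsewhere.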

I suspected that the IB switches were not working properly, so I verified the IB switch topology:

./verify-topology -t quarterrack

[ DB Machine Infiniband Cabling Topology Verification Tool ]
 [Version ]

--------------- Quarter Rack Exadata V2 Cabling Check---------

Check if all hosts have 2 CAs to different switches..................[SUCCESS]
 Leaf switch check: cardinality and even distribution.................[SUCCESS]
 Check if each rack has an valid internal ring........................[SUCCESS]

Everything seemed OK.

So where was the problem? Why did the cells raise the Invalid IP addresses error?

Before the upgrade, the IB switch software version was:

# nm2version
NM2-36p version: 1.0.1-1
Build time: Sep 14 2009 12:52:51
ComExpress info:
Manufacturing Date: 2009.05.05
Serial Number: “NCD3X0178”
Hardware Revision: 0x0006
Firmware Revision: 0x0102

Users of FW version 1.0.1 will need to upgrade to 1.1.3 or 1.1.4 before upgrading to 1.3.3.
So I first upgraded to 1.1.3 and then to 1.3.3, both successfully.
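
The two-step rule can be written down as a small function. This is only a sketch: the note above covers version 1.0.1 explicitly, and treating every version below 1.1.3 the same way is my assumption. The names parse_version and upgrade_path are hypothetical.

```python
def parse_version(version):
    """Turn '1.0.1-1' into (1, 0, 1), ignoring the build suffix."""
    return tuple(int(part) for part in version.split("-")[0].split("."))

def upgrade_path(current, intermediate="1.1.3", target="1.3.3"):
    """List the firmware versions to install, in order."""
    steps = []
    # Assumption: anything older than the intermediate release needs it first.
    if parse_version(current) < parse_version(intermediate):
        steps.append(intermediate)
    if parse_version(current) < parse_version(target):
        steps.append(target)
    return steps

# Version string as reported by nm2version above.
path = upgrade_path("1.0.1-1")
print(path)
```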

Then I checked the master subnet manager (SM) of the switch:

[root@exa1sw-ib2 ~]# getmaster
Local SM not enabled
 20120407 20:41:52 No Master SubnetManager seen in the system

This was the problem: no master subnet manager was running.
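
A check like this is easy to add to a post-upgrade script. The sketch below only looks for the two message texts that getmaster printed in my case; has_master_sm is a made-up name, and other firmware versions may word the messages differently.

```python
def has_master_sm(getmaster_output):
    """True if the getmaster output reports a master subnet manager."""
    for line in getmaster_output.splitlines():
        if "No Master SubnetManager seen" in line:
            return False
        if "Master SubnetManager on sm lid" in line:
            return True
    # Nothing recognisable in the output: treat it as "no master".
    return False

# Output of a switch without a running subnet manager.
BROKEN = """\
Local SM not enabled
 20120407 20:41:52 No Master SubnetManager seen in the system
"""

# Output of a switch that sees a master subnet manager.
HEALTHY = """\
Local SM enabled and running
20120407 21:44:19 Master SubnetManager on sm lid 0 sm guid 0x2128469ea1a0a0 :
"""

print(has_master_sm(BROKEN), has_master_sm(HEALTHY))
```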

So I reconfigured the subnet manager (SM) on the two IB switches:
[root@exa1sw-ib2 ~]# setsmpriority 0

Current SM settings:
smpriority 0
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
[root@exa1sw-ib2 ~]#
[root@exa1sw-ib2 ~]# disablesm
Stopping partitiond daemon.
/usr/local/util/partitiond is already stopped
Stopping IB Subnet Manager.[FAILED]
[root@exa1sw-ib2 ~]# enableasm
-bash: enableasm: command not found
[root@exa1sw-ib2 ~]# enablesm
Starting IB Subnet Manager.[ OK ]
Starting partitiond daemon.[ OK ]
[root@exa1sw-ib2 ~]# getmaster
Local SM enabled and running
20120407 21:44:19 Master SubnetManager on sm lid 0 sm guid 0x2128469ea1a0a0 :
[root@exa1sw-ib2 ~]#

This solved the problem.

CellCLI> alter cell validate configuration;
Cell exa1cel02 successfully altered

CellCLI> alter cell validate configuration;
Cell exa1cel01 successfully altered

Unfortunately, this turned out to be a bug.
New Bug 13937466 – ORA-00700: SOFT INTERNAL ERROR, ARGUMENTS: [MAIN_6A], [3], [INVALID IP ADDRESSES has been created for the issue.

Doc ID 1341062.1 also helped me.