Exalogic connection to Exadata with SDP Protocol over Virtual Infiniband IPs

Last week, some developers asked me how to connect our WebLogic server (on Exalogic) to Exadata. I quickly read the ACS document.
In summary:

First of all, you need to define a new listener named LISTENER_IB on the Exadata database server.

[oracle@sba5db01 ~]$ srvctl config listener
Network: 1, Owner: oracle
Home: <CRS home>
End points: TCP:1521
Network: 2, Owner: oracle
Home: <CRS home>
End points: TCP:1522/SDP:1522
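
If you want to script this check, here is a minimal Python sketch that parses the output above and reports which network exposes an SDP endpoint. The function names are my own, and the parser assumes the output keeps exactly the "Network:" / "End points:" layout shown here.

```python
import re

# Sample text copied from the `srvctl config listener` output above.
SAMPLE = """\
Network: 1, Owner: oracle
Home: <CRS home>
End points: TCP:1521
Network: 2, Owner: oracle
Home: <CRS home>
End points: TCP:1522/SDP:1522
"""


def endpoints_by_network(text):
    """Map network number -> {protocol: port} from srvctl-style output."""
    result = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"\s*Network:\s*(\d+)", line)
        if m:
            current = int(m.group(1))
            result[current] = {}
            continue
        m = re.match(r"\s*End points:\s*(.+)", line)
        if m and current is not None:
            # Endpoints are separated by "/" and written as PROTOCOL:PORT.
            for endpoint in m.group(1).split("/"):
                proto, port = endpoint.strip().split(":")
                result[current][proto.upper()] = int(port)
    return result


def sdp_networks(text):
    """Return the network numbers that expose an SDP endpoint."""
    return [n for n, eps in endpoints_by_network(text).items() if "SDP" in eps]
```

With the sample above, only network 2 should come back from sdp_networks, which is the network LISTENER_IB lives on.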

This listener has to listen for our service. Remember that one service can register with more than one listener.

Alias                     LISTENER_IB
Version                   TNSLSNR for Linux: Version - Production
Start Date                10-APR-2012 12:06:25
Uptime                    2 days 10 hr. 30 min. 27 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/
Listener Log File         /u01/app/
Listening Endpoints Summary...
Services Summary...
Service "SERVICE_NAME.domain.lokal" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
Service "SYS$SRCUSER_ORA.AV$SRC_QUEUE_44.SERVICE_NAME.domain.LOKAL" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
Service "SYS$SRCUSR_RAC.AV$SRC_QUEUE_21.SERVICE_NAME.domain.LOKAL" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
The command completed successfully
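
To confirm that a given service really registered with LISTENER_IB, the "Services Summary" part of the status output can be checked with a small script. This is only a sketch: the helper names are hypothetical and it assumes the Service/Instance line format shown above.

```python
import re

# Shortened sample taken from the listener status output above.
SAMPLE = '''\
Services Summary...
Service "SERVICE_NAME.domain.lokal" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
Service "SYS$SRCUSER_ORA.AV$SRC_QUEUE_44.SERVICE_NAME.domain.LOKAL" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
The command completed successfully
'''


def registered_services(status_text):
    """Map service name -> list of (instance, status) pairs."""
    services = {}
    current = None
    for line in status_text.splitlines():
        m = re.match(r'\s*Service "([^"]+)" has \d+ instance', line)
        if m:
            current = m.group(1)
            services[current] = []
            continue
        m = re.match(r'\s*Instance "([^"]+)", status (\w+)', line)
        if m and current is not None:
            services[current].append((m.group(1), m.group(2)))
    return services


def is_ready(status_text, service_name):
    """True if the service has at least one READY instance.

    Service names are compared case-insensitively.
    """
    for name, instances in registered_services(status_text).items():
        if name.lower() == service_name.lower():
            return any(state == "READY" for _, state in instances)
    return False
```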

The IP address that the listener listens on at port 1522 is a virtual IB IP.

In the hosts file:
### CELL Node Private Interface details
sba5cel01-priv.domain.lokal     sba5cel01-priv
sba5cel02-priv.domain.lokal     sba5cel02-priv
sba5cel03-priv.domain.lokal     sba5cel03-priv
sba5cel04-priv.domain.lokal     sba5cel04-priv
sba5cel05-priv.domain.lokal     sba5cel05-priv
sba5cel06-priv.domain.lokal     sba5cel06-priv
sba5cel07-priv.domain.lokal     sba5cel07-priv
sba5cel08-priv.domain.lokal     sba5cel08-priv
sba5cel09-priv.domain.lokal     sba5cel09-priv
sba5cel10-priv.domain.lokal     sba5cel10-priv
sba5cel11-priv.domain.lokal     sba5cel11-priv
sba5cel12-priv.domain.lokal     sba5cel12-priv
sba5cel13-priv.domain.lokal     sba5cel13-priv
sba5cel14-priv.domain.lokal     sba5cel14-priv

### SDP IB IPs
sba5db01-ibvip.domain.lokal     sba5db01-ibvip
sba5db02-ibvip.domain.lokal     sba5db02-ibvip
sba5db03-ibvip.domain.lokal     sba5db03-ibvip
sba5db04-ibvip.domain.lokal     sba5db04-ibvip
sba5db05-ibvip.domain.lokal     sba5db05-ibvip
sba5db06-ibvip.domain.lokal     sba5db06-ibvip
sba5db07-ibvip.domain.lokal     sba5db07-ibvip
sba5db08-ibvip.domain.lokal     sba5db08-ibvip

On the database side, check the parameter file:


These four aliases (LISTENER_IBLOCAL, LISTENER_IBREMOTE, LISTENER_IPLOCAL, LISTENER_IPREMOTE) have to be defined in the TNSNAMES.ORA file (in the DB home, not the Grid home).

(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db02-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db03-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db04-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db05-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db06-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db07-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db08-ibvip.domain.lokal)(PORT = 1522))

(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db01-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = SDP)(HOST = sba5db01-ibvip.domain.lokal)(PORT = 1522))

(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db01-ibvip.domain.lokal)(PORT = 1521))

(ADDRESS = (PROTOCOL = TCP)(HOST = sba5-scan)(PORT = 1521))
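
Writing these address lists by hand for every node is error prone, so here is a small Python sketch that renders the four aliases for one node. Note the assumptions: the alias names were lost from the listing above, so the grouping (IBREMOTE = the other nodes' IB VIPs, IBLOCAL = the local IB VIP over TCP and SDP, IPLOCAL = the local address on 1521, IPREMOTE = the SCAN) is my reading of the address order, and the DESCRIPTION/ADDRESS_LIST wrapper and helper names are illustrative only.

```python
def address(host, port, protocol="TCP"):
    """Render one ADDRESS clause in the same style as the listing above."""
    return f"(ADDRESS = (PROTOCOL = {protocol})(HOST = {host})(PORT = {port}))"


def listener_aliases(local_ib_host, remote_ib_hosts, ip_local_host, scan_host):
    """Return alias name -> list of ADDRESS clauses (assumed grouping)."""
    return {
        "LISTENER_IBREMOTE": [address(h, 1522) for h in remote_ib_hosts],
        "LISTENER_IBLOCAL": [
            address(local_ib_host, 1522),
            address(local_ib_host, 1522, "SDP"),
        ],
        "LISTENER_IPLOCAL": [address(ip_local_host, 1521)],
        "LISTENER_IPREMOTE": [address(scan_host, 1521)],
    }


def tns_alias(name, addresses):
    """Wrap ADDRESS clauses in a DESCRIPTION/ADDRESS_LIST block."""
    body = "\n".join("      " + a for a in addresses)
    return (
        f"{name} =\n"
        "  (DESCRIPTION =\n"
        "    (ADDRESS_LIST =\n"
        f"{body}\n"
        "    )\n"
        "  )"
    )


if __name__ == "__main__":
    # Host names taken from the listing above, for node sba5db01.
    aliases = listener_aliases(
        "sba5db01-ibvip.domain.lokal",
        [f"sba5db{n:02d}-ibvip.domain.lokal" for n in range(2, 9)],
        "sba5db01-ibvip.domain.lokal",
        "sba5-scan",
    )
    for alias_name, addrs in aliases.items():
        print(tns_alias(alias_name, addrs))
```

Run it once per database node with that node's own host names, and compare the result with what is already in the file before changing anything.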

And finally, your connection string looks like this:

SDP connection
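
The original connection string did not survive in this post, so what follows is only a hedged sketch of what an SDP JDBC URL generally looks like, built with a small Python helper. The helper name is hypothetical, the host and service names are the placeholders used earlier in this post, and the two JVM flags are the ones I understand Oracle's Exalogic documentation to require for SDP on the WebLogic side; verify them against your own version.

```python
# JVM options commonly needed on the WebLogic side for SDP (assumption:
# check the Exalogic documentation for your release).
SDP_JVM_FLAGS = ["-Doracle.net.SDP=true", "-Djava.net.preferIPv4Stack=true"]


def sdp_jdbc_url(host, service_name, port=1522):
    """Build a thin-driver JDBC URL whose address uses PROTOCOL=SDP."""
    descriptor = (
        "(DESCRIPTION="
        f"(ADDRESS=(PROTOCOL=SDP)(HOST={host})(PORT={port}))"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )
    return "jdbc:oracle:thin:@" + descriptor


if __name__ == "__main__":
    print(sdp_jdbc_url("sba5db01-ibvip.domain.lokal", "SERVICE_NAME.domain.lokal"))
    print(" ".join(SDP_JVM_FLAGS))
```

The point is simply that the data source URL points at the IB virtual IP on port 1522 with PROTOCOL=SDP instead of the usual TCP address on 1521.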


ORA-00700: soft internal error, arguments: [main_6a], [3], [Invalid IP addresses in cellinit.ora file]

Last weekend, I successfully upgraded the Exadata software version on a quarter rack.
Before upgrade:

Version :
Image activation date : 2010-07-19 15:01:17 +0300
Imaging mode : patch
Imaging status : success

After upgrade:
Active image version:
Active image activated: 2012-04-07 12:52:12 +0300
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

But on the cell side, celld was not able to start successfully.
The MS and RS services started, but CELLSRV did not.

# service celld status/restart
Getting the state of RS services…
Starting CELLSRV services…
The STARTUP of CELLSRV services was not successful. Error: Start Failed
In the log file:

Incident 9 created, dump file: /opt/oracle/cell11.
ORA-00700: soft internal error, arguments: [main_6a], [3], [Invalid IP addresses in cellinit.ora file], [], [], [], [], [], [], [], [], []

Each storage cell also emailed this ORA-00700 error.

I checked each cell:


CELL-02653: Cell configuration check encountered the following issues:
Check Exadata configuration via ipconf utility
Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
Loopback rds ping for bondib0( : FAILED
Error. Overall status of verification of Exadata configuration file: FAILED
[INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available.
[INFO] You may ignore this alert, if the NTP or DNS servers are valid and available.
[INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations.

[root@exa1cel02 config]# /usr/local/bin/ipconf -verify -semantic
Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
Loopback rds ping for bondib0( : FAILED
Error. Overall status of verification of Exadata configuration file: FAILED
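
With several cells to check, it helps to pull only the failed checks out of this output. Below is a minimal Python sketch that does so. The function names are mine, and it assumes each check is reported as "<description> : FAILED" as in the output above.

```python
import re

# Sample copied from the ipconf output above (the IP address is missing there too).
SAMPLE = """\
Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
Loopback rds ping for bondib0( : FAILED
Error. Overall status of verification of Exadata configuration file: FAILED
"""


def failed_checks(text):
    """Return the descriptions of individual checks that ended in FAILED."""
    failures = []
    for line in text.splitlines():
        m = re.match(r"\s*(.+?)\s*:\s*FAILED\s*$", line)
        # Skip the overall summary line; it is reported separately.
        if m and not m.group(1).startswith("Error."):
            failures.append(m.group(1))
    return failures


def overall_failed(text):
    """True if the overall verification status line reports FAILED."""
    return any(
        line.startswith("Error. Overall status") and line.rstrip().endswith("FAILED")
        for line in text.splitlines()
    )
```

Here the only failing check is the loopback RDS ping on bondib0, which points at the InfiniBand side rather than at NTP or DNS.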

Something was wrong with the cell configuration.
I compared the cell IPs in cellinit.ora with the output of ifconfig -a.
There was no inconsistency between the IPs.
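
This comparison can also be scripted. The sketch below is only an illustration: the IP addresses are made up, the function names are hypothetical, and it assumes cellinit.ora uses ipaddressN=<ip>/<prefix> lines and that ifconfig prints the older "inet addr:" format.

```python
import re

# Hypothetical sample data; the real addresses are not shown in this post.
CELLINIT_SAMPLE = """\
ipaddress1=192.168.10.3/24
"""

IFCONFIG_SAMPLE = """\
bondib0   Link encap:InfiniBand
          inet addr:192.168.10.3  Bcast:192.168.10.255  Mask:255.255.255.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
"""

IPV4 = r"(\d+\.\d+\.\d+\.\d+)"


def cellinit_ips(text):
    """Collect the IPs from ipaddressN=... lines (prefix length dropped)."""
    return {m.group(1) for m in re.finditer(r"^\s*ipaddress\d+\s*=\s*" + IPV4, text, re.M)}


def interface_ips(text):
    """Map interface name -> IPv4 address from net-tools style ifconfig output."""
    ips = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"(\S+)\s+Link encap", line)
        if m:
            current = m.group(1)
        m = re.search(r"inet addr:" + IPV4, line)
        if m and current is not None:
            ips[current] = m.group(1)
    return ips


def missing_on_host(cellinit_text, ifconfig_text):
    """IPs listed in cellinit.ora that no interface currently carries."""
    return sorted(cellinit_ips(cellinit_text) - set(interface_ips(ifconfig_text).values()))
```

An empty result means the two sources agree, which is what I saw on every cell.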

I suspected that the IB switches were not working properly, so I verified the IB switch topology:

./verify-topology -t quarterrack

[ DB Machine Infiniband Cabling Topology Verification Tool ]
 [Version ]

--------------- Quarter Rack Exadata V2 Cabling Check---------

Check if all hosts have 2 CAs to different switches..................[SUCCESS]
 Leaf switch check: cardinality and even distribution.................[SUCCESS]
 Check if each rack has an valid internal ring........................[SUCCESS]

Everything seemed OK.

So where was the problem? Why did the cells raise the "Invalid IP addresses" error?

Before the upgrade, the IB switch software version was:

# nm2version
NM2-36p version: 1.0.1-1
Build time: Sep 14 2009 12:52:51
ComExpress info:
Manufacturing Date: 2009.05.05
Serial Number: “NCD3X0178”
Hardware Revision: 0x0006
Firmware Revision: 0x0102

Users of FW version 1.0.1 will need to upgrade to 1.1.3 or 1.1.4 before upgrading to 1.3.3.
So I first upgraded to 1.1.3 and then to 1.3.3, both successfully.
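
The upgrade rule can be written down as a tiny Python helper. This is a sketch under an assumption: the note above only talks about 1.0.1, and I generalize it to "anything older than 1.1.3 needs the intermediate step", which you should confirm against the firmware README for your switch.

```python
def parse_version(version):
    """Turn '1.0.1-1' into (1, 0, 1); the build suffix after '-' is ignored."""
    return tuple(int(part) for part in version.split("-")[0].split("."))


def upgrade_path(current, target="1.3.3", intermediate="1.1.3"):
    """Return the list of firmware versions to install, in order."""
    cur = parse_version(current)
    if cur >= parse_version(target):
        return []  # already at or beyond the target
    if cur < parse_version(intermediate):
        return [intermediate, target]  # two-step upgrade
    return [target]


if __name__ == "__main__":
    # Version string as reported by nm2version above.
    print(upgrade_path("1.0.1-1"))
```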

Then I checked which subnet manager was the master on the switch:

[root@exa1sw-ib2 ~]# getmaster
Local SM not enabled
 20120407 20:41:52 No Master SubnetManager seen in the system

The problem was here.

So I reconfigured the SM on the two IB switches:
[root@exa1sw-ib2 ~]# setsmpriority 0

Current SM settings:
smpriority 0
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
[root@exa1sw-ib2 ~]#
[root@exa1sw-ib2 ~]# disablesm
Stopping partitiond daemon.
/usr/local/util/partitiond is already stopped
Stopping IB Subnet Manager.[FAILED]
[root@exa1sw-ib2 ~]# enableasm
-bash: enableasm: command not found
[root@exa1sw-ib2 ~]# enablesm
Starting IB Subnet Manager.[ OK ]
Starting partitiond daemon.[ OK ]
[root@exa1sw-ib2 ~]# getmaster
Local SM enabled and running
20120407 21:44:19 Master SubnetManager on sm lid 0 sm guid 0x2128469ea1a0a0 :
[root@exa1sw-ib2 ~]#
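
Since a missing master subnet manager was the root cause here, it is worth checking getmaster on every switch after a firmware upgrade. The following Python sketch classifies the two outputs shown above. The function name and the result keys are my own, and it relies on the exact wording of those two messages.

```python
import re

# Both samples are copied from the getmaster output in this post.
BEFORE = """\
Local SM not enabled
 20120407 20:41:52 No Master SubnetManager seen in the system
"""

AFTER = """\
Local SM enabled and running
20120407 21:44:19 Master SubnetManager on sm lid 0 sm guid 0x2128469ea1a0a0 :
"""


def sm_state(text):
    """Summarize getmaster output: local SM status and master GUID, if any."""
    local_enabled = re.search(r"^\s*Local SM enabled", text, re.M) is not None
    master = re.search(
        r"Master SubnetManager on sm lid (\d+) sm guid (0x[0-9a-fA-F]+)", text
    )
    return {
        "local_sm_enabled": local_enabled,
        "has_master": master is not None,
        "master_guid": master.group(2) if master else None,
    }
```

For the BEFORE sample, has_master is False, which is exactly the state that left the cells unable to start CELLSRV.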

This solved the problem.

CellCLI> alter cell validate configuration;
Cell exa1cel02 successfully altered

CellCLI> alter cell validate configuration;
Cell exa1cel01 successfully altered

Unfortunately, this was a bug.
New Bug 13937466 (ORA-00700: SOFT INTERNAL ERROR, ARGUMENTS: [MAIN_6A], [3], [INVALID IP ADDRESSES) has been created for the issue.

Doc ID 1341062.1 also helped me.