Exalogic connection to Exadata with SDP Protocol over Virtual Infiniband IPs

Last week, some developers asked me how we can connect our WebLogic server (on the Exalogic) to the Exadata, so I quickly read the ACS document.
Here is a brief summary.

First of all, you need to define a new listener named LISTENER_IB on the Exadata database server.

[oracle@sba5db01 ~]$ srvctl config listener
Name: LISTENER
Network: 1, Owner: oracle
Home: <CRS home>
End points: TCP:1521
Name: LISTENER_IB
Network: 2, Owner: oracle
Home: <CRS home>
End points: TCP:1522/SDP:1522

This listener has to serve our service. Remember that one service can register with more than one listener.

LSNRCTL> status LISTENER_IB
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_IB)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_IB
Version                   TNSLSNR for Linux: Version 11.2.0.2.0 - Production
Start Date                10-APR-2012 12:06:25
Uptime                    2 days 10 hr. 30 min. 27 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0.2/grid/network/admin/listener.ora
Listener Log File         /u01/app/11.2.0.2/grid/log/diag/tnslsnr/sba5db01/listener_ib/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_IB)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=sdp)(HOST=192.168.10.111)(PORT=1522)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.10.111)(PORT=1522)))
Services Summary...
Service "SERVICE_NAME.domain.lokal" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
Service "SYS$SRCUSER_ORA.AV$SRC_QUEUE_44.SERVICE_NAME.domain.LOKAL" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
Service "SYS$SRCUSR_RAC.AV$SRC_QUEUE_21.SERVICE_NAME.domain.LOKAL" has 1 instance(s).
Instance "SERVICE_NAME1", status READY, has 1 handler(s) for this service...
The command completed successfully
LSNRCTL>

The IP address that the listener uses on port 1522 is a virtual IB IP.

In the hosts file:
### CELL Node Private Interface details
192.168.10.97   sba5cel01-priv.domain.lokal     sba5cel01-priv
192.168.10.98   sba5cel02-priv.domain.lokal     sba5cel02-priv
192.168.10.99   sba5cel03-priv.domain.lokal     sba5cel03-priv
192.168.10.100  sba5cel04-priv.domain.lokal     sba5cel04-priv
192.168.10.101  sba5cel05-priv.domain.lokal     sba5cel05-priv
192.168.10.102  sba5cel06-priv.domain.lokal     sba5cel06-priv
192.168.10.103  sba5cel07-priv.domain.lokal     sba5cel07-priv
192.168.10.104  sba5cel08-priv.domain.lokal     sba5cel08-priv
192.168.10.105  sba5cel09-priv.domain.lokal     sba5cel09-priv
192.168.10.106  sba5cel10-priv.domain.lokal     sba5cel10-priv
192.168.10.107  sba5cel11-priv.domain.lokal     sba5cel11-priv
192.168.10.108  sba5cel12-priv.domain.lokal     sba5cel12-priv
192.168.10.109  sba5cel13-priv.domain.lokal     sba5cel13-priv
192.168.10.110  sba5cel14-priv.domain.lokal     sba5cel14-priv

### SDP IB IPs
192.168.10.111  sba5db01-ibvip.domain.lokal     sba5db01-ibvip
192.168.10.112  sba5db02-ibvip.domain.lokal     sba5db02-ibvip
192.168.10.113  sba5db03-ibvip.domain.lokal     sba5db03-ibvip
192.168.10.114  sba5db04-ibvip.domain.lokal     sba5db04-ibvip
192.168.10.115  sba5db05-ibvip.domain.lokal     sba5db05-ibvip
192.168.10.116  sba5db06-ibvip.domain.lokal     sba5db06-ibvip
192.168.10.117  sba5db07-ibvip.domain.lokal     sba5db07-ibvip
192.168.10.118  sba5db08-ibvip.domain.lokal     sba5db08-ibvip

On the database side, check the parameter file:

listener_networks='((NAME=network2)(LOCAL_LISTENER=LISTENER_IBLOCAL)(REMOTE_LISTENER=LISTENER_IBREMOTE))','((NAME=network1)(LOCAL_LISTENER=LISTENER_IPLOCAL)(REMOTE_LISTENER=LISTENER_IPREMOTE))'

These four aliases (LISTENER_IBLOCAL, LISTENER_IBREMOTE, LISTENER_IPLOCAL, LISTENER_IPREMOTE) have to be defined in the tnsnames.ora file (in the database home, not the grid home).

LISTENER_IBREMOTE =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db02-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db03-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db04-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db05-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db06-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db07-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db08-ibvip.domain.lokal)(PORT = 1522))
)
)

LISTENER_IBLOCAL =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db01-ibvip.domain.lokal)(PORT = 1522))
(ADDRESS = (PROTOCOL = SDP)(HOST = sba5db01-ibvip.domain.lokal)(PORT = 1522))
)
)

LISTENER_IPLOCAL =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5db01-ibvip.domain.lokal)(PORT = 1521))
)
)

LISTENER_IPREMOTE =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = sba5-scan)(PORT = 1521))
)
)
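The entries above are for node sba5db01: its local alias points at its own IB VIP, and its remote alias lists the other seven nodes. A rough Python sketch of generating the two IB aliases for any node is below. It assumes every node follows the same pattern as sba5db01; the function names `tns_entry` and `ib_listener_entries` are made up for this illustration and are not part of any Oracle tool.

```python
def tns_entry(alias, addresses):
    """Render a tnsnames.ora entry from (protocol, host, port) tuples."""
    lines = [f"{alias} =", "(DESCRIPTION =", "(ADDRESS_LIST ="]
    for protocol, host, port in addresses:
        lines.append(
            f"(ADDRESS = (PROTOCOL = {protocol})(HOST = {host})(PORT = {port}))"
        )
    lines += [")", ")"]
    return "\n".join(lines)


def ib_listener_entries(local_host, all_hosts, port=1522):
    """Build LISTENER_IBLOCAL and LISTENER_IBREMOTE text for one node.

    Assumption: the local alias uses TCP and SDP on the node's own IB VIP,
    and the remote alias lists every other node over TCP, as in the post.
    """
    local = tns_entry(
        "LISTENER_IBLOCAL",
        [("TCP", local_host, port), ("SDP", local_host, port)],
    )
    remote = tns_entry(
        "LISTENER_IBREMOTE",
        [("TCP", host, port) for host in all_hosts if host != local_host],
    )
    return local, remote


# Host names follow the sba5dbNN-ibvip pattern from the hosts file.
hosts = [f"sba5db{n:02d}-ibvip.domain.lokal" for n in range(1, 9)]
local, remote = ib_listener_entries(hosts[0], hosts)
print(local)
print()
print(remote)
```

Review the generated text before pasting it into tnsnames.ora; the sketch only reproduces the layout shown in this post.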

Finally, your connection string looks like this:

SDP connection
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=SDP)(HOST=sba5db01-ibvip.domain.lokal)(PORT=1522))
(ADDRESS=(PROTOCOL=SDP)(HOST=sba5db02-ibvip.domain.lokal)(PORT=1522))
(ADDRESS=(PROTOCOL=SDP)(HOST=sba5db03-ibvip.domain.lokal)(PORT=1522))
(ADDRESS=(PROTOCOL=SDP)(HOST=sba5db04-ibvip.domain.lokal)(PORT=1522))
(ADDRESS=(PROTOCOL=SDP)(HOST=sba5db05-ibvip.domain.lokal)(PORT=1522))
(ADDRESS=(PROTOCOL=SDP)(HOST=sba5db06-ibvip.domain.lokal)(PORT=1522))
(ADDRESS=(PROTOCOL=SDP)(HOST=sba5db07-ibvip.domain.lokal)(PORT=1522))
(ADDRESS=(PROTOCOL=SDP)(HOST=sba5db08-ibvip.domain.lokal)(PORT=1522)))
(CONNECT_DATA=(SERVICE_NAME=SERVICE_NAME.domain.lokal)))
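Typing eight nearly identical ADDRESS entries by hand is error-prone, so here is a small Python sketch that assembles the same URL from a host list. The helper name `build_sdp_jdbc_url` is hypothetical; the hosts, port and service name are the ones used in this post.

```python
def build_sdp_jdbc_url(hosts, port, service_name):
    """Return a JDBC thin URL with one SDP address per host."""
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=SDP)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        "jdbc:oracle:thin:@(DESCRIPTION="
        f"(ADDRESS_LIST={addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


# Host names follow the sba5dbNN-ibvip pattern from the hosts file.
hosts = [f"sba5db{n:02d}-ibvip.domain.lokal" for n in range(1, 9)]
url = build_sdp_jdbc_url(hosts, 1522, "SERVICE_NAME.domain.lokal")
print(url)
```

The result is the same descriptor as above on a single line, which is usually the easier form to paste into a data source definition.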

ORA-00700: soft internal error, arguments: [main_6a], [3], [Invalid IP addresses in cellinit.ora file]

Last weekend, I successfully upgraded the software on a quarter-rack Exadata from version 11.2.1.2.6 to 11.2.2.4.2.
Before upgrade:

#imagehistory
Version : 11.2.1.2.6
Image activation date : 2010-07-19 15:01:17 +0300
Imaging mode : patch
Imaging status : success

After upgrade:
Active image version: 11.2.2.4.2.111221
Active image activated: 2012-04-07 12:52:12 +0300
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

But on the cell side, celld was not able to start successfully.
The MS and RS services started, but CELLSRV did not.

# service celld status/restart
Getting the state of RS services…
running
Starting CELLSRV services…
The STARTUP of CELLSRV services was not successful. Error: Start Failed
Meanwhile, the log file showed:

Incident 9 created, dump file: /opt/oracle/cell11.2.2.4.2_LINUX.X64_111221/log/diag/asm/cell/exa1cel03/incident/incdir_9/svtrc_9688_0_i9.trc
ORA-00700: soft internal error, arguments: [main_6a], [3], [Invalid IP addresses in cellinit.ora file], [], [], [], [], [], [], [], [], []

Each storage cell also emailed this ORA-00700 error.

I checked each cell:

CellCLI>ALTER CELL VALIDATE CONFIGURATION

CELL-02653: Cell configuration check encountered the following issues:
Check Exadata configuration via ipconf utility
Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
Loopback rds ping for bondib0(192.168.6.73) : FAILED
Error. Overall status of verification of Exadata configuration file: FAILED
[INFO] The ipconf check may generate a failure for temporary inability to reach NTP or DNS server. You may ignore this alert, if the NTP or DNS servers are valid and available.
[INFO] You may ignore this alert, if the NTP or DNS servers are valid and available.
[INFO] As root user run /usr/local/bin/ipconf -verify -semantic to verify consistent network configurations.

[root@exa1cel02 config]# /usr/local/bin/ipconf -verify -semantic
Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
Loopback rds ping for bondib0(192.168.6.73) : FAILED
Error. Overall status of verification of Exadata configuration file: FAILED
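With more than a handful of cells it helps to pull only the failing checks out of this output. The following Python sketch assumes nothing beyond the line format visible above (a check description, a colon, then a status); the function name `failed_checks` is made up for this example.

```python
def failed_checks(ipconf_output):
    """Return the descriptions of all lines whose status is FAILED."""
    failures = []
    for line in ipconf_output.splitlines():
        # Split on the last colon so colons inside the description survive.
        name, sep, status = line.rpartition(":")
        if sep and status.strip() == "FAILED":
            failures.append(name.strip())
    return failures


# Sample text copied from the ipconf run in this post.
sample = """Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
Loopback rds ping for bondib0(192.168.6.73) : FAILED
Error. Overall status of verification of Exadata configuration file: FAILED"""

for check in failed_checks(sample):
    print(check)
```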

Something was wrong with the cell configuration.
I compared the cell IPs in cellinit.ora with the output of ifconfig -a.
There was no inconsistency in the IPs.

I suspected that the IB switches were not working properly.

So I verified the IB switch topology:

./verify-topology -t quarterrack

[ DB Machine Infiniband Cabling Topology Verification Tool ]
[Version 11.2.2.4.2 ]

--------------- Quarter Rack Exadata V2 Cabling Check---------

Check if all hosts have 2 CAs to different switches..................[SUCCESS]
Leaf switch check: cardinality and even distribution.................[SUCCESS]
Check if each rack has an valid internal ring........................[SUCCESS]

Everything seemed OK.

So where was the problem? Why did the cells raise the "Invalid IP addresses" error?

Before the upgrade, the IB switch software version was:

# nm2version
NM2-36p version: 1.0.1-1
Build time: Sep 14 2009 12:52:51
ComExpress info:
Manufacturing Date: 2009.05.05
Serial Number: “NCD3X0178”
Hardware Revision: 0x0006
Firmware Revision: 0x0102

However, users of firmware version 1.0.1 need to upgrade to 1.1.3 or 1.1.4 before upgrading to 1.3.3.
So I first upgraded to 1.1.3 and then to 1.3.3, both successfully.
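The two-step rule can be written down as a tiny Python sketch. It encodes only what is stated above (1.0.1 must go through 1.1.3 or 1.1.4 first); treating every other older version as a direct upgrade is an assumption of this example, so check the firmware release notes for your own version. The function names are hypothetical.

```python
def parse_version(text):
    """Turn '1.0.1-1' (as printed by nm2version) into (1, 0, 1)."""
    return tuple(int(part) for part in text.split("-")[0].split("."))


def upgrade_path(current, target="1.3.3", intermediate="1.1.3"):
    """List the firmware versions to install, in order.

    Assumption: only 1.0.1 needs the intermediate hop; any other
    version below the target is upgraded directly.
    """
    cur, tgt = parse_version(current), parse_version(target)
    if cur >= tgt:
        return []
    if cur == (1, 0, 1):
        return [intermediate, target]
    return [target]


print(upgrade_path("1.0.1-1"))
```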

Then I checked which Subnet Manager was the master on the switch:

[root@exa1sw-ib2 ~]# getmaster
Local SM not enabled
20120407 20:41:52 No Master SubnetManager seen in the system

Here was the problem: no master Subnet Manager was running on the fabric.

So I reconfigured the Subnet Manager (SM) on the two IB switches:
[root@exa1sw-ib2 ~]# setsmpriority 0

Current SM settings:
smpriority 0
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
[root@exa1sw-ib2 ~]#
[root@exa1sw-ib2 ~]# disablesm
Stopping partitiond daemon.
/usr/local/util/partitiond is already stopped
Stopping IB Subnet Manager.[FAILED]
[root@exa1sw-ib2 ~]# enableasm
-bash: enableasm: command not found
[root@exa1sw-ib2 ~]# enablesm
Starting IB Subnet Manager.[ OK ]
Starting partitiond daemon.[ OK ]
[root@exa1sw-ib2 ~]# getmaster
Local SM enabled and running
20120407 21:44:19 Master SubnetManager on sm lid 0 sm guid 0x2128469ea1a0a0 :
[root@exa1sw-ib2 ~]#

This solved the problem.
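Since a missing master Subnet Manager is easy to overlook, the check could be scripted. The Python sketch below only matches the two message formats that appear in this post; other firmware versions may word the getmaster output differently, and the function name `has_master_sm` is made up for this example.

```python
def has_master_sm(getmaster_output):
    """Return True when the output reports a master Subnet Manager."""
    text = getmaster_output.lower()
    if "no master subnetmanager seen" in text:
        return False
    return "master subnetmanager on sm lid" in text


# Outputs copied from the getmaster runs before and after the fix.
before = """Local SM not enabled
20120407 20:41:52 No Master SubnetManager seen in the system"""

after = """Local SM enabled and running
20120407 21:44:19 Master SubnetManager on sm lid 0 sm guid 0x2128469ea1a0a0 :"""

print(has_master_sm(before))
print(has_master_sm(after))
```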

CellCLI> alter cell validate configuration;
Cell exa1cel02 successfully altered

CellCLI> alter cell validate configuration;
Cell exa1cel01 successfully altered

Unfortunately, this turned out to be a bug.
New Bug 13937466 – ORA-00700: SOFT INTERNAL ERROR, ARGUMENTS: [MAIN_6A], [3], [INVALID IP ADDRESSES has been created for the issue.

Doc ID 1341062.1 also helped me.

ugurcan