Grid Control 10.1.0.3 on SUSE LINUX Enterprise Server 9
 

The Grid Control installation is not trivial if you want a customized setup. For this reason I'm not too eager to write this documentation, since it will be incomplete and won't cover the endless number of issues you can run into.

Let's start by listing what you need before installing.

For my installation I wanted a 10g repository (the database where the Grid Control metadata is stored) on ASM (a sort of volume manager based on a special Oracle instance). Since I didn't have spare hardware I had to install the Grid Control on the same machine as the repository (most installations are done this way).
I warn you: resource consumption is high! Use a machine with at least 2 processors.

!!!You don't need gcc_old!!!

The installation of the 10g repository is described here. That document describes how to prepare your system for an Oracle installation. It is almost enough for the Grid Control too. You need two more steps:

  • Install the package db1, which contains libdb.so.2, or create a link named libdb.so.2 pointing to libdb.so.3 (ln -s libdb.so.3 libdb.so.2) in /usr/lib; otherwise you will get this error:
      error while loading shared libraries: libdb.so.2: cannot open shared object file: No such file or directory
  • Comment out all the IPv6 references in /etc/hosts;
    commenting out this line should be enough:
      • #::1             localhost ipv6-localhost ipv6-loopback
    Otherwise you are going to meet the usual error:

      <MSG_TEXT>[TM] TaskMaster sysInit failed for /u01/app/oracle/product/oem10g</MSG_TEXT>
           <SUPPL_DETAIL><![CDATA[java.net.ConnectException: Connection refused
               at java.net.PlainSocketImpl.socketConnect(Native Method)
               at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:305)
               at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:171)
               at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:158)
               at java.net.Socket.connect(Socket.java:452)
               at java.net.Socket.connect(Socket.java:402)
               at java.net.Socket.<init>(Socket.java:309)
               at java.net.Socket.<init>(Socket.java:124)
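
The /etc/hosts change described above can be scripted. What follows is only a sketch: the function name comment_ipv6_hosts is mine, and it treats any line whose first field contains a colon as an IPv6 entry. It keeps a .bak copy next to the file it edits, so try it on a copy of /etc/hosts first.

```shell
#!/bin/bash
# Sketch: comment out IPv6 entries in a hosts-style file.
# comment_ipv6_hosts is a hypothetical helper, not an Oracle or SUSE tool.
comment_ipv6_hosts() {
    local hosts_file="$1"
    # Keep a backup of the original next to the file.
    cp -p "$hosts_file" "$hosts_file.bak" || return 1
    # A line counts as IPv6 when its first field contains a colon.
    # Lines that already start with '#' do not match and stay untouched.
    sed 's/^\([[:space:]]*[0-9A-Fa-f]*:[0-9A-Fa-f:]*[[:space:]]\)/#\1/' \
        "$hosts_file.bak" > "$hosts_file"
}

# Example (as root):
#   comment_ipv6_hosts /etc/hosts
```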

Now, since I already had a DB configuration and a well-set environment for my 10g DB, I decided to create a new user for the grid installation (a user called grid).
 

linux: # useradd -m -u 62 -G disk,dba,oinstall grid
linux: # cp /etc/profile.d/oracle.sh /etc/profile.d/grid.sh
linux: # mkdir -p /u01/app/oracle/product/oem10g
linux: # chown grid:oinstall /u01/app/oracle/product/oem10g

 

Set the right environment in /etc/profile.d/grid.sh (the usual ORACLE_HOME, ORACLE_BASE, LD_LIBRARY_PATH, etc.).
For the grid user I suggest the following PATH variable:

PATH=$PATH:$ORACLE_HOME/bin:$ORACLE_HOME/dcm/bin:$ORACLE_HOME/Apache/Apache/bin

Change the line:

if [ `id -un` == "oracle" ]; then

to

if [ `id -un` == "grid" ]; then
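
Put together, the grid.sh file could look roughly like the sketch below. The ORACLE_BASE value and the lib directory in LD_LIBRARY_PATH are my assumptions (derived from the ORACLE_HOME directory created earlier), and the set_grid_env wrapper is only there to keep the example tidy. Adjust all of it to your own oracle.sh.

```shell
# Sketch of /etc/profile.d/grid.sh (assumed layout, adjust to yours).
set_grid_env() {
    # ORACLE_BASE is an assumption; ORACLE_HOME is the directory created above.
    ORACLE_BASE=/u01/app/oracle
    ORACLE_HOME=$ORACLE_BASE/product/oem10g
    # Prepend the Oracle libraries, keeping any existing value.
    LD_LIBRARY_PATH=$ORACLE_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    PATH=$PATH:$ORACLE_HOME/bin:$ORACLE_HOME/dcm/bin:$ORACLE_HOME/Apache/Apache/bin
    export ORACLE_BASE ORACLE_HOME LD_LIBRARY_PATH PATH
}

# Only the grid user gets this environment.
if [ "$(id -un)" = "grid" ]; then
    set_grid_env
fi
```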

Your repository database needs a couple of parameters to be set:

aq_tm_processes greater than 0.
session_cached_cursors at least 300
unset all dispatcher parameters

Disable the statistics gathering job for your 10g DB:
dbms_scheduler.disable('GATHER_STATS_JOB');
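
As a rough sketch, the repository settings above translate into the SQL printed by the helper below. It assumes the database runs from an spfile, picks 1 as an arbitrary value for aq_tm_processes, and only resets the dispatchers parameter (the reset fails if the parameter is not in the spfile, and other dispatcher-related parameters would need the same treatment). The function name repository_prereq_sql is made up for this example.

```shell
# Prints the SQL for the repository settings listed above.
# Pipe it into SQL*Plus as a DBA on the repository host, e.g.:
#   repository_prereq_sql | sqlplus -S "/ as sysdba"
repository_prereq_sql() {
    cat <<'EOF'
-- aq_tm_processes must be greater than 0 (1 is an arbitrary choice)
ALTER SYSTEM SET aq_tm_processes=1 SCOPE=BOTH;
-- written to the spfile only, so it applies after the next restart
ALTER SYSTEM SET session_cached_cursors=300 SCOPE=SPFILE;
-- remove the dispatcher setting from the spfile (assumes it is set there)
ALTER SYSTEM RESET dispatchers SCOPE=SPFILE SID='*';
-- switch off automatic statistics gathering during the installation
EXEC dbms_scheduler.disable('GATHER_STATS_JOB');
EOF
}
```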

Before starting, make sure your machine has a real DNS name!!
(Also use the hostname command to check that the name is correct.)
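
A quick way to run the hostname check is sketched below. The helper is_fqdn is my own name for it, and it only tests that the name has the host.domain shape; whether the name really resolves is left to getent in the usage comment.

```shell
# Returns 0 when the given name looks fully qualified (host.domain).
# is_fqdn is a hypothetical helper written for this example.
is_fqdn() {
    case "$1" in
        .*|*.) return 1 ;;   # leading or trailing dot
        *.*)   return 0 ;;   # at least one dot inside the name
        *)     return 1 ;;   # short name or empty string
    esac
}

# Usage on the target machine:
#   name=$(hostname -f)
#   is_fqdn "$name" || echo "hostname '$name' is not fully qualified" >&2
#   getent hosts "$name" || echo "'$name' does not resolve" >&2
```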

I need to explain one more step (again because of my love for customization). I was worried about making mistakes during the installation and corrupting the repository's oraInventory, so I changed the inventory location with a special runInstaller parameter.
Since I have two different oraInventory directories (the grid one is not in the default location specified in /etc/oraInst.loc), I have to remember that every time I apply a patch to the Grid Control I must specify the correct location!
You can also use the default oraInventory by giving the grid user permission to write there (chmod -R g+w <orainventory location>). In that case don't specify invPtrLoc.

Now insert the first CD, mount it and run the installer:
 

  • mount /dev/cdrom (if you didn't add oracle to the cdrom group, the command has to be issued by root).
  • if you are on a remote machine, make sure your X server is running and export the DISPLAY: export DISPLAY=<your local IP>:0.0;
  • /media/cdrom/runInstaller -invPtrLoc /home/grid/oraInst.loc

(Always specify invPtrLoc if patching your installation).
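
The file passed to -invPtrLoc is a small pointer file with two entries, inventory_loc and inst_group. The sketch below writes one. The inventory directory /home/grid/oraInventory in the usage comment is an assumption of mine (I only mentioned the pointer file location above), and write_inventory_pointer is a made-up helper name.

```shell
# Writes an inventory pointer file for runInstaller -invPtrLoc.
# Arguments: pointer file, inventory directory, install group (default oinstall).
write_inventory_pointer() {
    local pointer_file="$1" inventory_dir="$2" group="${3:-oinstall}"
    mkdir -p "$inventory_dir" || return 1
    printf 'inventory_loc=%s\ninst_group=%s\n' "$inventory_dir" "$group" \
        > "$pointer_file"
}

# Example, as the grid user (the inventory directory is hypothetical):
#   write_inventory_pointer /home/grid/oraInst.loc /home/grid/oraInventory
#   /media/cdrom/runInstaller -invPtrLoc /home/grid/oraInst.loc
```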

If you get an error on the memory check (and you are sure the installer made a mistake) you can add -ignoreSysPreReqs -ignorePreReq to the runInstaller command line (Metalink note #285303.1).

Now the easiest part:

Specify a correct oraInventory.

An error I got.

Set where to install the grid engine.

If you wish to create a new database for the repository all is smooth and easy!!

Here is a big issue: I tried to use the ASM syntax... without being able to do it.
I fell back to a normal filesystem and moved all the objects later (I'll explain it below).

A useful configuration. Remember the sysman password since it is the administrator user!

I got several problems with the proxy and grid 10.1.0.2 so this time I didn't set it!

I needed to wait several minutes here.
This is the place where you can get most of your trouble.
Report the error shown in the grey box to Oracle support.

READY!!!
(Almost).

Use your preferred browser to connect to: http://machinename:7780/em
7780 is the default port for the Grid Control.
Use the sysman user (the password is the one you chose during the installation).

You can now re-enable the GATHER_STATS_JOB in your DB.

To administer the Grid Control components you need to use special commands. This is what I have in a script for starting and stopping the grid automatically (it needs to be launched by the grid user):

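#!/bin/bash
# Usage: <this script> start|stop   (run as the grid user)

        . /etc/profile.d/grid.sh

        # $1 is passed straight to emctl and glued onto opmnctl's
        # startall/stopall and startproc/stopproc commands.
        case "$1" in
            start|stop) ;;
            *) echo "Usage: $0 start|stop" >&2; exit 1 ;;
        esac

        $ORACLE_HOME/bin/emctl $1 agent
        $ORACLE_HOME/opmn/bin/opmnctl ${1}all
#       $ORACLE_HOME/agent/bin/emctl $1 iasconsole
#       $ORACLE_HOME/agent/bin/emctl $1 oms
        $ORACLE_HOME/agent/bin/emctl $1 agent
        $ORACLE_HOME/opmn/bin/opmnctl ${1}proc ias-component=dcm-daemon
        $ORACLE_HOME/opmn/bin/opmnctl ${1}proc ias-component=LogLoader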

The commented-out lines, as well as the second start of the agent, shouldn't be necessary.

About dbconsole

The dbconsole component is not necessary for the Grid Control. On the contrary, it is usually used when you don't have a centralized Enterprise Manager to administer your databases.
Before being able to start it you need to configure your 10g database (it only works on 10g and later databases!) using dbca.
If you are installing the Grid Control using the default database provided with the "basic installation", don't bother trying to start dbconsole. Why? Because the default repository DB is 9.0.1.5 and not a 10g!
Remember: you need a dbconsole started for every instance on your machine. Bind every dbconsole to a different port or they won't start at all!

Moving the objects inside ASM

This step was really complicated and took more than a couple of hours!
Today I would do it with RMAN, but I'll describe my convoluted procedure anyway.

I shut down all Grid Control components,
exported the whole DB (with sys),
dropped the SYSMAN user and recreated the two Enterprise Manager tablespaces inside ASM (MGMT_TABLESPACE, MGMT_ECM_DEPOT_TS).
Since I don't like the easy way I even created two tablespaces for the indexes.
Then I imported the DB structures only (imp with rows=n and ignore=y).
Next I moved the objects into the right tablespaces (indexes into their own tablespaces) and disabled the sysman triggers and constraints.
I imported only the sysman schema (fromuser=sysman touser=sysman) and started re-enabling everything.
I needed several attempts at recompiling the objects (start with types and type bodies).
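
For reference, the export and import steps above map roughly onto the exp/imp command lines printed by this dry-run sketch. Nothing is executed: the function only prints the commands. The dump file path, the log file names and the connecting user are placeholders of mine, and print_move_commands is a hypothetical name. The tablespace creation, the object moves and the trigger and constraint handling happen in between and are not shown.

```shell
# Dry-run helper: prints the exp/imp command lines for the steps above
# instead of running them. DUMP and CONNECT are placeholders.
DUMP=${DUMP:-/tmp/repository_full.dmp}   # hypothetical dump file
CONNECT=${CONNECT:-system}               # exp/imp will prompt for the password

print_move_commands() {
    # 1. full export of the repository database
    echo "exp $CONNECT full=y file=$DUMP log=exp_full.log"
    # 2. structures only, after recreating the tablespaces inside ASM
    echo "imp $CONNECT full=y rows=n ignore=y file=$DUMP log=imp_ddl.log"
    # 3. sysman rows, once triggers and constraints are disabled
    echo "imp $CONNECT fromuser=sysman touser=sysman ignore=y file=$DUMP log=imp_sysman.log"
}
```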

I had to grant the sysman user execute permission on dbms_lock and dbms_redefinition.
I believe this is needed even without moving the objects!!
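
The two grants can be fed to SQL*Plus as in this small sketch (the helper name sysman_grants_sql is mine, and it assumes you connect as a user allowed to grant on SYS packages):

```shell
# Prints the two grants mentioned above. Pipe into SQL*Plus, e.g.:
#   sysman_grants_sql | sqlplus -S "/ as sysdba"
sysman_grants_sql() {
    cat <<'EOF'
GRANT EXECUTE ON sys.dbms_lock TO sysman;
GRANT EXECUTE ON sys.dbms_redefinition TO sysman;
EOF
}
```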

High CPU utilization and 'missing form factor'

Let's say that I found the Grid Control a greedy CPU user. I always had a high load average (sometimes higher than 2 on a single-CPU system) and a slow system.
I took it as "normal for this product".

Then I found my filesystem full of log files, particularly the copies of ons.log.
What was inside? A long list of "missing form factor" messages.
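
To see how bad it is on your system, you can count those messages across ons.log and its copies. A minimal sketch follows; the log directory is passed as an argument because it depends on your installation, and count_form_factor_messages is just a name I picked.

```shell
# Counts "missing form factor" lines in ons.log and its rotated copies
# found in the given directory. Prints 0 when there are no such files.
count_form_factor_messages() {
    local log_dir="$1"
    cat "$log_dir"/ons.log* 2>/dev/null | grep -c "missing form factor"
}

# Example (pass the directory where your ons.log files live):
#   count_form_factor_messages /path/to/ons/logs
```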

Searching Metalink, I discovered a port conflict between the IAS ONS (the Grid Control is based on IAS) and my 10g listener (which is an ONS client). What to do then?

I went into the $ORACLE_HOME/opmn/conf of my database (not the one of the Grid Control or the IAS) and renamed ons.conf and ons.config to ons.conf.orig and ons.config.orig. Then I restarted the listener.
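
The rename step can be sketched as a small function. It takes the opmn/conf directory of the database home as its argument, so nothing is touched unless you point it there. disable_db_ons_config is a hypothetical name, and the listener restart is only shown as a comment.

```shell
# Renames ons.conf and ons.config (when present) to *.orig in the
# given opmn/conf directory of the DATABASE home.
disable_db_ons_config() {
    local conf_dir="$1" f
    for f in ons.conf ons.config; do
        if [ -f "$conf_dir/$f" ]; then
            mv "$conf_dir/$f" "$conf_dir/$f.orig" || return 1
        fi
    done
}

# Example, run as the database owner with the database environment set:
#   disable_db_ons_config "$ORACLE_HOME/opmn/conf"
#   lsnrctl stop && lsnrctl start
```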

The result?
No more messages in the ons.log and the CPU usage dropped from almost 100% to... nothing.

I now have a low load average and a happy system!
 

Contact information:
fabrizio.magni _at_ gmail.com