20 January 2012

RAC in Containers


Virtualizing Oracle RAC with Solaris Containers


Oracle Solaris Containers


An integral part of the Oracle Solaris 10 Operating System, Oracle Solaris Containers isolate
software applications and services using flexible, software-defined boundaries and allow many private execution environments to be created within a single instance of Oracle Solaris 10. Each environment has its own identity, separate from the underlying hardware, and behaves as if it were running on its own operating system, which makes consolidation simple, safe, and secure. Containers can share CPU resources, can each have dedicated CPU resources, or can each be given a guaranteed minimum amount of resources as well as a maximum. Memory can be shared among all Containers, or each can have a specified memory cap. Physical I/O resources such as disk and network can be dedicated to individual Containers, shared by some, or shared by all. Regardless of what is shared or dedicated, each virtualized environment has isolated access to its local file systems and networking, as well as to its own system and user processes.

Containers are ideal for environments that consolidate a number of applications onto a single server: the cost and complexity of managing numerous machines make it advantageous to consolidate several applications on larger, more scalable servers, and Containers enable more efficient resource utilization. Dynamic resource reallocation permits unused resources to be shifted to other Containers as needed, and fault and security isolation mean that poorly behaved applications do not require a dedicated, under-utilized system.

There are two ways to create an Oracle Solaris Container. A 'sparse root' Container mounts its root file system read-only from the global zone; it occupies less space on disk and is quick to create. A 'whole root' Container mounts its root file system read-write and has all the packages it requires installed inside it, so it behaves like a typical standalone system. Likewise, there are two types of networking available for Containers. With 'shared-IP' networking the NIC is shared with the global zone; IP addresses cannot be plumbed from within the Container, and only the addresses configured for the Container are brought up at boot time. With 'exclusive-IP' networking a dedicated NIC is assigned to the Container, and IP addresses on that NIC can be configured from within the Container.
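
The difference between the two root models and the two IP types can be sketched with zonecfg. The zone names, paths, NIC names and the address below are illustrative only, not part of the reference configuration described later in this post.

# Whole root Container with a dedicated NIC (exclusive-IP);
# 'create -b' starts from a blank configuration, so no directories are inherited from the global zone.
zonecfg -z dbzone1 <<EOF
create -b
set zonepath=/zones/dbzone1
set ip-type=exclusive
add net
set physical=e1000g2
end
EOF

# Sparse root Container sharing a NIC with the global zone (shared-IP);
# the default 'create' template inherits /usr, /lib, /sbin and /platform read-only,
# and the address configured here is brought up when the Container boots.
zonecfg -z dbzone2 <<EOF
create
set zonepath=/zones/dbzone2
set ip-type=shared
add net
set physical=e1000g0
set address=192.168.10.51/24
end
EOF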

Oracle RAC database

Oracle RAC is a highly available and scalable RDBMS that can adjust dynamically to resource changes in its environment. Oracle RAC's data files are hosted on shared storage, whether NAS or SAN based, connected to all the participating systems, and a low-latency, high-throughput private interconnect is recommended for Oracle RAC inter-node communication. A minimum of 2 physical systems is required to provide high availability and mitigate the failure of one node; for increased scalability and availability, add additional physical servers or nodes.

To host Oracle RAC in an Oracle Solaris Containers environment, consider the following points:
 • Provision one Oracle Solaris Container per node or server.
       o This preserves high availability at the instance or node level: if one node goes down, the other instances remain available from the surviving nodes.
       o Multiple such Oracle RAC databases can be consolidated within one set of physical servers.
       o In addition, because each Oracle Solaris Container is a highly secure and isolated environment, these environments can be assigned as independent nodes to different DBAs managing different Oracle RAC database installations, each with root access only to their own Container.
 • Name space isolation allows the Containers of different Oracle RAC databases on a given server to be configured in different time zones, although in practice the same time zone is configured across all the nodes hosting a particular Oracle RAC database.
       o Name space isolation also allows these contained environments to be configured as independent systems, each pointing at a different name server, be it DNS, NIS, etc.
       o The independent process tree per Container provides process-level isolation, so a process running in one Container has no impact on the other Containers; even if such a process crashes and reboots its own environment, the other Containers are untouched.
 • Host the Oracle RAC database binaries on a ZFS file system, or create the Container environment itself on ZFS and host the binaries within it. This gives advantages such as the following (see the sketch after this list):
       o Faster deployment of the Oracle binaries and of the Container environment itself, by cloning the ZFS file system hosting them.
       o The file system size can be increased dynamically and transparently when the file system is nearly full.
 • Host the Oracle RAC database data files on ASM, as it offers:
       o Greater availability of the LUNs through redundancy: ASM mirrors disks or LUNs across controllers, which increases availability.
       o The most efficient way to store and retrieve the data, as Oracle has a better understanding of its own data.
 • The host administrator can change the CPU resources assigned to a Container as requirements change, to accommodate workload needs (also shown in the sketch after this list).
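
The last two points can be sketched as follows. Dataset, zone and file names are illustrative and assume the Container zonepaths live on ZFS datasets under a pool named zonespool, as in the sample configuration later in this post.

# Clone an installed, fully patched Container to provision the next one quickly.
# Export its configuration, adapt it (zonepath, NICs, devices), then clone it:
zonecfg -z cont10g01 export -f /tmp/cont10g02.cfg
# ...edit /tmp/cont10g02.cfg, then:
zonecfg -z cont10g02 -f /tmp/cont10g02.cfg
# With the zonepath on ZFS, zoneadm clone snapshots and clones the dataset
# instead of copying every file, so the new environment is ready in seconds:
zoneadm -z cont10g02 clone cont10g01

# Grow the file system hosting the Oracle binaries when it is nearly full:
zfs set quota=60G zonespool/cont10g01

# Shift CPU resources to match workload needs; the new dedicated-cpu value
# takes effect on the next boot of the Container:
zonecfg -z cont10g01 "select dedicated-cpu; set ncpus=24; end"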

Deploy Oracle RAC without Virtualization

To host Oracle RAC without virtualization, the following applies:

A minimum of 2 to 3 nodes is needed for high availability, along with shared storage to host the data files and the required private and public networking for the Oracle Cluster environment.
However, there is no scope for hosting another version or a different patch level of Oracle RAC on the same set of systems, even though the overall CPU and other resource utilization might be only 20 to 25%. To host another setup with a different version, another set of systems needs to be configured, installed and managed.

Drawing 1 below shows the following:
·      There are 4 Sun SPARC Enterprise T5220 servers configured in the RAC environment, hosting an Oracle 10gR2 RAC deployment with 2 different databases in the global zone, i.e. the host's operating system environment.
·      There are 2 Sun StorageTek 6140 arrays with redundant controllers. Each redundant controller is connected to redundant HBAs on the systems, and multipath I/O (MPxIO) is used on each system to provide highly available connectivity to the LUNs on the disk array.
·      Two color codes are used to show the 2 different controllers.
·      From each storage controller, say storage1, cables are connected to port1 on slot4 and slot5; the same applies to storage2, except that port2 is used.
·      MPxIO provides HA across slot4 and slot5 and presents one disk name instead of two.
·      An Oracle Automatic Storage Management (ASM) normal-redundancy disk group mirrors the 2 LUNs seen through port1 and port2, which provides HA even if an entire storage array fails.



·      Public network cables are connected to switch1 and switch2 from e1000g0 and e1000g1; an IPMP group, ipmp0, provides HA for the public network hosting the host IP and VIP addresses.
·      These switches are connected to the public network; in a lab environment they can have an uplink port configured.
·      Private network cables are connected to switch3 and switch4 from e1000g2 and e1000g3; an IPMP group, ipmp1, provides HA for the private network on the host. The private IP address is configured and brought up as the host comes up, and the ipmp1 group comes up along with it (a sketch of these IPMP configuration files follows this list).
·      Oracle RAC uses this IP address as its private IP address when the Oracle Clusterware comes up.
·      These switches have an uplink port configured as a trunk port to pass the traffic; if a component of one switch fails, the link fails over transparently to the other switch.
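
A minimal sketch of the corresponding link-based IPMP configuration files in the global zone follows. The host names port01 and port01-priv are illustrative; the actual addresses come from /etc/hosts.

# Public network: e1000g0 and e1000g1 in group ipmp0
# /etc/hostname.e1000g0
port01 group ipmp0
# /etc/hostname.e1000g1
group ipmp0 up

# Private interconnect: e1000g2 and e1000g3 in group ipmp1
# /etc/hostname.e1000g2
port01-priv group ipmp1
# /etc/hostname.e1000g3
group ipmp1 up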


Oracle RAC Deployment with Virtualization

The virtualized environment is Oracle Solaris Containers, a built-in technology as explained earlier. To deploy Oracle RAC inside Oracle Solaris Containers, extend the approach from the previous section; the high-level deployment inside this virtual environment is essentially the same. Consider the following while deploying Oracle RAC inside Oracle Solaris Containers.

Drawing 2 below shows a typical deployment scenario:
·      There are 4 nodes, each hosting 2 Container environments, running 2 different versions of the Oracle RAC database (10gR2 and 11gR1) under 2 separate Oracle Clusterware installations inside Containers.
·      One Container is created per node for a given Oracle Clusterware environment, so HA of the instances is preserved.
·      If a node fails, the other Container environment on that node also goes down, but this has no major impact because only one Container, or virtual node, per cluster becomes unavailable.
·      The Containers cont10g01 to cont10g04 and cont11g01 to cont11g04 are hosted on the physical nodes port01 to port04 respectively.
·      Two cores (16 threads) are assigned per Container.
·      The public and private networks share the same set of hardware NICs between the 2 Oracle Clusterware environments, without impacting each other, by using VLAN-tagged NICs. Since Oracle Clusterware plumbs the VIP, exclusive-IP Containers are created and VLAN-tagged NICs are assigned instead of physical NICs (see the naming sketch after this list).
·      NICs e1000g0 and e1000g1 are connected to switch1 and switch2.
     o Ports of these switches are configured as trunk ports to allow VLAN traffic with VLAN tags 131 and 132 for the Oracle 10gR2 and Oracle 11gR1 environments.
     o These are the public NICs; the VIP is hosted on them and is plumbed by the VIP service of Oracle Clusterware.
     o For example, inside the cont10g01 Container these NICs appear as e1000g131000 and e1000g131001, and an IPMP group ipmp0 is created to provide HA at this network layer.
·      NICs e1000g2 and e1000g3 are connected to switch3 and switch4.
     o Ports of these switches are configured as trunk ports to allow VLAN traffic with VLAN tags 111, 112 and 113 for the Oracle 10gR2, Oracle 11gR1 and Oracle 11gR2 environments.
     o The private IPs are configured by the Container and by Oracle Clusterware.
     o For example, inside the cont10g01 Container these NICs appear as e1000g111002 and e1000g111003, and an IPMP group ipmp1 is created to provide HA at this network layer.
·      For simplicity, the same IPMP group names can be used across all the Container environments of a given Clusterware.
·      A set of dedicated storage LUNs is assigned per Oracle Clusterware environment, and the same set of LUNs is configured for the corresponding Containers across all the nodes. The physical connectivity is the same as explained in the previous section.
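
The VLAN-tagged interface names above follow the standard Solaris 10 convention, in which the interface instance number encodes the VLAN ID. A short sketch, using the interface names from this deployment:

# VLAN interface instance = (VLAN ID * 1000) + physical instance number, so:
#   e1000g131000 -> VLAN 131 on e1000g0   (public,  10gR2 cluster)
#   e1000g131001 -> VLAN 131 on e1000g1   (public,  10gR2 cluster)
#   e1000g111002 -> VLAN 111 on e1000g2   (private, 10gR2 cluster)
#   e1000g111003 -> VLAN 111 on e1000g3   (private, 10gR2 cluster)
# Nothing needs to be plumbed in the global zone for an exclusive-IP Container;
# the VLAN-tagged name is simply assigned in zonecfg, for example:
zonecfg -z cont10g01 "add net; set physical=e1000g131000; end"
# The corresponding switch ports must be trunk ports that carry these VLAN tags.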





Conclusion

·      Oracle Solaris Containers are the best choice for consolidating various Oracle RAC databases with different versions or patch levels, which can all be hosted on the same operating system kernel.
·      Oracle Solaris Containers enable end users to manage resources well, pooling resources from under-utilized Containers and provisioning them to the Containers that need them.
·      In addition, the accounting feature of SRM can be leveraged to bill end users based on the CPU consumed; a grid environment, for example, could use this feature for billing based on resource utilization.
·      Dynamic resource management, such as memory and CPU changes, offers greater flexibility.
·      VLAN-tagged NICs overcome the hard limit on the number of physical NICs in a server; to support this, the network switches must also possess this feature.
·      IPMP takes care of availability at the network stack, in addition to Oracle RAC's own monitoring and timeouts for detecting network failures.
·      MPxIO offers seamless availability by presenting a single disk name for the different paths to the same disk or LUN, and it manages the availability of the different paths from the storage arrays to the hosts.
·      ASM offers the benefits of a cluster volume manager: disks or LUNs from different storage arrays can be mirrored to provide HA across two different storage arrays.



Sample Container configuration

Hardware:
Sun SPARC Enterprise T5220 (an M-series or x86 based system could also be used to host the Container virtual environments).

4 * Sun SPARC Enterprise T5220 servers, each configured with 8 cores (64 threads) at 1.6GHz and 64GB RAM.

Storage:
Sun StorageTek 6140 array with dual controllers.

OS:
Solaris 10 10/09 SPARC with kernel patch 142900-14 and its dependent patch 143055-01

Oracle database:
Oracle RAC 10gR2 10.2.0.4 with latest patch set 9352164
Oracle RAC 11gR1 11.1.0.7 with latest patch set 9207257 + 9352179

Configuration files

Host system configuration files:

1. Limit the ZFS cache to 1GB

Edit /etc/system and add these lines:

 * set this value for limiting zfs cache to 1G
set zfs:zfs_arc_max = 0x3E800000

2. By default, Solaris 10 imposes a system-wide shared memory limit of 1/4 of physical memory. To override that limit, configure the shminfo_shmmax tunable in /etc/system and set it to a value that suits the requirement.

 * set the max shared memory to 24G
set shmsys:shminfo_shmmax = 0x600000000

3. Enable MPxIO

Edit /kernel/drv/fp.conf and check for the following entry:

 mpxio-disable="no";

If the entry is already present, make sure it is set as shown above to enable MPxIO; otherwise add it.

For all the /etc/system values to take effect, reboot the node.
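
Optionally, the multipathing setup can be checked after the reboot. The commands below are standard Solaris 10 tools; the exact output depends on the configuration.

# stmsboot -e is an alternative, supported way to enable MPxIO for FC ports
# (it edits fp.conf and updates the device paths for you). After the reboot:
mpathadm list lu     # lists the multipathed logical units and their path counts
echo | format        # multipathed LUNs appear under a single scsi_vhci name (c#t<WWN>d#)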

Configuration files inside the Containers:

1. Oracle shared memory configuration using the SRM facility, /etc/project:


root@cont11g01:/# cat /etc/project
system:0::::
user.root:1::::
noproject:2::::
default:3::::
group.staff:10::::
user.oracle:100:Oracle project:::process.max-sem-nsems=(privileged,4096,deny);project.max-shm-memory=(privileged,16106127360,deny)
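
The same entry can also be created with the project administration commands instead of editing /etc/project by hand; the project ID and values below match the file above and are run inside the Container.

projadd -p 100 -c "Oracle project" user.oracle
projmod -sK "process.max-sem-nsems=(privileged,4096,deny)" user.oracle
projmod -sK "project.max-shm-memory=(privileged,16106127360,deny)" user.oracle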



2. IPMP configuration

a. Public network

root@cont10g01:/# cat /etc/hostname.e1000g131000
cont10g01 group pub_ipmp0
root@cont10g01:/# cat /etc/hostname.e1000g131001
group pub_ipmp0 up
root@cont10g01:/#

b. Private network

 root@cont10g01:/# cat /etc/hostname.e1000g111002
cont10g01-priv group priv_ipmp0
root@cont10g01:/# cat /etc/hostname.e1000g111003
group priv_ipmp0 up
root@cont10g01:/#

Container configuration file:

Create a file with the following content to create the Containers.
# save the content below as: config_template_to_create_cont10g01.cfg


 create -b
set zonepath=/zonespool/cont10g01
set autoboot=false
set limitpriv=default,proc_priocntl,proc_lock_memory
set scheduling-class=TS,RT,FX
set ip-type=exclusive
add net
set physical=e1000g111002
end
add net
set physical=e1000g111003
end
add net
set physical=e1000g131000
end
add net
set physical=e1000g131001
end
add capped-memory
set physical=24G
end
add dedicated-cpu
set ncpus=16
end
add rctl
set name=zone.max-swap
add value (priv=privileged,limit=25769803776,action=deny)
end
add rctl
set name=zone.max-locked-memory
add value (priv=privileged,limit=12884901888,action=deny)
end
add device
set match=/dev/rdsk/c5t600A0B800011FC3E00000E074BBE32EAd0s6
end
add device
set match=/dev/dsk/c5t600A0B800011FC3E00000E074BBE32EAd0s6
end
add device
set match=/dev/rdsk/c5t600A0B800011FC3E00000E194BBE3514d0s6
end
add device
set match=/dev/dsk/c5t600A0B800011FC3E00000E194BBE3514d0s6
end
add device
set match=/dev/rdsk/c5t600A0B800011FC3E00000E234BBE720Cd0s6
end
add device
set match=/dev/dsk/c5t600A0B800011FC3E00000E234BBE720Cd0s6
end

# Copy and paste the above content, changing the disk paths and NIC names to suit your configuration.

 root@port01:~/# zonecfg -z cont10g01 -f config_template_to_create_cont10g01.cfg

# This will create the zone with the name “cont10g01”.

 root@port01:~/# zoneadm -z cont10g01 install

# Once the installation is complete, boot the Container and configure it.

 root@port01:~/# zoneadm -z cont10g01 boot

# Log in at the Container console to configure it for the first time: set the root password, configure networking, the time zone, etc. The zone/Container will then reboot.

 root@port01:~/# zlogin -C cont10g01

# Follow the steps on the screen to configure the zone.
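
Optionally, verify the Container's state and networking along the way; the commands below are standard zone administration tools and their output is omitted here.

 root@port01:~/# zoneadm list -cv

# Shows whether cont10g01 is configured, installed or running, along with its zonepath and IP type.

 root@port01:~/# zlogin cont10g01 ifconfig -a

# Once the Container is configured and its IPMP groups are set up, this confirms that the VLAN-tagged NICs are visible inside it.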


