29 July 2011

Why Choose Oracle RAC

RAC is most often the preferred choice for enterprises with a scalability problem. By that I mean an installation with one database server running the Oracle database software, and a SAN holding the tablespace files, configuration files and redo logs, that is no longer able to service all users. Connection pooling alone will not solve that problem either.
That is typically the scenario in which you might consider Oracle RAC with the Oracle Clusterware software. In practice it means buying a few extra nodes (extra iron) and creating an extra Oracle instance on each of the extra servers (see figure).

[Figure: multiple Oracle instances, one per cluster node, all serving the same database on the shared SAN storage]

But is this always the only reason to choose a RAC installation? I have been listening to a presentation by Riyaj Shamsudeen, who came up with a few more good reasons to choose RAC:

 Good reasons
·       Hardware fault tolerance
o   RAC protects from non-shared hardware failures.
o   For example, a CPU board failure in one node does not affect the availability of the other nodes.
o   But a failure in the interconnect hardware can still cause a cluster-wide failure, unless the interconnect path is fault-tolerant.
o   The path to storage must also be fault-tolerant.

·       Workload segregation
o   If you are planning to (or are able to) segregate the workload across different nodes, then RAC may be a good option.
o   Long-running reports that generate a huge amount of I/O (with a high percentage of single-block I/O) can pollute the buffer cache, causing performance issues for critical parts of the application.
o   For example, the separation of OLTP and reporting into separate instances.
o   Of course, you should also consider Active Data Guard for offloading the reporting activity.
·       Application affinity
o   Application-to-node affinity is a great way to improve performance in RAC databases.
o   For example, if you have three applications, say PO, FIN and SC, running in the same database, then consider node affinity.
o   Node affinity should also translate into segment-level affinity.
o   Say the application PO mostly accesses the *PO* tables and the application SC mostly accesses the *SC* tables; then node affinity might be helpful (a small client-side sketch follows this list).
·       To manage excessive redo generation
o   Each instance has its own redo thread and its own LGWR process.
o   If your application generates a huge amount of redo and a single-instance LGWR cannot handle the load, RAC might be a solution to effectively scale up LGWR throughput.
o   For example, by converting to a three-node cluster and balancing the application workload, you can increase LGWR throughput by roughly a factor of three.
o   Still, this does not solve the problem completely if the excessive redo generation comes in combination with excessive commits.
·       To avoid SMP bottlenecks
o   Access to memory and the system bus becomes a bottleneck in an SMP architecture.
o   Increasing the number of CPUs in an SMP architecture does not scale linearly.
o   Big-iron machines now use a NUMA architecture to alleviate this problem.
o   If you cannot afford big-iron machines to increase scalability, RAC is a good option to consider.
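
As a client-side illustration of the workload segregation and application affinity points above, here is a minimal sketch. It assumes each application has been given its own RAC service; the service names (po_svc, sc_svc, rpt_svc), the SCAN address and the credentials are hypothetical placeholders, not part of the original presentation.

# Sketch: per-application RAC services as a way to get node affinity.
# Service names, SCAN address and credentials are hypothetical.
import oracledb

SCAN = "rac-scan.example.com:1521"

# Each application connects through its own service, so its sessions land
# on the node(s) running that service and keep that node's buffer cache
# warm for its own tables.
po_pool  = oracledb.create_pool(user="po_app",  password="po_pwd",  dsn=SCAN + "/po_svc",  min=1, max=10, increment=1)
sc_pool  = oracledb.create_pool(user="sc_app",  password="sc_pwd",  dsn=SCAN + "/sc_svc",  min=1, max=10, increment=1)
rpt_pool = oracledb.create_pool(user="rpt_app", password="rpt_pwd", dsn=SCAN + "/rpt_svc", min=1, max=5,  increment=1)

with po_pool.acquire() as conn:
    with conn.cursor() as cur:
        cur.execute("select sys_context('userenv', 'instance_name') from dual")
        print("PO sessions run on instance:", cur.fetchone()[0])

On the server side the DBA would create these services and assign preferred and available instances to them, which is where the actual affinity is defined.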

And the not-so-good reasons are:

 Not-So-Good reasons
·       General performance improvements
o   However, when RAC is used in combination with connection pooling, and with enough extra nodes each running an extra database instance, it can help you achieve better performance (a small sketch of such a client-side connection pool follows this list).
·       To combat poor application design and coding practices
o   There is no hardware architecture that will compensate for bad coding and very bad programming practices. Learn how to code well.
·       RAC as a stand-alone disaster recovery solution
o   RAC + Data Guard is a good disaster recovery solution, but RAC alone is not a good DR solution.
·        To maximize use of hardware
o   A good inventory of what you will use your RAC implementation for will help you choose a good hardware architecture.
o   Fault-tolerant hardware is key to a successful RAC implementation.
o   Use a multipathing solution, so that losing a single path to the disks does not cost you access to the storage and reboot the server.
o   Multiple voting disks are a must. Use an odd number of voting disks; three is a good place to start.
o   Remember that a node must keep access to more than half of the voting disks to survive.
o   LGWR performance is critical for global cache performance.
o   A global cache transmission requires a log flush sync for current blocks and "busy" CR blocks.
·        Stretch cluster to enhance hardware usage
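
For reference, the connection pooling mentioned above and in the introduction looks roughly like this from the client side. This is a minimal sketch using the python-oracledb driver; the host, service name and credentials are hypothetical. Pooling saves logon overhead, but it does not add CPU, memory or I/O capacity, which is why pooling alone does not fix a saturated single server.

# Minimal sketch of client-side connection pooling with python-oracledb.
# Host, service name and credentials are hypothetical placeholders.
import oracledb

pool = oracledb.create_pool(
    user="app_user",
    password="app_pwd",
    dsn="dbserver1.example.com:1521/orcl",  # single-instance database
    min=2,
    max=20,
    increment=2,  # sessions are reused instead of re-created per request
)

def count_tables():
    # Acquiring from the pool avoids the cost of a fresh database logon,
    # but every session still competes for the same server's resources.
    with pool.acquire() as conn:
        with conn.cursor() as cur:
            cur.execute("select count(*) from user_tables")
            return cur.fetchone()[0]

print(count_tables())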


But are these the only good reasons and not-so-good reasons?

These days Oracle is trying to sell lots and lots of Exadata servers to customers, and there are good, valid reasons for that. With the Oracle Maximum Availability Architecture (MAA) on Exadata you can achieve the best performance available and take away a lot of the stress about how secure and safe your data is. Here is an example of Oracle MAA:

[Figure: Oracle MAA on Exadata, with a primary system, a standby system and a separate development/test system]

As said before, with Data Guard and the MAA architecture, RAC can be the preferred choice for you!

You can start with a quarter rack, extend that to a half rack and, when needed, a full rack, and grow up to 8 interconnected racks for a full Exadata installation, with the best performance and scalability available today.

The Exadata MAA architecture consists of the following major building blocks:
1. A production Exadata system (primary). The production system may consist of one or more interconnected Exadata Database Machines as needed to address performance and scale-out requirements for data warehouse, OLTP, or consolidated application environments.
2. A standby Exadata system that is a replica of the primary. Oracle Data Guard is used to maintain synchronized standby databases that are exact, physical replicas of the production databases hosted on the primary system. This provides optimal data protection and high availability if an unplanned outage makes the primary system unavailable. A standby Exadata system is most often located in a different data center or geography to provide disaster recovery (DR) by isolating the standby from primary site failures. Configuring the standby system with identical capacity as the primary also guarantees that performance service-level agreements can be met after a switchover or failover operation.

Note that Data Guard is able to support up to 30 standby databases in a single configuration. An increasing number of customers use this flexibility to deploy both a local Data Guard standby for HA and a remote Data Guard standby for DR. A local Data Guard standby database complements the internal HA features of Exadata Database Machine by providing an additional layer of HA should unexpected events or human error make the production database unavailable even though the primary site is still operational. Low network latency enables synchronous redo transport to a local standby, resulting in zero data loss if a failover is required. A local standby database is also useful for offloading backups from the primary database, for use as a test system, or for implementing planned maintenance in rolling fashion (e.g. database rolling upgrades). The close proximity of the local standby to the application tier also enables fast redirection of application clients to the new primary database at failover time. Following a failover or switchover to a local standby database, the remote standby database in such a configuration will recognize that a role transition has occurred and automatically begin receiving redo from the new primary database, maintaining disaster protection at all times.

While the term 'standby' is used to describe a database where Data Guard maintains synchronization with a primary database, standby databases are not idle while they are in the standby role. High return on investment is achieved by utilizing the standby database for purposes in addition to high availability, data protection, and disaster recovery. These include:

·      Active Data Guard enables users to move read-only queries, reporting, and fast incremental backups from the primary database, and run them on a physical standby database instead. This improves performance for all workloads by bringing the standby online as a production system in its own right. Active Data Guard also improves availability by performing automatic repair should a corrupt data block be detected at either the primary or standby database, transparent to the user.

·      Data Guard Snapshot Standby enables standby databases on the secondary system to be used for final pre-production testing while they also provide disaster protection. Oracle Real Application Testing can be used in conjunction with Snapshot Standby to capture actual production workload on the primary and replay on the standby database. This creates the ideal test scenario, a replica of the production system that uses real production workload – enabling thorough testing at production scale.
·      Oracle Patch Assurance using Data Guard standby-first patching (My Oracle Support Note 1265700.1) or Data Guard Database Rolling Upgrades are two methods of reducing downtime and risk during periods of planned maintenance. This is a key element of Exadata MAA Operational Best Practices discussed later in this paper.

3. A development/test Exadata system that is independent of the primary and standby Exadata systems. This system will host a number of development/test databases used to support production applications. The test system may even have its own standby system to create a test configuration that is a complete mirror of production. Ideally the test system is configured similar to the production system to enable:

·      Use of a workload framework (e.g. Real Application Testing) that can mimic the production workload.
·      Validation of changes in the test environment - including evaluating the impact of the change and the fallback procedure - before introducing any change to the production environment.
·      Validation of operational and recovery best practices.

Some users will try to reduce cost by consolidating these activities on their standby Exadata Database Machine. This is a business decision that represents a trade-off between cost and operational simplicity/flexibility. In the case where the standby Exadata Database Machine is also used to host other development and test databases, additional measures may be required at failover time to conserve system resources for production needs. For example, non-critical test and development activities may have to be deferred until the failed system is repaired and back in production.
The Exadata MAA architecture provides the foundation needed to achieve high availability.
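
To make the "fast redirection of application clients to the new primary database" mentioned above a bit more concrete, here is a minimal, hypothetical sketch: the client uses a connect descriptor that lists both the primary and the standby addresses and asks for a role-based service that only runs on the database currently holding the primary role, so the same code keeps working after a Data Guard switchover or failover. The hosts, port and the service name sales_rw are assumptions.

# Sketch: one connect descriptor covering both the primary and the standby.
# Hosts, port and the role-based service name (sales_rw) are hypothetical.
import oracledb

DSN = (
    "(DESCRIPTION="
    "(CONNECT_TIMEOUT=5)(RETRY_COUNT=3)"
    "(ADDRESS_LIST=(FAILOVER=on)(LOAD_BALANCE=off)"
    "(ADDRESS=(PROTOCOL=tcp)(HOST=primary-scan.example.com)(PORT=1521))"
    "(ADDRESS=(PROTOCOL=tcp)(HOST=standby-scan.example.com)(PORT=1521)))"
    "(CONNECT_DATA=(SERVICE_NAME=sales_rw)))"
)

# The sales_rw service is assumed to be started only on the database that
# currently has the primary role, so the address list simply finds it
# wherever it runs after a switchover or failover.
conn = oracledb.connect(user="app_user", password="app_pwd", dsn=DSN)
with conn.cursor() as cur:
    cur.execute("select database_role from v$database")
    print(cur.fetchone()[0])  # expected: PRIMARY
conn.close()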



