Global Cache and Global Enqueue Service Processes and Functions
Cache Fusion uses the most efficient communication possible to limit the amount of traffic on the interconnect. You don't need this level of detail to administer a RAC environment, but it certainly helps to understand how RAC works when you are trying to diagnose problems. RAC appears to have one large buffer cache, but this is not the case: in reality the buffer caches of each node remain separate, and data blocks are shared through distributed locking and messaging operations. RAC copies data blocks across the interconnect to other instances because this is more efficient than reading them from disk; memory and networking together are faster than disk I/O.
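To make the "separate caches, shared blocks" idea concrete, here is a toy Python model. It is purely illustrative: the names Instance and read_block are hypothetical, and real RAC asks the resource master through GCS instead of scanning every peer. It only shows the lookup order the paragraph implies: local cache first, then another instance over the interconnect, and disk only as a last resort.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One RAC instance with its own, private buffer cache (toy model)."""
    name: str
    cache: dict[str, str] = field(default_factory=dict)


def read_block(reader: Instance, peers: list[Instance],
               disk: dict[str, str], block_id: str) -> tuple[str, str]:
    """Return (contents, source), where source is 'local', 'interconnect' or 'disk'."""
    # 1. Local buffer cache: no messaging needed.
    if block_id in reader.cache:
        return reader.cache[block_id], "local"
    # 2. Another instance's cache: ship a copy over the (simulated) interconnect.
    for peer in peers:
        if peer is not reader and block_id in peer.cache:
            reader.cache[block_id] = peer.cache[block_id]
            return reader.cache[block_id], "interconnect"
    # 3. No instance has it cached: fall back to disk I/O.
    reader.cache[block_id] = disk[block_id]
    return reader.cache[block_id], "disk"


a, b = Instance("A"), Instance("B")
disk = {"BL": "row data"}
print(read_block(a, [a, b], disk, "BL"))  # ('row data', 'disk')
print(read_block(b, [a, b], disk, "BL"))  # ('row data', 'interconnect')
print(read_block(b, [a, b], disk, "BL"))  # ('row data', 'local')
```

Each instance keeps its own copy after the transfer, which mirrors the point that the caches stay separate even though blocks move between them.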
Ping
The transfer of a data block from one instance's buffer cache to another instance's buffer cache is known as a ping. As mentioned already, when an instance requires a data block it sends a request to the lock master to obtain a lock in the desired mode; if another instance holds the lock in a conflicting mode, the master notifies that holder with a blocking asynchronous trap (BAST). When an instance receives a BAST it downgrades the lock as soon as possible, but it might first have to write the corresponding block to disk; this operation is known as a disk ping or hard ping. Disk pings have been reduced in later versions of RAC, which rely more on block transfers, but there will always be a small amount of disk pinging. In the newer versions of RAC, when a BAST is received, sending the block or downgrading the lock may be deferred by tens of milliseconds. This extra time allows the holding instance to complete an active transaction and mark the block header appropriately, which eliminates any need for the receiving instance to check the status of the transaction immediately after receiving or reading the block. Checking the status of a transaction is an expensive operation that may require access to (and pinging of) the related undo segment header and undo data blocks as well. The parameter _gc_defer_time can be used to define how long an instance defers downgrading a lock.
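The request/BAST/downgrade exchange can be sketched as a small simulation. This is a minimal sketch under loud assumptions: Holder, Master and GC_DEFER_MS are hypothetical names, GC_DEFER_MS merely stands in for _gc_defer_time with an arbitrary value (not Oracle's default), only exclusive requests are modelled, and the transaction is assumed to finish within the deferral window.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Stand-in for _gc_defer_time; the value is an arbitrary assumption, not Oracle's default.
GC_DEFER_MS = 30


@dataclass
class Holder:
    name: str
    mode: str                # "X" (exclusive), "S" (shared) or "N" (null)
    active_txn: bool = False
    deferred_ms: int = 0     # time spent deferring the downgrade

    def on_bast(self) -> None:
        """React to a blocking asynchronous trap by downgrading to null mode."""
        if self.active_txn:
            # Defer briefly so the transaction can finish and mark the block header.
            self.deferred_ms += GC_DEFER_MS
            self.active_txn = False  # toy assumption: it completes within the window
        self.mode = "N"


class Master:
    """Toy lock master that tracks a single exclusive owner per block."""

    def __init__(self) -> None:
        self.owner: Dict[str, Holder] = {}

    def request_exclusive(self, block_id: str, requester: str) -> Holder:
        current: Optional[Holder] = self.owner.get(block_id)
        if current is not None and current.name != requester and current.mode != "N":
            current.on_bast()  # the conflicting holder is sent a BAST and downgrades
        granted = Holder(requester, "X")
        self.owner[block_id] = granted
        return granted


master = Master()
a = master.request_exclusive("BL", "A")
a.active_txn = True                      # A is still in the middle of a transaction
b = master.request_exclusive("BL", "B")
print(a.mode, a.deferred_ms, b.mode)     # N 30 X
```

The block transfer itself is not modelled; the point is only that the holder with an active transaction pays a short deferral before giving up the lock, while an idle holder downgrades immediately.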
Past Image Blocks (PI)
Past images (PIs) are copies of data blocks in the local buffer cache of an instance. When an instance sends a block it has recently modified to another instance, it preserves a copy of that block, marking it as a PI. The PI is kept until that block is written to disk by the current owner of the block. When the block is written to disk and is known to have a global role, indicating the presence of PIs in other instances' buffer caches, GCS informs the instances holding the PIs to discard them. When an instance needs to perform a checkpoint, it informs GCS of the write requirement; GCS is responsible for finding the most current block image and informing the instance holding that image to perform a block write. GCS then informs all holders of the global resource that they can release the buffers holding the PI copies of the block, allowing the global resource to be released. You can view the past image blocks present in the fixed table X$BH.
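The PI bookkeeping described above can be sketched as a tiny tracker for a single block. PastImageTracker and its methods are hypothetical names for illustration only; the actual disk write is not modelled, and the sketch assumes the block never travels back to an instance that already holds a PI.

```python
from typing import Set


class PastImageTracker:
    """Toy GCS bookkeeping for one block: who owns the current image, who holds PIs."""

    def __init__(self, owner: str) -> None:
        self.current_owner = owner
        self.pi_holders: Set[str] = set()

    def ship_dirty_block(self, sender: str, receiver: str) -> None:
        # Only the current owner can ship the current (modified) image.
        if sender != self.current_owner:
            raise ValueError(f"{sender} does not own the current image")
        self.pi_holders.add(sender)      # the sender keeps a PI copy
        self.current_owner = receiver

    def checkpoint(self) -> str:
        """Pick the owner of the most current image as writer, then release all PIs."""
        writer = self.current_owner      # this instance writes the block to disk
        self.pi_holders.clear()          # GCS tells PI holders to discard their copies
        return writer


t = PastImageTracker("A")
t.ship_dirty_block("A", "B")             # A modified the block and sends it to B
t.ship_dirty_block("B", "C")             # B modified it again and sends it to C
print(sorted(t.pi_holders))              # ['A', 'B']
print(t.checkpoint(), t.pi_holders)      # C set()
```

Only the instance with the most current image writes; the older PIs are simply discarded, which is why PIs never need to be written themselves in this flow.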
Cache Fusion I
Cache Fusion I, also known as the consistent read server, was introduced in Oracle 8.1.5. It keeps a list of recent transactions that have changed a block. The original data contained in the block is preserved in the undo segment, which can be used to provide consistent read versions of the block.
In a single instance, the following happens when reading a block:
· When a reader reads a recently modified block, it might find an active transaction in the block.
· The reader will need to read the undo segment header to decide whether the transaction has been committed or not.
· If the transaction is not committed, the process creates a consistent read (CR) version of the block in the buffer cache using the data in the block and the data stored in the undo segment.
· If the undo segment shows the transaction is committed, the process has to revisit the block, clean it out (delayed block cleanout) and generate the redo for the changes.
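The single-instance decision procedure in the list above can be written as one function. This is a simplified sketch: Block and consistent_read are hypothetical names, the undo segment header is reduced to a set of committed transaction ids, the "CR version" is just the pre-change contents stored in undo, and redo generation during cleanout is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Block:
    data: str                          # current contents of the block
    active_txn: Optional[int] = None   # transaction id left in the block header


def consistent_read(block: Block, committed: Set[int], undo: Dict[int, str]) -> str:
    """Single-instance read of a possibly recently modified block (toy model)."""
    if block.active_txn is None:
        return block.data                # no transaction in the header: read as-is
    txn = block.active_txn
    # Stand-in for reading the undo segment header to check commit status.
    if txn in committed:
        # Delayed block cleanout: clear the stale entry (redo generation not modelled).
        block.active_txn = None
        return block.data
    # Not committed: build a CR version from the undo data
    # (here simply the pre-change contents recorded for that transaction).
    return undo[txn]


undo = {7: "old value"}
blk = Block("new value", active_txn=7)
print(consistent_read(blk, set(), undo))   # old value  (transaction 7 still active)
print(consistent_read(blk, {7}, undo))     # new value  (committed, block cleaned out)
print(blk.active_txn)                      # None
```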
In a RAC environment, if the process reading the block is on an instance other than the one that modified the block, the reader will have to read the following blocks from disk:
· the data block, to get the data and/or the transaction ID and Undo Byte Address (UBA)
· the undo segment header block, to find the last undo block used for the entire transaction
· the undo data block, to get the actual record needed to construct a CR image
Before these blocks can be read, the instance that modified the block will have to write them to disk, resulting in six I/O operations (three writes by the modifying instance and three reads by the reader). In RAC, the modifying instance can instead construct a CR copy, ideally from the above blocks while they are still in memory, and send the CR copy over the interconnect, thus avoiding those six I/O operations.
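The six-I/O figure is simple arithmetic over the three blocks listed above; the sketch below just makes it explicit. The function name disk_io_for_cr is hypothetical, and the Cache Fusion branch assumes all three blocks are still in the holder's cache.

```python
# The three blocks the reading instance needs, per the list above.
CR_BLOCKS = ("data block", "undo segment header block", "undo data block")


def disk_io_for_cr(use_cache_fusion: bool) -> int:
    """Disk I/Os needed before a remote reader can build a CR image (toy count)."""
    if use_cache_fusion:
        # Assumes all three blocks are cached at the holder: the CR copy is built
        # in memory and shipped over the interconnect, so no disk I/O is needed.
        return 0
    writes = len(CR_BLOCKS)  # the modifying instance writes each block to disk first
    reads = len(CR_BLOCKS)   # the reader then reads each block back
    return writes + reads


print(disk_io_for_cr(False), disk_io_for_cr(True))  # 6 0
```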
Starting with Oracle 8, a background process called the Block Server Process (BSP) builds the CR version of the block in the holder's cache and ships it across the interconnect; the sequence is detailed in the table below.
While making a CR copy, the holding instance may refuse to do so if:
· any of the blocks it needs are not in its buffer cache; it will not perform a disk read to make a CR copy for another instance
· it is repeatedly asked to send a CR copy of the same block; after sending CR copies four times, it will voluntarily relinquish the lock, write the block to disk and let other instances get the block from disk. The number of copies it will serve before doing so is governed by the parameter _fairness_threshold.
Cache Fusion II
Read/write contention was addressed in Cache Fusion I; Cache Fusion II addresses write/write contention.
Cache Fusion in Operation
A quick recap of GCS: a GCS resource can be local or global. If it is local, it can be acted upon without consulting other instances; if it is global, it cannot be acted upon without consulting or informing remote instances. GCS is used as a messaging agent to coordinate manipulation of a global resource. By default all resources are in NULL mode (remember that null mode is used when converting from one mode to another, shared or exclusive).
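The recap names the three lock modes but not which ones can coexist. The sketch below encodes the standard lock-manager compatibility rule generally documented for GCS modes (null is compatible with every mode, shared with null and shared, exclusive only with null); can_grant is a hypothetical helper, not an Oracle API.

```python
from typing import List

# Lock-mode compatibility: N with everything, S with N and S, X only with N.
COMPATIBLE = {
    "N": {"N", "S", "X"},
    "S": {"N", "S"},
    "X": {"N"},
}


def can_grant(requested: str, held_elsewhere: List[str]) -> bool:
    """True if `requested` is compatible with every mode held by other instances."""
    return all(held in COMPATIBLE[requested] for held in held_elsewhere)


print(can_grant("S", ["S", "N"]))  # True
print(can_grant("X", ["S"]))       # False
print(can_grant("X", ["N", "N"]))  # True
```

Because null is compatible with everything, an instance can keep a resource in N mode as a placeholder and later convert it to S or X, which is why null mode is the starting point for conversions.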
The table below shows the different states of a resource:

Mode/Role     | Local | Global
Null (N)      | NL    | NG
Shared (S)    | SL    | SG
Exclusive (X) | XL    | XG
States

State | Description
SL    | It can serve a copy of the block to other instances and it can read the block from disk; since the block is not modified, there is no need to write it to disk.
XL    | It has sole ownership of and interest in the resource, and the exclusive right to modify the block. All changes to the block are in the local buffer cache, and it can write the block to disk. If another instance wants the block, it has to come via the GCS.
NL    | Used to protect consistent read blocks. If an instance wants the block in X mode, the current instance sends the block to the requesting instance and downgrades its role to NL.
SG    | The block is present in one or more instances; an instance can read the block from disk and serve it to other instances.
XG    | The block can have one or more PIs. The instance with the XG role has the latest copy of the block and is the most likely candidate to write the block to disk. GCS can ask the instance to write the block and serve it to other instances.
NG    | After discarding its PIs when instructed to by GCS, the instance keeps the block in the buffer cache with the NG role; this copy serves only as a CR copy of the block.
Below are a number of common scenarios to help you understand the following:
· reading from disk
· reading from cache
· getting the block from cache for update
· performing an update on a block
· performing an update on the same block
· reading a block that was globally dirty
· performing a rollback on a previously updated block
· reading the block after commit
We will assume the following:
· A four-instance RAC environment (instances A, B, C and D)
· Instance D is the master of the lock resource for the data block BL
· We will use only one block, and it will reside at SCN 987654
· We will use a three-letter code for the lock states:
  o the first letter indicates the lock mode: N = Null, S = Shared, X = Exclusive
  o the second letter indicates the lock role: G = Global, L = Local
  o the third letter indicates the PIs: 0 = no PIs, 1 = a PI of the block
For example, a code of SL0 means a shared lock with a local role and no past images (PIs).
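The three-letter notation can be decoded mechanically, which is a handy way to check that a code such as SL0 really means "shared, local, no PI". decode_lock_state is a hypothetical helper written only to illustrate the convention defined above.

```python
from typing import Tuple

MODES = {"N": "null", "S": "shared", "X": "exclusive"}
ROLES = {"L": "local", "G": "global"}


def decode_lock_state(code: str) -> Tuple[str, str, bool]:
    """Decode e.g. 'XG1' into (mode, role, has_past_image)."""
    if len(code) != 3:
        raise ValueError(f"expected three letters, got {code!r}")
    mode, role, pi = code
    if mode not in MODES or role not in ROLES or pi not in ("0", "1"):
        raise ValueError(f"invalid lock state code: {code!r}")
    return MODES[mode], ROLES[role], pi == "1"


print(decode_lock_state("SL0"))  # ('shared', 'local', False)
print(decode_lock_state("XG1"))  # ('exclusive', 'global', True)
```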
The sequence of events for these scenarios can be seen in the table below.
(Final figure: overview of all RAC processes; all rights Julian Dyke.)