
My Works

I am an Oracle database consultant. My areas of interest are high availability, infrastructure consulting and performance tuning. I post on topics of interest related to these technology areas. Do post your comments as well.

Monday 11 August 2008

In Memory Undo (IMU)

In Memory Undo (IMU) was introduced in Oracle 10g. As experienced DBAs we would all appreciate that, however much Oracle's undo management has improved over the years, the topic still takes a beating the moment we start comparing Oracle with other database offerings such as IBM DB2 or Informix. Undo segment management carries a lot of hidden overhead in terms of memory, CPU and I/O. The next step is In Memory Undo (IMU), which improves performance by reducing that overhead.
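For a quick look at how much work IMU is actually doing on an instance, the IMU-related counters in V$SYSSTAT can be examined. A minimal sketch (statistic names as they appear in 10g, where IMU is understood to be controlled by the hidden parameter _in_memory_undo):

-- non-zero 'IMU commits' indicates transactions whose undo was handled in memory
-- rather than being written immediately into an undo segment
select name, value
  from v$sysstat
 where name like 'IMU%'
 order by name;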

I went through a very nice document by Craig Shallahamer which shares hands-on experience with IMU:

http://www.trutek.com/uploads/docs/In%20Memory%20Undo%201g_1.pdf

Hopefully we will see more responses from IBM building on the paper below:

ftp://ftp.software.ibm.com/software/data/pubs/papers/readconsistency.pdf

Friday 18 July 2008

Oracle Clusterware I

Oracle 10g Real Application Clusters has redefined High Availability (HA) architecture by extending a solid framework through a complete integrated clustering and volume management solution on all supported platforms. I am trying to provide a technical overview of the 10g Clusterware and its usage when compared to the third party Clusterware products in the market.

CLUSTERED DATABASE

Oracle Real Application Clusters (RAC) allows multiple computers to run the Oracle RDBMS software simultaneously while accessing a single database. This is called a clustered database. In a non-RAC Oracle database, a single database is accessed by a single instance. The database is considered the collection of data files, control files, and redo log files located on a shared disk subsystem. The instance is considered the collection of Oracle-related memory and operating system processes that are running on the computer.
In Oracle RAC, two or more computers (each with an instance) concurrently access a single database. This allows an application or user to connect to either computer and have access to the same data.
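As a simple illustration of this, every open instance registers itself in GV$INSTANCE, so a quick sketch like the one below (run from any one instance of the cluster) lists all the computers currently serving the same database:

-- one row per running instance, i.e. per node, of the clustered database
select inst_id, instance_name, host_name, status
  from gv$instance
 order by inst_id;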

Now, having described a cluster, there must be a mechanism in place that monitors and manages the multiple computers forming the cluster. The piece of software providing this functionality is known as the Cluster Manager (CM), or sometimes simply the Clusterware. The CM is primarily responsible for maintaining information about the nodes in the system. Starting with Oracle 10g RAC, Oracle Corporation provides its own Clusterware, known as Oracle Clusterware, which not only manages the interconnected computers but is also integrated with the High Availability (HA) functionality of the application, i.e. the RDBMS software. Prior to Oracle 10g RAC, vendor-supplied Clusterware such as Veritas Storage Foundation for RAC, Sun Cluster or HP Serviceguard had to be relied upon, and these can still be used with 10g RAC for managing the cluster. This post examines the usage of vendor-supplied Clusterware against Oracle Clusterware for managing the cluster.

CLUSTERWARE AND SPLIT BRAIN

Cluster nodes have to communicate with each other to check whether all the participating computers are alive and part of the cluster. For this they use the private interconnect, which is ideally a redundant network dedicated to inter-node communication, and this membership information has to be captured by the Clusterware.
Now imagine a situation where all inter-node communication fails, but the nodes are still running without any knowledge of each other.
In that scenario each node thinks the other one is dead and that it should take control of the application. This condition is known as split brain; it is a cluster phenomenon and not RAC specific. It would end with the nodes independently accessing the shared disk subsystem, because neither knows the other is doing the same thing, and that could lead to data corruption. It should be noted that losing communications between cluster nodes does not by itself constitute a split brain. A split brain means cluster membership is threatened in such a way that multiple nodes try to access the same exclusive resources, which results in data corruption. The goal is to minimize the chance of a node taking over an exclusive resource while another still has it active, yet accommodate a node powering off.
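Since the health of the private interconnect is central to avoiding a split brain, it is worth confirming which network each instance has actually registered for cluster traffic. A minimal sketch, assuming a 10g RAC database and a session with access to the GV$ views:

-- one row per instance; NAME/IP_ADDRESS show the interface used for the interconnect,
-- SOURCE shows how it was picked (OS dependent, cluster_interconnects parameter, etc.)
select inst_id, name, ip_address, is_public, source
  from gv$cluster_interconnects
 order by inst_id;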

To prevent this kind of situation the Clusterware operates in the following manner.
When a catastrophe such as failure of the private interconnects leads to the formation of sub-clusters, only one of the sub-clusters is allowed to continue cluster operation and the rest are evicted. The Clusterware implements this using a voting (quorum) disk: the participating nodes write to this disk continually, and if a node fails to write within a specified timeout interval it is evicted from the cluster. The implementation of the quorum disk varies from vendor to vendor, but the basic concept remains the same. The quorum disk cannot store any data belonging to the application, i.e. the user data.

Consider a scenario where only one node experiences problems on all interconnect links and can no longer receive heartbeats from the other node. With this information, that node would go through a reconfiguration, decide to evict the other node and take control of the cluster. However, if the other node was still receiving heartbeats from this node, it would neither reconfigure nor check whether it could take control of the cluster; at that point it could well continue accessing the shared disks and issuing I/O. Once a node has decided to take control of the cluster and access the data, it must therefore make sure that the other node is unable to access any of the shared data.

The Clusterware makes sure that as soon as the 'unaware' node tries to access the data on one of the shared disks, it is kicked out to prevent any data corruption; the other node then remains in the cluster, fully operational. Simple reconfiguration of the node membership does not guarantee data protection: if a node hangs or is suspended and then comes back to life, it could corrupt data before the cluster manager determines that the node was supposed to be dead. The Clusterware handles this by providing full data protection at the data disk level; the mechanism used for this is called disk fencing or I/O fencing. This is a key aspect of a Clusterware product, as it ensures data consistency.

Tuesday 13 November 2007

Logical I/O and Row Prefetching

Logical I/O is the sum of current reads and consistent reads. Most of these can be satisfied from the buffer cache, and any that cannot will result in a physical read.

A current read is a read of the current version of a block; it can be used to reconstruct consistent versions of that block, and only one current version of a block can exist in the buffer cache at any point in time. A consistent read is potentially a read of a historic version of the block, which can also be used to build read-consistent versions of it. Oracle maintains multiple such versions in the buffer cache; these are not dirty (updated) blocks, whereas blocks obtained through current reads can be dirty.

What we are usually interested in is logical reads per session or per query, which is the sum of db block gets (current reads) and consistent gets (consistent reads).

In the real world we access data through a client interface such as SQL*Plus, JDBC or OCI. When a query returns many rows, they are not all returned in a single network round trip; multiple round trips are needed, which means the session ends up re-reading some of the same blocks. The number of rows fetched per round trip depends on the prefetch size of the respective interface. A small prefetch size increases the number of round trips and hence degrades performance. This does not mean we should opt for a much higher prefetch size: there is always a trade-off, as a very large value means bigger transfers and more memory per round trip and can itself have a negative impact on performance. The best approach is to benchmark the optimal prefetch size for each interface. Listed below are the mechanisms/calls used to set the prefetch size, followed by a short PL/SQL sketch.


OCI ==> OCI_ATTR_PREFETCH_ROWS
Pro*C ==> Host Array
JDBC ==> setRowPrefetch ()
PL/SQL ==> BULK COLLECT
SQL*Plus ==> SET ARRAYSIZE
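As a sketch of the PL/SQL route, BULK COLLECT with the LIMIT clause fetches rows in batches, which is the same idea as the prefetch/array size in the other interfaces; the cursor and the limit of 100 here are purely illustrative:

declare
  type num_tab is table of number index by pls_integer;
  l_ids num_tab;
  cursor c is select object_id from all_objects;   -- any multi-row query
begin
  open c;
  loop
    fetch c bulk collect into l_ids limit 100;     -- up to 100 rows per fetch
    exit when l_ids.count = 0;
    null;                                          -- process the batch here
  end loop;
  close c;
end;
/

In SQL*Plus the equivalent knob is simply SET ARRAYSIZE 100 before running the query.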

Here is a reference to another blog which explains in great detail, with examples, how to calculate consistent gets and what the impact is when using SQL*Plus.

http://viveklsharma.blogspot.com/2007/11/consistent-gets-myth.html
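For a quick look at your own session's figures while experimenting, the statistics mentioned above can be read from V$MYSTAT; a small sketch:

-- db block gets + consistent gets = session logical reads for the current session
select sn.name, ms.value
  from v$mystat ms, v$statname sn
 where ms.statistic# = sn.statistic#
   and sn.name in ('db block gets', 'consistent gets', 'session logical reads');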



Below is another link which talks about the prefetch size and its impact when accessing the result set through a JDBC connection.

http://www.oracle.com/technology/sample_code/tech/java/sqlj_jdbc/files/advanced/RowPrefetchSample/Readme.html


If an object-relational mapping tool such as TopLink or Hibernate is used, then the prefetch-size setting and hibernate.jdbc.fetch_size respectively may be used.

Monday 12 November 2007

Indexing Strategy and Column Order - III

We discussed a neat scenario where a skip scan could be used because of the data distribution in the columns concerned. Let us now try a different distribution that creates 256 distinct values for the first column, expanding on the point made at the close of the previous post.
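For reference, the table and index are assumed to have the shape below, inferred from the inserts and execution plans that follow; the earlier posts carry the actual DDL, so treat this only as a sketch. The TRUNCATE is there in case the table still holds the earlier distribution.

create table skip (a varchar2(1), b number);
create index idx_skip on skip (a, b);
-- clear out rows left over from the earlier posts before loading the new distribution
truncate table skip;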

begin
for i in 1 .. 30000
loop
insert into SKIP values(chr(mod(i,256)), i);
end loop;
commit;
end;
/



begin
for i in 30001 .. 60000
loop
insert into SKIP values(chr(mod(i,256)), i);
end loop;
commit;
end;
/




begin
for i in 60001 .. 90000
loop
insert into SKIP values(chr(mod(i,256)), i);
end loop;
commit;
end;
/


begin
for i in 90001 .. 100000
loop
insert into SKIP values(chr(mod(i,256)), i);
end loop;
commit;
end;
/


TEST>select * from SKIP where b=60002;


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=18 Card=1 Bytes=6)
   1    0   TABLE ACCESS (FULL) OF 'SKIP' (Cost=18 Card=1 Bytes=6)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
172 consistent gets
0 physical reads
0 redo size
233 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed



This went for a full scan rather than a skip scan because, with 256 distinct values in the leading column, the optimizer would have had to probe 256 logical sub-indexes had a skip scan been chosen.


Let us now check the consistent gets if we force a skip scan.


TEST>select /*+ INDEX_SS(SKIP IDX_SKIP) */ * from SKIP where b=60002;


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=257 Card=1 Bytes=6)
   1    0   INDEX (SKIP SCAN) OF 'IDX_SKIP' (NON-UNIQUE) (Cost=257 Card=1 Bytes=6)



Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
268 consistent gets
0 physical reads
0 redo size
234 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


We can see that both the cost and the consistent gets are on the higher side.


Given the above data distribution, it would certainly be beneficial to create an additional index on B. This demonstrates what we discussed in the previous post.

TEST>create index idx_new on skip(b);

Index created.

TEST>exec dbms_stats.gather_table_stats('SKIP_TEST','SKIP',CASCADE=>TRUE);

PL/SQL procedure successfully completed.

TEST>SET AUTOTRACE TRACEONLY EXPLAIN statistics

TEST>select * from SKIP where b=60001;


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=6)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'SKIP' (Cost=2 Card=1 Bytes=6)
   2    1     INDEX (RANGE SCAN) OF 'IDX_NEW' (NON-UNIQUE) (Cost=1 Card=1)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
233 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


Now, what if we change the index column order?

TEST>drop index IDX_SKIP;

Index dropped.

TEST>drop index IDX_NEW;

Index dropped.

TEST>CREATE INDEX IDX_REV ON SKIP (B,A);

Index created.

TEST>exec dbms_stats.gather_table_stats('SKIP_TEST','SKIP',CASCADE=>TRUE);

PL/SQL procedure successfully completed.

TEST>SET AUTOTRACE TRACEONLY EXPLAIN statistics
TEST>select * from SKIP where b=60001;


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=6)
   1    0   INDEX (RANGE SCAN) OF 'IDX_REV' (NON-UNIQUE) (Cost=2 Card=1 Bytes=6)






Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
233 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed



TEST>select b from skip where a='z' ;

391 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=18 Card=391 Bytes=2346)
   1    0   TABLE ACCESS (FULL) OF 'SKIP' (Cost=18 Card=391 Bytes=2346)






Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
198 consistent gets
0 physical reads
0 redo size
3836 bytes sent via SQL*Net to client
458 bytes received via SQL*Net from client
28 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
391 rows processed

TEST>select /*+ index_ffs(SKIP IDX_REV)*/ b from skip where a='z' ;


391 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=26 Card=391 Bytes=2346)
   1    0   INDEX (FAST FULL SCAN) OF 'IDX_REV' (NON-UNIQUE) (Cost=26 Card=391 Bytes=2346)



Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
284 consistent gets
0 physical reads
0 redo size
3837 bytes sent via SQL*Net to client
458 bytes received via SQL*Net from client
28 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
391 rows processed

This shows that the column order of the index definitely has an impact on the query. To be certain about what sort of indexing strategy we need, we would have to revisit the queries executed, their number of executions and their predicate values.
If the above query (where b=60001) is executed only rarely, we do not really need to care about the additional index or about reversing the column order. But if the query pattern were (where b=:1 and a=:2, where b=:1, where a=:2) we would need the index in the order (B,A).

I am closing the discussion with an excellent explanation from Tom Kyte for a typical scenario which was discussed on Ask Tom.

For a query that references all of A, B, C the above three indexes will perform more or less the same -- regardless of the selectivity of A, B, or C.

Does it mean that queries:

a) select /*+ index (x index_abc) */ * from x
where a = 1
and c between 1 and 7
and b > 2

b) select /*+ index (x index_cba) */ * from x
where a = 1
and c between 1 and 7
and b > 2

will have the same performance regardless of the selectivity of the A, B and C fields?

-----------------------------------------------------------------------------------
It'll depend -- but in general, for most cases, they would be the same.

In your case -- if the following was true:


there is
one row for b such that b > 2 (and hence only one index entry)
and c was equal to 2 for that row
and a = 1

and at the same time there is
1,000,000 rows such that a=1
and c was equal to 2 for all of those rows
and one of those rows had b > 2


an index on B,C,A or B,A,C would probably be better than one on A,C,B as the index on B,C,A would be able to stop "faster"


(but did that REALLY have to do with the selectivity of B over C??? NO because this is also true:

there are 5 million other rows with b = 1
each of those 5 million rows have unique values for A
each of those 5 million rows have unique values for C (and all of them could be between 1 and 7 still)

So here -- B is not selective at all but A and C are.

Indexing Strategy and Column Order II

Having discussed the index skip scan, let us explore what other alternatives the optimizer had to choose from and why the skip scan was considered the best. Does that have an influence on the column order? Let's explore.

The index skip scan can be disabled using an undocumented parameter. Let us now see what plan the optimizer chooses if we disable it.

The table and index details are the same as that was used earlier.

TEST>alter session set "_optimizer_skip_scan_enabled"= false;

Session altered.

TEST>select * from SKIP where b=60002;


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=22 Card=1 Bytes=6)
   1    0   TABLE ACCESS (FULL) OF 'SKIP' (Cost=22 Card=1 Bytes=6)







Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
222 consistent gets
0 physical reads
0 redo size
250 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed



The above shows that the optimizer has chosen a full scan of the table; also note the consistent gets.



What if we force an INDEX FAST FULL SCAN?

TEST>select /*+ index_ffs(skip idx_skip) */ * from SKIP where b=60001;


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=30 Card=1 Bytes=6)
   1    0   INDEX (FAST FULL SCAN) OF 'IDX_SKIP' (NON-UNIQUE) (Cost=30 Card=1 Bytes=6)






Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
302 consistent gets
0 physical reads
0 redo size
250 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed



The full scan of the index is favoured only when the hint is specified, so the optimizer did the right thing here, as expected. This can be verified by looking at the consistent gets: the number of blocks scanned by the full table scan is lower than the number of blocks scanned by the index fast full scan.

Note that consistent gets is the total number of block visits, including re-reads of the same blocks across fetch calls, and not simply the total number of table blocks or index leaf blocks. The consistent gets therefore depend on the array size, the number of rows to be fetched and the total number of blocks. If your array size is too low it can have a significant impact on performance.
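A rough way to see this on the same SKIP table is to repeat a multi-row fetch with two different array sizes; a sketch (the exact figures will vary with your data and block size):

set autotrace traceonly statistics

-- small array size: many round trips, so many blocks are re-visited
set arraysize 15
select * from skip;

-- larger array size: far fewer round trips, and the consistent gets drop noticeably
set arraysize 500
select * from skip;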



Now enabling the SKIP SCAN

TEST>alter session set "_optimizer_skip_scan_enabled"= true;

Session altered.

TEST>select * from SKIP where b=60002;


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=1 Bytes=6)
   1    0   INDEX (SKIP SCAN) OF 'IDX_SKIP' (NON-UNIQUE) (Cost=5 Card=1 Bytes=6)






Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
9 consistent gets
0 physical reads
0 redo size
249 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed



The above run is with skip scan enabled; with only 4 logical sub-indexes it is better to skip scan. This is the default behaviour and Oracle has done the right thing here, because the index acts as a thinner version of the table and only a smaller number of blocks has to be scanned.

But this may not be true in all cases; let us see why. Based on the above observations, consider the following setup.

Suppose we have

x varchar2(1) = 'M', 'F'
y number = 10,20,30,40,50
z date = 1 million distinct values


we have a query which satisfies

x = 'M' and y = 50 and z > sysdate

we would want to create the index in the order X,Y,Z or Y,X,Z because we can then range-scan on Z. It would be most efficient to go to the place in the index where X='M', Y=50, Z > sysdate and simply scan from there. We can guarantee that we start from there, rather than merely reach there eventually, by creating the index in the order specified above.

If Z were first, we would read lots of X='F' values or Y != 50 values while scanning Z > sysdate, and only then would we reach X='M', Y=50.
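As a sketch (the table and column names here are hypothetical, matching the description above), the two orderings would look like this:

-- preferred: equality columns first, the range column last
create index t_xyz_idx on t (x, y, z);
-- with z first, the range predicate forces a scan across a large slice of the index
create index t_zxy_idx on t (z, x, y);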

This tells us that the number of distinct values in a column is not, by itself, what decides its position in a concatenated index. An understanding of the values in the columns and of the queries is what matters when deciding on an index and its column order.

This also tells us that the skip scan reduces the number of indexes that need to be created to satisfy all of the combinations one could use in the where clause, but the case we discussed above is a very specific one where the leading-edge column is of very low cardinality.

The benefit of the skip scan is that you can avoid having several indexes on the same columns, saving you update time and a bit of disk space.

In general if we have a query with the following patterns

where x = :1 and y = :2 and z = :3
where z = :3
where y = :2 and z = :3

It would be best to prefer a single index (z, y, x) over any other combination, as the same index can easily be used for all of the above queries.
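A sketch of that single index, again with hypothetical names: the leading column z serves the lone z predicate, the leading pair (z, y) serves the two-column predicate, and all three columns serve the full predicate.

create index t_zyx_idx on t (z, y, x);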


So in general, look at the questions you have and try to minimize the indexes you need based on that. Look at the predicates in general, factor out what you want, and use that as a starting point.