Recently I was talking with a company in Houston, TX running Oracle E-Business Suite (EBS) 11i on Solaris with an Oracle 9i database. They run VMware for other parts of their business and wanted to use VMware vSphere and Site Recovery Manager (SRM) to virtualize their EBS environment, so that they could quickly move it to their geographically separate DR site if a hurricane or other natural disaster were bearing down on Houston.
This call had lots of moving parts. They were on an Oracle 9i database and wanted to move to Oracle 11g to ease their support costs. They also wanted to move from Solaris hardware to commodity x86 hardware and Red Hat Linux, again to ease their support costs. Their existing Oracle 9i database ran on 8 SPARC processors and used them at peak levels throughout the business day.
In VMware vSphere, the maximum number of processors you can make visible to a VM is 8 virtual x86 processors. Is a virtualized x86 processor as fast as a physical SPARC processor? Would their SQL run faster on Oracle 11g than it did on Oracle 9i? Is Red Hat Linux going to allow the database to process requests as fast as Solaris? Will their SAN storage and LUN layout be fast enough? Will their file system be a limiter?
Besides building up the environment and just going for it in production, how can you know?
By leveraging some very cool tools from both Oracle and VMware.
For the Oracle database, Oracle offers an add-on called Real Application Testing, which has a feature called Database Replay. Database Replay allows you to capture the workload on your production database server and replay it on another environment to see if things are faster or slower. Although this was a new feature in Oracle 11g, Oracle made backports of it available for 9i and 10g databases, exactly for purposes like this (well, maybe not to aid in virtualizing to VMware, but you get my drift).
Using Database Replay and Real Application Testing (both licensable features for Oracle Enterprise Edition) allows companies to test SAN changes, hardware changes, database upgrades, OS changes, etc., all with a production load, but without risking actual production issues.
Where does VMware fit into this? Real Application Testing and Database Replay work by capturing all the transactions generated in production, massaging them a little bit, and then playing them back against a clone of Production. That clone needs to be at the exact same point in time (or SCN, System Change Number in database speak) as PROD was when the capture started, so that the replay is playing back against an exact replica of the database. Although setting up a clone at an exact SCN isn’t hard for an Oracle DBA, it does require time: time to build the test system, time to restore a backup of the database, and time to apply archive logs and roll the database forward to match PROD’s SCN.
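To make the capture, massage and replay sequence a bit more concrete, here is a rough sketch in Python that simply lists the PL/SQL calls in the order a DBA would typically run them. The packages DBMS_WORKLOAD_CAPTURE and DBMS_WORKLOAD_REPLAY are Oracle's; the capture name, replay name and directory object names are made up for illustration, and the exact steps on a patched 9i or 10g source system may differ, so treat this as an outline rather than a runbook.

```python
def replay_plan(capture_name, capture_dir, replay_name, replay_dir):
    """Return (system, statement) pairs in the order they would be run.

    All four arguments are hypothetical names chosen by the DBA;
    the directory arguments are Oracle directory objects, not OS paths.
    """
    return [
        # 1. Capture the live workload on production.
        ("PROD", "EXEC DBMS_WORKLOAD_CAPTURE.START_CAPTURE("
                 f"name => '{capture_name}', dir => '{capture_dir}')"),
        ("PROD", "EXEC DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE"),
        # 2. Copy the capture files to the test system, then preprocess
        #    them there (the "massaging" step).
        ("TEST", "EXEC DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE("
                 f"capture_dir => '{replay_dir}')"),
        # 3. Set up the replay on the clone.
        ("TEST", "EXEC DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY("
                 f"replay_name => '{replay_name}', replay_dir => '{replay_dir}')"),
        ("TEST", "EXEC DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY"),
        # 4. Replay clients (the wrc executable) are started from the OS
        #    shell at this point, then the replay is kicked off.
        ("TEST", "EXEC DBMS_WORKLOAD_REPLAY.START_REPLAY"),
    ]


if __name__ == "__main__":
    for system, statement in replay_plan("peak_day", "CAPTURE_DIR",
                                         "peak_day_on_vm", "REPLAY_DIR"):
        print(f"{system}: {statement}")
```

Everything tagged TEST assumes the clone has already been restored to the SCN at which the capture started, which is exactly the setup effort described above.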
Even in cases such as this where the Production database isn’t virtualized, by making the test system virtualized, we can not only test all these changes, but we can also use VMware snapshot technology to very quickly take our database back to the SCN we want to run Database Replay against, without having to continually restore the database. Using snapshots, you go through that setup effort just once, take a snapshot, and then keep rolling back to your snapshot as many times as necessary to test performance.
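The "set up once, roll back as often as you like" loop might be sketched like this in Python. The three callables are hypothetical hooks you would wire up to your own vSphere and Oracle tooling (this is not a VMware or Oracle API), and timing the whole replay with a wall clock is a simplification of what the replay reports would tell you.

```python
import time


def run_replay_trials(configs, revert_to_snapshot, apply_config, run_replay):
    """Replay the same captured workload once per candidate configuration.

    configs            -- names of the changes to test (e.g. vCPU counts)
    revert_to_snapshot -- hypothetical hook: roll the test VM back to the
                          snapshot taken at the capture's starting SCN
    apply_config       -- hypothetical hook: apply one change to the VM
    run_replay         -- hypothetical hook: run the replay to completion
    """
    elapsed = {}
    for name in configs:
        # Every trial starts from the same database state, with no restore.
        revert_to_snapshot()
        apply_config(name)
        start = time.perf_counter()
        run_replay()
        elapsed[name] = time.perf_counter() - start
    return elapsed
```

The point of the design is that the expensive restore and roll-forward happens once, before the snapshot, and each later trial costs only a snapshot revert plus the replay itself.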
Of course, you may find that the 8 processor limit in VMware, or the OS, or the SAN can’t handle your production load. Time to give up and stay physical? No. Starting in Oracle 10g, and further refined in Oracle 11g, Oracle has greatly improved the database's ability to help a DBA manage the system load and even to tune itself. By using features such as Advanced Compression and SecureFiles (to reduce physical I/O), and Automatic Optimizer Statistics Collection and the Automatic SQL Tuning Advisor (to tune queries to use less CPU and/or disk resources), you can give your database more room to grow yet still stay on the same (or less!) hardware.
1 thought on “Too big to fail: Virtualizing Big Iron databases”
Another option, which I used previously, was to create a VM node for the RAC, so that if a node had to be taken offline for maintenance or migrated to another DC, we had an easily duped production node.