Oracle needs to improve their software installations and accompanying documentation

Of the many things Oracle Corporation needs to fix, one big one is their installers and the accompanying documentation. They are, simply, negligent.

The definition of negligence is “failure to act with the prudence that a reasonable person would exercise under the same circumstances.”

I’ve spent the last few days setting up infrastructure for Oracle UCM (Universal Content Management). Due to the “newness” of UCM 11g, we’re going with UCM 10gR3, which has been out for a few years and has been thru many updates. Yes, Oracle acquired this software when it bought Stellent. That was in 2006, and Oracle has released its own versions since then. So no excuses there.

Nowhere in those updates did Oracle think to improve their installation.

Here are just some of the issues I’ve had to deal with on this software. Please understand this isn’t some cheap piece of software – licensing for our environment was somewhere north of $1M. We’re also using the latest version of Oracle’s flagship database (11gR2 Enterprise Edition) and used DBCA (the Database Configuration Assistant) to create the database.

11gR2

1) When you create a database using DBCA, which is Oracle’s recommended method, it doesn’t even follow Oracle’s own standards for controlfiles or redo logs – things that are critical to having a database set up for maximum resistance to disk corruption issues.

a) DBCA by default will create 2 controlfiles – Oracle’s standard is 3 controlfiles.

b) DBCA by default will create 3 redo log groups, each with one member, names each member redo0X.log (X is 1, 2 or 3) and makes them each 50MB in size. Oracle’s standard is actually 3 groups each with 2 members. Although it isn’t part of Oracle’s standards, I cannot fathom why they would make the file extension .log – you’re just begging for someone to forget those files are critical to database operation and delete what could easily be construed as unnecessary logging files.
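To make the redo log point concrete, here is a minimal Python sketch that prints the ALTER DATABASE statements for adding a second member to each of the three DBCA-created groups. The directory, the "b" suffix and the .rdo extension are my own placeholders for illustration, not anything Oracle mandates; ideally the second member lives on a different disk than the first.

```python
def add_member_statements(groups, second_member_dir, extension="rdo"):
    """Build one ALTER DATABASE statement per redo log group.

    second_member_dir and extension are hypothetical choices for this
    sketch; adjust them to your own file layout.
    """
    statements = []
    for group in groups:
        path = f"{second_member_dir}/redo{group:02d}b.{extension}"
        statements.append(
            f"ALTER DATABASE ADD LOGFILE MEMBER '{path}' TO GROUP {group};"
        )
    return statements


# DBCA's default layout: groups 1, 2 and 3 with one member each.
for statement in add_member_statements([1, 2, 3], "/u02/oradata/ORCL"):
    print(statement)
```

Review the generated statements before running them in SQL*Plus; the sketch only builds text and never touches the database.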

UCM

1) Nowhere in the installation guide does it tell you the database character set HAS to be AL32UTF8 for the automatic installation to succeed.

2) The scripts to automatically create the user for UCM don’t appear to work, but in Oracle’s defense that may somehow be my fault. I can’t get them to work.

3) In the UCM guide Oracle gives sample code for creating a tablespace for the UCM data – yet those create scripts are built for a database that doesn’t use locally managed tablespaces. Locally managed tablespaces have been available since Oracle 8i and UCM requires 9i or higher, so there’s just no reason for this.

4) If you’re doing the install on Linux and using Apache (which I suspect would be the majority of installs), Oracle doesn’t automatically make the necessary changes to Apache needed to get the product working. The pre-installation tasks and considerations (Chapter 3) don’t mention this. The step by step installation instructions (Chapter 4) don’t mention anything about manual setup. The installer itself asks you what web server you wish to use and gives you the following options

Web Server

*1. Apache

2. Sun ONE

3. Configure manually

Doesn’t that imply that if you choose Apache it will be set up automatically?

In Chapter 5 (Post-Installation Tasks and Considerations), the section on Web Servers says

“If, during the installation of the Content Server software, you chose to configure the web server manually, you need to perform a number of tasks to set up and configure the web server for use with Content Server. For further details refer to Appendix A”

Again, doesn’t that imply that since I chose Apache it was set up automatically?

It isn’t until you end up at Appendix A that you find this: “Since Apache cannot be configured automatically by the Content Server installer, you need to do it manually.” Once you do try those steps, you’ll notice they use non-default file paths and inconsistent server instance names in their examples.

Seriously Oracle, this is negligent. I could write out similar blog posts about most Oracle products I’ve had to install over the years.

Oracle, before you publish your documentation, take it to someone not on the development team, give them the documentation and have them try and follow the steps. I think you’ll be surprised by the results.

A rant about FUD about Oracle on VMware

You know those searchXXXXXXXXXXX.com websites (searchoracle.techtarget.com, searchvmware.techtarget.com, searchvirtualization.techtarget.com etc)? There are some good articles, but I keep seeing a lot of plainly inaccurate articles about virtualized Oracle, especially Oracle under VMware.

I keep seeing these FUD (Fear, Uncertainty, Doubt) articles on searchXXXXXXXXXX.com that just bug the hell out of me.

Take this article about Oracle RAC on VMware. They start out with something reasonable and accurate:

“Oracle will not support customers running Oracle RAC on VMware, for reasons that many say are political and technically outdated.”

and then say things that are just completely not true:

“In short, Oracle won’t support it unless the customer can prove that the problem wasn’t related to the virtual machine.

While getting support for single-instance Oracle on VMware is difficult…”

I run multiple Oracle databases and various Oracle products (Oracle E-Business Suite, Oracle Hyperion, Oracle Universal Content Management, Oracle Agile, etc.), and getting support for Oracle virtualized under VMware is no different unless troubleshooting leads Oracle Support to suspect your issue is with VMware itself.

The rest of the article I pretty much agree with. I’ve met Dave Welch numerous times and find his outlook on Oracle on VMware similar to mine. Oracle’s stated position of “we do not support Oracle RAC on VMware” appears to be nothing more than Oracle’s whim, with no current technical issues to back it up. As much as I don’t like it, that’s their choice. If / when VMware gets Fault Tolerance working with multiple CPUs in a VM, that’s going to render moot the argument about needing to run Oracle RAC solely for uptime requirements. Sure, you’ll still have systems that aren’t good candidates for virtualization (Oracle VM or VMware), but that’s not the bulk of installations out there.

Oracle uses VMware in its training classes – I attended an Oracle Hyperion installation and configuration class last year that utilized VMware Workstation running 3 or 4 VMs on each student’s machine. I’ve worked issues with Oracle Hyperion with Oracle Support and had the analyst not only notice my environment was under VMware, but state that roughly half their customers run Hyperion virtualized under VMware. With Oracle now having Oracle VM and Oracle VM VirtualBox, you’d think at least Oracle’s own training partners would be using Oracle products in their labs. And you’d think, if this support was such a big deal, that I’d have Oracle Support telling me about the benefits of Oracle VM when they noticed I was running VMware.

Here’s another article that bugged me, this time about how Oracle VM is not half bad. First line of the article:

“Oracle’s continued refusal to support its applications virtualized on something other than the Oracle VM hypervisor has forced the hands of some users, pushing them to try the Xen-based virtualization offering.”

Did you see what I did? “Oracle’s continued refusal to support its applications virtualized on something other than the Oracle VM hypervisor...” That’s simply and plainly wrong.

I’ll end this article quoting from the official stance of Oracle Support with regards to VMware, My Oracle Support (aka Metalink) note 249212.1

Support Status for VMware Virtualized Environments 
-------------------------------------------------- 
Oracle has not certified any of its products on VMware virtualized 
environments. Oracle Support will assist customers running Oracle products 
on VMware in the following manner: Oracle will only provide 
support for issues that either are known to occur on the native OS, or 
can be demonstrated not to be as a result of running on VMware. 

If a problem is a known Oracle issue, Oracle support will recommend the 
appropriate solution on the native OS.  If that solution does not work in 
the VMware virtualized environment, the customer will be referred to VMware 
for support.   When the customer can demonstrate that the Oracle solution 
does not work when running on the native OS, Oracle will resume support, 
including logging a bug with Oracle Development for investigation if required.

---

A battery improvement tip with VMware Fusion

So I recently switched from PCs running a RHEL-based operating system to a MacBook Pro running MacOS. Overall it’s been a pretty smooth transition, but with plenty of small bumps along the way.

In current MacBook Pros (MBPs) there are 2 graphics chipsets – an integrated Intel chipset and a discrete NVIDIA chipset. The NVIDIA gives much better graphics performance but at the expense of battery life.

Whenever I started up a Windows VM, I found the system would automatically switch to the NVIDIA chipset. Since I don’t use my Windows VMs for graphics intensive usage (they’re mainly to run those few Windows only business applications), I needed to find a way to force the system to stay on the Intel chipset.

I came across gfxCardStatus. With this program I can manually switch which graphics chipset is being used. I’ve found that I need to set my chipset to Intel only before starting the VM in order for things to work properly. If I try to change it while the Windows VM is already running, the VM will no longer respond to keyboard input.

This *may* also be the case with Linux and ESX VMs – I haven’t run any of them recently. It is definitely an issue with Windows XP VMs.

Hope this helps!

VMware Knowledge base entries of interest for Oracle DBAs

One of the things I love about VMware’s support site is their knowledge base. It’s not horrific Flash like My Oracle Support (aka Metalink), and it’s freely searchable without a support contract. Also very cool is that there is an RSS feed of new or updated knowledge base articles. It’s good to scan in an idle moment or two each day to get an idea of what issues other people are seeing.

Because of that RSS feed, I came across three knowledge base articles I’d like to highlight here:

KB Article 1023696: Oracle 11G R2 32 bit client fails with a segmentation fault when run in a RHEL 5.4 64 bit virtual machine

In this case, sqlplus would seg fault when you try running it. The issue it turns out isn’t a VMware issue – it’s an Oracle bug when running 32-bit 11gR2 client on a 64-bit RH OS with an AMD processor. The fix is Oracle patch 8670579.

KB Article 1023898: RedHat and CentOS virtual machine show warning messages when starting the udev daemon

This issue actually cropped up in my VMware environments a while ago. Basically you see messages like this when your VM starts:

udevd[572]: add_to_rules: unknown key 'SUBSYSTEMS'
udevd[572]: add_to_rules: unknown key 'ATTRS{vendor}'
udevd[572]: add_to_rules: unknown key 'ATTRS{model}'
udevd[572]: add_to_rules: unknown key 'SUBSYSTEMS'
udevd[572]: add_to_rules: unknown key 'ATTRS{vendor}'
udevd[572]: add_to_rules: unknown key 'ATTRS{model}'

On RHEL, the fix is to do the following:

vi /etc/udev/rules.d/99-vmware-scsi-udev.rules

Change:

ACTION=="add", BUS=="scsi", SYSFS{vendor}=="VMware, " , SYSFS{model}=="VMware Virtual S", RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"

To:

ACTION=="add", BUS=="scsi", SYSFS{vendor}=="VMware " , SYSFS{model}=="Virtual disk ", RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"

and then reboot the VM.
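If you have a pile of VMs to fix, the edit can be scripted. Below is a rough Python sketch of just the text substitution, assuming the rule file uses plain ASCII quotes and exactly the spacing quoted above. The function name and the sample line are mine, purely for illustration; the sketch deliberately doesn’t write to disk, so take a backup of the rules file before wiring it into anything that does.

```python
# The two match fragments, copied from the rule lines quoted in the post.
OLD_MATCH = 'SYSFS{vendor}=="VMware, " , SYSFS{model}=="VMware Virtual S"'
NEW_MATCH = 'SYSFS{vendor}=="VMware " , SYSFS{model}=="Virtual disk "'


def fix_rule_text(text):
    """Return the rule text with the vendor/model match swapped.

    Text that does not contain the old match is returned unchanged.
    """
    return text.replace(OLD_MATCH, NEW_MATCH)


# Sample rule line assembled for the demo.
sample = (
    'ACTION=="add", BUS=="scsi", ' + OLD_MATCH +
    ", RUN+=\"/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'\""
)
print(fix_rule_text(sample))
```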

The final article I want to mention is
KB Article 1023185: VMware Tools installation fails to start the guest operating system daemon on Red Hat Enterprise Linux 4 64-bit guests with the 32-bit glibc-common package installed

This issue relates to Oracle because 32-bit glibc-common is frequently required for Oracle DB installs. The issue occurs because the VMware Tools configuration is looking for the 64-bit tools (64-bit OS, generally you’d want to install the 64-bit RPMs…). The solution is to install VMware Tools as normal but, before running the configuration script, issue
ln -s /usr/lib/vmware-tools/lib64/libdnet.so.1/libdnet.so.1 /lib64/libdnet.so.1
ln -s /usr/lib/vmware-tools/lib64/libproc-3.2.7.so/libproc-3.2.7.so /lib64/libproc-3.2.7.so
and then run the configuration program for VMware Tools
/usr/bin/vmware-config-tools.pl

Hopefully this is helpful to other Oracle on RHEL under VMware people out there.

Oracle VM compared to VMware vSphere: Part 1

I’ve been meaning to take a serious look at Oracle VM for a few months. In fact, it was this post [Live Migration of EBS Services Using Oracle VM] (and my long-winded reply) that was a major push for me to start this blog.

The final bit of impetus to learn all about Oracle VM came a few months ago when I saw the “Oracle VM for x86 Essentials” beta exam. If passed, you earn the certification “Oracle VM for x86 Certified Implementation Specialist”. It’s a certification geared for Oracle Partners. I figured the knowledge could help me to better understand Oracle’s offering. First and foremost, I’m an Oracle Applications DBA. If Oracle’s product could allow me to better serve my clients and do my job – awesome!

So I’ve been hitting all the Oracle VM resources I could find to learn about the product. I’ll post links to a number of the excellent resources I found at the end of this post. All the links at the bottom refer to information on the currently available product (Oracle VM 2.2). While compiling all of this information, I came across [Oracle Virtualization: Making Software Easier to Deploy, Manage, and Support] – a slide deck from a recent Sydney, Australia Oracle meetup. It talks about upcoming features of Oracle VM 3.0. If those features come to pass, Oracle VM will become more enticing to many organizations.

Honestly, I’ve got *tons* of things I want to write about with regards to Oracle VM — so much that I don’t know where to begin.

General Impressions
Remember that first time you went from something with a nice GUI, like Windows (Thanks Apple Microsoft!), to something a little more “nerdy” like Linux? The GUI, if there was one, was stripped down and clunky. Many of the things you could do with a couple of mouse clicks now required specialized commands at a command line, and there were all these different steps you needed to do just to get things working. Well, it’s the same type of thing going from VMware vCenter to Oracle VM Manager. It’s not that the product is bad – it isn’t. The Oracle VM interface is clunky and the product doesn’t have the richness of features of VMware vSphere. Simple as that. Are those differences worth it to you? Everyone’s needs are different. Both underlying products (Oracle VM Server, VMware vSphere ESX 4.0) run Linux and Windows VMs well enough for most enterprise-level systems.

As you can read in this Gartner report on Server Virtualization Infrastructure, VMware is the clear market leader. Oracle VM, although categorized as a niche player, is the strongest of the niche players and right on the border of being listed as a challenger to VMware.

Here are a few areas where Oracle VM has an advantage over VMware:

o Certified vs. Supported

(I hate talking about this but it needs to be addressed.) Is your VMware virtualized Oracle database supported by Oracle? YES. Is it Certified? No. I went into this in detail in this [Oracle Support on VMware] blog post so I won’t do it again. Short of running Oracle RAC, which is expressly NOT supported when virtualized under VMware, the question of whether you should care about the “certified” distinction is something each company needs to answer for themselves. To me, the whole thing smacks of FUD (Fear, Uncertainty, Doubt).

o Pricing

There are two parts to pricing. First is the effect virtualizing Oracle Database will have on your Oracle database licensing. I go into this in more detail in a post on Oracle licensing under VMware. One of Oracle VM’s main selling points is that Oracle considers Oracle VM (through hardcoding the CPU binding in the vm.cfg file) a type of hard partitioning and VMware vSphere a type of soft partitioning. When using hard partitioning, Oracle only requires you to license the processors (cores) in that hard partition (aka, the processors visible to the VM). When using soft partitioning, Oracle requires you to license ALL the processors (cores) in the server, even though there may be many more processors present than allocated to the VM. It should be noted that you can do the same type of CPU binding (called CPU affinity) with VMware vSphere, but that Oracle somehow still considers this soft partitioning.

This just seems like a way for Oracle to give their Oracle VM product preferential treatment. How does the joke go… Where does the 800 lb gorilla sit? Anywhere he wants to.
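Here is the hard versus soft partitioning rule as I’ve described it, boiled down to a few lines of Python. This is only a sketch of the counting rule in this post. It ignores any core-factor multipliers or contract terms that apply to your agreement, and the 16-core host with a 2-core VM is a made-up example.

```python
def cores_to_license(server_cores, vm_cores, hard_partitioned):
    """Cores Oracle would count under the rule described above.

    Hard partitioning: only the cores bound to the VM.
    Soft partitioning: every core in the physical server.
    """
    if not 0 < vm_cores <= server_cores:
        raise ValueError("vm_cores must be between 1 and server_cores")
    return vm_cores if hard_partitioned else server_cores


# Hypothetical host: 16 physical cores, VM pinned to 2 of them.
print("hard partitioning:", cores_to_license(16, 2, hard_partitioned=True))
print("soft partitioning:", cores_to_license(16, 2, hard_partitioned=False))
```

Same VM, same hardware, and the only thing that changes the count is which bucket Oracle puts your hypervisor in.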

The second part of pricing involves the actual Oracle VM product versus the VMware vSphere product. Oracle basically has two pricing points:
o Premier Limited — Up to 2 CPU sockets, regardless of the number of cores per socket in the physical server
o Premier — unlimited CPU sockets in the server

VMware, unlike Oracle, has four product feature levels (Standard, Advanced, Enterprise and Enterprise Plus), so a head to head comparison is a complete pain to do. The short answer is that Oracle can be significantly cheaper. The downside of this inexpensiveness is a lack of features. Yes, VMware generally costs more than Oracle, but you’re paying for additional features. Are those features worth it to your organization? That’s for you to decide. In my organization, we are willing to pay for VMware’s features, but my organization’s needs may be different than the needs of your organization.

Does your organization have a need for VM snapshots? Mine absolutely does. Oracle VM doesn’t have it and VMware does, even when you’re using the free version of each product.

Does your VM require more than 8 CPUs? VMware has a limit of 8 CPUs for a single VM. Oracle VM’s limit is 32 CPUs for a single VM. Through tuning and software improvements, my main client has managed to reduce the number of CPUs for our Production Oracle E-Business Suite database from an unvirtualized 8 cores to 2 cores virtualized, so the difference is immaterial… but maybe your organization needs 20 cores.

Does your organization have a need to do vMotions / Live Migrations? They come included with Oracle VM, but it’s not recommended to do more than one at a time. There is an additional cost to get VMware vMotion, but VMware supports a default configuration with up to 4 simultaneous moves and allows up to 8 simultaneous moves.

Does your organization need automated SAN level replication of your VMs so they can be brought up automatically in case of disaster? VMware has that functionality with Site Recovery Manager. Oracle doesn’t have anything like it.

o Oracle VM Templates

Do you want a pre-built VM you can download with the Oracle software already installed and configured? Oracle offers downloads of pre-built environments from a basic OEL 5 Linux box all the way through to a downloadable 38GB Oracle EBS R12.1.1 system. I admit, that could be pretty cool. However, it may not be right for your company. My main client never allows consultants to have console type access to our Linux servers. I don’t think my auditors would approve of a pre-built VM for production use, even if it was pre-built by Oracle. As something for quickly throwing up a demo or dev environment, I think it’s fantastic. I hope Oracle continues to do this for more and more of their products. Oracle Enterprise Manager 10gR5 took me roughly 2 weeks to install. Discoverer 11g about a week. Secure Enterprise Search about 2 weeks. It would be great to have a pre-built test system I could reference when building my production systems.

I’ve got numerous ideas for more blog posts with regards to Oracle VM. Feedback directing me to what interests others would be great.

Part 2 coming after I take the exam later this week. Wish me luck!

Links
Live Migration of EBS Services with Oracle VM
Installing & Configuring OEL 5 with Database 11gR1 as a Paravirtualized Machine (PVM) on an Oracle VM Server
The underground Oracle VM Manual
Official Oracle VM Wiki home page
Oracle VM for x86 Essentials Exam 1Z0-540 Exam Topics Study Guide
Oracle VM 2.2 Documentation Library
Performing Physical to Virtual (P2V) and Virtual to Virtual (V2V) (aka VMware to Oracle) conversions Note the excellent pdf linked from the article too.
Installing, Configuring and Using Oracle VM Server for x86

Is this good coding practice?

A consultant developer just handed me code with the following exception clause.

EXCEPTION
   WHEN NO_DATA_FOUND
   THEN
      RETURN 0;
   WHEN OTHERS
   THEN
      RETURN 0;
END XXXXXXXXXXX;

Now, I’m no developer, but what the heck is the point of that mess? Regardless of what exception comes up, it always returns the same value. And if that’s not bad enough, that value is 0, meaning everything is OK.
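For anyone who doesn’t read PL/SQL, here is the same anti-pattern sketched in Python next to one possible alternative. This is an analogy, not a translation of the consultant’s code: LookupError stands in for NO_DATA_FOUND, and all the function names are made up for the example.

```python
import logging

logger = logging.getLogger(__name__)


def count_rows_swallowing(fetch):
    """The anti-pattern: every failure looks like a successful zero."""
    try:
        return fetch()
    except LookupError:
        return 0
    except Exception:
        return 0


def count_rows_explicit(fetch):
    """Treat 'no data' as zero, but let anything unexpected propagate."""
    try:
        return fetch()
    except LookupError:
        # Stand-in for NO_DATA_FOUND: an empty result really is zero rows.
        return 0
    except Exception:
        logger.exception("unexpected failure while counting rows")
        raise


def broken_fetch():
    """Simulates a failure that has nothing to do with missing data."""
    raise ConnectionError("database unreachable")


print(count_rows_swallowing(broken_fetch))  # the caller never learns the database was down
```

With the second version the caller at least finds out something went wrong instead of happily processing a zero.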

Sigh.

Scrambling of HR data in an Oracle EBS instance

With our most recent upgrade, we’ve implemented some of the Human Resources modules.

We’re now storing some sensitive employee data (stuff like salaries, Social Security Numbers, banking info, etc.) and one of the business mandates is to scramble this data in our cloned instances.

I knew from reading Steven Chan’s blog (this article by Sanchit Jindal) that Oracle offers two plugins for Oracle Enterprise Manager (OEM) that might serve our needs. These are the Data Masking Pack and the Application Management Pack for E-Business Suite (AMP).

While reviewing which of these might be best for our environment, I made note of the following resources for those interested:

Application Management Pack for E-Business Suite
Data Sheet- http://www.oracle.com/technology/products/oem/pdf/apps_mgmt_ebiz.pdf
Download- My Oracle Support, Patches, Search for Patch Number- 8333939

Data Masking Pack
Data Sheet- http://www.oracle.com/technology/products/manageability/database/pdf/ds/data-masking-pack-11gr2-datasheet.pdf
Canned Demo- http://download.oracle.com/technology/products/oem/screenwatches/data_masking/index.html
Download- http://www.oracle.com/technology/software/products/oem/index.html

I also found the comparison matrix at the Steven Chan blog post above (direct link is http://blogs.oracle.com/stevenChan/images/comparisonmatrix.jpg ) to be extremely useful.

For an EBS environment, AMP looks to be a better fit – it has pre-defined masking for EBS, which was exactly what I needed since I was going to be modifying data in pre-defined EBS tables / objects. AMP also allows you to hot-clone an Oracle EBS instance. Many newer Apps DBAs don’t have methods / scripts written to do this themselves.

In our environments, AMP would have required some significant and PROD performance impacting changes in order to use the data scrambling.

The data scrambling can only take place as part of a clone and AMP only supports cloning thru RapidClone integration and specifically does NOT support cloning from an RMAN backup. This means that utilizing AMP would require I clone directly from PROD, causing a non-negligible performance impact.

Another issue was cost. AMP is licensed per processor at a list price of $7k per processor. The number of processor licenses needed is obtained by adding up your Database, Application Server and Web Server (if on separate system from Application Server) processors to get the # of processor licenses you need to buy. In our case we utilize 8 Oracle Database licenses and 4 Oracle Application Server / Web Server licenses, so 12 * $7k or $84,000 list price, plus yearly support fees.
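For anyone who wants to plug in their own numbers, the licensing arithmetic is simple enough to sketch in a few lines of Python. The $7k figure is the list price quoted above; support fees are left out because I haven’t given a rate for them here.

```python
AMP_LIST_PRICE_PER_PROCESSOR = 7_000  # list price quoted in this post


def amp_list_price(db_licenses, app_web_licenses,
                   price_per_processor=AMP_LIST_PRICE_PER_PROCESSOR):
    """List price for AMP: database plus app/web server processor licenses."""
    return (db_licenses + app_web_licenses) * price_per_processor


total = amp_list_price(db_licenses=8, app_web_licenses=4)
print(f"AMP list price: ${total:,}")  # prints "AMP list price: $84,000"
```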

As a result of these two issues with AMP, the business asked me to see if I could come up with a cheaper home grown solution that would allow us to utilize our existing cloning strategies and not cost tens of thousands of dollars.

We had 3 main things we needed to scramble: National Identifiers (aka Social Security Numbers or SSNs), Salary data, and banking direct deposit information like routing and account numbers.

National Identifiers (SSNs):

National identifiers are stored in the table per_all_people_f, but the national identifier is NOT the primary key. The primary key for per_all_people_f is (PERSON_ID, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE), and each time a person changes positions or leaves / comes back to the company, a new row is added with an appropriate effective date range. This means I had to change the SSN to the same value for each record for a person. For example, Bob might have 4 records in this table due to leaving and coming back or switching positions. In all those records in PROD, he’d have the same SSN of “123-45-6789” and I needed to ensure when I cloned the instance and scrambled the data that his new SSN was consistently the same (but of course different from PROD) in the clone.

First I needed to generate a bunch of random but valid SSNs. There are actually quite a few rules associated with this (for example, if you receive your SSN while living in Delaware, the first three digits must be 221 or 222), and luckily I found some scripts at fakenamegenerator.com which allowed me to create a long list of fake random SSNs. The fngssn.class.php file is GPL and is available at this link. I also modified their example.php script to call the fngssn.class.php file and output an SSN. I then just looped thru this program however many hundreds or thousands of times I needed, appending the SSNs to a dump file.
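If you’d rather not deal with PHP, a much simpler generator can be sketched in Python. To be clear, this is NOT a port of fngssn.class.php. It only applies the basic structural rules I’m sure of (area number not 000, 666 or 900-999; group not 00; serial not 0000) and ignores the state-specific area rules mentioned above. The function names are my own.

```python
import random


def fake_ssn(rng):
    """Return one structurally plausible SSN string (simplified rules)."""
    while True:
        area = rng.randint(1, 899)      # excludes 000 and 900-999
        if area != 666:                 # 666 is never issued
            break
    group = rng.randint(1, 99)          # excludes 00
    serial = rng.randint(1, 9999)       # excludes 0000
    return f"{area:03d}-{group:02d}-{serial:04d}"


def unique_fake_ssns(count, seed=None):
    """Return `count` distinct fake SSNs, in the order they were generated."""
    rng = random.Random(seed)
    seen = set()
    result = []
    while len(result) < count:
        candidate = fake_ssn(rng)
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


for ssn in unique_fake_ssns(5):
    print(ssn)
```

Pass a seed if you want the same list on every run, which can be handy when comparing two clones.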

I then bring this list of SSNs into an Excel spreadsheet as column B.
Column A is update per_all_people_f set national_identifier = ZZ
Column B is my list of fake SSNs
Column C is ZZ where national_identifier = ZZ
Column D is my list of real SSNs – this can be obtained with the following SQL: select distinct(national_identifier) from per_all_people_f where national_identifier IS NOT NULL;
Column E is ZZ;

I then save this spreadsheet out to a plain text file and edit the file, replacing every ZZ with ' (Excel handles ' weirdly, so I just use vi afterward to do a global replace of ZZ with ' – it’s one command and takes a couple of seconds).

At this point I’m left with a SQL file with commands like
update per_all_people_f set national_identifier = '567-42-3477' where national_identifier = '123-45-6789';

I just need to run these and poof, the SSNs are now scrambled.
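The spreadsheet step can also be sketched in a few lines of Python. This assumes you already have the distinct real SSNs and a list of fake ones as plain lists of strings; the function names are mine. It also adds a guard my spreadsheet doesn’t have: any fake value that happens to match a real SSN is skipped, because otherwise a later update could re-scramble rows an earlier update had already changed.

```python
def sql_quote(value):
    """Wrap a value in single quotes, doubling any embedded quote."""
    return "'" + value.replace("'", "''") + "'"


def build_updates(real_ssns, fake_ssns):
    """Pair each distinct real SSN with a distinct, non-colliding fake SSN."""
    real = sorted(set(real_ssns))
    real_set = set(real)
    # De-duplicate the fakes (keeping order) and drop any that are also real.
    usable = [f for f in dict.fromkeys(fake_ssns) if f not in real_set]
    if len(usable) < len(real):
        raise ValueError("not enough distinct fake SSNs")
    return [
        "update per_all_people_f set national_identifier = "
        f"{sql_quote(fake)} where national_identifier = {sql_quote(old)};"
        for old, fake in zip(real, usable)
    ]


# Made-up values purely for illustration.
for line in build_updates(["123-45-6789", "222-33-4444"],
                          ["567-42-3477", "601-11-2222"]):
    print(line)
```

Because each real SSN maps to exactly one fake, every row for the same person still ends up with the same scrambled value.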

Salary data:

This was the easiest to change. Some people may get paid hourly, some salary. In either case though, their salary or hourly rate is stored in per_pay_proposals.proposed_salary_n. For someone that gets a salary of $100k a year this field would have 100000 and for someone paid $10 an hour the field would have 10. It’s perfectly ok for people to have the same salary in real life, and I figured for the purpose of our scrambled clone, I’d go with a socialist ideal and just give everyone the same value of 50. Admittedly, this means salaried people now get $50 a year while the hourly people get $50 an hour. It’s good to be an hourly employee, at least in my scrambled instances, and maybe now the hourly employees can afford health care!

update per_pay_proposals set proposed_salary_n = 50 where proposed_salary_n IS NOT NULL;

Banking data:

There are two parts to the banking data: the routing number (think of it as the bank number) and the account number. These are the two sets of numbers at the bottom of your checks (there’s also the check number as the third set of numbers on your checks but that’s not relevant).

The pay_external_accounts table holds the routing and account number info in some of the segment columns – where exactly depends on your implementation. We only wanted to update this data for people’s personal accounts, so our SQL looks like this:

update pay_external_accounts
set segment6 = '44444444444', segment7 = '1111111111'
where external_account_id IN (select b.external_account_id from pay_personal_payment_methods_f a, pay_external_accounts b
where a.external_account_id = b.external_account_id);


There you go, all done. I know it sounds like a lot of manual steps, but once you’ve built the Excel spreadsheet and generated your list of fake SSNs, all you really need to do is run the SQL for Column D and paste those values in, change the ZZ to ' and run the file. The SQL for salary and banking data takes a few seconds each. Call it 10 or 15 minutes total per clone.

Now could this all be better? Sure. I could generate my fake SSNs via a SQL cursor and never need Excel or PHP. But for now, this works. Please feel free to improve on my system and I’ll update the blog with the changes.

Lessons learned from a virtualized Oracle upgrade

So about a week ago, we did a rather massive upgrade at my main client to the Oracle E-Business infrastructure. The main things in this upgrade were:

Licensing modules necessary for us to have a full installation of Oracle HR
Upgrade Oracle database from 11.1.0.7 64-bit to 11.2.0.1 64-bit
Apply all CPU security patches thru Apr 2010
Upgrade memory on DB server from 8G to 12G
Upgrade server side java from 1.6.0_16 to 1.6.0_20
Upgrade client side java from 1.6.0_16 to 1.6.0_20b5 (see this link on why the special b5 version)
Apply approximately 350 (not a typo) individual E-Business patches, for the following things:
o Minimum Baseline Patch Requirements for Extended Support on Oracle E-Business Suite 11.5.10 (Note 883202.1)
o Upgrading from Financials Family Pack F to Family Pack G (FIN_PF.G)
o Recommended 11i Apps patches for all our products
o Java related patches
o Latest DST v11 related patches (see here)
o Implement WebADI

As you might gather from this list, it was a rather large upgrade. The apps patches alone totaled about 10GB of patches once merged into one patch and the backup directory for the merged patches ended up totaling 6GB. Test runs had the upgrade running about 24 hours with 8 CPUs on some scratch disk storage I had in the SAN. As I mentioned in previous posts, we utilized VMware snapshots on our boxes at various points in the upgrade in case we needed to roll back or experienced an unforeseen issue.

One of the VMware best practices we follow with our VMs is to break the boot “disk” and the data “disk” for our VMs into their own virtual disks. Besides during booting up / shutting down of a VM, the boot disk generally experiences very low traffic. So it’s pretty typical, especially with a replicated SAN system such as ours, to put your boot “disks” (VMDKs) for a bunch of VMs on one VMware datastore, possibly with slower disks, and your data “disks” (VMDKs) on another dedicated datastore. In our case, the boot disk datastore is a 2 disk RAID 1 (mirrored) set with Fibre Channel drives and the data disk datastore is a 9 disk (8+1) RAID 5 datastore of SSDs (aka EFDs aka super super fast disks).

Although I had run multiple dry runs before the upgrade, one thing I failed to notice / realize is that by default VMware snapshots are stored where the VM lives, or more specifically, where the VM’s configuration file lives… in this case on my slowest disks.

This became extremely clear during our large merged patch of 330+ Apps patches – things got slower and slower. At that point, shutting down the VM and moving the snapshots wasn’t really an option. It was just a matter of suffering thru and learning for next time. Luckily the business had fully planned on the upgrade taking 24 hours for the patching even though I expected us to be at roughly 1/2 that time with SSDs.

By the time the upgrade was done and the business analysts had finished their testing and called the upgrade good (and hence when we were ready to delete the 5 sets of snapshots), the snapshots for my two VMs that utilize about 450GB of space had grown to about 200GB. It took about 5 hours for the snapshots to be merged into the base VMDKs. Although the system was usable during that time, it was quite laggy. Luckily it was still the weekend for most of our users and they weren’t too inclined to utilize Oracle.
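To put that merge in perspective, here is the back-of-the-envelope rate as a tiny Python sketch. Both inputs are the rough figures above (about 200GB, about 5 hours), and I’m treating 1GB as 1024MB, so read the result as a ballpark only.

```python
def merge_rate_mb_per_sec(snapshot_gb, hours):
    """Average snapshot merge rate in MB/s, treating 1GB as 1024MB."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return snapshot_gb * 1024 / (hours * 3600)


rate = merge_rate_mb_per_sec(snapshot_gb=200, hours=5)
print(f"average merge rate: {rate:.1f} MB/s")  # prints "average merge rate: 11.4 MB/s"
```

Roughly 11MB/s is in line with snapshots that were sitting on the slow two-disk mirror rather than on the SSDs.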

On the subject of VMware snapshot deletions, I recently came across two notes that should be of use to other VMware admins:
1) With the latest version of vSphere (4.0 Update 2), VMware has greatly improved the speed and efficiency of deleting all the snapshots for a VM. You can read more about it here. Unfortunately at the time of my Oracle upgrade I was on vSphere 4.0 Update 1.
2) When you delete a large snapshot, it will frequently appear to “hang” at 95% – check out this knowledge base article on how to monitor snapshot deletions.

Overall the upgrade was a success and, minus the occasional user issues, Monday morning (the first business day after the upgrade) was pretty much a non-event.

These are the sorts of situations that make sending your people to training, or giving them the time and inclination to read manuals and blogs, so essential. Not as a result of this, but somewhat related, I’ll be attending the VMware vSphere troubleshooting class in the next month or two and will be (assuming I pass the test) earning my VCP and possibly trying to earn a VCAP-DCA by end of year.

How virtualization can magnify your architecture problems

I recently started working with a new client who has a hosting provider hosting their Oracle database on Linux under VMware. An excellent choice, but this client is experiencing major performance issues – data for forms taking a minute or more to come up is just one example.

As I learned more about their environment I found that virtualization (VMware in this case, though the issue isn’t specific to any particular virtualization vendor) actually made their system performance worse. I know, I’m a VMware groupie (heck a VMware vExpert!) and we’re all amazed I’d write such a thing, but alas, it’s true.

The database is around 80GB in size. Each day this hosting provider would take a full (level 0 incremental) backup of the Oracle database via RMAN. The hosting provider wrote this RMAN backup to the same mount point in the VM that the database uses.

Please take a moment to catch your breath and stop clenching your hands into fists over this very very bad idea.

So why is this such a bad idea? For a couple of reasons.

One is performance – you’re now greatly degrading the performance of your database by writing a full backup to the same disks that are trying to handle database requests. You have, at the least, doubled the amount of I/O going to those disks.

Two is the ability to recover. If your ESX host or your VM experiences an issue (running out of disk space, disk corruption, fire, whatever), you can no longer access the mount point in the VM where you backed up the data.

Best practice for implementing RMAN in a situation like this is to back up your database to another set of disks on another machine in another physical location. A typical example is to have an NFS export on your backup destination server (in another datacenter) and have RMAN write directly to that NFS mount. This way you aren’t writing your backup to the same disks (thereby not impacting production performance much) and you’re covered in the case of issues with the hardware or VM itself.

So where does VMware fit into this? The hosting provider was also performing VM-level backups. In particular, they were performing VM-level backups at the same time they were running RMAN backups. All to the same set of disks.

Now I’ve got the VMware Admins and the Oracle DBAs cringing.

When you initiate a VM level backup, VMware takes a snapshot of the VM. This means it makes a delta file on the same ESX datastore and stops writing to the VMDK(s) that make up the VM. All changes to the VM get written to the delta file instead. That delta file can grow (8 megs at a time) up till it’s the same size as the original VMDK.

When you are taking a VM level backup, you want to choose a time when you’re not doing many writes to the VM. This way the delta file won’t grow so big that you could run out of space on the datastore (LUN) and your performance impact is decreased.

So here they are writing their full Oracle backup of 80GB out to a mount point inside their VM. That’s 80GB of writes you’re doing. VMware sees those writes and has to write them to its snapshot (delta file). So now not only are you serving up database queries on your disks, you’re also scanning every block of your database on those disks for changes (this database did not employ Oracle Block Change Tracking), you’re writing a full RMAN backup to those disks and VMware is having to put all those writes into a delta file on those same disks.
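A rough way to see how quickly this gets out of hand is to estimate how big the delta file can get while the snapshot is open. The Python sketch below is only an illustration: the function names are mine, and the 5GB of other writes, the 200GB VMDK and the 60GB of free datastore space are made-up numbers. Only the 80GB backup comes from this environment. It assumes an uncompressed backup and caps the delta at the size of the original VMDK, as described above.

```python
def delta_growth_gb(backup_write_gb: float, other_write_gb: float,
                    vmdk_size_gb: float) -> float:
    """Rough upper bound on delta file growth while a snapshot is open.

    Every write inside the VM lands in the delta file, and the delta
    can grow up to the size of the original VMDK.
    """
    return min(backup_write_gb + other_write_gb, vmdk_size_gb)


def fits_on_datastore(delta_gb: float, free_gb: float) -> bool:
    """True if the datastore has room for the delta file."""
    return delta_gb <= free_gb


# 80GB RMAN backup written inside the VM (from the post), plus
# hypothetical figures for everything else.
delta = delta_growth_gb(backup_write_gb=80, other_write_gb=5, vmdk_size_gb=200)
print(f"Estimated delta growth: {delta} GB")
print(f"Fits in 60 GB free: {fits_on_datastore(delta, free_gb=60)}")
```

With RMAN writing to an NFS mount on another machine instead, the backup writes never touch the VMDK and the backup term in that estimate drops to zero.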

Virtualization can be wonderful and solve or simplify many of the issues an administrator faces, but it can also magnify fundamental architecture flaws.

Why run Oracle E-Business on vSphere?

If I’ve got the dedicated hardware for my Oracle E-Business (EBS) install, why would I want to run it virtualized?

Most of the places I’ve seen typically have dedicated hardware for the Oracle EBS instances – so why virtualize it at all?

One major reason is the ability to roll back the entire environment in the event of a patching issue. My main client has an upcoming upgrade involving a database upgrade (11gR1 to 11gR2), OS level patches (updates to Red Hat 5.4), Java updates (going to 1.6.0_20b5) and roughly 300 Oracle EBS patches. Yes, 300. Although we’re still staying at 11.5.10.2 ATG RUP 7, Oracle recently updated the minimum patch list you must be on to receive Extended Support on Apps 11i as of Nov. You can find the full up to date list at Metalink document Minimum Baseline Patch Requirements for Extended Support on Oracle E-Business Suite 11.5.10 (Note 883202.1).

Without virtualization, I’d have to take a full backup of my Database and App Tiers before starting the upgrade and then be prepared to go back to these if we can’t finish the upgrade by a certain time. Depending on your environment, this could be hours, days or not even practical.

Instead, at the start of the upgrade I merely take a snapshot of each VM involved. If I do this with the VMs shut down (i.e. not capturing the memory state of the VM as well), this takes about 10 seconds. Done. In the event we decide to roll back all our upgrade changes, it’s just a quick Revert to the Snapshot (might take as long as a minute per VM – OMG, a whole minute!!), restart the Apps services and poof, you’re done.

If only the upgrades themselves were so easy and foolproof.

Another reason to virtualize is of course getting the best usage of your Oracle licenses. Oracle EBS requires Oracle Enterprise Edition. I’ve blogged previously here about how to do that so I won’t go into that here.

Avoid the RAC tax. According to Oracle’s most recent Global Price list, RAC is an additional $23k per processor on top of Enterprise Edition. RAC makes sense for companies in two situations – 1) When their database must not go down. 2) When their systems are too large to run on 1 physical box. Sure, there are some installations where EBS requires more horsepower than can be run under 8 processors (the limit of a VM under vSphere 4.0 Update 2), but they are a small minority. Many companies running EBS implement RAC because they want to be protected from hardware failures.

VMware has two features that protect you here – High Availability (HA), which will restart your VM on another vSphere host if the VM goes down, and the much cooler and more RAC-like feature, Fault Tolerance (FT). FT will actually run your VM on two ESX hosts simultaneously. They work in an active / passive arrangement – you aren’t having users connecting to both machines and load balancing between the two machines, but instead all users are connecting to one machine and in the event it fails, user sessions and other processes and connections are now active on the other node. Your users don’t notice anything: no disconnections, no restarts, no autoconfig required. It simply works.

The current drawback to FT is that it’s limited to 1 CPU VMs – that’s reasonable in my experience with an Apps 11i Forms/Web tier, but can frequently be a show stopper with a database server. However, if you’re willing to leverage the performance tuning features of 11g, you may be able to get past that. It is also rumored that VMware is working on getting FT to work with multiple processor VMs. When they do that it should really put a dent in RAC. Before I started leveraging the SQL tuning available in 11gR2 and SQL Profiles, my main client system ran with CPU as the constraint and had a pretty constant load of 4 CPUs. After tuning the SQL, CPU load occasionally spikes at over 1 CPU and typical CPU load is 0.5 processors. The system is now a prime candidate for FT. My limiter became disk I/O which we addressed with more spindles.
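For a sense of scale on the RAC tax, here is the arithmetic as a small Python sketch. The $23k per processor figure is the list price quoted above. The function name is mine, the processor counts are hypothetical, and it ignores discounts, support costs and how Oracle counts licensable processors.

```python
RAC_LIST_PRICE_PER_PROCESSOR = 23_000  # USD, list price quoted in the post


def rac_addon_cost(processors: int,
                   per_processor: int = RAC_LIST_PRICE_PER_PROCESSOR) -> int:
    """List-price cost of the RAC option for a given licensable processor count."""
    if processors < 0:
        raise ValueError("processors must not be negative")
    return processors * per_processor


# Hypothetical processor counts, purely for illustration.
for count in (2, 4, 8):
    print(f"{count} processors: ${rac_addon_cost(count):,}")
```

That is money spent on top of Enterprise Edition for protection that HA and FT may already give you.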

Creating a clone of an EBS system for development or testing can be a big pain. Almost always you’re talking about cloning 2 servers (DB and Apps tiers), running autoconfig, canceling scheduled concurrent jobs, etc. With VMware Lab Manager, you can have your developers / analysts create linked clones of your Production environment in minutes. No need to run autoconfig; the systems show up with their same instance names, same machine names, etc. This is done behind the scenes by putting the cloned VMs in their own private network. Instead of your copy of PROD taking up 100GB or so for EBS executables and another XXX GB for the database itself, it’s merely the size of the changes between your production environment and your cloned VM. I have yet to do this in my own environment, so I can’t speak from experience, but what I have seen has been impressive.

It’s been a few years and your server hardware needs to be replaced due to expensive support costs? In the physical world that means building up new database / App servers and having a downtime to copy everything over, reconfigure the networking (and possibly the machine names) and then hoping you don’t run into some sort of unexpected problem (like the AMD time drift issue a few years back that caused Kernel Panics – see here)… or you could just install vSphere onto your new host, make it visible to your shared storage and vMotion your system live and in production to the new hardware. What’s not to love?

Who really does run Oracle EBS under VMware though? Small little companies? No. Two of my favorite examples are VMware and EMC. Both run Oracle EBS virtualized.