Monthly Archives: June 2010

Is this good coding practice?

A consultant developer just handed me code with the following exception clause.

EXCEPTION
   WHEN NO_DATA_FOUND THEN
      RETURN 0;
   WHEN OTHERS THEN
      RETURN 0;
END XXXXXXXXXXX;

Now, I’m no developer, but what the heck is the point of that mess? Regardless of what exception comes up, always return the same code. And if that’s not bad enough, always return 0, meaning everything is OK.

Sigh.
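If nothing else, something along these lines (a minimal sketch on my part, not a drop-in fix) would at least preserve the real error instead of reporting success:

EXCEPTION
   WHEN NO_DATA_FOUND THEN
      RETURN 0;  -- only if 0 really is the correct "not found" answer
   WHEN OTHERS THEN
      DBMS_OUTPUT.PUT_LINE('Unexpected error: ' || SQLERRM);  -- or log it to a table
      RAISE;  -- propagate the failure instead of claiming everything is OK
END XXXXXXXXXXX;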

Scrambling of HR data in an Oracle EBS instance

With our most recent upgrade, we’ve implemented some of the Human Resources modules.

We’re now storing some sensitive employee data (stuff like salaries, Social Security Numbers, banking info, etc.) and one of the business mandates is to scramble this data in our cloned instances.

I knew from reading Steven Chan’s blog (this article by Sanchit Jindal) that Oracle offers two plugins for Oracle Enterprise Manager (OEM) that might serve our needs. These are the Data Masking Pack and the Application Management Pack for E-Business Suite.

While reviewing which of these might be best for our environment, I made note of the following resources for those interested:

Application Management Pack for E-Business Suite
Data Sheet- http://www.oracle.com/technology/products/oem/pdf/apps_mgmt_ebiz.pdf
Download- My Oracle Support, Patches, Search for Patch Number- 8333939

Data Masking Pack
Data Sheet- http://www.oracle.com/technology/products/manageability/database/pdf/ds/data-masking-pack-11gr2-datasheet.pdf
Canned Demo- http://download.oracle.com/technology/products/oem/screenwatches/data_masking/index.html
Download- http://www.oracle.com/technology/software/products/oem/index.html

I also found the comparison matrix at the Steven Chan blog post above (direct link is http://blogs.oracle.com/stevenChan/images/comparisonmatrix.jpg ) to be extremely useful.

For an EBS environment, AMP looks to be a better fit – it has pre-defined masking for EBS, which was exactly what I needed since I was going to be modifying data in pre-defined EBS tables / objects. AMP also allows you to hot-clone an Oracle EBS instance, and many newer Apps DBAs don’t have methods / scripts written to do this themselves.

In our environments, however, using AMP’s data scrambling would have required some significant changes with a real performance impact on PROD.

The data scrambling can only take place as part of a clone, and AMP only supports cloning thru RapidClone integration – it specifically does NOT support cloning from an RMAN backup. This means that utilizing AMP would require that I clone directly from PROD, causing a non-negligible performance impact.

Another issue was cost. AMP is licensed per processor at a list price of $7k per processor. The number of processor licenses needed is obtained by adding up your Database, Application Server and Web Server (if on a separate system from the Application Server) processors. In our case we utilize 8 Oracle Database licenses and 4 Oracle Application Server / Web Server licenses, so 12 * $7k, or $84,000 list price, plus yearly support fees.

As a result of these two issues with AMP, the business asked me to see if I could come up with a cheaper home-grown solution that would let us keep our existing cloning strategies and not cost tens of thousands of dollars.

We had 3 main things we needed to scramble: National Identifiers (aka Social Security Numbers or SSNs), Salary data, and banking direct deposit information like routing and account numbers.

National Identifiers (SSNs):

National identifiers are stored in the table per_all_people_f, but the national identifier is NOT the primary key. The primary key for per_all_people_f is (PERSON_ID, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE), and each time a person changes positions or leaves / comes back to the company, a new row is added with an appropriate effective date range. This means I had to change the SSN to the same value for each record for a person. For example, Bob might have 4 records in this table due to leaving and coming back or switching positions. In all those records in PROD, he’d have the same SSN of “123-45-6789”, and I needed to ensure that when I cloned the instance and scrambled the data, his new SSN was consistently the same (but of course different from PROD) in the clone.

First I needed to generate a bunch of random but valid SSNs. There are actually quite a few rules associated with this (for example, if you receive your SSN while living in Delaware, the first three digits must be 221 or 222), and luckily I found some scripts at fakenamegenerator.com which allowed me to create a long list of fake random SSNs. The fngssn.class.php file is GPL and is available at this link. I also modified their example.php script to call the fngssn.class.php file and output a SSN. I then just looped thru this program however many hundreds or thousands of times I needed, outputting and concatenating the SSNs to a dump file.

I then bring this list of SSNs into an Excel spreadsheet as column B.
Column A is update per_all_people_f set national_identifier = ZZ
Column B is my list of fake SSNs
Column C is ZZ where national_identifier = ZZ
Column D is my list of real SSNs – this can be obtained with the following SQL: select distinct(national_identifier) from per_all_people_f where national_identifier IS NOT NULL;
Column E is ZZ;

I then save this spreadsheet out to a plain text file and edit the file, replacing all the ZZ with ' (Excel handles ' weirdly, so I just use vi afterward to do a global replace of ZZ with ' – it’s one command and takes a couple of seconds).
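For reference, that vi global replace is just one command (assuming ZZ appears nowhere else in the file):

:%s/ZZ/'/g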

At this point I’m left with a SQL file with commands like

update per_all_people_f set national_identifier = '567-42-3477' where national_identifier = '123-45-6789';

I just need to run these and poof, the SSNs are now scrambled.

Salary data:

This was the easiest to change. Some people may get paid hourly, some salary. In either case though, their salary or hourly rate is stored in per_pay_proposals.proposed_salary_n. For someone that gets a salary of $100k a year this field would have 100000, and for someone paid $10 an hour the field would have 10. It’s perfectly OK for people to have the same salary in real life, and I figured for the purpose of our scrambled clone, I’d go with a socialist ideal and just give everyone the same value of 50. Admittedly, this means people who are salaried now get $50 a year while the hourly people get $50 an hour. It’s good to be an hourly employee, at least in my scrambled instances – maybe now the hourly employees can afford health care!

update per_pay_proposals set proposed_salary_n = 50 where proposed_salary_n IS NOT NULL;

Banking data:

There are two parts to the banking data: the routing number (think of it as the bank number) and the account number. These are the two sets of numbers at the bottom of your checks (there’s also the check number as the third set of numbers on your checks but that’s not relevant).

The pay_external_accounts table holds the routing and account number info in some of the segment columns – where exactly depends on your implementation. We only wanted to update this data for people’s personal accounts, so our SQL looks like this:

update pay_external_accounts
set segment6 = '44444444444', segment7 = '1111111111'
where external_account_id IN
   (select b.external_account_id
      from pay_personal_payment_methods_f a, pay_external_accounts b
     where a.external_account_id = b.external_account_id);


There you go, all done. I know it sounds like a lot of manual steps, but once you’ve built the Excel spreadsheet and generated your list of fake SSNs, all you really need to do is run the SQL for Column D and paste those values in, change the ZZ to ' and run the file. The SQL statements for the salary and banking data take a few seconds each. Call it 10 or 15 minutes total per clone.

Now could this all be better? Sure. I could generate my fake SSNs via a SQL cursor and never need Excel or PHP. But for now, this works. Please feel free to improve on my system and I’ll update the blog with the changes.
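For the curious, the cursor approach would look something like this (a rough sketch – plain DBMS_RANDOM digits don’t obey the SSN validity rules mentioned above, and duplicate fakes aren’t prevented):

BEGIN
   -- one fake SSN per distinct real SSN keeps each person's records consistent
   FOR r IN (SELECT DISTINCT national_identifier
               FROM per_all_people_f
              WHERE national_identifier IS NOT NULL)
   LOOP
      UPDATE per_all_people_f
         SET national_identifier =
                LPAD(TRUNC(DBMS_RANDOM.VALUE(100, 900)), 3, '0') || '-' ||
                LPAD(TRUNC(DBMS_RANDOM.VALUE(10, 100)), 2, '0') || '-' ||
                LPAD(TRUNC(DBMS_RANDOM.VALUE(0, 10000)), 4, '0')
       WHERE national_identifier = r.national_identifier;
   END LOOP;
   COMMIT;
END;
/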

Lessons learned from a virtualized Oracle upgrade

So about a week ago, we did a rather massive upgrade at my main client to the Oracle E-Business infrastructure. The main things in this upgrade were:

Licensing modules necessary for us to have a full installation of Oracle HR
Upgrade Oracle database from 11.1.0.7 64-bit to 11.2.0.1 64-bit
Apply all CPU security patches thru Apr 2010
Upgrade memory on DB server from 8G to 12G
Upgrade server side java from 1.6.0_16 to 1.6.0_20
Upgrade client side java from 1.6.0_16 to 1.6.0_20b5 (see this link on why the special b5 version)
Apply approximately 350 (not a typo) individual E-Business patches, for the following things:
o Minimum Baseline Patch Requirements for Extended Support on Oracle E-Business Suite 11.5.10 (Note 883202.1)
o Upgrading from Financials Family Pack F to Family Pack G (FIN_PF.G)
o Recommended 11i Apps patches for all our products
o Java related patches
o Latest DST v11 related patches (see here)
o Implement WebADI

As you might gather from this list, it was a rather large upgrade. The apps patches alone totaled about 10GB once merged into one patch, and the backup directory for the merged patches ended up totaling 6GB. Test runs had the upgrade running about 24 hours with 8 CPUs on some scratch disk storage I had in the SAN. Like I mentioned in previous posts, we utilized VMware snapshots on our boxes at various points in the upgrade in case we needed to roll back or experienced an unforeseen issue.

One of the VMware best practices we follow is to break the boot “disk” and the data “disk” for our VMs into their own virtual disks. Aside from boot-up / shutdown, the boot disk generally experiences very low traffic, so it’s pretty typical, especially with a replicated SAN system such as ours, to put the boot “disks” (VMDKs) for a bunch of VMs on one VMware datastore, possibly with slower disks, and the data “disks” (VMDKs) on another, dedicated datastore. In our case, the boot disk datastore is a 2 disk RAID 1 (mirrored) set of Fibre Channel drives and the data disk datastore is a 9 disk (8+1) RAID 5 datastore of SSDs (aka EFDs aka super super fast disks).

Although I had run multiple dry runs before the upgrade, one thing I failed to notice / realize is that by default VMware snapshots are stored where the VM lives, or more specifically, where the VM’s configuration file lives… in this case on my slowest disks.

This became extremely clear during our large merged patch of 330+ Apps patches – things got slower and slower. At that point, shutting down the VM and moving the snapshots wasn’t really an option. It was just a matter of suffering thru and learning for next time. Luckily the business had fully planned on the upgrade taking 24 hours for the patching even though I expected us to be at roughly 1/2 that time with SSDs.
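For next time: as I understand it, vSphere can redirect a VM’s snapshot files to a different datastore via the workingDir parameter in the VM’s .vmx file (the datastore path below is hypothetical):

workingDir = "/vmfs/volumes/fast_datastore/vm_snapshots"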

By the time the upgrade was done and the business analysts had finished their testing and called the upgrade good (and hence we were ready to delete the 5 sets of snapshots), the snapshots for my two VMs, which utilize about 450GB of space, had grown to about 200GB. It took about 5 hours for the snapshots to be merged into the base VMDKs. Although the system was usable during that time, it was quite laggy. Luckily it was still the weekend for most of our users and they weren’t too inclined to utilize Oracle.

On the subject of VMware snapshot deletions, I recently came across two notes that should be of use to other VMware admins:
1) With the latest version of vSphere (4.0 Update 2), VMware has greatly improved the speed and efficiency of deleting all the snapshots for a VM. You can read more about it here. Unfortunately at the time of my Oracle upgrade I was on vSphere 4.0 Update 1.
2) When you delete a large snapshot, it will frequently appear to “hang” at 95% – check out this knowledge base article on how to monitor snapshot deletions.

Overall the upgrade was a success, and minus the occasional user issue, Monday morning (the first business day after the upgrade) was pretty much a non-event.

These are the sorts of situations that make sending your people to training, or giving them the time and inclination to read manuals and blogs, so essential. Not as a result of this, but somewhat related: I’ll be attending the VMware vSphere troubleshooting class in the next month or two, will be earning my VCP (assuming I pass the test) and will possibly try to earn a VCAP-DCA by end of year.

How virtualization can magnify your architecture problems

I recently started working with a new client whose hosting provider hosts their Oracle database on Linux under VMware. An excellent choice, but this client is experiencing major performance issues – forms taking a minute or more to bring up data is just one example.

As I learned more about their environment I found that virtualization (VMware in this case, though the issue isn’t specific to any particular virtualization vendor) actually made their system performance worse. I know, I’m a VMware groupie (heck a VMware vExpert!) and we’re all amazed I’d write such a thing, but alas, it’s true.

The database is around 80GB in size. Each day this hosting provider would take a full (level 0 incremental) backup of the Oracle database via RMAN. The hosting provider wrote this RMAN backup to the same mount point in the VM that the database uses.

Please take a moment to catch your breath and stop clenching your hands into fists over this very very bad idea.

So why is this such a bad idea? For a couple of reasons.

One is performance – you’re now greatly degrading the performance of your database by writing a full backup to the same disks that are trying to handle database requests. You have, at the least, doubled the amount of I/O going to those disks.

Two is the ability to recover. If your ESX host or your VM experiences an issue (running out of disk space, disk corruption, fire, whatever), you can no longer access the mount point in the VM where you backed up the data.

Best practice for implementing RMAN in a situation like this is to back up your database to another set of disks on another machine in another physical location. A typical example is to have an NFS export on your backup destination server (in another datacenter) and have RMAN write directly to that NFS mount. This way you aren’t writing your backup to the same disks (thereby not impacting production performance much) and you’re covered in the case of issues with the hardware or VM itself.
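A minimal sketch of what that looks like in RMAN, assuming the backup server’s export is mounted at the hypothetical /nfs/rman_backups:

run {
   # write backup pieces to the NFS mount on the remote backup server
   allocate channel d1 device type disk format '/nfs/rman_backups/%U';
   backup incremental level 0 database plus archivelog;
}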

So where does VMware fit into this? I mentioned that the hosting provider was also performing VM-level backups. In particular, they were performing VM-level backups at the same time they were running RMAN backups. All to the same set of disks.

Now I’ve got the VMware Admins and the Oracle DBAs cringing.

When you initiate a VM level backup, VMware takes a snapshot of the VM. This means it makes a delta file on the same ESX datastore and stops writing to the VMDK(s) that make up the VM. All changes to the VM get written to the delta file instead. That delta file can grow (8 megs at a time) up till it’s the same size as the original VMDK.

When you are taking a VM level backup, you want to choose a time when you’re not doing many writes to the VM. This way the delta file won’t grow so big that you could run out of space on the datastore (LUN) and your performance impact is decreased.

So here they are, writing their full 80GB Oracle backup out to a mount point inside their VM. That’s 80GB of writes. VMware sees those writes and has to write them to its snapshot (delta file). So now not only are you serving up database queries on those disks, you’re also scanning every block of your database for changes (this database did not employ Oracle block change tracking), writing a full RMAN backup to those disks, and VMware is copying all those writes into a delta file on those same disks.
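As an aside, block change tracking (which at least spares incremental backups from scanning every block) is a one-statement fix – the file path here is hypothetical:

ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '/u01/oradata/bct.f';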

Virtualization can be wonderful and solve or simplify many of the issues an administrator faces, but it can also magnify fundamental architecture flaws.

Why run Oracle E-Business on vSphere?

If I’ve got the dedicated hardware for my Oracle E-Business (EBS) install, why would I want to run it virtualized?

Most of the places I’ve seen typically have dedicated hardware for the Oracle EBS instances – so why virtualize it at all?

One major reason is the ability to roll back the entire environment in the event of a patching issue. My main client has an upcoming upgrade involving a database upgrade (11gR1 to 11gR2), OS-level patches (updates to RedHat 5.4), Java updates (going to 1.6.0_20b5) and roughly 300 Oracle EBS patches. Yes, 300. Although we’re still staying at 11.5.10.2 ATG RUP 7, Oracle recently updated the minimum patch list you must be on to receive Extended Support on Apps 11i as of Nov. You can find the full up-to-date list at Metalink document Minimum Baseline Patch Requirements for Extended Support on Oracle E-Business Suite 11.5.10 (Note 883202.1).

Without virtualization, I’d have to take a full backup of my Database and App Tiers before starting the upgrade and then be prepared to go back to these if we can’t finish the upgrade by a certain time. Depending on your environment, this could be hours, days or not even practical.

Instead, at the start of the upgrade I merely take a snapshot of each VM involved. If I do this with the VMs shut down (i.e. not capturing the memory state of the VM as well) this takes about 10 seconds. Done. In the event we decide to rollback all our upgrade changes, it’s just a quick Revert to the Snapshot (might take as long as a minute per VM – OMG, a whole minute!!), restart the Apps services and poof, you’re done.

If only the upgrades themselves were so easy and foolproof.

Another reason to virtualize is of course getting the best usage out of your Oracle licenses. Oracle EBS requires Oracle Enterprise Edition, and I’ve blogged previously about how to maximize license usage under VMware, so I won’t go into that here.

Avoid the RAC tax. According to Oracle’s most recent Global Price List, RAC is an additional $23k per processor on top of Enterprise Edition. RAC makes sense for companies in two situations: 1) when their database must not go down, and 2) when the system is too large to run on one physical box. Sure, there are some installations where EBS requires more horsepower than can be run under 8 processors (the limit of a VM under vSphere 4.0 Update 2), but they are the vast minority. Many companies running EBS implement RAC because they want to be protected from hardware failures.

VMware has two features that protect you here – High Availability (HA), which will restart your VM on another vSphere host if the VM goes down, and the much cooler and more RAC-like feature, Fault Tolerance (FT). FT will actually run your VM on two ESX hosts simultaneously. They work in an active / passive arrangement – you aren’t having users connect to both machines and load balancing between the two, but instead all users connect to one machine, and in the event it fails, user sessions and other processes and connections are now active on the other node. Your users don’t notice anything: no disconnections, no restarts, no autoconfig required. It just simply works.

The current drawback to FT is that it’s limited to 1 CPU VMs – that’s reasonable in my experience with an Apps 11i Forms/Web tier, but can frequently be a show stopper with a database server. However, if you’re willing to leverage the performance tuning features of 11g, you may be able to get past that. It is also rumored that VMware is working on getting FT to work with multiple-processor VMs; when they do, it should really put a dent in RAC. Before I started leveraging the SQL tuning available in 11gR2 and SQL Profiles, my main client’s system ran with CPU as the constraint and had a pretty constant load of 4 CPUs. After tuning the SQL, CPU load occasionally spikes at over 1 CPU and typical CPU load is 0.5 processors. The system is now a prime candidate for FT. My limiter became disk I/O, which we addressed with more spindles.

Creating a clone of an EBS system for development or testing can be a big pain. Almost always you’re talking about cloning 2 servers (DB and Apps tiers), running autoconfig, canceling scheduled concurrent jobs, etc. With VMware Lab Manager, you can have your developers / analysts create linked clones of your Production environment in minutes. No need to run autoconfig – the systems show up with their same instance names, same machine names, etc. This is done behind the scenes by putting the cloned VMs in their own private network. Instead of your copy of PROD taking up 100GB or so for EBS executables and another XXX GB for the database itself, it’s merely the size of the changes between your production environment and your cloned VM. I have yet to do this in my own environment, so I can’t speak from experience, but what I have seen has been impressive.

It’s been a few years and your server hardware needs to be replaced due to expensive support costs? In the physical world that means building up new database / app servers, taking a downtime to copy everything over, reconfiguring the networking (and possibly the machine names) and then hoping you don’t run into some sort of unexpected problem (like the AMD time drift issue a few years back that caused kernel panics – see here)… or you could just install vSphere onto your new host, make it visible to your shared storage and vMotion your system, live and in production, to the new hardware. What’s not to love?

Who really does run Oracle EBS under VMware though? Small little companies? No. Two of my favorite examples are VMware and EMC. Both run Oracle EBS virtualized.

Oracle Agile doesn’t follow Oracle best practices

So right now I’m doing an upgrade of one of our test systems from Agile Advantage SP4 on Windows to Agile 9.3.0.1 on Linux. It’s quite the involved process and I’ll post about it at length later. However, two things I just finished fixing sadden me:

Oracle doesn’t follow Oracle’s own best practices with the seeded Agile 9.3 database. I have no idea why they did this. It’s not because this best practice is new – it’s been the recommendation since I learned it back in 1997 with Oracle 7.3.

Agile 9.3.0.1 lays down a database with 4 redo log groups, each with a single 200MB member. Oracle best practice with redo logs is 3 redo log groups, each with 2 members. Depending on the system’s I/O characteristics you may add more redo log groups, and sizing is highly dependent on those I/O characteristics as well, but all Oracle databases should have at least 2 redo log groups, each with 2 members. So instead of the minimum 2×2 or best-practice 3×2 configuration, Oracle ships one of their own products with a 4×1 configuration, which is just asking for issues.
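The fix is straightforward – add properly mirrored groups, then cycle the log and drop the single-member groups once they’re inactive (group numbers and file paths here are hypothetical):

ALTER DATABASE ADD LOGFILE GROUP 5
   ('/u01/oradata/agile9/redo05a.log', '/u02/oradata/agile9/redo05b.log') SIZE 200M;
ALTER SYSTEM SWITCH LOGFILE;
ALTER DATABASE DROP LOGFILE GROUP 1;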

On a related note, the Oracle Agile 9.3.0.1 database also came with an UNDO tablespace sized to autoextend until it fills up the file system. Best practice is to NEVER set an UNDO tablespace to autoextend, because a single poorly written query can then use up all the disk space on the system.
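That one is also a one-statement fix per undo datafile (path hypothetical):

ALTER DATABASE DATAFILE '/u01/oradata/agile9/undotbs01.dbf' AUTOEXTEND OFF;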

Seriously Oracle, what were you thinking?

A support story with EMC and Oracle

Like I’ve said before, my three main job duties involve Oracle, Linux and VMware. This also tends to extend to the hardware it all runs on. I sometimes need to open support tickets on these products and recently had two support experiences I wish to share. In both cases, my initial support request didn’t get the attention it needed, but in both cases, support managers went above and beyond to give my client and me exemplary support.

In these days of off-shoring, near-shoring, and other cost-avoidance initiatives, I’m reminded that it’s the performance of the support individuals who deal with end users that most greatly influences customer opinions.

The first experience involves EMC support with our SAN. We’ve recently expanded our primary server room. Part of that expansion involved adding additional UPS resources and a UPS bypass. For a while, we were running our SAN on non-redundant power. When we restored redundant power, one of the power supplies in our SAN failed and I opened a support ticket with EMC. Somehow the ticket didn’t get routed or assigned correctly, and even after placing two additional support calls, there I was, two days in and still with a failed power supply. Not a comfortable feeling. At that point I reached out to our sales rep (Bryan Coston of PBS Now) at the end of the business day, who in turn reached out to EMC’s area support representative (Scott Marnell). Even though I hadn’t even contacted them till close of business, I received calls and emails from them that night and had the option to get the defective part replaced that night. It’s that sort of above-the-call-of-duty support that makes me very happy to have EMC products in my environment. These two individuals (and perhaps others behind the scenes) turned an unpleasant support situation into a support success in a couple of hours. Thank you PBSNOW and EMC.

The second support experience involves Oracle Support with Oracle Agile software. I’m currently migrating our Agile system to the latest versions in a test environment. I’ve run into some issues I’ve been unable to resolve on my own, and opened a couple of technical support tickets with Oracle. Both tickets ended up being assigned to the same analyst. On one ticket involving data migration errors, I received no updates for a whole week. On the other ticket, the analyst and I got into a disagreement regarding Oracle’s support of the EXPort and IMPort utilities with Oracle Database 11g. After getting frustrated at not receiving any updates to either issue, including after my written request that the tickets be escalated to a duty manager, I called in and requested a call from a duty manager. Within an hour I had the duty manager on the phone. Not only did he apologize for the delays and reassign my support tickets to a senior-level support analyst, the duty manager (I’m sorry, I didn’t write down his name) also spoke with me at length regarding my experiences with the upgrade, asking specific questions about what could be done to make the upgrade documentation and experience better. I’ll be posting my experiences and notes from the upgrade once it’s complete. The analyst the duty manager assigned to my tickets, Mehdi Hamzezadeh, was fantastic. Not only did he take ownership of my support tickets and resolve both issues in a matter of hours, it turns out he wrote many of the data cleanup scripts that Oracle Support provides to their end customers. I’m very grateful to the duty manager for assigning Mehdi to my tickets and to Mehdi for his quick and complete assistance.

vExpert 2010

VMware has an advocacy recognition program called VMware vExpert.  The program’s goal is to recognize individuals who significantly contribute to the community of VMware users.  Such individuals include authors, bloggers, VMUG leaders, tool builders, and other IT professionals.

I waited until the last minute to submit my application because I debated whether I was even qualified to apply. This blog is very new. I’m not the knowledge expert on VMware products that many of the other vExperts are. Heck, I don’t even have VCP (VMware Certified Professional) credentials – I’m one of the IT professionals out there buying the books that many of the other vExperts publish! Nevertheless, the VMware vExpert judges deemed me worthy of the recognition, and I’m thrilled and proud to have been awarded it on Friday.

Although I’m not the pure VMware subject matter expert many are, I have contributed and will continue to contribute my subject matter expertise with Oracle products on Linux under VMware. My main client made the switch to running our Tier 1 production Oracle systems under VMware about 2 years ago. I believe it was one of the best decisions we ever made, from both a technical and financial perspective.

Since then, in addition to my main duties as Sr. Oracle Applications DBA and Sr. Linux Administrator, I’ve been learning and leveraging many of the features VMware virtualization buys us. I’ve done many reference calls for Oracle (my main client was one of the first companies in the world to run Oracle 11g database in Production), EMC (we utilize many EMC products in our environment) and VMware, and will continue to do so in the future.