Tag Archives: bad practices

Software. Hardware. Complete.

So recently I was browsing Walmart.com’s Electronics section and was amazed at the selection they have.

You want to buy a computer? They’ve got it.

You want an operating system for that computer? They’ve got it.

You want to buy a network switch and cables to link multiple computers together? They’ve got it.

You want to buy 4TB of NAS storage? They’ve got it.

You can get them all from one vendor. The switches say they’re certified with the OS. The computer says it’s certified with the OS. Your storage is certified with your OS.

You can even install Oracle Database on the hardware and be fully supported by Oracle (though not certified by Oracle, because Oracle doesn’t certify 3rd party hardware).

Have you ever bought a wireless Microsoft keyboard and mouse that didn’t work right with your Microsoft Windows OS running on a PC with a sticker on it that said “Designed for Windows”? It’s all from one vendor. Just one throat to choke, right?

So why isn’t most of your data center running off of what’s at Walmart?

Because those products might not be leaders in their category.

Because the technical support backing those products might be crappy.

Because the software might not be enterprise-ready.

Just because you can buy everything from one company doesn’t mean you should.

Is this good coding practice?

A consultant developer just handed me code with the following exception clause.

EXCEPTION
WHEN NO_DATA_FOUND
THEN
RETURN 0;
WHEN OTHERS
THEN
RETURN 0;
END XXXXXXXXXXX;

Now, I’m no developer, but what the heck is the point of that mess? Regardless of what exception comes up, always return the same value. And if that’s not bad enough, always return 0, meaning everything is OK.

Sigh.
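For contrast, here’s a minimal sketch of what I’d rather see (the function and table names are made up for illustration): return 0 only when “no rows” genuinely means zero, and let everything else propagate so the caller knows something actually broke.

CREATE OR REPLACE FUNCTION get_item_qty (p_item_id IN NUMBER)
   RETURN NUMBER
IS
   l_qty NUMBER;
BEGIN
   SELECT qty
     INTO l_qty
     FROM items                 -- hypothetical table
    WHERE item_id = p_item_id;

   RETURN l_qty;
EXCEPTION
   WHEN NO_DATA_FOUND
   THEN
      RETURN 0;                 -- no row really does mean a quantity of zero
   WHEN OTHERS
   THEN
      RAISE;                    -- don't swallow real errors; re-raise them
END get_item_qty;
/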

How virtualization can magnify your architecture problems

I recently started working with a new client whose hosting provider hosts their Oracle database on Linux under VMware. An excellent choice, but this client is experiencing major performance issues – data for forms taking a minute or more to come up is just one example.

As I learned more about their environment I found that virtualization (VMware in this case, though the issue isn’t specific to any particular virtualization vendor) actually made their system performance worse. I know, I’m a VMware groupie (heck, a VMware vExpert!) and we’re all amazed I’d write such a thing, but alas, it’s true.

The database is around 80GB in size. Each day this hosting provider would take a full (level 0 incremental) backup of the Oracle database via RMAN. The hosting provider wrote this RMAN backup to the same mount point in the VM that the database uses.

Please take a moment to catch your breath and stop clenching your hands into fists over this very very bad idea.

So why is this such a bad idea? For a couple of reasons.

One is performance – you’re now greatly degrading the performance of your database by writing a full backup to the same disks that are trying to handle database requests. You have, at the least, doubled the amount of I/O going to those disks.

Two is the ability to recover. If your ESX host or your VM experiences an issue (running out of disk space, disk corruption, fire, whatever), you can no longer access the mount point in the VM where you backed up the data.

Best practice for implementing RMAN in a situation like this is to back up your database to another set of disks on another machine in another physical location. A typical example is to have an NFS export on your backup destination server (in another datacenter) and have RMAN write directly to that NFS mount. This way you aren’t writing your backup to the same disks (thereby not impacting production performance much) and you’re covered in the case of issues with the hardware or VM itself.
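As a rough sketch (the mount point and channel name are placeholders), the RMAN job can be as simple as pointing a disk channel at the NFS mount exported from the backup server in the other datacenter:

RUN
{
  # write backup pieces to the NFS mount, not the database's own disks
  ALLOCATE CHANNEL nfs_disk DEVICE TYPE DISK FORMAT '/backup/rman/%d_%U';
  BACKUP INCREMENTAL LEVEL 0 DATABASE PLUS ARCHIVELOG;
  RELEASE CHANNEL nfs_disk;
}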

So where does VMware fit into this? The hosting provider was also performing VM-level backups – and running them at the same time as the RMAN backups. All to the same set of disks.

Now I’ve got the VMware Admins and the Oracle DBAs cringing.

When you initiate a VM-level backup, VMware takes a snapshot of the VM. This means it makes a delta file on the same ESX datastore and stops writing to the VMDK(s) that make up the VM. All changes to the VM get written to the delta file instead. That delta file can grow (8 megs at a time) until it’s the same size as the original VMDK.

When you are taking a VM-level backup, you want to choose a time when you’re not doing many writes to the VM. That way the delta file won’t grow so big that you risk running out of space on the datastore (LUN), and the performance impact is reduced.

So here they are writing their full Oracle backup of 80GB out to a mount point inside their VM. That’s 80GB of writes you’re doing. VMware sees those writes and has to write them to its snapshot (delta file). So now not only are you serving up database queries on your disks, you’re also scanning every block of your database on those disks for changes (this database did not employ Oracle Changed Block Tracking), you’re writing a full RMAN backup to those disks, and VMware is having to copy all those writes into a delta file on those same disks.

Virtualization can be wonderful and solve or simplify many of the issues an administrator faces, but it can also magnify fundamental architecture flaws.

Oracle Agile doesn’t follow Oracle best practices

So right now I’m doing an upgrade of one of our test systems from Agile Advantage SP4 on Windows to Agile 9.3.0.1 on Linux. It’s quite the involved process and I’ll post more at length about it later. However, two things I just finished fixing sadden me:

Oracle doesn’t follow Oracle’s own best practices with the seeded Agile 9.3 database. I have no idea why they did this. It’s not because this best practice is new – it’s been the recommendation since I learned it back in 1997 with Oracle 7.3.

Agile 9.3.0.1 lays down a database with 4 redo log file groups, each with a single 200MB member. Oracle’s best practice for redo logs is 3 redo log groups, each with 2 members. Depending on the system’s I/O characteristics you may add more redo log groups, and the member size is also highly dependent on those characteristics. All Oracle databases should have at least 2 redo log groups, each with 2 members. So instead of the minimum 2×2 or best-practice 3×2 configuration, Oracle ships one of their products with a 4×1 configuration, which is just asking for issues.
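Adding a second member to each existing group gets you the multiplexing the seeded configuration is missing. A sketch, assuming the seeded database numbers its groups 1 through 4 (the file paths are placeholders for your actual layout, ideally on different disks than the first members):

-- multiplex each existing redo log group with a second member
ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/agile9/redo01b.log' TO GROUP 1;
ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/agile9/redo02b.log' TO GROUP 2;
ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/agile9/redo03b.log' TO GROUP 3;
ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/agile9/redo04b.log' TO GROUP 4;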

On a related note, the Oracle Agile 9.3.0.1 database also came with an UNDO tablespace set to autoextend until it fills up the file system. Best practice is to NEVER set an UNDO tablespace to autoextend, because a single poorly written query can consume all the disk space on the system.
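A sketch of the fix (the datafile path and the 2G size are placeholders – find the real datafile in DBA_DATA_FILES and size undo for your actual workload):

-- stop the undo datafile from growing until the file system is full,
-- then give it an explicit, deliberate size
ALTER DATABASE DATAFILE '/u02/oradata/agile9/undotbs01.dbf' AUTOEXTEND OFF;
ALTER DATABASE DATAFILE '/u02/oradata/agile9/undotbs01.dbf' RESIZE 2G;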

Seriously Oracle, what were you thinking?