Tag Archives: scramble

An alternative to scrambling data: Restricting access with Virtual Private Database (VPD)

Back in June, I wrote a blog post on scrambling HR data in our EBS instances . Although effective, it was a bit of a kludge – it involved an excel spreadsheet, and giving everyone the same salary and banking info.

As we went with this solution in our development and test environments, we ran into some issues where the salary data would totally screw up the benefits data as it’s calculated as a percentage of salary. The solution was effective at keeping the data secure, but it wasn’t optimal. After some investigation, we turned to Oracle VPD – Virtual Private Database – functionality. With this we are able to restrict access to certain columns (such as salary or national identifiers) to all but necessary users. With an EBS database, where every connection is connect as APPS, this poses special considerations.

I’ll cover the technical details of implementing VPD in an EBS environment. Then I’ll talk about the changes you need to make to keep things functional for your business analysts and yet keep the data secure.

First it was necessary to create a policy function. In our case this is very generic, basically just returning the predicate.

CREATE OR REPLACE FUNCTION “APPS”.”LUM_HIDE_HR_COLS” (schema in varchar2, tab in varchar2)

return varchar2 as predicate varchar2(8) default ‘1=2’;

begin

return predicate;

end;

/

Next we add a policy on the column we want to restrict access to point it at the policy function we created

begin dbms_rls.add_policy(object_schema => ‘HR’, object_name => ‘PER_ALL_PEOPLE_F’, policy_name => ‘LUM_HIDE_HR_COLS’, function_schema => ‘APPS’, policy_function => ‘LUM_HIDE_HR_COLS’, statement_types => ‘select,’, update_check => FALSE , enable => TRUE , static_policy => FALSE , policy_type => dbms_rls.STATIC , long_predicate => FALSE , sec_relevant_cols => ‘NATIONAL_IDENTIFIER,’ , sec_relevant_cols_opt => DBMS_RLS.ALL_ROWS ); end;

/

In this case we’re ADDing a policy and ENABLing it, applying the LUM_HIDE_HR_COLS function on the NATIONAL_IDENTIFIER column of table PER_ALL_PEOPLE_F in the HR schema, preventing users from SELECTing data and stating that this is for ALL_ROWS.

Once we issued that, all users (besides SYS and SYSTEM) will get NULLs when they select NATIONAL_IDENTIFIER from that table. That took care of our social security number concern.

We also set up additional policies on other data:

begin dbms_rls.add_policy(object_schema => ‘HR’, object_name => ‘PER_PAY_PROPOSALS’, policy_name => ‘LUM_HIDE_HR_COLS’, function_schema => ‘APPS’, policy_function => ‘LUM_HIDE_HR_COLS’, statement_types => ‘select,’, update_check => FALSE , enable => TRUE , static_policy => FALSE , policy_type => dbms_rls.STATIC , long_predicate => FALSE , sec_relevant_cols => ‘PROPOSED_SALARY_N,’ , sec_relevant_cols_opt => DBMS_RLS.ALL_ROWS ); end;

/

The above policy restricts access to PROPOSED_SALARY_N column of HR.PER_PAY_PROPOSALS. That took care of our salary data concern.

begin dbms_rls.add_policy(object_schema => ‘HR’, object_name => ‘PAY_EXTERNAL_ACCOUNTS’, policy_name => ‘LUM_HIDE_HR_COLS’, function_schema => ‘APPS’, policy_function => ‘LUM_HIDE_HR_COLS’, statement_types => ‘select,’, update_check => FALSE , enable => TRUE , static_policy => FALSE , policy_type => dbms_rls.STATIC , long_predicate => FALSE , sec_relevant_cols => ‘SEGMENT3,SEGMENT4,’ , sec_relevant_cols_opt => DBMS_RLS.ALL_ROWS ); end;
/

The above policy restricts access to the SEGMENT3 and SEGMENT4 columns of HR.PAY_EXTERNAL_ACCOUNTS. That took care of our banking data concern.

begin dbms_rls.add_policy(object_schema => ‘HR’, object_name => ‘PER_ADDRESSES’, policy_name => ‘LUM_HIDE_HR_COLS’, function_schema => ‘APPS’, policy_function => ‘LUM_HIDE_HR_COLS’, statement_types => ‘select,’, update_check => FALSE , enable => TRUE , static_policy => FALSE , policy_type => dbms_rls.STATIC , long_predicate => FALSE , sec_relevant_cols => ‘ADDRESS_LINE1,ADDRESS_LINE2,’ , sec_relevant_cols_opt => DBMS_RLS.ALL_ROWS ); end;
/
The above policy restricts access to ADDRESS_LINE1 and ADDRESS_LINE2 columns of HR.PER_ADDRESSES. That took care of our concern of employees looking up addresses of other employees.

begin dbms_rls.add_policy(object_schema => ‘HR’, object_name => ‘PER_ALL_ASSIGNMENTS_F’, policy_name => ‘LUM_HIDE_HR_COLS’, function_schema => ‘APPS’, policy_function => ‘LUM_HIDE_HR_COLS’, statement_types => ‘select,’, update_check => FALSE , enable => TRUE , static_policy => FALSE , policy_type => dbms_rls.STATIC , long_predicate => FALSE , sec_relevant_cols => ‘GRADE_ID,’ , sec_relevant_cols_opt => DBMS_RLS.ALL_ROWS ); end;
/

The above policy restricts access to the GRADE_ID column of HR.PER_ALL_ASSIGNMENTS_F. That took care of our concern about employees looking up the pay grade of other employees.

begin dbms_rls.add_policy(object_schema => ‘HR’, object_name => ‘PAY_ELEMENT_ENTRY_VALUES_F’, policy_name => ‘LUM_HIDE_HR_COLS’, function_schema => ‘APPS’, policy_function => ‘LUM_HIDE_HR_COLS’, statement_types => ‘select,’, update_check => FALSE , enable => TRUE , static_policy => FALSE , policy_type => dbms_rls.STATIC , long_predicate => FALSE , sec_relevant_cols => ‘SCREEN_ENTRY_VALUE,’ , sec_relevant_cols_opt => DBMS_RLS.ALL_ROWS ); end;
/

The above policy restricts access to the SCREEN_ENTRY_VALUE column of HR.PAY_ELEMENT_ENTRY_VALUES_F. That took care of our figuring out salary based on insurance coverage concern.

That’s all there is to it. You can just issue the code above in an Apps 11i instance and at that point no user besides SYS and SYSTEM can see that data.
Now that you’ve handled the technical details, there’s the issue of cloning the instances and being able to test HR functionality in development environments while still restricting the data. Here’s what we do.
In PROD, in addition to the function and policies listed above, we create a read-only user calls APPSHR:

create user APPSHR identified by XXXXXXXXXXXXX;
GRANT ALTER SESSION TO “APPSHR”;
GRANT CREATE DATABASE LINK TO “APPSHR”;
GRANT CREATE PUBLIC SYNONYM TO “APPSHR”;
GRANT CREATE SESSION TO “APPSHR”;
GRANT CREATE SYNONYM TO “APPSHR”;
GRANT EXECUTE ANY PROCEDURE TO “APPSHR”;
GRANT SELECT ANY DICTIONARY TO “APPSHR”;
GRANT SELECT ANY SEQUENCE TO “APPSHR”;
GRANT SELECT ANY TABLE TO “APPSHR”;
GRANT UNLIMITED TABLESPACE TO “APPSHR”;
GRANT “RESOURCE” TO “APPSHR”;
ALTER USER “APPSHR” DEFAULT ROLE ALL;
That APPSHR user now has the ability to select any data in the system but it’s read only (no updating).
We then exempt the APPS and APPSHR user in PROD ONLY from the policies we created:

GRANT EXEMPT ACCESS POLICY to APPS;
GRANT EXEMPT ACCESS POLICY to APPSHR;

At this point only users who connect at the database level as APPS (that would be all forms based users) and APPSHR (our HR analysts) can see the restricted data. The APPS password in our PROD environment is known only to the DBAs. The APPSHR password is known only to the HR Business Analysts / HR Developers / DBAs. All other business analysts have access to another read-only account called APPSXXXX that is NOT exempt from the security policies. With that APPSXXXX account, the regular business analysts and developers can query the database directly for all but the restricted data and can access any data via the forms that their forms responsibilities allow.
When we clone an instance, we don’t have to do any scrambling. For all of our DEV, TEST and PSUP (Production Support) instances we merely have to change the APPS password to a commonly known password and issue

REVOKE EXEMPT ACCESS POLICY FROM APPS;

Now everyone can read and write data as APPS, but because APPS now is restricted by the policies we put in place, APPS can’t see the sensitive data. The only time this causes a problem is for the HR Business Analyst or HR Developers who need a non-PROD place to work issues or develop code. For them we created a special HR cloned instance with the same security setup as PROD but with the APPS password known to the HR Analysts and HR Developers.

This solution has worked out much better for us than the scrambling. Give it a shot and let me know how it goes for you in the feedback!

Scrambling of HR data in an Oracle EBS instance

With our most recent upgrade, we’ve implemented some of the Human Resources modules.

We’re now storing some sensitive employee data (stuff like salaries, Social Security Numbers, banking info, etc.) and one of the business mandates is to scramble this data in our cloned instances.

I knew from reading Steven Chan’s blog (this article by Sanchit Jindal) that Oracle offers two plugins for Oracle Enterprise Manager (OEM) that might serve our needs. These are the Data Masking Pack and the Application Management Pack’s for E-Business Suite.

While reviewing which of these might be best for our environment, I made note of the following resources, so for those interested:

Application Management Pack for E-Business Suite
Data Sheet- http://www.oracle.com/technology/products/oem/pdf/apps_mgmt_ebiz.pdf
Download- My Oracle Support, Patches, Search for Patch Number- 8333939

Data Masking Pack
Data Sheet- http://www.oracle.com/technology/products/manageability/database/pdf/ds/data-masking-pack-11gr2-datasheet.pdf
Canned Demo- http://download.oracle.com/technology/products/oem/screenwatches/data_masking/index.html
Download- http://www.oracle.com/technology/software/products/oem/index.html

I also found the comparison matrix at the Steven Chan blog post above (direct link is http://blogs.oracle.com/stevenChan/images/comparisonmatrix.jpg ) to be extremely useful.

For an EBS environment, AMP looks to be a better fit – It has pre-defined masking for EBS, which was exactly what I needed since I was going to be modifying data in pre-defined EBS tables / objects. AMP also allows you to hot-clone an Oracle EBS instance. Many newer Apps DBAs don’t have methods / scripts written to do this themselves.

In our environments, AMP would have required some significant and PROD performance impacting changes in order to use the data scrambling.

The data scrambling can only take place as part of a clone and AMP only supports cloning thru RapidClone integration and specifically does NOT support cloning from an RMAN backup. This means that utilizing AMP would require I clone directly from PROD, causing a non-negligible performance impact.

Another issue was cost. AMP is licensed per processor at a list price of $7k per processor. The number of processor licenses needed is obtained by adding up your Database, Application Server and Web Server (if on separate system from Application Server) processors to get the # of processor licenses you need to buy. In our case we utilize 8 Oracle Database licenses and 4 Oracle Application Server / Web Server licenses, so 12 * $7k or $84,000 list price, plus yearly support fees.

As a result of these two issues with AMP, the business asked me to see if I could come up with a cheaper home grown solution that would allow us to utilize our existing cloning strategies and not cost tens of thousands of dollars.

We had 3 main things we needed to scramble: National Identifiers (aka Social Security Numbers or SSNs), Salary data, and banking direct deposit information like routing and account numbers.

National Identifiers (SSNs):

National identifiers are stored in table per_all_people_f . This is NOT the primary key however. The primary key for per_all_people_f is
(PERSON_ID, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE) and each time a person changes positions or leaves / comes back to the company, a new row is added with an appropriate effective date range. This means I had to change the SSN to the same value for each record for a person. For example, Bob might have 4 records in this table due to leaving and coming back or switching positions. In all those records in PROD, he’d have the same SSN of “123-45-6789” and I needed to ensure when I cloned the instance and scrambled the data that his new SSN was consistently the same (but of course different from PROD) in the clone.

First I needed to generate a bunch of random but valid SSNs. There’s actually quite a few rules associated with this (for example, if you receive your SSN while living in Deleware, the first three digits must be 221 or 222) and luckily I found some scripts at fakenamegenerator.com which allowed me to create a long list of fake random SSNs. The fngssn.class.php file is GPL and is available at this link. I also modified their example.php script to call the fngssn.class.php file and output a SSN. I then just looped thru this program however many hundreds or thousands of times I needed, outputting and concatenating the SSNs to a dump file.

I then bring this list of SSNs into an excel spreadsheet as column B.
Column A is update per_all_people_f set national_identifier = ZZ
Column B is my list of fake SSNs
Column C is ZZ where national_identifier = ZZ
Column D is my list of real SSNs – this can be obtained with the following SQL: select distinct(national_identifier) from per_all_people_f where national_identifier IS NOT NULL;
Column E is ZZ;

I then save this spreadsheet out to a plain text file and edit the file, replacing all the ZZ with ‘ (Excel handles ‘ weirdly, so I just use vi afterward to do a global replace of ZZ with ‘ – it’s one command and takes a couple seconds).

At this point I’m left with a sql file with commands like
update per_all_people_f set national_identifier = ‘567-42-3477’ where national_identifier = ‘123-45-6789’;

I just need to run these and poof, the SSNs are now scrambled.

Salary data:

This was the easiest to change. Some people may get paid hourly, some salary. In either case though, their salary or hourly rate is stored in per_pay_proposals.proposed_salary_n . For someone that gets a salary of $100k a year this field would have 100000 and for someone paid $10 an hour the field would have 10 . It’s perfectly ok for people to have the same salary in real life, and I figured for the purpose of our scrambled clone, I’d go with a socialist ideal and just gave everyone the same value of 50. Admittedly, this means people who are salary now get $50 a year while the hourly people get $50 an hour. It’s good to be an hourly employee, at least in my scrambled instances and maybe now the hourly employees can afford health care!

update per_pay_proposals set proposed_salary_n = 50 where proposed_salary_n IS NOT NULL;

Banking data:

There are two parts to the banking data: the routing number (think of it as the bank number) and the account number. These are the two sets of numbers at the bottom of your checks (there’s also the check number as the third set of numbers on your checks but that’s not relevant).

The pay_external_accounts table holds the routing and account number info in some of the segment columns – where exactly depends on your implementation. We only wanted to update this data for people’s personal accounts, so our SQL looks like this:

update pay_external_accounts
set segment6 = ‘44444444444’, segment7 = ‘1111111111’
where external_account_id IN (select b.external_account_id from pay_personal_payment_methods_f a, pay_external_accounts b
where a.external_account_id = b.external_account_id);


There you go, all done. I know it sounds like a lot of manual steps, but once you’ve built the excel spreadsheet and generated your list of fake SSNs, all you really need to do is run the SQL for Column D and paste those values in, change the ZZ to ‘ and run the file. The SQL for salary and banking data take a few seconds each. Call it 10 or 15 minutes total per clone.

Now could this all be better? Sure. I could generate my fake SSNs via a SQL cursor and never need excel or php. But for now, this works. Please feel free to improve on my system and I’ll update the blog with the changes.