Anonimatron: Quick Start

Anonymous customerAfter reading my last blogpost on Anonimatron, you must have asked yourself “Great, but how do I actually use Anonimatron to de-personalize my database”? I tried my best to make basic Anonimatron configuration as self-explanatory as possible, just start it without any command line arguments and it will tell you.

Less adventurous or in a big hurry? This blogpost will show how simple it is to install and configure Anonimatron on an example MySQL database.

Setting up a test database
To demonstrate what Anonimatron can do to your data, we will create a little test database to play with. Anonimatron connects to all kinds of databases, including MySQL, Postgress and Oracle. In this example, we use MySQL. Here are all the statements you need to create a little database with 2 tables, a user, and some “private” data:

create database mydb;
create user myuser identified by 'mypassword';
grant all privileges on *.* to 'myuser'@'localhost'
       identified by 'mypassword' with grant option;

create table mydb.userdata (
         id int not null auto_increment primary key,
         firstname varchar(20),
         lastname varchar(20),
         creditcardnr varchar(20)
       );

create table mydb.lastnames (
         id int not null auto_increment primary key,
         lastname varchar(20)
       );

insert into mydb.userdata
       (firstname,lastname, creditcardnr)
values ('Homer', 'Simpson','1234'),
       ('Marge', 'Simpson','5678'),
       ('Ned', 'Flanders','3456'),
       ('Charles', 'Burns','3456');

insert into mydb.lastnames (lastname)
values ('Simpson'), ('Flanders'), ('Burns');

After all the hard work, you should be able to connect to the database with ‘myuser’ and see the “private” data in there:

select * from mydb.userdata;
id firstname lastname creditcardnr
1 Homer Simpson 1234
2 Marge Simpson 5678
3 Ned Flanders 3456
4 Charles Burns 3456
select * from mydb.lastnames;
id lastname
1 Simpson
2 Flanders
3 Burns

Let’s pretend that this is a copy of a production database, and you want to de-personalize or anonymize it.

Installing Anonimatron
To anonymize your data,  download Anonimatron and unzip it in a directory of your choice. You should find an “anonimatron.sh” and “anonimatron.bat” file there. Depending on your system, run it without arguments. If you have java installed on your system you should see something like this:

$ ./anonimatron.sh

This is Anonimatron 1.7, a command line tool to consistently
replace live data in your database with data data which
can not be traced back to the original data.
You can use this tool to transform a dump from a production
database into a large representative dataset you can
share with your development and test team.
Use the -configexample command line option to get an idea of
what your configuration file needs to look like.

usage: java -jar anonimatron.jar
 -config      The XML Configuration file describing what to
                   anonymize.
 -configexample    Prints out a demo/template configuration file.
 -dryrun           Do not make changes to the database.
 -synonyms    The XML file to read/write synonyms to. If the file
                   does not exist it will be created.

Victory! You’ve installed Anonimatron. Yes, life can really be that easy.

Configuring Anonimatron
Next, we need to tell Anonimatron how to connect to our database, and which tables and columns to process and how. The hardest part of this is probably figuring out how to create a jdbc connect string. Anonimatron can help you with that. If you start anonimatron with the “-configexample” parameter, it will scan the jdbcdrivers directory for available and supported drivers, and will show you how a jdbc URL for any of these should look:

$ ./anonimatron.sh -configexample

Supported Database URL formats:
Jdbc URL format                        By Driver
jdbc:oracle:oci8:@[SID]                oracle.jdbc.driver.OracleDriver
jdbc:oracle:thin:@[HOST]:[PORT]:[SID]  oracle.jdbc.driver.OracleDriver
jdbc:oracle:oci:@[SID]                 oracle.jdbc.driver.OracleDriver
jdbc:postgresql://[HOST]:[PORT]/[DB]   org.postgresql.Driver
jdbc:mysql://[HOST]:[PORT]/[DB]        org.gjt.mm.mysql.Driver

Anonimatron will try to autodetect drivers which are
stored in the lib directory. Add you driver there.

...

In this example, we have just created a MySQL database, so we’ll use that URL and fill in the blanks. We use the rest of the configuration example and with some copy-pasting we come up with the following configuration:

<?xml version="1.0" encoding="UTF-8"?>
<configuration jdbcurl="jdbc:mysql://localhost:3306/mydb"
               userid="mydata" password="mypassword">

  <table name="userdata">
    <column name="firstname" type="ROMAN_NAME" />
    <column name="lastname" type="ELVEN_NAME" />
    <column name="creditcardnr" type="RANDOMDIGITS"/>
  </table>

  <table name="lastnames">
    <column name="lastname" type="ELVEN_NAME" />
  </table>
</configuration>

This simple configuration file will tell Anonimatron the following things:

  • How to connect to the mydb database
  • The values in username.firstname should be processed with the ROMAN_NAME Anonymizer. Anonymizers are little plugins which are able to generate data with certain properties, sometimes based on the original data. This particular Anonymizer generates Roman Names by picking syllables from a builtin list.
  • The values in username.lastname should be replaced by Elven names. This is almost identical to Roman names but with a different Syllable file.
  • The userdata.creditcardnr should be replaced by a set of random digits of the same length. In our case, 4 digits will be replaced by 4 differnt digits. Should you really need numbers which are semantically correct credit card numbers, you could write your own Anonymizer plugin. We’ll cover that in a later blogpost.
  • The lastnames.lastname column is also an Elven name. Because of the way Anonimatron handles data, strings in this column will be processed exactly the same way as the userdata.lastname column, as we will see below.

Anonymize!
Now that we have configured Anonimatron, it’s time to start it up and tell it to use our configuration file and store synonyms. It will be finished in the blink of an eye, and your output should look somewhat like this:

$ ./anonimatron.sh -config config.xml -synonyms synonyms.xml

Anonymization process started

Jdbc url      : jdbc:mysql://localhost:3306/mydb
Database user : mydata
To do         : 2 tables.

Anonymizing table 'lastnames', total progress  [100%, ETA 11:36:56 PM]
Anonymization process completed.

Writing Synonyms to synonyms.xml ...[done].

If Anonimatron complains or does not run, you might want to check out the anonimatron.log file for clues. Most log entries will be pretty self-explanatory. If not, please register an issue and we’ll see what we can do to fix that.

Let’s check the results. First, we can check what synonyms were generated by looking into the synonyms.xml file we told it to create:

$ cat synonyms.xml
<?xml version="1.0" encoding="UTF-8"?>
<synonyms>
  <string type="ELVEN_NAME" from="QnVybnM=" to="RGhvZWxsaWFu"/>
  <string type="ELVEN_NAME" from="RmxhbmRlcnM=" to="QWhlbGhhbGRldGhlc3M="/>
  <string type="ELVEN_NAME" from="U2ltcHNvbg==" to="QWhkdWxlbGhhbGVs"/>
  <string type="ROMAN_NAME" from="SG9tZXI=" to="QmVudWxhdWJlbGl1cw=="/>
  <string type="ROMAN_NAME" from="TmVk" to="RWN1cw=="/>
  <string type="ROMAN_NAME" from="TWFyZ2U=" to="QWxudWxhdWN1cw=="/>
  <string type="ROMAN_NAME" from="Q2hhcmxlcw==" to="QWxudXM="/>
  <string type="RANDOMDIGITS" from="NTY3OA==" to="ODY5OA=="/>
  <string type="RANDOMDIGITS" from="MTIzNA==" to="NDM0Mw=="/>
  <string type="RANDOMDIGITS" from="MzQ1Ng==" to="NjEyNQ=="/>
</synonyms>

You’ll note that the “from” and “to” values look a bit garbled. This is because Anonimatron used Base64 encoding to store values of synonyms. This is so that we can store the values bit for bit, without worying about encodings. If you wanted to, you could easily decode these strings by writing a little program.

Even without decoding we can see some interesting things about this file. Remember we had lastnames configured as Elven names? Although we have 4 entries in the username table, we see only 3 Elven names. That is because Homer and Marge have the same lastname. These same synonyms are also used by the lastnames.lastname column. The same goes for the creditcard numbers. As might have noticed that Ned Flanders and Charles Burns used the same credit card number in this system.

Enough staring at XML. Let’s get to what matters most: our database. First, let’s see if the names and numbers have changed:

select * from mydb.userdata;
id firstname lastname creditcardnr
1 Benulaubelius Ahdulelhalel 4343
2 Alnulaucus Ahdulelhalel 8698
3 Ecus Ahelhaldethess 6125
4 Alnus Dhoellian 6125

That looks much better. The first and lastnames are (almost) pronounceable names which probably would look realistic in a screenshot or testcase, yet there is no trace left of the original data that was there. When we check the lastnames table we see that lastnames are being translated consistently with the userdata table:

select * from mydb.lastnames;
id lastname
1 Ahdulelhalel
2 Ahelhaldethess
3 Dhoellian

This consistent behavior makes sure that queries where the userdata table and the lastnames table are joined based on lastname will still work.

If you want to play some more with Anonimatron, recreate the original tables, add additional (overlapping) data and re-run anonimatron against it with the synonym file you just created. You’ll notice that any “Simpson” lastname will be translated to “Ahdulelhalel” constently on each run. If you don’t want that to happen, simply throw away the synonym file or don’t tell Anonimatron to use it. You can also generate the synonym file first without doing anything to the database by using the -dryrun option, and later do the same run based on the synonyms generated earlier.

Remember: The private data “moved” from the database into the synonyms.xml file we created. So that becomes the new data to watch. Store it in a safe place where nobody can access it.

Have fun experimenting!

About these ads

16 Responses to Anonimatron: Quick Start

  1. Mitch Amiano says:

    Looked promising, but when I download the latest stable, the anonimatron.sh” is not in the top-level folder of the zip file; it has been moved under resources/scripts/. The script throws: Exception in thread “main” java.lang.NoClassDefFoundError: com/rolfje/anonimatron/Anonimatron ; so it appears that it will not just run out of the box right now (running OSX, java 1.6.0_65). (Tried filing a bug, but SourceForge apparently dropped my account.)

  2. InterestedTester says:

    Any clues what’s happening here, please? Just downloaded the latest Java from their website and installed it, unzipped the download into C:\Program Files\Anonimatron\anonimatron-1.7 and tried to run the batch file. Win XP SP3.
    Thanks….

    C:\Program Files\Anonimatron\anonimatron-1.7>anonimatron.bat
    Error occurred during initialization of VM
    Could not reserve enough space for object heap
    Error: Could not create the Java Virtual Machine.
    Error: A fatal exception has occurred. Program will exit.

    C:\Program Files\Anonimatron\anonimatron-1.7>

  3. InterestedTester says:

    I’ve done a little searching (I’m new to Java) and it’s a memory issue. The most I can get going without an error is :
    java -Xmx1600m …
    Is this enough to run your program successfully? Thanks

  4. InterestedTester says:

    OK, I’ve got it to work and I replaced a whole column of names with ROMAN_NAMEs – thanks.
    However, maybe I’m being thick but I can’t find a reference list of all the “anonymizers” (i.e. ROMAN_NAME, ELVEN_NAME etc) anywhere online or in any readme.txt files – does such a thing exist anywhere?
    Thanks

    • rolfje says:

      Listing all the registered Anonymizers is actually very easy: just run “Anonimatron.bat -configexample”. Anonimatron will give you an example configuration, and the configuration will contain all Anonimizers which are currently registered.

      In the configuration, you will find a “CUSTOM_TYPES_TABLE” configuration which contains all “custom”, or specific Anonimizers. The “DEFAULT_TYPES_TABLE” will show you the default types Anonimatron uses when you don’t specify what Anonimizer to use. The “DISCRIMINATOR_TABLE” shows you how to make Anonimatron switch between Anonimizers based on the contents of a column.

      I realize that you were looking for this information, should I add a clearification in the configexample, or should that be in the README, or both?

      • InterestedTester says:

        Both, I’d say. Would it be easier to put as much as possible in the README and have the the -configexample simply spew out the contents of that txt file, rather than duplicate the effort?

        The main README could also usefully mention that there is another one in the ‘anonymizers’ folder.

        Many Thanks, it’s been interesting. I might roll my own solution using the MS AdventureWorks data, but it’s handy to know this is available. Good Luck with your endeavors.

  5. Phillip R says:

    Working great on a DB I’m trying… except for a date field in the format YYYY-MM-DD. I can’t seem to get this converted using any of the built-in types, including DATE, always erroring with “java.sql.Date cannot be cast to java.lang.String” or similar.
    Any suggestions for this java and mysql newbie?
    Thanks for a super handy tool!

  6. Akshay says:

    [root@localhost scripts]# sh anonimatron.sh
    Exception in thread “main” java.lang.NoClassDefFoundError: com/rolfje/anonimatron/Anonimatron

    please help me out in resolving this issue.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 30 other followers