Thursday, March 08, 2007

List of job consultants (head hunters) in bangalore

I was going through my hard disk and found this interesting piece of info. All of us are pissed off by the 100s of head hunter mails landing in our mailboxes. Confess it, we all curse those mails and most of the time delete them without even reading. Last year I just started collecting them in my Yahoo mailbox on a whim! Then one day I wrote a small script to pull out the email addresses of all those head hunters. Most of them are from Bangalore (not surprising, considering my location) and a small number are from Chennai, Hyderabad, Pune and the Delhi region. This list may be helpful for mass mailing to consultants (yah, I like the idea, give them back proper ;o)

But first, here is the procedure to extract the consultant emails, in case you decide to create your own list.

########################################################
# Save consultant mails in one separate folder of your
# yahoo mailbox.
# Do this for 2-3 months.
# Go to yahoo mail options | General preference | and
# select 100 mails per page view.
# Copy+paste all the headers into a text file,
# mailbox-dump-file.
########################################################
# Now, before running this program, remove the spaces
# before and after the @ sign inside mailbox-dump-file
# using the vim editor:
# :1,$ s/[ ]@/@/g
# and :1,$ s/@[ ]/@/g
# (or see the one-liner just after the script)
# Run this script with
# perl consultant.pl | grep "@" > clist.txt
# To concat 2 or more final lists
# cat clist* | sort | uniq | tee megalist.txt
#
# take the first word of every header line, then de-dup and sort
open(LIST, $ARGV[0]) || die "file not found\n";
my @bucket;
while (my $line = <LIST>) {
    my @words = split(/ /, $line);
    push(@bucket, $words[0]);
}
close(LIST);
my %saw;
@saw{@bucket} = ();    # hash slice trick: duplicate entries collapse into keys
my @emails = sort keys %saw;
foreach (@emails) { print "$_\n"; }
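
By the way, if you do not want to do that vim cleanup by hand, a one-liner like this should do the same substitution in place (just a sketch, mailbox-dump-file being whatever you named your dump):

perl -pi -e 's/ *\@ */\@/g' mailbox-dump-file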

And here is the list that you can download from esnips.

Saturday, March 03, 2007

latest php 5.2.x on old iBook running panther

When I got that 12" iBook in 2005 I never thought it was going to last a full 2 years. I had subjected it to pretty rigorous tests, installing and compiling a lot of stuff. The age is showing now, nothing hardware-wise ;o) but it is becoming increasingly difficult to maintain it as a development machine. All the documentation for all the tools these days mentions Tiger only. Most of my libraries are way too old and attempting to install anything usually starts an "I-can't-find-this-also" chain reaction. The last thing I tried was to install ImageMagick with Ruby, and I gave up after some time. It has been my internet terminal since then.

So today when my brother asked me to install the latest PHP on it I was really apprehensive. I did not want to start a library chase on a Saturday afternoon. PHP is supposed to be a big package with a lot of dependencies and I was sure that I would give up this time also. But whoa!!! What a surprise!!! I did the install in 2 hours flat on Mac OS X 10.3.9, and I can not believe it. Those who find it hard to believe that installing something can take 2 hours have, well, obviously, not installed a lot of things in life!

I started with this PHP mac article. Follow the article to the letter. The only difference is that I downloaded Apache 2.0.59 instead of the 2.2.x series and I chose my prefix to be /opt/apache2. Apache installed nice and smooth. Then I downloaded the PHP 5.2.1 tarball and did a configure and make. The configuration options I chose were to build with GD and MySQL (we want image manipulation etc.). I did not use all the switches mentioned in the article but YMMV. Here I ran into my first issue: a header file, xmlsave.h, not found.
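
For the record, the general shape of it is something like this (a rough sketch, not an exact transcript; --enable-so and the apxs path are my guesses based on the standard layout):

# Apache 2.0.59 into /opt/apache2 (run inside the unpacked httpd-2.0.59 source)
./configure --prefix=/opt/apache2 --enable-so
make
sudo make install

# PHP 5.2.1 with GD and mysql, built as an Apache module (run inside php-5.2.1)
./configure --with-apxs2=/opt/apache2/bin/apxs --with-mysql --with-gd
make        # this is where the xmlsave.h error showed up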

Looks like PHP needs the latest libxml headers and my Panther install was missing those. The best help I got from the net was reading this blog and also the same PHP mac article mentioned above. So I decided to install libxml as well. The install is pretty simple: just run configure and make and you will get libxml installed in /usr/local. Now you have to go back and configure PHP to build against your libxml. There is no need to install the new version of libxml in /usr as it may break existing things. To configure PHP you should now use the --with-libxml-dir=/usr/local option. Doing so and running make again, I ran into my second issue.
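
Again just the shape of it; the libxml2 directory name is a placeholder for whatever recent tarball you grab:

# build libxml2 into /usr/local, leaving the system copy in /usr alone
cd libxml2-2.x.x
./configure --prefix=/usr/local
make
sudo make install

# then point PHP at it and rebuild (same switches as before, plus the libxml one)
cd ../php-5.2.1
./configure --with-apxs2=/opt/apache2/bin/apxs --with-mysql --with-gd \
  --with-libxml-dir=/usr/local
make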

The second issue turned out to be a documented bug and you can read more at the supplied link. I do not know what the exact problem is, but downloading the latest snapshot and compiling again built PHP 5.2.x for me. I had installed the JPEG and PNG libraries using Fink some time back, and GD built fine against those. So yipeee, now we have the latest PHP and the iBook gets an elevated status of development machine.






importance of UPSERT database operation

This week I ran into upserts. I have to create a script/tool for uploading data into databases. The source data format is pretty "loose" and allows the users a lot of flexibility in how they generate it. The upside of this approach is that you are not imposing a lot of rules when users are generating source data. BTW, lots of rules piss people off: things like you can not have comments in between, you can not skip lines, you can not have empty lines, you can only have a certain format, etc. So we tend to make it easy for people to generate data. However, there is a downside too.

The downside is, you need an upsert kind of operation. If the data that the user supplied is already there, then do not try to insert it again, just update it with whatever the user has supplied. Doing this kind of operation in the application layer would be tough.

To avoid duplicate data insertion from your program, you first need to find out whether the user-supplied data already exists or not. So you either fire a query to do a lookup on the supplied fields, or you load all the keys and fields at the beginning. Both are expensive operations, at least for large tables. In the first case you waste a lot of time and in the second case you read and keep carrying a lot of data. What we need is some built-in mechanism from the database.

Surprisingly, MySQL provides quite a number of options for what to do when a duplicate row of data (some unique key violation) shows up.
  • You can ignore the insert using INSERT IGNORE INTO
  • You can replace the row with new data using REPLACE INTO
  • You can update a counter, or any other column, using INSERT ... ON DUPLICATE KEY UPDATE (see the sketch just below this list)
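
A rough sketch of that last one through the mysql command line client; the table, columns and login are all made up, and it only kicks in because of the unique key on sku:

mysql -u someuser -p somedb <<'SQL'
-- made-up table; the unique key is what triggers the upsert behaviour
CREATE TABLE IF NOT EXISTS inventory (
  sku VARCHAR(20) NOT NULL,
  qty INT NOT NULL DEFAULT 0,
  UNIQUE KEY (sku)
);
-- first run inserts the row, every later run just bumps the quantity
INSERT INTO inventory (sku, qty) VALUES ('A-100', 5)
  ON DUPLICATE KEY UPDATE qty = qty + VALUES(qty);
SQL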

All sorts of things are possible with MySQL. This article gives more information. Oracle and DB2 provide a MERGE operation: you decide the matching on existing columns and then either insert or update. All this is very database specific and the purists will shout "no portability", but what the heck! Do you change your database every night? These are powerful features and they are meant to be used. I am pretty comfortable using them; let the database take care of the data, I do not want to do all the housekeeping in my application code.

If you are using Hibernate then you have to fire custom SQL queries on the JDBC connection. You can define a <sql-insert> element for your entity, but I am not sure which version of Hibernate it works with, plus I do not want to override the entity insert in all cases. Only the data upload case is special. I also need to find out a way to provide named parameters as part of custom SQL queries in Hibernate.