Monday, January 29, 2007

Avoid duplicate database inserts on page refresh using the synchronizer (Deja vu) token pattern

I am writing the DAOs (data access objects) for our web application, and they are not done yet. I have not plugged in any duplicate-prevention logic, which means you can end up with duplicate rows in the database. Duplicate-row prevention is a topic in itself, and a lot of the time you do not want to plug it in because it is expensive: a column-by-column comparison on a large table may not perform very well. Anyway, that is not the main story. The main story is that my data API allows duplicate rows, so if a user re-submits a form by clicking the refresh button, we insert a duplicate row into the database quite unintentionally.

We are using the Struts framework for page flows. A page refresh calls the data-flushing action again, and we want to prevent that. I did a bit of googling around, and it looks like everyone uses the "synchronizer token pattern" to avoid this problem. The scheme is as follows:

When a page is requested, generate a token and render it as part of the page (for example, as a hidden field). Store the same token in the session, using the page as the key. When the page is submitted, compare the token from the page with the one stored in the session; if they match, allow the submit. If the submit succeeds, clear the token from the session (again using the page as the key).
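A minimal Java sketch of the scheme described above. A plain HashMap stands in for the HTTP session, and the page key "addUserForm" is a hypothetical example; in a servlet the token would live in the HttpSession and arrive back as a hidden form field. This illustrates the pattern under those assumptions and is not Struts' own implementation.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Sketch of the synchronizer token pattern. The Map is a stand-in for the
// HTTP session (an assumption); keys are page names, values are tokens.
class TokenGuard {
    private static final SecureRandom RANDOM = new SecureRandom();
    private final Map<String, String> session = new HashMap<>();

    // Page requested: generate a random token, store it under the page key,
    // and return it so it can be rendered as a hidden field.
    String issueToken(String pageKey) {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        session.put(pageKey, token);
        return token;
    }

    // Page submitted: allow the submit only if the posted token matches the stored one.
    boolean isValid(String pageKey, String submittedToken) {
        String expected = session.get(pageKey);
        return expected != null && expected.equals(submittedToken);
    }

    // Submit succeeded: forget the token so a re-post of the same page fails.
    void clear(String pageKey) {
        session.remove(pageKey);
    }

    public static void main(String[] args) {
        TokenGuard guard = new TokenGuard();
        String token = guard.issueToken("addUserForm");   // page rendered
        if (guard.isValid("addUserForm", token)) {        // first submit
            // ... insert the row here, then clear the token on success
            guard.clear("addUserForm");
        }
        // A refresh re-posts the old token, which is no longer in the session.
        System.out.println(guard.isValid("addUserForm", token)); // false
    }
}
```

In a real web application two quick submits can be handled on different threads, so the check and the clear would need to happen atomically, for example under a per-session lock.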

Now, let's say the user hits refresh. The browser re-posts the old token, but that token was cleared from the session after the first successful submit, so the tokens no longer match and we do not allow the submit. If the user requests the page again through the normal workflow, we generate a new token and everything is okey-dokey ;o) If I am not feeling too lazy, I will try this pattern with a simple form and servlet.

Struts has built-in support for synchronizer tokens: the Struts 1 Action class provides saveToken(), isTokenValid() and resetToken().

Sunday, January 21, 2007

create myspace.com in 4 hours? Part II

Okay, forget myspace.com. Let's see what it takes to create and run a very simple online application. Our requirements are simple (?).

Requirements

  • Let the user push some data via a web form to a database table.
  • Later the user can query back the same data.
  • One DB Table
  • The application is accessible online on some domain
  • Application works 24x7 with small maintenance windows
  • Anyone can access it
Now let's look at all the steps involved in development, starting with a virgin machine. I am going to use Perl CGI.

First up, we need to set up a development stack: a database, a web server and a programming toolkit of your choice. So download MySQL, Apache and Camel Pack Perl. If you are on a Linux box, most of the stack is ready-made (I do most of my development on a Win XP box). But creating the right stack definitely takes some time. You may have Perl but not the modules you want; you may have Java but not all the libraries you want; you may have version x when you want version y, and so on.

After the downloads comes the configuration. Who has run Apache or MySQL straight out of the box? You do have to make configuration changes. Even creating a single database, two users and a single table, and configuring Apache to handle CGI from your directory, takes some time.

Let's come to the form development part now. Say you are a very smart designer who knows CSS, HTML and JavaScript like the back of your hand. You can create both HTML pages very quickly in a three-column layout. You still need to make sure the elements on your form are aligned, that you have the right JavaScript for form validation, and that navigation works from search to input and from input to search.

Let's also say that you are a very efficient server-side coder. You can create two Perl CGI scripts in a jiffy: one using DBI to store data in the table, and the other to read the data back using tokens. You are a smart developer, so you get everything right the first time: checking for invalid inputs, handling case conversion when searching, taking care of XSS issues so people cannot paste a javascript: URL into your form, and so on. But typing everything into an editor still takes time, doesn't it?
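To make those input-handling chores concrete, here is a small sketch. It is written in Java for consistency with the other examples, although the post's scripts are Perl CGI. It escapes user text before echoing it back into HTML, accepts only http/https links so a javascript: URL is rejected, and normalizes search terms to lower case. The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helpers for the XSS and case-conversion checks described above.
class FormSafety {
    // Replace HTML-significant characters with entities before output.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Allow-list http/https only; this rejects javascript: in any letter case,
    // along with every other scheme.
    static boolean isSafeUrl(String url) {
        String u = url.trim().toLowerCase(Locale.ROOT);
        return u.startsWith("http://") || u.startsWith("https://");
    }

    // Trim and lower-case a search term so "Smith" and " smith" match.
    static String normalizeSearchTerm(String term) {
        return term.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("<script>alert('x')</script>"));
        // &lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;
        System.out.println(isSafeUrl("JavaScript:alert(1)")); // false
        System.out.println(normalizeSearchTerm("  Smith ")); // smith
    }
}
```

Lower-casing the search term only helps if the stored column is compared the same way, for example by storing a lower-cased copy or applying LOWER() in the query.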

Finally, our two forms are ready to be deployed. We input some data, check the database and see that our table is populated. Now remember that the application is out there in the wild and anyone can do anything to it, so we need to test the forms with all kinds of edge cases: a numbers-only name, a string-only date, etc. All goes fine apart from an occasional bug here and there, and we are now ready to upload.
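The edge cases above can be turned into explicit validation rules that run on the server. The sketch below assumes two hypothetical rules: a name must contain at least one letter, which rejects a numbers-only name, and a date must be a real calendar date in yyyy-MM-dd form, which rejects free text or 2007-02-30. Both the rules and the date format are assumptions for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Hypothetical field rules covering the edge cases mentioned above.
class InputRules {
    // STRICT resolving rejects impossible dates such as February 30.
    private static final DateTimeFormatter DATE =
        DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);

    // A name must be non-blank and contain at least one letter.
    static boolean isValidName(String name) {
        if (name == null) return false;
        String n = name.trim();
        return !n.isEmpty() && n.chars().anyMatch(Character::isLetter);
    }

    // A date must parse as a real yyyy-MM-dd calendar date.
    static boolean isValidDate(String text) {
        if (text == null) return false;
        try {
            LocalDate.parse(text.trim(), DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {"Alice", "12345", "   "};
        for (String n : names) System.out.println("name '" + n + "' -> " + isValidName(n));
        String[] dates = {"2007-01-21", "yesterday", "2007-02-30"};
        for (String d : dates) System.out.println("date '" + d + "' -> " + isValidDate(d));
    }
}
```

Client-side JavaScript checks can be bypassed, so rules like these belong on the server even when the form also validates in the browser.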

To deploy, you need a domain name and some server space. Let's say I order both from Yahoo. You get your stuff up on the hosting server. Getting the domain up and propagated also takes some time. But is this the end once I can access my forms from my very own URL? Is everything over? Of course not. Let 10,000 people put their data in your tables and your searches will literally crawl!!! What happened here? You forgot to analyze your tables, didn't you? And how do you take your site offline while maintenance is going on? Do you know that?

Anyone can access your application. What if someone uses some web pager from a NetBSD machine? What happens to all the groovy Ajax validation stuff? Now you see that we need to put browser-match rules in the Apache config as well.

Moral of the story: slapping together two web forms with some server-side script is no big deal. But designing and deploying an application that can be accessed by anyone and used in any which way definitely is. Online applications look simple, but they are not!




create myspace.com in 4 hours? Part I

Today I came across a posting on Digg, and I will quote it verbatim: " A site like myspace.com can be put together in 3-4 hours too. Ditto for del.icio.us and a lot of the other high priced ones." Having done a few web applications myself, I know how much of an understatement that is. Why is it that people learn some scripting and rudimentary SQL and think that putting together a decent web application is no big deal? Have they ever tried doing a web app end to end, or are we talking superheroes here?? Let me whip out my requirements and some estimates and see how long it takes to do a simple page end to end.

Wednesday, January 17, 2007

Fault tolerant email queues

I had written an application that could send emails using the JavaMail API. However, at that time my requirements were "just" to deliver an email. Usage was not high and our application was on the same network as our SMTP server, so sending a few emails, few and far between, did not turn out to be an issue. Today, however, I want to write a component that can send much higher volumes of email (something like 20,000 in one shot). My earlier program will surely run out of steam here.
The program should retry failed messages. If sending one message fails, it should simply move on to the next one. Some smartness may also be required, like checking domain names. We need some way to mark delivered messages and, of course, some kind of queue implementation.
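A minimal sketch of that behaviour, assuming an in-memory queue: a failed send goes to the back of the queue and is retried up to a fixed number of attempts, a failure never stops the loop, and delivered addresses are recorded. MailSender is a hypothetical interface standing in for JavaMail or an MTA. The address check is only a syntax check, not a real domain lookup, and a production version would persist delivery status (for example in a database table) so that a crash does not lose the queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of a retrying email queue; messages are reduced to recipient addresses.
class EmailQueue {
    // Hypothetical transport; a real one might wrap JavaMail or /bin/sendmail.
    interface MailSender {
        void send(String recipient) throws Exception;
    }

    private static final class Job {
        final String recipient;
        int attempts;
        Job(String recipient) { this.recipient = recipient; }
    }

    private final Deque<Job> pending = new ArrayDeque<>();
    final List<String> delivered = new ArrayList<>();
    final List<String> failed = new ArrayList<>();
    private final int maxAttempts;

    EmailQueue(int maxAttempts) { this.maxAttempts = maxAttempts; }

    void enqueue(String recipient) {
        // Cheap syntax check (something@domain.tld) before queueing.
        if (!recipient.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+")) {
            failed.add(recipient);
            return;
        }
        pending.addLast(new Job(recipient));
    }

    // Drain the queue; a failed job is re-queued until it uses up maxAttempts.
    void run(MailSender sender) {
        while (!pending.isEmpty()) {
            Job job = pending.pollFirst();
            job.attempts++;
            try {
                sender.send(job.recipient);
                delivered.add(job.recipient);      // mark as delivered
            } catch (Exception e) {
                if (job.attempts < maxAttempts) {
                    pending.addLast(job);          // retry later, move on now
                } else {
                    failed.add(job.recipient);     // give up, keep for review
                }
            }
        }
    }

    public static void main(String[] args) {
        EmailQueue queue = new EmailQueue(3);
        queue.enqueue("alice@example.com");
        queue.enqueue("bob@example.org");
        queue.enqueue("not-an-address");
        int[] calls = {0};
        // Fake sender: bob's first attempt fails, everything else succeeds.
        queue.run(recipient -> {
            if (recipient.startsWith("bob") && calls[0]++ == 0) {
                throw new Exception("temporary SMTP failure");
            }
        });
        System.out.println("delivered: " + queue.delivered);
        System.out.println("failed: " + queue.failed);
    }
}
```

Re-queueing at the back, rather than retrying immediately, lets the other messages go out while a temporarily failing server recovers.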
My first reaction nowadays is to look for a ready-made component. But surprise, surprise!!! So far I have not been able to find any open source project that fulfils my requirements. I am approaching this problem from three angles:
  • See if some message queue or enterprise service bus can be used. I do not want to do a lot of configuration, because my only mode of transport is email, and one-way only at that! There are a number of open source message queue implementations; I am looking at Mule in particular.
  • See if an MTA like qmail fits the bill. If the MTA has APIs, I may be through. After all, people have been using /bin/sendmail to send emails since time immemorial.
  • Turn to search.cpan.org as usual (when I am really desperate).
The final option is to write the component by hand. I am ready for that too, but the question to ask is whether I can trust JavaMail for a scalable job. Still a way to go .....
© Life of a third world developer