Scalable application architectures – stability

July 5th, 2009

Recently I started working on an application that will have to cater to the needs of thousands of users. It is not just the number of users: the application also needs to aggregate data from multiple web services and push data to multiple web services. This might sound simple, but it is not when you have to talk to about 30 web services which have nothing in common except HTTP and XML. Each web service represents data in a different format, even though most of them deal with a simple text document. This means we need to figure out a way to create the business object from multiple sources and at the same time keep the application linear. The complexity of the requirements increases by leaps and bounds when you have to work with live data. Yup, live, up-to-date data. So the only way out seems to be a stateless, asynchronous design. But it is not easy to write stateless asynchronous applications :(
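
To make the idea concrete, here is a minimal sketch of what such a design might look like in Python. Everything in it is an assumption for illustration: the two XML payloads, the normalizer functions and the `fetch` helper are hypothetical, and the network call is simulated rather than real. Each service gets its own normalizer that turns its format into one common business object, and the fetches run concurrently without keeping any state between calls.

```python
import asyncio
import xml.etree.ElementTree as ET

# Hypothetical responses from two web services that describe the same
# text document with different XML layouts.
SERVICE_A_RESPONSE = "<doc><title>Report</title><body>Hello</body></doc>"
SERVICE_B_RESPONSE = '<document name="Report"><text>Hello</text></document>'


def from_service_a(payload):
    """Normalize service A's XML into the common business object."""
    root = ET.fromstring(payload)
    return {"title": root.findtext("title"), "body": root.findtext("body")}


def from_service_b(payload):
    """Normalize service B's XML into the common business object."""
    root = ET.fromstring(payload)
    return {"title": root.get("name"), "body": root.findtext("text")}


async def fetch(payload, normalize):
    """Simulate a network call, then normalize the response.

    Nothing is remembered between calls: all inputs arrive as arguments.
    """
    await asyncio.sleep(0)  # placeholder for real HTTP I/O
    return normalize(payload)


async def aggregate():
    """Query all services concurrently and collect the normalized documents."""
    return await asyncio.gather(
        fetch(SERVICE_A_RESPONSE, from_service_a),
        fetch(SERVICE_B_RESPONSE, from_service_b),
    )


documents = asyncio.run(aggregate())
```

Adding a 31st service in this sketch would mean writing one more normalizer, while the aggregation code stays the same.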

You may ask why I am worried about the scalability of the application; why not let the design evolve over time? My experience with building applications is that you cannot have a scalable design that “evolves”. Not without tons of hard work later and not without breaking a few things. Writing scalable applications is like building an earthquake-resistant skyscraper. You cannot wait for the earthquake to come before you start working on making the building earthquake resistant. You have to design it up front and test the model in the lab before you lay the foundation stone of the building.

So what exactly is scalable? The sad part of the computer industry is that we still don't have a scale to measure scalability. What works for one set of data may fail for another. A friend of mine suggested that he considers his application profitable if the cost per transaction is less than the revenue per transaction. I think the logical way to measure scalability would be to measure how far the application can scale while keeping the cost per transaction lower than the revenue per transaction :)
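
As a rough sketch of that measure, assume we can estimate the total cost of running the system at a given load. The cost model below is entirely made up for illustration (a fixed cost, a per-transaction cost and a coordination cost that grows with the square of the load), as are the function names.

```python
def max_profitable_load(revenue_per_txn, cost_at_load, loads):
    """Return the largest tested load whose cost per transaction stays below revenue."""
    best = None
    for load in sorted(loads):
        cost_per_txn = cost_at_load(load) / load
        if cost_per_txn < revenue_per_txn:
            best = load
    return best


def hypothetical_cost(load):
    """Assumed cost model: fixed cost + per-transaction cost + coordination overhead."""
    return 100 + 0.01 * load + 0.000001 * load ** 2


limit = max_profitable_load(
    revenue_per_txn=0.05,
    cost_at_load=hypothetical_cost,
    loads=[1000, 10000, 100000],
)
```

With these invented numbers the fixed cost dominates at 1,000 transactions and the coordination overhead dominates at 100,000, so only the middle load stays profitable.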

So let's try to define stability. To an end user, stability means that the system is available and capable of doing transactions irrespective of load. So first we need to identify what hampers system availability.

  1. Sudden surge of requests (like being slashdotted)
  2. Large number of requests being received continuously over a period of time.
  3. Internal problems like memory leaks.

For point 1 we do have a solution: do load testing. That should give you an indication of how long the system will survive before crashing under a sudden surge of requests or, in short, what category of earthquake the building can handle.

What about point number 2? How do you test a system under a large number of continuous requests? Do you do load testing for a couple of days before releasing a new build to production? One may argue that, given the way most internet companies work, you have to release work very often. Acceptable point, but what is the use of adding that one cool new feature, which your marketing guy wants so badly, without testing the system's stability? If your cool new feature crashes, it is only going to shake users' confidence. To handle point number 2, you need to test your application under different load conditions continuously for a few days. I remember building a stock market ticker which would pass all the tests in development but crash in production. We found later that when the application had been in production for 3 days continuously, some parts of the application suffered from data overflow. Though it might sound like a stupid mistake from a developer, the fact is that the company suffered considerable losses due to the repeatedly crashing application. And this was in the era when a stock ticker from web services was a new feature on the internet and every business head of a financial site wanted to have the feature on the site because some competitor had it.

Testing for longevity of an application is a very important test that is ignored more often than it is conducted. A test for longevity can bring out bugs that will go undetected in any other type of testing. The test needs to cover different load conditions at different times. It is equally important to measure the performance of the application from night conditions (low load) to peak conditions (daytime). The performance of different systems as the application load ramps up or down could reveal certain startling facts about your application.
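
A longevity test need not wait for real days to pass if the load can be replayed on a simulated clock. The sketch below is only an illustration under assumptions: `Ticker` is a hypothetical component with a deliberate bug (its history is never trimmed), and the day/night load profile and the limit are invented numbers.

```python
class Ticker:
    """Hypothetical component with a bug: its history is never trimmed."""

    def __init__(self):
        self.history = []

    def handle(self, price):
        self.history.append(price)  # grows without bound
        return price


def requests_for_hour(hour):
    """Assumed load profile: busy during the day, quiet at night."""
    return 100 if 9 <= hour % 24 < 18 else 10


def soak_test(component, hours, limit):
    """Replay simulated hours of traffic; return the hour the limit is exceeded."""
    for hour in range(hours):
        for i in range(requests_for_hour(hour)):
            component.handle(i)
        if len(component.history) > limit:
            return hour
    return None


short_run = soak_test(Ticker(), hours=8, limit=2000)   # a typical short test
long_run = soak_test(Ticker(), hours=72, limit=2000)   # three simulated days
```

In this sketch the short run finishes cleanly, while the three-day run trips the limit on the second simulated day, which is the kind of bug that only a longevity test brings out.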

What about point number 3? It takes some experience to identify internal problems. For instance, a memory leak is far more likely to be identified by a seasoned programmer than by a newcomer. So code review plays an important part here. But whatever you do, some internal problem or other will arise. You need to build safety nets for such situations, like air bags for front passengers which inflate automatically when the car is hit. Such impact absorbers will be able to handle internal problems and yet let the system perform, which is what is known as fault tolerance.
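
One simple form of safety net is to run each component in isolation, so that a failure in one is recorded but does not take the others down. This is a minimal sketch, not a complete fault tolerance scheme; `run_isolated` and the two sample components are hypothetical names made up for the example.

```python
import logging

logger = logging.getLogger("safety_net")


def run_isolated(components):
    """Run every component; a failure in one is contained and recorded."""
    results, failures = {}, {}
    for name, func in components.items():
        try:
            results[name] = func()
        except Exception as exc:  # contain the fault, keep the system running
            logger.warning("component %s failed: %s", name, exc)
            failures[name] = exc
    return results, failures


def quotes():
    """Hypothetical healthy component."""
    return ["ACME 10.5"]


def news():
    """Hypothetical component with an internal problem."""
    raise RuntimeError("feed unavailable")


results, failures = run_isolated({"quotes": quotes, "news": news})
```

Here the quotes are still served even though the news component failed, and the failure is kept around for later diagnosis.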

So keeping the above points in mind, I have started designing the application. Currently I am evaluating whether to use an RDBMS or go with NoSQL. Will post about the same when I arrive at a decision :)

More later…


A rose by any other name will end up as a cabbage

February 26th, 2009

Last night a friend of mine pointed me to the 97 things wiki and an interesting axiom, “A rose by any other name will end up as a cabbage”. The axiom page talked about how you should name the components in a software project. The idea arises from a simple argument: “If you don’t know what it is to be called you don’t know what it is”.

In my opinion natural languages are the best tools for describing the requirements of a project. Diagrams are very helpful in explaining the overall logic or flow, but the details of the requirements should always be explained in a natural language such as plain English. When requirements are written in plain English, you can toss them across to users for comments and get feedback at an early stage of software development. Not all users can understand the complexities of state machines, use case diagrams, ER diagrams etc., but all of them understand jargon-free, simple English. Yes, you can use any natural language other than English when working with a non-English-speaking population.

Another advantage of using a natural language to describe requirements is that you get a natural abstraction layer called a ‘name’. When you name something, you come up with a mental abstraction based on the major characteristics of a (physical) object, and the next time you have to refer to the same object or collection of characteristics you call it by its name instead of giving a long description. When you see a repeating pattern, give it a name. When you see a bunch of instructions to be executed repeatedly, give it a name. When you see an interface, give it a name. At the same time ensure that the name is specific enough to convey the characteristics (e.g. cheap, fast, easy, strong) of the abstraction or object. A name that is not specific enough points to a lack of clarity in the abstraction layer and is a sign of either too much abstraction in the system or over-engineering. Also, you need not build an abstraction only when you see lots of features being clubbed together. An interface might implement only two or three features, but the frequency of repetition of those features makes them a candidate for abstraction and thus for a name.

Creating abstractions comes naturally to all humans and it is easy for us to identify things by name. Maybe that is why we came up with the idea of commands for computers: it was easier for a human to remember a single command than to type a set of instructions to see, for example, a list of files in a directory. We further progressed into creating scripts, which are collections of commands (essentially an abstraction), giving a name to a script and then remembering the name as a command.
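
The same thing happens in code every time a repeated set of instructions is wrapped in a function. As a small illustration (the function name `list_files` is my own choice, not taken from any project), the instructions "read the directory entries, keep only the files, sort them" collapse into one specific name:

```python
import os


def list_files(directory):
    """Named abstraction for: read the entries, keep only files, sort them."""
    entries = os.listdir(directory)
    files = [e for e in entries if os.path.isfile(os.path.join(directory, e))]
    return sorted(files)


files_here = list_files(os.getcwd())
```

Anyone reading `list_files(os.getcwd())` knows what it does without reading the three instructions behind it, which is exactly what a specific name buys you.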

When you are designing new software, having a specific name for a component will create the abstraction automatically. But do watch out: if you have too many names in your requirements, you might be suffering from over-engineering and might have created too many abstractions…

Who says what's in a name?


Playing with dejavu

January 21st, 2009

So I spent last night messing around with the dejavu ORM. While I was chatting with the dejavu team over IRC and pointing out bugs, fumanchu was busy fixing them asap. We had 3 revisions of dejavu in one hour and one new ticket. It turned out that geniusql cannot handle NULL in the MySQL timestamp type. MySQL's timestamp is a badly designed data type; from the manual, “TIMESTAMP columns are NOT NULL by default, cannot contain NULL values, and assigning NULL assigns the current timestamp. However, a TIMESTAMP column can be allowed to contain NULL by declaring it with the NULL attribute”. Now that's a horrible way to design a data type. To support both NULL and NOT NULL, some modifications are required in geniusql and a ticket has been filed for the same.

So dejavu 2.0 is becoming stable day by day… Enjoy…


Software fault tolerance

December 29th, 2008

Ever got frustrated when your software stops responding and crashes, throwing a popup message asking you to send the information to xyz developer? With increasing complexity in software we see an increasing trend of software hang-ups. The classic case was the Windows blue screen. Thanks to Microsoft for getting rid of the ugly blue screen and keeping the Windows OS running when one program misbehaves.

Even though the whole world was fed up with the blue screen, programmers have not yet learned that one action by a misbehaving component should not bring the whole system down. With the rise of web 2.0 we have seen a rise in rich internet applications and a rise in hanging browsers. One misbehaving plugin or one misbehaving tab can crash the whole browser. Gosh, have they forgotten about keeping the program stable while writing the initial code? Or was introducing a new, uber cool, unstable feature more important than overall software performance?

It is very easy to blame software authors for all the mess, but let's spend some time trying to understand what causes software to fail and how to avoid failures.

Software failure can be divided into 3 parts: fault, error and failure. A fault, or bug, is what produces an error, and an error leads to failure. An error is a state of the system under investigation, a state that can bring down the whole system. So our discussion will focus on handling the system states that are liable to cause failures. Fault tolerance is the set of techniques aimed at detecting, isolating and recovering from a computational state that can lead to failure. In Software Fault Tolerance Techniques and Implementation, Laura Pullum identifies 4 steps for fault tolerance, viz.

  1. Error identification or detection
  2. Error diagnosis to identify the cause of the error.
  3. Error containment to prevent further damage.
  4. Error recovery, the transition from an erroneous state to an error-free state.

The simplest approach to fault tolerance is the try-catch block in OOP languages. As soon as an error is detected an exception is thrown, and a catch block isolates the error, giving an option to recover from the fault. A simple solution which works… But try-catch is a programming language feature whereas software is made of components, so one example is not enough. In this series I will pick examples from FOSS projects and show how fault tolerance can be built into a system. So see you for more…
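
As a minimal sketch of how the four steps map onto a try block (`try`/`except` in Python), consider the hypothetical `Account` class below. The class, the `safe_withdraw` helper and the snapshot approach are all assumptions made for illustration, not something taken from the book.

```python
class Account:
    """Hypothetical component whose state must stay consistent."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance -= amount
        if self.balance < 0:
            raise ValueError("balance went negative")


def safe_withdraw(account, amount):
    """Apply a withdrawal, walking through the four steps on failure."""
    snapshot = account.balance          # remember the known error-free state
    try:
        account.withdraw(amount)
        return "ok"
    except ValueError as exc:           # step 1: detection
        cause = str(exc)                # step 2: diagnosis
        account.balance = snapshot      # step 4: recovery to the error-free state
        return "rejected: " + cause     # step 3: containment, nothing propagates


account = Account(100)
first = safe_withdraw(account, 30)
second = safe_withdraw(account, 100)
```

After the rejected second withdrawal the account is back in the state it had before the call, and the caller gets a plain result instead of a crash.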


Saturday code jam details

December 17th, 2008

So we seem to be all set for the Saturday code jam on 20th Dec. 2008. Here is what you need to join the code jam:

  1. You should have a laptop with Python installed.
  2. You should know some Python programming (read “hello world”).
  3. You should be ready to work on completely new software.
  4. You should be willing to open source your code.
  5. If you are proficient with Python, come with cherrypy, dejavu, simplejson and jquery installed.
  6. If you don't know Python but know databases, you can still join; we will need good db guys.
  7. You should know how to use git.
  8. You should be willing to code without internet, though the Python manual will be available.

So see you at code jam..


Saturday code jam

December 16th, 2008

Inspired by the foss.in workouts, I am arranging a small hack session in my office this Saturday, 20th Dec 2008. We plan to hack some core features for the stipend platform to get the basic application running. Already 3 people have confirmed and I have space left for 3 more. Sorry, I can't accommodate more than 6 people due to space restrictions.

Oh yes… we are trying this hack session without wifi… Let's see how far we go… and if you are interested ping me… hurry…


Dejavu

December 15th, 2008

Recently I started playing with the dejavu ORM by Robert Brewer. For the first time I found a Python ORM which can be a replacement for my overused data layer. Dejavu allows you to interface with more than one data source, and this is a blessing when you are building applications that have to fetch data from a legacy or proprietary database along with SQL-based database(s).

Dejavu has done a lot of things correctly in the design itself. Dejavu uses the data mapper architecture, which creates loose coupling between the database and in-memory objects. This separation is achieved with the help of a data mapper that translates in-memory objects to database tables. As in-memory objects do not have any responsibility for database operations, the domain layer can focus on the one thing it is meant for: domain logic. As in-memory objects talk to the database through a data mapper, they can talk to more than one data mapper and connect to multiple data sources; plus, the data source need not be a database. It can be anything for which a data mapper exists, thus allowing you to build business objects which are composed from multiple data sources.
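
To show the data mapper idea without pretending to know dejavu's API, here is a plain Python sketch. None of these classes come from dejavu; `Customer`, `DictMapper` and `CsvLineMapper` are hypothetical names, and the two storage back ends are in-memory stand-ins for a SQL table and a legacy source.

```python
class Customer:
    """Plain domain object: it has no knowledge of where it is stored."""

    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name


class DictMapper:
    """Maps Customer objects to a dict standing in for a SQL table."""

    def __init__(self):
        self.rows = {}

    def save(self, customer):
        self.rows[customer.customer_id] = {"name": customer.name}

    def load(self, customer_id):
        return Customer(customer_id, self.rows[customer_id]["name"])


class CsvLineMapper:
    """Maps the same objects to CSV lines standing in for a legacy source."""

    def __init__(self):
        self.lines = []

    def save(self, customer):
        self.lines.append("%s,%s" % (customer.customer_id, customer.name))

    def load(self, customer_id):
        for line in self.lines:
            key, name = line.split(",", 1)
            if key == str(customer_id):
                return Customer(customer_id, name)
        raise KeyError(customer_id)


# The same domain object goes to two different data sources.
sql_like, legacy = DictMapper(), CsvLineMapper()
alice = Customer(1, "Alice")
sql_like.save(alice)
legacy.save(alice)
```

The `Customer` class never changes when a new data source is added; only a new mapper is written, which is the loose coupling described above.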

Normally organizations have multiple data sources and applications have to either replicate data or create multiple access layers to accommodate every data source. In such scenarios the loose coupling in dejavu is nothing short of a blessing. With an ORM capable of connecting to multiple data sources, you can expect a reduction in development time and in the number of bugs.

The second good feature of dejavu is triggers, behaviours that fire when a value is changed. It is not uncommon for developers to write logic in the code which is fired on a value change, for example increasing the value of A by 10 if the value of B is more than 20. We do this by writing tons of if-else statements, which become hard to maintain as the code size grows. With dejavu, you can delegate the responsibility to the ORM, resulting in easy-to-maintain code.
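
The sketch below illustrates the general idea of a value-change trigger using a plain Python descriptor. It is not dejavu's trigger API; `Triggered`, `bump_a` and `Record` are hypothetical names, and the rule is the A and B example from the paragraph above.

```python
class Triggered:
    """Descriptor that calls a hook whenever the attribute's value changes."""

    def __init__(self, on_change):
        self.on_change = on_change

    def __set_name__(self, owner, name):
        self.storage = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.storage, 0)

    def __set__(self, obj, value):
        old = getattr(obj, self.storage, 0)
        setattr(obj, self.storage, value)
        if value != old:  # fire only on an actual change
            self.on_change(obj, old, value)


def bump_a(obj, old, new):
    """The rule from the text: raise A by 10 when B is more than 20."""
    if new > 20:
        obj.a += 10


class Record:
    b = Triggered(bump_a)

    def __init__(self):
        self.a = 0
        self.b = 0


record = Record()
record.b = 25   # change, and B > 20: the trigger fires
record.b = 25   # same value again: no change, nothing fires
record.b = 5    # change, but B <= 20: the rule does not apply
```

The rule lives in one place next to the attribute definition instead of being scattered through if-else statements wherever B is assigned.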

I also liked the way dejavu has separated deployment from development. The official guide comes with a neat example of the config file to explain the deployment. No more complicated XML syntax when all I want to specify is the database driver and connection string…

I could continue praising dejavu but I think I have done enough. I think it's time now to search for the shortcomings of dejavu, as so far I have not been able to find any. I am going to play with dejavu more and post about the shortcomings as I come across them, along with a few examples of how to use dejavu…


Let's get rolling…

December 15th, 2008

So I was thinking of restarting writing on the web and I thought hard about the topic of the first post. After spending some time I realized the first post needs to have nothing… let me not set any kind of trend… let things roll on their own…
