No Mind

It's the mind that makes you miss the shot

Archive for the ‘programming’ Category

July 22nd, 2011 by Vivek Khurana

Introducing Krivah

For some time now I have been working on numerous “Enterprise” applications. What I noticed is that most of them share the same bunch of problems. In the majority of cases you have some data and a bunch of rules to be applied to that data. When I explored this a bit, I arrived at a list of features that I would like to have in a business management framework. Here is the initial list I came up with:

– Define data easily. By “easily” I mean that a Business user should be able to define data (preferably visually) or modify existing data.
– Pull data from various sources and define a collective business object.
– Define rules with a drag and drop interface.
– Define workflows.
– Define presentation of data.
– Package a bunch of features as components and mix components to create applications (SCA).
– Allow individual applications to be stored on separate nodes (hardware) so that you can scale individual applications depending on demand. At the same time, the whole application should be available to the user in a single interface, seamlessly.
– Reporting
– Incremental development of applications and components.

Since I had some free time today, I decided to start writing a framework which meets the above requirements. The framework is called Krivah and is based on Clojure. The code repo is available at https://github.com/vivekkhurana/krivah . Please note that, at the time of writing this post, Krivah is pre-pre-alpha! The code is for developers only (not for the faint-hearted! :) ).
Right now nothing special is working, just a basic entity framework for defining entities. The current code supports only NoSQL and is based on MongoDB, but support for other databases is coming soon.

Why Clojure
Clojure is a dialect of Lisp that runs on the JVM or the .Net CLR. Lisp is the language that I find most suitable for business applications. The ability to treat code as data is a blessing for defining business rules. One can take the source code of an older program and pass it to a new function, which makes wrapping and extending functions a breeze.

What Krivah is not?
Krivah is not a generic framework. It is focused on business applications, workflow-based applications to be precise. So you cannot expect to see Krivah used to build a social networking site, but you can expect things like inventory management, production planning etc.
To cut down on speculation, I am not trying to build SAP or anything similar. I don't think this framework will be used by anyone beyond SMEs.

What is in the name?
Krivah (pronounced Kree-waah) is formed by joining two Sanskrit words: Kree, meaning work, and Vaha, meaning flow. Since one of the main focuses of this framework will be business workflow, joining Kree and Vaha gives workflow. :)

Why announce in pre-pre-alpha?

It's Friday night here in India and I have the whole weekend. I think by Monday morning I should be able to finish a couple of tasks, and by sometime mid-next week have something usable by an end user. Since the project is going to be open source, the sooner you release the project, the better it is. :)

So stay tuned for more updates… :)

November 15th, 2010 by Vivek Khurana

Revisioning in custom drupal module

Today I faced a problem with revision handling in Drupal. I was developing a custom module which had some data stored in a separate DB table. The second requirement was to be able to do revisioning on the additional data. All was fine till I got stuck in one place. When you revert to an older revision, Drupal invokes the ‘update’ op for nodeapi. In both cases, revision revert as well as plain ‘update’, the ‘presave’ op is also called. My requirement was to be able to pull the data for the previous version and store it as a new version. But the question is, how do I come to know whether ‘presave’ is called for a new node, a node being edited or a revision being reverted?

After some debugging I found that when the ‘presave’ op for nodeapi is called for a node revert, the $node object has the vid of the version to be reverted to, and when the node is being updated, the vid is the latest vid. So, fetching the current vid of the node from the node table in the ‘presave’ op and comparing it with the vid of the $node passed to the presave operation solved the problem. If the vid of the $node passed is the same as the vid in the node table, then it is an update, else it is a node revert.

What about new node creation? Well, in the case of new node creation the vid for $node is not set. :)

Here is the code snippet explaining the above logic:


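function mymodule_nodeapi(&$node, $op, $arg = 0) {
  switch ($op) {
    case 'presave':
      if (!empty($node->vid)) {
        // vid is set, so this is either an edit or a revision revert.
        // Look up the node's current vid by its nid (not by its vid).
        $sql = 'SELECT vid FROM {node} WHERE nid = %d';
        $vid_db = db_fetch_array(db_query($sql, $node->nid));
        if ($node->vid == $vid_db['vid']) {
          // Same vid as in the node table: execute edit logic.
        }
        else {
          // Older vid than in the node table: execute revision revert logic.
        }
      }
      // vid not set: a new node is being created, nothing to do here.
      break;
  }
}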

May 7th, 2010 by Vivek Khurana

Why designers should learn HTML5

Recently I had a discussion with a few designers who questioned the adoption of HTML5 and why they should learn it. As everyone knows, HTML5 is the upcoming standard for browsers. HTML5 enhances browser capabilities. Many capabilities which required JavaScript libraries or Flash, like drag and drop or video/audio playback, can now be done natively in the browser with HTML5.

But the question still remains: why should a designer waste her time learning HTML5 when the HTML5 specification is not yet complete? It makes no sense to invest time in something just because your nerd friend wants you to learn it :)

HTML5 adoption is happening at a faster pace than you can imagine. All major browser vendors now support HTML5; some have already started shipping HTML5 features in their current browser builds, while others are busy catching up. Not only browsers but several websites like YouTube, Facebook, Scribd etc. have started HTML5-based versions. So unlike previous improvements around HTML, such as CSS, which took years to be adopted by browsers, HTML5 adoption is progressing at a healthy pace.

HTML5 is a standard that has been many years in the making, with input from various software vendors and individuals, including designers. This process has ensured that the most commonly used features of web development are supported by browsers natively, without the need for any external libraries. At the same time, HTML5 has tried to add tags that make content layout easier, along with support for microformats. This makes presentation as well as sharing of data easy.

You may argue: if the HTML5 specification is not yet ready, how are browsers implementing HTML5? This is a fair question. HTML5 is a big standard. It is so big that during the discussion stages, the HTML5 specification had to be broken down into a couple of smaller standards. The standardization process requires drafts to be published and comments to be received. Once drafts are published, the core ideas/features/processes/algorithms are rarely changed. It will take a couple of years for the full specification to be ready, the test suite to be prepared and browsers to pass those tests, but you can reasonably assume that the features/algorithms defined in the standard will not change much.

So, there is no harm in investing time in learning HTML5, as it is going to become the de facto standard for web-based applications in the decade to come.

How to learn HTML5? There are lots of HTML5 tutorials available on the internet. You can also keep an eye on this blog, as there will be many posts in the near future explaining various features of HTML5.

July 5th, 2009 by Vivek Khurana

Scalable application architectures – stability

Recently I started working on an application that will have to cater to the needs of thousands of users. It is not just the number of users; the application also needs to aggregate data from multiple web services and push data to multiple web services. This might sound simple, but not when you have to talk to about 30 web services which have nothing in common except HTTP and XML. Each web service represents data in a different format, even though most of them deal with a simple text document. This means we need to figure out a way to create the business object from multiple sources and at the same time keep the application linear. The complexity of the requirements increases by leaps and bounds when you have to work with live data. Yup, live, up-to-date data. So the only way out seems to be a stateless, asynchronous design. But it is not easy to write stateless asynchronous applications :(

You may ask why I am worried about the scalability of the application: let the design evolve over time. My experience with building applications is that you cannot have a scalable design that “evolves”. Not without tons of hard work later and not without breaking a few things. Writing scalable applications is like building an earthquake-resistant skyscraper. You cannot wait for the earthquake to come before you start working on making the building earthquake resistant. You have to design it up front and test the model in the lab before you lay the foundation stone of the building.

So what exactly is scalable? The sad part of the computer industry is that we still don't have a scale to measure scalability. What works for one set of data may fail for another set of data. A friend of mine suggested that he considers his application profitable if the cost per transaction is less than the revenue per transaction. I think the logical way to measure scalability would be to measure how far the application can scale while keeping the cost per transaction lower than the revenue per transaction :)
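The measure above can be sketched in a few lines of Python. This is only an illustration: the cost model (a fixed cost, a per-transaction cost and an overhead that grows with the square of the load), the function names and all the numbers are assumptions made up for the example, not figures from a real system.

```python
def scaling_limit(loads, total_cost, revenue_per_transaction):
    """Return the largest load that is still profitable.

    Walks the loads in increasing order and stops at the first one where
    cost per transaction is no longer below revenue per transaction.
    Returns None if even the smallest load is unprofitable.
    """
    limit = None
    for load in sorted(loads):
        if total_cost(load) / load >= revenue_per_transaction:
            break
        limit = load
    return limit


def example_cost(load):
    # Hypothetical cost model: fixed cost + per-transaction cost
    # + coordination overhead growing with the square of the load.
    return 50 + 0.02 * load + 0.000001 * load * load


loads = [1000, 10000, 50000, 100000, 200000]
limit = scaling_limit(loads, example_cost, revenue_per_transaction=0.10)
print(limit)
```

With these made-up numbers the cost per transaction crosses the revenue per transaction somewhere between 50,000 and 100,000 transactions, so that is how far this imaginary application "scales".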

So let's try to define stability. To an end user, stability means that the system is available and capable of doing transactions irrespective of load. So first we need to identify what hampers system availability.

  1. Sudden surge of requests (like being slashdotted)
  2. A large number of requests being received continuously over a period of time.
  3. Internal problems like memory leaks.

For point 1 we do have a solution: do load testing. That should give you an indication of how long the system will survive before crashing under a sudden surge of requests or, in short, what category of earthquake the building can handle.

What about point number 2? How do you test a system under a large number of continuous requests? Do you do load testing for a couple of days before releasing a new build to production? One may argue that, given the way most internet companies work, you have to release very often. Fair point, but what is the use of adding that one cool new feature, which your marketing guy wants like anything, without testing the system's stability? If your cool new feature crashes, it is only going to shake users' confidence. To handle point number 2, you need to test your application under different load conditions continuously for a few days. I remember building a stock market ticker which would pass all the tests in development but crash in production. We found later that when the application was in production for 3 days continuously, some parts of the application suffered from data overflow. Though it might sound like a stupid mistake by a developer, the fact is the company suffered considerable losses due to the repeatedly crashing application. And this was in the era when a stock ticker fed from web services was a new feature on the internet and every business head of a financial site wanted to have the feature on the site because some competitor had it.

Testing for longevity of an application is a very important test that is ignored more often than it is conducted. A longevity test can bring out bugs that would go undetected in any other type of testing. The longevity test needs to cover different load conditions at different times. It is equally important to measure the performance of the application from night conditions (low load) to peak conditions (day time). The performance of different systems as the application load ramps up or down could reveal some startling facts about your application.

What about point number 3? It takes some experience to identify internal problems. For instance, a memory leak is far more likely to be spotted by a seasoned programmer than by a newcomer. So code review plays an important part here. But whatever you do, some internal problem or other will arise. You need to build safety nets for such situations, like the air bags for front passengers which inflate automatically when the car is hit. Such impact absorbers will be able to handle internal problems and yet let the system keep performing, which is what is known as fault tolerance.

So, keeping the above points in mind, I have started designing the application. Currently I am evaluating whether to use an RDBMS or go with NoSQL. I will post about it when I arrive at a decision :) .

More later…

February 26th, 2009 by Vivek Khurana

A rose by any other name will end up as a cabbage

Last night a friend of mine pointed me to the 97 things wiki and an interesting axiom “A rose by any other name will end up as a cabbage“. The axiom page talked about how you should name the components in a software project. The idea arises from a simple argument “If you don’t know what it is to be called you don’t know what it is”.

In my opinion natural languages are the best tools for describing the requirements of a project. Although diagrams are very helpful in explaining the overall logic or flow, the details of the requirement should always be explained in a natural language such as plain English. When requirements are written in plain English, you can toss the requirements across to users for comments and get feedback at an early stage of software development. Not all users can understand the complexities of state machines, use case diagrams, ER diagrams etc., but all of them understand jargon-free, simple English. Yes, you can use any natural language other than English when working with a non-English speaking population.

Another advantage of using a natural language to describe requirements is that you get a natural abstraction layer called ‘name’. When you name something, you come up with a mental abstraction based on the major characteristics of a (physical) object, and the second time you have to refer to the same object or collection of characteristics you call it by its name, instead of giving a long description of the object. When you see a repeating pattern, give it a name. When you see a bunch of instructions to be executed repeatedly, give it a name. When you see an interface, give it a name. At the same time, ensure that the name is specific enough to convey the characteristics (e.g. cheap, fast, easy, strong etc.) of the abstraction or object. A name that is not specific enough points to a lack of clarity in the abstraction layer and is a sign of either too much abstraction in the system or over-engineering. Also, you need not build an abstraction only when you see lots of features being clubbed together. An interface might implement only two or three features, but the frequency of repetition of those features makes them a candidate for abstraction and thus for a name.

Creating abstractions comes naturally to all humans, and it is easy for us to identify things by a name. Maybe that is why we came up with the idea of commands to computers: it was easier for a human to remember a single command than to type a set of instructions to see a list of files in a directory, for example. We further progressed to creating scripts, each a collection of commands (essentially an abstraction), giving a name to a script and then remembering that name as a command.

When you are designing new software, having a specific name for a component will create the abstraction automatically. But do watch out: if you have too many names in your requirements, you might be suffering from over-engineering and might have created too many abstractions…

Who says, what's in a name?

January 21st, 2009 by Vivek Khurana

Playing with dejavu

So I spent last night messing around with the dejavu ORM. While I was chatting with the dejavu team over IRC and pointing out bugs, fumanchu was busy fixing them asap. We had 3 revisions of dejavu in one hour and one new ticket. It turned out that geniusql cannot handle NULL in the MySQL timestamp type. MySQL's timestamp is a badly designed data type. From the manual: “TIMESTAMP columns are NOT NULL by default, cannot contain NULL values, and assigning NULL assigns the current timestamp. However, a TIMESTAMP column can be allowed to contain NULL by declaring it with the NULL attribute”. Now that's a horrible way to design a datatype. To support both NULL and NOT NULL, some modifications are required in geniusql, and a ticket has been filed for the same.

So dejavu 2.0 is becoming stable day by day… Enjoy…

December 29th, 2008 by Vivek Khurana

Software fault tolerance

Ever got frustrated when your software stops responding and crashes, throwing a popup message asking you to send the information to xyz developer? With increasing complexity in software we see an increasing trend of software hang-ups. The classic case was the Windows blue screen. Thanks to Microsoft for getting rid of the ugly blue screen and keeping the Windows OS running when one program misbehaves.

Even though the whole world was fed up with the blue screen, programmers have not yet learned that one action by a misbehaving component should not bring the whole system down. With the rise of web 2.0 we have seen a rise in rich internet applications and a rise in hanging browsers. One misbehaving plugin or one misbehaving tab can crash the whole browser. Gosh, have they forgotten about keeping the program stable while writing the initial code? Or was introducing a new uber-cool unstable feature more important than overall software performance?

It is very easy to blame software authors for all the mess, but let's spend some time trying to understand what causes software to fail and how to avoid failures.

Software failure can be divided into 3 parts: fault, error and failure. A fault, or bug, is what produces an error, and an error leads to failure. An error is a state of the system under investigation, a state that can bring down the whole system. So our discussion will focus on handling the system states that are liable to cause failures. Fault tolerance is the set of techniques aimed at detecting, isolating and recovering from a computational state that can lead to failure. In Software Fault Tolerance Techniques and Implementation, Laura Pullum identifies 4 steps for fault tolerance, viz.

  1. Error identification or detection.
  2. Error diagnosis, to identify the cause of the error.
  3. Error containment, to prevent further damage.
  4. Error recovery, the transition from an erroneous state to an error-free state.

The simplest approach to fault tolerance is the try-catch block in OOP languages. As soon as an error is detected an exception is thrown, and a catch block isolates the error, giving an option to recover from the fault. A simple solution which works… But try-catch is a programming language feature, whereas software is made of components, so one example is not enough. In this series I will pick examples from FOSS projects and show how fault tolerance can be built into a system. So see you for more…
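As a minimal sketch of how the four steps map onto a try-catch block, here is the Python equivalent (try/except). The functions parse_price and safe_parse_price and the fallback value are hypothetical, invented only for this illustration.

```python
import logging

logger = logging.getLogger("fault_tolerance_sketch")


def parse_price(raw):
    """Hypothetical component that fails on malformed input."""
    return float(raw)


def safe_parse_price(raw, fallback=0.0):
    """Wrap the component so that one bad value cannot bring the caller down."""
    try:
        return parse_price(raw)
    except (TypeError, ValueError) as exc:
        # 1. Detection: the exception tells us an erroneous state was reached.
        # 2. Diagnosis: record what the input was and why it failed.
        logger.warning("could not parse price %r: %s", raw, exc)
        # 3. Containment: the exception is stopped here and does not propagate.
        # 4. Recovery: return to an error-free state with a known fallback value.
        return fallback


def parse_all(raws):
    # One misbehaving value does not stop the rest from being processed.
    return [safe_parse_price(raw) for raw in raws]


print(parse_all(["1.5", "oops", None, "2"]))
```

Whether a silent fallback is the right recovery depends on the application; for a price it may be safer to skip the record than to substitute a default.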

December 17th, 2008 by Vivek Khurana

Saturday code jam details

So we seem to be all set for the Saturday code jam on 20th Dec. 2008. Here is what you need to join the code jam:

  1. You should have a laptop with python installed.
  2. You should know some python programming (read “hello world”)
  3. You should be ready to work on completely new software.
  4. You should be willing to open source your code.
  5. If you are proficient with python, come with cherrypy, dejavu, simplejson and jquery installed.
  6. If you don't know python but know databases, you can still join; we will need good db guys.
  7. You should know how to use git.
  8. You should be willing to code without internet, though python manual will be available.

So see you at code jam..

December 16th, 2008 by Vivek Khurana

Saturday code jam

Inspired by the foss.in workouts, I am arranging a small hack session in my office this Saturday, 20th Dec 2008. We plan to hack some core features for the stipend platform to get the basic application running. Already 3 people have confirmed and I have space left for 3 more. Sorry, I can't accommodate more than 6 people due to space restrictions.

Oh yes… we are trying this hack session without wifi… Let's see how far we go… and if you are interested, ping me… hurry…

December 15th, 2008 by Vivek Khurana

Dejavu

Recently I started playing with the dejavu ORM by Robert Brewer. For the first time I found a python ORM which can be a replacement for my overused data layer. Dejavu allows you to interface with more than one data source, and this is a blessing when you are building applications that have to fetch data from legacy or proprietary databases along with SQL-based database(s).

Dejavu has done a lot of things correctly in the design itself. Dejavu uses the data mapper architecture, which creates loose coupling between the database and in-memory objects. This separation is achieved with the help of a data mapper that translates in-memory objects to database tables. As in-memory objects do not have any responsibility for database operations, the domain layer can focus on the one thing it is meant for: domain logic. As in-memory objects talk to the database through a data mapper, they can talk to more than one data mapper and connect to multiple data sources, and a data source need not be a database. It can be anything for which a data mapper exists, thus allowing you to build business objects which are composed from multiple data sources.
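To make the data mapper idea concrete, here is a small standalone Python sketch. It does not use dejavu and is not dejavu's API; the Customer, DictMapper and CsvLineMapper classes are hypothetical stand-ins invented for illustration, with a dict playing the role of a SQL table and a list of text lines playing the role of a legacy source.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Plain in-memory domain object; it knows nothing about storage."""
    id: int
    name: str


class DictMapper:
    """Hypothetical mapper backed by a dict, standing in for a SQL table."""

    def __init__(self):
        self._rows = {}

    def save(self, customer):
        self._rows[customer.id] = {"id": customer.id, "name": customer.name}

    def load(self, customer_id):
        row = self._rows[customer_id]
        return Customer(id=row["id"], name=row["name"])


class CsvLineMapper:
    """Hypothetical read-only mapper for a legacy source of 'id,name' lines."""

    def __init__(self, lines):
        self._lines = list(lines)

    def load(self, customer_id):
        for line in self._lines:
            raw_id, name = line.split(",", 1)
            if int(raw_id) == customer_id:
                return Customer(id=customer_id, name=name.strip())
        raise KeyError(customer_id)


# The same domain object moves between two unrelated data sources.
legacy = CsvLineMapper(["7, Asha", "8, Ravi"])
store = DictMapper()
store.save(legacy.load(7))
print(store.load(7))
```

Customer never imports or calls anything storage-related, so adding a third source means writing a third mapper and leaving the domain code alone.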

Normally organizations have multiple data sources, and applications have to either replicate data or create multiple access layers to accommodate every data source. In such scenarios the loose coupling in dejavu is nothing short of a blessing. With an ORM capable of connecting to multiple data sources, you can expect a reduction in development time and in the number of bugs.

The second good feature of dejavu is triggers, behaviours that fire when a value is changed. It is not uncommon for developers to write logic in the code which is fired on value change, for example: increase the value of A by 10 if the value of B is more than 20. We do this by writing tons of if-else statements, which become hard to maintain as the code size grows. With dejavu, you can delegate this responsibility to the ORM, resulting in easy-to-maintain code.
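The idea of a value-change trigger can be sketched in plain Python with a descriptor. This is not dejavu's trigger API; the Triggered descriptor, the on_<name>_changed naming convention and the Account class are assumptions made up for this example, which implements the rule from the paragraph above.

```python
class Triggered:
    """Descriptor that calls on_<name>_changed(old, new) when the value changes."""

    def __set_name__(self, owner, name):
        self._public = name
        self._private = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self._private, 0)  # attributes default to 0

    def __set__(self, obj, value):
        old = getattr(obj, self._private, 0)
        setattr(obj, self._private, value)
        if value != old:
            # Hypothetical convention: look for a handler named after the attribute.
            handler = getattr(obj, "on_" + self._public + "_changed", None)
            if handler is not None:
                handler(old, value)


class Account:
    a = Triggered()
    b = Triggered()

    def on_b_changed(self, old, new):
        # The rule from the text: increase A by 10 if B is more than 20.
        if new > 20:
            self.a += 10


acct = Account()
acct.b = 25   # trigger fires, a becomes 10
acct.b = 5    # trigger fires, but the rule does not apply
acct.b = 30   # trigger fires again, a becomes 20
print(acct.a)
```

The rule lives in one place next to the data it concerns, instead of being repeated in if-else blocks wherever B is assigned.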

I also liked the way dejavu has separated the deployment from development. The official guide comes with a neat example of the config file to explain the deployment. No more complicated XML syntax when all I want to specify is  the database driver and connection string…

I could continue praising dejavu, but I think I have done enough. I think it is time now to search for the shortcomings of dejavu, as so far I have not been able to find any. I am going to play with dejavu more and post about shortcomings as I come across them, along with a few examples of how to use dejavu…