Tuesday, September 23, 2008

Static Code Analysis - some thoughts

If you do (or at some point did) C programming on Unix, you get introduced to static code analysis pretty early through tools like lint, cflow, ctags and cxref, although I was not familiar with the term at that time. So what is Static Code Analysis?

As the name suggests, it is a technique to analyze software without actually running it. In most cases this means that the technique looks at the source code of the software. Note that it is not Software Testing, wherein we actually compile and run the application and feed it test data; Software Testing is a part of Dynamic Analysis. Typically static analysis tools are built on top of compiler front-ends. The information collected by the front-end is either run through sets of rules to find pre-defined patterns or stored for further analysis.
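
To make the "front-end plus rules" idea concrete, here is a minimal sketch in Python, using the standard `ast` module as the front-end. The two rules, the sample source and the function name `find_violations` are my own illustrative assumptions, not taken from any particular tool; real analyzers carry far larger rule sets.

```python
import ast

# Hypothetical sample input; it is only parsed, never executed.
SAMPLE_SOURCE = '''
def load(path):
    try:
        return open(path).read()
    except:
        return None

def is_missing(value):
    return value == None
'''

def find_violations(source):
    """Return sorted (line_number, message) pairs for two illustrative rules."""
    tree = ast.parse(source)  # the "front-end": source text -> syntax tree
    violations = []
    for node in ast.walk(tree):
        # Rule 1: a bare "except:" swallows every kind of error.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            violations.append((node.lineno, "bare except"))
        # Rule 2: comparison to None written with "==" instead of "is".
        if isinstance(node, ast.Compare):
            for op, right in zip(node.ops, node.comparators):
                if (isinstance(op, ast.Eq)
                        and isinstance(right, ast.Constant)
                        and right.value is None):
                    violations.append((node.lineno, "== None"))
    return sorted(violations)

if __name__ == "__main__":
    for line, message in find_violations(SAMPLE_SOURCE):
        print(f"line {line}: {message}")
```

Note that the sample code is never run: the checker only walks the syntax tree, which is exactly what separates this from testing.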

So is analyzing source code without actually running the software of any use? It turns out that it is very useful, both for preventing bugs and for discovering information. Static Code Analysis tools have primarily been used for two purposes:

  • Making code more maintainable: Over time the software community has collected a lot of good coding practices. It is possible for a tool to analyze source code and find out whether any such practices are violated. More importantly, these kinds of tools can now be integrated into Integrated Development Environments (IDEs), so that a software developer can be warned right at the moment s/he is making a mistake. These tools can also be run on existing software to point out weak areas or outright defects. Typically these kinds of issues were expected to be caught in Peer Reviews, but it's always a good idea to take help from a tool.

    Another way these tools make software more maintainable is by rearranging the code (Beautify, Re-factor) so that it is well-formatted and adheres to good design practices. The rearranged code is more elegant, more logical and hence more maintainable.

    Analysis of other non-functional characteristics such as Security, Performance and Scalability has also been studied, but is not as commonplace.
  • Finding design information in existing code: More often than not, software is written under artificially created "Time to Market" pressure. Another much misused phrase is "We use agile coding methodologies". Whatever the means, the end result is a large amount of poorly documented and poorly structured source code. Further, it is very likely that the team that inherits this source code is not the team that wrote it. At such times, tools that can extract design information from source code become critical to the success of the project.

    These are called Information Abstraction systems. They collect data from the source code and put it into data stores (like relational databases). Queries can then be built on these data stores to aid in discovering design information, for example: what are all the recursive functions in this software? Which classes 'use' a given class? This process is also called Reverse Engineering. There are tools that can generate 'intelligent' class diagrams from source code. Many a time such tools save the day for software developers.

    Another use of the information thus collected is to help a software developer navigate large code bases easily - anyone who has waded through millions of lines of code appreciates its value.
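
The "collect facts into a data store, then query it" approach described in the second point can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `calls` table, the sample source and both function names are hypothetical, and only direct recursion through plain calls by name is detected (a function calling itself via another function would need a transitive query).

```python
import ast
import sqlite3

# Hypothetical sample input; it is only parsed, never executed.
SAMPLE_SOURCE = '''
def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)

def area(width, height):
    return width * height

def countdown(n):
    if n > 0:
        countdown(n - 1)
'''

def build_call_store(source):
    """Parse source and record (caller, callee) pairs in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            # Only plain calls by name, f(...), are recorded; method calls are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                db.execute("INSERT INTO calls VALUES (?, ?)",
                           (func.name, node.func.id))
    return db

def directly_recursive_functions(db):
    """Query: which functions call themselves?"""
    rows = db.execute(
        "SELECT DISTINCT caller FROM calls WHERE caller = callee ORDER BY caller")
    return [name for (name,) in rows]

if __name__ == "__main__":
    store = build_call_store(SAMPLE_SOURCE)
    print(directly_recursive_functions(store))
```

Once the facts sit in a table, other questions (who calls a given function, which functions are never called) become one more query rather than one more tool.
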
Over the last few years the importance of static analysis has steadily grown. Teams find that they can dramatically reduce their code review times if the reviews are preceded by a static analysis run. Open source projects, where code reviews are hard to arrange logistically, depend on these techniques to ensure a minimum quality of the software. Today development environments from Eclipse, IBM Rational and Microsoft include a multitude of static analysis tools to make the developer's life easier. If one wants to reach the next level as a developer, one needs to know how to use these tools to take care of the mundane basics and save one's time for the more important things in life (like getting a cup of coffee!).

This is definitely not exhaustive, but I wanted to put together some interesting aspects of static analysis.
