Monday, August 18, 2008

Understanding Existing Code

Continuing from my last post on debugging, the next natural step is to look at how to understand existing code. While we are on this topic, it is interesting to note that the overall software entropy in the world is ever increasing. What I mean by that is that the number of lines of code written in the world keeps growing. So far, the experience has been that systems that need software keep needing it and never die out. They might change in shape and size, but they persist. Take the example of mainframe systems. Even today many enterprises still employ systems from the 1970s, although they are wrapped in modern connectors. Moreover, the systems that have been decommissioned are replaced by more complex modern software systems. Will we ever reach a time when all the software ever needed has been written? It is hard to say, but I would bet that this time will never come.

So whether we like it or not, we will be co-existing with a lot of software, and understanding existing software becomes critical. This article is about my thoughts on how to ease the life of the programmer who is banished to understanding others' code. More often than not there is little or no documentation, either inside the code or outside it (what do you mean documentation, you have the code!). The challenge this programmer faces is to extract design ideas out of the code. Before a programmer embarks on this journey, she needs to understand very clearly why she is doing it. It could be taking ownership of the software, fixing a defect or adding a new feature. The following thoughts come to mind.
  • The first step in understanding a software system is to go outside in. One needs to understand as much as possible about the execution context, the functionality exposed to the user, the external interfaces to the world and so on. Much of this information is embedded in the user interfaces, so "playing around" with the system is obviously desirable. Sometimes this is not possible. The programmer then has to make do with reading user manuals, relying on user descriptions, or even using only the reports and other artifacts generated by the system under consideration.
    • A neglected resource in this regard is the set of test cases for the system. If the quality assurance for the system is up to date and complete, then the test cases are an extremely rich source of information about the behaviour of the system. The programmer needs to classify the test cases and start by looking at the "end to end" cases. Executing these test cases on a running system is very desirable.
  • The next step is to understand the deployment aspects of the system. If one is faced with a compiled system, the build scripts embed a lot of information about the dependencies within the system. The process of getting from code to binary executables can be quite complex. Understanding this process is critical to understanding the software. There are a few tools that help the programmer in this process - for example, one tool converts build scripts into a graphical representation that is much easier to comprehend.
  • The next step is to start getting into the code. Many times programmers try to understand a software system inside out, that is, they start by looking at the code first. This is not a good idea, as it can be very confusing and lead to wasted time. The better way is to go outside in. However, these three steps will need to be repeated back and forth many times. Coming back to understanding source code, there have been quite a few advances in this area.
    • One way to start is to set up the source code as a "project" in an appropriate Integrated Development Environment (IDE). The IDE support for modern languages is phenomenal and one needs to harness this for understanding code. For one, navigating the code becomes extremely easy.
    • Another point to consider here is that Program Understanding has developed into a science in the last two decades and there are now tools available to automate this process. Some examples:
      Using automatic static analysis tools like these makes a lot of sense when the code base the programmer is faced with is huge (this can be relative, but in my opinion that means anything larger than 100,000 lines of code). Many of these tools extract information about the software and store it in query-able systems. The data is then open to a lot of reporting and representations. The next level of detail is to put together the control and data flows that are relevant to one's investigations.
  • It is rare that a programmer gets into understanding large programs just for the sake of understanding them or for leisure. More often than not, the effort is driven by a need to change the behaviour of the software under consideration - either to add new functionality or to fix a defect. In this case the programmer has a definite goal. She can now start narrowing in on this goal. The ideal situation is to understand all the flows related to the module being changed and have a good sense of the impact of changing the code.
    • An interesting note here - as part of doing the above, a programmer mentally "slices" the program in various ways. In computer science this is studied as "Program Slicing". There are a few tools that can help the programmer here, but more importantly, having this theoretical awareness gives the programmer a better perspective on how to go about achieving her goals.
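
To make the idea of slicing a little more concrete, here is a minimal sketch of a backward slice in Python. It is only an illustration under strong assumptions: it handles straight-line code made of plain assignments, with no branches, loops, function side effects or aliasing, and the function name backward_slice and the sample program are made up for this example. Real slicing tools work on full control and data dependence graphs.

```python
import ast


def backward_slice(source, target):
    """Return the line numbers of assignments that can affect `target`.

    Assumption for this sketch: `source` is straight-line code consisting
    of top-level assignments only. Anything else is ignored.
    """
    tree = ast.parse(source)
    relevant = {target}  # variables whose values we still need to explain
    kept = []

    # Walk the statements from last to first, as a backward slice does.
    for stmt in reversed(tree.body):
        if not isinstance(stmt, ast.Assign):
            continue
        defined = {
            node.id
            for tgt in stmt.targets
            for node in ast.walk(tgt)
            if isinstance(node, ast.Name)
        }
        if defined & relevant:
            used = {
                node.id
                for node in ast.walk(stmt.value)
                if isinstance(node, ast.Name)
            }
            # The defined names are now explained; their inputs become relevant.
            relevant = (relevant - defined) | used
            kept.append(stmt.lineno)
    return sorted(kept)


# Hypothetical sample program: only lines 1, 3 and 5 influence `e`.
program = """\
a = 1
b = 2
c = a + 10
d = b * 2
e = c + 1
"""

print(backward_slice(program, "e"))  # [1, 3, 5]
```

Asking for the slice on `e` drops the lines that only feed `d`, which is roughly what a programmer does in her head when she decides which statements matter for the defect she is chasing.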

Hopefully these points help a software programmer or two faced with this dilemma. The ideal outcome of a program understanding effort is a complete set of documentation, both inside and outside the code. However, many times it's not feasible because of time and budget constraints. My personal opinion is that the organization needs to at least budget for internal commenting of the code. The programmers given the task of understanding code should also be charged with commenting it. This can be done incrementally: the programmer adds suitable comments in the modules that she is targeting. If this is not done, then the knowledge discovered in this process remains in the minds of people, but maybe that is a good thing because it generates jobs for other programmers in a few years!!
