Data about students in a college (typical of strength of about 500) are to be tracked. We need to be able to track about twenty attributes for every student like:
- Opting for a particular subject, opting for hostel accommodation, Ladies and gents, etc.
- Give all students who have opted for Artificial Intelligence and stay in the hostel
If you recognized that these could be implemented as sets and the answers to the queries would be set operations, then you are on the right track. However what data structure did you think of using? If you thought of a structure like a list then you are not alone.
However there is a better structure suited for this situation - the bit vector. We will represent the data for students as bit vectors for each of the attribute - note that all the attributes need to be boolean. So we have a 500 bit vector for every subject, hostel accommodation and so on.
Then the answer to the queries like the one mentioned above can be obtained by a series of bitwise operations between different bit vectors. And there are some really cool algorithms to do the popular bitwise operations. This is very efficient both in terms of storage and time taken to process queries.
The question is why don't most of us think of this data structure? I believe its because our learning curriculum does not accord the kind of importance needed for such an elegant and efficient data structure. We are systematically "taught" to stay away from anything to do with bit operations. In fact I have seen good programmers with a bit operations phobia. I think it is time to revisit this mentality.
Let me end this discourse with a problem picked up from Jon Bentley's book Programming Pearls. The situation there was that a programmer needed to sort several thousand telephone numbers (seven digits long) stored in a file. The programmer was constrained by the fact that s/he could not read the entire set of numbers into main memory for lack of space.
The solution that was thought of was interestingly very elegant. An obvious note is that telephone numbers do not repeat. So the programmer represented every phone number (there can be only 99,99,999 of them since they are seven digits) by a bit. If the phone number existed in the file then the corresponding bit was set to one. So the bit vector could be initialized by one pass of the file on disk. Then sorting the numbers just meant walking through the bit vector and printing the (phone) numbers for which the bit was set to one.
I say three hurrays to our unsung hero The Bit Vector!
No comments:
Post a Comment