Almost a week to the day after I started, I had to admit to myself that my Awk script wouldn’t take me where I wanted to go. I had worked on it on and off over the week, maybe forty hours in total. I spent just over four hours on it today before I realized what should have been obvious a few hours or days into the project.
I had a question to answer, and the answer seemed to lie in one or more log files. These log files can run to hundreds of thousands of lines, or even a million. So how could I get my question answered without poring over them in Vim with a series of repeated regular expressions?
Enter the idea of a script: one that would look for a few telltale patterns. At the end, I could say these things happened, so this must be the conclusion. Or these things didn’t happen, so it must be something else. It was supposed to be that simple.
I started with a shell script. I found that, with the number of lines I had to read, I would essentially need some way to loop over each file and each line. Why do that when Sed and Awk already do it? Fortunately, Awk is a lot more flexible than Sed. I guess I could have written it in Perl, too. I wanted something powerful, but didn’t want to go down the road of Python or Groovy.
Perl I dismissed because I had written a lot of Perl code years ago and didn’t want to go back there. It was fun then, but I haven’t had a use for Perl since and didn’t want to revisit it.
I also wanted something I could give to another person to use without much headache. Most systems have Python, but I am not as deft in Python, so it would have taken a bit more time. And I didn’t have much time either. So Awk it was, after I got no more than fifteen minutes into writing it in Bash.
It turned out that Awk wasn’t the problem. The problem was that I was processing a lot of text, and what you expect to find may not be there. A script like this is not as flexible as a full-fledged program might be: you really are just looking for a bunch of patterns and reacting to them. So when something you expect isn’t there, well… it all goes south very quickly.
Some of my early testing showed a particular set of lines. I deduced that if I found those, I could draw a conclusion one way or the other. When I tested on a later data set, many of the patterns I had coded for weren’t even there. Plus, my design hinged on finding a start pattern as a marker between distinct events. That pattern was nowhere to be found in some of the actual logs, even though it was there in the test logs I used.
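To make that failure mode concrete, here is a minimal sketch of a marker-based design like the one described above, written as an Awk program wrapped in a shell function. The marker text `BEGIN EVENT`, the `ERROR` and `RETRY` patterns, and the `summarize_events` name are all hypothetical stand-ins; the original script and logs aren’t shown here. The point is the check in the `END` block: when the start marker never appears, the script says so instead of quietly drawing a conclusion from one undivided blob of lines.

```shell
# summarize_events: a sketch of splitting a log into events on a start marker.
# HYPOTHETICAL assumptions: each event begins with a line containing
# "BEGIN EVENT", and lines containing "ERROR" or "RETRY" are the telltale
# patterns being counted. Reads the log on stdin.
summarize_events() {
    awk '
        # Start in "event 0", meaning "before any marker was seen".
        BEGIN { event = 0 }

        # A start marker opens a new event; skip the rest of the rules.
        /BEGIN EVENT/ { event++; next }

        # Attribute each telltale pattern to the current event.
        /ERROR/ { errors[event]++ }
        /RETRY/ { retries[event]++ }

        END {
            # No marker at all: every match landed in event 0, so any
            # per-event conclusion would be meaningless. Say so and fail.
            if (event == 0) {
                print "no start marker found; cannot split events"
                exit 1
            }
            # Unset counters print as 0 under %d.
            for (i = 1; i <= event; i++)
                printf "event %d: errors=%d retries=%d\n", i, errors[i], retries[i]
        }
    '
}
```

It would be run as `summarize_events < app.log`, where `app.log` stands in for a real log file. Failing loudly on a missing marker doesn’t recover the missing information, but it at least keeps the script from telling a story the log can’t support.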
And that is what should have been clear to me early on: log files don’t always contain what you are looking for or what you want. Even when something happened and you think everything will be logged so you can go back and tell the story, sometimes you just can’t. So now I will just have to go back to using Vim, tail, head, grep/ack, less, and friends, because writing a script to parse logs is a non-trivial task, especially in a limited scripting language.