Saturday, November 3, 2012

sed awk join two lines and print nth field from line using awk

Just a series of old-unix-guy tricks that I half-remembered from the last time that I had to do it. The result is still bitchin though. My log files have the variable with the bad value on one line, and the value in question on the next line - d'oh! The variable names all have the same prefix, so I realized that I could grep or sed for that. Then all I had to do was find a way to grab the next line and do stuff with it to pull out the bad value in question. To make a single line printout with the time, date, variable name, and value, my command line turned out to be:

sed -n '/PREFIX/{N;s/\n//;p}' 20121022_123632.msg | awk '{print $2 "\t" $3 "\t" $8 "\t" $NF}' > output_file.txt
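To see what the sed half does before awk gets involved, the pipeline can be tried on a throwaway file. This is only a sketch: the sample log layout below (date in field 2, time in field 3, variable name in field 8, bad value last on the following line) is made up to fit the field numbers in the command, and sample.msg is a hypothetical file name.

```shell
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)

# Hypothetical log: the PREFIX_ variable is on one line, its value on the next.
cat > "$workdir/sample.msg" <<'EOF'
INFO 2012-10-22 12:36:30 node7 proc 1234 check OTHER_VAR ok
ERR 2012-10-22 12:36:32 node7 proc 1234 check PREFIX_TEMP
    bad value: 451
INFO 2012-10-22 12:36:40 node7 proc 1234 check OTHER_VAR ok
ERR 2012-10-22 12:36:45 node7 proc 1234 check PREFIX_PRESSURE
    bad value: -3
EOF

# sed: on a PREFIX line, append the next line (N), delete the embedded
# newline, and print the joined line; -n suppresses everything else.
# awk: pick date, time, variable name, and the last field (the value).
sed -n '/PREFIX/{N;s/\n//;p}' "$workdir/sample.msg" |
    awk '{print $2 "\t" $3 "\t" $8 "\t" $NF}' > "$workdir/output_file.txt"

cat "$workdir/output_file.txt"
```

With this sample, the output file ends up with two tab-separated lines, one per PREFIX match. Note that the join only keeps field 8 clean because the value line starts with whitespace; if it didn't, the variable name and the first word of the next line would be glued together.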

Update: I have since continued to work this problem, and have come up with another solution involving only awk. The reason was that I wanted to run my command line under a recursive find, once per file found, depositing the results for each file into a new file in the same directory, with the same name as the found file plus a new suffix. I could use either xargs or the -exec feature of find to run the command line on the results. What several hours of trying taught me was that a pipe between sed and awk doesn't work directly under either find -exec or xargs, since neither one runs the command through a shell. Eventually, I found a way to do all the line manipulation in awk with a single command line without piping, and then use it with find -exec, which has a much nicer argument expansion capability than xargs. Here's the final result:
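For comparison, one common way to get a pipeline under find -exec is to hand the whole thing to sh -c, so that a shell is there to set up the pipe. This is a sketch of that idiom, not the approach used in this post; the directory names and the sample log layout are hypothetical.

```shell
# Scratch tree with one hypothetical log per directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/a" "$workdir/b"
for d in a b; do
    cat > "$workdir/$d/KEYWORD_$d.log" <<'EOF'
ERR 2012-10-22 12:36:32 node7 proc 1234 check PREFIX_TEMP
    bad value: 451
EOF
done

# find passes each file name to the inline script as $1 (the word "sh"
# after the script fills $0). The script is single-quoted for find, so the
# sed and awk programs inside use double quotes, with \$ to keep the inner
# shell from expanding the awk field references.
find "$workdir" -name 'KEYWORD*.log' -exec sh -c '
    sed -n "/PREFIX/{N;s/\n//;p}" "$1" |
        awk -v OFS="\t" "{print \$2, \$3, \$8, \$NF}" > "$1.ALARMS"
' sh {} \;

cat "$workdir/a/KEYWORD_a.log.ALARMS"
```

The nested quoting is the price of this approach, which is a fair argument for doing everything in a single awk program instead.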

find . -name "KEYWORD*.log" -execdir awk '/PREFIX/{printf $2"\t"$3"\t"$8"\t">>"{}.ALARMS";getline;print $NF>>"{}.ALARMS"}' {} \;

The key features here are: 1) regex matching just like sed, 2) the use of printf to print the result from the first line without a trailing newline, so that the next thing printed (from the next line) lands on the same output line, 3) the use of getline to just freaking get the next line, and do a print tailored just to that line, 4) the use of >> on each print statement to append to the output file (the filename *has* to be in quotes, but hahaha find -exec graciously expands the {} to form the filename with the new suffix right in the awk script, as well as sending the filename to awk as the final argument; GNU find does this, but POSIX leaves it implementation-defined when {} is only part of an argument), 5) -execdir instead of -exec, which causes the command to be run in the directory where the file was found (and thus the result to be deposited in the same directory).
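Stripped of find and the output redirection, the printf-then-getline trick can be tried on its own. This is a minimal sketch on a made-up log (the layout and the file name sample.log are assumptions). One deliberate difference from the command above: printf gets an explicit format string instead of the data itself, so a stray % in the log is not read as a format specifier.

```shell
workdir=$(mktemp -d)

# Hypothetical log: variable name on one line, its value on the next.
cat > "$workdir/sample.log" <<'EOF'
INFO 2012-10-22 12:36:30 node7 proc 1234 check OTHER_VAR ok
ERR 2012-10-22 12:36:32 node7 proc 1234 check PREFIX_TEMP
    bad value: 451
ERR 2012-10-22 12:36:45 node7 proc 1234 check PREFIX_PRESSURE
    bad value: -3
EOF

result=$(awk '/PREFIX/ {
    # printf adds no newline, so the output line stays open
    printf "%s\t%s\t%s\t", $2, $3, $8
    # getline replaces $0 (and NF) with the next input line
    getline
    # print finishes the output line with the last field of that line
    print $NF
}' "$workdir/sample.log")

printf '%s\n' "$result"
```

Each PREFIX match produces one tab-separated line: date, time, variable name, then the value pulled from the following line.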

The Conclusion: After coming up with that bitchin find command and thoroughly testing it on my own system using Cygwin, it bombed when I ran it on the server. That one may be running BSD or something (although I'm not sure, since man awk on that system returns the page for gawk). The failure that I got was a syntax error on the expanded result. I fed the following into my -execdir in find:

awk '/PREFIX/{printf $2"\t"$3"\t"$8"\t">>"{}.ALARMS";getline;print $NF>>"{}.ALARMS"}' {}

For some reason, the syntax error that I got back showed an extra ./ prepended to the expanded awk script:

awk: syntax error in: './/PREFIX/{printf $2"\t"$3"\t"$8"\t">>"input_file.ALARMS";getline;print $NF>>"input_file.ALARMS"}'

The expanded filenames were always correct, and I tried escaping a bunch of stuff to no avail. Getting rid of the expansions for the output file and just using a generic hard-coded output file name mysteriously fixed the problem, and actually worked out better. So, in the end, I had to go with this:

find . -name "KEYWORD*.log" -execdir awk '/PREFIX/{printf $2"\t"$3"\t"$8"\t">>"ALARMS.txt";getline;print $NF>>"ALARMS.txt"}' {} \;
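A possible alternative that still gives one output file per input, without putting {} inside the awk script at all, is to build the output name from awk's built-in FILENAME variable. This is a sketch only, untested on the server described above; the directory names and log layout are hypothetical. It uses plain -exec, where {} already carries the directory path, so each result still lands beside its input.

```shell
# Scratch tree with one hypothetical log per directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/site1" "$workdir/site2"
for site in site1 site2; do
    cat > "$workdir/$site/KEYWORD_$site.log" <<'EOF'
ERR 2012-10-22 12:36:32 node7 proc 1234 check PREFIX_TEMP
    bad value: 451
EOF
done

# {} appears only as a whole argument, which every find substitutes.
# FILENAME is the path awk was given, so the output name needs no help
# from find. Rerunning appends again, because of >>.
find "$workdir" -name 'KEYWORD*.log' -exec awk '
/PREFIX/ {
    out = FILENAME ".ALARMS"
    printf "%s\t%s\t%s\t", $2, $3, $8 >> out
    getline
    print $NF >> out
}' {} \;

cat "$workdir/site1/KEYWORD_site1.log.ALARMS"
```

The same awk program should also work under -execdir, where FILENAME would be ./name and the output would be written relative to the directory find has changed into.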

