Saturday, November 3, 2012

sed awk join two lines and print nth field from line using awk

Just a series of old-unix-guy tricks that I half-remembered from the last time I had to do this. The result is still bitchin though. My log files have the variable with the bad value on one line, and the value in question on the next line - d'oh! The variable names all have the same prefix, so I realized that I could grep or sed for that. Then all I had to do was find a way to grab the next line and pull the bad value out of it. To make a single-line printout with the time, date, variable name, and value, my command line turned out to be:

sed -n '/PREFIX/{N;s/\n//;p}' 20121022_123632.msg | awk '{print $2 "\t" $3 "\t" $8 "\t" $NF}' > output_file.txt
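
For reference, here's roughly what that does on a made-up two-line pair (toy log format, so the field numbers are different from my real command):

printf 'Oct 22 PREFIX_FOO went bad\n  value: 42\n' | sed -n '/PREFIX/{N;s/\n//;p}' | awk '{print $1 "\t" $3 "\t" $NF}'

which spits out the date field, the variable name, and the value (42 here) on one tab-separated line.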

I remembered that sed's N command exists to read in the next line. The trick of removing the embedded newline with s/\n// came from this page (I had to disregard the otherwise pretty cool-looking looping that the example used):

http://stackoverflow.com/questions/7852132/sed-join-lines-together
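
The N behavior on its own is easy to see with a throwaway example; this joins each pair of input lines (no address needed, just N doing its thing):

printf '1\n2\n3\n4\n' | sed -n 'N;s/\n/ /;p'

which prints "1 2" and "3 4" on two lines.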

The nice thing about using awk is that it splits fields on whitespace by default, no options needed. Here's a related example, which is also where I got that cool $NF trick for printing the last field:

http://www.delorie.com/gnu/docs/gawk/gawk_36.html
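
A quick illustration of both points (the default whitespace splitting plus $NF for the last field):

echo "one   two three" | awk '{print $2, $NF}'

prints "two three" - the run of spaces between fields doesn't matter, and $NF is always the last field no matter how many there are.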

Printing tab characters in the output is helpful since the variable name widths vary. I had been using spaces before, just from trying out a space inside quote marks; what I learned was that to print tab characters, the "\t" still has to be inside quotes. Here's an example (yes, this is essentially the usenet awk manual):

http://www.catonmat.net/blog/wp-content/uploads/2008/09/awk1line.txt
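
For example, the first of these prints an actual tab between the two fields, while the second just falls back to awk's default output separator, a single space:

echo "alpha beta" | awk '{print $1 "\t" $2}'
echo "alpha beta" | awk '{print $1, $2}'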

Here are a couple of examples from early in my search for answers to this problem, none of which I went with but which could be helpful to understand:

Has an example with brackets, carets, and asterisks; no idea how this works: http://www.unix.com/shell-programming-scripting/105070-sed-command-extract-1st-word-every-line.html

Has more brackets, carets, and asterisks, plus some numbers and a nice example of regex construction: http://www.unix.com/shell-programming-scripting/175591-using-sed-extract-word-string.html

Update: I have since continued to work this problem, and have come up with another solution involving only awk. The reason was that I wanted to use my command line with a recursive find, so that I could run it on each file found and deposit the results into a new file in the same directory as the found file, with the same name as the found file but with a new suffix. I could either use xargs or use the -exec feature of find to run the command line on the results. What several hours of trying taught me was that piping between sed and awk wasn't something I could get working under either find -exec or xargs. Eventually, I found a way to do all the line manipulation in awk with a single command line without piping, and then use it with find -exec, which has a much nicer argument expansion capability than xargs. Here's the final result:

find . -name "KEYWORD*.log" -execdir awk '/PREFIX/{printf $2"\t"$3"\t"$8"\t">>"{}.ALARMS";getline;print $NF>>"{}.ALARMS"}' {} \;

The key features here are:
1) a regex for matching, just like in sed
2) the use of printf to print the fields from the first line without a trailing newline, so that the next thing printed (from the next line) lands on the same output line
3) the use of getline to just freaking get the next line and do a print tailored just to that line
4) the use of >> on each print statement to send the output to the file (and the filename *has* to be in quotes, but hahaha, find -exec graciously expands the {} to form the filename with the new suffix right inside the awk script, as well as passing the filename to awk as the final argument)
5) -execdir instead of -exec, which causes the command to be run in the directory where the file was found (and thus the result to be deposited in that same directory)
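
To see just the printf/getline mechanics in isolation, here's a throwaway sketch on a made-up two-line pair (no file output, just the join):

printf 'PREFIX_FOO 10:15 bad\nvalue: 42\n' | awk '/PREFIX/{printf $1"\t"$2"\t";getline;print $NF}'

The output is a single tab-separated line: the variable name, the time, and 42.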

Here's an example of a methodology that I didn't end up going with but which is still cool: it shows saving the current line in an awk variable, using $0 to grab the entire line, and then something with "next" that I don't quite get (it seems like it should be similar to getline but apparently isn't), all to join two lines together. It also hilariously makes fun of using cat when awk accepts an input filename as an argument (Useless Use Of Cat):

http://compgroups.net/comp.unix.shell/need-one-line-awk-or-sed-command-to-condition/1210370
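
As best I can tell, the difference is that "next" finishes with the current line and starts the whole script over on the following line, while getline pulls the following line in right where you are. So the save-a-variable version of my problem would look something like this (untested sketch, made-up filename):

awk '/PREFIX/{saved=$0;next} saved{print saved, $NF; saved=""}' input_file.msg

The first rule stashes the matching line and bails out; the second rule then fires on the very next line and prints the stashed line plus that line's last field.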

Here's another page with more examples of using "next". Some of these examples are really deep and could use much more careful study:

http://www.theunixschool.com/2012/05/awk-join-or-merge-lines-on-finding.html

Here is the example of using getline. What's neat about this forum thread is that somebody suggested the other side of the coin: how to do it in sed! I don't understand the sed syntax suggested, so there's probably something important to learn there:

http://www.linuxquestions.org/questions/programming-9/awk-help-print-matching-line-and-next-three-931966/

Here is the example site that I read for doing output to a file in awk. I'd seen a hint of this on another page where they used ">>":

http://www.unix.com/shell-programming-scripting/55172-awk-print-redirection-variable-file-name.html
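
The basic trick, as I understand it, is that the target of >> in awk can be any string expression, including one built from the built-in FILENAME variable. A minimal sketch (hypothetical input name):

awk '{print $NF >> (FILENAME ".ALARMS")}' input_file.msg

The parentheses around the concatenation keep some awks from misparsing the redirection target.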

Here is a nice sed tutorial that talks about those curly brackets (although it doesn't show them all on one line like my command does). This was part of my search for a way to maybe get sed to send its output to a file that I could later read in with awk (before I got smart and just decided to do everything in awk):

http://www.ibm.com/developerworks/linux/library/l-sed2/index.html
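
Spread across lines the way that tutorial writes things, my one-liner would look something like this (same script, just with the commands on their own lines):

sed -n '/PREFIX/{
N
s/\n//
p
}' 20121022_123632.msg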

And, as part of that search, I ran across this deep page, which, aside from all the many other wizardly tips, clued me in to the secret that you can have multiple -e arguments to sed, and it just treats the result like one sed script:

http://stackoverflow.com/questions/3472404/how-to-go-from-a-multiple-line-sed-command-in-command-line-to-single-line-in-scr
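
The gist being that these two are equivalent, since sed just glues the -e pieces together into one script (made-up substitutions):

sed -e 's/foo/bar/' -e 's/baz/qux/' input_file.txt
sed 's/foo/bar/; s/baz/qux/' input_file.txt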

So, the problem I was having was getting an output filename that I could build with filename expansion based on the original found file. I never got that working in sed (and it was essentially a dead end anyhow) because it wouldn't expand the filename. Here is a site about outputting to a file in sed; its examples are pretty simple though:

http://www.thegeekstuff.com/2009/10/unix-sed-tutorial-how-to-write-to-a-file-using-sed/
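
The basic mechanism is sed's w command, which writes the pattern space out to a named file; something like this (hard-coded output name, which is exactly the limitation I was fighting):

sed -n '/PREFIX/w matches.txt' 20121022_123632.msg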

I also spent a long time searching for a way to either call sed from awk or awk from sed. While it seems to be possible, all of the example sites that I found were actually pretty confused and incomplete.
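
For what it's worth, awk can at least pipe its own print output through an external command, so something along these lines ought to qualify as calling sed from awk (untested sketch, made-up substitution):

awk '/PREFIX/{print | "sed s/bad/BAD/"}' input_file.msg

awk keeps that pipe open until the script ends (or until you close() it), so all the matching lines get fed through the one sed process.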

And while searching for solutions to my problem of trying to get sed-piped-to-awk into my find command, I ran across the following two breathtakingly learned (but basically unhelpful) forum threads:

http://www.linuxquestions.org/questions/linux-software-2/using-a-pipe-inside-find-exec-how-does-it-work-and-why-is-it-so-slow-816525/

http://ubuntuforums.org/showthread.php?t=1436654
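
For the record, the usual workaround I've seen for getting a pipe to work inside find -exec is to wrap the whole pipeline in its own shell, something like this simplified sketch (not my real pipeline, and the quoting gets hairy fast):

find . -name "KEYWORD*.log" -exec sh -c 'sed -n "/PREFIX/p" "$1" | awk "{print \$NF}"' _ {} \;

Here find hands the filename to sh as $1, and the pipe lives entirely inside the sh -c string, so find itself only ever execs one command.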

The Conclusion: After I came up with that bitchin find command and thoroughly tested it on my own system using Cygwin, it bombed when I ran it on the server. That one may be running BSD or something (although I'm not sure, since a man for awk on that system returns the page for gawk). The failure that I got was a syntax error on the expanded result. For some reason, even though I fed the following into my -execdir in find:

awk '/PREFIX/{printf $2"\t"$3"\t"$8"\t">>"{}.ALARMS";getline;print $NF>>"{}.ALARMS"}' {}

The syntax error that I got back would show an extra ./ prepended to the expanded command line being fed to awk:

awk: syntax error in: './/PREFIX/{printf $2"\t"$3"\t"$8"\t">>"input_file.ALARMS";getline;print $NF>>"input_file.ALARMS"}'

The expanded filenames were always correct, and I tried escaping a bunch of stuff to no avail. In the end, getting rid of the {} expansion for the output file and just using a generic hard-coded output filename mysteriously fixed the problem, and actually worked out better anyhow. So I had to go with this:


find . -name "KEYWORD*.log" -execdir awk '/PREFIX/{printf $2"\t"$3"\t"$8"\t">>"ALARMS.txt";getline;print $NF>>"ALARMS.txt"}' {} \;

Oh, and while struggling to figure out this new problem, I came across this lovely site that compares and contrasts using find -exec and find | xargs. I can probably go back and read this top to bottom to better educate myself:

http://www.softpanorama.info/Tools/Find/using_exec_option_and_xargs_in_find.shtml

And here's an adorable little xargs page:

http://offbytwo.com/2011/06/26/things-you-didnt-know-about-xargs.html