Wednesday, December 7, 2011

Stop Cygwin sed adding ^M

So, when I used sed on my files, it would convert Windows ASCII text to Linux ASCII text, changing the CRLF at the end of each line of every file it touched to just ^M, regardless of whether it made the requested substitution. This royally screwed up TortoiseSVN's ability to tell whether a file had really been changed. My search for the answer led me to a lot of interesting discoveries about how Cygwin deals with file types, ways to finesse how it deals with file types (most of which didn't work), and then the answer itself, which turned out to be so simple.

On the Cygwin message boards, all the moderators could manage were snarky comments saying "read the FAQ about text mounts":

http://www.cygwin.com/ml/cygwin/2002-01/msg01432.html

Here's some additional, not entirely clear info suggesting that this happens during the I/O redirection process and not in sed itself:

http://www.cygwin.com/ml/cygwin/2002-01/msg01433.html

The basic concept from the above was that the problem I was having is the standard behavior for Cygwin sed; it's supposed to do this. So, what are text mounts? I never really felt I learned for sure, but it seems to have something to do with how the file system was initially mounted by Cygwin. This may be the FAQ on text modes:

http://www.cygwin.com/cygwin-ug-net/using-textbinary.html

Somewhere I got the idea that Cygwin sed could be forced to output files in Windows format if the input filename is given with a colon or backslash in the path name. However, after beating on xargs to make it do that, the output was still not in Windows format (which backs up the ideas in the second link above).

During this process I got more familiar with using xargs with the -I option from this page:

http://www.mkssoftware.com/docs/man1/xargs.1.asp
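
For anyone who hasn't used -I before, here is a minimal sketch of what it does. The scratch directory and file names are made up for illustration, and the example assumes the paths contain no spaces or newlines:

```shell
# Hypothetical scratch files, created only for this illustration.
workdir=$(mktemp -d)
printf 'alpha\n' > "$workdir/a.txt"
printf 'beta\n' > "$workdir/b.txt"

# -I {} runs the command once per input line and substitutes that line
# for every {} in the command, so the name can be used more than once.
find "$workdir" -name '*.txt' | xargs -I {} cp {} {}.bak

# Count the backup copies that were created.
backups=$(ls "$workdir" | grep -c '\.bak$')
echo "backup files created: $backups"
```

The point of -I is that the file name can be dropped anywhere in the command, not just tacked on the end.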

From this next link I learned two things. First, the -i argument to sed is the "edit in place" option, which made it unnecessary to redirect sed's output manually. Second, sed has a -b argument to specify the file type as binary. The Cygwin man page didn't mention this, as I recall, but it was the answer. It tells sed not to convert the line endings, and voilà, the lines that sed does not change no longer show up as changed:

http://www.gnu.org/software/sed/manual/sed.html
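
Putting the two options together, a minimal sketch of the fix looks something like this. The sample file and the foo/bar substitution are invented for illustration, and the example assumes GNU sed:

```shell
# Hypothetical scratch file with Windows (CRLF) line endings.
workdir=$(mktemp -d)
printf 'foo one\r\nuntouched line\r\n' > "$workdir/sample.txt"

# -b (--binary): open the file in binary mode so line endings are not translated.
# -i: edit the file in place, so no manual redirect of the output is needed.
sed -b -i 's/foo/bar/' "$workdir/sample.txt"

# Count the lines that still end in a carriage return after the edit.
cr=$(printf '\r')
crlf_lines=$(grep -c "${cr}\$" "$workdir/sample.txt")
echo "lines still ending in CRLF: $crlf_lines"
```

One thing to watch for: in binary mode the carriage return stays on the end of each line as far as sed is concerned, so a pattern anchored with $ may need to account for it.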