Tuesday, December 29, 2015

Python split line by tab or multiple spaces

While looking for the best way to pull all the tokens out of a tab- or space-delimited line, no matter how the tokens are separated, I found a great answer and learned a few things along the way.

It turns out that Python has a regular expression module, re. re has a split function, and you don't have to give it tab and space characters separately: \s+ is all you need. \s matches any whitespace character (space, tab, newline and so on), and + matches a run of one or more of them in any combination. To wit:

>>> import re
>>> line = 'one\ttwo\t\tthree four \tfive \t\tsix'
>>> re.split(r'\s+', line)
['one', 'two', 'three', 'four', 'five', 'six']

(source: http://stackoverflow.com/questions/8113782/split-string-on-whitespace-in-python)
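One thing to watch out for: if the line starts or ends with whitespace, re.split() returns empty strings at the ends. Here is a quick sketch comparing it with str.split() called with no arguments, which also splits on runs of whitespace but drops leading and trailing whitespace. The input line is just a made-up example:

```python
import re

# Made-up input with a leading tab and a trailing newline.
line = '\tone two\n'

# re.split() keeps the empty strings produced by a delimiter at either end.
regex_tokens = re.split(r'\s+', line)
print(regex_tokens)  # ['', 'one', 'two', '']

# str.split() with no separator also splits on runs of whitespace,
# but ignores leading and trailing whitespace, so no empty strings appear.
plain_tokens = line.split()
print(plain_tokens)  # ['one', 'two']

# To keep using re.split(), strip the line first.
stripped_tokens = re.split(r'\s+', line.strip())
print(stripped_tokens)  # ['one', 'two']
```

For plain whitespace splitting, line.split() is probably the simplest choice; re.split() earns its keep once you need other delimiters too.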

Also learned: this link shows that | separates alternative delimiters within a single regular expression, so one split can handle several different delimiters at once:
http://stackoverflow.com/questions/4998629/python-split-string-with-multiple-delimiters
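For example, here is a small sketch (the input line is made up) that splits on a comma, a semicolon, a double dash, or any run of whitespace by joining those alternatives with |:

```python
import re

# Made-up input mixing several kinds of delimiters.
line = 'one,two;three--four \tfive'

# Each alternative separated by | acts as a delimiter;
# \s+ still covers any run of spaces and tabs.
tokens = re.split(r',|;|--|\s+', line)
print(tokens)  # ['one', 'two', 'three', 'four', 'five']
```

Two different delimiters in a row, such as a comma followed by a space, still produce an empty string between them, because each alternative matches separately (re.split(r',|;|--|\s+', 'a, b') gives ['a', '', 'b']). For single-character delimiters, a character class such as [,;] does the same job as ,|;.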