Hoopla!

now with extra whiz-bang!

Ruby: Parsing CSV files quickly

August 03, 2006 · 10 comments

I seem to have a limitless stream of Excel files coming at me from clients. Most of them share the same format: the first line holds the column names and the rest of the lines are data.

Ruby has excellent baked-in capabilities for handling files, data, and even CSV stuff. Still, it was pretty hard for me to figure out how to turn a CSV file into an array of hashes where each cell is keyed by its column name.

So here y’are folks: an easy way to turn CSV files into an array of hashes.

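require 'csv'

# Returns an array of hashes, one per data row, keyed by the column names in the first row.
def csv_to_array(file_location)
  csv = CSV.parse(File.read(file_location))
  fields = csv.shift # the first row holds the column names
  csv.collect { |record| Hash[*(0...fields.length).collect { |index| [fields[index], record[index].to_s] }.flatten] }
end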
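
On newer Rubies (1.9 and later) the standard csv library can do the header mapping itself. What follows is only a rough sketch, not the code from the post above: csv_text_to_hashes and the sample data are made-up names for illustration, it takes CSV text rather than a file path, and it assumes Ruby 2.4 or later for transform_values.

```ruby
require 'csv'

# Hypothetical helper (not from the original post): parse CSV text whose
# first line holds the column names into an array of hashes.
def csv_text_to_hashes(text)
  # headers: true tells the parser to treat the first row as column names.
  CSV.parse(text, headers: true).map do |row|
    # Empty cells come back as nil; to_s turns them into "" so every value
    # is a string, like the to_s call in csv_to_array.
    row.to_h.transform_values(&:to_s)
  end
end

# Made-up sample data; the second record has an empty "city" cell.
sample = "name,city\nAda,London\nLinus,\n"
rows = csv_text_to_hashes(sample)
rows.each { |row| puts row.inspect }
```

To read from disk instead, passing File.read(file_location) as the text should behave the same way.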

10 responses so far ↓

  • 1 Simen // Aug 14, 2006 at 04:44 AM

    Have you had a look at FasterCSV [fastercsv.rubyforge.org]?
  • 2 s. potter // Sep 02, 2006 at 08:54 AM

    what makes FasterCSV faster?
  • 3 Jens // Feb 10, 2007 at 03:29 AM

    Hi, this is about 50 times slower than reading the CSV file into an array of arrays. Is this possible or am I missing something obvious? I was using this code before:

    def csvread(filespec)
      out = []
      File.open(filespec, 'r') do |infile|
        while (line = infile.gets)
          out << line.split(',')
        end
      end
      return out
    end

    Using Rails 1.2.1 on OS X 10.4 (Macbook dual 2GHz Core Duo 2). Jens
  • 4 Danger // Feb 17, 2007 at 09:44 AM

    I'll bet your code would be loads faster - it looks pretty good (sorry about the poor comment formatting). When I wrote the CSV => array of hashes I valued the hashing of appropriate data far more than speed. I'll probably use your code when I don't need a hash. Thanks!
  • 5 Caleb Jones // Feb 23, 2007 at 02:59 PM

    The problem with simply splitting on ',' is that some fields in a CSV may contain commas themselves. In this case, the convention is to wrap such items in quotes ("). Simply splitting on ',' will break when this occurs. Not sure if the Ruby CSV class handles this or not (though it would seem silly to write a whole parsing hierarchy for CSV files that failed on something this simple).
  • 6 Danger // Feb 23, 2007 at 04:14 PM

    You're right, this is a very basic implementation of CSV parsing. I imagine FasterCSV does a much better job. I threw this up simply because "ruby csv parse" doesn't return any good hits on Google and I wanted to give folks at least *some* way to make it work.
  • 7 Ara Vartanian // Dec 21, 2007 at 05:31 PM

    Lovely. Bravo. Just what I needed.

  • 8 Mike // May 14, 2008 at 04:31 AM

    What’s up with the comments? :( I can recommend a book on the topic of this post, “Parsing CSV files quickly” in Ruby.

    Check it out: http://www.springerlink.com/content/j3n125411240r200/

  • 9 Neil Murphy // Jun 04, 2008 at 08:40 AM

    The chapter from the book is priced at $25 – for four pages. Calling it a ripoff seems a serious understatement.

  • 10 Scott White // Jun 05, 2008 at 01:59 PM

    I was able to simplify the Hash creation by using the array zip method:

    csv.collect { |record| Hash[*fields.zip(record).flatten ] }

    though it doesn’t turn all the records into strings… but I’m not sure why that’s necessary anyway.
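
Two points from the thread above can be checked with a quick sketch: whether the CSV class copes with quoted commas, and why the to_s call might matter. This is only an illustration under assumptions. The sample line and column names are made up, and it uses CSV.parse_line from the standard csv library rather than anything posted in the comments.

```ruby
require 'csv'

# Made-up sample line: the middle field contains a comma, so it is quoted.
line = 'Smith,"Portland, OR",42'

# Naive splitting cuts the quoted field in two (four pieces instead of three).
naive = line.split(',')

# CSV.parse_line follows the quoting convention and keeps the field whole.
parsed = CSV.parse_line(line)

# An empty cell comes back as nil rather than "", which is one reason to call to_s.
empty = CSV.parse_line('a,,c')

# Hypothetical column names, paired with the cells via zip as in comment 10,
# with to_s applied first so every value is a string.
fields = %w[name city age]
record = Hash[fields.zip(parsed.map(&:to_s))]

puts naive.length    # number of pieces from the naive split
puts parsed.length   # number of fields from the CSV parser
puts record['city']
```

If nil is an acceptable value for empty cells, the map(&:to_s) step can be dropped and the zip version from comment 10 used as is.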
