Discreet Cosine Transform

A few thoughts on ruby, video and other things

Python, Ruby and Dart Part 3: CSV Data

Next I thought I would tackle parsing CSV data in all three languages. What could be more exciting, right? Once again, this was born out of actual need: I was recently crunching some CSV data at work. But I like it as an example (despite both the boring subject matter and the “just look in the standard library” nature of the question) exactly because it’s very real world. I envy the developer who has never been called on to write ETL code, but I bet a lot of you have. It is the kind of annoying task that comes up again and again, at least in my world!

Ruby

So admittedly, in Ruby this is as easy as reaching into the standard library. Way, way back in the day there were gems that offered more features and faster parsing for CSVs than the code in the stdlib, but the Ruby maintainers smartly integrated that code directly into the standard library.

The documentation is straightforward, and you can see the functionality is quite versatile: it supports both reading and writing, and it works with files, file-like IO objects, and strings.

Perhaps most importantly, it correctly handles the first and most troublesome issue you always run into with CSV data: some field contains a comma as data, rather than as a delimiter, and your parsing trips on it. For example:

require 'csv'
list_data = %Q["red","blue","green"\r\n"cyan","blue","magenta, purple"\r\n"1","2","3"]
CSV.parse(list_data) do |row|
  puts row.inspect
end

Will output:

["red", "blue", "green"]
["cyan", "blue", "magenta, purple"]
["1", "2", "3"]

Note how the string "magenta, purple" remains a single string and doesn’t get parsed into a row with 4 fields. Also note that we threw Windows-style line endings at it and it dealt with them correctly, without us having to change the row separator setting.

Python

Things are very similar in Python: you can just reach into the stdlib to parse CSV data. At first glance the Python library is a bit more feature-rich than the Ruby one, offering things like sniffing out the format of a CSV file and reading directly into a dictionary instead of just arrays.
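To make those two features concrete, here is a minimal sketch of format sniffing and dictionary reading. It assumes Python 3 rather than the 2.7.9 used elsewhere in this post, and the semicolon-delimited sample data and the `name` and `shade` column names are made up for illustration.

```python
import csv
import io

# Hypothetical sample data: semicolon-delimited, with a header row.
sample = '"name";"shade"\r\n"sky";"cyan"\r\n"plum";"magenta; purple"\r\n'

# Sniffer guesses the dialect (delimiter, quote character) from a sample.
# Restricting the candidate delimiters makes the guess more predictable.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")

# DictReader uses the first row as keys and yields one mapping per data row.
reader = csv.DictReader(io.StringIO(sample, newline=""), dialect=dialect)
records = [dict(row) for row in reader]

for record in records:
    print(record)
```

Sniffing is a heuristic, so on messy real-world files it is probably worth checking the detected delimiter before trusting the parsed rows.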

Where I got a little stumped, though, is that the 2.7.9 version of the library doesn’t support operating directly on strings. The documentation gives an example of how to achieve this by wrapping the string in a one-item list, but this doesn’t seem to work with line endings embedded in the string. So, unlike in Ruby, you have to split the string into lines first, then parse each line you find:

import csv

list_data = '"red","blue","green"\r\n"cyan","blue","magenta, purple"\r\n"1","2","3"'

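# csv.reader expects an iterable of lines, so split the string first
for line in list_data.splitlines():
    for row in csv.reader([line]):
        print(row)  # the parenthesized form works on both Python 2.7 and 3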

Once you get through that, though, you once again get the correct data; that is, magenta, purple comes out as a single field. Of course, you wouldn’t need such gymnastics if you really were reading from a file. Like Ruby, the library also supports parsing one line at a time instead of having to read all the data into memory first.
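One alternative to splitting the string by hand is to wrap it in a file-like object and let the reader consume it directly. This is a sketch of a different approach from the one above, and it assumes Python 3, where `io.StringIO` accepts an ordinary string.

```python
import csv
import io

list_data = '"red","blue","green"\r\n"cyan","blue","magenta, purple"\r\n"1","2","3"'

# io.StringIO wraps the string in a file-like object; newline="" leaves the
# line endings untouched so the csv module can handle them itself.
rows = list(csv.reader(io.StringIO(list_data, newline="")))

for row in rows:
    print(row)
```

Because the reader sees the whole stream rather than pre-split lines, a quoted field containing a line break would also stay in one piece, which the per-line approach cannot manage.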

Dart

Trying this in Dart is an interesting look at the maturity of the community surrounding Dart. Dart doesn’t have a CSV parser in its standard library. That is not unexpected, as I keep coming back to, given its client-side focus. So, we turn to pub.dartlang.org, which is Dart’s packaging and publishing system.

There are a few options for CSV parsing, so this part of my trial and research really became a “do they work?” review. Note that with dartlang.org, you don’t have the tools you do in Ruby or Python to gauge the maturity of a library, such as download counts or a site like ruby-toolbox.

Several of the libraries I tried did indeed work, but you have to watch out for the output of print fooling you into thinking that it failed the test on magenta, purple.

Here is an example using csv:

import 'package:csv/csv.dart';

void main() {
  final String listData = '"red","blue","green"\r\n"cyan","blue","magenta, purple"\r\n"1","2","3"';
  final decoder = new CsvToListConverter();
  print(decoder.convert(listData)); // Note: the toString on the output makes it look like the test failed, but:
  print(decoder.convert(listData)[1][2]); // shows it actually is a discrete value of 'magenta, purple'
}

This will output:

[[red, blue, green], [cyan, blue, magenta, purple], [1, 2, 3]]
magenta, purple

Here is a complete example using csv_sheet:

import 'package:csv_sheet/csv_sheet.dart';

void main() {
  final String listData = '"red","blue","green"\r\n"cyan","blue","magenta, purple"\r\n"1","2","3"';

  var sheet = new CsvSheet(listData);
  sheet.forEachRow((row) {
    print(row); // the toString method here makes it look like the magenta, purple test failed, but:
    print("Third item is: " + row[3]);
  });
}

This will output:

[red, blue, green]
Third item is: green
[cyan, blue, magenta, purple]
Third item is: magenta, purple
[1, 2, 3]
Third item is: 3

Note, though, that this library appears to have no way to discover the length of a row, so you would have to already know that information in your code. That seems like a shortcoming.

Conclusion

All three languages have options to help you parse CSV data - if they didn’t in this day and age, I guess we would be a little worried. Ruby and Python obviously have some maturity in this area that Dart lacks, but that doesn’t mean you don’t have options in Dart that work well. We can also safely conclude that parsing CSV data is a terrible use of your time and skills, and here is hoping you don’t have to do it often!
