File-IO: First Microbenchmark results

June 6, 2012

The improved File-IO is essentially complete. I have implemented the new File-IO in C++ using Rcpp. The hyperSpec S4 objects are quite complex, but with Rcpp features such as Rcpp::ExpressionVector for expressions and Rcpp::Language for calling R (see the previous post), I can build the call that creates the final S4 object in C++ and return it as an unevaluated call (a LANGSXP) to the calling R function, which evaluates it on return. This keeps copying to a minimum and reduces the general overhead of crossing between R and C++, since no intermediate processing is done in R.

A commonly encountered format is the Kaiser .spc case, in which spectra spread over several files share the same wavelengths. I used the microbenchmark package to compare the performance of the new rcpp.read.Kaiser() function with the existing read.spc.KaiserMap(). The data set consists of 108 files of 7.3 kB each, in the default Thermo/Galactic SPC format.
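To make the shared-wavelength idea concrete, here is a minimal C++ sketch, not the actual rcpp.read.Kaiser() code, of collecting several already-parsed spectra into one matrix: the wavelength vector is stored once, and each file's intensities are appended as one row of a single preallocated buffer. The `Spectrum` and `SpectraMatrix` types and the `stack_same_axis()` function are hypothetical names chosen for illustration, and parsing of the individual files is assumed to have happened already.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory form of one already-parsed single-spectrum file.
struct Spectrum {
    std::vector<double> x;  // wavelength axis
    std::vector<float> y;   // intensities at those wavelengths
};

// Hypothetical result: one shared axis plus a row-major intensity matrix.
struct SpectraMatrix {
    std::vector<double> wavelength;  // stored once for all spectra
    std::vector<float> spc;          // rows * cols values, one row per file
    std::size_t rows = 0;
    std::size_t cols = 0;
};

// Stack spectra that share the same wavelength axis into one matrix.
// Throws std::invalid_argument if any axis differs from the first one.
SpectraMatrix stack_same_axis(const std::vector<Spectrum>& files) {
    SpectraMatrix out;
    if (files.empty()) return out;

    out.wavelength = files.front().x;
    out.cols = out.wavelength.size();
    out.rows = files.size();
    out.spc.reserve(out.rows * out.cols);  // one allocation for the whole matrix

    for (const Spectrum& s : files) {
        if (s.x != out.wavelength || s.y.size() != out.cols) {
            throw std::invalid_argument("spectra do not share the same wavelength axis");
        }
        out.spc.insert(out.spc.end(), s.y.begin(), s.y.end());  // append one row
    }
    assert(out.spc.size() == out.rows * out.cols);  // every row contributed cols values
    return out;
}
```

Storing the axis once and reserving the whole matrix up front avoids one allocation per file. R matrices are column-major, so filling an R matrix directly would index by column rather than appending rows.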

The table below summarizes the microbenchmark results from 500 calls to each function. NewIO is the new function rcpp.read.Kaiser(), and OriginalIO is the original read.spc.KaiserMap().

Unit: milliseconds

function          min         lq     median         uq        max
NewIO        14.75752   16.37736   17.94751   37.71903   65.11417
OriginalIO  570.04723  579.44523  582.22526  613.60385  846.42801

(lq and uq are the lower and upper quartiles of the 500 timings.)

Here is the corresponding box plot:

[Figure: box plot of the microbenchmark timings for NewIO and OriginalIO]

Clearly, this is a substantial improvement: at the median, the new function is about 32 times faster (roughly 18 ms versus 582 ms per call). While R's readBin() function is very efficient, it does not match the ability of C/C++ to read structured binary data directly into memory in a single linear pass. Furthermore, Rcpp makes it possible to assemble the data into complex S4 R objects from C++, providing an efficient and elegant solution to File-IO.
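As an illustration of reading structured binary data directly into memory, the C++ sketch below copies header fields from fixed byte offsets of a buffer with std::memcpy, then copies all intensity values in one bulk operation. The header layout (`SimpleHeader`, `kHeaderSize` and the field offsets) is hypothetical and only loosely modelled on the start of an SPC header; it is not the full Thermo/Galactic layout, and real files also contain subfile headers and may store scaled integers instead of floats. The sketch also assumes that the data and the host share the same byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// Hypothetical, simplified header; NOT the full 512-byte SPC header.
struct SimpleHeader {
    std::uint8_t flags;
    std::uint8_t version;
    std::int32_t npoints;
    double first_x;
    double last_x;
};

constexpr std::size_t kHeaderSize = 24;  // hypothetical: y data start right after the header

// Copy a trivially copyable value out of the buffer at a fixed byte offset.
// Assumes the data and the host share the same byte order.
template <typename T>
T read_at(const std::vector<unsigned char>& buf, std::size_t offset) {
    if (offset + sizeof(T) > buf.size()) {
        throw std::out_of_range("buffer too short for field");
    }
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    return value;
}

// Decode the hypothetical header fields at their fixed offsets.
SimpleHeader parse_header(const std::vector<unsigned char>& buf) {
    SimpleHeader h;
    h.flags = read_at<std::uint8_t>(buf, 0);
    h.version = read_at<std::uint8_t>(buf, 1);
    h.npoints = read_at<std::int32_t>(buf, 4);
    h.first_x = read_at<double>(buf, 8);
    h.last_x = read_at<double>(buf, 16);
    return h;
}

// Copy all npoints 32-bit float intensities with one bulk memcpy.
std::vector<float> read_y(const std::vector<unsigned char>& buf, const SimpleHeader& h) {
    if (h.npoints < 0) {
        throw std::runtime_error("negative point count");
    }
    const std::size_t n = static_cast<std::size_t>(h.npoints);
    if (kHeaderSize + n * sizeof(float) > buf.size()) {
        throw std::out_of_range("buffer too short for y data");
    }
    std::vector<float> y(n);
    if (n > 0) {
        std::memcpy(y.data(), buf.data() + kHeaderSize, n * sizeof(float));
    }
    return y;
}
```

In practice the buffer would come from reading a whole file at once, for example with a std::ifstream opened in binary mode. Copying with memcpy rather than casting the buffer to a struct pointer avoids alignment and strict-aliasing problems.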
