SparkR gapply mess - 2017-05-12 08:56:31

Hello,

Do not assume anything. Never. Ever. Especially with SparkR (Apache Spark 2.1.0). When using gapply (here through gapplyCollect), you may want to return the grouping key so that each result row is labelled with its group, using a function like this:

    countRows <- function(key, values) {
      df <- data.frame(key = key, nvalues = nrow(values))
      return(df)
    }

    count <- gapplyCollect(data, "keyAttribute", countRows)

SURPRISE. You can't. You should get this error:

On R and parallelism - 2016-03-24 01:20:01

R is the language that has gained momentum as more and more people discover the need to analyze data. There are several other alternatives, but this is my poison (or poisson!) of choice. In this post we will try to cover how to parallelize your R code with the parallel package. Why bother? One of my main concerns when I was starting with R was: “WOW! Everything runs in one thread!”
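To make the single-thread point concrete, here is a minimal sketch of the general idea, not necessarily the approach this post goes on to take. It assumes a hypothetical toy function, slow_square, standing in for real per-item work, and compares base R's lapply with parLapply from the parallel package that ships with R. It uses a socket cluster (makeCluster) rather than the fork-based mclapply so that it also runs on Windows.

```r
library(parallel)

# Hypothetical toy workload: sleeps briefly to mimic an expensive computation
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

inputs <- 1:8

# Use all but one core; detectCores() can return NA, so fall back to 1
n_cores <- max(1L, detectCores() - 1L, na.rm = TRUE)

# Sequential baseline: lapply evaluates every element in this one R process
seq_time <- system.time(seq_result <- lapply(inputs, slow_square))

# Parallel version: start worker R processes, split the inputs among them,
# then always shut the workers down when finished
cl <- makeCluster(n_cores)
par_time <- system.time(par_result <- parLapply(cl, inputs, slow_square))
stopCluster(cl)

# Same results either way; only the wall-clock time should differ
print(identical(seq_result, par_result))
print(rbind(sequential = seq_time[["elapsed"]], parallel = par_time[["elapsed"]]))
```

On a machine with several cores the parallel elapsed time should drop roughly in proportion to n_cores, minus the overhead of starting the workers and shipping data to them. For very cheap functions that overhead can make the parallel version slower.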