Posts Tagged ‘R’

Getting started with seplyr

December 14, 2017

A big “thank you!!!” to Microsoft for hosting our new introduction to seplyr. If you are working with R and big data, I think the seplyr package can be a valuable tool. For how and why, please check out our new introductory article.

nrow, references and copies

December 10, 2017

Hi all, this post deals with a strange phenomenon in R that I have noticed while working on unbiased MCMC. Reducing the problem to a simple form, consider the following code, which iteratively samples a vector ‘x’ and stores it in a row of a large matrix called ‘chain’ (I’ve kept the MCMC […]

How to Avoid the dplyr Dependency Driven Result Corruption

December 6, 2017

In our last article we pointed out a dangerous silent result corruption we have seen when using the R dplyr package with databases. To systematically avoid this result corruption we suggest breaking up your dplyr::mutate() statements to be dependency-free (not assigning the same value twice, and not using any value in the same mutate it …

Please inspect your dplyr+database code

December 2, 2017

A note to users of dplyr with databases: you may benefit from inspecting and refactoring your code to eliminate value re-use inside dplyr::mutate() statements. If you are using the R dplyr package with a database or with Apache Spark, I respectfully advise you to inspect your code to ensure you are not using any values created inside a dplyr::mutate() …

Win-Vector LLC announces new “big data in R” tools

November 29, 2017

Win-Vector LLC is proud to introduce two important new tool families (with documentation) in the 0.5.0 version of seplyr (also now available on CRAN). partition_mutate_se() / partition_mutate_qt(): these are query planners/optimizers that work over dplyr::mutate() assignments. When using big-data systems through R (such as PostgreSQL or Apache Spark) these planners can make your code faster …

Vectorized Block ifelse in R

November 28, 2017

Win-Vector LLC has been working on porting some significant large-scale production systems from SAS to R. From this experience we want to share how to simulate, in R with Apache Spark (via sparklyr), a nifty SAS feature: the vectorized “block if(){}else{}” structure. When porting code from one language to another you hope the expressive …

sliced Wasserstein estimation of mixtures

November 27, 2017

A paper by Soheil Kolouri and co-authors was arXived last week about using Wasserstein distance for inference on multivariate Gaussian mixtures. The basic concept is that the parameter is estimated by minimising the p-Wasserstein distance to the empirical distribution, smoothed by a Normal kernel. As the general Wasserstein distance is quite costly to compute, the […]

Arbitrary Data Transforms Using cdata

November 22, 2017

We have been writing a lot on higher-order data transforms lately: “Coordinatized Data: A Fluid Data Specification”, “Data Wrangling at Scale”, “Fluid Data”, and “Big Data Transforms”. What I want to do now is "write a bit more, so I finally feel I have been concise." The cdata R package supplies general data transform operators. The …

Le Monde puzzle [#1029]

November 21, 2017

A convoluted counting Le Monde mathematical puzzle: A film theatre has a waiting room and several projection rooms, with four films on display. A first set of 600 spectators enters the waiting room and votes for their favourite film. The most popular film is projected to the spectators who voted for it and the remaining […]

βCEA

November 21, 2017

Recently, I've been doing a lot of work on the beta version of BCEA (I was after all born in Agrigento, in the picture to the left, which is a Greek city, so a beta version sounds about right...). The new version is only available as a...
