Posts Tagged ‘statistics’

A big problem in our community

December 14, 2017

Hi all, Kristian Lum, who was already one of my Statistics superheroes for her many interesting papers and great talks, bravely wrote the following text about her experience as a young statistician going to conferences: https://medium.com/@kristianlum/statistics-we-have-a-problem-304638dc5de5 I can’t thank Kristian enough for speaking out. Her experience is both shocking and hardly surprising. Many, many academics […]

Read more »

Getting started with seplyr

December 14, 2017

A big “thank you!!!” to Microsoft for hosting our new introduction to seplyr. If you are working with R and big data, I think the seplyr package can be a valuable tool. For how and why, please check out our new introductory article.

Read more »

How can a statistician help a lawyer?

December 9, 2017

I’ll be presenting at a webinar on Wednesday, December 13 at 1:00 PM Eastern. The title of the presentation is “Seven questions a statistician can answer for an attorney.” I will discuss, among other things, when common sense applies and when correct analysis can be counter-intuitive. There will be ample time at the end of […]

Read more »

The myth of interpretability of econometric models

December 9, 2017

There are important discussions nowadays about how to choose, in data modeling, between the “two cultures” (as described in Breiman (2001)), i.e. either econometric models or machine/statistical learning models. We discussed this issue recently in Econométrie et Machine Learning (so far only in French) with Emmanuel Flachaire and Antoine Ly. One argument often used by econometricians is the interpretability of econometric models. Or at least the attempt to get an interpretable…

Read more »

How to Avoid the dplyr Dependency Driven Result Corruption

December 6, 2017

In our last article we pointed out a dangerous silent result corruption we have seen when using the R dplyr package with databases. To systematically avoid this result corruption we suggest breaking up your dplyr::mutate() statements to be dependency-free (not assigning the same value twice, and not using any value in the same mutate it …
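The dependency-free rule can be sketched in a few lines. This is a language-neutral illustration in Python, not seplyr's or dplyr's implementation: the function name `partition_assignments` and the example column names are hypothetical, and the greedy split shown here is just one simple way to satisfy the rule. Each stage would correspond to one separate dplyr::mutate() call.

```python
def partition_assignments(assignments):
    """Split ordered (target, used_names) pairs into dependency-free stages.

    Within a stage, no target is assigned twice and no assignment reads
    a target that was assigned earlier in the same stage.
    """
    stages = []
    current = []      # targets assigned in the stage being built
    assigned = set()  # same targets, as a set for fast lookup
    for target, used in assignments:
        conflict = target in assigned or any(u in assigned for u in used)
        if conflict and current:
            # Close the current stage and start a new one.
            stages.append(current)
            current, assigned = [], set()
        current.append(target)
        assigned.add(target)
    if current:
        stages.append(current)
    return stages


# Hypothetical assignments: b uses a, c uses b, d only uses the input column x.
example = [("a", {"x"}), ("b", {"a"}), ("c", {"b"}), ("d", {"x"})]
stages = partition_assignments(example)
print(stages)
```

Here `a`, `b`, and `c` each depend on the previous result, so they land in separate stages, while `d` can share a stage with `c`. A real planner may pack stages more tightly by reordering independent assignments; this sketch keeps the original order.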

Read more »

Please inspect your dplyr+database code

December 2, 2017

A note to dplyr with database users: you may benefit from inspecting/re-factoring your code to eliminate value re-use inside dplyr::mutate() statements. If you are using the R dplyr package with a database or with Apache Spark: I respectfully advise you inspect your code to ensure you are not using any values created inside a dplyr::mutate() …

Read more »

Win-Vector LLC announces new “big data in R” tools

November 29, 2017

Win-Vector LLC is proud to introduce two important new tool families (with documentation) in the 0.5.0 version of seplyr (also now available on CRAN): partition_mutate_se() / partition_mutate_qt(): these are query planners/optimizers that work over dplyr::mutate() assignments. When using big-data systems through R (such as PostgreSQL or Apache Spark) these planners can make your code faster …

Read more »

Vectorized Block ifelse in R

November 28, 2017

Win-Vector LLC has been working on porting some significant large-scale production systems from SAS to R. From this experience we want to share how to simulate, in R with Apache Spark (via sparklyr), a nifty SAS feature: the vectorized “block if(){}else{}” structure. When porting code from one language to another you hope the expressive …

Read more »

The Conversion of Subjective Bayesian, Colin Howson, & the problem of old evidence (i)

November 28, 2017

“The subjective Bayesian theory as developed, for example, by Savage … cannot solve the deceptively simple but actually intractable old evidence problem, whence as a foundation for a logic of confirmation at any rate, it must be accounted a failure.” (Howson, (2017), p. 674) What? Did the “old evidence” problem cause Colin Howson to recently […]

Read more »

sliced Wasserstein estimation of mixtures

November 27, 2017

A paper by Soheil Kolouri and co-authors was arXived last week about using Wasserstein distance for inference on multivariate Gaussian mixtures. The basic concept is that the parameter is estimated by minimising the p-Wasserstein distance to the empirical distribution, smoothed by a Normal kernel. As the general Wasserstein distance is quite costly to compute, the […]
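As a rough illustration of the "sliced" idea in the title, here is a minimal Python sketch of a sliced p-Wasserstein distance between two samples: project both samples onto random directions, compute the cheap one-dimensional Wasserstein distance on each projection (which only needs a sort), and average. This is a generic sketch under stated assumptions, not the estimator from the paper: it assumes equal sample sizes, uses no kernel smoothing, and does no mixture fitting. The function names and defaults (`n_projections=200`, `seed=0`) are hypothetical.

```python
import math
import random


def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(xs), sorted(ys)
    # In one dimension the optimal coupling matches sorted values.
    cost = sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)
    return cost ** (1.0 / p)


def random_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere in `dim` dimensions."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0:
            return [c / norm for c in v]


def sliced_wasserstein(X, Y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance."""
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        # Project every point onto the random direction theta.
        px = [sum(t * c for t, c in zip(theta, x)) for x in X]
        py = [sum(t * c for t, c in zip(theta, y)) for y in Y]
        total += wasserstein_1d(px, py, p) ** p
    return (total / n_projections) ** (1.0 / p)


# Hypothetical data: a 2-D Gaussian sample and a copy shifted by 3 along the first axis.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
Y = [[x[0] + 3.0, x[1]] for x in X]
distance = sliced_wasserstein(X, Y)
```

An estimator in this spirit would then minimise such a distance over the mixture parameters, with each one-dimensional slice costing only a sort.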

Read more »
