Posts Tagged ‘Tutorials’

Video Tutorial – Rolling 2 Dice: An Intuitive Explanation of The Central Limit Theorem

According to the central limit theorem, if $n$ random variables, $X_1, X_2, \dots, X_n$, are independent and identically distributed and $n$ is sufficiently large, then the distribution of their sample mean, $\bar{X}_n$, is approximately normal, and this approximation improves as $n$ increases. One of the most remarkable aspects of the central limit theorem (CLT) is its validity for any parent distribution of […]
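
The two-dice idea in the title can be checked by brute force. The sketch below is not taken from the video; it is a minimal illustration, assuming fair six-sided dice, and the function names `total_counts` and `total_probabilities` are made up for this example. It enumerates every equally likely roll and tabulates the exact distribution of the total (the sample mean is just the total divided by the number of dice, so both have the same shape).

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def total_counts(n_dice, sides=6):
    """Count how many equally likely outcomes give each total of n_dice fair dice."""
    faces = range(1, sides + 1)
    return Counter(sum(roll) for roll in product(faces, repeat=n_dice))


def total_probabilities(n_dice, sides=6):
    """Exact probability of each total, out of sides**n_dice equally likely outcomes."""
    counts = total_counts(n_dice, sides)
    n_outcomes = sides ** n_dice
    return {total: Fraction(count, n_outcomes) for total, count in sorted(counts.items())}


one_die = total_probabilities(1)
two_dice = total_probabilities(2)

# Text histogram for two dice: one '#' per outcome out of 36.
for total, count in sorted(total_counts(2).items()):
    print(f"{total:2d} {'#' * count} {two_dice[total]}")
```

A single die gives a flat distribution, while two dice already give a triangle peaked at 7. Calling `total_probabilities` with a larger `n_dice` shows the shape moving closer to a bell curve.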

Side-by-Side Box Plots with Patterns From Data Sets Stacked by reshape2 and melt() in R

Introduction: A while ago, one of my co-workers asked me to group box plots by plotting them side-by-side within each group. He wanted to use patterns rather than colours to distinguish between the box plots within a group, because the publication that will display his plots prints in black and white only. I gladly investigated how to […]

Video Tutorial – The Hazard Function is the Probability Density Function Divided by the Survival Function

In an earlier video, I introduced the definition of the hazard function and broke it down into its mathematical components. Recall that the definition of the hazard function for events defined on a continuous time scale is $h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$. Did you know that the hazard function can be expressed as the probability density function (PDF) divided by the […]
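
The identity in the title can be sanity-checked numerically. The sketch below is not from the video; it assumes a Weibull-distributed event time purely as an example, and the function names and parameter values are hypothetical. It compares the PDF divided by the survival function against a finite-difference version of the limit definition, P(t <= T < t + dt | T >= t) / dt, for a small dt.

```python
import math


def weibull_pdf(t, shape, scale):
    """Probability density f(t) of a Weibull(shape, scale) event time."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_survival(t, shape, scale):
    """Survival function S(t) = P(T > t)."""
    return math.exp(-((t / scale) ** shape))


def hazard_from_ratio(t, shape, scale):
    """Hazard computed as f(t) / S(t)."""
    return weibull_pdf(t, shape, scale) / weibull_survival(t, shape, scale)


def hazard_from_limit(t, shape, scale, dt=1e-6):
    """Finite-difference approximation of P(t <= T < t + dt | T >= t) / dt."""
    s_now = weibull_survival(t, shape, scale)
    s_next = weibull_survival(t + dt, shape, scale)
    return (s_now - s_next) / (s_now * dt)


# Example values chosen only for illustration.
t, shape, scale = 2.0, 1.5, 3.0
closed_form = (shape / scale) * (t / scale) ** (shape - 1)  # known Weibull hazard

print(hazard_from_ratio(t, shape, scale))
print(hazard_from_limit(t, shape, scale))
print(closed_form)
```

All three printed values should agree to several decimal places, with the finite-difference version differing only by an error on the order of `dt`.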

Software Carpentry at UVA, Redux

March 12, 2014

Software Carpentry is an international collaboration, backed by Mozilla and the Sloan Foundation, of volunteers who teach computational competence and basic programming skills to scientists. In addition to a suite of online lessons, ...

Less wordy R

March 11, 2014

The Swarm Lab presents a nice comparison of R and Python code for a simple (read ‘one could do it in Excel’) problem. The example works, but I was surprised by how wordy the R code was and decided to check if one could easily produce a shorter version. The beginning is pretty much the […]

Can a classifier that never says “yes” be useful?

March 8, 2014

Many data science projects and presentations are needlessly derailed because shared, business-relevant, quantitative expectations were not set early on (for some advice see Setting expectations in data science projects). One of the most common issues is the layman's expectation of “perfect prediction” from classification projects. It is important to set expectations correctly so […]

Useful Functions in R for Manipulating Text Data

Introduction: In my current job, I study HIV at the genetic and biochemical levels. Thus, I often work with data involving the sequences of nucleotides or amino acids of various patient samples of HIV, and this type of work involves a lot of manipulating text. (Strictly speaking, I analyze sequences of nucleotides from DNA that are reverse-transcribed from […]

Bad Bayes: an example of why you need hold-out testing

February 1, 2014

We demonstrate a dataset that causes many good machine learning algorithms to overfit horribly. The example is designed to imitate a common situation found in predictive analytic natural language processing. In this type of application you are often building a model using many rare text features. The rare text features are often nearly unique k-grams […]

Video Tutorial: Breaking Down the Definition of the Hazard Function

The hazard function is a fundamental quantity in survival analysis. For an event occurring at some time on a continuous time scale, the hazard function, $h(t)$, for that event is defined as $h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$, where $t$ is the time and $T$ is the time of the occurrence of the event. However, what does this actually mean? In this YouTube […]

Coursera Specializations: Data Science, Systems Biology, Python Programming

January 22, 2014

I first mentioned Coursera about a year ago, when I hired a new analyst in my core. This new hire came in as a very competent Python programmer with a molecular biology and microbial ecology background, but with very little experience in statistics. I ...
