Homework 10: in which we refine our web-crawler from the previous assignment, by way of further working with regular expressions, and improving our estimates of page-rank. (This assignment ripped off from Vince Vu, with permission.) Introduction...

(My notes for this lecture are too incomplete to be worth typing up, so here's the sketch.) Methods, Models, Simulations Statistical methods try t...

(My notes from this lecture are too fragmentary to post; here's the sketch.) What should you remember from this class? Not: my mistakes (though remember that I made them). Not: specific packages and ways of doing things (those will change). Not: t...

This was not one of my better performances as a teacher. I felt disorganized and unmotivated, which is a bit perverse, since it's the third time I've taught the class, and I know the material very well by now. The labs were too long, and my attempts...

(My notes for this lecture are too fragmentary to write up properly; here's the sketch.) Two forms of statistical uncertainty: (I) How much would our answers change if the data were different? (II) How diverse are the answers which don't make use hat...

Lecture 18: Deterministic, Unconstrained Optimization. The trade-off of approximation versus time. Newton's method: motivation from Taylor expansion; as gradient descent with adaptive step-size; pros and cons. Coordinate descent instead of multivar...
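
A minimal sketch of the Newton step described in this lecture summary, assuming a smooth one-dimensional objective. The function name `newton_minimize` and the example objective are hypothetical and are not taken from the lecture. Minimizing the second-order Taylor expansion around the current point gives the update x - f'(x)/f''(x), which can also be read as gradient descent with step size 1/f''(x).

```python
import math

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method for a smooth one-dimensional objective.

    Each step minimizes the second-order Taylor expansion around x,
    which gives the update x - f'(x) / f''(x).
    """
    x = x0
    for _ in range(max_iter):
        g, h = grad(x), hess(x)
        if h <= 0:
            # The quadratic approximation has no minimum here.
            raise ValueError("non-positive curvature at x = %r" % x)
        step = g / h
        x -= step
        if abs(step) < tol:
            break
    return x

# Hypothetical objective: f(x) = exp(x) - 2x, minimized at x = log(2).
x_star = newton_minimize(grad=lambda x: math.exp(x) - 2.0,
                         hess=lambda x: math.exp(x),
                         x0=0.0)
```

The curvature check illustrates one of the usual drawbacks: the method needs second derivatives, and the step is only sensible where the curvature is positive.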

Lecture 19: Stochastic, Constrained, and Penalized Optimization. Constrained optimization: maximizing multinomial likelihood as an example of why constraints matter. The method of Lagrange multipliers for equality constraints. Lagrange multipliers...
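
A worked sketch of the multinomial example, under assumed notation that does not appear in the lecture summary: counts \(n_i\) in \(k\) categories, total \(n = \sum_i n_i\), and probabilities \(p_i\). Without the constraint \(\sum_i p_i = 1\), the log-likelihood \(\sum_i n_i \log p_i\) grows without bound as the \(p_i\) increase, which is why the constraint matters. With a Lagrange multiplier \(\lambda\) for the equality constraint:

\[ \mathcal{L}(p, \lambda) = \sum_{i=1}^{k} n_i \log p_i - \lambda \left( \sum_{i=1}^{k} p_i - 1 \right) \]

\[ \frac{\partial \mathcal{L}}{\partial p_i} = \frac{n_i}{p_i} - \lambda = 0 \quad \Longrightarrow \quad p_i = \frac{n_i}{\lambda} \]

\[ \sum_{i=1}^{k} p_i = 1 \quad \Longrightarrow \quad \lambda = \sum_{i=1}^{k} n_i = n, \qquad \hat{p}_i = \frac{n_i}{n} \]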

Lecture 20: Text as data. Overview of the character data type, and of strings. Basic string operations: extracting and replacing substrings; splitting strings into character vectors; assembling character vectors into strings; tabulating counts of s...

Lecture 21: Regular expressions. Why we need ways of describing patterns of strings, and not just specific strings. The syntax and semantics of regular expressions: constants, concatenation, alternation, repetition. Back-references and capture group...
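
A small illustration of the constructs this lecture summary lists (constants, concatenation, alternation, repetition, capture groups, back-references), written with Python's standard `re` module rather than whatever tools the course itself used. The sample text and pattern names are made up for the example.

```python
import re

text = "Two cats and a dog chased the the ball ball."

# Alternation (cat|dog) concatenated with an optional "s" (repetition: zero or one).
animal = re.compile(r"\b(?:cat|dog)s?\b")
animals = animal.findall(text)

# Capture group plus back-reference: a word immediately followed by itself.
doubled = re.compile(r"\b(\w+) \1\b")
repeats = doubled.findall(text)

# Replace each doubled word by a single copy, reusing the captured group.
cleaned = doubled.sub(r"\1", text)
```

Here `animals` holds the matched animal words, `repeats` holds the words that were doubled, and `cleaned` is the sentence with the doubling removed.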

Lecture 22: Importing data from webpages. Example: scraping weblinks. Using regular expressions again (with multiple capture groups). Building networks of political books. Introduction to Statistical Computing
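
A rough sketch of link scraping with multiple capture groups, assuming simple, well-formed anchor tags. The HTML snippet, the example.com URLs, and the pattern are hypothetical, and a regular expression like this will miss links written in other styles (single quotes, extra attributes before `href`, nested tags).

```python
import re

# Hypothetical page fragment; a real crawler would download this text first.
html = '''<p>See <a href="http://example.com/a">Page A</a> and
<a href="http://example.com/b" class="ext">Page B</a>.</p>'''

# Group 1 captures the URL, group 2 captures the visible link text.
link = re.compile(r'<a\s+href="([^"]+)"[^>]*>([^<]*)</a>', re.IGNORECASE)

# With two capture groups, findall returns a list of (url, text) pairs.
links = link.findall(html)
urls = [url for url, _ in links]
```

The list of (source page, target URL) pairs collected this way is the kind of edge list from which a network of pages can be built.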

(My notes for this lecture are too fragmentary to post. What follows is the sketch.) The "raw data" is often not in the format most useful for the model one wants to work with. Lots of statistical computing work is about moving the information from...

Lecture 25: The idea of a relational database. Tables, fields, keys, normalization. Server-client model. Example of working with a database server. Intro to SQL, especially SELECT. Aggregation in databases is like split/apply/combine. Joining tables...
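
A minimal sketch of SELECT with a join and grouped aggregation, using Python's built-in `sqlite3` module and an in-memory database instead of the database server the lecture worked with. The tables, fields, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two normalized tables linked by the key author_id (hypothetical schema).
cur.execute("CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, "
            "author_id INTEGER, views INTEGER)")
cur.executemany("INSERT INTO authors VALUES (?, ?)",
                [(1, "Ada"), (2, "Grace")])
cur.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                [(1, 1, 100), (2, 1, 50), (3, 2, 70)])

# GROUP BY splits rows by author, the aggregates are applied per group,
# and the result set combines one row per group.
rows = cur.execute("""
    SELECT a.name, COUNT(*) AS n_posts, SUM(p.views) AS total_views
    FROM posts AS p
    JOIN authors AS a ON p.author_id = a.author_id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()
conn.close()
```

Each tuple in `rows` is (author name, number of posts, total views), the same shape a split/apply/combine step would produce.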

(My notes from this lecture are too fragmentary to type up; here's the sketch) Programmer time is (usually) much more valuable than computer time; therefore, "premature optimization is the root of all evil". That said, computer time isn't free, ...

Attention conservation notice: Navel-gazing. Paper manuscripts completed: 4. Papers accepted: 3. Papers rejected: 4 (fools! we'll show you all!). Papers in revise-and-resubmit purgatory: 2. Papers in refereeing limbo: 1. Papers with co-authors waitin...

There’s lots of overlap but I put each paper into only one category. Also, I’ve included work that has been published in 2013 as well as work that has been completed this year and might appear in 2014 or later. So you can think of this list as representing roughly two years’ work. Political […] The post 2013 appeared first on Statistical Modeling, Causal Inference, and Social Science.

I teach several courses every year and the most difficult to pull off is FORE224/STAT202: regression modeling. The academic promotion application form in my university includes a section on one’s ‘teaching philosophy’. I struggle with that part because I suspect I lack anything as grandiose as a philosophy when teaching: as most university lecturers I […]

I often need to build a predictive model that estimates rates. The example of our age is ad click-through rates (how often a viewer clicks on an ad, estimated as a function of the features of the ad and the viewer). Another timely example is estimating default rates of mortgages or credit cards. You […] Related posts: What does a generalized linear model do? The equivalence of logistic regression…

Etienne LeBel writes: You’ve probably already seen it, but I thought you could have a lot of fun with this one!! The article, with the admirably clear title given above, is by James McNulty, Michael Olson, Andrea Meltzer, Matthew Shaffer, and begins as follows: For decades, social psychological theories have posited that the automatic processes […] The post “Though They May Be Unaware, Newlyweds Implicitly Know Whether Their Marriage Will Be…

Just to elaborate on our post from last month (“I’m negative on the expression ‘false positives’”), here’s a recent exchange we had regarding the relevance of yes/no decisions in summarizing statistical inferences about scientific questions. Shravan wrote: Isn’t it true that I am already done if P(theta>0) is much larger than P(theta […] The post No on Yes/No decisions appeared first on Statistical Modeling, Causal Inference, and Social Science.

Posts by page views: Interview with a forced convert to R from Matlab; A first step towards R from spreadsheets; Plot ranges of data in R; A statistical review of ‘Thinking, Fast and Slow’ by Daniel Kahneman; The 3 dots construct in R; Translating between R and SQL: the basics; An R debugging example; R […] The post Blog recap of 2013 appeared first on Burns Statistics.