You are currently browsing the monthly archive for January 2012.
Terry Speed argues against the temptation to avoid looking hard at larger data sets. Seek…and find answers!
Do some people have a problem looking at large data sets, and if so, why? I think the answer is yes, some do, and I offer a few possible reasons. One is that large data sets are frequently produced by complex, multi-step processes involving technologies that can be a challenge to understand. As a result, like the Little Prince ("Quand le mystère est trop impressionnant, on n’ose pas désobéir": when a mystery is too overpowering, one dare not disobey), people take such data at face value. Another possibility is a blind faith in numbers, a feeling that if there is a lot of data, the answer that falls out must be overwhelmingly more probable than any of the alternatives, and that no artifact will change the conclusions. My third reason is that we all need to think harder, because simply repeating what we used to do with 10 variables is not an option when we have 10,000 variables. A change in perspective is required. Rather than looking at all our data, doing some analyses, and finishing off with further looks, with large data sets the first step is reduced, so we need a much more thorough third step. That is, our focus needs to be more on looking for things that might change our conclusions, not things that support (or fail to support) our assumptions. Also, we may be unsure what to do if we see problems. Or perhaps there is now so much data that no single set seems to warrant as careful consideration as it might have in the past, before we move on.
Chapman & Hall/CRC Monographs on Statistics & Applied Probability
Great Great Great Great Great Great Great Great Great Great!
1, Complete notes of Stanford machine learning course
2, Harmonic means, again again
6, Why ICML? and the summer conferences
7, Sparse Nonparametric Graphical Models
8, Is Bayes Posterior just Quick and Dirty Confidence?
10, Discussions on compressive sensing
11, Top 20 R posts of 2011 (and some R-bloggers statistics)
12, Top ten algorithms preprints of 2011
13, My favorite posts from 2011
14, Music 2011
16, Holiday Readings
17, Some useful extensions for Gmail
18, How to Become an Efficient and Collaborative R Programmer
The WordPress.com stats helper monkeys prepared a 2011 annual report for this blog.
Here’s an excerpt:
The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 8,700 times in 2011. If it were a concert at Sydney Opera House, it would take about 3 sold-out performances for that many people to see it.