Terence's Stuff: Looking

Terry Speed argues against the temptation to avoid looking hard at larger data sets. Seek…and find answers!

Do some people have a problem looking
at large data sets, and if so, why? I think
the answer is yes, some do, and I offer a
few possible reasons. One is that large data
sets are frequently produced by complex,
multi-step processes, involving technologies
that can be a challenge to understand. As
a result, like the Little Prince ("When a mystery is too overpowering, one dare not disobey"), people take such data at face
value. Another possibility is a blind faith
in numbers, a feeling that if there is a lot
of data, the answer that falls out must be
overwhelmingly more probable than any
of the alternatives, and that no artifact will
change the conclusions. My third reason
is that we all need to think harder, because
simply repeating what we used to do with
10 variables is not an option when we have
10,000 variables. A change in perspective
is required. Rather than looking at all our
data, doing some analyses and finishing
off with further looks, with large data sets
the first step is necessarily reduced, so we need a much
more thorough third step. That is, our focus
needs to be more on looking for things
that might change our conclusions, not
things that support (or fail to support) our
assumptions. Also, we may be unsure what
to do if we do see problems. Or perhaps there is
now so much data that no single set seems
to warrant the careful consideration it might
have received in the past before we move on.
