This post is for JSM2013. I will put useful links here and I will update this post during the meeting.

What I have learned from this meeting (Key words of this meeting):

Big Data, Bayesian, Statistical Efficiency vs Computational Efficiency

I was in Montreal from Aug 1st to Aug 8th for JSM2013 and traveling.

(Traveling in Quebec: Olympic Stadium; Underground City; Quebec City; Montreal City; Basilique Notre-Dame; Chinatown)

(Talks at JSM2013: Jianqing Fan; Jim Berger; Nate Silver; Tony Cai; Han Liu; Two Statistical Peters)

(My Presentation at JSM2013)

The following is the list of talks I attended:

JSM

• Aug 4th
• 2:05 PM Analyzing Large Data with R and MonetDB — Thomas Lumley, University of Auckland
• 2:25 PM Empirical Likelihood and U-Statistics in Survival Analysis — Zhigang Zhang, Memorial Sloan-Kettering Cancer Center ; Yichuan Zhao, Georgia State University
• 2:50 PM Joint Unified Confidence Region for the Parameters of Branching Processes with Immigration — Pin Ren ; Anand Vidyashankar, George Mason University
• 3:05 PM Time-Varying Additive Models for Longitudinal Data — Xiaoke Zhang, University of California Davis ; Byeong U. Park, Seoul National University ; Jane-Ling Wang, UC Davis
• 3:20 PM Leveraging as a Paradigm for Statistically Informed Large-Scale Computation — Michael W. Mahoney, Stanford University
• 4:05 PM Joint Estimation of Multiple Dependent Gaussian Graphical Models — Yuying Xie, The University of North Carolina at Chapel Hill ; Yufeng Liu, The University of North Carolina ; William Valdar, UNC-CH Genetics
• 4:30 PM Computational Strategies in Regression of Big Data — Ping Ma, University of Illinois at Urbana-Champaign
• 4:55 PM Programming with Big Data in R — George Ostrouchov, Oak Ridge National Laboratory ; Wei-Chen Chen, Oak Ridge National Laboratory ; Drew Schmidt, University of Tennessee ; Pragneshkumar Patel, University of Tennessee
• 5:20 PM Inference and Optimalities in Estimation of Gaussian Graphical Model — Harrison Zhou, Yale University
• Aug 5th
• 99 Mon, 8/5/2013, 8:30 AM – 10:20 AM CC-710a
• Introductory Overview Lecture: Twenty Years of Gibbs Sampling/MCMC — Other Special Presentation
• 8:35 AM Gibbs Sampling and Markov Chain Monte Carlo: A Modeler’s Perspective — Alan E. Gelfand, Duke University
• 9:25 AM The Theoretical Underpinnings of MCMC — Jeffrey S. Rosenthal, University of Toronto
• 10:15 AM Floor Discussion
• 166 * Mon, 8/5/2013, 10:30 AM – 12:20 PM CC-520c
• Statistical Learning and Data Mining: Winners of Student Paper Competition — Topic Contributed Papers
• 10:35 AM Multicategory Angle-Based Large Margin Classification — Chong Zhang, UNC-CH ; Yufeng Liu, The University of North Carolina
• 10:55 AM Discrepancy Pursuit: A Nonparametric Framework for High-Dimensional Variable Selection — Li Liu, Carnegie Mellon University ; Kathryn Roeder, CMU ; Han Liu, Princeton University
• 11:15 AM PenPC: A Two-Step Approach to Estimate the Skeletons of High-Dimensional Directed Acyclic Graphs — Min Jin Ha ; Wei Sun, UNC Chapel Hill ; Jichun Xie, Temple University
• 11:35 AM An Underdetermined Peaceman-Rachford Splitting Algorithm with Application to Highly Nonsmooth Sparse Learning Problems — Zhaoran Wang, Princeton University ; Han Liu, Princeton University ; Xiaoming Yuan, Hong Kong Baptist University
• 11:55 AM Latent Supervised Learning — Susan Wei, UNC
• 12:15 PM Floor Discussion
• 220 Mon, 8/5/2013, 2:00 PM – 3:50 PM CC-710b
• 2:05 PM Statistics Meets Computation: Efficiency Trade-Offs in High Dimensions — Martin Wainwright, UC Berkeley
• 3:35 PM Floor Discussion
• 267 Mon, 8/5/2013, 4:00 PM – 5:50 PM CC-517ab
• 4:05 PM JSM Welcomes Nate Silver — Nate Silver, FiveThirtyEight.com
• 209305 Mon, 8/5/2013, 6:00 PM – 8:00 PM I-Maisonneuve, JSM Student Mixer, Sponsored by Pfizer — Other Cmte/Business, ASA , Pfizer, Inc.
• 268 Mon, 8/5/2013, 8:00 PM – 9:30 PM CC-517ab
• 8:05 PM Ars Conjectandi: 300 Years Later — Hans Rudolf Kunsch, Seminar fur Statistik, ETH Zurich
• Aug 6th
• 280 * Tue, 8/6/2013, 8:30 AM – 10:20 AM CC-510a
• Statistical Inference for Large Matrices — Invited Papers
• 8:35 AM Conditional Sparsity in Large Covariance Matrix Estimation — Jianqing Fan, Princeton University ; Yuan Liao, University of Maryland ; Martina Mincheva, Princeton University
• 9:05 AM Multivariate Regression with Calibration — Lie Wang, Massachusetts Institute of Technology ; Han Liu, Princeton University ; Tuo Zhao, Johns Hopkins University
• 9:35 AM Principal Component Analysis for High-Dimensional Non-Gaussian Data — Fang Han, Johns Hopkins University ; Han Liu, Princeton University
• 10:05 AM Floor Discussion
• 325 * ! Tue, 8/6/2013, 10:30 AM – 12:20 PM CC-520b
• Modern Nonparametric and High-Dimensional Statistics — Invited Papers
• 10:35 AM Simple Tiered Classifiers — Peter Gavin Hall, University of Melbourne ; Jinghao Xue, University College London ; Yingcun Xia, National University of Singapore
• 11:05 AM Sparse PCA: Optimal Rates and Adaptive Estimation — Tony Cai, University of Pennsylvania
• 11:35 AM Statistical Inference in Compound Functional Models — Alexandre Tsybakov, CREST-ENSAE
• 12:05 PM Floor Discussion
• 392 Tue, 8/6/2013, 2:00 PM – 3:50 PM CC-710a
• Introductory Overview Lecture: Big Data — Other Special Presentation
• 2:05 PM The Relative Size of Big Data — Bin Yu, Univ of California at Berkeley
• 2:55 PM Divide and Recombine (D&R) with RHIPE for Large Complex Data — William S. Cleveland, Purdue University
• 3:45 PM Floor Discussion
• 445 Tue, 8/6/2013, 4:00 PM – 5:50 PM CC-517ab
• Deming Lecture — Invited Papers
• 4:05 PM Industrial Statistics: Research vs. Practice — Vijay Nair, University of Michigan
• Aug 7th
• 10:35 AM Bayesian and Frequentist Issues in Large-Scale Inference — Bradley Efron, Stanford University
• 11:20 AM Criteria for Bayesian Model Choice with Application to Variable Selection — Jim Berger, Duke University ; Susie Bayarri, University of Valencia ; Anabel Forte, Universitat Jaume I ; Gonzalo Garcia-Donato, Universidad de Castilla-La Mancha
• 571 Wed, 8/7/2013, 2:00 PM – 3:50 PM CC-511c
• Statistical Methods for High-Dimensional Sequence Data — Invited Papers
• 2:05 PM Linkage Disequilibrium in Sequencing Data: A Blessing or a Curse? — Alkes L. Price, Harvard School of Public Health
• 2:25 PM Statistical Prioritization of Sequence Variants — Lisa Joanna Strug, The Hospital for Sick Children and University of Toronto ; Weili Li, The Hospital for Sick Children and University of Toronto
• 2:45 PM On Some Statistical Issues in Analyzing Whole-Genome Sequencing Data — Dan Liviu Nicolae, The University of Chicago
• 3:05 PM Statistical Methods for Studying Rare Variant Effects in Next-Generation Sequencing Association Studies — Xihong Lin, Harvard School of Public Health
• 3:25 PM Adjustment for Population Stratification in Association Analysis of Rare Variants — Wei Pan, University of Minnesota ; Yiwei Zhang, University of Minnesota ; Binghui Liu, University of Minnesota ; Xiaotong Shen, University of Minnesota
• 3:45 PM Floor Discussion
• 612 Wed, 8/7/2013, 4:00 PM – 5:50 PM CC-517ab
• COPSS Awards and Fisher Lecture — Invited Papers
• 4:05 PM From Fisher to Big Data: Continuities and Discontinuities — Peter Bickel, University of California – Berkeley
• 5:45 PM Floor Discussion
• Aug 8th
• 621 Thu, 8/8/2013, 8:30 AM – 10:20 AM CC-516d
• Recent Advances in Bayesian Computation — Invited Papers
• 8:35 AM An Adaptive Exchange Algorithm for Sampling from Distribution with Intractable Normalizing Constants — Faming Liang, Texas A&M University
• 9:00 AM Efficiency of Markov Chain Monte Carlo for Bayesian Computation — Dawn B Woodard, Cornell University
• 9:25 AM Scalable Inference for Hierarchical Topic Models — John W. Paisley, University of California, Berkeley
• 9:50 AM Augmented Particle Filters — Yuguo Chen, University of Illinois at Urbana-Champaign
• 10:15 AM Floor Discussion
• 661 * ! Thu, 8/8/2013, 10:30 AM – 12:20 PM CC-710b
• Patterns and Extremes: Developments and Review of Spatial Data Analysis — Invited Papers
• 10:35 AM Multivariate Max-Stable Spatial Processes — Marc G. Genton, KAUST ; Simone Padoan, Bocconi University of Milan ; Huiyan Sang, TAMU
• 10:55 AM Approximate Bayesian Computing for Spatial Extremes — Robert James Erhardt, Wake Forest University ; Richard Smith, The University of North Carolina at Chapel Hill

This is from a post Connected objects and a reconstruction theorem:

A common theme in mathematics is to replace the study of an object with the study of some category that can be built from that object. For example, we can

• replace the study of a group  $G$ with the study of its category $G\text{-Rep}$ of linear representations,
• replace the study of a ring $R$ with the study of its category $R\text{-Mod}$ of $R$-modules,
• replace the study of a topological space $X$ with the study of its category $\text{Sh}(X)$ of sheaves,

and so forth. A general question to ask about this setup is whether or to what extent we can recover the original object from the category. For example, if $G$ is a finite group, then as a category, the only data that can be recovered from $G\text{-Rep}$ is the number of conjugacy classes of $G$, which is not much information about $G$. We get considerably more data if we also have the monoidal structure on $G\text{-Rep}$, which gives us the character table of $G$ (but contains a little more data than that, e.g. in the associators), but this is still not a complete invariant of $G$. It turns out that to recover $G$ we need the symmetric monoidal structure on $G\text{-Rep}$; this is a simple form of Tannaka reconstruction.
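A standard example, not from the quoted post, makes the point concrete: the dihedral group $D_4$ and the quaternion group $Q_8$ are non-isomorphic groups of order 8 with the same number of conjugacy classes and identical character tables, so neither the plain category nor the character table can recover $G$.

```latex
% Two non-isomorphic groups of order 8 with the same character table,
% so even the character table is not a complete invariant of G.
\[
  D_4 \not\cong Q_8, \qquad |D_4| = |Q_8| = 8,
\]
\[
  \#\,\mathrm{Conj}(D_4) = \#\,\mathrm{Conj}(Q_8) = 5,
  \qquad \text{and the character tables of } D_4 \text{ and } Q_8 \text{ coincide}.
\]
```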

The evidence in large medical data sets is direct, but indirect as well – and there is just too much of the indirect evidence to ignore. If you want to prove that your drug of choice is good or bad your evidence is not just how it does, it is also how all the other drugs do. And that is a crucial point that doesn’t fit easily into the frequentist world, which is a world of direct evidence (very often, but not always); and it also doesn’t fit extremely well into the formal Bayesian world, because the indirect information isn’t actually the prior distribution, it is evidence of a prior distribution, and that in some sense is not as neat. Neatness counts in science. Things that people can understand and really manipulate are terribly important.

“So I have been very interested in massive data sets not because they are massive but because they seem to offer opportunities to think about statistical inferences from the ground up again.”

The Fisher–Pearson–Neyman paradigm dating from around 1900 was, he says, “like a light being switched on. But it is so beautiful and so almost airtight that it is pretty hard to improve on; and that means that it is very hard to rethink what is good or bad about statistics.

“Fisher of course had this wonderful view of how you do what I would call small-sample inference. You tend to get very smart people trying to improve on this kind of area, but you really cannot do that very well because there is a limited amount that is available to work on. But now suddenly there are these problems that have a different flavour. It really is quite different doing ten thousand estimates at once. There is evidence always lurking around the edges. It is hard to say where that evidence is, but it’s there. And if you ignore it you are just not going to do a good job.

“Another way to say it is that a Bayesian prior is an assumption of an infinite amount of past relevant experience. It is an incredibly powerful assumption, and often a very useful assumption for moving forward with complicated data analysis. But you cannot forget that you have just made up a whole bunch of data.

“So of course the trick for Bayesians is to do their ‘making up’ part without really influencing the answer too much. And that is really tricky in these higher-dimensional problems.”

1. Machine Learning, Big Data, Deep Learning, Data Mining, Statistics, Decision & Risk Analysis, Probability, Fuzzy Logic FAQ
2. A Funny Thing Happened on the Way to Academia . . .
3. Perspective: “Why C++ Is Not ‘Back’”
4. Is Fourier analysis a special case of representation theory or an analogue?
5. The Beauty of Bioconductor
6. The State of Statistics in Julia
7. Open Source Misfeasance
8. Book review: The Signal and The Noise
9. Should the Cox Proportional Hazards model get the Nobel Prize in Medicine?
10. The most influential data scientists on Twitter
11. Here is an interesting review of Nate Silver’s book. The interesting thing about the review is that it doesn’t criticize the statistical content, but criticizes the belief that people only use data analysis for good. This is an interesting theme we’ve seen before. Gelman also reviews the review.——Simply Statistics
12. Video: “Matrices and their singular values” (1976)
13. Beyond Computation: The P vs NP Problem – Michael Sipser——This talk is arguably the very best introduction to computational complexity.
14. What are some of your personal guidelines for writing good, clear code?
15. How do you explain Machine Learning and Data Mining to non-CS people?
16. Suggested New Year’s resolution: start a blog. A blog forces you to articulate your thoughts rather than having vague feelings about issues; you also get much more comfortable with writing, because you’re doing it rather than thinking about doing it; and if other people read your blog, you get to hear what they think too. You learn a lot that way. Set aside time for your blog every day, and keep notes for yourself on bloggy subjects (write a one-line gmail to yourself with the subject “blog ideas”).
17. Tips on job market interviews
18. The age of the essay

These days I have been working with computation and programming languages. I want to share something with you here.

1. You cannot expect C++ to magically make your code faster. If speed is a concern, you need profiling to find the bottleneck instead of guessing blindly.——Yan Zhou. Thus we need to learn how to profile a program in R, MATLAB, C++, or Python.
2. When something complicated does not work, I generally try to restart with something simpler, and make sure it works.——Dirk Eddelbuettel.
3. If you’re calling your function thousands or millions of times, then it might pay to closely examine your memory allocation strategies and figure out what’s temporary.——Christian Gunning.
4. No, your main issue is not thinking about the computation.  As soon as you write something like
arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;
you are in theory land which has very little relationship to practical numerical linear algebra.  If you want to perform linear algebra calculations like weighted least squares you should first take a bit of time to learn about numerical linear algebra as opposed to theoretical linear algebra.  They are very different disciplines.  In theoretical linear algebra you write the solution to a system of linear equations as above, using the inverse of the system matrix.  The first rule of numerical linear algebra is that you never calculate the inverse of a matrix, unless you only plan to do toy examples.  You mentioned sizes of 4000 by 4000 which means that the method you have chosen is doing thousands of times more work than necessary (hint: how do you think that the inverse of a matrix is calculated in practice? – ans: by solving n systems of equations, which you are doing here when you could be solving only one).
Dirk and I wrote about 7 different methods of solving least squares problems in our vignette on RcppEigen.  None of those methods involve taking the inverse of an n by n matrix.
R and Rcpp and whatever other programming technologies come along will never be a “special sauce” that takes the place of thinking about what you are trying to do in a computation.——Douglas Bates.

```cpp
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>

typedef Eigen::MatrixXd           Mat;
typedef Eigen::Map<Mat>          MMat;
typedef Eigen::HouseholderQR<Mat> QR;
typedef Eigen::VectorXd           Vec;
typedef Eigen::Map<Vec>          MVec;

// [[Rcpp::export]]
Rcpp::List wtls(const MMat X, const MVec y, const MVec sqrtwts) {
    return Rcpp::List::create(
        Rcpp::Named("betahat") =
            QR(sqrtwts.asDiagonal() * X).solve(sqrtwts.asDiagonal() * y));
}
```
5. Repeatedly calling an R function is probably not the smartest thing to do in an otherwise complex and hard-to-decipher program.——Dirk Eddelbuettel.
6. Computers don’t do random things, unlike human beings. Something that worked once is very likely to work however many times you repeat it, as long as the input is the same (unless the function has side effects). So repeating it 1,000 times is the same as running it once.——Yan Zhou.
7. Yan Zhou: Here are a few things people usually do before asking on a mailing list (not just the Rcpp list, but any such list, like R-help or StackOverflow):
1. I write a program; it crashes.
2. I find the site of the crash.
3. I make the program simpler and simpler until it is minimal and the crash is still reproducible.
4. I still cannot figure out what is wrong with the four or five lines that crash the minimal example.
8. It does not matter how stupid your questions are. We all asked silly questions before; that is how we learn. But it matters that you put in effort to ask the right question. The more effort you put in and the more specific your question, the more helpful the answers you get.——Yan Zhou.
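Bates’s advice in item 4 can be illustrated outside of C++ as well. The following NumPy sketch (my own example, with made-up data) solves the same weighted least squares problem two ways: once with an explicit inverse, the “theory land” version, and once by factoring the row-weighted design, which is the approach taken in the RcppEigen snippet above.

```python
import numpy as np

# Hypothetical weighted least squares problem: solve (X'WX) beta = X'W y.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
w = rng.uniform(0.5, 2.0, size=n)               # positive weights
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Theory-land version: explicit inverse (works here, but wasteful/unstable).
XtWX = X.T @ (w[:, None] * X)
XtWy = X.T @ (w * y)
beta_inv = np.linalg.inv(XtWX) @ XtWy

# Numerical version: weight the rows by sqrt(w) and solve ONE least squares
# problem via an orthogonal factorization, never forming an inverse.
sw = np.sqrt(w)
beta_qr, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)

print(np.allclose(beta_inv, beta_qr))
```

The two answers agree on this small, well-conditioned problem; the factorization route is the one that stays accurate and economical as the matrices grow.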

In my office I have two NIPS posters on the wall, 2011 and 2012. But I have never been there, nor am I a computer scientist. Anyway, I like NIPS all the same. Now it’s time for me to organize posts from others:

And among all of the posts, there are several things I have to digest later on:

1. One tutorial on Random Matrices, by Joel Tropp. People concluded in their posts that

Basically, break random matrices down into a sum of simpler, independent random matrices, then apply concentration bounds on the sum. The basic result is that if you love your Chernoff bounds and Bernstein inequalities for (sums of) scalars, you can get almost exactly the same results for (sums of) matrices.

2. “This year was definitely all about Deep Learning,” as one post put it. The Geomblog mentioned that although deep learning has been in the news recently because of Google’s unsupervised learning experiment that discovered cats in YouTube videos, the methods (basically neural nets without lots of back propagation) have been growing in popularity over a long while. We also have to spend some time reading Deep Learning and the evolution of data models, which is related to manifold learning.
3. “Another trend that’s been around for a while, but was striking to me, was the detailed study of Optimization methods.”—The Geomblog.  There are at least two different workshops on optimization in machine learning (DISC and OPT), and numerous papers that very carefully examined the structure of optimizations to squeeze out empirical improvements.
4. Kernel distances: an introduction to the kernel distance from The Geomblog. “Scott Aaronson (at his NIPS invited talk) made this joke about how nature loves ℓ2. The kernel distance is ‘essentially’ the ℓ2 variant of EMD (which makes so many things easier). There’s been a series of papers by Sriperumbudur et al. on this topic, in which they have shown that (a) the kernel distance captures the notion of ‘distance covariance’ that has become popular in statistics as a way of testing independence of distributions; (b) as an estimator of distance between distributions, the kernel distance has more efficient estimators than (say) the EMD, because its estimator can be computed in closed form instead of needing an algorithm that solves a transportation problem; and (c) the kernel that optimizes the efficiency of the two-sample estimator can also be determined (the NIPS paper).”
5. Spectral Methods for Latent Models: Spectral methods for latent variable models are based upon the method of moments rather than maximum likelihood.
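The closed-form two-sample estimator mentioned in item 4 can be sketched in a few lines. This is a plain biased MMD-style estimate under a Gaussian kernel, my own illustration with made-up data, not the exact estimator from the Sriperumbudur et al. papers:

```python
import numpy as np

def kernel_distance(X, Y, sigma=1.0):
    """Biased closed-form estimate of the squared kernel distance (MMD)
    between samples X and Y under a Gaussian kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
Y = rng.standard_normal((200, 2)) + 2.0        # shifted distribution

same = kernel_distance(X, rng.standard_normal((200, 2)))  # same law as X
diff = kernel_distance(X, Y)                               # different law
print(same < diff)
```

Because the estimate is just an average over kernel matrices, no transportation problem has to be solved, which is exactly the computational advantage over EMD noted in the quote.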

Besides the papers mentioned in the above hot topics, there are some other papers from Memming‘s post:

1. Graphical models via generalized linear models: Eunho introduced a family of graphical models with GLM marginals and Ising model style pairwise interaction. He said the Poisson-Markov-Random-Fields version must have negative coupling, otherwise the log partition function blows up. He showed conditions for which the graph structure can be recovered with high probability in this family.
2. TCA: High dimensional principal component analysis for non-gaussian data: Using an elliptical copula model (extending the nonparanormal), the eigenvectors of the covariance of the copula variables can be estimated from Kendall’s tau statistic which is invariant to the nonlinearity of the elliptical distribution and the transformation of the marginals. This estimator achieves close to the parametric convergence rate while being a semi-parametric model.
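The rank trick behind item 2 can be sketched numerically: for elliptical copulas, Kendall’s tau is invariant under monotone transforms of the marginals and relates to the underlying correlation by rho = sin(pi * tau / 2). A small simulation of my own (assuming SciPy is available):

```python
import numpy as np
from scipy.stats import kendalltau

# Latent bivariate Gaussian with correlation rho, observed only through
# monotone (hence tau-preserving) marginal transformations.
rng = np.random.default_rng(2)
n, rho = 5000, 0.6
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
x = np.exp(z[:, 0])        # monotone transform of the first marginal
y = z[:, 1] ** 3           # monotone transform of the second marginal

tau, _ = kendalltau(x, y)          # unaffected by the transforms
rho_hat = np.sin(np.pi * tau / 2)  # recover the latent correlation
print(round(rho_hat, 2))
```

The estimate recovers the latent correlation without ever estimating the marginal transformations, which is the semiparametric point of the TCA paper.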

Update: Make sure to check the lectures from the prominent 26th Annual NIPS Conference filmed @ Lake Tahoe 2012. Also make sure to check the NIPS 2012 Workshops, Oral sessions and Spotlight sessions which were collected for the Video Journal of Machine Learning Abstracts – Volume 3.

1. Grad Student’s Guide to Good Coffee+Grad Student’s Guide to Good Tea
2. Favorite Apps for Work and Life
3. estimating a constant (not really)
4. Reinforcement Learning in R: An Introduction to Dynamic Programming
5. The Future of Machine Learning (and the End of the World?)
6. 10 Papers Every Programmer Should Read (At Least Twice)
7. R in the Press
8. On Chomsky and the Two Cultures of Statistical Learning
9. Speech Recognition Breakthrough for the Spoken, Translated Word
10. Frequentist vs Bayesian
11. w4s – the awesomeness we’re experiencing
12. Why is the Gaussian so pervasive in mathematics?
13. C++ Blogs that you Regularly Follow
14. An interview with Brad Efron about scientific writing. I haven’t watched the whole interview, but I do know that Efron is one of my favorite writers among statisticians.
15. Slidify, another approach for making HTML5 slides directly from R. (1) It is still just a little too hard to change the theme/feel of the slides. (2) The placement/insertion of images is still a little clunky; Google Docs has figured this out, and if they integrated the best features of Slidify, LaTeX, etc. into that system, it would be great.
16. Statistics is still the new hotness. Here is a Business Insider list of 5 statistics problems that will “change the way you think about the world”.
17. New Yorker, especially the line,”statisticians are the new sexy vampires, only even more pasty” (via Brooke A.)
18. The closed graph theorem in various categories
19. Got spare time? Watch some videos about statistics
20. About the first Borel-Cantelli lemma
21. Yihui Xie—-The Setup
22. Best Practices for Scientific Computing

Python is great, and I think it will only get better. Pure mathematics involves lots of symbolic calculation, since it is abstract and powerful: differential geometry, commutative algebra, algebraic geometry, and so on. But science is nothing without experiment and computation, so we also need powerful computational software to carry out the results. Sage is your choice! Since Sage claims that

Sage is a free open-source mathematics software system licensed under the GPL. It combines the power of many existing open-source packages into a common Python-based interface.
Mission: Creating a viable free open source alternative to Magma, Maple, Mathematica and Matlab.

Sage is not only for pure mathematics: today I happened to see a blog post about using Sage to compute high moments of the Gaussian:

var('m, s, t')
mgf(t) = exp(m*t + t^2*s^2/2)
for i in range(1, 11):
    print(derivative(mgf, t, i).subs(t=0))
which leads to the following result:
m
m^2 + s^2
m^3 + 3*m*s^2
m^4 + 6*m^2*s^2 + 3*s^4
m^5 + 10*m^3*s^2 + 15*m*s^4
m^6 + 15*m^4*s^2 + 45*m^2*s^4 + 15*s^6
m^7 + 21*m^5*s^2 + 105*m^3*s^4 + 105*m*s^6
m^8 + 28*m^6*s^2 + 210*m^4*s^4 + 420*m^2*s^6 + 105*s^8
m^9 + 36*m^7*s^2 + 378*m^5*s^4 + 1260*m^3*s^6 + 945*m*s^8
m^10 + 45*m^8*s^2 + 630*m^6*s^4 + 3150*m^4*s^6 + 4725*m^2*s^8 + 945*s^10
Go Python! Go Sage!
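If Sage is not at hand, the same computation can be cross-checked in plain Python with SymPy (assuming SymPy is installed), differentiating the moment generating function and evaluating at t = 0:

```python
import sympy as sp

# Gaussian MGF: exp(m*t + t^2*s^2/2); the i-th raw moment is the i-th
# derivative of the MGF evaluated at t = 0.
m, s, t = sp.symbols('m s t')
mgf = sp.exp(m*t + t**2 * s**2 / 2)
moments = [sp.expand(sp.diff(mgf, t, i).subs(t, 0)) for i in range(1, 5)]
for mom in moments:
    print(mom)
```

The first few results match the Sage output above: m, m^2 + s^2, m^3 + 3*m*s^2, and so on.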

Recently, I have heard a lot about the disadvantages of frequentist statistics, including complaints about the p-value, a hot topic due to the God particle.

Professor J. K. Kruschke gave a talk on Doing Bayesian Data Analysis at Michigan State University in September. He mentioned the concept of “intention”, including intended hypotheses, intended experiments, and intended sampling. Basically, he explained that many frequentist procedures are intention-dependent, which he argues is unscientific, since the conclusions depend on the analyst’s intentions. If you want to know more about this, please refer to the paper.

Today I came across a blog post from Statistical Modeling, Causal Inference, and Social Science that also touches on this intention issue in frequentist statistics:

Sometimes the problem is that the frequentist criterion being used is not of applied relevance. Consider a simple problem such as estimating a proportion p, given y successes out of n trials, where n=100 and y=0. The best estimate of p will be different if I tell you that p is the probability of a rare disease, compared to if I tell you that p is the proportion of African Americans who plan to vote for Mitt Romney.
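Gelman’s point can be made concrete with two hypothetical Beta priors standing in for the two contexts (the specific prior parameters are my own choices for illustration); the posterior means differ wildly even though the data, y = 0 successes out of n = 100 trials, are identical:

```python
# Conjugate Beta-Binomial updating: posterior mean of p under a Beta(a, b)
# prior after y successes in n trials is (a + y) / (a + b + n).
n, y = 100, 0

# Rare-disease context: prior mass concentrated near zero, e.g. Beta(1, 999).
rare_mean = (1 + y) / (1 + 999 + n)

# Voting context: prior centered near one half, e.g. Beta(50, 50).
vote_mean = (50 + y) / (50 + 50 + n)

print(round(rare_mean, 4), round(vote_mean, 4))
```

Same likelihood, very different sensible estimates: roughly 0.0009 in the rare-disease setting versus 0.25 in the voting setting, which is exactly why “the best estimate of p” cannot be a context-free quantity.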

I would like some frequentists to explain this intention issue, since I think the questioning is rather reasonable. Any comments?

Update:

The following cartoon caused a fight between frequentists and Bayesians:

1. A post from Andrew: I don’t like this cartoon
2. A post from Normal Deviate: anti xkcd

And the following really makes the point:

Suppose I had a medical test with a 1/6 false positive rate and a 0% false negative rate. That is, if administered to someone without the disease it has a 1/6 chance of reporting positive. The protocol is to administer the test and, if positive, to administer it again. Assuming independence, the probability of two consecutive false positives is 1/36. Some statisticians would reject the null hypothesis (that the patient is disease free) given 2/2 positive tests. That is ridiculous for the same reason the xkcd example is ridiculous (it ignores prior or base rate information), but it is indeed the practice in some circles, I’m told.——Phil
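Phil’s numbers are easy to check with Bayes’ rule. The quote does not specify a base rate, so the 1-in-1000 figure below is my own assumption; with it, two positives leave the probability of disease far below what “reject at 1/36” suggests:

```python
# Test from the quote: 1/6 false positive rate, 0% false negative rate,
# administered twice (independence assumed). Base rate is hypothetical.
fp, base = 1 / 6.0, 1 / 1000.0

p_pos2_given_healthy = fp * fp     # 1/36: two consecutive false positives
p_pos2_given_sick = 1.0            # no false negatives

# Total probability of two positives, then Bayes' rule for the posterior.
p_pos2 = base * p_pos2_given_sick + (1 - base) * p_pos2_given_healthy
posterior = base * p_pos2_given_sick / p_pos2

print(round(posterior, 3))
```

So even after two positives the patient is healthy with probability above 96%, which is precisely the base-rate point the xkcd argument turns on.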

Also refer to the explanation from Andrew:

In the context of probability mathematics, textbooks carefully explain that p(A|B) != p(B|A), and how a test with a low error rate can have a high rate of errors conditional on a positive finding, if the underlying rate of positives is low, but the textbooks typically confine this problem to the probability chapters and don’t explain its relevance to accept/reject decisions in statistical hypothesis testing.

Update: (Two videos from Professor J. K. Kruschke)

1. Bayesian estimation supersedes the t test (in 14 minutes of video)
2. Bayesian Methods Interpret Data Better

Update:

Examples of Bayesian and frequentist approach giving different answers