You are currently browsing the category archive for the ‘Probability’ category.

There has been a Machine Learning (ML) reading list of books in hacker news for a while, where Professor Michael I. Jordan recommend some books to start on ML for people who are going to devote many decades of their lives to the field, and who want to get to the research frontier fairly quickly. Recently he articulated the relationship between CS and Stats amazingly well in his recent reddit AMA, in which he also added some books that dig still further into foundational topics. I just list them here for people’s convenience and my own reference.

  • Frequentist Statistics
    1. Casella, G. and Berger, R.L. (2001). “Statistical Inference” Duxbury Press.—Intermediate-level statistics book.
    2. Ferguson, T. (1996). “A Course in Large Sample Theory” Chapman & Hall/CRC.—For a slightly more advanced book that’s quite clear on mathematical techniques.
    3. Lehmann, E. (2004). “Elements of Large-Sample Theory” Springer.—About asymptotics which is a good starting place.
    4. Vaart, A.W. van der (1998). “Asymptotic Statistics” Cambridge.—A book that shows how many ideas in inference (M estimation, the bootstrap, semiparametrics, etc) repose on top of empirical process theory.
    5. Tsybakov, Alexandre B. (2008) “Introduction to Nonparametric Estimation” Springer.—Tools for obtaining lower bounds on estimators.
    6. B. Efron (2010) “Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction” Cambridge,.—A thought-provoking book.
  • Bayesian Statistics
    1. Gelman, A. et al. (2003). “Bayesian Data Analysis” Chapman & Hall/CRC.—About Bayesian.
    2. Robert, C. and Casella, G. (2005). “Monte Carlo Statistical Methods” Springer.—about Bayesian computation.
  • Probability Theory
    1. Grimmett, G. and Stirzaker, D. (2001). “Probability and Random Processes” Oxford.—Intermediate-level probability book.
    2. Pollard, D. (2001). “A User’s Guide to Measure Theoretic Probability” Cambridge.—More advanced level probability book.
    3. Durrett, R. (2005). “Probability: Theory and Examples” Duxbury.—Standard advanced probability book.
  • Optimization
    1. Bertsimas, D. and Tsitsiklis, J. (1997). “Introduction to Linear Optimization” Athena.—A good starting book on linear optimization that will prepare you for convex optimization.
    2. Boyd, S. and Vandenberghe, L. (2004). “Convex Optimization” Cambridge.
    3. Y. Nesterov and Iu E. Nesterov (2003). “Introductory Lectures on Convex Optimization” Springer.—A start to understand lower bounds in optimization.
  • Linear Algebra
    1. Golub, G., and Van Loan, C. (1996). “Matrix Computations” Johns Hopkins.—Getting a full understanding of algorithmic linear algebra is also important.
  • Information Theory
    1. Cover, T. and Thomas, J. “Elements of Information Theory” Wiley.—Classic information theory.
  • Functional Analysis
    1. Kreyszig, E. (1989). “Introductory Functional Analysis with Applications” Wiley.—Functional analysis is essentially linear algebra in infinite dimensions, and it’s necessary for kernel methods, for nonparametric Bayesian methods, and for various other topics.

Remarks from Professor Jordan: “not only do I think that you should eventually read all of these books (or some similar list that reflects your own view of foundations), but I think that you should read all of them three times—the first time you barely understand, the second time you start to get it, and the third time it all seems obvious.”

Advertisements
  1. Interview with Nick Chamandy, statistician at Google
  2. You and Your Researchvideo
  3. Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained
  4. A Survival Guide to Starting and Finishing a PhD
  5. Six Rules For Wearing Suits For Beginners
  6. Why I Created C++
  7. More advice to scientists on blogging
  8. Software engineering practices for graduate students
  9. Statistics Matter
  10. What statistics should do about big data: problem forward not solution backward
  11. How signals, geometry, and topology are influencing data science
  12. The Bounded Gaps Between Primes Theorem has been proved
  13. A non-comprehensive list of awesome things other people did this year.
  14. Jake VanderPlas writes about the Big Data Brain Drain from academia.
  15. Tomorrow’s Professor Postings
  16. Best Practices for Scientific Computing
  17. Some tips for new research-oriented grad students
  18. 3 Reasons Every Grad Student Should Learn WordPress
  19. How to Lie With Statistics (in the Age of Big Data)
  20. The Geometric View on Sparse Recovery
  21. The Mathematical Shape of Things to Come
  22. A Guide to Python Frameworks for Hadoop
  23. Statistics, geometry and computer science.
  24. How to Collaborate On GitHub
  25. Step by step to build my first R Hadoop System
  26. Open Sourcing a Python Project the Right Way
  27. Data Science MD July Recap: Python and R Meetup
  28. git 最近感悟
  29. 10 Reasons Python Rocks for Research (And a Few Reasons it Doesn’t)
  30. Effective Presentations – Part 2 – Preparing Conference Presentations
  31. Doing Statistical Research
  32. How to Do Statistical Research
  33. Learning new skills
  34. How to Stand Out When Applying for An Academic Job
  35. Maturing from student to researcher
  36. False discovery rate regression (cc NSA’s PRISM)
  37. Job Hunting Advice, Pt. 3: Networking
  38. Getting Started with Git

Today, there will be a talk,  Imaginary Geometry and the Gaussian Free Field, given by Jason Miller from Microsoft Research. I just googled it and found the following interesting materials:

  1. Gaussian free fields for mathematicians
  2. Gaussian free field and conformal field theory: In these expository lectures, it gives an elementary introduction to conformal field theory in the context of probability theory and complex analysis. It considers statistical fields, and defines Ward functionals in terms of their Lie derivatives. Based on this approach, it explains some equations of conformal field theory and outline their relation to SLE theory.
  3. SLE and the free field: Partition functions and couplings
  4. Schramm-Loewner evolution (SLE). See slides by Tom Alberts and 2006 ICM slides by Oded Schramm and St. Flour Lecture Notes by Wendelin Werner . See also Ito’s lemma notes .
There will be a talk,  “Landscape of Random Functions in Many Dimensions via Random Matrix Theory”, next week given by  Antonio Auffinger from the University of Chicago.
Abstract: How many critical values a typical Morse function have on a high dimensional manifold? Could we say anything about the topology of its level sets? In this talk I will survey a joint work with Gerard Ben Arous and Jiri Cerny that addresses these questions in a particular but fundamental example. We investigate the landscape of a general Gaussian random smooth function on the N-dimensional sphere. These corresponds to Hamiltonians of well-known models of statistical physics, i.e spherical spin glasses. Using the classical Kac-Rice formula, this counting boils down to a problem in Random Matrix Theory. This allows us to show an interesting picture for the complexity of these random Hamiltonians, for the bottom of the energy landscape, and in particular a strong correlation between the index and the critical value. We also propose a new invariant for the possible transition between the so-called 1-step replica symmetry breaking and a Full Replica symmetry breaking scheme and show how the complexity function is related to the Parisi functional.
This topic is kind of a combination of my majors, differential geometry, probability and statistics. I am interested in this although I can imagine that it is hard.

Today I want to say something basic:

1, We know in Calculus, taylor expansion is extremely useful, since it’s the polynomial approximation of functions. Thus in particular, for some limits, you could always refer to the taylor expansion first and then everything will be simple.

In probability and statistics, we know that a statistic is nothing but a function of the sample. Let X_{1}, X_{2},…,X_{n} be sample points. Then a statistic could just be expressed as T_{n}=f (X_{1}, X_{2},…,X_{n} ). So if we want to discuss the asymptotic properties of the statistic, a good way is to express the statistic in the taylor expansion first. And I think we should always do like this, i.e. taylor expansion first. Then delta method and slutsky’s lemma could be involved in for you to use together with the central limit theorems, which is the foundation for the discussion of asymptotic properties.

2, Why statistics? What the difference between statistics and probability?

In the reality, everything has noises so that it is difficult for us to see the underlying principle. For statistics, it deals with the raw data to find out the simple rule covered by the noised data. Thus if you want to find out the relationship between the heights and weights of humans,  why use regression method? That is because we regard the data we got are noised, we should not just use all the data points to find out the precise curve through every data point. That curve does not make any sense in reality. We should think of the different heights of some fixed weight as the noised data, and we want to use statistics to find out the simple relationship between these two variables for kind of prediction. Therefore, simple precise mathematics+noise will be statistics. How to model the noise, this is related to the measure theory.

The difference between probability and statistics is kind of probability is mathematics and statistics kind of data management. What does it mean? I mean probability definitely belongs to mathematics, since it is just based on axioms and rules, nothing else. But statistics is just the opposite. Started with raw data, you can deal with the data anyway without any rules. Play with the data as much as you can. But what’s the connection between these two? Statistics as a function of random variables, which is controlled by the underlying unknown rules (probability distributions), could have many properties got from the analysis using probability.

In machine learning, we often take probability for granted. We desire a system for representing uncertainty in the world, and Cox’s theorem tells us that if we accept some basic postulates regarding what is desirable in a system of uncertainty, we will end up with probability.

So that should be the end of the story… right? Well, maybe not. The first Cox postulate is

Divisibility and comparability – The plausibility of a statement is a real number and is dependent on information we have related to the statement,

which seems quite innocent. However, who’s to say that there is anything fundamental about real numbers? Real numbers have strange things like irrational numbers and negative numbers (crazy, I know), but they’re lacking in comparison to imaginary numbers (there’s no operation that you can apply 4 times before returning to your original value, which you can do by multiplying by i with imaginary numbers). It seems kind of arbitrary to choose real numbers. For a fun and interesting read, see the following link. It makes the point better than I can:

Negative numbers aren’t easy. Imagine you’re a European mathematician in the 1700s. You have 3 and 4, and know you can write 4 – 3 = 1. Simple.

But what about 3-4? What, exactly, does that mean? How can you take 4 cows from 3? How could you have less than nothing?

Negatives were considered absurd, something that “darkened the very whole doctrines of the equations” (Francis Maseres, 1759). Yet today, it’d be absurd to think negatives aren’t logical or useful. Try asking your teacher whether negatives corrupt the very foundations of math.

http://betterexplained.com/articles/a-visual-intuitive-guide-to-imaginary-numbers/

Imaginary numbers come up in the context of systems of uncertainty when we deal with quantum mechanics. The basic idea is that interactions operate over amplitudes (expressed as complex numbers), then to determine the likelihood of a final configuration, you look at norms of amplitudes. For a relatively straightforward explanation, see here: http://lesswrong.com/lw/pd/configurations_and_amplitude/

So I don’t necessarily have any well-formed thoughts on the matter (yet?), but it’s fun to think about other principled ways of representing uncertainty. I’m curious to know if there are types of interactions useful for machine learning that would be hard to represent with standard probability models but that would be aided by these types of quantum models.

Finally, I leave you with this blog comment from The Blog of Scott Aaronson:

“graphical models with amplitudes instead of probabilities” is a fair definition of a quantum circuit (and therefore a quantum computer).

http://scottaaronson.com/blog/?p=74#comment-1702

That seems to me, worth understanding deeper.

From: http://blog.smellthedata.com/2010/06/uncertainty-probability-and-quantum.html

Today I came across an interesting question on the mathoverflow: what are the biggest problems in probability theory? In the answers, there is one about self-avoiding walks. And the most famous scientist in this field, as far as I know, is Gordon Slade. And several days ago, I also saw a post of this subject. At that time, I have no idea about this area, so I did not have anything feeling about this and then I just skipped it. Now I think I have realized the importance of this field in the whole probability theory. Thus I have to know something about this, at least getting to know what it is. Here I want to share you with the materials I have collected.

http://chromotopy.org/?p=402 (a recent post about the talk given by Professor Slade)

http://gowers.wordpress.com/2010/08/22/icm2010-smirnov-laudatio/ (a post about this area)

http://terrytao.wordpress.com/2010/08/19/lindenstrauss-ngo-smirnov-villani/ (a post about the winners in icm2010, including this area)

I have noticed this concept before. Since I am just new in Probability field, so you should forgive me that I just noticed this academic area several months ago and did not realize the importance of it. Today I attended the regular colloquium of my department and the speaker, Zbigniew J. Jurek, gave a lecture about The Random Integral Representation Conjecture. In this talk, he mentioned free probability. Moreover, he also joked that free statistics will come into being.

I also fount a useful link about the survey of free probability. I hope it will be useful for you. Terry Tao also have a post about this.

Blog Stats

  • 96,228 hits

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 520 other followers

Twitter Updates