You are currently browsing the monthly archive for December 2010.
Top five posts of 2010 written by John D. Cook in his blog The Endeavour:
- Write-only articles
- Math library functions that seem unnecessary
- Don’t invert that matrix
- What does this code do?
- Why computers have two zeros: +0 and -0
Quotes, Quoting, and Quotations written by rjlipton in the blog Gödel’s Lost Letter and P=NP.
EDIT: I’ve started a home for machine learning folks powered by Eggsprout at http://machine-learning.eggsprout.com. Please join me and bring others like us together. Don’t be alarmed if the site is a little empty now, all communities have to start somewhere.
If you’re that kind of person and you somehow came across this blog, here’s what I’ve found useful so far:
Stanford CS229 Machine Learning Course on YouTube: Probably the most useful and relevant resource for a beginner. I somewhat enjoy Andrew Ng’s teaching style. I bet he’d be a great mentor to have for research. The way the course is organized is very different from how professor Pedro Domingos taught us, though. I’m not far enough into the series to say whether I like it more or not.
CS229 Lecture Notes: Lecture notes that accompany the YouTube videos.
UW Part-time Masters Lectures: Taught by professor Pedro Domingos, awesome teacher and incredibly genius.
VideoLectures.net: These look like they’d be more advanced but interesting nonetheless. So much to watch… so little time.
ResearchChannel.org: I love this resource for just about any topic. I’ve spent hours on this studying neurobiology back when I first entered college. Hopefully the machine learning topics here are just as interesting.
Machine Learning Data Repository: Nice repository of data for when you’re ready to practice using an algorithm. One of my homework assignments from college was from here.
Ruby A.I. Plugins: A few Ruby machine learning libraries. I’m a Rubyist and it’d be nice to see more of these. Maybe I’ll contribute to these some day.
Tegu: A machine learning system in Ruby developed by David Richards. He looks like an interesting character and I’m keeping an eye on him and his projects. I like his dedication to ruby and machine learning.
Aside: Why do Google searches on “Machine Learning” always turn up results about sewing machines?
From: http://ianma.wordpress.com/2009/07/19/machine-learning-for-beginners/
In my previous post on learning about statistical learning, I took a shot at creating a guide for those who want to learn about statistical learning. I received a lot of great feedback from friends in the field, the comments on my blog, and discussions around the web like the hackernews thread. This revised edition of the post takes these discussions and feedback into account.
I humbly submit this guide, upon material over which I do not have full mastery, in the case that it may be of use to others trying to find their way.
[ Student working in Statistics Machine Room. source: flickr ]
Fragmentation and Ambiguity: what’s in a name?
I renamed this second edition of my post to “Learning about Machine Learning” from “Learning about Statistical Learning.” What is the better name? I don’t know. Things are a bit fragmented. There are some nuances and some of the fragmentation makes sense. In my opinion, most of it doesn’t. You have “computational learning,” “computational statistics,” “statistical learning,” “machine learning” and “learning theory”. You have Statistics vs. Machine Learning, the culture of algorithms vs. machine learning, the stats handicap, and COMPUTATIONAL statistics vs. computational STATISTICS.
It can be confusing. I bring this up to help you preempt the confusion. When you see these different terms, don’t worry about it, just focus on the topics you need to learn. In this guide I will try to help point out where you will see the same thing by different names.
Straight into the Deep End
In the previous list, I thought it would be good to recommend some lighter texts as introductions to topics like probability theory and machine learning. Based upon subsequent discussions and feedback, I’ve changed my view. Straight into the deep end is the way to go.
The caveat is that you need to be honest with yourself about when you encounter something that you do not understand and you are unable to work through despite trying your best. At this point you’ve found your way to some material that you do not have the foundation for. Put the book down, insert a bookmark. Close the book. Walk away. Go find a resource to build the foundation. Come back later. Repeat.
Don’t be afraid to admit you don’t understand something; that is the only way you are going to move forward and really understand the material. As Peter Norvig says about programming, be prepared to make a serious commitment of time and effort if you want to be a researcher.
The Books
I am making a number of changes which I mention along the way. The overall list is now structured by topic with inline notes and there are a number of new topics so that you can pick and choose based upon your areas of interest.
Now on to the books.
I’m moving the foundation in proofs and analysis up to be the top priority, as well as adding some new references from the analysis list. You won’t get far if you are intimidated by Greek characters and unable to follow proofs. If you build an intuition for programming as proof writing, that will also bend your mind in the right direction. I’m including a link to one of the fun puzzle books that I find helpful – logic puzzles can help get you into the flow of proofs. I’m not including a book on functional analysis here, but I have a few you can choose from on the analysis list if you are interested.
Introduction to Analysis – Maxwell Rosenlicht
Mathematical Analysis – Apostol
How to Prove It: A Structured Approach – Daniel J. Velleman
How to Solve It – George Pólya
Proofs are Programs [pdf] – Philip Wadler
The Lady and the Tiger – Raymond Smullyan
You will be working with algorithms constantly, so work through a couple books from the algorithms and data structures list. We are also in an age of rapidly increasing data volume, so I am also recommending a book to help you learn how to decompose algorithms and work with MapReduce-style parallelism – a great read which is not published yet, but you can get the free PDF online.
Introduction to Algorithms – Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein
The Algorithm Design Manual – Steven Skiena
Data-Intensive Text Processing with MapReduce – Jimmy Lin and Chris Dyer
Map–Reduce for Machine Learning on Multicore – from a group in the Stanford CS department
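The decomposition those MapReduce references describe can be sketched in miniature. This is a hypothetical in-process simulation (made-up helper names, not the Hadoop API): a map phase emits key–value pairs, a shuffle groups them by key, and a reduce phase aggregates each group.

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group the intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the cat sat", "the cat ran"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

The point of the decomposition is that the mapper runs independently per document and the reducer independently per key, which is what lets a real framework distribute the work.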
Linear algebra is critical – matrices are your friends. A lot of what you will see is represented and computed naturally in matrix notation – it is convenient for both theory and practice. You can find a variety of items in the matrix fu list, and you can also find Gilbert Strang’s lectures online from MIT courseware. Matrix algebra, like algorithms, is a fundamental building block that you will work with on a daily basis.
Introduction to Linear Algebra, Fourth Edition by Gilbert Strang
Matrix Algebra – James Gentle
In this second edition of my post, I’m adding a strong focus on linear models, as they are the foundation for a lot of models that you will encounter in machine learning. The books in the machine learning section below will help to put linear models in a broader setting. For example Chapter 2 of The Elements of Statistical Learning has a nice overview section of least squares and nearest neighbors that helps you understand how many other techniques are based upon these two simple building blocks. Some sources also have nice discussions of the relationship between linear models and other approaches like neural networks. Starting with a strong foundation in linear models is wise. I had not done so properly when jumping into machine learning, and it has required a lot of backtracking.
Linear Models With R – Julian James Faraway
An Introduction to Generalized Linear Models – Annette J. Dobson
Applied Linear Statistical Models – John Neter
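For a concrete taste of the machinery these linear-models books cover, here is a minimal ordinary-least-squares fit in Python with NumPy. The data are made up (noiseless points from y = 1 + 2x), so this is only a sketch, not an example from any of the texts.

```python
import numpy as np

# Made-up data generated from y = 1 + 2x with no noise,
# so the recovered coefficients are exact.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix with an intercept column: each row is (1, x_i).
X = np.column_stack([np.ones_like(x), x])

# Solve the least-squares problem min ||X beta - y||^2; lstsq is
# numerically safer than explicitly inverting X'X.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta ≈ [1.0, 2.0]  (intercept, slope)
```

The same design-matrix-plus-solver pattern carries over directly to the multivariate models in the books above.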
There are many titles to choose from the probability list, in order to build a base in probability theory. I’m also adding a reference for looking at probability from the Bayesian perspective. Probability can be very counter-intuitive. A great way to get around this is to train yourself to understand how we misinterpret things. Read one of the Kahneman and Tversky books from my finance, trading and economics list, and work through some puzzle books. Aside from psychological biases, a lot of the trickiness of empirical probability, especially conditional probabilities, is knowing what and how to count. If you want to explore counting, there are a number of good books on the probability list.
A Course in Probability Theory, Revised Edition, Second Edition by Kai Lai Chung
First Look at Rigorous Probability Theory by Jeffrey S. Rosenthal
Probability Theory: The Logic of Science – E. T. Jaynes
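To see why "knowing what and how to count" is the heart of conditional probability, brute-force enumeration settles classic puzzles that intuition gets wrong. A small sketch using the standard two-child puzzle (my own illustration, not from the books above):

```python
from itertools import product

# Each child is a boy ("B") or girl ("G") with equal probability, so the
# four birth orders below are equally likely outcomes.
families = list(product("BG", repeat=2))  # BB, BG, GB, GG

# P(two boys | at least one boy): condition by counting the outcomes
# consistent with the evidence, then counting the favorable ones.
at_least_one_boy = [f for f in families if "B" in f]
two_boys = [f for f in at_least_one_boy if f == ("B", "B")]
p = len(two_boys) / len(at_least_one_boy)
# p = 1/3, not the intuitive 1/2; the usual mistake is treating BG and GB
# as a single outcome.
```

When a conditional probability feels paradoxical, enumerating the sample space like this almost always exposes the miscount.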
When you are ready to build an understanding of statistics on top of probability theory, move on to the statistics list. Statistics and inference are critical in machine learning, and it is also nice to understand how information theory fits into this picture. I haven’t included an information theory book here, but you can find a nice one that ties information theory to statistics on the statistics list. A lot of bits and pieces on statistics will have been covered already with probability theory and linear models. I’ve added a reference on hypothesis testing and multivariate statistics. I’ve also created a new section below on robust and non-parametric statistics.
All of Statistics: A Concise Course in Statistical Inference (Springer Texts in Statistics) by Larry Wasserman
Testing Statistical Hypotheses – E. L. Lehmann
Bayesian Statistics – William M. Bolstad
Modern Multivariate Statistical Techniques – Alan Julian Izenman
Non-parametric and robust statistics are very neat topics. Non-parametric statistics can mean a few things, but the core intuition is that the techniques do not assume a parametrized model for the probability distribution underlying the observed data. The notion of robust statistics stems from the desire to create statistical methods that are 1) resistant to the deviations from assumptions that empirical data often exhibit, and 2) tolerant to outliers, as measured by the breakdown point – the point at which infinitely large outliers cause an estimator to yield an infinitely large result.
Robust Statistics – Peter J. Huber
Robust Statistics: The Approach Based on Influence Functions – Frank R. Hampel
Robust Regression – Peter J. Rousseeuw
All of Nonparametric Statistics – Larry Wasserman
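The breakdown-point idea is easy to see numerically. A small sketch (made-up data) comparing the mean, whose breakdown point is 0, with the median, whose breakdown point is 1/2:

```python
import statistics

data = [2.0, 3.0, 4.0, 5.0, 6.0]
contaminated = data[:-1] + [1e9]  # swap one observation for a huge outlier

# The mean has breakdown point 0: a single bad point drags it arbitrarily far.
mean_clean = statistics.mean(data)          # 4.0
mean_bad = statistics.mean(contaminated)    # roughly 2e8

# The median has breakdown point 1/2: one outlier leaves it essentially untouched.
median_clean = statistics.median(data)      # 4.0
median_bad = statistics.median(contaminated)  # still 4.0
```

Making the outlier larger moves the mean without bound while the median stays put, which is exactly the resistance the robust-statistics books formalize.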
I’ve replaced Mitchell with Bishop as a companion for EoSL. There are more to choose from on the machine learning list if you’re feeling motivated, but the three below give a very solid coverage of most of the core material. MacKay’s book is a great help with information theory if you haven’t yet gotten a solid understanding by the time you get to this point.
Pattern Recognition and Machine Learning – Christopher Bishop
Information Theory, Inference and Learning Algorithms – David J. C. MacKay
The Elements of Statistical Learning – Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Understand features (or predictors, input variables, whatever you prefer to call them) and feature selection. Know your different strategies for dealing with continuous vs. discrete features. Feature selection (especially “embedded methods”), “sparsification” and “shrinkage methods” are all more or less different sides of the same hyper-coin. These ideas seek to eliminate the impact of irrelevant features upon your model by either marginalizing them out entirely or at least reducing their influence by reducing the parameter values for less relevant predictors. The two books below are focused on this problem, but you will also find a lot about regularized regression and other such techniques in other machine learning and statistical learning texts.
Feature Extraction – Isabelle Guyon, et al.
Computational Methods of Feature Selection – Huan Liu, Hiroshi Motoda
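As a small illustration of the shrinkage idea, here is a ridge-regression sketch in Python (ridge rather than the embedded methods those two books focus on; the data and penalty value are made up). The penalty pulls the coefficient of an irrelevant noise feature toward zero while the relevant one survives:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x_relevant = rng.normal(size=n)
x_irrelevant = rng.normal(size=n)            # pure noise, unrelated to y
y = 3.0 * x_relevant + 0.1 * rng.normal(size=n)

X = np.column_stack([x_relevant, x_irrelevant])

def ridge(X, y, lam):
    # Closed-form ridge estimate: (X'X + lam I)^-1 X'y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_ols = ridge(X, y, 0.0)     # lam = 0 recovers ordinary least squares
beta_ridge = ridge(X, y, 20.0)  # the penalty shrinks all coefficients
# The relevant coefficient stays near 3; the irrelevant one is pulled
# toward zero, reducing the influence of the noise feature.
```

Lasso-style penalties push this further by driving irrelevant coefficients exactly to zero, which is the "sparsification" mentioned above.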
Heuristic search and optimization are important topics, and fortunately I have found a great survey text that I am working my way through right now, which is included below from the heuristic search and optimization list. I am also including a link that a friend pointed out to a book and lecture slides on convex optimization that you can get online.
Introduction to Stochastic Search and Optimization by James C. Spall
Convex Optimization (online) – Stephen Boyd
I’ve added two texts on Bayesian networks and graphical models. Graphical models are a cool family of techniques that let you model rich dependence structures between input variables by relying on a graph structure to represent conditional independence.
Modeling and Reasoning with Bayesian Networks – Adnan Darwiche
Probabilistic Graphical Models – Nir Friedman
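The conditional-independence idea can be made concrete with a toy chain-structured model A → B → C over binary variables (my own minimal example with made-up probability tables, not from either book). The graph says the joint factorizes as P(a, b, c) = P(a) P(b | a) P(c | b), which encodes C ⟂ A | B:

```python
import itertools

# Conditional probability tables for the chain A -> B -> C.
P_a = {0: 0.6, 1: 0.4}
P_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}   # P_b_given_a[a][b]
P_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}} # P_c_given_b[b][c]

# The graph structure dictates the factorization of the joint.
joint = {(a, b, c): P_a[a] * P_b_given_a[a][b] * P_c_given_b[b][c]
         for a, b, c in itertools.product([0, 1], repeat=3)}

def conditional_c(given_a, given_b):
    # P(C = 1 | A = given_a, B = given_b), computed from the joint.
    num = joint[(given_a, given_b, 1)]
    den = joint[(given_a, given_b, 0)] + joint[(given_a, given_b, 1)]
    return num / den

# Once B is known, A tells us nothing more about C:
# conditional_c(0, 1) == conditional_c(1, 1) == 0.75
```

That equality is the payoff of the graph: the dependence structure is read off the edges rather than stored in a full joint table.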
For those working with text, I’ve added books from the IR and NLP lists – each of which I think represents the strongest single introductory text and reference. Information Retrieval (IR) and Natural Language Processing (NLP) are active areas of study that continue to become increasingly important as a result of the internet. We use IR every day in search engines, but I think new forms of retrieval systems using statistical NLP are just around the corner.
Introduction to Information Retrieval – Christopher D. Manning (available free online)
Foundations of Statistical Natural Language Processing – Christopher D. Manning
Network theory is another booming area accelerated by the Internet. Social, information and transportation networks all present very challenging and important problems. An especially ripe and tricky topic is the evolution of networks over time, or what is sometimes called “longitudinal analysis” by the social networks folks. Network theory is extremely useful for determining the importance of information or people within networks – for example, link analysis algorithms like PageRank are based upon network theory. I’ve tossed in a book on graph theory in case you need to brush up – you will need it for networks.
Introductory Graph Theory – Gary Chartrand
Social Network Analysis – Stanley Wasserman
Statistical Analysis of Network Data – Eric D. Kolaczyk
Network Flows – Ravindra K. Ahuja
Fundamentals of Queueing Theory – Donald Gross, et al.
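Since PageRank comes up above, here is a minimal power-iteration sketch on a made-up four-page link graph (a toy illustration only, not the production algorithm):

```python
import numpy as np

# A made-up four-page web: page i links to the pages in links[i].
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n, d = 4, 0.85                      # d is the usual damping factor

# Column-stochastic link matrix: M[j, i] = 1/outdegree(i) when i links to j.
M = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        M[j, i] = 1.0 / len(outs)

# Power iteration: repeatedly push rank along the links, mixing in a
# uniform (1 - d)/n teleportation term.
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = (1 - d) / n + d * M @ r

# Page 2, which collects links from pages 0, 1, and 3, ends up ranked highest.
```

The ranking vector is the stationary distribution of a random surfer on the graph, which is why the network-theory background above matters for link analysis.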
You will have to write a lot of code. Learning how to do good engineering work is more important than people realize. Learn some programming languages: Python (NumPy/SciPy), R, and at least one nice functional language (probably Haskell, Clojure, or OCaml). Learn how to work with Linux. Learn about design and testing. Picking up some things on the lean and agile software development approach will likely help – I use it for research projects just as much as software. As I mentioned earlier, algorithms and data structures are critical, and you may also want to learn about parallelism and distributed systems if you work with large data sets.
[ Students working with large data sets. source: flickr ]
All along the way, you might enjoy watching lectures. I have found videolectures.net and Google Video to be the best overall sources. I am very happy to hear of others if you have any recommendations.
You can find a lot more on my Amazon lists and ever-expanding Amazon wish lists. If you are involved in theoretical or applied research in academia or industry, and you disagree with some of these book selections, have alternate recommendations, or you think I am missing some topics, I’d love to hear.
Lastly, here are Michael Jordan’s recommended readings via the hackernews thread that sprung up following my last post. He knows a lot more about machine learning than I do, so perhaps you should just ignore the above recommendations and listen to him.
I personally think that everyone in machine learning should be (completely) familiar with essentially all of the material in the following intermediate-level statistics book:
1.) Casella, G. and Berger, R.L. (2001). “Statistical Inference” Duxbury Press.
For a slightly more advanced book that’s quite clear on mathematical techniques, the following book is quite good:
2.) Ferguson, T. (1996). “A Course in Large Sample Theory” Chapman & Hall/CRC.
You’ll need to learn something about asymptotics at some point, and a good starting place is:
3.) Lehmann, E. (2004). “Elements of Large-Sample Theory” Springer.
Those are all frequentist books. You should also read something Bayesian:
4.) Gelman, A. et al. (2003). “Bayesian Data Analysis” Chapman & Hall/CRC.
and you should start to read about Bayesian computation:
5.) Robert, C. and Casella, G. (2005). “Monte Carlo Statistical Methods” Springer.
On the probability front, a good intermediate text is:
6.) Grimmett, G. and Stirzaker, D. (2001). “Probability and Random Processes” Oxford.
At a more advanced level, a very good text is the following:
7.) Pollard, D. (2001). “A User’s Guide to Measure Theoretic Probability” Cambridge.
The standard advanced textbook is Durrett, R. (2005). “Probability: Theory and Examples” Duxbury.
Machine learning research also reposes on optimization theory. A good starting book on linear optimization that will prepare you for convex optimization:
8.) Bertsimas, D. and Tsitsiklis, J. (1997). “Introduction to Linear Optimization” Athena.
And then you can graduate to:
9.) Boyd, S. and Vandenberghe, L. (2004). “Convex Optimization” Cambridge.
Getting a full understanding of algorithmic linear algebra is also important. At some point you should feel familiar with most of the material in
10.) Golub, G., and Van Loan, C. (1996). “Matrix Computations” Johns Hopkins.
It’s good to know some information theory. The classic is:
11.) Cover, T. and Thomas, J. “Elements of Information Theory” Wiley.
Finally, if you want to start to learn some more abstract math, you might want to start to learn some functional analysis (if you haven’t already). Functional analysis is essentially linear algebra in infinite dimensions, and it’s necessary for kernel methods, for nonparametric Bayesian methods, and for various other topics. Here’s a book that I find very readable:
12.) Kreyszig, E. (1989). “Introductory Functional Analysis with Applications” Wiley.
From: http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html
Teach Yourself Programming in Ten Years
Peter Norvig
Why is everyone in such a rush?

Walk into any bookstore, and you’ll see how to Teach Yourself Java in 7 Days alongside endless variations offering to teach Visual Basic, Windows, the Internet, and so on in a few days or hours. I did the following power search at Amazon.com: pubdate: after 1992 and title: days and (title: learn or title: teach yourself) and got back 248 hits. The first 78 were computer books (number 79 was Learn Bengali in 30 days). I replaced “days” with “hours” and got remarkably similar results: 253 more books, with 77 computer books followed by Teach Yourself Grammar and Style in 24 Hours at number 78. Out of the top 200 total, 96% were computer books.

The conclusion is that either people are in a big rush to learn about computers, or that computers are somehow fabulously easier to learn than anything else. There are no books on how to learn Beethoven, or Quantum Physics, or even Dog Grooming in a few days. Felleisen et al. give a nod to this trend in their book How to Design Programs, when they say “Bad programming is easy. Idiots can learn it in 21 days, even if they are dummies.”

Let’s analyze what a title like Learn C++ in Three Days could mean:
Researchers (Bloom (1985), Bryan & Harter (1899), Hayes (1989), Simon & Chase (1973)) have shown it takes about ten years to develop expertise in any of a wide variety of areas, including chess playing, music composition, telegraph operation, painting, piano playing, swimming, tennis, and research in neuropsychology and topology. The key is deliberate practice: not just doing it again and again, but challenging yourself with a task that is just beyond your current ability, trying it, analyzing your performance while and after doing it, and correcting any mistakes. Then repeat. And repeat again. There appear to be no real shortcuts: even Mozart, who was a musical prodigy at age 4, took 13 more years before he began to produce world-class music. In another genre, the Beatles seemed to burst onto the scene with a string of #1 hits and an appearance on the Ed Sullivan show in 1964. But they had been playing small clubs in Liverpool and Hamburg since 1957, and while they had mass appeal early on, their first great critical success, Sgt. Pepper’s, was released in 1967. Malcolm Gladwell reports that a study of students at the Berlin Academy of Music compared the top, middle, and bottom third of the class and asked them how much they had practiced:
So it may be that 10,000 hours, not 10 years, is the magic number. Samuel Johnson (1709-1784) thought it took longer: “Excellence in any department can be attained only by the labor of a lifetime; it is not to be purchased at a lesser price.” And Chaucer (1340-1400) complained “the lyf so short, the craft so long to lerne.” Hippocrates (c. 400BC) is known for the excerpt “ars longa, vita brevis”, which is part of the longer quotation “Ars longa, vita brevis, occasio praeceps, experimentum periculosum, iudicium difficile”, which in English renders as “Life is short, [the] craft long, opportunity fleeting, experiment treacherous, judgment difficult.” Although in Latin, ars can mean either art or craft, in the original Greek the word “techne” can only mean “skill”, not “art”. Here’s my recipe for programming success:
With all that in mind, it’s questionable how far you can get just by book learning. Before my first child was born, I read all the How To books, and still felt like a clueless novice. 30 months later, when my second child was due, did I go back to the books for a refresher? No. Instead, I relied on my personal experience, which turned out to be far more useful and reassuring to me than the thousands of pages written by experts.

Fred Brooks, in his essay No Silver Bullet, identified a three-part plan for finding great software designers:
This assumes that some people already have the qualities necessary for being a great designer; the job is to properly coax them along. Alan Perlis put it more succinctly: “Everyone can be taught to sculpt: Michelangelo would have had to be taught how not to. So it is with the great programmers.”

So go ahead and buy that Java book; you’ll probably get some use out of it. But you won’t change your life, or your real overall expertise as a programmer in 24 hours, days, or even months.

References

Bloom, Benjamin (ed.), Developing Talent in Young People, Ballantine, 1985.
Brooks, Fred, No Silver Bullet, IEEE Computer, vol. 20, no. 4, 1987, pp. 10-19.
Hayes, John R., Complete Problem Solver, Lawrence Erlbaum, 1989.
Chase, William G. & Simon, Herbert A., “Perception in Chess”, Cognitive Psychology, 1973, 4, 55-81.
Lave, Jean, Cognition in Practice: Mind, Mathematics, and Culture in Everyday Life, Cambridge University Press, 1988.

Answers

Approximate timing for various operations on a typical PC:
Appendix: Language Choice

Several people have asked what programming language they should learn first. There is no one answer, but consider these points:
Given these criteria, my recommendations for a first programming language would be Python or Scheme. But your circumstances may vary, and there are other good choices. If your age is a single-digit, you might prefer Alice or Squeak (older learners might also enjoy these). The important thing is that you choose and get started.

Appendix: Books and Other Resources

Several people have asked what books and web pages they should learn from. I repeat that “book learning alone won’t be enough” but I can recommend the following:
Notes

T. Capey points out that the Complete Problem Solver page on Amazon now has the “Teach Yourself Bengali in 21 days” and “Teach Yourself Grammar and Style” books under the “Customers who shopped for this item also shopped for these items” section. I guess that a large portion of the people who look at that book are coming from this page. Thanks to Ross Cohen for help with Hippocrates.
Comment: This is a ‘getting started’ list. The references below all have their own references, that will take you in many directions.
1. Readings for the non-specialist
(a) Hoste, Thistlethwaite and Weeks, The First 1,701,936 Knots, Scientific American, 20, No. 4, 1998. A delightful article which conveys much about the state of knowledge in 1999 about knot classification, by describing how the three authors accomplished their enormous task of classifying knots up to 16 crossings.
(b) Artin, E., The Theory of Braids, American Scientist, 38, 1950, pp. 112-119. A delightful and very accessible article on an aspect of knot theory which has become very important in all areas of mathematics. Passages from this article were used in the GRE exams in the mid-1950s! Directed at an audience of non-mathematicians.
(c) Lickorish and Millett, The New Polynomial Invariant of Knots and Links, Mathematics Magazine, 61, No. 1, February 1988. The authors were awarded the Chauvenet Prize for “Expository Writing” by the MAA, for this article. It’s at a level which is accessible to bright high school students.
(d) Livingston, Knot Theory, MAA Carus Monographs No. 24, 1993. A survey, directed at mathematicians with backgrounds in other areas. Not a textbook. Clear, and a good guide to further reading.
(e) Kauffman, L. State Models and the Jones Polynomial, Topology, 26, 1987, 395-407. An important research article which can be read with a minimum of background in the subject.
(f) Sumners, D.W. The Role of Knot Theory in DNA Research, 297-317, Geometry and Topology, Editors McCrory and Shifrin, Marcel Dekker 1987. A talk given to a group of mathematicians at a recent conference, to tell them about problems which are of interest to biologists. It contains a useful “dictionary” to translate the lingo of chemistry and biology into mathematical terms. It also has references, if you want to learn more about the biology.
2. Readings for Mathematics graduate students or upper level undergrads
(a) Rolfsen, D. Knots and Links, Publish or Perish, 1978. A graduate textbook on knot theory, out of date, but excellent.
(b) Hempel, John, 3-Manifolds, Annals of Math Studies #86, Princeton University Press 1976. Very basic stuff.
(c) Prasolov, V.V. and Sossinsky, A.B. Knots, Links, Braids and 3-manifolds, AMS Translations of Mathematical Monographs 154, 1996.
(d) Saveliev, Nikolai, Lectures on the topology of 3-manifolds: an introduction to the Casson invariant, deGruyter 1996.
(e) Manturov, Vassily, Knot Theory, Chapman and Hall 2004.
(f) Scott, Peter, The Geometry of 3-Manifolds, Bull. London Math. Soc. 15, 1983, 401-487.
(g) Birman, J.S. New points of view in knot and link theory, Bull. AMS, April 1993. A survey article on the state of the art as regards finite type invariants, in 1993. Directed at an audience of graduate students and research mathematicians. Won the Chauvenet Prize.
From: http://www.math.columbia.edu/~jb/readinglist.pdf
In machine learning, we often take probability for granted. We desire a system for representing uncertainty in the world, and Cox’s theorem tells us that if we accept some basic postulates regarding what is desirable in a system of uncertainty, we will end up with probability.
So that should be the end of the story… right? Well, maybe not. The first Cox postulate is
Divisibility and comparability – The plausibility of a statement is a real number and is dependent on information we have related to the statement,
which seems quite innocent. However, who’s to say that there is anything fundamental about real numbers? Real numbers have strange things like irrational numbers and negative numbers (crazy, I know), but they’re lacking in comparison to imaginary numbers: there’s no real-valued operation you can apply four times in a row before first returning to your original value, as you can with complex numbers by multiplying by i. It seems kind of arbitrary to choose real numbers. For a fun and interesting read, see the following link. It makes the point better than I can:
Negative numbers aren’t easy. Imagine you’re a European mathematician in the 1700s. You have 3 and 4, and know you can write 4 – 3 = 1. Simple.
But what about 3-4? What, exactly, does that mean? How can you take 4 cows from 3? How could you have less than nothing?
Negatives were considered absurd, something that “darkened the very whole doctrines of the equations” (Francis Maseres, 1759). Yet today, it’d be absurd to think negatives aren’t logical or useful. Try asking your teacher whether negatives corrupt the very foundations of math.
http://betterexplained.com/articles/a-visual-intuitive-guide-to-imaginary-numbers/
Imaginary numbers come up in the context of systems of uncertainty when we deal with quantum mechanics. The basic idea is that interactions operate over amplitudes (expressed as complex numbers), then to determine the likelihood of a final configuration, you look at norms of amplitudes. For a relatively straightforward explanation, see here: http://lesswrong.com/lw/pd/configurations_and_amplitude/
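The amplitude rule sketched above can be shown in a few lines. This is a toy illustration of the idea only (made-up amplitudes, not real quantum-mechanical machinery): amplitudes for indistinguishable paths add, and an outcome’s probability is the squared norm of its total amplitude.

```python
import math

# Two paths to the same configuration, with equal magnitude but opposite phase.
path_a = complex(1 / math.sqrt(2), 0)
path_b = complex(-1 / math.sqrt(2), 0)

# Quantum rule: add amplitudes first, then take the squared norm.
p_quantum = abs(path_a + path_b) ** 2              # 0.0: destructive interference

# Classical rule: each path contributes its own probability.
p_classical = abs(path_a) ** 2 + abs(path_b) ** 2  # 1.0: probabilities just add

# Cancellation before taking norms is exactly what nonnegative real
# probabilities cannot express.
```

Destructive interference of this kind is the concrete sense in which amplitude-based models are richer than standard probability models.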
So I don’t necessarily have any well-formed thoughts on the matter (yet?), but it’s fun to think about other principled ways of representing uncertainty. I’m curious to know if there are types of interactions useful for machine learning that would be hard to represent with standard probability models but that would be aided by these types of quantum models.
Finally, I leave you with this blog comment from The Blog of Scott Aaronson:
“graphical models with amplitudes instead of probabilities” is a fair definition of a quantum circuit (and therefore a quantum computer).
That, it seems to me, is worth understanding more deeply.
From: http://blog.smellthedata.com/2010/06/uncertainty-probability-and-quantum.html
From the comments on my last post:
scott would you be so kind, if you have some spare time, to post a list of textbooks that you’ve read in math and cs?
Now this is the kind of blog topic I like: zero expenditure of emotional energy required; lends itself to snarky one-liners. So here’s a list of math and CS books that I’ve read and you should too. Pitifully incomplete, but enough to get you started.
- Computational Complexity, by Christos Papadimitriou
The Iliad of complexity. An epic poem to read and reread until you can quote it by section number, until the pages fall out of the spine. Christos is not a textbook writer but a bard — and the only problem with his tale is that it ends around the late 1980s. Christos, if you’re reading this: we want an Odyssey!
- Gems of Theoretical Computer Science, by Uwe Schöning and Randall Pruim
The proofs are terse, but I love the division into short, digestible “gems.” Keep this one on your night table or by the toilet. (But not until you’ve mastered Papadimitriou and are ready to wield BP operators like a Fortnow.)
- Computational Complexity: A Modern Approach, by Sanjeev Arora
The newest entrant in the arena. Best of all is that the current draft is free and online. I love Sanjeev’s use of “lowerbound” as one word.
- Lecture notes by Luca Trevisan, Umesh Vazirani, John Preskill, …
I don’t know if these count as textbooks, but read them anyway.
- Quantum Computation and Quantum Information, by “Mike & Ike” (Michael Nielsen and Isaac Chuang)
The best quantum computing book. I open it to the section on fidelity and trace distance pretty much every time I write a paper. (I’ve heard the other sections are excellent too.)
- Randomized Algorithms, by Rajeev Motwani and Prabhakar Raghavan
Chernoff bounds, random walks, second eigenvalues, PCPs … a 1-1/e fraction of what you need to know about randomness.
- Computers and Intractability: A Guide to the Theory of NP-Completeness, by Michael Garey and David Johnson
Changed my life when I read it at fifteen. Since then it’s sat on my shelf, but I’d take it down if I actually had to prove something NP-complete.
- An Introduction to Computational Learning Theory, by Michael Kearns and Umesh Vazirani
Since this book was co-authored by my advisor, I’ll refrain from saying anything about it except that it’s excellent. (And short.)
- Artificial Intelligence: A Modern Approach, by Stuart Russell and Peter Norvig
Almost (but not quite) made me go into AI. My favorite chapter is the last one, which carefully demolishes the arguments of John Searle and the other confuseniks.
- Complexity and Real Computation, by Lenore Blum, Felix Cucker, Michael Shub, and Steve Smale
Decidability of the Mandelbrot set? P versus NP over the complex numbers? I may be a Boolean chauvinist, but I knows an elegant theory when I sees one.
- The Art of Computer Programming, by Donald Knuth
Three-volume set looks mighty impressive on a shelf, but beware of MMIX and constant factors.
- Set Theory and the Continuum Hypothesis, by Paul Cohen
The book that gave me the illusion of understanding logic. Short and hard. Prerequisites: None.
- Topology: A First Course, by James Munkres
What every math book should be.
- The Book of Numbers, by John Conway and Richard Guy
Since this is a popular book, obviously I couldn’t have learned anything new from it, but it was nice to “refresh my memory” about octonions, Heegner numbers, and why e^(π√163) is within 0.00000000000075 of an integer.
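(An aside from me, not Conway and Guy: that near-integer claim is easy to check yourself with high-precision arithmetic. Here’s a quick sketch using Python’s standard decimal module; the hard-coded digits of π are an input, not computed.)

```python
from decimal import Decimal, getcontext

# Work with 50 significant digits: a float64 cannot resolve an 18-digit
# integer part plus a fractional gap of roughly 1e-13 at the same time.
getcontext().prec = 50

PI = Decimal("3.14159265358979323846264338327950288419716939937510")

x = (PI * Decimal(163).sqrt()).exp()   # e^(pi * sqrt(163))
nearest = x.quantize(Decimal(1))       # round to the nearest integer
gap = nearest - x

print(nearest)   # 262537412640768744, which is 640320^3 + 744
print(gap)       # roughly 7.5e-13, matching the figure quoted above
```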
- The Road to Reality: A Complete Guide to the Laws of the Universe, by Roger Penrose
Preface: “Even if you hated fractions in elementary school, have no fear! I’ve tried to make this book accessible to you as well.”
Chapter 2: “Consider a Lie algebra of sheaves over the holomorphic fibre bundle PZL(Zn,5)…” (Not really, but close.)
I struggled through Penrose’s masterpiece, but by the end, I felt like I’d come as close as I ever had (and possibly ever will) to understanding “post-1920s” particle physics and the math underlying it. If you’re willing to invest the effort, you’ll find The Road to Reality so excellent that it “cancels out” Shadows of the Mind, like an electron annihilating its positronic companion.
Q4: Choosing a First Machine Learning Project: Start by Reading or by Doing?
A4: [From http://blog.smellthedata.com/2010/07/choosing-first-machine-learning-project.html]
Sarath writes about doing a project during his final year of university related to machine learning:
I am writing this email to ask for some advice. well the thing is i haven’t decided on my project yet, as i decided it will be better if i took some time to just strengthen my fundamentals and may be work on something small. well i came across this great blog called measuring measures where they had put up a reading list for machine learning and it was may i say a bit overwhelming. http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html
My present goal is doing a graduate course in some good university with some good machine learning research and one of the reason i wanted to do a great project as i have heard that would be a great way to getting into a good university.
So my question is should my first priority be getting a really good and deep understanding of the subject or should i be more concerned with doing some good project with respect to admissions?
There are others who are likely more qualified than I am to answer this one, but here are my two cents:
That post certainly has things that would be nice to learn, but you don’t need to know all of that in order to be a successful researcher. Depending on what area you go into, you might need different subsets of those references, or you might need something different altogether. (For example, a reference I go back to time and time again is Schrijver’s Combinatorial Optimization, but it’s not on that list.)
I think you should pick a project in an area that you find interesting, then just dive in. At first, I’d be less concerned with doing something new. First, focus on understanding a couple different existing approaches to the specific problem you’ve chosen, and pick up the necessary background as you go by trying to implement the algorithms and replicate published results, following references when you get confused, looking up terms, etc. Perhaps most importantly, work on your research skills. Important things:
- Clearly write up exactly what you are doing and why you are doing it. Keep it as short as possible while still having all the important information.
- Set up a framework so you are organized when running experiments.
- Even if the results are not state of the art or terribly surprising, keep track of all the outputs of all your different executions with different data sets as inputs, different parameter settings, etc.
- Visualize everything interesting about the data you are using, the execution of your algorithms, and your results. Look for patterns, and try to understand why you are getting the results that you are.
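As a concrete illustration of the second and third points, here is a minimal sketch (my own, in Python; the names `log_run` and `load_runs` are hypothetical, not from the original post) of recording every run’s parameters and metrics in an append-only JSON-lines file, so that no execution is ever lost or overwritten:

```python
import json
import pathlib
import tempfile
import time

def log_run(results_dir, params, metrics):
    """Append one experiment run (its parameters and resulting metrics)
    to a JSON-lines log file under results_dir."""
    path = pathlib.Path(results_dir)
    path.mkdir(parents=True, exist_ok=True)
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with (path / "runs.jsonl").open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record

def load_runs(results_dir):
    """Read back every logged run, for later comparison or visualization."""
    log = pathlib.Path(results_dir) / "runs.jsonl"
    if not log.exists():
        return []
    with log.open() as f:
        return [json.loads(line) for line in f]

# Record two runs of the same (hypothetical) algorithm with different settings.
results_dir = tempfile.mkdtemp()
log_run(results_dir, {"learning_rate": 0.1, "dataset": "iris"}, {"accuracy": 0.91})
log_run(results_dir, {"learning_rate": 0.01, "dataset": "iris"}, {"accuracy": 0.94})
runs = load_runs(results_dir)
```

One line of JSON per run keeps the log trivially appendable and easy to grep or load into a table later, which is most of what “being organized” buys you.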
All the while, be on the lookout for specific cases where an algorithm doesn’t work very well, assumptions that seem strange, or connections between the approach you’re working on to other algorithms or problems that you’ve run across before. Any of these can be the seed of a good research project.
In my estimation, graduate schools would be more impressed by a relevant, carefully done project, even if it’s not terribly novel, than by your saying on your application that you have read a lot of books.
If you’re looking for project ideas, check out recent projects that have been done by students of Andrew Ng’s machine learning course at Stanford:
http://www.stanford.edu/class/cs229/projects2008.html
http://www.stanford.edu/class/cs229/projects2009.html
Perhaps some readers who have experience on graduate committees can correct or add to anything that I said that was wrong or incomplete.
Q5: What are some good resources for learning about machine learning?
A5: [From http://blog.smellthedata.com/2010/06/resources-for-learning-about-machine.html and http://www.quora.com/What-are-some-good-resources-for-learning-about-machine-learning]
There were some good answers, including some resources I didn’t know about. Here’s a sampling:
My answer was Andrew Ng’s YouTube videos:
http://www.youtube.com/view_play_list?p=A89DCFA6ADACE599
Some other good ones:
Jie Tang says…
Mike Jordan and his grad students teach a course at Berkeley called Practical Machine Learning which presents a broad overview of modern statistical machine learning from a practitioner’s perspective. Lecture notes and homework assignments from last year are available at
http://www.cs.berkeley.edu/~jordan/courses/294-fall09/
A Google search will also turn up material from past years.
Ben Newhouse says…
The textbook “Elements of Statistical Learning” has an obscene amount of material in it and is freely available in PDF form via http://www-stat.stanford.edu/~tibs/ElemStatLearn/
While more niche than general machine learning, I recently ripped through “Natural Image Statistics” (also downloadable at http://www.naturalimagestatistics.net/ ). It’s a great read both for its explanations of your standard ML algorithms (PCA, ICA, mixtures of Gaussians, etc.) and for its real-world applications/examples in trying to understand the models used for analysis in our neural vision system.
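(Since PCA keeps coming up in these resources, here is a toy, self-contained sketch of its core idea — my illustration, not from the book, and the function name `pca_first_component` is made up: power iteration on a 2x2 covariance matrix recovers the dominant direction of some synthetic points.)

```python
import math
import random

def pca_first_component(points, iters=100):
    """Estimate the first principal component of 2-D points by running
    power iteration on their 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the covariance matrix [[cxx, cxy], [cxy, cyy]].
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    vx, vy = 1.0, 0.0  # arbitrary starting direction
    for _ in range(iters):
        # Multiply by the covariance matrix, then renormalize.
        wx = cxx * vx + cxy * vy
        wy = cxy * vx + cyy * vy
        norm = math.hypot(wx, wy)
        vx, vy = wx / norm, wy / norm
    return vx, vy

# Synthetic points scattered around the line y = 2x; the recovered
# component should point roughly along (1, 2) / sqrt(5).
random.seed(0)
pts = [(t, 2 * t + random.gauss(0, 0.1)) for t in range(-10, 11)]
direction = pca_first_component(pts)
```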
Jeremy Leibs offers the staple, David MacKay’s book (MacKay, I believe, would say that machine learning is just information theory):
“Information Theory, Inference, and Learning Algorithms” by David MacKay has some decent introductory material if I remember. Available online:
http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
Incidentally, I haven’t read Programming Collective Intelligence, but it seems popular amongst non researchers. Do any of you know more about it?