You are currently browsing the monthly archive for December 2010.

Top five posts of 2010 witten by John D. Cook in his blog The Endeavour:

  1. Write-only articles
  2. Math library functions that seem unnecessary
  3. Don’t invert that matrix
  4. What does this code do?
  5. Why computers have two zeros: +0 and -0

Quotes, Quoting, and Quotations witten by rjlipton in the blog Gödel’s Lost Letter and P=NP.

Advertisements

 

EDIT: I’ve started a home for machine learning folks powered by Eggsprout at http://machine-learning.eggsprout.com. Please join me and bring others like us together. Don’t be alarmed if the site is a little empty now, all communities have to start somewhere .

If you’re that kind of person and you somehow came across this blog, here’s what I’ve found useful so far:

Stanford CS229 Machine Learning Course on Youtube: Probably the most useful and relevant resource for a beginner. I somewhat enjoy Andrew Ng’s teaching style. I bet he’d be a great mentor to have for research. The way the course is organized is very different from how professor Pedro Domingos taught us though. Not far enough in the series to say whether I like it more or not.

CS229 Lecture Notes: Lecture notes that accompany the Youtube videos.

UW Part-time Masters Lectures: Taught by professor Pedro Domingos, awesome teacher and incredibly genius.

VideoLectures.net: These look like they’d be more advanced but interesting nonetheless. So much to watch… so little time.

ResearchChannel.org: I love this resource for just about any topic. I’ve spent hours on this studying neurobiology back when I first entered college. Hopefully the machine learning topics here are just as interesting.

Machine Learning Data Repository: Nice repository of data for when you’re ready to practice using an algorithm. One of my homework assignments from college was from here.

Ruby A.I Plugins: A few ruby machine learning libraries. I’m a rubyist and it’d be nice to see more of these. Maybe I’ll contribute to these some day .

Tegu: A machine learning system in Ruby developed by David Richards. He looks like an interesting character and I’m keeping an eye on him and his projects. I like his dedication to ruby and machine learning.

Aside: Why do Google searches on “Machine Learning” always turn up results about sewing machines?

From: http://ianma.wordpress.com/2009/07/19/machine-learning-for-beginners/

http://anyall.org/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/

 http://anyall.org/blog/2008/12/statistics-vs-machine-learning-fight/

In my previous post on learning about statistical learning, I took a shot at creating a guide for those who want to learn about statistical learning.  I received a lot of great feedback from friends in the field, the comments on my blog, and discussions around the web like the hackernews thread.   This revised edition of the post takes these discussions and feedback into account.

I humbly submit this guide, upon material over which I do not have full mastery, in the case that it may be of use to others trying to find their way.

[ Student working in Statistics Machine Room. source: flickr ]

Fragmentation and Ambiguity: what’s in a name?

I renamed this second edition of my post to “Learning about Machine Learning” from “Learning about Statistical Learning.”  What is the better name?  I don’t know.  Things are a bit fragmented.  There are some nuances and some of the fragmentation makes sense.  In my opinion, most of it doesn’t.  You have “computational learning,” “computational statistics,” “statistical learning,” “machine learning” and “learning theory”.  You have Statistics vs. Machine Learning, the culture of algorithms vs. machine learning, the stats handicap,  and COMPUTATIONAL statistics vs. computational STATISTICS.

It can be confusing.  I bring this up to help you preempt the confusion.  When you see these different terms, don’t worry about it, just focus on the topics you need to learn.  In this guide I will try to help point out where you will see the same thing by different names.

Straight into the Deep End

In the previous list, I thought it would be good to recommend some lighter texts as introductions to topics like probability theory and machine learning.  Based upon subsequent discussions and feedback, I’ve changed my view.  Straight into the deep end is the way to go.

The caveat is that you need to be honest with yourself about when you encounter something that you do not understand and you are unable to work through despite trying your best.  At this point you’ve found your way to some material that you do not have the foundation for.  Put the book down, insert a bookmark.  Close the book.  Walk away.  Go find a resource to build the foundation.  Come back later.  Repeat.

Don’t be afraid to admit you don’t understand something, that is the only way you are going to move forward and really understand the material.  As Peter Norvig says about programming, be prepared to make a serious commitment of time and effort if you want to be a researcher.

The Books

I am making a number of changes which I mention along the way.  The overall list is now structured by topic with inline notes and there are a number of new topics so that you can pick and choose based upon your areas of interest.

Now on to the books.

I’m moving the foundation in proofs and analysis up to be a the top priority, as well as adding some new references from the analysis list.  You won’t get far if you are intimidated by Greek characters and unable to follow proofs.  If you build an intuition for programming as proof writing, that will also bend your mind in the right direction.  I’m including a link to one of the fun puzzle books that I find helpful – logic puzzles can help get you into the flow of proofs.  I’m not including a book on functional analysis here, but I have a few you can choose from on the analysis list if you are interested.

Introduction to Analysis – Maxwell Rosenlicht
Mathematical Analysis – Apostol
How to Prove It: A Structured Approach – Daniel J. Velleman
How to Solve it – Gyorgy Polya
Proofs are Programs [pdf] – Philip Wadler
The Lady and the Tiger – Raymond Smullyan

You will be working with algorithms constantly, so work through a couple books from the algorithms and data structures list.  We are also in an age of rapidly increasing data volume, so I am also recommending a book to help you learn how decompose algorithms and work with Map Reduce style parallelism – a great read which is not published yet but you can get the freePDF online.

Introduction to Algorithms by Charles E. Leiserson
The Algorithms Design Manual – Steven Skiena
Data-Intensive Text Processing with MapReduce – Jimmy Lin and Chris Dyer
MapReduce for Machine Learning on Multicore – from a group in the Stanford CS department

Linear algebra is critical – matrices are your friends.  A lot of what you will see is represented and computed naturally in matrix notation – it is convenient for both theory and practice.  You can find a variety of items in the matrix fu list, and you can also find Gilbert Strag’s lectures online from MIT courseware.  Matrix algebra, like algorithms, is a fundamental building block that you will work with on a daily basis.

Introduction to Linear Algebra, Fourth Edition by Gilbert Strang
Matrix Algebra – James Gentle

In this second edition of my post, I’m adding a strong focus on linear models, as they are the foundation for a lot of models that you will encounter in machine learning.  The books in the machine learning section below will help to put linear models in a broader setting.  For example Chapter 2 of The Elements of Statistical Learning has a nice overview section of least squares and nearest neighbors that helps you understand how many other techniques are based upon these two simple building blocks.  Some sources also have nice discussions of the relationship between linear models and other approaches like neural networks.  Starting with a strong foundation in linear models is wise.  I had not done so properly when jumping into machine learning, and it has required a lot of backtracking.

Linear Models With R – Julian James Faraway
An Introduction to Generalized Linear Models – Annette J. Dobson
Applied Linear Statistical Models – John Neter

There are many titles to choose from the probability list, in order to build a base in probability theory.  I’m also adding a reference for looking at probability from the Bayesian perspective.  Probability can be very counter-intuitive.  A great way to get around this is to train yourself to understand how we misinterpret things.  Read one of theKaneman and Tversky books from my finance, trading and economics list, and work through some puzzle books.  Aside from psycological biases, a lot of the trickiness of empirical probability, especially conditional probabilities, is knowing what and how to count.  If you want to explore counting, there a number of good books on the probability list.

A Course in Probability Theory, Revised Edition, Second Edition by Kai Lai Chung
First Look at Rigorous Probability Theory by Jeffrey S. Rosenthal
Probability Theory: The Logic of Science – E. T. Jaynes

When you are reading to build an understanding of statistics on top of probability theory, move on to the statistics list.  Statistics and inference are critical in machine learning, and it is also nice to understand how information theory fits into this picture.  I haven’t included an information theory book here, but you can find a nice one that fits information theory with statistics on the statistics list.  A lot of bits and pieces on statistics will have been covered already with probability theory and linear models.  I’ve added a reference on hypothesis testing and multivariate statistics.  I’ve also created a new new section below on robust and non-parametric statistics.

All of Statistics: A Concise Course in Statistical Inference (Springer Texts in Statistics) by Larry Wasserman
Testing Statistical Hypothesis – E. L. Lehmann
Bayesian Statistics – William M. Bolstad
Modern Multivariate Statistical Techniques – Alan Julian Izenman

Non-parametric and robust statistics are very neat topics.  Non-parametric statistics, can mean a few things but the core intuition is that the techniques do not assume a parametrized model for the probability distribution underlying the observed data.  The notion of robust statistics stems from the desired to create statistical methods that are 1) resistant to the deviation from assumptions that empirical data often exhibit, and 2) tolerant to outliers as can be measured by the breakdown point – the point at which infinitely large outliers causes an estimator to yield an infinitely large result.

Robust Statistics – Peter J. Huber
Robust Statistics: The Approach Based on Influence Functions – Frank R. Hampel
Robust Regression – Peter J. Rousseeuw
All of Nonparametric Statistics – Larry Wasserman

I’ve replaced Mitchell with Bishop as a companion for EoSL.  There are more to choose from on the machine learning list if you’re feeling motivated, but the three below give a very solid coverage of most of the core material.  MacKay’s book is a great help with information theory if you haven’t yet gotten a solid understanding by the time you get to this point.

Pattern Recognition and Machine Learning – Christopher Bishop
Information Theory, Inference and Leaning Algorithms – David J. C. MacKay
The Elements of Statistical Learning by Robert Tibshirani, et al.

Understand features (or predictors, input variables, whatever you prefer to call them) and feature selection.  Know your different strategies for dealing with continuous vs. discrete features. Feature selection (especially “embedded methods”),  “sparsification” and “shrinkage methods” are all more or less different sides of the same hyper-coin.  These ideas seek to eliminate the impact of irrelevant features upon your model by either marginalizing them out entirely or at least reducing their influence by reducing the parameter values for less relevant predictors.  The two books below are focused on this problem, but you will also find a lot about regularized regression and other such techniques in other machine learning and statistical learning texts.

Feature Extraction – Isabelle Guyon, et al.
Computational Methods of Feature Selection – Huan Liu, Hiroshi Motoda

Heuristic search and optimization
are important topics, and fortunately I have found a great survey text that I am working my way through right now, which is included below from the heuristic search and optimization list.  I am also including a link that a friend pointed out to a book and lecture slides on convex optimization that you can get online.

Introduction to Stochastic Search and Optimization by James C. Spall
Convex Optimization (online) – Stephen Boyd

I’ve added two texts on Bayesian networks and graphical modelsGraphical models are a cool family of techniques that let you model rich dependence structures between input variables by relying on a graph structure to represent conditional independence.

Modeling and Reasoning with Bayesian Networks – Adnan Darwiche
Probabilistic Graphical Models – Nir Friedman

For those working with text, I’ve added a books from the IR and NLP list – each of which I think represents the strongest single introductory text and reference.  Information Retrieval (IR) and Natural Language Processing (NLP) are active areas of study that continue to become and increasingly important as a result of the internet.  We use IR every day in search engines, but I think new forms of retrieval systems using statistical NLP are just around the corner.

Introduction to Information Retrieval – Christopher D. Manning (available free online)

Foundations of Statistical Natural Language Processing – Christopher D. Manning
Network Theory is another booming area accelerated by the Internet.  Social, information and transportation networks all present very challenging and important problems.  As especially ripe and tricky topic is the evolution of networks over time, or what is sometimes called “longitudinal analysis” by the social networks folks.  Network theory is extremely useful for determining the importance of information or people within networks – for example, link analysis algorithms like  PageRank are based upon network theory.  I’ve tossed in a book on graph theory in case you need to brush up – you will need it for networks.

Introductory Graph Theory – Gary Chartrand
Social Network Analysis – Stanley Wasserman
Statistical Analysis of Network Data – Eric D. Kolaczyk
Network Flows – Ravindra K. Ahuja
Fundamentals in Queuing Theory – Donald Gross, et al.

You will have to write a lot of code.  Learning how to do good engineering work is more important than people realize.  Learn some programming languages; python (numpy/scipy), R, and at least one nice functional language (probably Haskell, Clojure, or OCaml).  Learn how to work with Linux.  Learn about design and testing.  Picking up some things on the lean and agile software development approach will likely help – I use it for research projects just as much as software.  As I mentioned earlier, algorithms and data structures are critical, and you may also want to learn about parallelism and distributed systems if you work with large data sets.

[ Students working with large data sets.  source: flickr ]

All along the way, you might enjoy watching lectures.  I have found videolectures.net and google video to be the best overall sources.  I am very happy to hear others if you have any recommendations.

You can find a lot more on my Amazon lists and ever-expanding Amazon wish lists.  If you are involved in theoretical or applied research in academia or industry, and you disagree with some of these book selections, have alternate recommendations, or you think I am missing some topics, I’d love to hear.

Lastly, here are Michael Jordan‘s recommended readings via the hackernews thread that sprung up following my last post.  He knows a lot more about machine learning than I do, so perhaps you should just ignore the above recommendations and listen to him.

I personally think that everyone in machine learning should be (completely) familiar with essentially all of the material in the following intermediate-level statistics book:

1.) Casella, G. and Berger, R.L. (2001). “Statistical Inference” Duxbury Press.

For a slightly more advanced book that’s quite clear on mathematical techniques, the following book is quite good:

2.) Ferguson, T. (1996). “A Course in Large Sample Theory” Chapman & Hall/CRC.

You’ll need to learn something about asymptotics at some point, and a good starting place is:

3.) Lehmann, E. (2004). “Elements of Large-Sample Theory” Springer.

Those are all frequentist books. You should also read something Bayesian:

4.) Gelman, A. et al. (2003). “Bayesian Data Analysis” Chapman & Hall/CRC.

and you should start to read about Bayesian computation:

5.) Robert, C. and Casella, G. (2005). “Monte Carlo Statistical Methods” Springer.

On the probability front, a good intermediate text is:

6.) Grimmett, G. and Stirzaker, D. (2001). “Probability and Random Processes” Oxford.

At a more advanced level, a very good text is the following:

7.) Pollard, D. (2001). “A User’s Guide to Measure Theoretic Probability” Cambridge.

The standard advanced textbook is Durrett, R. (2005). “Probability: Theory and Examples” Duxbury.

Machine learning research also reposes on optimization theory. A good starting book on linear optimization that will prepare you for convex optimization:

8.) Bertsimas, D. and Tsitsiklis, J. (1997). “Introduction to Linear Optimization” Athena.

And then you can graduate to:

9.) Boyd, S. and Vandenberghe, L. (2004). “Convex Optimization” Cambridge.

Getting a full understanding of algorithmic linear algebra is also important. At some point you should feel familiar with most of the material in

10.) Golub, G., and Van Loan, C. (1996). “Matrix Computations” Johns Hopkins.

It’s good to know some information theory. The classic is:

11.) Cover, T. and Thomas, J. “Elements of Information Theory” Wiley.

Finally, if you want to start to learn some more abstract math, you might want to start to learn some functional analysis (if you haven’t already). Functional analysis is essentially linear algebra in infinite dimensions, and it’s necessary for kernel methods, for nonparametric Bayesian methods, and for various other topics. Here’s a book that I find very readable:

12.) Kreyszig, E. (1989). “Introductory Functional Analysis with Applications” Wiley.

From: http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html

Teach Yourself Programming in Ten Years

Peter Norvig

Why is everyone in such a rush?

Walk into any bookstore, and you’ll see how to Teach Yourself Java in 7 Days alongside endless variations offering to teach Visual Basic, Windows, the Internet, and so on in a few days or hours. I did the following power search at Amazon.com:

     pubdate: after 1992 and title: days and
      (title: learn or title: teach yourself)

and got back 248 hits. The first 78 were computer books (number 79 was Learn Bengali in 30 days). I replaced “days” with “hours” and got remarkably similar results: 253 more books, with 77 computer books followed by Teach Yourself Grammar and Style in 24 Hours at number 78. Out of the top 200 total, 96% were computer books.The conclusion is that either people are in a big rush to learn about computers, or that computers are somehow fabulously easier to learn than anything else. There are no books on how to learn Beethoven, or Quantum Physics, or even Dog Grooming in a few days. Felleisen et al. give a nod to this trend in their book How to Design Programs, when they say “Bad programming is easy. Idiots can learn it in 21 days, even if they are dummies.

Let’s analyze what a title like Learn C++ in Three Days could mean:

  • Learn: In 3 days you won’t have time to write several significant programs, and learn from your successes and failures with them. You won’t have time to work with an experienced programmer and understand what it is like to live in a C++ environment. In short, you won’t have time to learn much. So the book can only be talking about a superficial familiarity, not a deep understanding. As Alexander Pope said, a little learning is a dangerous thing. 
  • C++: In 3 days you might be able to learn some of the syntax of C++ (if you already know another language), but you couldn’t learn much about how to use the language. In short, if you were, say, a Basic programmer, you could learn to write programs in the style of Basic using C++ syntax, but you couldn’t learn what C++ is actually good (and bad) for. So what’s the point? Alan Perlis once said: “A language that doesn’t affect the way you think about programming, is not worth knowing”. One possible point is that you have to learn a tiny bit of C++ (or more likely, something like JavaScript or Flash’s Flex) because you need to interface with an existing tool to accomplish a specific task. But then you’re not learning how to program; you’re learning to accomplish that task. 
  • in Three Days: Unfortunately, this is not enough, as the next section shows.

Teach Yourself Programming in Ten Years

Researchers (Bloom (1985), Bryan & Harter (1899), Hayes (1989), Simmon & Chase (1973)) have shown it takes about ten years to develop expertise in any of a wide variety of areas, including chess playing, music composition, telegraph operation, painting, piano playing, swimming, tennis, and research in neuropsychology and topology. The key is deliberative practice: not just doing it again and again, but challenging yourself with a task that is just beyond your current ability, trying it, analyzing your performance while and after doing it, and correcting any mistakes. Then repeat. And repeat again. There appear to be no real shortcuts: even Mozart, who was a musical prodigy at age 4, took 13 more years before he began to produce world-class music. In another genre, the Beatles seemed to burst onto the scene with a string of #1 hits and an appearance on the Ed Sullivan show in 1964. But they had been playing small clubs in Liverpool and Hamburg since 1957, and while they had mass appeal early on, their first great critical success, Sgt. Peppers, was released in 1967. Malcolm Gladwell reports that a study of students at the Berlin Academy of Music compared the top, middle, and bottom third of the class and asked them how much they had practiced:

Everyone, from all three groups, started playing at roughly the same time – around the age of five. In those first few years, everyone practised roughly the same amount – about two or three hours a week. But around the age of eight real differences started to emerge. The students who would end up as the best in their class began to practise more than everyone else: six hours a week by age nine, eight by age 12, 16 a week by age 14, and up and up, until by the age of 20 they were practising well over 30 hours a week. By the age of 20, the elite performers had all totalled 10,000 hours of practice over the course of their lives. The merely good students had totalled, by contrast, 8,000 hours, and the future music teachers just over 4,000 hours.

So it may be that 10,000 hours, not 10 years, is the magic number. Samuel Johnson (1709-1784) thought it took longer: “Excellence in any department can be attained only by the labor of a lifetime; it is not to be purchased at a lesser price.” And Chaucer (1340-1400) complained “the lyf so short, the craft so long to lerne.” Hippocrates (c. 400BC) is known for the excerpt “ars longa, vita brevis”, which is part of the longer quotation “Ars longa, vita brevis, occasio praeceps, experimentum periculosum, iudicium difficile”, which in English renders as “Life is short, [the] craft long, opportunity fleeting, experiment treacherous, judgment difficult.” Although in Latin, ars can mean either art or craft, in the original Greek the word “techne” can only mean “skill”, not “art”.

Here’s my recipe for programming success:

  • Get interested in programming, and do some because it is fun. Make sure that it keeps being enough fun so that you will be willing to put in ten years. 
  • Talk to other programmers; read other programs. This is more important than any book or training course. 
  • Program. The best kind of learning is learning by doing. To put it more technically, “the maximal level of performance for individuals in a given domain is not attained automatically as a function of extended experience, but the level of performance can be increased even by highly experienced individuals as a result of deliberate efforts to improve.” (p. 366) and “the most effective learning requires a well-defined task with an appropriate difficulty level for the particular individual, informative feedback, and opportunities for repetition and corrections of errors.” (p. 20-21) The book Cognition in Practice: Mind, Mathematics, and Culture in Everyday Life is an interesting reference for this viewpoint. 
  • If you want, put in four years at a college (or more at a graduate school). This will give you access to some jobs that require credentials, and it will give you a deeper understanding of the field, but if you don’t enjoy school, you can (with some dedication) get similar experience on the job. In any case, book learning alone won’t be enough. “Computer science education cannot make anybody an expert programmer any more than studying brushes and pigment can make somebody an expert painter” says Eric Raymond, author of The New Hacker’s Dictionary. One of the best programmers I ever hired had only a High School degree; he’s produced a lot of great software, has his own news group, and made enough in stock options to buy his own nightclub.
  • Work on projects with other programmers. Be the best programmer on some projects; be the worst on some others. When you’re the best, you get to test your abilities to lead a project, and to inspire others with your vision. When you’re the worst, you learn what the masters do, and you learn what they don’t like to do (because they make you do it for them). 
  • Work on projects after other programmers. Be involved in understanding a program written by someone else. See what it takes to understand and fix it when the original programmers are not around. Think about how to design your programs to make it easier for those who will maintain it after you. 
  • Learn at least a half dozen programming languages. Include one language that supports class abstractions (like Java or C++), one that supports functional abstraction (like Lisp or ML), one that supports syntactic abstraction (like Lisp), one that supports declarative specifications (like Prolog or C++ templates), one that supports coroutines (like Icon or Scheme), and one that supports parallelism (like Sisal). 
  • Remember that there is a “computer” in “computer science”. Know how long it takes your computer to execute an instruction, fetch a word from memory (with and without a cache miss), read consecutive words from disk, and seek to a new location on disk. (Answers here.
  • Get involved in a language standardization effort. It could be the ANSI C++ committee, or it could be deciding if your local coding style will have 2 or 4 space indentation levels. Either way, you learn about what other people like in a language, how deeply they feel so, and perhaps even a little about why they feel so. 
  • Have the good sense to get off the language standardization effort as quickly as possible.

With all that in mind, its questionable how far you can get just by book learning. Before my first child was born, I read all the How To books, and still felt like a clueless novice. 30 Months later, when my second child was due, did I go back to the books for a refresher? No. Instead, I relied on my personal experience, which turned out to be far more useful and reassuring to me than the thousands of pages written by experts.Fred Brooks, in his essay No Silver Bullet identified a three-part plan for finding great software designers:

  1. Systematically identify top designers as early as possible. 
  2. Assign a career mentor to be responsible for the development of the prospect and carefully keep a career file. 
  3. Provide opportunities for growing designers to interact and stimulate each other. 

This assumes that some people already have the qualities necessary for being a great designer; the job is to properly coax them along. Alan Perlis put it more succinctly: “Everyone can be taught to sculpt: Michelangelo would have had to be taught how not to. So it is with the great programmers”.So go ahead and buy that Java book; you’ll probably get some use out of it. But you won’t change your life, or your real overall expertise as a programmer in 24 hours, days, or even months.


References

Bloom, Benjamin (ed.) Developing Talent in Young People, Ballantine, 1985.

Brooks, Fred, No Silver Bullets, IEEE Computer, vol. 20, no. 4, 1987, p. 10-19.

Bryan, W.L. & Harter, N. “Studies on the telegraphic language: The acquisition of a hierarchy of habits. Psychology Review, 1899, 8, 345-375

Hayes, John R., Complete Problem Solver Lawrence Erlbaum, 1989.

Chase, William G. & Simon, Herbert A. “Perception in Chess” Cognitive Psychology, 1973, 4, 55-81.

Lave, Jean, Cognition in Practice: Mind, Mathematics, and Culture in Everyday Life, Cambridge University Press, 1988.


Answers

Approximate timing for various operations on a typical PC:

execute typical instruction 1/1,000,000,000 sec = 1 nanosec
fetch from L1 cache memory 0.5 nanosec
branch misprediction 5 nanosec
fetch from L2 cache memory 7 nanosec
Mutex lock/unlock 25 nanosec
fetch from main memory 100 nanosec
send 2K bytes over 1Gbps network 20,000 nanosec
read 1MB sequentially from memory 250,000 nanosec
fetch from new disk location (seek) 8,000,000 nanosec
read 1MB sequentially from disk 20,000,000 nanosec
send packet US to Europe and back 150 milliseconds = 150,000,000 nanosec

Appendix: Language Choice

Several people have asked what programming language they should learn first. There is no one answer, but consider these points: 

  • Use your friends. When asked “what operating system should I use, Windows, Unix, or Mac?”, my answer is usually: “use whatever your friends use.” The advantage you get from learning from your friends will offset any intrinsic difference between OS, or between programming languages. Also consider your future friends: the community of programmers that you will be a part of if you continue. Does your chosen language have a large growing community or a small dying one? Are there books, web sites, and online forums to get answers from? Do you like the people in those forums?
  • Keep it simple. Programming languages such as C++ and Java are designed for professional development by large teams of experienced programmers who are concerned about the run-time efficiency of their code. As a result, these languages have complicated parts designed for these circumstances. You’re concerned with learning to program. You don’t need that complication. You want a language that was designed to be easy to learn and remember by a single new programmer.
  • Play. Which way would you rather learn to play the piano: the normal, interactive way, in which you hear each note as soon as you hit a key, or “batch” mode, in which you only hear the notes after you finish a whole song? Clearly, interactive mode makes learning easier for the piano, and also for programming. Insist on a language with an interactive mode and use it.

Given these criteria, my recommendations for a first programming language would be Python or Scheme. But your circumstances may vary, and there are other good choices. If your age is a single-digit, you might prefer Alice or Squeak (older learners might also enjoy these). The important thing is that you choose and get started.


Appendix: Books and Other Resources

Several people have asked what books and web pages they should learn from. I repeat that “book learning alone won’t be enough” but I can recommend the following: 

 


Notes

T. Capey points out that the Complete Problem Solver page on Amazon now has the “Teach Yourself Bengali in 21 days” and “Teach Yourself Grammar and Style” books under the “Customers who shopped for this item also shopped for these items” section. I guess that a large portion of the people who look at that book are coming from this page. Thanks to Ross Cohen for help with Hippocrates.

From: http://norvig.com/21-days.html

Comment: This is a ‘getting started’ list. The references below all have their own references, that will take you in many directions.

1. Readings for the non-specialist

(a) Hoste, Thistlethwaite and Weeks, The First 1,701,936 Knots, Scientific American, 20, No. 4, 1998. A delightful article which conveys much about the state of knowledge in 1999 about knot classification, by describing how the three authors accomplished their enormous task of classifying knots up to 16 crossings. 

(b) Artin, E., The Theory of Braids, American Scientist,38, 1950,, pp. 112-119. A delightful and very accessible article on an aspect of knot theory which has become very important in all areas of mathematics. Passages from this article were used in the GRE exams in the mid 1950’s! Directed at an audience of non-mathematicians.

(c) Lickorish and Millett, The New Polynomial Invariant of Knots and Links, Mathematics Magazine, 61, No. 1, February 1988. The authors were awarded the Chauvenet Prize for ”Expository Writing” by the MAA, for this article. It’s at a level which is accessible to bright high school students.

(d) Livingston, Knot Theory, MAA Carus Monographs No. 24, 1993. A survey, directed at mathematicians with backgrounds in other areas. Not a textbook. Clear, and a good guide to further reading.

(e) Kauffman, L. States Models and the Jones Polynomial, Topology, 26, 1987, 395-407. An important research article which can be read with a minimum of background in the subject.

(f) Sumners, D.W. The Role of Knot Theory in DNA Research, 297-317, Geometry and Topology, Editors McCrory and Shifrin, Marcel Decker 1987. A talk given to a group of mathematicians at a recent conference, to tell them about problems which are of interest to biologists. It contains a useful ”dictionary” to translate the lingo of chemistry and biology into mathematical terms. It also has references, if you want to learn more about the biology.

 2. Readings for Mathematics graduate students or upper level undergrads

(a) Rolfsen, D. Knots and Links, Publish or Perish, 1978. A graduate textbook on knot theory, out of date, but excellent.

(b) Hempel, John 3-manifolds, Annals of Math Studies #86, Princeton University Press 1976. Very basic stuff.

(c) Prasolov, V.V. and Sossinsky, A.B. Knots, Links, Braids and 3-manifolds, AMS Translations of Mathematical Monographs 154, 1996.

(d) Saveliev, Nikolai, Lectures on the topology of 3-manifolds: an introduction to the Casson invariant, deGruyter 1996. 

(e) Manturov, Vassily, Knot Theory, Chapman and Hall 2004.

(f) Scott, Peter, The Geometry of 3-Manifolds, Bull. London Math. Soc. 15, 1983,401-487.

(g) Birman, J.S. New points of view in knot and link theory, Bull. AMS APRIL 1993, A survey article on the state of the art as regards finite type invariants, in 1993. Directed at an audience of graduate students and research mathematicians. Won the Chauvenet Prize.

From: http://www.math.columbia.edu/~jb/readinglist.pdf 

 

In machine learning, we often take probability for granted. We desire a system for representing uncertainty in the world, and Cox’s theorem tells us that if we accept some basic postulates regarding what is desirable in a system of uncertainty, we will end up with probability.

So that should be the end of the story… right? Well, maybe not. The first Cox postulate is

Divisibility and comparability – The plausibility of a statement is a real number and is dependent on information we have related to the statement,

which seems quite innocent. However, who’s to say that there is anything fundamental about real numbers? Real numbers have strange things like irrational numbers and negative numbers (crazy, I know), but they’re lacking in comparison to imaginary numbers (there’s no operation that you can apply 4 times before returning to your original value, which you can do by multiplying by i with imaginary numbers). It seems kind of arbitrary to choose real numbers. For a fun and interesting read, see the following link. It makes the point better than I can:

Negative numbers aren’t easy. Imagine you’re a European mathematician in the 1700s. You have 3 and 4, and know you can write 4 – 3 = 1. Simple.

But what about 3-4? What, exactly, does that mean? How can you take 4 cows from 3? How could you have less than nothing?

Negatives were considered absurd, something that “darkened the very whole doctrines of the equations” (Francis Maseres, 1759). Yet today, it’d be absurd to think negatives aren’t logical or useful. Try asking your teacher whether negatives corrupt the very foundations of math.

http://betterexplained.com/articles/a-visual-intuitive-guide-to-imaginary-numbers/

Imaginary numbers come up in the context of systems of uncertainty when we deal with quantum mechanics. The basic idea is that interactions operate over amplitudes (expressed as complex numbers), then to determine the likelihood of a final configuration, you look at norms of amplitudes. For a relatively straightforward explanation, see here: http://lesswrong.com/lw/pd/configurations_and_amplitude/

So I don’t necessarily have any well-formed thoughts on the matter (yet?), but it’s fun to think about other principled ways of representing uncertainty. I’m curious to know if there are types of interactions useful for machine learning that would be hard to represent with standard probability models but that would be aided by these types of quantum models.

Finally, I leave you with this blog comment from The Blog of Scott Aaronson:

“graphical models with amplitudes instead of probabilities” is a fair definition of a quantum circuit (and therefore a quantum computer).

http://scottaaronson.com/blog/?p=74#comment-1702

That seems to me, worth understanding deeper.

From: http://blog.smellthedata.com/2010/06/uncertainty-probability-and-quantum.html

From the comments on my last post:

scott would you be so kind, if you have some spare time, to post a list of textbooks that you’ve read in math and cs?

Now this is the kind of blog topic I like: zero expenditure of emotional energy required; lends itself to snarky one-liners. So here’s a list of math and CS books that I’ve read and you should too. Pitifully incomplete, but enough to get you started.

  • Computational Complexity, by Christos Papadimitriou
    The Iliad of complexity. An epic poem to read and reread until you can quote it by section number, until the pages fall out of the spine. Christos is not a textbook writer but a bard — and the only problem with his tale is that it ends around the late 1980′s. Christos, if you’re reading this: we want an Odyssey!
  • Gems of Theoretical Computer Science, by Uwe Schöning and Randall Pruim
    The proofs are terse, but I love the division into short, digestible “gems.” Keep this one on your night table or by the toilet. (But not until you’ve mastered Papadimitriou and are ready to wield BP operators like a Fortnow.)
  • Quantum Computation and Quantum Information, by “Mike & Ike” (Michael Nielsen and Isaac Chuang)
    The best quantum computing book. I open it to the section on fidelity and trace distance pretty much every time I write a paper. (I’ve heard the other sections are excellent too.)
  • Randomized Algorithms, by Rajeev Motwani and Prabhakar Raghavan
    Chernoff bounds, random walks, second eigenvalues, PCP’s … a 1-1/e fraction of what you need to know about randomness.
  • Artificial Intelligence: A Modern Introduction, by Stuart Russell and Peter Norvig
    Almost (but not quite) made me go into AI. My favorite chapter is the last one, which carefully demolishes the arguments of John Searle and the other confuseniks.
  • Complexity and Real Computation, by Lenore Blum, Felix Cucker, Michael Shub, and Steve Smale
    Decidability of the Mandelbrot set? P versus NP over the complex numbers? I may be a Boolean chauvinist, but I knows an elegant theory when I sees one.
  • The Book of Numbers, by John Conway and Richard Guy
    Since this is a popular book, obviously I couldn’t have learned anything new from it, but it was nice to “refresh my memory” about octonions, Heegner numbers, and why eπ sqrt(163) is within 0.00000000000075 of an integer.
  • The Road to Reality: A Complete Guide to the Laws of the Universe, by Roger Penrose
    Preface: “Even if you hated fractions in elementary school, have no fear! I’ve tried to make this book accessible to you as well.”
    Chapter 2: “Consider a Lie algebra of sheaves over the holomorphic fibre bundle PZL(Zn,5)…” (Not really, but close.)
    I struggled through Penrose’s masterpiece, but by the end, I felt like I’d come as close as I ever had (and possibly ever will) to understanding “post-1920′s” particle physics and the math underlying it. If you’re willing to invest the effort, you’ll find The Road to Reality so excellent that it “cancels out” Shadows of the Mind, like an electron annihilating its positronic companion.

From: http://www.scottaaronson.com/blog/?p=74

Q4: Choosing a First Machine Learning Project: Start by Reading or by Doing?

A4: [From http://blog.smellthedata.com/2010/07/choosing-first-machine-learning-project.html]

Sarath writes about doing a project during his final year of university related to machine learning:

I am writing this email to ask for some advice. well the thing is i haven’t decided on my project yet, as i decided it will be better if i took some time to just strengthen my fundamentals and may be work on something small. well i came across this great blog called measuring measures where they had put up a reading list for machine learning and it was may i say a bit overwhelming. http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html

My present goal is doing a graduate course in some good university with some good machine learning research and one of the reason i wanted to do a great project as i have heard that would be a great way to getting into a good university.

So my question is should my first priority be getting a really good and deep understanding of the subject or should i be more concerned with doing some good project with respect to admissions?

There are others who are likely more qualified than I am to answer this one, but here are my two cents:

That post certainly has things that would be nice to learn, but you don’t need to know all of that in order to be a successful researcher. Depending on what area you go into, you might need different subsets of those references, or you might need something different all together. (For example, a reference I go back to time and time again is Schrijver’s Combinatorial Optimization, but it’s not on that list).

I think you should pick a project in an area that you find interesting, then just dive in. At first, I’d be less concerned with doing something new. First, focus on understanding a couple different existing approaches to the specific problem you’ve chosen, and pick up the necessary background as you go by trying to implement the algorithms and replicate published results, following references when you get confused, looking up terms, etc. Perhaps most importantly, work on your research skills. Important things:

  • Clearly write up exactly what you are doing and why you are doing it. Keep it as short as possible while still having all the important information.
  • Set up a framework so you are organized when running experiments
  • Even if the results are not state of the art or terribly surprising, keep track of all the outputs of all your different executions with different data sets as inputs, different parameter settings, etc.
  • Visualize everything interesting about the data you are using, the execution of your algorithms, and your results. Look for patterns, and try to understand why you are getting the results that you are.

All the while, be on the lookout for specific cases where an algorithm doesn’t work very well, assumptions that seem strange, or connections between the approach you’re working on to other algorithms or problems that you’ve run across before. Any of these can be the seed of a good research project.

In my estimation, I’d think graduate schools would be more impressed by a relevant, carefully done project, even if it’s not terribly novel, than they would be with you saying on your application that you have read a lot of books.

If you’re looking for project ideas, check out recent projects that have been done by students of Andrew Ng’s machine learning course at Stanford:
http://www.stanford.edu/class/cs229/projects2008.html
http://www.stanford.edu/class/cs229/projects2009.html

Perhaps some readers who have experience on graduate committees can correct or add to anything that I said that was wrong or incomplete.Sarath writes about doing a project during his final year of university related to machine learning:

I am writing this email to ask for some advice. well the thing is i haven’t decided on my project yet, as i decided it will be better if i took some time to just strengthen my fundamentals and may be work on something small. well i came across this great blog called measuring measures where they had put up a reading list for machine learning and it was may i say a bit overwhelming. http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html

My present goal is doing a graduate course in some good university with some good machine learning research and one of the reason i wanted to do a great project as i have heard that would be a great way to getting into a good university.

So my question is should my first priority be getting a really good and deep understanding of the subject or should i be more concerned with doing some good project with respect to admissions?

There are others who are likely more qualified than I am to answer this one, but here are my two cents:

That post certainly has things that would be nice to learn, but you don’t need to know all of that in order to be a successful researcher. Depending on what area you go into, you might need different subsets of those references, or you might need something different all together. (For example, a reference I go back to time and time again is Schrijver’s Combinatorial Optimization, but it’s not on that list).

I think you should pick a project in an area that you find interesting, then just dive in. At first, I’d be less concerned with doing something new. First, focus on understanding a couple different existing approaches to the specific problem you’ve chosen, and pick up the necessary background as you go by trying to implement the algorithms and replicate published results, following references when you get confused, looking up terms, etc. Perhaps most importantly, work on your research skills. Important things:

  • Clearly write up exactly what you are doing and why you are doing it. Keep it as short as possible while still having all the important information.
  • Set up a framework so you are organized when running experiments
  • Even if the results are not state of the art or terribly surprising, keep track of all the outputs of all your different executions with different data sets as inputs, different parameter settings, etc.
  • Visualize everything interesting about the data you are using, the execution of your algorithms, and your results. Look for patterns, and try to understand why you are getting the results that you are.

All the while, be on the lookout for specific cases where an algorithm doesn’t work very well, assumptions that seem strange, or connections between the approach you’re working on to other algorithms or problems that you’ve run across before. Any of these can be the seed of a good research project.

In my estimation, I’d think graduate schools would be more impressed by a relevant, carefully done project, even if it’s not terribly novel, than they would be with you saying on your application that you have read a lot of books.

If you’re looking for project ideas, check out recent projects that have been done by students of Andrew Ng’s machine learning course at Stanford:
http://www.stanford.edu/class/cs229/projects2008.html
http://www.stanford.edu/class/cs229/projects2009.html

Perhaps some readers who have experience on graduate committees can correct or add to anything that I said that was wrong or incomplete.

Q5: What are some good resources for learning about machine learning?

A5: [From http://blog.smellthedata.com/2010/06/resources-for-learning-about-machine.html            http://www.quora.com/What-are-some-good-resources-for-learning-about-machine-learning]

There were some good answers, even some of which I didn’t know about. Here’s a sampling of the answers:

My answer was Andrew Ng’s YouTube videos:
http://www.youtube.com/view_play_list?p=A89DCFA6ADACE599

Some other good ones:
Jie Tang says…

Mike Jordan and his grad students teach a course at Berkeley called Practical Machine Learning which presents a broad overview of modern statistical machine learning from a practitioner’s perspective. Lecture notes and homework assignments from last year are available at
http://www.cs.berkeley.edu/~jordan/courses/294-fall09/

A Google search will also turn up material from past years

Ben Newhouse says…

The textbook “Elements of Statistical Learning” has an obscene amount of material in it and is freely available in PDF form via http://www-stat.stanford.edu/~tibs/ElemStatLearn/

While more niche than general Machine Learning, I recently ripped through “Natural Image Statistics” (also downloadable at http://www.naturalimagestatistics.net/ ). It’s a great read both for its explanations of your standard ML algo’s (PCA, ICA, mixed gaussians etc) and for its real-world applications/examples in trying to understand the models used for analysis in our neural vision system

Jeremy Leibs gives the staple of David MacKay’s book (I believe David MacKay would say that machine learning is just information theory), right?:

“Information Theory, Inference, and Learning Algorithms” by David MacKay has some decent introductory material if I remember. Available online:
http://www.inference.phy.cam.ac.uk/mackay/itila/book.html

Incidentally, I haven’t read Programming Collective Intelligence, but it seems popular amongst non researchers. Do any of you know more about it?

Blog Stats

  • 94,879 hits

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 520 other followers

Twitter Updates