Q4: Choosing a First Machine Learning Project: Start by Reading or by Doing?

A4: [From http://blog.smellthedata.com/2010/07/choosing-first-machine-learning-project.html]

Sarath writes about doing a machine learning project during his final year of university:

I am writing this email to ask for some advice. The thing is, I haven’t decided on my project yet, as I decided it would be better if I took some time to just strengthen my fundamentals and maybe work on something small. I came across this great blog called Measuring Measures, where they had put up a reading list for machine learning, and it was, may I say, a bit overwhelming. http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html

My present goal is doing a graduate course at some good university with some good machine learning research, and one of the reasons I wanted to do a great project is that I have heard that would be a great way of getting into a good university.

So my question is: should my first priority be getting a really good and deep understanding of the subject, or should I be more concerned with doing some good project with respect to admissions?

There are others who are likely more qualified than I am to answer this one, but here are my two cents:

That post certainly has things that would be nice to learn, but you don’t need to know all of that in order to be a successful researcher. Depending on what area you go into, you might need different subsets of those references, or you might need something different altogether. (For example, a reference I go back to time and time again is Schrijver’s Combinatorial Optimization, but it’s not on that list.)

I think you should pick a project in an area that you find interesting, then just dive in. At first, I’d be less concerned with doing something new. Instead, focus on understanding a couple of different existing approaches to the specific problem you’ve chosen, and pick up the necessary background as you go by trying to implement the algorithms and replicate published results, following references when you get confused, looking up terms, etc. Perhaps most importantly, work on your research skills. Important things:

  • Clearly write up exactly what you are doing and why you are doing it. Keep it as short as possible while still having all the important information.
  • Set up a framework so you are organized when running experiments.
  • Even if the results are not state of the art or terribly surprising, keep track of all the outputs of all your different executions with different data sets as inputs, different parameter settings, etc.
  • Visualize everything interesting about the data you are using, the execution of your algorithms, and your results. Look for patterns, and try to understand why you are getting the results that you are.
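As one possible way to act on the bookkeeping points above, here is a minimal sketch in Python that runs every combination of data set and parameter setting and appends one JSON line per run to a log file. Everything named here is an assumption made for illustration: `run_experiment` is a hypothetical stand-in for your own training code, the parameter names `learning_rate` and `num_iters` are made up, and the JSON-lines log format is just one convenient choice.

```python
import itertools
import json
import time
from pathlib import Path


def run_experiment(dataset, learning_rate, num_iters):
    """Hypothetical stand-in for a real training run; returns a metrics dict."""
    # Placeholder formula so the sketch runs without any ML library.
    error = 1.0 / (1.0 + learning_rate * num_iters)
    return {"error": error}


def run_grid(datasets, param_grid, log_path):
    """Run every dataset/parameter combination, logging one JSON line per run."""
    names = sorted(param_grid)  # fixed order so runs are reproducible
    records = []
    # Append mode: earlier runs are never overwritten.
    with Path(log_path).open("a") as log:
        for dataset in datasets:
            for values in itertools.product(*(param_grid[n] for n in names)):
                params = dict(zip(names, values))
                started = time.time()
                metrics = run_experiment(dataset, **params)
                record = {
                    "dataset": dataset,
                    "params": params,
                    "metrics": metrics,
                    "seconds": time.time() - started,
                }
                log.write(json.dumps(record) + "\n")
                records.append(record)
    return records


def load_runs(log_path):
    """Read back every logged run, e.g. for plotting or comparison."""
    with open(log_path) as log:
        return [json.loads(line) for line in log if line.strip()]
```

Calling `run_grid(["toy_a", "toy_b"], {"learning_rate": [0.1, 0.5], "num_iters": [10, 100]}, "runs.jsonl")` would log eight runs (the data set names and file name are placeholders). Because the file is only ever appended to, unremarkable results stay on record next to the good ones, and `load_runs` gives you everything back when it is time to visualize.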

All the while, be on the lookout for specific cases where an algorithm doesn’t work very well, assumptions that seem strange, or connections between the approach you’re working on and other algorithms or problems that you’ve run across before. Any of these can be the seed of a good research project.

In my estimation, graduate schools would be more impressed by a relevant, carefully done project, even if it’s not terribly novel, than they would be with you saying on your application that you have read a lot of books.

If you’re looking for project ideas, check out recent projects that have been done by students of Andrew Ng’s machine learning course at Stanford:
http://www.stanford.edu/class/cs229/projects2008.html
http://www.stanford.edu/class/cs229/projects2009.html

Perhaps some readers who have experience on graduate committees can correct or add to anything that I said that was wrong or incomplete.

Q5: What are some good resources for learning about machine learning?

A5: [From http://blog.smellthedata.com/2010/06/resources-for-learning-about-machine.html and http://www.quora.com/What-are-some-good-resources-for-learning-about-machine-learning]

There were some good answers, including some resources I didn’t know about. Here’s a sampling of the answers:

My answer was Andrew Ng’s YouTube videos:
http://www.youtube.com/view_play_list?p=A89DCFA6ADACE599

Some other good ones:
Jie Tang says…

Mike Jordan and his grad students teach a course at Berkeley called Practical Machine Learning which presents a broad overview of modern statistical machine learning from a practitioner’s perspective. Lecture notes and homework assignments from last year are available at
http://www.cs.berkeley.edu/~jordan/courses/294-fall09/

A Google search will also turn up material from past years.

Ben Newhouse says…

The textbook “Elements of Statistical Learning” has an obscene amount of material in it and is freely available in PDF form via http://www-stat.stanford.edu/~tibs/ElemStatLearn/

While more niche than general Machine Learning, I recently ripped through “Natural Image Statistics” (also downloadable at http://www.naturalimagestatistics.net/ ). It’s a great read both for its explanations of your standard ML algorithms (PCA, ICA, mixtures of Gaussians, etc.) and for its real-world applications/examples in trying to understand the models used for analysis in our neural vision system.

Jeremy Leibs gives the staple, David MacKay’s book (I believe David MacKay would say that machine learning is just information theory, right?):

“Information Theory, Inference, and Learning Algorithms” by David MacKay has some decent introductory material if I remember. Available online:
http://www.inference.phy.cam.ac.uk/mackay/itila/book.html

Incidentally, I haven’t read Programming Collective Intelligence, but it seems popular among non-researchers. Do any of you know more about it?
