You are currently browsing the monthly archive for March 2011.
Today the machine learning class began covering the theory of machine learning, so I want to collect some related materials here. I plan to update this post as the class progresses.
1. S. Boucheron, O. Bousquet, and G. Lugosi (2005). Theory of Classification: a Survey of Some Recent Advances. ESAIM: Probability and Statistics, 9:323–375. (PDF, POSTSCRIPT)
This paper surveys much of the theory behind classification, a central part of machine learning.
2. S. Boucheron, O. Bousquet, and G. Lugosi (2004). Concentration Inequalities. (PDF, POSTSCRIPT) In O. Bousquet, U. von Luxburg, and G. Rätsch (editors), Advanced Lectures in Machine Learning, Springer, pp. 208–240, 2004.
This is the content of the first lecture on the theory of machine learning. The author also has the following lecture notes on this topic:
Concentration-of-measure inequalities, presented at:
- the Machine Learning Summer School 2003, Australian National University, Canberra;
- the Workshop on Combinatorics, Probability and Algorithms 2003, held at the CRM in Montreal;
- the Winter School on Probabilistic Methods in High Dimension Phenomena, Toulouse, January 10–14, 2005;
- the Workshop de Combinatória e Concentração de Medida, IMPA, Rio de Janeiro, February 23–25, 2005. (PDF, POSTSCRIPT)
3. O. Bousquet, S. Boucheron, and G. Lugosi (2004). Introduction to Statistical Learning Theory. (PDF, POSTSCRIPT) In O. Bousquet, U. von Luxburg, and G. Rätsch (editors), Advanced Lectures in Machine Learning, Springer, pp. 169–207, 2004.
4. S. Kulkarni, G. Lugosi, and S. Venkatesh (1998). Learning Pattern Classification: A Survey. (POSTSCRIPT) 1948–1998 Special Commemorative Issue of IEEE Transactions on Information Theory, vol. 44, pp. 2178–2206. Reprinted in S. Verdú and S. W. McLaughlin (editors), Information Theory: 50 Years of Discovery, IEEE Press, New York, 1999.
So far all of these materials come from one professor, Gábor Lugosi.
For concentration inequalities, there is also a book chapter, Old and New Concentration Inequalities, by Fan Chung Graham, as well as a survey: Concentration Inequalities and Martingale Inequalities: A Survey.
Videolectures on concentration inequalities with machine learning applications
There is a course, STAT 598Y: Statistical Learning Theory, given in the Statistics Department at Purdue University. It focuses more on machine learning theory.
A useful concentration inequality: the (α, β) Azuma–Hoeffding inequality
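As a quick illustration of why these concentration bounds are useful, here is a small numerical sanity check of the plain Hoeffding inequality (the martingale-free special case of Azuma–Hoeffding) on fair-coin flips; the sample sizes and tolerance are my own illustrative choices.

```python
import math
import random

# Hoeffding's inequality for the mean of n i.i.d. variables in [0, 1]:
#   P(|mean - E[mean]| >= t) <= 2 * exp(-2 * n * t**2).
# (Azuma-Hoeffding extends this to martingales with bounded differences.)

def hoeffding_bound(n, t):
    return 2.0 * math.exp(-2.0 * n * t * t)

random.seed(0)
n, t, trials, p = 500, 0.05, 1000, 0.5
deviations = 0
for _ in range(trials):
    mean = sum(random.random() < p for _ in range(n)) / n
    if abs(mean - p) >= t:
        deviations += 1
empirical = deviations / trials
# The observed deviation frequency should sit below the Hoeffding bound.
print(empirical, "<=", hoeffding_bound(n, t))
```

The bound is distribution-free, so for a fair coin it is quite loose; the empirical frequency typically falls well below it.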
VC entropy:
Metric spaces, VC dimension, and metric entropy (Ken Clarkson's magnificent survey)
Foundations of Statistical Learning Theory: Empirical Inference in High-Dimensional Spaces (video lecture)
The slides on statistical learning theory given by Olivier Bousquet
Rademacher complexity:
Rademacher and Gaussian Complexities: Risk Bounds and Structural Results
Peter L. Bartlett and Shahar Mendelson; JMLR 3(Nov):463–482, 2002.
Rademacher Complexity, by Marcel Lüthi and Thomas Vetter
Topics in Artificial Intelligence (Learning Theory) – Spring 2008
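To make the definition concrete, here is a toy Monte Carlo estimate of the empirical Rademacher complexity; the function class (one-dimensional threshold classifiers) and the data are my own illustrative choices, not taken from the references above.

```python
import random

# Monte Carlo estimate of the empirical Rademacher complexity
#   R_hat(F) = E_sigma [ sup_{f in F} (1/n) * sum_i sigma_i * f(x_i) ],
# here for the class of 1-D thresholds f_theta(x) = +1 if x >= theta else -1.

random.seed(1)
n = 50
xs = sorted(random.random() for _ in range(n))
# Thresholds between consecutive points realize every labeling this class can produce.
thetas = [0.0] + [(xs[i] + xs[i + 1]) / 2 for i in range(n - 1)] + [1.1]

def rademacher_estimate(xs, thetas, rounds=500):
    total = 0.0
    for _ in range(rounds):
        sigma = [random.choice((-1, 1)) for _ in xs]
        # Supremum over the class = max over the finitely many realizable labelings.
        total += max(
            sum(s * (1 if x >= th else -1) for s, x in zip(sigma, xs)) / len(xs)
            for th in thetas
        )
    return total / rounds

# For a VC-dimension-1 class this should scale roughly like sqrt(log n / n).
print(rademacher_estimate(xs, thetas))
```

The key point is that a small class (here, only n+1 distinct labelings) cannot correlate well with random signs, which is exactly what the risk bounds in the Bartlett–Mendelson paper exploit.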
Algebraic Geometry and Statistical Learning Theory (Cambridge Monographs on Applied and Computational Mathematics)
Neural Network Learning: Theoretical Foundations, by Martin Anthony and Peter L. Bartlett
What is the curse of dimensionality?
I highly recommend reading the lecture delivered by David Donoho titled "High-Dimensional Data Analysis: on the Curses and Blessings of Dimensionality". The "blessing" of high-dimensional data is that the information of interest often effectively lies in a lower dimension, which we can then exploit; typically this involves sparse representations or low-dimensional manifolds.
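One of the curses is easy to see numerically: as the dimension grows, pairwise distances between random points concentrate, so nearest and farthest neighbors become nearly indistinguishable. A small simulation (my own, not from Donoho's lecture):

```python
import math
import random

# Relative contrast between the farthest and nearest neighbor of a fixed point,
# for uniform random points in the unit cube. In high dimensions this shrinks
# toward 0: all neighbors look roughly equally far away.

def distance_spread(dim, n_points=200, seed=0):
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    origin = pts[0]
    dists = [math.dist(origin, p) for p in pts[1:]]
    return (max(dists) - min(dists)) / min(dists)  # relative contrast

for dim in (2, 20, 200, 2000):
    print(dim, distance_spread(dim))
```

In 2 dimensions the contrast is large (the nearest neighbor is far closer than the farthest), while by dimension 2000 it has collapsed to a small fraction, which is why naive nearest-neighbor methods degrade in high dimensions.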
Here are some papers that greatly advanced the mathematical understanding of dimensionality reduction:
- An Elementary Proof of the Johnson–Lindenstrauss Lemma, by Dasgupta and Gupta [1]
- A Global Geometric Framework for Nonlinear Dimensionality Reduction (Isomap), by Tenenbaum, de Silva, and Langford [2]
- Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, by Belkin and Niyogi [3]
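The Johnson–Lindenstrauss lemma is remarkably easy to try out: a random Gaussian map from R^d down to R^k preserves pairwise distances up to a (1 ± ε) factor with high probability once k = O(log n / ε²). Here is a toy sketch with made-up sizes, not code from the Dasgupta–Gupta paper:

```python
import math
import random

# Random Gaussian projection in the spirit of Johnson-Lindenstrauss.

def random_projection(points, k, seed=0):
    rng = random.Random(seed)
    d = len(points[0])
    # Entries ~ N(0, 1/k) so squared norms are preserved in expectation.
    R = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(d)] for _ in range(k)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in R] for p in points]

rng = random.Random(42)
d, k, n = 500, 200, 10
pts = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
proj = random_projection(pts, k)

# One pairwise distance before and after projection; the ratio should be near 1.
before = math.dist(pts[0], pts[1])
after = math.dist(proj[0], proj[1])
print(round(before, 2), round(after, 2), round(after / before, 3))
```

Notably, the projection is data-oblivious: the same random matrix works for any point set of this size, which is what makes the lemma so useful in streaming and large-scale settings.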
Linear models:
- Bayesian Face Recognition, by Moghaddam, Jebara, and Pentland [1]
- Object Indexing Using an Iconic Sparse Distributed Memory, by Rao and Ballard [2]
Nonlinear dimensionality reduction:
- An Introduction to Locally Linear Embedding, by Saul and Roweis [3]
- Nonlinear Component Analysis as a Kernel Eigenvalue Problem, by Schölkopf, Smola, and Müller [4]
- Learning a Kernel Matrix for Nonlinear Dimensionality Reduction, by Weinberger, Sha, and Saul [5]
- Minimum Volume Embedding, by Shaw and Jebara [6]
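The kernel-eigenvalue idea behind the Schölkopf–Smola–Müller paper is that one diagonalizes a centered kernel matrix instead of the covariance matrix. A minimal numpy sketch on a made-up two-circle dataset (my own illustration, not code from the paper):

```python
import numpy as np

# Kernel PCA: eigen-decompose the centered kernel matrix.

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 100)
radius = np.repeat([1.0, 3.0], 50)  # two concentric circles
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

# RBF kernel matrix K_ij = exp(-||x_i - x_j||^2 / 2).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)

# Center the kernel in feature space: Kc = K - 1K - K1 + 1K1, with 1 = ones/n.
n = K.shape[0]
one = np.full((n, n), 1.0 / n)
Kc = K - one @ K - K @ one + one @ K @ one

# The top eigenvector of Kc gives the leading nonlinear principal component.
vals, vecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
first_component = vecs[:, -1] * np.sqrt(vals[-1])
print(first_component[:5])
```

The whole computation lives in the n × n kernel matrix, so the feature space itself (possibly infinite-dimensional for the RBF kernel) is never constructed explicitly.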
Reference: Advanced Machine Learning course materials by Tony Jebara: http://www.cs.columbia.edu/~jeba…
Also see
- The lectures on large-scale machine learning by Sanjiv Kumar: http://www.sanjivk.com/EECS6898/…
- Lv's paper on compact data structures (as well as the rest of her work): http://citeseerx.ist.psu.edu/vie…
- Indyk's and his colleagues' work on LSH: portal.acm.org/citation.cfm?id=6…
- Materials from Michael Jordan's course "Practical Machine Learning": http://www.cs.berkeley.edu/~jord…
- Jolliffe, Principal Component Analysis: http://www.amazon.com/Principal…
- Stone, Independent Component Analysis: A Tutorial Introduction: http://www.amazon.com/Independen…
- Lee & Verleysen, Nonlinear Dimensionality Reduction: http://www.amazon.com/Nonlinear…
- Samet, Foundations of Multidimensional and Metric Data Structures, ISBN 0123694469: http://www.amazon.com/Foundation…
List of books in computational geometry
10 Papers Every Programmer Should Read (At Least Twice)
What are good books for Computation Geometry?
Yann Ollivier, A survey of Ricci curvature for metric spaces and Markov chains, in Probabilistic approach to geometry, Adv. Stud. Pure Math. 57, Math. Soc. Japan (2010), 343–381. In the author's words: "This is a gentle introduction to the context and results of my article Ricci curvature of Markov chains on metric spaces. It begins with a description of classical Ricci curvature in Riemannian geometry, as well as a reminder for discrete Markov chains. A number of open problems are mentioned."
Binary Cumulant Varieties
Authors: Bernd Sturmfels, Piotr Zwiernik
Subjects: Combinatorics (math.CO); Algebraic Geometry (math.AG); Statistics Theory (math.ST)
Algebraic statistics for binary random variables is concerned with highly structured algebraic varieties in the space of 2×2×…×2 tensors. We demonstrate the advantages of representing such varieties in the coordinate system of binary cumulants. Our primary focus lies on hidden subset models. Parametrizations and implicit equations in cumulants are derived for hyperdeterminants, for secant and tangential varieties of Segre varieties, and for certain context-specific independence models. Extending work of Rota and collaborators, we explore the polynomial inequalities satisfied by cumulants.