You are currently browsing the monthly archive for February 2011.

I will not go to the generalized linear model class any more this semester, since I have known what the class is talking about and I think from now on it will be techniquly talking about binary response models and count response models. I will learn it later on by myself since the instructor is also the first time to teach this class and he is not familiar with this course too, and thus I think I can not learn more from him than from the textbook. If anyone has good materials to recommend to me, I will appreciate much. Thanks.

### Data Analysis using Computational Topology and Geometric Statistics

A standard paradigm assumes that the data comes from some underlying geometric structure, such as a curved submanifold or a singular algebraic variety. The observed data is obtained as a random sample from this space, and the objective is to statistically recover features of the underlying space and/or the distribution that generated the sample.

In Geometric Statistics one uses the underlying Riemannian structure to recover quantitative information concerning the probability distribution and/or functionals thereof. The idea is to extend statistical estimation techniques to functions over Riemannian manifolds, utilizing spectral methods adapted to the Riemannian structure.

One then considers the magnitude of the statistical accuracy of these estimators. Considerable progress has been achieved in terms of optimal estimation in the minimax sense. These ideas have far reaching implications in the analysis of high-dimensional data such as, for example, in astronomy, biomechanics, medical imaging, microwave engineering and texture analysis.

In Computational Topology, one attempts to recover more qualitative global features of the underlying data instead, such as connectedness, or the number of holes, or the existence of obstructions to certain constructions, based upon the random sample. In other words, one hopes to recover the underlying topology. An advantage of topology is that it is stable under deformations and thus insensitive to errors introduced in the sampling.

A combinatorial construction such as the alpha complex or the Cˇ ech complex converts the discrete data into an object for which it is possible to compute the topology. However, it is quickly apparent that such a construction and its calculated topology depend on the scale at which one considers the data. A multiscale solution to this problem is the technique of persistent homology. It quantifies the persistence of topological features as the scale changes. Persistent homology is useful for visualization, feature detection and object recognition. It has been successfully applied to analyze natural images, neurological data, gene-chip data, protein binding and sensor networks.

Although Geometric Statistics and Computational Topology have a disparate appearance and seem to have different objectives, it has recently been noticed that they share a commonality through statistical sampling. In particular it has been noticed that the metric distance of persistent homology in Computational Topology, is intimately related to the sup-norm metric between the underlying density that generates a random sample on a Riemannian manifold, and its statistical estimator. Consequently, the qualitative and quantitative data analyses are intimately linked, which is not surprising because of the close connection between geometry and topology traditionally.

The use of geometric and topological methods for statistical data analysis is currently being pursued in the three allied fields of computer science, mathematics and statistics. Although each field has their own particular approach and questions of interest, the amount of similarity is striking and this workshop was able to synthesize all three fields together. The open problems that were considered was the development of computational and statistical algorithms and methods using aspects of geometry and topology when data over the geometric object was only available.

We can summarize the type of investigations as it pertains to the aforementioned three fields. A more detailed description is provided in the following section:

•  In computer science the pursuit naturally focused on efficient algorithms and visualization. Some specific items discussed included algorithms for the discrete approximation of the Laplacian, algorithms for approximating cut-locus, data reduction techniques, and recovery from noisy data;
• In mathematics the interest focused on certain constructions. Here such topics included zigzag persistence, Hodge theory, and recovering the topology over a random field;
• In statistics parameter estimation was the main interest and topics included bootstrapping and MCMC on manifolds, geodesic PCA, asymptotic minimaxity, conditional independence, statistical multiscale analysis and analysis over the Euclidean motion group.

Additionally, some physical applications were also discussed such as brain mapping, network analysis and biomechanics of osteoarthritis.

(Submitted on 17 Feb 2011)

Abstract: We present a geometrical method for analyzing sequential estimating procedures. It is based on the design principle of the second-order efficient sequential estimation provided in Okamoto, Amari and Takeuchi (1991). By introducing a dual conformal curvature quantity, we clarify the conditions for the covariance minimization of sequential estimators. These conditions are further elabolated for the multidimensional curved exponential family. The theoretical results are then numerically examined by using typical statistical models, von Mises-Fisher and hyperboloid models.

(Submitted on 17 Feb 2011)

Abstract: The current definition of a conditional probability distribution enables one to update probabilities only on the basis of stochastic information. This paper provides a definition for conditional probability distributions with non-stochastic information. The definition is derived as a solution of a decision theoretic problem, where the information is connected to the outcome of interest via a loss function. We shall show that the Kullback-Leibler divergence plays a central role. Some illustrations are presented.

This semester I am learning theoretical statistics, especially on testing statistical hypotheses. The main textbook is Lehmann’s classical book. But I hope you could help me get an excellent course note or other books on this topic. If you have some recommendations, please let me know. Thanks very much.

Machine Learning 大家(1)：M. I. Jordan

在我的眼里，M Jordan无疑是武林中的泰山北斗。他师出MIT，现在在berkeley坐镇一方，在附近的两所名校（加stanford）中都可以说无出其右者，stanford的Daphne Koller虽然也声名遐迩，但是和Jordan比还是有一段距离。

Jordan身兼stat和cs两个系的教授，从他身上可以看出Stat和ML的融合。

Jordan 最先专注于mixtures of experts，并迅速奠定了自己的地位，我们哈尔滨工业大学的校友徐雷跟他做博后期间，也在这个方向上沾光不少。Jordan和他的弟子在很多方面作出了开创性的成果，如spectral clustering， Graphical model和nonparametric Bayesian。现在后两者在ML领域是非常炙手可热的两个方向，可以说很大程度上是Jordan的lab一手推动的。

更难能可贵的是， Jordan不仅自己武艺高强，并且揽钱有法，教育有方，手下门徒众多且很多人成了大器，隐然成为江湖大帮派。他的弟子中有10多人任教授，个人认

总的说来，我觉得 Jordan现在做的主要还是graphical model和Bayesian learning，他去年写了一本关于graphical model的书，今年由mit press出版，应该是这个领域里程碑式的著作。3月份曾经有人答应给我一本打印本看看，因为Jordan不让他传播电子版，但后来好像没放在心上（可见美国人也不是很守信的），人不熟我也不好意思问着要，可以说是一大遗憾. 另外发现一个有趣的现象就是Jordan对hierarchical情有独钟，相当多的文章都是关于hierarchical的，所以能 hierarchical大家赶快hierarchical，否则就让他给抢了。

用我朋友话说看jordan牛不牛，看他主页下面的Past students and postdocs就知道了。

Machine Learning大家（2）：D. Koller

D. Koller是1999年美国青年科学家总统奖(PECASE)得主，IJCAI 2001 Computers and Thought Award(IJCAI计算机与思维奖，这是国际人工智能界35岁以下青年学者的最高奖)得主，2004 World Technology Award得主。

最先知道D koller是因为她得了一个大奖，2001年IJCAI计算机与思维奖。Koller因她在概率推理的理论和实践、机器学习、计算博弈论等领域的重要贡献，成为继Terry Winograd、David Marr、Tom Mitchell、Rodney Brooks等人之后的第18位获奖者。说起这个奖挺有意思的，IJCAI终身成就奖（IJCAI Award for Research Excellence），是国际人工智能界的最高荣誉; IJCAI计算机与思维奖是国际人工智能界35岁以下青年学者的最高荣誉。早期AI研究将推理置于至高无上的地位; 但是1991年牛人Rodney Brooks对推理全面否定，指出机器只能独立学习而得到了IJCAI计算机与思维奖; 但是koller却因提出了Probabilistic Relational Models 而证明机器可以推理论知而又得到了这个奖，可见世事无绝对，科学有轮回。

Koller的研究主要都集中在probabilistic graphical model，如Bayesian网络，但这玩意我没有接触过，我只看过几篇他们的markov network的文章，但看了也就看了，一点想法都没有，这滩水有点深，不是我这种非科班出身的能趟的，并且感觉难以应用到我现在这个领域中。

Machine learning 大家（3）:J. D. Lafferty

大家都知道NIPS和ICML向来都是由大大小小的山头所割据，而John Lafferty无疑是里面相当高的一座高山，这一点可从他的publication list里的NIPS和ICML数目得到明证。虽然江湖传说计算机重镇CMU现在在走向衰落，但这无碍Lafferty拥有越来越大的影响力，翻开AI兵器谱排名第一的journal of machine learning research的很多文章，我们都能发现author或者editor中赫然有Lafferty的名字。

Lafferty给人留下的最大的印象似乎是他2001年的conditional random fields，这篇文章后来被疯狂引用，广泛地应用在语言和图像处理，并随之出现了很多的变体，如Kumar的discriminative random fields等。虽然大家都知道discriminative learning好，但很久没有找到好的discriminative方法去处理这些具有丰富的contextual inxxxxation的数据，直到Lafferty的出现。

而现在Lafferty做的东西好像很杂，semi－supervised learning， kernel learning，graphical models甚至manifold learning都有涉及，可能就是像武侠里一样只要学会了九阳神功，那么其它的武功就可以一窥而知其精髓了。这里面我最喜欢的是semi－ supervised learning，因为随着要处理的数据越来越多，进行全部label过于困难，而完全unsupervised的方法又让人不太放心，在这种情况下 semi－supervised learning就成了最好的。这没有一个比较清晰的认识，不过这也给了江湖后辈成名的可乘之机。到现在为止，我觉得cmu的semi－ supervised是做得最好的，以前是KAMAL NIGAM做了开创性的工作，而现在Lafferty和他的弟子作出了很多总结和创新。

Lafferty 的弟子好像不是很多，并且好像都不是很有名。不过今年毕业了一个中国人，Xiaojin Zhu(上海交通大学校友)，就是做semi－supervised的那个人，现在在wisconsin-madison做assistant professor。他做了迄今为止最全面的Semi-supervised learning literature survey，大家可以从他的个人主页中找到。这人看着很憨厚，估计是很好的陶瓷对象。另外我在（1）中所说的Jordan的牛弟子D Blei今年也投奔Lafferty做博后，就足见Lafferty的牛了。

Lafferty做NLP是很好的，著名的Link Grammar Parser还有很多别的应用。其中language model在IR中应用，这方面他的另一个中国学生ChengXiang Zhai(南京大学校友，2004年美国青年科学家总统奖(PECASE)得主)，现在在uiuc做assistant professor。

Machine learning 大家（4):Peter L. Bartlett

鄙人浅薄之见，Jordan比起同在berkeley的Peter Bartlett还是要差一个层次。Bartlett主要的成就都是在learning theory方面，也就是ML最本质的东西。他的几篇开创性理论分析的论文，当然还有他的书Neural Network Learning: Theoretical Foundations。

UC Berkeley的统计系在强手如林的北美高校中一直是top3，这就足以证明其肯定是群星荟萃，而其中，Peter L. Bartlett是相当亮的一颗星。关于他的研究，我想可以从他的一本书里得到答案：Neural Network Learning: Theoretical Foundations。也就是说，他主要做的是Theoretical Foundations。基础理论虽然没有一些直接可面向应用的算法那样引人注目，但对科学的发展实际上起着更大的作用。试想vapnik要不是在VC维的理论上辛苦了这么多年，怎么可能有SVM的问世。不过阳春白雪固是高雅，但大多数人只能听懂下里巴人，所以Bartlett的文章大多只能在做理论的那个圈子里产生影响，而不能为大多数人所广泛引用。

Bartlett在最近两年做了大量的Large margin classifiers方面的工作，如其convergence rate和generalization bound等。并且很多是与jordan合作，足见两人的工作有很多相通之处。不过我发现Bartlett的大多数文章都是自己为第一作者，估计是在教育上存在问题吧，没带出特别牛的学生出来。

Bartlett的个人主页的talk里有很多值得一看的slides,如Large Margin Classifiers: Convexity and Classification；Large Margin Methods for Structured Classification: Exponentiated Gradient Algorithms。大家有兴趣的话可以去下来看看。

Machine learning 大家（5): Michael Collins

Michael Collins (http://people.csail.mit.edu/mcollins/)
自然语言处理(NLP)江湖的第一高人。出身Upenn，靠一身叫做Collins Parser的武功在江湖上展露头脚。当然除了资质好之外，其出身也帮了不少忙。早年一个叫做Mitchell P. Marcus的师傅传授了他一本葵花宝典-Penn Treebank。从此，Collins整日沉迷于此，终于练成盖世神功。

学成之后，Collins告别师傅开始闯荡江湖，投入了一个叫AT&T Labs Research的帮会，并有幸结识了Robert Schapire、Yoram Singer等众多高手。大家不要小瞧这个叫AT&T Labs Research的帮会，如果谁没有听过它的大名总该知道它的同父异母的兄弟Bell Labs吧。

言归正传，话说Collins在这里度过了3年快乐的时光。其间也奠定了其NLP江湖老大的地位。并且练就了Discriminative Reranking， Convolution Kernels，Discriminative Training Methods for Hidden Markov Models等多种绝技。然而，世事难料，怎奈由于帮会经营不善，这帮大牛又不会为帮会拼杀，终于被一脚踢开，大家如鸟兽散了。Schapire去了 Princeton， Singer 也回老家以色列了。Collins来到了MIT，成为了武林第一大帮的六袋长老，并教授一门叫做的Machine Learning Approaches for NLP(http://www.ai.mit.edu/courses/6.891-nlp/) 的功夫。虽然这一地位与其功力极不相符，但是这并没有打消Collins的积极性，通过其刻苦打拼，终于得到了一个叫Sloan Research Fellow的头衔，并于今年7月，光荣的升任7袋Associate Professor。

在其下山短短7年时间内，Collins共获得了4次世界级武道大会冠军(EMNLP2002, 2004, UAI2004, 2005)。相信年轻的他，总有一天会一统丐帮，甚至整个江湖。

看过Collins和别人合作的一篇文章，用conditional random fields 做object recogntion。还这么年轻，admire to death！

Machine learning 大家（6): Dan Roth

Dan Roth (http://l2r.cs.uiuc.edu/~danr/)
统计NLP领域的众多学者后，我得出了一个惊人的结论，就是叫Daniel的牛人特别多: 大到MT领域成名已久的Prof. Dan Melamed，小到Stanford刚刚毕业的Dan Klein，

这位老兄也是极其年轻的，Harvard博士毕业整十年，带领其团队撑起了UIUC Machine Learning以及NLP领域的一片灿烂天空。其领导开发的SNoW工具可谓是一把绝世好剑，基本达到了”又想马儿跑，又想马儿不吃草”的境界，在不损失分类精度的条件下，学习和预测速度空前。什么？你不知道SNoW？它和白雪公主有什么关系？看来我也得学学”超女”的粉丝们，来一个扫盲了: SNoW是Sparse Network of Winnows的简称，其中实现了Winnow算法，但是记住Sparse Network才是其重点，正是因为有了这块玄铁，SNoW之剑才会如此锋利。

近年来Roth也赶时髦，把触角伸向了Structured Data学习领域，但与其他人在学习的时候就试图加入结构化信息(典型的如CRF)不同，Roth主张在预测的最后阶段加入约束进行推理，这可以使的学习效率极大的提高，同时在某些应用上，还取得了更好的结果。还有就是什么Kernel学习，估计他也是学生太多，安排不下了，所以只好开疆扩土。

Harvard出身的Roth，理论功底也极其深厚，好多涉及统计学习理论的工作就不是我这种学工科的人关心的了。

个人补充一点：南京大学的一个Machine Learning的牛人网名也叫Daniel是不是跟文中的叙述有关呢，呵呵～

From HERE.

Today is the first day of Chinese new year 2011. And I also found some good articles and want to share with you:

Abstract. We introduce vector diffusion maps (VDM), a new mathematical framework for organizing and analyzing massive high dimensional data sets, images and shapes. VDM is a mathematical and algorithmic generalization of di usion maps and other non-linear dimensionality reduction methods, such as LLE, ISOMAP and Laplacian eigenmaps. While existing methods are either directly or indirectly related to the heat kernel for functions over the data, VDM is based on the heat kernel for vector elds. VDM provides tools for organizing complex data sets, embedding them in a low dimensional space, and interpolating and regressing vector elds over the data. In particular, it equips the data with a metric, which we refer to as the vector diffusion distance. In the manifold learning setup, where the data set is distributed on (or near) a low dimensional manifold Md embedded in Rp, we prove the relation between VDM and the connection-Laplacian operator for vector elds over
the manifold.

D. K. Biss (Topology and its Applications 124 (2002) 355-371) introduced the topological fundamental group and presented some interesting basic properties of the notion. In this article we intend to extend the above notion to homotopy groups and try to prove some similar basic properties of the topological homotopy groups. We also study more on the topology of the topological homotopy groups in order to find necessary and sufficient conditions for which the topology is discrete. Moreover, we show that studying topological homotopy groups may be more useful than topological fundamental groups.

This paper describes the structure of the moduli space of holomorphic curves and constructs Gromov Witten invariants in the category of exploded manifolds. This includes defining Gromov Witten invariants relative to normal crossing divisors and proving the associated gluing theorem which involves summing relative invariants over a count of tropical curves.

This is a survey article on moduli of affine schemes equipped with an action of a reductive group. The emphasis is on examples and applications to the classification of spherical varieties.

These are lecture notes that arose from a representation theory course given by the first author to the remaining six authors in March 2004 within the framework of the Clay Mathematics Institute Research Academy for high school students, and its extended version given by the first author to MIT undergraduate math students in the Fall of 2008. The notes cover a number of standard topics in representation theory of groups, Lie algebras, and quivers, and contain many problems and exercises. They should be accessible to students with a strong background in linear algebra and a basic knowledge of abstract algebra, and may be used for an undergraduate or introductory graduate course in representation theory.

ps:In the latest version, misprints and errors were corrected and new exercises were added, in particular ones suggested by Darij Grinberg

It is argued that zero should be considered as a cardinal number but not an ordinal number. One should make a clear distinction between order types that are labels for well-ordered sets and ordinal numbers that are labels for the elements in these sets.

7, Fuzzy Logic vs Probability