You are currently browsing the monthly archive for October 2011.

Today I want to say something basic:

1. In calculus, the Taylor expansion is extremely useful because it approximates a function by a polynomial. In particular, for many limits you can start from the Taylor expansion, and the rest becomes simple.
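As a small illustration (my own choice of example, not tied to anything else in this post), here is a standard limit where expanding the sine first removes all the difficulty:

```latex
\sin x = x - \frac{x^{3}}{6} + O(x^{5}) \quad (x \to 0)
\qquad\Longrightarrow\qquad
\lim_{x \to 0} \frac{\sin x - x}{x^{3}}
  = \lim_{x \to 0} \left( -\frac{1}{6} + O(x^{2}) \right)
  = -\frac{1}{6}.
```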

In probability and statistics, a statistic is nothing but a function of the sample. Let X_{1}, X_{2},…,X_{n} be the sample points. Then a statistic can be written as T_{n}=f(X_{1}, X_{2},…,X_{n}). So if we want to study the asymptotic properties of the statistic, a good way is to write it as a Taylor expansion first. I think this should be the default approach, i.e. Taylor expansion first. Then the delta method and Slutsky’s lemma can be used together with the central limit theorems, which are the foundation for any discussion of asymptotic properties.
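A minimal simulation sketch of the "Taylor expansion first" idea, under assumptions I picked for illustration: the sample is Uniform(0, 1), the statistic is the squared sample mean, and the names `simulate_scaled_errors`, `g` and `g_prime` are hypothetical. The first-order expansion g(X̄) ≈ g(μ) + g'(μ)(X̄ − μ) suggests that √n (g(X̄) − g(μ)) has variance close to g'(μ)² σ², and the code compares that prediction with a Monte Carlo estimate.

```python
import math
import random
import statistics


def simulate_scaled_errors(g, mu, n, reps, rng):
    """Return `reps` draws of sqrt(n) * (g(sample mean) - g(mu)) for Uniform(0, 1) samples."""
    draws = []
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        draws.append(math.sqrt(n) * (g(xbar) - g(mu)))
    return draws


def g(x):
    # Illustrative smooth function of the sample mean (an assumption, not from the post).
    return x * x


def g_prime(x):
    # Derivative of g, used in the first-order Taylor term.
    return 2.0 * x


mu, sigma2 = 0.5, 1.0 / 12.0   # mean and variance of Uniform(0, 1)
rng = random.Random(0)         # fixed seed so the run is reproducible

draws = simulate_scaled_errors(g, mu, n=200, reps=2000, rng=rng)
empirical_var = statistics.variance(draws)
delta_var = g_prime(mu) ** 2 * sigma2   # variance suggested by the delta method

print(f"empirical variance:    {empirical_var:.4f}")
print(f"delta-method variance: {delta_var:.4f}")
```

The two printed numbers should be close but not identical, since the simulation uses a finite n and a finite number of replications.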

2. Why statistics? What is the difference between statistics and probability?

In reality, everything is noisy, which makes it difficult to see the underlying principle. Statistics works with the raw data to find the simple rule hidden under the noise. So if you want to find the relationship between the heights and weights of humans, why use regression? Because we regard the data as noisy, we should not use all the data points to find the exact curve that passes through every one of them. That curve does not make any sense in reality. Instead, we should think of the different heights observed at a fixed weight as noisy data, and use statistics to find a simple relationship between the two variables that can be used for prediction. Therefore, simple precise mathematics + noise = statistics. How to model the noise is where measure theory comes in.
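To make the "simple rule + noise" picture concrete, here is a small sketch on synthetic data. Everything numeric is an assumption made up for the illustration (the linear rule, the noise level, the weight range), and `fit_line` is a hypothetical helper that implements ordinary least squares in closed form. The point is that the fitted line recovers the simple rule even though no single data point lies exactly on it.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical "truth": height (cm) = 100 + 1.0 * weight (kg) + Gaussian noise.
TRUE_INTERCEPT, TRUE_SLOPE, NOISE_SD = 100.0, 1.0, 5.0

rng = random.Random(1)   # fixed seed so the run is reproducible
weights = [rng.uniform(50.0, 90.0) for _ in range(200)]
heights = [TRUE_INTERCEPT + TRUE_SLOPE * w + rng.gauss(0.0, NOISE_SD) for w in weights]

intercept, slope = fit_line(weights, heights)

# The spread of the residuals is a rough estimate of the noise level.
residuals = [h - (intercept + slope * w) for w, h in zip(weights, heights)]
residual_sd = statistics.stdev(residuals)

print(f"fitted line: height = {intercept:.1f} + {slope:.2f} * weight")
print(f"residual standard deviation: {residual_sd:.2f}")
```

A curve forced through all 200 points would reproduce the noise as well as the rule, which is why it would be useless for predicting the height of a new person.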

The difference between probability and statistics is, roughly, that probability is mathematics and statistics is a kind of data management. What does that mean? Probability definitely belongs to mathematics, since it is built only on axioms and rules, nothing else. Statistics is just the opposite. Starting from raw data, you can handle the data any way you like, without any rules. Play with the data as much as you can. But what is the connection between the two? A statistic is a function of random variables, which are governed by underlying unknown rules (probability distributions), so many of its properties can be derived by analysis using probability.

I have gradually found that the following are good choices if you have many options:

- R for Statistics
- Python for scientific computing
- GIMP for graphing
- TeX for typing
- WordPress for blogging
- Gmail, Google+, Google Sites, Google Reader, …
- Mendeley for managing your papers
- Delicious for discovering and collecting the web resources

All of them share one characteristic: they are open source or free. And they all have big communities.

This is the site of the Student Seminar on Statistics at MSU:

https://sites.google.com/site/statssmsu/

I hope it is useful for you all.

“One can get into great philosophical debates on **what is randomness**. Information that we can’t compress. Information that’s unpredictable. Information that we are willing to bet on.”

Because uncertainty exists, we have to do some work to make ourselves sure about something, at least to some extent. So we bring in a measure structure to analyze what we are concerned with. I think looking at the difference between a function and a random variable is a good way to see why we want to involve randomness.

A random variable is nothing but a function with some measure structure sitting behind it. For example, take the function on the real line y=f(x)=x. If we put a measure structure on the real line, for example a point mass at 2, then y can only take the value 2 almost surely. In general, the uncertainty about x leads to uncertainty in the prediction of y. However, people always want to find something that holds with 100% or 99% certainty among the uncertain information. You probably know the 3 sigma rule in the normal model: the variance can help a lot to make you more sure about something. And now what do you think of the SLLN and WLLN?
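A quick numerical sketch of the last two remarks, assuming a standard normal model (the values of `MU` and `SIGMA`, the seed, and the sample sizes are all my own choices for illustration). It estimates the fraction of draws within three standard deviations of the mean, and shows the sample mean for increasing n as a hint at what the laws of large numbers say.

```python
import random
import statistics

rng = random.Random(2)     # fixed seed so the run is reproducible
MU, SIGMA = 0.0, 1.0       # parameters of the normal model (assumed for illustration)

draws = [rng.gauss(MU, SIGMA) for _ in range(100_000)]

# 3 sigma rule: fraction of draws within three standard deviations of the mean.
within_3_sigma = sum(abs(x - MU) <= 3 * SIGMA for x in draws) / len(draws)

# Law of large numbers: the sample mean of the first n draws settles near MU as n grows.
running_means = {n: statistics.mean(draws[:n]) for n in (10, 1_000, 100_000)}

print(f"fraction within 3 sigma: {within_3_sigma:.4f}")
for n, mean_n in running_means.items():
    print(f"mean of first {n:>7} draws: {mean_n:+.4f}")
```

Each individual draw is uncertain, yet the two summaries are highly predictable: that is the kind of near-certainty extracted from uncertain information that the paragraph above describes.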
