This page collects useful posts from others.
20110205:
Aaron:
Suresh
 POTD: Reproducing Kernel Banach Spaces with the ℓ1 Norm
 Sample Complexity for eps-approximations of Range Spaces
 IMA Special Year on the mathematics of information
Terry
Djalil
 Concentration for empirical spectral distributions
 Computational Rough Paths [CoRoPa]
 Singular values and rows distances
 The Marchenko-Pastur law
Alex
Frank
Andrew
Meena
Dick
John
Vladimir
Machine Vision 4 Users
From David’s Twitter stream:
 Want a deeper analysis of the Intel Sandy Bridge chipset bug?
 Multispectral Imaging At Up To 1 Billion Frames Per Second
Gonzalo Vazquez-Vilar
Hal Daume III
Alex
Andrew Gelman
David Brady
Greg: MIT IAP ’11 radar course SAR example, imaging with coffee cans, wood, and the audio input from your laptop
Gustavo (Greg’s student): Cleaner Ranging and Multiple Targets
Anand: Privacy and entropy (needs improvement)
Vladimir (ISW): Albert Theuwissen Reports from EI 2011 – Part 1 and 3D Sensing Forum at ISSCC 2011
Suresh
Jordan: Compressed sensing, compressed MDS, compressed clustering, and my talk tomorrow
Bob: Paper of the Day (Po’D): Performance Limits of Matching Pursuit Algorithms Edition
Terry: An introduction to measure theory
Arxiv blog: The Nuclear Camera Designed to Spot Hidden Radiation Sources
Meena:
 White matter fiber analysis: why we need statistical summaries (means)
 Tract-based quantitative white matter fiber analysis: Mathematical frameworks
 Quantitative white matter fiber analysis: a short history (Part III)
 Quantitative white matter fiber analysis: a short history (Part II)
 Quantitative white matter fiber analysis: a short history (Part I)
Bob:
 Tuning frequency determination
 Experiments in tuning frequency determination
 “I’m a math person.” echoes my “I am a safety stewart” 🙂 Safety is our job Number…errr… make that Number 6
 Conference craze 2011
 Some distributions of distances in high-dimensional musical spaces
 Digging deeper into minimum distances in high-dimensional musical spaces, pt. 2
 Digging deeper into minimum distances in high-dimensional musical spaces, pt. 1
 Paper of the Day (Po’D): Music Cover Song Identification Edition, pt. 5
 Paper of the Day (Po’D): Clustering Beat-chroma Patterns in Music Databases Edition
Sarah:
John: Accelerated learning
ISW: Aptina Demos Wafer Level Camera Technology
Alex:
 The operator norm and the decomposition of matrices in positive and negative parts,
 Moments of a product of random variables,
 Integrality of a sum, Stair partition problem
Greg: Paper posted to IEEE Xplore: An Ultrawideband (UWB) Switched-Antenna-Array Radar Imaging System
Suresh: All FOCS talks are online
Arthur: Generating a quasi Poisson distribution, version 2
Franck: Polynomial Learning of Distribution Families
Tara N. Sainath wrote in the SLTC Newsletter, November 2010 on Sparse
 Piotr: Geometry @ Barriers
 Image Sensors World: Caeleste on X-Ray Photon Counting Sensors
 Andrew: Data Stream Algorithms slides
 Bob: Postdoc opportunity at INRIA
 Frank: Entropy of exponential families
 Terry: A first draft of a non-technical article on universality
 Hal: Manifold Assumption versus Margin Assumption
Bob Sturm wrote about the Probabilistic Matching Pursuit algorithm recently featured here in:
 Some Experiments with Probabilistic Orthogonal Matching Pursuit
 Paper of the Day (Po’D): The Other Probabilistic Matching Pursuits Edition
I mentioned Random Matrix Theory a while back; Terry Tao has some new results, which he explains in Random matrices: Localization of the eigenvalues and the necessity of four moments. He refers to the book An Introduction to Random Matrices by Greg Anderson, Alice Guionnet and Ofer Zeitouni. Of related interest:
 Statistical Mechanics and Random Matrices by Alice Guionnet and
 Mean Field Models for Spin Glasses by Michel Talagrand.
Djalil Chafai:
Terry Tao
Gonzalo Vazquez Vilar
John Langford
Djalil Chafai
Sofia Dahl and Bob Sturm in
 Séminaire à Paris
 Paper of the Day (Po’D): Transients Detection Edition
 A Funny Thing Happened on the Way to the Computer
 Meeting reports: Sonification and urban soundscapes in Stockholm
Alex Gittens in
I also found the following noteworthy papers, enjoy!
 First, I agree with Andrew, this essay by Mandelbrot is fascinating
A maverick’s apprenticeship. The Wolf Prize for Physics. Edited by David Thouless. Singapore: World Scientific, 2004. [ PDF (154.4 KB) ]
 Alex: Find a generating function for the Stirling partition numbers and Random matrix sparsification, comparison of current results
 Djalil: Back to basics: total variation distance
 Bob: CFP: 8th Sound and Music Computing (SMC) Conference 2011, and Sound Quality Seminar, Papers of the Day (Po’D): Finding or Not Finding Rules in Time Series Edition
 Muthu: Romance Leads to Insights
 Dick: Strong Codes For Weak Channels
 Gregory: 2011 MIT IAP course, build a synthetic aperture radar in 4 weeks
 Brian: Solving Resistor Networks Using Gaussian Elimination — An Illustration and Inverting A’CA
 ISW: IsInvariant Proposes New Sensor Technology (increasing dynamic range, I can see how CS would benefit from that)
 Meena: Quantitative white matter fiber analysis: a short history (Part III)
 Jason: A Simple and Computationally Efficient Sampling Approach to Covariate Adjustment for Multifactor Dimensionality Reduction Analysis of Epistasis
 Frank: Statistical manifold: Dual conjugate connections
 Arthur: Mandelbrot, fractals and counterexamples in applied probability, Margin of error, and comparing proportions in the same sample
 Gonzalo: Comments on information theory
 Terahertz Technology: ConverTec Corp. releases TeraLaz CO2 terahertz laser system
Now on to the sites to check for any news on compressive sensing; here is the (incomplete) list.
 Arxiv
 Google (Compressive Sensing / Compressed Sensing) 24 hours, week, month.
 Rice University Compressive Sensing repository
Q&As
MathOverflow:
MetaOptimize:
LinkedIn:
TheoreticalCS: (not yet working)
BioStar
Friendfeed/Twitter
 Compressed Sensing and Compressive Sensing in FriendFeed
 Compressed Sensing, Compressive Sensing on Twitter.
20110206:
Fast algorithms for nonconvex compressive sensing by Rick Chartrand, LANL
20110212:
Here are blogs/papers to reflect on, enjoy!
 Dick Gordon’s blog
 Gigapixel News Journal
 Machine Vision 4 Users
 Quomodocumque
 What’s new
 Image Sensors World
 natural language processing blog
 Xi’an’s Og
 MAKE Magazine
 Freakonometrics
 The Secrets of Consulting
 Hack a Day
 Statistical Modeling, Causal Inference, and Social Science
 KinectHacks.net
 Decision Science News
 Machine Learning, etc
 Gödel’s Lost Letter and P=NP
 The Endeavour
 The Geomblog
 ChapterZero
 Mr. Vacuum Tube
 the polylogblog
 Terahertz Technology
 Epistasis Blog
 Brain Windows
 Harvest Imaging Blog
 An Ergodic Walk
 Collective for Research in Interaction, Sound, and Signal Processing
 CyberGi
 my slice of pizza
 Blog: La vertu d’un la – the virtue of an A, a fortunate hive
 Libres pensées d’un mathématicien ordinaire
 Electron&Holes twitter stream
 Olivier Grisel Twitter stream
 Twitter list of people interested in compressive sensing.
Back in December, I asked What was the most interesting paper on Compressive Sensing you read in 2010? Here is a compilation of y’all’s answers:
 T. T. Cai, L. Wang and G. Xu, “New bounds for restricted isometry constants,” IEEE Trans. Inf. Theory, vol. 56(9), pp. 4388 – 4394, Sept. 2010.
 E.J. Candes and M.B. Wakin, “An Introduction To Compressive Sampling,” IEEE Signal Processing Magazine, vol. 25, Mar. 2008, pp. 21-30.
 M. Mishali, Y.C. Eldar, O. Dounaevsky, E. Shoshan, “Xampling: Analog to Digital at Sub-Nyquist Rates”, CCIT Report #751 Oct-09, EE Pub No. 1708, EE Dept., Technion – Israel Institute of Technology.
 “A probabilistic and RIPless theory of compressed sensing” by Emmanuel Candes and Yaniv Plan
 J. T. O’Brien and W. P. Kamp and G. M. Hoover, Sign-bit amplitude recovery with applications to seismic data, Geophysics, 1982
 Challenging Restricted Isometry Constants with Greedy Pursuit, with Peyre, G., and Fadili, J., Proc. of ITW’09, pp. 475-479, 2009. ISBN: 9781424449828.
 Mark Davenport, Jason Laska, Petros Boufounos, and Richard Baraniuk, A simple proof that random matrices are democratic. (Rice University ECE Department Technical Report TREE-0906, November 2009)
 T. Blumensath, M. E. Davies, Iterative hard thresholding for compressed sensing. (Preprint, 2008)
 Boufounos P. T., “Universal Rate-Efficient Scalar Quantization”
 “Dequantizing Compressed Sensing: When Oversampling and Non-Gaussian Constraints Combine.”
 Real versus complex null space properties for sparse vector recovery
 Davenport, M.A.; Boufounos, P.T.; Wakin, M.B.; Baraniuk, R.G., “Signal Processing With Compressive Measurements,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 445-460, April 2010.
Recent entries I’ll probably be rereading include:
 The Dip
 Reading the Donoho-Tanner Diagram
 Compressive Sensing Landscape version 0.2
 “…I found this idea of CS sketchy,…”
 Islands of Knowledge
 CS: Just throw away your lenses .. but not before you perform some calibration.
 CS: “..how come your browser can’t read JPEG2000 ?..” , Q&As and some papers
 CS: Teaching Compressed Sensing (Part 1)
 CS: SMALL Workshop posters and Videos of the Talks
 CS: SMALL Workshop slides
 CS: Low Rank Compressive Spectral Imaging and a multishot CASSI
 NIPS videos
 Compressive auto-indexing in femtosecond nanocrystallography

CS: The Long Post of the Week
 Infinity Matters: Generalized Sampling and Infinite Dimensional Compressed Sensing

CS: Calibration for Ultrasound Breast Tomography Using Matrix Completion
20110222
“…I found this idea of CS sketchy,…”
20110301
CS: Would you like that entry Supersized? there are dozens of articles on compressive sensing
20110302:
Open Source Software for iPad and iPhone
20110307:
 There’s a wonderful interview at the Notices with last year’s Abel Prize winner John Tate (video here). He blames the fact that his name is on so many mathematical results and concepts on Serge Lang. The 2011 Abel Prize winner will be announced on March 23rd.
 Sir Michael Atiyah’s February 1 talk at the College de France titled A Geometer Explores the Universe is now online.

Matthew Emerton posts really good answers
2011-3-30:
Compressed Sensing: the L1 norm finds sparse solutions
2011-4-3:
So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing
2011-4-17:
videolectures
Physics
 Harvard Physics: Quantum Field Theory by Sidney Coleman – 50 videos
 University of New Mexico: Physics 524 Quantum Field Theory II 27 videos
 University of New Mexico: Physics 521 Quantum Mechanics – 32 videos
 UCSD Quantum Physics 130A, 130B, 130C ~ 25 videos each
 University of South Carolina PHYS 729 – Applied Group Theory – 22 Videos, The Foundations of Theoretical Physics Using Lie Groups & Algebras
 Florida Atlantic University: PHY 6938 General Relativity — Fall 2007 – 28 videos
 Brookhaven National Laboratory Streaming Video: Cosmology for Beginners 5 videos
 MIT OpenCourseWare  Physics  Video Lectures – Physics I: Classical Mechanics, 8.02 E & M, 8.03 Vibrations and Waves, 8.224 GR & Astrophysics
 Oregon State University – Physics 464/564, Computational Physics – 23 videos, based on “A Survey of Computational Physics”, Landau, Paez, Bordeianu
 Cambridge University Video – Thermodynamics and Phase Diagrams with Harry Bhadeshia – 7 videos
 University of New Mexico: Prof. Ivan H. Deutsch, Short Course in Quantum Information 8 videos
 The Vega Science Trust – Astrophysical Chemistry by Harry Kroto – 8 videos
 CERN: Introduction to String Theory – W. Lerche – 4 videos
 CERN: String Theory – Johnson, C. (University of Southern California) – 5 videos
 CERN: String Theory for Pedestrians – Zwiebach, B. (MIT) 3 videos, author of “A First Course in String Theory”
 CERN Short Courses in Particle Physics – Accelerators, Detectors, Bubble Chambers, Feynman Diagrams, etc.
Mathematics
 Stanford EE364a: Optimization Lecture Videos
 Stanford EE263: Linear Dynamical Systems Lecture Videos
 MIT Courseware: Godel, Escher, Bach: A Mental Space Odyssey
 Constraint Programming Summer School 2007
 University of Colorado at Colorado Springs UCCS – Mathematics Video Courses – Requires free registration.. lots of courses
 UCCS Math 432 Modern Analysis II  Spring 2008
 UCCS Math 311 Number Theory  Spring 2008
 UCCS Math 535 Applied Functional Analysis  Spring 2006
 Texas A&M University – Math 614 Dynamical Systems and Chaos
 MIT OpenCourseWare  Mathematics  Video Lectures– 18.03 Differential Equations, 18.06 Linear Algebra, 18.085 Computational Science and Engineering I, 18.086 Mathematical Methods for Engineers II
Computer Science & Engineering
 Information Retrieval / Web Crawling Course – University of Freiburg
 Advanced Topics in Algorithms and Datastructures 2006 – University of Freiburg
 University of Freiburg – Advanced Topics in Algorithms and Datastructures 2005: Parallel Algorithms
 MIT Structure and Interpretation of Computer Programs, Video Lectures
 CS 251: Intermediate Software Design with C++ – Vanderbilt University
 MIT OpenCourseWare  Electrical Engineering and Computer Science  6.046J Introduction to Algorithms (SMA 5503), Fall 2005  Lecture Notes
 Algorithms Video Lectures from ArsDigita University
 Theory of Computation Video Lectures from ArsDigita University
 University of Washington CSE 582: Compilers
 University of Washington CSE P505: Programming Languages
 nanoHUB – Scientific Computing with Python
 CSE567M: Computer Systems Analysis (2006) – Washington University in St Louis Comparing systems using measurement, simulation, and queueing models
 NJIT Distance Learning Class Videos for CS 631 Data Management System Design
 NJIT Distance Learning Class Videos for CIS 375_602 Applications Development and Java
 NJIT Distance Learning Class Videos for CS 630 Operating Systems
 Wireless Sensor Networks – University of Freiburg – 2006
 UC Santa Cruz CMPE 118 – Introduction to Mechatronics
 RPI – ECSE6961: Fundamentals of Wireless Broadband Networks. Spring 2007.
Machine Learning
 UC Berkeley Machine Learning Workshop 11 lectures
 CS 281A / Stat 241A: Statistical Learning Theory
 U Washington Machine Learning Videos
 University of Freiburg – Advanced AI Techniques – Reinforcement Learning, NLP, Bayesian Networks
Neuroscience & Biology
 Graduate Summer School: Probabilistic Models of Cognition: The Mathematics of Mind
 UCSD: Quantitative Molecular Biology – Physics 172/272
 University of Illinois at Urbana-Champaign – NSF Biophysics Summer School Lectures
 nanoHUB – Resources > Courses
 ITP Program on Dynamics of Neural Networks– Dynamics of Neural Networks: From Biophysics to Behavior
 Harvard School of Public Health: Bioinformatics Core
 UC Berkeley Webcasts  Video and Podcasts: MCB 130 Cell Biology
 UC Berkeley Webcasts  Video and Podcasts: MCB 110: General Biochemistry and Molecular Biology
 University of South Carolina – Microbiology and Immunology – Streaming Video
 University of South Carolina – Microbiology Video Index
Finance and Econometrics
 University of Toronto ACT 460 / STA2502 – Stochastic Methods for Actuarial Science – S. Jaimungal, Department of Statistics and Mathematical Finance Program
 Economics 421 – Econometrics– Mark Thoma: Department of Economics, University of Oregon
 Course Video Lectures: Latent Variable Analysis Professor Bengt Muthén of the UCLA Graduate School of Education & Information Studies
 INFO 747 – Social and Economic Data – Cornell Record Linkage Course Lecture Videos Prof. John M. Abowd
 UC Berkeley Webcasts: Econometrics 244 – Discrete Choice Methods with Simulation
Seminars, Talks, and Conference Videos:
See http://del.icio.us/pskomoroch/talk+video for more links…
Physics
 View Past Public Lectures – Perimeter Institute for Theoretical Physics
 African Summer Theory Institute (ASTI): Online Lectures
 Rutgers Physics: NHETC video seminars
 UW Math: Milliman Lectures Archive
 The Vega Science Trust – Richard Feynman Videos
 Kavli Institute for Theoretical Physics (KITP) Online Conferences, Lectures and Seminars
Mathematics
 MSRI Video Archive
 Duke University Mathematics Department Video Archive
 Michigan State University Math Department – Video Lectures
Computer Science & Engineering
Machine Learning
 DeepLearningWorkshopNIPS2007 < Public < TWiki
 NIPS : Conferences : 2006 : Program : NIPS 2006 Schedule
 NIPS : Conferences : 2006 : Media : NIPS 2006 Media
 NIPS : Conferences : 2005 : Tutorial Videos
 NATO Advanced Study Institute on Mining Massive Data Sets for Security
Neuroscience & Biology
 UC Irvine International Imaging Genetics Conference
 Hebrew University of Jerusalem: Heller Lecture Series in Computational Neuroscience
 NIH VideoCasting: Past Events
 U Texas. Collection of Online Neuroscience Lectures
 Internet Archive Search: 2007+brain+network+dynamics
 Conference on Brain Network Dynamics 2007 – University of California Berkeley
 nanoHUB – Resources > Online Presentations
 Mathematical Biosciences Institute: Workshop on Biophysics and Mathematical Models of Calcium Channels
Finance and Economics
 International Tax Lecture Series – University of Connecticut School of Law
 Daniel Kahneman – Nobel Prize Lecture: Maps of Bounded Rationality
Open Courseware Directories and Other Video Lecture Roundup Posts
 Berkeley Course Webcasts
 MIT OpenCourseWare Videos
 Stanford University Lecture Videos
 Open Yale Courses
 VideoLectures – exchange ideas & share knowledge
 Free Science and Video Lectures Online!
 Lecturefox: free university lectures – computer science, mathematics, physics
 Business Intelligence, Data Mining & Machine Learning: Machine Learning On-Line Lectures
 Yet Another Machine Learning Blog » Machine learning videos [Pierre Dangauthier]
 obousquet – ML Videos – Online videos of talks or lectures about Machine Learning related topics
Ways to prove the fundamental theorem of algebra
2011-5-27:
Distinguished and Plenary Talks
 Rothschild Lecture, Isaac Newton Institute for Mathematical Sciences, Cambridge, March 28, 2011
 Albert Einstein Memorial Lecture, Israel Academy of Sciences and Humanities, Jerusalem, March 14, 2011
 Fields Institute Distinguished Lectures, Toronto, September 14-16, 2010
 Distinguished Lecture, University of Rochester, October 30, 2009
 Levi L. Conant Lecture, Worcester Polytechnic Institute, September 24, 2009
 Sackler Distinguished Lectures in Mathematics, Tel Aviv University, March 9-13, 2009
 Distinguished Lecture, Rutgers University, December 4, 2008
 Distinguished Lecture Colloquium, PennState, November 19, 2008
 Asprey Distinguished Lecture Series, Vassar College, March 23, 2008
 Toyota Technological Institute of Chicago Distinguished Lecture Series, March 6, 2008
 UCLA Mathematics Department Distinguished Lecture Series, January 9,10 and 11, 2008
 Gibbs Lecture, Joint AMS-MAA Meeting, San Diego, January 6, 2008
 CISE Distinguished Lecture at NSF, Washington, DC, September 27, 2007
 Keynote lecture at FCRC, San Diego, CA, June 13, 2007
 KAM Mathematical Colloquium, Prague, Czech Republic, April 27, 2007
 Distinguished Lecture Series, University of Haifa, February 27 – March 1, 2007
 Louis Clark Vanuxem Lectures, Princeton University, February 13, 14 and 15, 2007
 Distinguished Lecture Series, University of Wisconsin, Madison, October 18, 2006
 IEEE Conference on Computational Complexity, Prague, Czech Republic, July 16-20, 2006
 Horizons of Truth Goedel Centenary 2006, University of Vienna, April 29, 2006
 Radcliffe Institute for Advanced Study Science Lecture Series, October 9, 2003
The Institute for Advanced Study’s Women and Mathematics program featured the following interesting lectures and seminars this year:
 Rebecca Willett’s 5/17 lecture 1 Methods for sparse analysis of high-dimensional data, I
 Rebecca Willett’s 5/18 lecture 2 Sparsity: Correcting Error in Data
 Rebecca Willett’s 5/19 lecture 3 Sparsity: Compressed Sensing
 Rebecca Willett’s 5/20 lecture 4 Sparsity: Generalized Sparsity Measures and Applications
 Sofya Raskhodnikova’s 5/17 lecture 1 Sublinear-Time Algorithms
 Sofya Raskhodnikova’s 5/18 lecture 2 Sublinear-Time Algorithms
 Sofya Raskhodnikova’s 5/19 lecture 3 Sublinear-Time Algorithms
 Sofya Raskhodnikova’s 5/20 lecture 4 Sublinear-Time Algorithms
 Rachel Ward’s 5/24 lecture 1 Methods for sparse analysis of high-dimensional data, II
 Anna Gilbert’s 5/24 lecture 1 Background on sparse approximation
 Anna Gilbert’s 5/25 lecture 2 Hardness results for sparse approximation problems
 Anna Gilbert’s 5/26 lecture 3 Dictionary geometry, greedy algorithms, and convex relaxation
 Peter Sarnak Mobius Function Lecture Three Lectures on the Mobius Function Randomness and Dynamics
 Peter Sarnak Integral Apollonian Packings
2011-5-29:
Geometric Tools for Identifying Structure in Large Social and Information Networks
2011-6-3:
Math
Elementary Applied Topology draft textbook
Introduction to category theory
Mathematical model of walking
Statistics and machine learning
Machine learning demos
On the accuracy of statistical procedures in Excel 2007
R reference card for data mining
Wisdom of statistically manipulated crowds
2011-6-22:
Videos of talks by Friedman and Macintyre
2011-6-23:
Deviance, DIC, AIC, cross-validation, etc
The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing
############################################################################
2012-2-19 to 2012-2-26:
 Active Bayesian Optimization
 There are several videos from the IMA meeting on Group Testing Designs, Algorithms, and Applications to Biology. Enjoy!
 Emergence of MCMC Bayesian Computation
 Two Interesting Short Volumes on the (Graph) Laplacian
 So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing
 Prediction: the Lasso vs. just using the top 10 predictors
 Getting Genetics Done: Golden Helix: A Hitchhiker’s Guide to Next Generation Sequencing
 Stanford Unsupervised Feature Learning and Deep Learning Tutorial
 What does a compressive sensing approach bring to the table ?
 What is Mahalanobis distance?
 Large scale SVM (support vector machine)
 Abstractions
 Monkeying with Bayes’ theorem
 Coming to agreement on philosophy of statistics
 probit posterior mean
 GraphLab v2 @ Big Learning Workshop
 Basic Introduction to ggplot2
 Bayesian statistics made simple
 Courses in CS this spring
 A Numerical Tour of Signal Processing
 Reading List for Feb and March 2012: materials on concentration and geometric techniques used in compressed sensing.
 simulated annealing for Sudokus
 Djalil talks about A random walk on the unitary group, Brownian Motion and From seductive theory to concrete applications (which got Nuit Blanche thinking about writing this entry: Whose heart doesn’t sink at the thought of Dirac being inferior to Theora ?)
 Lectures on Gaussian approximations with Malliavin calculus
 Useful R snippets
 Special Section: Minimax Shrinkage Estimation: A Tribute to Charles Stein
 Excellent Papers for 2011
 Creating a designer’s CV in LaTeX
 Is NGS the Answer?
 Sequence Analysis Methods Not Just for Sequence Data
 DNA Variant Analysis of Complete Genomics’ Next-Generation Sequencing Data
 Infinite Mixture Models with Nonparametric Bayes and the Dirichlet Process
 Best Written Paper
 Online SVD/PCA resources
 Probabilistic Topic Models
 Social Network Analysis with R
 Publicly available large data sets for database research
 Around the blogs in 80 hours and Random Thoughts (some are about sequencing data)
 Change margins of a single page (latex)
 Bootstrap example
 Exciting News on Three Dimensional Manifolds
 Dr. Perou on Next Generation Sequencing Technology
 Analyzing complex plant genomes with the newest next-generation DNA sequencing techniques
 RNA-Seq Methods & March Twitter Roundup
 Introduction to Statistical Thought
 An R programmer looks at Julia
 The slides and video can help you get a flavor of the language Julia.
 Why and How People Use R
 Wang, Landau, Markov, and others…
 Linear mixed models in R
 Least Absolute Gradient Selector: Statistical Regression via Pseudo-Hard Thresholding
 Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing
 C++ at Facebook
 Calling C++ from R
 C++ Renaissance
 Why haven’t we cured cancer yet? (Revisited): Personalized medicine versus evolution
 Getting ppt figures into LaTeX
 Latex Allergy Cured by knitr
 Melbourne R Users
 sixty two-minute R twotorials
 LDA explained
 Counting the total number of…
 Significance Test for Kendall’s Tau-b
 dimension reduction in ABC [a review’s review]
 9 essential LaTeX packages everyone should use
 Linguistic Notation Inside of R Plots! about knitr
 knitr Elegant, flexible and fast dynamic report generation with R
 knitr Performance Report–Attempt 1
 knitr Performance Report–Attempt 2
 Question: Why you need perl/python if you know R/Shell [NGS data analysis]
 SPAMS (SPArse Modeling Software) now with Python and R
 Large-scale inference and empirical Bayes, which are related to multiple testing
 My setup of some software and editors
 Fancy HTML5 Slides with knitr and pandoc
 John talks about Random is as random does
 MCMC at ICMS (1)
 MCMC at ICMS (2)
 MCMC at ICMS (3)
 John Cook: Why and How People Use R
 An Introduction to 6 Machine Learning Models
 Machine Learning: Algorithms that Produce Clusters
 Dirichlet Process for dummies
 A Really Nice Talk About PDE, Numerics (and Pyramids)
 Analysis of Boolean Functions
 Next-generation genome sequencers compared
 why noninformative priors?
 Data Scientists Get Ranked
 90+ Two-Minute Videos on R
 Turing Centennial Celebration – Day 1
 Turing Centennial Celebration – Day 2
 Turing Centennial Celebration – Day 3
 Online resources for handling big data and parallel computing in R
 Source RScript from Dropbox
 Excel in Statistics and Operations Research
 Dynamic Content with RStudio, Markdown, and Marked.
 Five minute guide to LaTeX
 Interactive reports in R with knitr and RStudio
 What Programming language are they using ?
 Generating reports for different data sets using brew and knitr
 Reproducible research with markdown, knitr and pandoc
 Getting Started with R Markdown, knitr, and Rstudio 0.96
 My experiences with Rcpp
 A Personal Perspective on Machine Learning
 The differing perspectives of statistics and machine learning
 Kernel Methods and Support Vector Machines deMystified
 I love this article in the WSJ about the crisis at JP Morgan. The key point it highlights is that looking only at the high-level analysis and summaries can be misleading; you have to look at the raw data to see the potential problems. As data become more complex, I think it’s critical we stay in touch with the raw data, regardless of discipline. At least if I miss something in the raw data I don’t lose a couple billion. Spotted by Leonid K.
 On the other hand, this article in the Times drives me a little bonkers. It makes it sound like there is one mathematical model that will solve the obesity epidemic. Lines like this are ridiculous: “Because to do this experimentally would take years. You could find out much more quickly if you did the math.” The obesity epidemic is due to a complex interplay of cultural, sociological, economic, and policy factors. The idea you could “figure it out” with a set of simple equations is laughable. If you check out their model this is clearly not the answer to the obesity epidemic. Just another example of why statistics is not math. If you don’t want to hopelessly oversimplify the problem, you need careful data collection, analysis, and interpretation. For a broader look at this problem, check out this article on Science vs. PR. Via Andrew J.
 Some cool applications of the raster package in R. This kind of thing is fun for student projects because analyzing images leads to results that are easy to interpret/visualize.
 Check out John C.’s really fascinating post on determining when a white-collar worker is great. Inspired by Roger’s post on knowing when someone is good at data analysis.
 knitR Performance Report 3 (really with knitr) and dprint
 Unix doesn’t follow the Unix philosophy
 Advice on writing research articles
 knitr Performance Report–Attempt 3
 Permutation tests in R
 Understanding Bayesian Statistics – By MichaelPaul Agapow
 knitr, Slideshows, and Dropbox
 Generate LaTeX tables from CSV files (Excel)
 The Tomato Genome
 Optimization
 Sichuan Agricultural University and LC Sciences Uncover the Epigenetics of Obesity
 How to Stay Current in Bioinformatics/Genomics
 Interactive HTML presentation with R, googleVis, knitr, pandoc and slidy
 The R-Podcast Episode 7: Best Practices for Workflow Management
 What is the point of statistics and operations research?
 Question: C/C++ libraries for bioinformatics?
 5 Hidden Skills for Big Data Scientists
 Protocol – Computational Analysis of RNA-Seq
2012-6-3 onward:
 An easy way to think about priors on linear regression
 Combining priors and downweighting in linear regression
 Metropolis Hastings MCMC when the proposal and target have differing support
 Slidify: Things are coming together fast
 How to Convert Sweave LaTeX to knitr R Markdown: Winter Olympic Medals Example
 Testing R Markdown with R Studio and posting it on RPubs.com
 Announcing The R markdown Package
 Announcing RPubs: A New Web Publishing Service for R
 Approximate Bayesian computation
 Load Packages Automatically in RStudio
 Practical advice for machine learning: bias, variance and what to do next
 The overview article on “Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis” associated with the invited talk at the upcoming PODS 2012 meeting is on the arXiv here.
 The monograph on “Randomized Algorithms for Matrices and Data” is available in NOW’s “Foundations and Trends in Machine Learning” series here, and it is also available on the arXiv here.
 Click here for information (including the slides and video!) on the Tutorial on “Geometric Tools for Identifying Structure in Large Social and Information Networks,” given originally at ICML10 and KDD10 and subsequently at many other places. (The slides are also linked to below.)
 The overview chapter on “Algorithmic and Statistical Perspectives on Large-Scale Data Analysis” is finally on the arXiv here; the book in which it will appear is in press; and a video of the associated talk is here.
 Recent teaching: Fall 2009: CS369M: Algorithms for Massive Data Set Analysis
 Confidence distributions
 Making a singular matrix nonsingular
 Statistics Versus Machine Learning
 How to post R code on WordPress blogs
 Causation
 Pro Tips for Grad Students in Statistics/Biostatistics (Part 1)
 Pro Tips for Grad Students in Statistics/Biostatistics (Part 2)
 Why You Shouldn’t Conclude “No Effect” from Statistically Insignificant Slopes
 For those interested in knitr with Rmarkdown to beamer slides
 Notes from A Recent Spatial R Class I Gave
 Sparse Bayesian Methods for Low-Rank Matrix Estimation and Bayesian Group-Sparse Modeling and Variational Inference – implementation
 The Battle of the Bayes
 Ockham Workshop, Day 1
 Ockham Workshop, Day 2
 Ockham Workshop, Day 3
 Ockham’s Razor
 Occam
 Simplicity is hard to sell
 Self-Repairing Bayesian Inference
 Praxis and Ideology in Bayesian Data Analysis
 Inconsistent Bayesian inference
 Big Data Generalized Linear Models with Revolution R Enterprise
 Quants, Models, and the Blame Game
 Fun with the googleVis Package for R
 Topological Data Analysis
 The Winners of the LaTeX and Graphics Contest
 Is Machine Learning Losing Impact?
 Machine Learning Doesn’t Matter?
 Components of Statistical Thinking and Implications for Instruction and Assessment
 Xiao-Li Meng and Xianchao Xie rethink asymptotics
 Higgs boson and five sigma
 What is the Statistics Department 25 Years From Now?
 Statistics: Your chance for happiness (or misery)
 Manifolds: motivation and definition
 Why Emacs is important to me? : ESS and org-mode
 Interesting Emacs linkfest
 Devs Love Bacon: Everything you need to know about Machine Learning in 30 minutes or less
 Visualizing Galois Fields
 Visualizing Galois Fields (Followup)
 Statistical Reasoning on iTunes U
 Computing log gamma differences
 Where to start if you’re going to revise statistics
 Power laws and the generalized CLT
 Open problems in next-gen sequence analysis
 More equations, less citations?
 Talk: Some Introductory Remarks on Bayesian Inference
2012/7/16—2012/8/12:
 Getting Started with the WordPress Competition
 Simple Made Easy
 An Education Tsunami—Will online courses destroy universities?
 Universities Reshaping Education on the Web
 Explanation or Prediction? An Amazing Quote from Phil Schrodt
 Should you apply PCA to your data?
 Which classifiers are fast enough for exploring medium-sized data?
 Quick classifiers for exploring medium-sized data (redux)
 Is C++ worth it?
 Unbiased estimators can be terrible
 Things You Should Never Do, Part I
 The Joel Test: 12 Steps to Better Code
 Methodologists’ Audience
 Bayesian Methodology in the Genetic Age
 Interview with Michael Hammel, author of The Artist’s Guide to GIMP
 Being Happy in Grad School
 10 Fresh Tips for Finding Time to Blog
 A Quick Guide to Using Tumblr for Business
 Statistics Done Wrong
 Top N Reasons To Do A Ph.D. or PostDoc in Bioinformatics/Computational Biology
 Interview(s) with Vladimir Voevodsky with an introduction on motivic homotopy along with the video and transcript.
 Are there examples of non-orientable manifolds in nature?
 Kolmogorov Complexity – A Primer
 Adventures at My First JSM (Joint Statistical Meetings) #JSM2012
 Yes, I was hacked. Hard.
 Does Julia have any hope of sticking in the statistical community?
 How Genome Sequencing is Revolutionizing Clinical Diagnostics, from the ISMB Conference
 Advice for an Undergraduate
 4 things you should know about choosing examiners for your thesis
 The long tail of free online education: The author also plans to teach a class on graph partitioning, expander graphs, and random walks online in Winter 2013.
 Teaching the World to Search
 Beyond Pinterest and Instagram – ten visual social networks that should be on your radar
 Making Ubuntu 12.04 useable
 Basic Understanding of Compressed Sensing
2012/8/13—2012/9/23:
 Towards Better PDF Management with the Filesystem
 What is life like for PhDs in computer science who go into industry?
 Online REPL for 17 programming languages
 Logistic regression vs. multiple regression: Many statisticians seem to advise the use of logistic regression over multiple regression by invoking this logic: “A probability value can’t exceed 1 nor can it be less than 0. Since multiple regression often yields values less than 0 and greater than 1, use logistic regression.” While we can understand this argument, our feeling is that, in the applied fields we toil in, that argument is not a very practical one. In fact a seasoned statistics professor we know says (in effect): “What’s the big deal? If multiple regression yields any predicted values less than 0, consider them 0. If multiple regression yields any values greater than 1, consider them 1. End of story.” We agree.
 Scientific Python
 An everyday essential: the timer + My personal productivity rules
 Bill Thurston, by Terence Tao; Bill Thurston, 1946-2012, by Peter Woit; Bill Thurston 1946-2012, by David Speyer.
 Surviving a PhD: 10 top tips on how to survive your PhD
 How different PhD’s work: Differences and similarities between departments in the PhD process
 Countdown Begins: The countdown starts for submission of the thesis
 PhD Life is Wonderful: Doing a PhD at Warwick University is a wonderful experience
 Too Many Emails In Your Inbox: Use Outlook folders to manage your emails
 Introduction to REX Facility: Videos introducing the Wolfson Research Exchange and its facilities
 Power of Supervisors: Control, inner happiness and optimism
 Unorthodox Tools of a Researcher: Reflections on, and examples of, unorthodox tools that help you through your PhD
 Homesickness and Culture Clashes: Homesickness of international students and cultural differences
 Choosing Your PhD Examiners: Tips for choosing the relevant examiners for the PhD viva
 Effective Research Tools: Examples of useful research tools
 PhD, Risks and Murphy’s Law: “Anything that can go wrong will go wrong” according to Murphy’s Law
 Will Data Scientists Be Replaced by Tools?
 Update: TeX Writer for iPad (+ LaTeX + AMS)
 Why physicists like models, and why biologists should
 The ENCODE project: lessons for scientific publication
 Perspectives From A Postdoc: What is a Postdoc?
 Chris Blattman gives advice on PhD students’ NSF applications
 ENCODE floods the news networks…
 Maybe mostly useful for me, but for other people with Tumblr blogs, here is a way to insert Latex.—From Simply Statistics
 Harvard Business school is getting in on the fun, calling the data scientist the sexy profession for the 21st century. Although I am a little worried that by the time it gets into a Harvard Business document, the hype may be outstripping the real promise of the discipline. Still, good news for statisticians! (via Rafa via Francesca D.’s Facebook feed).—From Simply Statistics
 The counterpoint is this article which suggests that data scientists might be able to be replaced by tools/software. I think this is also a bit too much hype for my tastes. Certain things will definitely be automated and we may even end up with a deterministic statistical machine or two. But there will continually be new problems to solve which require the expertise of people with data analysis skills and good intuition (link via Samara K.)—From Simply Statistics
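 A note on the “Logistic regression vs. multiple regression” entry above: the rule quoted there (treat predictions below 0 as 0 and above 1 as 1) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the toy data and the helper names fit_line and clipped_predict are made up for this note and do not come from the linked post.

```python
# Minimal sketch: fit an ordinary least-squares line to a 0/1 outcome,
# then truncate predictions to [0, 1] as the quoted professor suggests.
# The data and function names are hypothetical, for illustration only.

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def clipped_predict(intercept, slope, x):
    """Linear prediction truncated to the valid probability range [0, 1]."""
    raw = intercept + slope * x
    return min(1.0, max(0.0, raw))


# Toy binary outcome that switches from 0 to 1 as x grows.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]

intercept, slope = fit_line(xs, ys)
raw = [intercept + slope * x for x in xs]
clipped = [clipped_predict(intercept, slope, x) for x in xs]

for x, r, c in zip(xs, raw, clipped):
    print(f"x={x}  raw={r:+.3f}  clipped={c:.3f}")
```

 On this toy data the raw linear predictions fall below 0 at the smallest x and above 1 at the largest, and the truncation pulls them back to the boundary. Whether that is good enough in practice is the judgment call the quoted entry is arguing about.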
2012/9/24—2012/11/28:
 Grad Student’s Guide to Good Coffee + Grad Student’s Guide to Good Tea
 Favorite Apps for Work and Life
 estimating a constant (not really)
 Reinforcement Learning in R: An Introduction to Dynamic Programming
 The Future of Machine Learning (and the End of the World?)
 10 Papers Every Programmer Should Read (At Least Twice)
 R in the Press
 On Chomsky and the Two Cultures of Statistical Learning
 Speech Recognition Breakthrough for the Spoken, Translated Word
 Frequentist vs Bayesian
 w4s – the awesomeness we’re experiencing
 Why is the Gaussian so pervasive in mathematics?
 C++ Blogs that you Regularly Follow
 An interview with Brad Efron about scientific writing. I haven’t watched the whole interview, but I do know that Efron is one of my favorite writers among statisticians.
 Slidify, another approach for making HTML5 slides directly from R. Two caveats: (1) it is still just a little too hard to change the theme/feel of the slides; (2) the placement/insertion of images is still a little clunky. Google Docs has figured this out; if they integrated the best features of Slidify, LaTeX, etc. into that system, it would be great.
 Statistics is still the new hotness. Here is a Business Insider list about 5 statistics problems that will “change the way you think about the world”.
 New Yorker, especially the line, “statisticians are the new sexy vampires, only even more pasty” (via Brooke A.)
 The closed graph theorem in various categories
 Got spare time? Watch some videos about statistics
 About the first Borel-Cantelli lemma
 Yihui Xie—The Setup
 Best Practices for Scientific Computing
2012/12/5—2013/1/20:
 Machine Learning, Big Data, Deep Learning, Data Mining, Statistics, Decision & Risk Analysis, Probability, Fuzzy Logic FAQ
 A Funny Thing Happened on the Way to Academia . . .
 Advice for students on the academic job market (2013 edition)
 Perspective: “Why C++ Is Not ‘Back’”
 Is Fourier analysis a special case of representation theory or an analogue?
 The Beauty of Bioconductor
 The State of Statistics in Julia
 Open Source Misfeasance
 Book review: The Signal and The Noise
 Should the Cox Proportional Hazards model get the Nobel Prize in Medicine?
 The most influential data scientists on Twitter
 Here is an interesting review of Nate Silver’s book. The interesting thing about the review is that it doesn’t criticize the statistical content, but criticizes the belief that people only use data analysis for good. This is an interesting theme we’ve seen before. Gelman also reviews the review.—–Simply Statistics
 Video : “Matrices and their singular values” (1976)
 Beyond Computation: The P vs NP Problem – Michael Sipser. This talk is arguably the very best introduction to computational complexity.
 What are some of your personal guidelines for writing good, clear code?
 How do you explain Machine learning and Data Mining to non CS people?
 Suggested New Year’s resolution: start a blog: A blog forces you to articulate your thoughts rather than having vague feelings about issues; you also get much more comfortable with writing, because you’re doing it rather than thinking about doing it; if other people read your blog you get to hear what they think too. You learn a lot that way. Set aside time for your blog every day. Keep notes for yourself on bloggy subjects (write a one-line gmail to yourself with the subject “blog ideas”).
 Tips on job market interviews
 The age of the essay
2013/2/16—2014/2/25:
 Interview with Nick Chamandy, statistician at Google
 You and Your Research + video
 Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained
 A Survival Guide to Starting and Finishing a PhD
 Six Rules For Wearing Suits For Beginners
 Why I Created C++
 More advice to scientists on blogging
 Software engineering practices for graduate students
 Statistics Matter
 What statistics should do about big data: problem forward not solution backward
 How signals, geometry, and topology are influencing data science
 The Bounded Gaps Between Primes Theorem has been proved
 A non-comprehensive list of awesome things other people did this year.
 Jake VanderPlas writes about the Big Data Brain Drain from academia.
 Tomorrow’s Professor Postings
 Best Practices for Scientific Computing
 Some tips for new research-oriented grad students
 3 Reasons Every Grad Student Should Learn WordPress
 How to Lie With Statistics (in the Age of Big Data)
 The Geometric View on Sparse Recovery
 The Mathematical Shape of Things to Come
 A Guide to Python Frameworks for Hadoop
 Statistics, geometry and computer science.
 How to Collaborate On GitHub
 Step by step to build my first R Hadoop System
 Open Sourcing a Python Project the Right Way
 Data Science MD July Recap: Python and R Meetup
 Recent reflections on git
 10 Reasons Python Rocks for Research (And a Few Reasons it Doesn’t)
 Effective Presentations – Part 2 – Preparing Conference Presentations
 Doing Statistical Research
 How to Do Statistical Research
 Learning new skills
 How to Stand Out When Applying for An Academic Job
 Maturing from student to researcher
 False discovery rate regression (cc NSA’s PRISM)
 Job Hunting Advice, Pt. 3: Networking
 Getting Started with Git
2014/2/26—2014/9/11
 Some R Resources for GLMs
 Statistical data analysis in search and rescue after loss of contact
 The gap between data mining and predictive models
 Data Mining, machine learning and statistics.
 useR! 2014 is underway with 16 tutorials
 What is Scalable Machine Learning?
 rlist: working with non-relational data in R using lists
 The perfect candidate
 The Leek group guide to giving talks
 38 Seminal Articles Every Data Scientist Should Read
 Deep Learning – important resources for learning and understanding
 Twenty rules for good graphics + Ten Simple Rules for Better Figures
 Git Cookbook
 Making Your Code Citable
 biblatex for statisticians
 Do your “data janitor work” like a boss with dplyr
2014/9/22—2014/12/04:
 Tutorial: How to detect spurious correlations, and how to find the …
 Practical illustration of MapReduce (Hadoop-style), on real data
 Jackknife logistic and linear regression for clustering and predict…
 From the trenches: 360-degrees data science
 A synthetic variance designed for Hadoop and big data
 Fast Combinatorial Feature Selection with New Definition of Predict…
 A little known component that should be part of most data science a…
 11 Features any database, SQL or NoSQL, should have
 Clustering idea for very large datasets
 Hidden decision trees revisited
 Correlation and R-Squared for Big Data
 Marrying computer science, statistics and domain expertize
 New pattern to predict stock prices, multiplies return by factor 5
 What Map Reduce can’t do
 Excel for Big Data
 Fast clustering algorithms for massive datasets
 Source code for our Big Data keyword correlation API
 The curse of big data
 How to detect a pattern? Problem and solution
 Interesting Data Science Application: Steganography
 Easily create documents from R with Rmarkdown
 How to publish R and ggplot2 to the web
 magrittr: Simplifying R code with pipes
 Updated dplyr Examples
 Video introduction to data manipulation with dplyr
 R and Data Science
 jiebaR Chinese word segmentation: the flexibility of R, the efficiency of C
 Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?
 41 hours of courses given in Iceland this Summer at the Machine Learning Summer School.
 summary of parallel machine learning approaches
 big data and data science talks
################## From SimplyStats ##################
Editor’s Note: Last year I made a list off the top of my head of awesome things other people did. I loved doing it so much that I’m doing it again for 2014. Like last year, I have surely missed awesome things people have done. If you know of some, you should make your own list or add it to the comments! The rules remain the same. I have avoided talking about stuff I worked on or that people here at Hopkins are doing because this post is supposed to be about other people’s awesome stuff. I wrote this post because a blog often feels like a place to complain, but we started Simply Stats as a place to be pumped up about the stuff people were doing with data. Update: I missed pipes in R, now added!
 I’m copying everything about Jenny Bryan’s amazing Stat 545 class in my data analysis classes. It is one of my absolute favorite open online set of notes on data analysis.
 Ben Baumer, Mine Cetinkaya-Rundel, Andrew Bray, Linda Loi, Nicholas J. Horton wrote this awesome paper on integrating R markdown into the curriculum. I love the stuff that Mine and Nick are doing to push data analysis into undergrad stats curricula.
 Speaking of those folks, the undergrad guidelines for stats programs put out by the ASA do an impressive job of balancing the advantages of statistics and the excitement of modern data analysis.
 Somebody tell Hector Corrada Bravo to stop writing so many awesome papers. He is making us all look bad. His epiviz paper is great and you should go start using the Bioconductor package if you do genomics.
 Hilary Mason founded fast forward labs. I love the business model of translating cutting edge academic (and otherwise) knowledge to practice. I am really pulling for this model to work.
 As far as I can tell 2014 was the year that causal inference became the new hotness. One example of that is this awesome paper from the Google folks on trying to infer causality from related time series. The R package has some cool features too. I definitely am excited to see all the new innovation in this area.
 Hadley was Hadley.
 Rafa and Mike taught an awesome class on data analysis for genomics. They also created a book on Github that I think is one of the best introductions to the statistics of genomics that exists so far.
 Hilary Parker wrote this amazing introduction to writing R packages that took the twitterverse by storm. It is perfectly written for people who are just at the point of being able to create their own R package. I think it probably generated 100+ R packages just by being so easy to follow.
 Oh you’re not reading StatsChat yet? For real?
 FiveThirtyEight launched. Despite some early bumps they have done some really cool stuff. Loved the recent piece on the beer mile and I read every piece that Emily Oster writes. She does an amazing job of explaining pretty complicated statistical topics to a really broad audience.
 David Robinson’s broom package is one of my absolute favorite R packages that was built this year. One of the most annoying things about R is the variety of outputs different models give and this tidy version makes it really easy to do lots of neat stuff.
 Chung and Storey introduced the jackstraw which is both a very clever idea and the perfect name for a method that can be used to identify variables associated with principal components in a statistically rigorous way.
 I rarely dig excel-type replacements, but the simplicity of charted.co makes me love it. It does one thing and one thing really well.
 The hipsteR package for teaching old R dogs new tricks is one of the many cool things Karl Broman did this year. I read all of his tutorials and never cease to learn stuff. In related news if I was 1/10th as organized as that dude I’d actually you know, get stuff done.
 Whether I agree with them or not that they should be allowed to do unregulated human subjects research, statistics at tech companies, and in particular randomized experiments have never been hotter. The boldest of the bunch is OKCupid who writes blog posts with titles like, “We experiment on human beings!”
 In related news, I love the PlanOut project by the folks over at Facebook, so cool to see an open source approach to experimentation at web scale.
 No wonder Mike Jordan (no not that Mike Jordan) is such a superstar. His reddit AMA raised my respect for him from already super high levels. First, it’s awesome that he did it, and second it is amazing how well he articulates the relationship between CS and Stats.
 I’m trying to figure out a way to get Matthew Stephens to write more blog posts. He teased us with the Dynamic Statistical Comparisons post and then left us hanging. The people demand more Matthew.
 Di Cook also started a new blog in 2014. She was also part of this cool exploratory data analysis event for the UN. They have a monster program going over there at Iowa State, producing some amazing research and a bunch of students that are recognizable by one name (Yihui, Hadley, etc.).
 Love this paper on sure screening of graphical models out of Daniela Witten’s group at UW. It is so cool when a simple idea ends up being really well justified theoretically, it makes the world feel right.
 I’m sure this actually happened before 2014, but the Bioconductor folks are still the best open source data science project that exists in my opinion. My favorite development I started using in 2014 is the git-subversion bridge that lets me update my Bioc packages with pull requests.
 rOpenSci ran an awesome hackathon. The lineup of people they invited was great and I loved the commitment to a diverse group of junior R programmers. I really, really hope they run it again.
 Dirk Eddelbuettel and Carl Boettiger continue to make big-time contributions to R. This time it is Rocker, with Docker containers for R. I think this could be a reproducibility/teaching game-changer.
 Regina Nuzzo brought the p-value debate to the masses. She is also incredible at communicating pretty complicated statistical ideas to a broad audience and I’m looking forward to more stats pieces by her in the top journals.
 Barbara Engelhardt keeps rocking out great papers. But she is also one of the best AE’s I have ever had handle a paper for me at PeerJ. Super efficient, super fair, and super demanding. People don’t get enough credit for being amazing in the peer review process and she deserves it.
 Ben Goldacre and Hans Rosling continue to be two of the best advocates for statistics and the statistical discipline – I’m not sure either claims the title of statistician but they do a great job anyway. This piece about Professor Rosling in Science gives some idea about the impact a statistician can have on the most current problems in public health. Meanwhile, I think Dr. Goldacre does a great job of explaining how personalized medicine is an information science in this piece on statins in the BMJ.
 Michael Lopez’s series of posts on graduate school in statistics should be 100% required reading for anyone considering graduate school in statistics. He really nails it.
 Trey Causey has an equally awesome Getting Started in Data Science post that I read about 10 times.
 Drop everything and go read all of Philip Guo’s posts. Especially this one about industry versus academia or this one on the practical reason to do a PhD.
 The top new Twitter feed of 2014 has to be @ResearchMark (incidentally I’m still mourning the disappearance of @STATSHULK).
 Stephanie Hicks’ blog combines recipes for delicious treats and statistics, also I thought she had a great summary of the Women in Stats (#WiS2014) conference.
 Emma Pierson is a Rhodes Scholar who wrote for 538, 23andMe, and a bunch of other major outlets as an undergrad. Her blog, obsessionwithregression.blogspot.com is another must read. Here is an example of her awesome work on how different communities ignored each other on Twitter during the Ferguson protests.
 The Rstudio crowd continues to be on fire. I think they are a huge part of the reason that R is gaining momentum. It wouldn’t be possible to list all their contributions (or it would be an Rstudio exclusive list) but I really like Packrat and R markdown v2.
 Another huge reason for the movement with R has been the outreach and development efforts of the Revolution Analytics folks. The Revolutions blog has been a must read this year.
 Julian Wolfson and Joe Koopmeiners at University of Minnesota are straight up gamers. They live streamed their recruiting event this year. One way I judge good ideas is by how mad I am I didn’t think of it and this one had me seeing bright red.
 This is just an awesome paper comparing lots of machine learning algorithms on lots of data sets. Random forests wins and this is a nice update of one of my favorite papers of all time: Classifier technology and the illusion of progress.
 Pipes in R! This stuff is for real. The piping functionality created by Stefan Milton and Hadley is one of the few inventions over the last several years that immediately changed whole workflows for me.
##########################################################################
2014/12/05—2015/2/20:
 Deep Learning Master Class
 Advances in Variational Inference
 Numerical Optimization: Understanding L-BFGS
 An exact mapping between the Variational Renormalization Group and Deep Learning
 New ASA Guidelines for Undergraduate Statistics Programs
 Singular value decomposition (We Recommend a Singular Value Decomposition)
 How can one explain what a neural network is in a simple, vivid, and fun way?
 Academic vs. Industry Careers
 Hadley Wickham: Impact the world by being useful
 Statisticians in World War II: They also served
 A Brief Overview of Deep Learning
 Advice for applying Machine Learning
 Deep Learning Tutorial
 Gibbs Sampling in Haskell
 How-to go parallel in R – basics + tips
2015/2/21—2015/7/31
 hierarchical models are not Bayesian models
 Hey, friend, have you grabbed a red envelope yet?
 xgboost: a fast and effective boosting model
 Machine Learning for Programming
 Deep stuff about deep learning?
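 “How to get started with Kaggle competitions the quick, rough, and fierce way”, aka a shortcut to quickly becoming a data scientist who earns a lot, works little, is chased by headhunters all day, and doubles their salary with every job change. Written for those who want to start entering Kaggle competitions but are afraid they will not know where to begin. Originally published at http://t.cn/RAqksWV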
 Randomized experimentation
2015/8/1—
 “Navigating Big Data Careers with a Statistics PhD.”
 Great article from Professor Radhika Nagpal (Harvard) on tenure-track life.
 Career advice for academics from Robert Sternberg (Cornell).
 Installing R on OS X + Installing R on OS X – “100% Homebrew Edition”