We have recently started a journal club on RNA-Seq data analysis, which is a promising area to work on. Here I want to list some good papers for future reading:

  1. Julia Salzman, Hui Jiang and Wing Hung Wong (2011), Statistical Modeling of RNA-Seq Data. Statistical Science 2011, Vol. 26, No. 1, 62-83. doi: 10.1214/10-STS343. (We are done with this paper.)
  2. Turro E, Su S-Y, Goncalves A, Coin LJM, Richardson S and Lewin A (2011). Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads. Genome Biology. 12:R13. (RNA-seq produces sequence information that can be used for genotyping and phasing of haplotypes, thus permitting inferences to be made about the expression of each of the two parental haplotypes of a transcript in a diploid organism.)
  3. Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation, Jingyi Jessica Li, Ci-Ren Jiang, James B. Brown, Haiyan Huang, and Peter J. Bickel. (SLIDE is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation. Compared with deterministic isoform assembly algorithms (e.g., Cufflinks), SLIDE considers the stochastic aspects of RNA-Seq reads in exons from different isoforms and thus has increased power in detecting more novel isoforms.)
  4. Dalpiaz, D., He, X., and Ma, P. (2012) Bias correction in RNA-Seq short-read counts using penalized regression, Statistics in Biosciences, DOI: 10.1007/s12561-012-9057-6.
  5. M. Nicolae, S. Mangul, I.I. Mandoiu and A. Zelikovsky, Estimation of alternative splicing isoform frequencies from RNA-Seq data, Algorithms for Molecular Biology 6:9, 2011. (This paper presents a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data.)
  6. There is a special issue on DNA-Seq; see especially the paper: Statistical Issues in the Analysis of ChIP-Seq and RNA-Seq Data
  7. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks
  8. Sensitive Gene Fusion Detection Using Ambiguously Mapping RNA-Seq Read Pairs (Paired-end whole transcriptome sequencing provides evidence for fusion transcripts. However, due to the repetitiveness of the transcriptome, many reads have multiple high-quality mappings. Previous methods to find gene fusions either ignored these reads or required additional longer single reads. This can obscure up to 30% of fusions and unnecessarily discards much of the data. We present a method for using paired-end reads to find fusion transcripts without requiring unique mappings or additional single read sequencing.) Availability: A C++ and Python implementation of the method demonstrated in this paper is available at
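
As a note to self on item 5: the general expectation-maximization idea for splitting multi-mapping reads among isoforms can be sketched in a few lines. This is only a toy sketch of the textbook EM scheme, not the algorithm from the paper (which, as far as I understand, uses more information than this). The function name `em_isoform_abundance`, the read-compatibility input format, and the effective lengths are all my own assumptions for illustration, and every read is assumed to be compatible with at least one isoform.

```python
def em_isoform_abundance(read_compat, eff_lengths, n_iter=500, tol=1e-9):
    """Toy EM for isoform abundances (hypothetical helper, not from any paper).

    read_compat: one list per read, holding the indices of the isoforms
                 that read is compatible with (assumed non-empty).
    eff_lengths: effective length of each isoform.
    Returns (alpha, theta): alpha[j] is the estimated fraction of reads
    coming from isoform j, theta[j] the length-normalized relative abundance.
    """
    k = len(eff_lengths)
    n = len(read_compat)
    alpha = [1.0 / k] * k  # start from a uniform guess

    for _ in range(n_iter):
        expected = [0.0] * k
        # E-step: split each read among its compatible isoforms,
        # proportionally to alpha[j] / eff_lengths[j].
        for isoforms in read_compat:
            weights = [alpha[j] / eff_lengths[j] for j in isoforms]
            total = sum(weights)
            for j, w in zip(isoforms, weights):
                expected[j] += w / total
        # M-step: new read fractions are the expected read counts over n.
        new_alpha = [c / n for c in expected]
        delta = max(abs(a - b) for a, b in zip(alpha, new_alpha))
        alpha = new_alpha
        if delta < tol:
            break

    # Convert read fractions to length-normalized relative abundances.
    rel = [a / l for a, l in zip(alpha, eff_lengths)]
    z = sum(rel)
    theta = [r / z for r in rel]
    return alpha, theta


# Made-up example: 6 reads unique to isoform 0, 2 unique to isoform 1,
# and 4 reads that map to both. Both isoforms have effective length 100.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
alpha, theta = em_isoform_abundance(reads, [100.0, 100.0])
print(alpha, theta)
```

In this made-up example the ambiguous reads end up being shared in roughly the same 3:1 ratio as the unique reads, so the estimated read fractions settle near 0.75 and 0.25.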