The meaning of the term “biological replicate” is often not adequately addressed in publications. The term can have multiple meanings, depending on the context of the study. A general definition is that biological replicates are separate organisms of the same type that have been grown or treated under the same conditions. For example, in a cell-based study, different flasks containing the same type of cell (preferably the exact same lineage and passage number) that have been grown under the same conditions could be considered biological replicates of one another. The definition becomes trickier when dealing with higher-order organisms, especially humans. This could be an entire discussion in itself, but the important point is that humans have no well-defined lineage or passage number. Indeed, it is basically impossible to ensure that all of the samples for one treatment or control have been exposed to the same external factors. In this case, one must do all that is possible to describe and group these organisms accurately; thus, one should group according to traits such as gender, age, and other well-established cause-effect traits (smoking, heavy drinking, etc.).
It may also be helpful to outline the contrast between biological and technical replicates. Though people have varying definitions of technical replicates, perhaps the purest form of technical replicate is when the exact same sample (after all preparatory techniques) is analyzed multiple times. The point of such a technical replicate is to establish the variability (experimental error) of the analysis technique (mass spectrometry, LC, etc.), which allows one to set confidence limits for what counts as significant data. This is in contrast to the reasoning behind a biological replicate, which is to establish the biological variability that exists between organisms that should be identical. Knowing the inherent variability between “identical” organisms allows one to decide whether observed differences between groups of organisms exposed to different treatments are simply random or represent a “true” biological difference induced by the treatment.
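As a rough illustration of the two kinds of variability described above, the sketch below computes a coefficient of variation for a set of technical replicates and for a set of biological replicates. The intensity values and variable names are made up for illustration; they are not taken from any real experiment.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical intensities, invented purely for illustration.
technical = [100.0, 102.0, 98.0]    # one prepared sample, analyzed three times
biological = [100.0, 120.0, 80.0]   # three separate flasks, analyzed once each

cv_tech = coefficient_of_variation(technical)
cv_bio = coefficient_of_variation(biological)
print(f"technical CV:  {cv_tech:.3f}")
print(f"biological CV: {cv_bio:.3f}")
```

Note that the spread among biological replicates also contains the technical error of the measurement, so it is normally at least as large as the spread among technical replicates.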
Biological Factor: Single biological parameter controlled by the investigator. For example, genotype, diet, environmental stimulus, age, etc.
Treatment or Treatment Level: An exact value for a biological factor; for example, stress, no-stress, young, old, drug-treated, placebo, etc.
Condition: A single combination of treatments; for example, strain1/stressed/time10, young/drug-treated, etc.
Sample: An entity which has a single condition and is measured experimentally; for example, serum from a single mouse, a sample drawn from a pool of yeast, a sample of pancreatic beta cells pooled from 5 diabetic animals, or the third blood sample taken from a participant in a drug study.
Biological Measurement: A value measured on a collection of samples; for example, abundance of protein x, abundance of phospho-protein y, abundance of transcript z.
Experiment: A collection of biological measurements on two or more samples.
Replicate: Two sets of measurements, either within a single experiment or in two different experiments, where measurements are made on samples in the same condition.
Technical Replicates: Replicates that share the same sample; i.e. the measurements are repeated.
Biological Replicates: Replicates in which each replicate is measured on a different sample.
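The definitions above can be sketched as a small data model. This is only one possible encoding, not a standard one: the class name, the field names, and the example values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    sample_id: str   # which physical sample was measured
    condition: str   # one combination of treatments, e.g. "young/drug-treated"
    value: float     # e.g. abundance of protein x

def replicate_type(a: Measurement, b: Measurement) -> str:
    """Classify a pair of measurements using the definitions above."""
    if a.condition != b.condition:
        return "not replicates"   # replicates must share a condition
    if a.sample_id == b.sample_id:
        return "technical"        # same sample, measurement repeated
    return "biological"           # same condition, different samples

# Hypothetical measurements, invented for illustration.
m1 = Measurement("mouse1_serum", "young/drug-treated", 10.2)
m2 = Measurement("mouse1_serum", "young/drug-treated", 10.5)
m3 = Measurement("mouse2_serum", "young/drug-treated", 12.1)
m4 = Measurement("mouse3_serum", "old/drug-treated", 8.7)

print(replicate_type(m1, m2))
print(replicate_type(m1, m3))
print(replicate_type(m1, m4))
```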
Question: Technical/Biological Replicates in RNA-Seq For Two Cell Lines
I have a question about the meaning of “biological replicate” in the context of applying RNA-seq to compare two cell lines. Apologies if this is an overly naive question.
We have two human cell lines, one of which was derived from the other. Both have different phenotypes, and we want to use RNA-seq to explore the genetic underpinnings of the difference.
If we generate one cDNA library for each sample, and sequence each library on two lanes of an Illumina GA flowcell, I understand we will have “technical replicates”. In this scenario, we can expect very little difference between the two replicates in a sample. If we were to use something like DESeq to call differential expression, it would be inappropriate to treat our technical replicates as replicates in DESeq, since that would likely lead to a large list of DE calls that don’t reflect biological differences.
So, I’d like to know if it is possible within our model to have “biological replicates” with which we can use DESeq to call biologically meaningful differential expression.
So, two questions:
(1) If we grow up two sets of cells from each of our two cell lines, generate separate cDNA libraries (4 in total), and sequence them on separate lanes, would these be considered “biological replicates” in the sense that it would be appropriate to treat them as replicates within something like DESeq? I suspect not, since both replicates in a sample derive from a single cell line within a short period of time, which means they will be very similar anyway, almost as similar as in the technical replicate scenario. Perhaps we would need entirely separate cell lines for them to be considered biological replicates.
(2) In general, how would others address this – does it seem a better approach to go with separate cells and separate libraries, or would this entail extra effort for effectively no benefit?
Two “biological replicates” are two samples that should be identical (as much as you can/want to control) but are biologically separate (different cells, different organisms, different populations, colonies…).
You want to check the difference between cell line A and cell line B. Let’s start by assuming they are identical. Even if they are, because of random fluctuation, technical issues, intrinsically slightly different environments… you will never observe that all genes have exactly the same expression. You will find differences but can’t conclude whether they are inevitable fluctuations or the result of an actual difference.
So, you want to have 2 independent populations from A (A1 and A2) and 2 independent populations from B (B1 and B2), and then see how the variability WITHIN each line (A1 vs. A2, B1 vs. B2) compares to the difference BETWEEN A and B. The RNA levels from A1 and A2 WILL NOT be the same, because biological systems are far from being deterministic. They might be very similar, but they will differ.
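To make the within-versus-between comparison concrete, here is a minimal sketch for a single gene. The read counts are invented, they are assumed to be already normalized for library size, and the helper name log2_ratio is hypothetical. This is not how DESeq models the data; it only shows the kind of comparison that replicates make possible.

```python
from math import log2

def log2_ratio(x, y):
    """log2 of x/y, with a pseudocount of 1 to avoid division by zero."""
    return log2((x + 1) / (y + 1))

# Hypothetical normalized read counts for one gene (made up for illustration).
counts = {"A1": 100, "A2": 110, "B1": 400, "B2": 380}

# Spread between replicates of the same line.
within_a = abs(log2_ratio(counts["A1"], counts["A2"]))
within_b = abs(log2_ratio(counts["B1"], counts["B2"]))

# Difference between the two lines, using the mean of each pair.
mean_a = (counts["A1"] + counts["A2"]) / 2
mean_b = (counts["B1"] + counts["B2"]) / 2
between = abs(log2_ratio(mean_b, mean_a))

print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

If the between-line value is no larger than the within-line values, the observed difference cannot be distinguished from ordinary replicate-to-replicate fluctuation. With only two replicates per line this is a crude check, and a dedicated statistical tool is still needed for the actual calls.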
Because A and B would be on different plates (their environment), I would seed A1, A2, B1, and B2 on the same day on 4 distinct (but as similar as possible) dishes, grow them together under the same conditions to minimize external influence, and then collect from all 4 dishes at once, extract RNA…
Since the cost is in sequencing, not in growing cell lines, I would recommend doing 4 independent replicates for A and for B (or any other cell lines you may be interested in) in ONE GO, and then freezing the samples or the RNA. Even better, have somebody give you the lines labeled alfa, bravo, charlie, delta… (make sure they keep track of what they are in a safe place 😉 ) so that you are not biased while seeding, growing and manipulating the lines!
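The blinding step could be sketched like this: a helper (not the person doing the bench work) runs something like the code below and keeps the key. The function name blind_labels, the seed value, and the extra code names beyond the four mentioned above are assumptions made for illustration.

```python
import random

# Code names; the first four come from the text, the rest are an assumed extension.
CODE_NAMES = ["alfa", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]

def blind_labels(dishes, seed=None):
    """Return a dict mapping each true dish label to a randomly chosen code name."""
    if len(dishes) > len(CODE_NAMES):
        raise ValueError("not enough code names for the number of dishes")
    rng = random.Random(seed)
    codes = rng.sample(CODE_NAMES, len(dishes))  # sampling without repetition
    return dict(zip(dishes, codes))

# 4 replicates each for lines A and B: A1..A4, B1..B4.
dishes = [f"{line}{i}" for line in "AB" for i in range(1, 5)]
key = blind_labels(dishes, seed=2024)

# The helper stores `key` in a safe place; the experimenter sees only the code names.
print(sorted(key.values()))
```

The key is consulted only after seeding, growing, collection, and RNA extraction are finished.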