On heritability (3): Non-additive effects – dominance

Dominance is just a fancy-schmancy way of saying that the phenotypic effects of alleles at a single locus are not additive 💁. Ok, but what does that mean?

To illustrate, let’s go back to the example of differences in milk-yield among a bunch of cows 🐮🐮 (Fig. 1). As I mentioned in a previous post, some of these differences are due to genetic differences among the cows and some are due to environmental differences. For example, there could be (average) differences between the milk yield of cows with AA genotype (30 liters) vs cows with Aa genotype (20 liters) vs cows with aa genotype (10 liters). In this example, each additional A allele results in an (average) increase of 10 liters of milk (Fig. 1A). This is not always the case. For example, cows with the Aa genotype and cows with the AA genotype might have the same average milk yield: 30 liters (Fig. 1B). If this is the case, we say that the effects of alleles at this locus are non-additive or that there is dominance at this locus.


Fig. 1: A) Additivity is when each additional allele adds a fixed value to the phenotype. B) Things are not that symmetric when there’s dominance. In this example, the average phenotype is the same for the AA and Aa genotypes.

When you were learning genetics in school 🏫, you probably came across terms like “dominant” and “recessive”, which are often used to refer to Mendelian traits such as the presence/absence of cleft chin 🍑 in humans. These terms are ok for simple Mendelian traits but quite limiting for quantitative traits. Also limiting are terms like complete dominance, a special case of dominance in which the phenotypic distributions of AA and Aa genotypes are indistinguishable (Fig. 1B). There is a more flexible way to describe dominance: d.

The dominance deviation, d, describes the degree (and direction) in which genotypic values deviate from additivity.

🔴✋Sidebar: It’s useful to define genotypic value here as it is an important concept in quantitative genetics. The genotypic value is the average phenotype of a specific genotype. For example, the genotypic value of the Aa genotype in Fig. 1A is 20 liters. So even though some individuals carrying the Aa genotype have higher or lower milk yields because of environmental differences among them, they all yield, on average, 20 liters of milk. Thus, the genotypic value is supposed to represent the ‘true’ phenotypic effect of the genotype. It is often convenient to represent genotypic values as deviations from the mean of the population. For example, the genotypic value of the aa genotype in Fig. 1A can be written as 10 – 20 = -10 (instead of 10) if we assume the average milk yield in the population is 20 liters. This is mostly for convenience’s sake in calculations so don’t worry too much about this. Sidebar over. 👍

Now take a look at Fig. 2 and give it some time. 🤔

Fig. 2: Different values of d describe different cases of dominance. The scale represents the genotypic values, where +1 and -1 refer to the genotypic values of AA and aa, respectively, expressed as scaled deviations from the mean. The red line, which represents the scaled dominance deviation, shows the extent to which the genotypic value of the heterozygote Aa deviates from what we expect had the allelic effects been completely additive.

Can you see how the dominance deviation flexibly and quantitatively describes the different cases of dominance (including no dominance!)? The meanings of the different values of d are further explained below:

d = 0: no dominance. The allelic effects are completely additive and the average phenotype of Aa is smack dab in the middle between AA and aa.

d = 1: complete dominance. AA has the same genotypic value as Aa. Note that this is the same as the example of milk yield in Fig. 1B.

0 < d < 1: partial dominance. The genotypic value of Aa lies somewhere between the additive midpoint and the genotypic value of AA. As d gets closer to 1, the genotypic value of Aa gets closer to that of AA.

d > 1: overdominance. This is when the genotypic value of Aa is greater than that of either homozygote. So, for example, the average milk yield of cows carrying the Aa genotype might be greater than the average milk yield of cows carrying the AA or aa genotype. This is sometimes referred to as hybrid vigor.

d < -1: underdominance. The genotypic value of Aa is less than that of either homozygote.
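To make the scale in Fig. 2 concrete, here is a minimal Python sketch that computes the scaled d from three genotypic values and labels it using the cases listed above. The function names are made up for illustration, and the sketch assumes AA has the larger genotypic value. Only the first two examples (30/20/10 and 30/30/10) come from Fig. 1; the other milk yields are hypothetical.

```python
def scaled_dominance(g_AA, g_Aa, g_aa):
    """Scaled dominance deviation d: AA sits at +1, aa at -1, additivity at 0."""
    midpoint = (g_AA + g_aa) / 2    # where Aa would sit if effects were additive
    half_range = (g_AA - g_aa) / 2  # distance from the midpoint to a homozygote
    if half_range <= 0:
        raise ValueError("this sketch assumes g_AA > g_aa")
    return (g_Aa - midpoint) / half_range


def describe(d):
    """Map a value of d onto the cases listed in the post."""
    if d == 0:
        return "no dominance"
    if d == 1:
        return "complete dominance"
    if 0 < d < 1:
        return "partial dominance"
    if d > 1:
        return "overdominance"
    if d < -1:
        return "underdominance"
    # -1 <= d < 0: Aa sits closer to aa, a case the list above does not name.
    return "Aa is closer to aa"


examples = {
    "Fig. 1A": (30, 20, 10),
    "Fig. 1B": (30, 30, 10),
    "hypothetical partial": (30, 25, 10),
    "hypothetical over": (30, 35, 10),
    "hypothetical under": (30, 5, 10),
}
for label, (g_AA, g_Aa, g_aa) in examples.items():
    d = scaled_dominance(g_AA, g_Aa, g_aa)
    print(f"{label}: d = {d:+.2f} -> {describe(d)}")
```

Dividing by half the distance between the homozygotes is what puts AA at +1 and aa at -1, so the same labels apply whether the trait is measured in liters or anything else.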

🚨 There is one common misconception regarding dominance that I want to clear up: dominance does not tell us anything about how frequent the allele is in the population. For example, the A allele in Fig. 1B, even though it is ‘dominant’, might only be present at 10% frequency in the population.

Dominance is only one example of non-additive effects of alleles on the phenotype. So far I have only talked about the phenotypic effects of alleles at a single locus. What happens if we consider alleles at two or more loci? We’re faced with non-additive effects of alleles across loci. Non-additive effects of alleles at two or more loci are called epistasis and I plan to talk about this next time. See you in a couple of months…📆

What your genetic ancestry test can and cannot tell you about your genealogical ancestors

The topic of this post was not on my radar 📡 but I don’t live under a rock and I’ve been following the recent developments with Elizabeth Warren and her much publicized ancestry results. Political drama aside, there are concerns over what such results mean (and don’t mean) for one’s identity and genealogical history. While many geneticists and anthropologists have weighed in on the matter, their discussions are not very accessible for people who are not statistical geneticists. So I figured I would take a crack at explaining it in simpler terms.

Quick run down: Ms. Warren’s results suggest that she has a small amount of Native American ancestry. The analysis was conducted by a well-known geneticist who knows his sh*t. While I cannot find the % of Native American ancestry in his report, one can wager a very rough guess of 0.1% – 1.6% (that is, 1/2^10 to 1/2^6) based on the simple fact that the number of ancestors doubles in every previous generation: you have two parents, four grandparents, eight great grandparents, 16 great great grandparents and so on. Based on a more sophisticated algorithm, it is estimated that Elizabeth Warren had an ancestor who was 100% Native American 6-10 generations ago. Ten generations ago, you, she, and I had 2^10 (1,024) ancestors! Some of these passed on their genetic material to you and some did not (in fact many did not).
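The back-of-the-envelope range above is just powers of two. Here is a small Python sketch of that arithmetic; the function names are mine, and it assumes every slot in the family tree is filled by a different person (no shared ancestors), which is a simplification.

```python
def ancestors_at(generation):
    """Number of slots in the family tree this many generations back."""
    return 2 ** generation


def expected_share(generation):
    """Expected fraction of the genome traced to ONE ancestor that far back."""
    return 1 / ancestors_at(generation)


for g in range(6, 11):
    print(
        f"{g} generations back: {ancestors_at(g):>5} ancestors, "
        f"expected share from one of them = {100 * expected_share(g):.2f}%"
    )
```

Keep in mind that these are expectations only: the share actually inherited from any one ancestor can be higher, lower, or zero.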

Some questions surrounding Warren’s ancestry results are: Does her ancestry make her Native American? What if her ancestry results had shown that she didn’t have any Native American ancestry?

I want to try to answer these questions by illustrating how chromosomal segments are inherited over a small number of generations forward in time. Take a look at Fig. 1 which shows a pedigree on the left (the genealogical history of a family). On the right I’m showing a single possibility of how chromosomal segments in this family might be inherited.


Fig. 1: Difference between genealogical ancestry and genetic ancestry. A family tree (the genealogical ancestry) is shown on the left where the squares represent males and circles represent females. To the right is one possible way in which chromosomes can be inherited by family members. The numbers with % signs refer to the % of blue ancestry in each child.

In the first generation, two people of different ancestries (blue and yellow) get together and have two kids, Child 1 and Child 2. Each parent gets to pass only 50% of their genetic material (only counting autosomes for simplicity) to their offspring. At every independent location on a chromosome (shown as segments), only one of two alleles can be passed on by each parent (two because chromosomes come in pairs). Which allele gets passed on is determined by a random process (Mendelian segregation, with each segment treated as independent of the others). In Fig. 1, the blue mother can only pass blue alleles to her kids and the yellow father can only pass yellow alleles to his kids. As a result, both children in the 2nd generation will carry exactly 50% of the “blue” ancestry. So far so good?

Now Child 1 goes off and has two kids (Child 3 and Child 4) with another person of yellow ancestry. Child 1 can pass either a blue allele or a yellow allele (picked at random) at each segment to her children because she has both blue and yellow chromosomes. Because the “choice” of alleles in each segment is random, Child 3 inherits her mother’s yellow allele in the 1st segment, blue allele in the 2nd segment, blue again in the 3rd segment, yellow in the 4th, and blue in the 5th. Child 3 cannot inherit blue alleles from her dad, who is 100% yellow, giving her a total of 30% blue ancestry. Meanwhile, Child 4 happens to receive a slightly different configuration from her mother (blue, yellow, blue, yellow, yellow), adding up to a total of 20% blue ancestry. Even though we expect Child 3 and Child 4 to inherit 25% blue ancestry (it gets diluted by 1/2 in every generation – see Fig. 2), it doesn’t mean that’s going to happen. Because which allele the mother picks to give to her children is random, Child 3 and Child 4 can carry different % of blue ancestry. This difference between the expectation and what we actually observe can increase with every subsequent generation. For example, Child 5 and Child 6 have 20% and 0% blue ancestry in their genome, respectively, even though we expect both of them to have 12.5% blue ancestry.
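If you want to play with this yourself, here is a minimal Python sketch of the process just described. It is not the code behind the figures, only an illustration under simplifying assumptions: five independent segments per chromosome (as drawn in Fig. 1), a blue founder, and a 100% yellow partner in every generation. All names are hypothetical.

```python
import random


def transmit(parent, rng):
    """Pass on one allele per segment, picked at random from the parent's pair."""
    hap_a, hap_b = parent
    return [rng.choice(pair) for pair in zip(hap_a, hap_b)]


def blue_fraction(individual):
    """Fraction of an individual's alleles that are blue (coded as 1)."""
    hap_a, hap_b = individual
    return (sum(hap_a) + sum(hap_b)) / (2 * len(hap_a))


def simulate_lineage(n_segments, n_generations, rng):
    """Follow one line of descent in which every partner is 100% yellow."""
    yellow = ([0] * n_segments, [0] * n_segments)
    current = ([1] * n_segments, [1] * n_segments)  # the 100% blue founder
    fractions = []
    for _ in range(n_generations):
        current = (transmit(current, rng), transmit(yellow, rng))
        fractions.append(blue_fraction(current))
    return fractions


rng = random.Random(2018)  # fixed seed so the run is repeatable
lineages = [simulate_lineage(5, 4, rng) for _ in range(1000)]
for gen in range(4):
    values = [lineage[gen] for lineage in lineages]
    mean = sum(values) / len(values)
    print(
        f"generation {gen + 1}: mean = {100 * mean:.1f}%, "
        f"min = {100 * min(values):.0f}%, max = {100 * max(values):.0f}%"
    )
```

The first generation is always exactly 50% blue. After that, the average keeps halving while individual lineages spread out around it, and some lose the blue ancestry entirely.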


Fig. 2: The % of blue ancestry is expected to decrease every generation by 1/2 (red line and numbers on top of the red line). However, this is only the expectation. In other words, if you had many many families at every generation from the same pedigree, you would see that the blue ancestry would decrease, on average, by 1/2 in every generation. However, this % can be very different between two siblings as shown in Fig. 1. The dotted lines show the range of % blue ancestry that different siblings in each generation can have. It’s also clear that the blue ancestry can be lost in some siblings (lower dotted line touches the 0% mark). The code for simulation and plot is available here.

What is even more interesting is that eventually you could have someone without any blue ancestry at all (Child 6). This raises an important question: Does Child 6 have a blue ancestor? The answer is: yes!

So, if someone has Native American ancestry, they necessarily would have had to have an ancestor who, at some point in the past, was 100% Native American. However, just because someone does not have any Native American ancestry does not mean that they did not have a Native American ancestor (triple negative was unavoidable 🤕).

What does this mean for Elizabeth Warren? Did she have a Native American ancestor in the past? Yes, but that’s not saying a lot. There are many people in the world walking around without any Native American ancestry even though they had Native American ancestors at some point in the past, as the example in Fig. 1 illustrates. Does that make them Native American? Is she Native American? This is not a question genetics can answer as a person’s identity depends on their culture, geography, customs, and family history. Forcing genetics to answer such questions presents more questions than answers: How much Native American ancestry do you have to have to be called Native American? How far back would you have had to have a Native American ancestor? Answers to these questions will be driven by inherent biases that are influenced by our political and social inclinations. But they are certainly not scientific answers.

Further reading: If you want a more detailed understanding of the difference between genetic and genealogical ancestry, be sure to check out Graham Coop’s blog post on the subject.



—— UPDATE 10/18/2018: ——-

Technical clarification on Fig. 2: The red line (mean) and dashed lines (range = max – min) are based on a simulation of 1,000 independent loci and 1,000 pedigrees. The simulations are meant to illustrate the concept that the observed % ancestry can deviate greatly from expectation. In reality, there are many more loci in the human genome, and the patterns of inheritance are much more complex because of linkage and recombination. Because of these reasons, it is unlikely, as someone pointed out in the comments 👇, that the ancestry is going to be lost in two generations as Fig. 2 currently shows.

On heritability (2): narrow and broad sense heritability

Quick recap: I mentioned last time that heritability = Vg / Vp, where Vg is the genetic variance among individuals and Vp is the total phenotypic variance among individuals. Furthermore, Vp = Vg + Ve, where Ve is the environmental variance contributing to phenotypic differences.

There are two different types of heritability: broad-sense heritability and narrow-sense heritability. What we’ve been discussing so far is broad-sense heritability: the proportion of phenotypic variance that can be explained by ALL genetic differences among individuals. Geneticists often like to split Vg, the genetic variance, into two: additive genetic variance and non-additive genetic variance. There is a reason for this split (besides a masochistic need to make life harder for everyone 🤦🏽‍♂️), which I will talk about later but first let’s talk about what these things are.

Additive genetic variance first since that’s easier. Suppose there is a gene called A (very creative, I know 😛) involved in milk production in cows 🐮. Further suppose that a cow can have the genotype AA, Aa, or aa at gene A because cows are diploid. Cows with genotype aa produce, on average, 10 liters of milk every day. I don’t know if that’s a normal amount or not so don’t judge me if you’re a farmer 👨🏻‍🌾 and you’ve chanced upon this blog. Cows with a single A allele in their genotype (i.e. genotype Aa or aA) produce, on average, 20 liters of milk and cows with genotype AA produce 30 (Fig. 1).


Figure 1: Cows with different genotypes can have different means for milk yield. This variation is due to a combination of genetic and environmental differences. However, among cows with the same genotype, differences in milk yield are purely due to environmental differences.

There are a couple of interesting things happening here (if we play fast and loose with the word ‘interesting’ 😅):

  1. There are differences in milk yield among cows of the same genotype even though there are no genetic differences among them (Vg = 0) (Fig. 1). Remember from the last post that these differences arise solely because of environmental differences. This is why I keep saying things like “cows with a single A allele produce, on average, 20 liters of milk” as there are some that might produce 22 and some that might produce 18, because of small environmental differences. Note also that this means that heritability among cows of the same genotype is 0. Ok, I think I’ve hammered 🔨 this point hard enough.
  2. The differences among groups of cows with different genotypes (AA vs Aa vs aa) arise not only due to environmental differences, but also due to genetic differences (i.e. Vg > 0). Because Vg > 0, Heritability > 0 and we can say that milk yield is heritable in this specific herd of cows (Fig. 1).
  3. It seems that each additional A allele that a cow carries, on average, increases the milk production by 10 liters (aa = 10, Aa = 20, AA = 30, Fig. 1). There is no dominant or recessive allele, despite my poor use of the letters. We say that the effect of A (and a if we look at it as decreasing milk yield) is additive. If one of the alleles were dominant/recessive, we would say the allelic effects are non-additive because we couldn’t just figure out the average milk yield by adding up the number of A alleles. (Side exercise: if A is the dominant allele and a is recessive, what would the average milk yields be for the three genotypes?).

The last point can be extended from one gene/locus to two (or more) genes/loci. Different alleles at the two loci might contribute additively to milk yield (Fig. 2). This additivity has everything to do with the difference between narrow-sense and broad-sense heritability. If broad-sense heritability is the proportion of phenotypic variance that is due to ALL genetic effects (additive + non-additive), narrow-sense heritability is the proportion of phenotypic variation only due to additive genetic effects. What are non-additive effects? Why are they important? Why does this distinction matter? What is the meaning of life 🌄🤔? There’s a whole post about this (ok, not the last question), so stay tuned 😎🍿.
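For the numerically inclined, here is a rough Python sketch of how the split plays out at a single locus. It relies on assumptions that are not part of the example above: random mating, an A allele frequency of 0.5, and an environmental variance of 25. The additive part is taken as the variance explained by a straight-line fit of genotypic value on the number of A alleles, and whatever genetic variance is left over is counted as non-additive. The function name is hypothetical.

```python
def variance_components(g_AA, g_Aa, g_aa, p):
    """Split single-locus genetic variance into additive and non-additive parts.

    Assumes random mating, so genotype frequencies are p^2, 2pq and q^2.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    q = 1 - p
    freqs = [p * p, 2 * p * q, q * q]
    counts = [2, 1, 0]  # number of A alleles in AA, Aa, aa
    values = [g_AA, g_Aa, g_aa]

    mean_x = sum(f * x for f, x in zip(freqs, counts))
    mean_g = sum(f * g for f, g in zip(freqs, values))
    var_x = sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, counts))
    cov_xg = sum(
        f * (x - mean_x) * (g - mean_g)
        for f, x, g in zip(freqs, counts, values)
    )
    v_g = sum(f * (g - mean_g) ** 2 for f, g in zip(freqs, values))

    v_a = cov_xg ** 2 / var_x  # variance captured by the straight-line fit
    v_na = v_g - v_a           # what the straight line cannot capture
    return v_a, v_na


V_E = 25.0  # assumed environmental variance, purely for illustration
for label, genotypic_values in [("additive", (30, 20, 10)),
                                ("A dominant", (30, 30, 10))]:
    v_a, v_na = variance_components(*genotypic_values, p=0.5)
    v_p = v_a + v_na + V_E
    print(
        f"{label}: broad-sense = {(v_a + v_na) / v_p:.2f}, "
        f"narrow-sense = {v_a / v_p:.2f}"
    )
```

When the effects are purely additive, the two heritabilities coincide. Once one allele is dominant, some of the genetic variance is non-additive and the narrow-sense value drops below the broad-sense value.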


Figure 2: The effect of alleles on milk yield (numbers) in this example is additive. Doesn’t matter if we look at each gene separately, or combine effects across them.

Primary source: Lynch M, Walsh B. 1998. Genetics and analysis of quantitative traits. Sunderland, MA: Sinauer Associates, Inc.

On heritability (1): How ‘genetic’ is a trait?

One of the reasons I’ve come to write about this is because recently I’ve been asked a lot about “how genetic is this trait or that trait?”. And I find myself responding with another question (yeah I’m that guy 👨🏽‍🏫, get over it): “What do you mean by genetic?”. I ask this because I think there is often a confusion between the term ‘genetic’ and ‘heritable’, and that distinction is extremely important to understand.

Maybe you’ve heard this before: Every trait is genetic. There are genes involved in the development of every trait. Nothing pops out of the environment (unless it’s a spray tan or something?🤔). Is height genetic? Yes. Is hair color genetic? Yes. If you go work out a lot🏋🏽‍♂️, are your bulging biceps genetic? Yes, because there are genes that respond to the body’s need by triggering a cascade of biochemical processes that result in muscle production. The question, “How genetic?”, is not a legit question. What one really means to ask is “How heritable?“.

Cutting to the chase and then I’ll explain: Heritability is the proportion of variation in a trait (Vp) explained by genetic differences among individuals (Vg).

Heritability = Vg/Vp

Vp = Vg + Ve. This is the classic equation which, in English, reads: Variation in the phenotype/trait can be explained by the sum of the variation in the genotype (Vg) and variation in the environment (Ve), where environment is everything that is not ‘genetic’. It becomes clear from the definition of heritability that it cannot be defined for a trait in a single individual. That’s not a thing. It can only be defined for differences between two or more individuals.

Let’s expand on this. Imagine there are 10 cows 🐮☘️ grazing on a pasture. They’re all clones of each other, i.e., they are all exactly the same genetically. If you milk all of them, the milk yield 🥛 will vary among them (Figure 1). How much it varies (the variance) is Vp. Is the milk yield of a cow genetic? There are genes involved in milk production, so yeah. Is the milk yield among the cows heritable? Nope. Think about it. Since the cows are clones of one another, there are no genetic differences among them (i.e. Vg = 0 👌🏽). So any differences in milk yield have to be because of the environment (Vp = Ve). Maybe they were grazing on different regions of the pasture or maybe some of them were being tipped more than others 🙌🐄 (it’s a real thing). The heritability of milk yield among the cows is 0.


Figure 1: Variation in milk yield in a hypothetical herd of cloned cows. Since there is no genetic variation among cows, heritability of milk production in THIS set of cows is zero.

There are some other things about the concept of heritability that make it tricky. One thing that I’ll mention now and leave the rest for some other time is that heritability is not a fundamental property of the trait itself. It is context specific and is a property of the sample you are looking at. If I tried to measure the heritability of milk yield in another set of 10 cows grazing in the same pasture, who are genetically different from one another, the heritability of milk yield will likely not be zero since Vg > 0.
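Here is a tiny Python sketch of the two herds, with made-up numbers. The genetic values and environmental nudges are hypothetical, and the sketch assumes that genes and environment vary independently so that Vp = Vg + Ve.

```python
from statistics import pvariance


def heritability(genetic_values, env_deviations):
    """Vg / Vp, with Vp = Vg + Ve (assumes no gene-environment covariance)."""
    v_g = pvariance(genetic_values)
    v_e = pvariance(env_deviations)
    return v_g / (v_g + v_e)


# Hypothetical environmental nudges (in liters), identical for both herds.
environment = [-2, 1, 0, 2, -1, 3, -3, 1, -1, 0]

# Herd 1: ten clones, so every cow has the same genetic value.
clones = [20] * 10

# Herd 2: ten genetically different cows on the same pasture.
mixed = [10, 10, 20, 20, 20, 20, 20, 30, 30, 30]

print(f"clones: heritability = {heritability(clones, environment):.2f}")
print(f"mixed herd: heritability = {heritability(mixed, environment):.2f}")
```

It is the same trait on the same pasture, yet the answer differs, because heritability describes the set of cows you measured and not milk yield itself.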

So when most people ask “how genetic is skin pigmentation” or “how genetic is intelligence” (that’s a whole can of worms that I hope to address at some point), they’re really asking “how heritable are they?”.


Introgression – what’s that all about?

Introgression is lit 🔥 these days. While the idea has been around forever, there has been a lot of interest in studying introgression in the last few years. Recently a journal-club discussion on an article (specifically this one 👉 Kuhlwilm et al. (2016)) prompted me to write about this. Introgression is nothing but the introduction of a new variant/allele into a population from a different gene pool. This can occur through hybridization between genetically diverged populations of the same species or between closely related species.

Hold on a second now✋. If they’re two different species, how can they mix with each other? Aren’t different species by definition unmixable (well, technically: two organisms are from the same species if they can interbreed AND produce fertile offspring)? Ignoring all other complications around the term species (asexually reproducing organisms, I’m looking at you), speciation does not happen overnight. It’s not like one day a part of the population rebels and claims to be a different species (even India and Pakistan didn’t happen that quickly 👀). Truth is, we don’t reeeally understand how speciation happens (more on this and introgression in another post). Even in the classic, well-understood case of allopatric speciation (speciation due to geographic isolation), gene flow, though it slows down over time, continues to occur as the populations separate, making speciation rather fuzzy (quite literally, see Fig. 1). It may even occur some time after the populations have diverged quite a bit (something referred to as secondary contact). It is these two cases that people who study introgression are often talking about.


Fig. 1: The species tree (blue line) is a measure of central tendency that captures the overall pattern of gene trees (fuzzy grey lines). Grey lines which span different species of bears (such as that between the Asian black bear and Sloth bear) are suggestive of gene flow from one species into another. – from Kutschera et al. (2014)

So why is introgression such a rage now? The idea has been around for a while, initially used in experimental breeding, where a trait, such as disease resistance 🌿 🐛 or cute fluffiness🐕 🐾 would be introduced into a breeding line. For a long time, it has also been discussed by evolutionary biologists who study naturally occurring plants and animals. One major reason it has become so popular in recent years is because the technology (cost effective high-throughput sequencing) and methodology (population genetic theory) have enabled us to harness genome-wide information from multiple individuals of many closely related species, making it possible to study complex evolutionary histories. That, and this paper 👉  Green et al. (2010). In 2010, a team of researchers sequenced the Neanderthal (or is it Neandertal?) genome and found that 1-4% of the genomes of non-Africans are made up of Neanderthal sequences. This could only mean one thing: some hanky panky went down between Neanderthals and non-Africans as they came out of Africa. Mind.blown. 😲  + 💥 = 🤕 . Wait, what? Yeah, the subset of human ancestors who left Africa made contact with Neanderthals and had a good ol’ time before Neanderthals went extinct.

🤔 At this point maybe you’re thinking: “Huh? How did that happen? Weren’t Neanderthals, or Homo neanderthalensis, a different species from Homo sapiens? It’s right there!” Well, aside from the fact that the naming convention is man-made 🙄, speciation is complicated, and not as clear-cut as the Biological Species Concept makes it out to be. Gene flow between genetically diverged populations, while it may be evolutionarily prohibitive due to pre- and post-zygotic barriers, is not only possible, but much more common than we thought. C’mon it’s not that crazy. After all, we’re well aware of the existence of hybrids such as mules and ligers.


Fig. 2: Map showing regions of contact among humans and archaic hominins

Following the Neanderthal finding, researchers (a lot of the same people who were involved in the Neanderthal sequencing) also sequenced another extinct hominin, which they named Denisovans owing to the fact that it was discovered in Denisova Cave in Siberia (Reich et al. (2010)). They found that ~5% of the genomes of Melanesians derive their ancestry from Denisovans, suggesting interbreeding between Denisovans and ancestors of Melanesians (Fig. 2). In genetics-research time, this was a loooong time ago (the discovery, not the admixture). Since then, s***t has hit the fan for human introgression, with many studies showing admixture among human ancestors and extinct hominins (Fig. 3), some of whom we haven’t even sequenced yet! (Hammer et al. (2011), Skoglund and Jakobsson (2011), Prüfer et al. (2013), Vernot and Akey (2014), Sankararaman et al. (2014), Lu et al. (2016), Kuhlwilm et al. (2016), Sankararaman et al. (2016) to cite a FEW). For geneticists and evolutionary biologists 🤘, this is an exciting time not only for sketching out human evolutionary history, but also for understanding speciation on a fundamental level and how we define ‘species’ in the first place. Stay tuned for a post on this last bit: “Is geographic isolation important for speciation?” – or something cooler along those lines.


Fig. 3. Possible introgression events among humans and archaic hominins. The blue arrow shows introgression from an unknown archaic hominin species – Kuhlwilm et al. (2016)

Next up: The stage is set – How do we detect introgression? I’ll talk about how we can draw inferences from even one or two individuals from a species (not-a-real-hint: not your everyday N= 1 problem). Also – incomplete lineage sorting (ILS) explained. I literally can’t wait…