
Here is how Coffea arabica originated from a single super plant 10,000 years ago

Coffee trees in Bonga, Ethiopia

MILAN – Arabica coffee probably appeared in southwestern Ethiopia, where it now grows in mountain zones, in the understorey of primary or secondary forests. It arose from a cross between two coffee species: Coffea canephora and Coffea eugenioides. This event, called “allopolyploidization”, was discovered 20 years ago (1).

The new study, published in Scientific Reports (Nature), reveals that in this case allopolyploidization happened only once. “This means that a single plant, a super individual, gave rise to the entire species C. arabica and to the millions of trees now grown worldwide, throughout the intertropical zone”, says Benoît Bertrand, coordinator of the study and of coffee varietal improvement operations at CIRAD.

“This event happened between 10,000 and 20,000 years ago, whereas previous studies put it at 100,000 to 600,000 years ago.”

A combination of three conditions gave rise to the species C. arabica

The resulting super individual had very little chance of even existing, let alone reproducing, according to the researchers involved in the study.

“Three things were needed for this event to happen. Firstly, both parent species had to cohabit within a given zone”, Benoît Bertrand explains.

“Then the climate had to favour the thermal shocks that enabled the gametes of the two species to fuse. This was unlikely but not impossible. Lastly – and this is much rarer – the self-incompatibility systems of the two parent species had to be ‘turned off’ so that the newly formed plant could reproduce without backcrossing with either parent species, which would have destabilized it.”

This required the insertion of a retrotransposon* when the gametes fused (2).

C. arabica has limited genetic diversity

The study showed that, as a result, the genetic diversity of the species is very limited and is split between three sub-groups of varieties:

  • so-called Yemen-Harare varieties, which gave rise to the main varieties now grown worldwide, particularly in Latin America. These varieties were probably chosen by the Arabs in or around the 14th century for their capacity to adapt to full sunlight;
  • those grown in Ethiopia (group called ‘Jimma-Bonga’);
  • little-known varieties that originated in forests (Sheka population).

This structure explains why crossing the group of American varieties with “Ethiopian” forest varieties produces healthy (6), very vigorous F1 hybrids (7) suitable for agroforestry (8).

“This consolidates the coffee improvement strategy we launched over thirty years ago in Central America, using genetic resources introduced at CATIE by CIRAD, IRD and FAO”, says Benoît Bertrand.

A species that is both very strong and very weak

This genealogy, stemming from a single super individual, is a source of both great strength and great weakness for the species C. arabica. “If we subject it to heat stress, we observe that it adapts well to various thermal regimes by modulating the expression of genes from its two sub-genomes”, Benoît Bertrand observes (3, 4). It also responds very well to increases in atmospheric CO2 levels.

“Unfortunately, its low polymorphism, in other words its limited genetic diversity, makes it susceptible to epidemics.” To bolster the survival of this highly threatened species (5), Benoît Bertrand recommends breeding programmes based on introgressing resistance and tolerance genes from the 130 wild coffee species held by the biological resource centres (BRCs) of CIRAD and IRD in Réunion and French Guiana. “In the medium and long term, our strategy is to pursue the introgression** of traits from C. canephora into Arabica,” says World Coffee Research (WCR) molecular breeder Dr. Lucile Toniutti. “That work has begun.” However, Benoît Bertrand stresses that “generally speaking, the resources currently devoted to tackling the main challenges of the future are largely insufficient in view of the economic importance of coffee”.

This study was funded by two leading European roasters, Luigi Lavazza S.p.A. and Illycaffè S.p.A., and by World Coffee Research (WCR).

How did the researchers go about their study?

The DNA of the most widely cultivated arabica variety (Bourbon) was sequenced by the team headed by Michele Morgante at the Institute of Applied Genomics in Italy. The results were compared with sequences of the two parent species, Coffea eugenioides and Coffea canephora.

At the same time, 736 cultivated and wild arabica accessions were genotyped to estimate the species' genetic diversity, along with 35 canephora genotypes representative of the genetic diversity of that species. The analysis revealed very limited nucleotide diversity in arabica. The results were then validated using another recently sequenced arabica reference genome.
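Nucleotide diversity is commonly summarized as π: the average number of differences per site between every pair of sampled sequences. The study's own analysis pipeline is not described here, so the following is only a minimal, hypothetical sketch. It assumes aligned, equal-length sequences with no gaps or missing data, and the toy sequences are invented for illustration, not taken from the study.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average pairwise differences per site (pi) for aligned sequences.

    Hypothetical helper: assumes equal-length sequences with no gaps or
    missing data; real genotyping data would need extra handling.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every unordered pair of sequences
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    # Mean differences per pair, normalised by alignment length
    return total_diffs / len(pairs) / length


# Invented toy samples: a near-uniform set vs. a more variable set
low_diversity = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
high_diversity = ["ACGTACGTAC", "TGCAACGTAC", "ACGTTGCAAC"]
print(round(nucleotide_diversity(low_diversity), 4))   # 0.0667
print(round(nucleotide_diversity(high_diversity), 4))  # 0.5333
```

A species descended from a single founder, like C. arabica, is expected to look more like the first toy set than the second: most pairs of individuals differ at very few sites.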

* Retrotransposon: Retrotransposons are endogenous DNA sequences capable of moving and, more importantly, multiplying in the host genome, giving rise to dispersed repeated sequences.

** Introgression: Transfer of a gene from one species to the gene pool of another after hybridization followed by repeated backcrossing with one of the parent species.


A single event at the origin of the tetraploid genome of Coffea arabica is responsible for the extremely low genetic variation in wild and cultivated germplasm. Scientific Reports – Nature

Other Bibliographic References

(1) Lashermes P, Combes M-C, Robert J, Trouslot P, D’Hont A, Anthony F, Charrier A. 1999. Molecular characterisation and origin of the Coffea arabica L. genome. Mol Gen Genet MGG 261: 259–266.

(2) https://onlinelibrary.wiley.com/doi/full/10.1111/j.1365-313X.2011.04590.x

(3) Genomic expression dominance in the natural allopolyploid Coffea arabica is massively affected by growth temperature

(4) Contribution of subgenomes to the transcriptome and their intertwined regulation in the allopolyploid Coffea arabica grown at contrasted temperatures

(5) https://advances.sciencemag.org/content/5/1/eaav3473