
PAN-GENOME OF RAPHANUS HIGHLIGHTS GENETIC VARIATION AND INTROGRESSION AMONG DOMESTICATED, WILD, AND WEEDY RADISHES


<span class="text_page_counter">Trang 1</span><div class="page_container" data-page="1">

Pan-genome of Raphanus highlights genetic variation and introgression among domesticated, wild, and weedy radishes

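Xiaohui Zhang<sup>1</sup>, Tongjin Liu<sup>1,2</sup>, Jinglei Wang<sup>1,3</sup>, Peng Wang<sup>1</sup>, Yang Qiu<sup>1</sup>, Wei Zhao<sup>1</sup>, Shuai Pang<sup>4</sup>, Xiaoman Li<sup>1</sup>, Haiping Wang<sup>1</sup>, Jiangping Song<sup>1</sup>, Wenlin Zhang<sup>4</sup>, Wenlong Yang<sup>1</sup>, Yuyan Sun<sup>1,3</sup> and Xixiang Li<sup>1,</sup>*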

Post-polyploid diploidization associated with descending dysploidy and interspecific introgression drives plant genome evolution by unclear mechanisms. Raphanus is an economically and ecologically important Brassiceae genus and model system for studying post-polyploidization genome evolution and introgression. Here, we report the de novo sequence assemblies for 11 genomes covering most of the typical sub-species and varieties of domesticated, wild and weedy radishes from East Asia, South Asia, Europe, and America. Divergence among the species, sub-species, and South/East Asian types coincided with Quaternary glaciations. A genus-level pan-genome was constructed with family-based, locus-based, and graph-based methods, and whole-genome comparisons revealed genetic variations ranging from single-nucleotide polymorphisms (SNPs) to inversions and translocations of whole ancestral karyotype (AK) blocks. Extensive gene flow occurred between wild, weedy, and domesticated radishes. High frequencies of genome reshuffling, biased retention, and large-fragment translocation have shaped the genomic diversity. Most variety-specific gene-rich blocks showed large structural variations. Extensive translocation and tandem duplication of dispensable genes were revealed in two large rearrangement-rich islands. Disease resistance genes mostly resided on specific and dispensable loci. Variations causing the loss of function of enzymes modulating gibberellin deactivation were identified and could play an important role in phenotype divergence and adaptive evolution. This study provides new insights into the genomic evolution underlying post-polyploid diploidization and lays the foundation for genetic improvement of radish crops, biological control of weeds, and protection of wild species’ germplasms.

<i>Key words: Raphanus, pan-genomes, post-polyploid diploidization, gene flow, speciation</i>

Zhang X., Liu T., Wang J., Wang P., Qiu Y., Zhao W., Pang S., Li X., Wang H., Song J., Zhang W., Yang W., Sun Y., and Li X. (2021). Pan-genome of <i>Raphanus</i> highlights genetic variation and introgression among domesticated, wild, and weedy radishes. Mol. Plant. 14, 2032–2055.

Cruciferae is an economically and ecologically important family of plants containing 321 genera and 3700 species, including many important vegetable, oilseed, condiment, and industrial crops (Al-Shehbaz, 2012; Huang et al., 2016). Because members of the family are cross-pollinating and susceptible to interspecific hybridization, the genealogical relationships are indistinct at the genus level (Al-Shehbaz et al., 2006). As important cruciferous vegetable, oilseed, and forage crops worldwide, <i>Raphanus</i> species are well defined without ambiguous members and are clearly separated from genetically close and morphologically similar

<small>Published by the Molecular Plant Shanghai Editorial Office in association with Cell Press, an imprint of Elsevier Inc., on behalf of CSPB and CEMPS, CAS.</small>

2032 Molecular Plant 14, 2032–2055, December 6 2021 © The Author 2021.

Research Article

</div><span class="text_page_counter">Trang 2</span><div class="page_container" data-page="2">

genera such as <i>Brassica</i>, benefitting from their distinctive fruit structure. Both <i>Brassica</i> and <i>Raphanus</i> have experienced similar ancestral whole-genome triplication processes (Cheng et al., 2017). Compared with <i>Brassica</i>, which contains more than 40 species and evolved diversified karyotypes (Al-Shehbaz et al., 2006), <i>Raphanus</i> contains only two species and several sub-species with the same karyotype of nine chromosomes, making the genus more amenable to genomic comparison and a good model for studying genome evolution in relation to post-polyploid diploidization, a mechanism widely noted in plants (Mandakova and Lysak, 2018). The wild radish <i>Raphanus raphanistrum</i> contains two well-recognized sub-species (ssp.), <i>landra</i> and <i>raphanistrum</i>, which are native to the Mediterranean area, and the second of which is distributed worldwide (Sahli et al., 2008). They are becoming seriously invasive weeds, not only competing with but also genetically polluting crops by interspecific hybridization and gene flow (Chevre et al., 2007; Charbonneau et al., 2018). The descendants of wild and cultivated radish hybrids have replaced their original parental populations and have become regionally important weeds in California, USA (Hegde et al., 2006; Heredia and Ellstrand, 2014). Conversely, wild radishes have successfully colonized a variety of habitats, including infertile, saline, and dry lands, and have evolved a rich pool of resistance genes (Kebaso et al., 2020). As environmental changes accelerate, wild relatives become important germplasm resources for crop improvement to overcome new biotic and abiotic stresses. <i>Raphanus sativus</i> contains several domesticated varieties, such as cherry belle radish (<i>R. sativus</i> var. <i>radicula</i>), black radish (<i>R. sativus</i> var. <i>niger</i>), rat’s tail radish (<i>R. sativus</i> var. <i>caudatus</i>), oilseed radish (<i>R. sativus</i> var. <i>oleiformis</i>), and the most widely planted cultivar of radish, <i>R. sativus</i> var. <i>longipinnatus</i>. The cherry belle radish and black radish originated in Europe, where wild radishes still grow in nature. <i>R. sativus</i> var. <i>longipinnatus</i> radishes have diversified in size, shape, color, and quality, and are widely grown in Asia. Rat’s tail radish grows mainly on the South Asian subcontinent. Recent studies have shown that these cultivated radish varieties underwent several independent domestication processes (Yamagishi and Terachi, 2003; Kobayashi et al., 2020). All of these species and varieties can cross with each other, which makes <i>Raphanus</i> an ideal model for investigating gene flow and co-evolution among crops, weeds, and their wild relatives (Campbell et al., 2009).

One wild radish and three accessions of <i>longipinnatus</i> group radishes were sequenced with the next-generation sequencing platform, which provided useful resources for gene mapping and cloning as well as other genetic analyses (Kitashiba et al., 2014; Moghe et al., 2014; Mitsui et al., 2015; Zhang et al., 2015; Jeong et al., 2016). However, only 50%–70% of the genomes were assembled based on short reads, and their completeness and integrity could not satisfy the requirements for identifying larger structural variations (SVs), such as presence/absence variations (PAVs), copy number variations (CNVs), inversions, and translocations, which contribute greatly to genetic diversity and play key roles in the regulation of important agronomic traits (Hurgobin et al., 2018; Wang et al., 2018). Recently, a giant root radish cultivar, Sakurajima Daikon, was sequenced with single-molecule real-time technology, which improved the contig length, and 69.3% of the assembled contigs were mapped to nine pseudomolecules (Shirasawa et al., 2020). This genome cannot represent the genetic variations among the different species, sub-species, and varieties of <i>Raphanus</i> from the perspective of pan-genomics. The pan-genome was proposed to represent an entire gene repertoire including core genes and dispensable genes of a species or even a larger taxon (Tao et al., 2019; Bayer et al., 2020; Della Coletta et al., 2021). Recently, rapidly increasing numbers of pan-genomic studies have been conducted on crops and vegetables, including soybeans (Liu et al., 2020), rice (Zhao et al., 2018), maize (Hirsch et al., 2014), wheat (Montenegro et al., 2017), rapeseed (Song et al., 2020a), tomatoes (Gao et al., 2019), and cabbage (Golicz et al., 2016; Bayer et al., 2019), which have shed new light on genomic evolution and plant improvement. Pan-genomic studies with multiple high-quality reference genomes representing different species, sub-species, and varieties are necessary for a better understanding of the genetic basis of evolution, domestication, phenotypic diversity, and agronomic trait determination in <i>Raphanus</i>.

In this study, we assembled 11 high-quality genomes covering widely sourced typical sub-species and varieties of domesticated, wild, and weedy radishes. Using these assemblies, a genus-scope pan-genome was constructed by family-, locus-, and graph-based methods. Post-polyploidization genomic evolution, introgression, and genetic variations for important agronomic traits were elucidated in <i>Raphanus</i> by genome-wide comparative analysis.

Genome assembly and annotation

To elucidate the evolution and pan-genome architecture of <i>Raphanus</i>, <i>de novo</i> genome sequencing was performed for 11 radish accessions, including seven cultivated radish varieties, two wild radish sub-species, one semi-wild radish, and one inter-species hybrid-derived weedy radish, using a combination of PacBio (N50 = 22.5–35 kb), Illumina, BioNano Direct Label and Stain (DLS), and high-throughput chromosome conformation capture (Hi-C) technologies (Figure 1A–1C and Supplemental Figures 1–4; Table 1 and Supplemental Tables 1–7; Supplemental Notes). For each genome, the chromosomes covered 79.89%–99.99% of the estimated genome size, with 418.24–514.48 Mb (92.61%–99.98%) of N-free nucleotides and contig N50 values reaching 1.89–18.72 Mb (Supplemental Tables 8 and 9), which are significantly better than the corresponding values of previously published radish genome assemblies (Kitashiba et al., 2014; Moghe et al., 2014; Mitsui et al., 2015; Zhang et al., 2015; Jeong et al., 2016; Shirasawa et al., 2020). BUSCO assessment showed that 91.9%–95.8% of the universal single-copy orthologs were fully covered by the genome (Supplemental Table 9). The Circos and dotPlotly maps showed good co-linearity of chromosomes between the present assemblies and previously released genomes, indicating the accuracy of the contig orientation (Supplemental Figures 5 and 6). Benefitting from BioNano DLS technology, the Xin-li-mei chromosomes included a total of 58.99 Mb of centromere regions (although containing gaps) where large sections have not previously been anchored (Supplemental Table 10). Except for the upper end of Chr. 3 and the bottom end of Chr. 7, all of the telomeres of the nine chromosomes were successfully

</div><span class="text_page_counter">Trang 3</span><div class="page_container" data-page="3">

assembled, further indicating the good coverage of the genome assembly (Figure 1C and Supplemental Table 11).
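The contig N50 values quoted above follow the usual definition: the length L such that contigs of length at least L together hold at least half of the assembled bases. A minimal sketch of that calculation is given here for orientation; the function name and the toy contig lengths are hypothetical and are not taken from the assemblies described in the text.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Hypothetical contig lengths in Mb (illustrative only, not from the paper).
toy_contigs = [18.0, 12.0, 9.0, 6.0, 3.0, 2.0]
print(n50(toy_contigs))
```

Because the statistic depends only on the sorted length distribution, a few very long contigs can raise N50 considerably, which is why it is usually reported alongside completeness measures such as BUSCO.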

Repeats accounted for a total of 194.56–268.81 Mb (46.51%–53.94% of the genome assembly) of each genome (Table 1 and Supplemental Table 12; Supplemental Notes). The percentages were higher than or comparable with those in previous reports (Jeong et al., 2016; Kitashiba et al., 2014; Mitsui et al., 2015; Shirasawa et al., 2020; Zhang et al., 2015). Copia retrotransposons, Gypsy retrotransposons, and DNA transposons accounted for 13.57%–23.51%, 9.67%–13.02%, and 5.31%–6.74% of each genome, respectively (Supplemental Table 12). The long terminal repeat (LTR) assembly index (LAI) was 18.41–31.06 in each of the 11 genomes (Table 1), indicating the high quality of the genome assembly (Ou et al., 2018). Approximately one-third of the full-length Copia and Gypsy retrotransposons were newly inserted; the most recent insertion peak (7% of the total) occurred 60–80 thousand years ago (kya) (Supplemental Figure 7), indicating that retrotransposons were recently quite active in radish genomes. We then identified 1259–2905 tRNAs, 1463–1645 snRNAs, 1899–6921 rRNAs, and 236–275 miRNAs (Supplemental Table 13), which were similar to or slightly greater than the numbers in the previous report (Zhang et al., 2015).

A comprehensive strategy combining protein-homology-based prediction, RNA sequencing (RNA-seq)-based prediction, and ab initio prediction was used to annotate the protein-coding genes (Supplemental Notes; Supplemental Tables 14 and 15). For each of the 11 radishes, a total of 42 319–52 190 protein-coding genes were predicted (Table 1 and Supplemental Table 16), with 63.53%–74.57% supported by RNA-seq data (Supplemental Table 17). The mean lengths of transcripts and coding sequences (CDSs) were 1401–1464 and 1243–1305 bp in RS01–RS10, respectively, which were comparable with those in the <i>Arabidopsis</i> genome and longer than those reported in cabbage, Chinese cabbage, and formerly released radish genomes (Supplemental Figure 8). Benefitting from the full-length transcripts generated by Iso-seq, the mean transcript and CDS lengths were 1833 and 1375 bp in Xin-li-mei, a further extension of 373–432 and 70–132 bp, respectively, relative to the other 10 radishes, indicating that Iso-seq transcripts are powerful for improving protein-coding gene annotation. Specifically, the transcripts extended primarily into the 5′ and 3′ untranslated regions, and the coding length was improved to some extent

Figure 1. Geographic origin, genome features and phylogenetic tree of 11 radish accessions.

(A) Geographical origins of the accessions. RS00, RS02, and RS04 originated in Beijing, Yunnan, and Jiangsu in China, respectively; RS01 originated in India; and RS03 originated in Europe. However, the details are unclear regarding which accessions were introduced to Japan, where we obtained this accession. RS05 and RS07 originated in Japan, and RS06, RS08, RS09, and RS10 originated in Russia, Italy, Slovakia, and the US, respectively.

(B) Genome-wide contact matrix of the Xin-li-mei genome. The color intensity indicates the frequency of contact between two 100-kb loci.

(C) Circos plot showing the basic features of the Xin-li-mei genome.

(D) Phylogeny of the genus <i>Raphanus</i>. Divergence times were estimated by the K<sub>s</sub> values of 13 119 syntenic orthologous genes. EQG, early stage of the Quaternary glaciations (beginning 2.4 mya); DG, Donau glaciations (1.5–1.3 mya); NG, Naynayxungla glaciations (0.5–0.78 mya).


</div><span class="text_page_counter">Trang 4</span><div class="page_container" data-page="4">

(Supplemental Table 16 and Supplemental Figure 8). In total, 95.30%–96.84% of the genes were functionally annotated by at least one of the NR, eggNOG, gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Swiss-Prot databases (Supplemental Table 18).

Genomic variations

The 10 genome assemblies were aligned to the Xin-li-mei genome to detect variations among the 11 genomes. Between each pair of genomes, 2.5–4.53 × 10<sup>6</sup> single-nucleotide polymorphisms (SNPs) and 0.55–1.04 × 10<sup>6</sup> small (1–100 bp) insertions and deletions (indels) were identified, with an average of 5.6–10.1 SNPs and 1.2–2.3 indels per kb. Alignment of the Illumina reads with each of the genomes revealed a density of 0.18–7.76 × 10<sup>−3</sup> single-nucleotide variations and 1.1–3.6 × 10<sup>−2</sup> indel-like homozygous mismatches per kb in each genome (Supplemental Table 25), which is three or two orders of magnitude lower than the density of the SNPs and indels between genomes, indicating the high quality of the genome assembly and the reliability of the SNP and indel calling.

A total of 20 290–29 643 medium-length (100–1000 bp) SVs, including 7186–10 400 deletions (15.42–24.63/Mb), 9502–12 568 insertions (20.52–30.01/Mb), and 3602–6868 duplications (7.80–16.29/Mb), as well as 4983–14 579 long (>1 kb) PAVs (totalling 12.71–44.06 Mb), were detected (Supplemental Tables 26–29). The density of all types of variations increased with increasing genetic distance, indicating the high accuracy of the variation calling (Supplemental Figure 9). Totals of 12.12%–23.47%, 12.14%–28.47%, 7.08%–11.04%, and 2.1%–5.95% of the genes were potentially strongly affected by SNPs, indels, SVs, and PAVs in each genome, respectively, due to variations causing start/stop codon gain or loss, frameshifts, splice donor/acceptor variants, exon loss, gene fusion, and truncation. These genes were enriched in metabolic processes, responses to stress, and external/endogenous stimuli, and other terms (Supplemental Tables 21, 24, 27, and 29; Supplemental Figure 9; Supplemental Notes).

A total of 1618 inversions covering 83.36 Mb were discovered between the Xin-li-mei genome and the 10 other radish genomes. Each genome had 134–193 inversions covering 2.7–21.1 Mb, including 9–49 large (>50 kb) inversions (Supplemental Table 30). The wild radishes contained more and longer inversions (Supplemental Table 31).

Thousands of translocations were detected, including 16 long-segment (0.13–8.67 Mb) translocations harboring 29–708 contiguous genes. Fourteen of the 16 translocations were detected in wild and weedy radishes, including nine, four, and one in RS09, RS10, and RS08, respectively. RS09 harbored 56.3% of the events and 72.4% of the length of the total long-segment translocations. Eleven of the 16 translocations were

Table 1. Statistics of the genomic assembly and annotation of 11 Raphanus genomes. Columns: Accession; Species (cultivar); Assembly size (Mb); No. of annotated genes; LTR assembly index (LAI).


</div><span class="text_page_counter">Trang 5</span><div class="page_container" data-page="5">

inter-chromosomal (Supplemental Table 32). The contents of inversions and translocations were comparable with those of the close relative <i>Brassica napus</i> (Song et al., 2020a).

The high level of co-linearity in each comparison of the independent genome assemblies of several different varieties carried out by different research groups using independent genetic maps, Hi-C maps, and BioNano optical maps indicates that the genomes of <i>Raphanus</i> did not experience large structural rearrangements (Supplemental Figures 5 and 6). The majority of the inversions and translocations are relatively short (<50 kb), which is consistent with findings in rapeseed and soybeans (Song et al., 2020a; Liu et al., 2020). These kinds of SVs can be well covered by our long contigs (N50 ∈ [4.07, 18.72] Mb, except in RS01, where N50 = 1.89 Mb). These 11 high-quality genomes provide valuable resources for screening the panorama of genetic variations among different species, sub-species, and varieties of <i>Raphanus</i>. However, the assembly of more wild accessions and application of individual Hi-C to each genome would still be valuable for confirming the identified SVs and for detecting novel inversions and translocations that may have been missed in this study.

Phylogeny and divergence time

A phylogenetic tree was generated using 4464 single-copy orthologous genes (Supplemental Figure 10). <i>R. sativus</i> and <i>R. raphanistrum</i> were on different branches. The California weedy radish (RS10) was situated at the junction of the <i>R. raphanistrum</i> and <i>R. sativus</i> groups, which is consistent with its genealogical mixture of wild and domesticated radishes; in contrast, the Japanese semi-wild accession (RS07) belonged to the <i>R. sativus</i> group, indicating its probable descent from the domesticated radish. Within <i>R. sativus</i>, the phylogenetic relationships were consistent with geographical origin rather than varieties or phenotypes, consistent with the result obtained from the double-digest restriction site-associated DNA sequencing (ddRAD-seq) of a larger population (Kobayashi et al., 2020). The divergence times were calculated by the synonymous substitution rates (K<sub>s</sub>) of 13 119 syntenic genes (Figure 1D and Supplemental Figure 11), which showed that <i>Raphanus</i> (RR genome) and <i>Brassica</i> (AA genome) diverged 6.33 million years ago (mya), similar to previous reports (Cheng et al., 2017; Jeong et al., 2016). Approximately 1.8 (1.35–2.7) mya and 1.67 (1.25–2.5) mya, in the early stage of the Quaternary glaciation (beginning at 2.4 mya), the species <i>R. raphanistrum</i> and <i>R. sativus</i> and the sub-species <i>R. raphanistrum</i> ssp. <i>raphanistrum</i> and ssp. <i>landra</i> began to diverge. Shortly thereafter, the Asian and European types of <i>R. sativus</i> and <i>R. sativus</i> var. <i>niger</i> and var. <i>radicula</i> separated at 1.49 (1.1–2.2) mya and 1.33 (1.0–2.0) mya, during the Donau glaciation (1.5–1.3 mya). <i>R. sativus</i> var. <i>caudatus</i>, a South Asian (Indian) rat’s tail radish cultivar, diverged from the East Asian (China and Japan) accessions 0.53 mya, during the Naynayxungla glaciation (0.5–0.78 mya). The coincidence of divergence times and glaciations indicates that climate change may have shaped the evolution of this genus. Although the phenotypes diverged significantly, leading to the recognition of several different varieties, the accessions from China and Japan were closely related and diverged below the detection threshold of our method (peak value of K<sub>s</sub> < 10<sup>−3</sup>; divergence age < 30 kya) (Figure 1D and Supplemental Figure 11B). Ancient people settled in Hemudu (Zhejiang, China) and began to cultivate rice and other crops 6.3 kya (Zheng et al., 2021). Radish was recorded in <i>The Book of Songs</i>, which is a collection of poems from the early Western Zhou Dynasty to the middle of the Spring and Autumn period (11th to 6th century BC) (Zhou and Chen, 1991), indicating that it was domesticated 6.3–2.6 kya in China. These results indicated that radishes were domesticated independently at least four times: <i>R. sativus</i> var. <i>niger</i> and var. <i>radicula</i> were domesticated independently in Europe, <i>R. sativus</i> var. <i>caudatus</i> was domesticated in South Asia, and the diverse forms of radishes originating in China, Japan, and Korea were domesticated from a common ancestor.
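The divergence times above are derived from K<sub>s</sub> values, but this excerpt does not state the substitution rate that was applied. A minimal sketch of the standard molecular-clock conversion T = K<sub>s</sub> / (2r) is given below; the rate `ASSUMED_RATE` is an assumption chosen for illustration (a value of this order is often used for Brassicaceae) and should not be read as the rate used by the authors.

```python
def divergence_time_years(ks, rate_per_site_per_year):
    """Divergence time T = Ks / (2 * r).

    Ks is the number of synonymous substitutions per synonymous site
    between two lineages; r is the per-lineage substitution rate.
    The factor 2 accounts for changes accumulating on both lineages.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


# ASSUMPTION: illustrative rate, not taken from the text.
ASSUMED_RATE = 1.5e-8  # substitutions per synonymous site per year

# Ks value at the detection threshold quoted in the text (10^-3).
ks_threshold = 1e-3
t_years = divergence_time_years(ks_threshold, ASSUMED_RATE)
print(f"{t_years / 1e3:.1f} kya")
```

Under this assumed rate, a K<sub>s</sub> of 10<sup>−3</sup> maps to a few tens of thousands of years, the same order of magnitude as the detection threshold quoted in the text. A different rate rescales every estimate proportionally.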

Gene flow among wild, domesticated, and weedy radishes

We analyzed the gene flow between the 11 genomes based on the D-test. The strongest signals were observed between wild radishes (RS08 and RS09) and European cultivars (RS03 and RS06) (Supplemental Table 33; Figure 2A and Supplemental Figure 2A), in accordance with the observation that these plants grow in the same habitat. Significant signals were also detected between <i>R. raphanistrum</i> ssp. <i>landra</i> (RS08) and Japanese types (RS05 and RS07) (Supplemental Figure 12B). <i>R. raphanistrum</i> ssp. <i>raphanistrum</i> (RS09) showed no significant gene flow with Japanese radishes or other East Asian accessions, which indicates that Asian radishes have different gene exchange histories than European cultivars. The different sub-species and their different degrees of participation in introgression could have contributed to the divergence of European and East Asian radish cultivars. It is well known that the California weedy radish is the progeny of an inter-species hybrid of <i>R. raphanistrum</i> ssp. <i>raphanistrum</i> and <i>R. sativus</i> var. <i>radicula</i> (Hegde et al., 2006; Heredia and Ellstrand, 2014). The California weedy radish (RS10) indeed showed gene flow with both <i>R. sativus</i> var. <i>radicula</i> (RS06) and <i>R. raphanistrum</i> ssp. <i>raphanistrum</i> (RS09) (Supplemental Figure 12C). <i>R. raphanistrum</i> ssp. <i>landra</i> (RS08) displayed gene flow with European cultivars (RS03 and RS06) rather than California weedy radish (RS10) (middle column of Supplemental Figure 12A), further demonstrating that <i>R. raphanistrum</i> ssp. <i>raphanistrum</i> and not <i>R. raphanistrum</i> ssp. <i>landra</i> was the wild ancestor of California weedy radish. Positive D-test signals were also detected between a South Asian cultivar (RS01) and wild radishes (RS08 and RS09), between RS01 and weedy radish (RS10), and between RS01 and European-type radishes (RS03 and RS06) (Supplemental Figure 12D). RS10 is an accession collected in the US that was separated from RS01 by strict geographical isolation. Therefore, the positive D-test signal between RS10 and RS01 may not reflect direct gene flow but could result from introgression footprints inherited from the RS09 and RS06 genomes, both of which are of the European type. This phenomenon could reflect that the Indian subcontinent experienced several rounds of population and trade inflows from Europe, with the best known being the Dutch and British East India Companies and British rule over India (1600–1947) (Ratcliff, 2016). However, the oilseed radish (RS02) displayed no significant gene flow with the rat’s tail radish, even though the reproductive parts of both were consumed and their planting regions were separated by a short aerial distance,

</div><span class="text_page_counter">Trang 6</span><div class="page_container" data-page="6">

Figure 2. Gene flow in the genomes of the genus Raphanus.

(A) Gene flow in <i>Raphanus</i> was determined based on ABBA-BABA analysis. The red arrows indicate gene flow, and the blue arrows indicate interspecific

(B) Chromosome inversions and their introgressions. The red crosses on the tree indicate the deduced times and nodes of the inversions. The green boxes show the sub-branches harboring the inversions. The red and blue arrows indicate hypothetical introgression events. The sequence length and number of orthologous genes are listed in Supplemental Table 34.

(C) Genome composition of the California weedy radish. The origin of the chromosome fragments was determined by the comparatively lower K<sub>s</sub> values of orthologous genes.
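Panel (C) assigns each fragment of the weedy radish genome to the parent showing the lower K<sub>s</sub>, and the text compares the resulting segment totals (38.2 Mb wild, 245.2 Mb domesticated) with the 1:7 ratio expected for BC<sub>2</sub> descendants. The sketch below illustrates both steps under simplifying assumptions: the function names are hypothetical, and the plain "lower value wins" rule stands in for the significance-based comparison described in the text.

```python
from fractions import Fraction


def assign_origin(ks_vs_wild, ks_vs_crop):
    """Assign a fragment to the parent with the lower Ks (ties unassigned).

    Simplified rule for illustration; the text describes a comparison
    based on significantly lower Ks rather than any difference.
    """
    if ks_vs_wild < ks_vs_crop:
        return "wild"
    if ks_vs_crop < ks_vs_wild:
        return "crop"
    return "unassigned"


def wild_fraction_after_backcross(n_backcrosses):
    """Expected wild-parent share after backcrossing an F1 n times to the crop.

    The F1 carries 1/2 wild genome; each backcross to the crop halves it.
    """
    return Fraction(1, 2) ** (n_backcrosses + 1)


# Segment totals reported in the text (Mb).
wild_mb, crop_mb = 38.2, 245.2
observed_ratio = crop_mb / wild_mb  # domesticated Mb per wild Mb

bc2_wild = wild_fraction_after_backcross(2)  # 1/8
expected_ratio = (1 - bc2_wild) / bc2_wild   # 7

print(f"observed 1:{observed_ratio:.1f}, expected BC2 1:{expected_ratio}")
```

The expectation is an average over many descendants; an individual lineage can deviate from 1:7 through recombination and selection, so close agreement is suggestive rather than conclusive.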


</div><span class="text_page_counter">Trang 7</span><div class="page_container" data-page="7">

possibly because human connectivity was blocked by the mountains and forest between them. Within the domesticated radishes, gene flow was observed between the European-originated black radish (RS03) and the Japanese Moriguchi daikon landrace (RS05), as well as between Moriguchi daikon and Yangzhou-Yanzhong (RS04), which is a landrace from Yangzhou city in China (Supplemental Figure 12E). Interestingly, RS03 originated in Europe but was transferred to Japan and, from there, introduced to China. The gene flow between RS03 and RS05 may have been derived from genetic contamination of the black radish after its introduction to Japan. Yangzhou city has been one of the most important transportation hubs for communication between China and Japan since the Tang Dynasty (AD 618–907) (Han and Guang, 2019). The eminent monk Jianzhen (AD 688–763) was born in Yangzhou and from there traveled to Japan, to which he brought Buddhist culture, medicine, agronomy, and building technology from China (Yang, 2011). Yangzhou has also been one of the most important port cities where Japanese envoys and monks have arrived in and departed from China since the eighth century (Zhang, 2008). Gene flow between Moriguchi daikon and Yangzhou-Yanzhong (RS04) revealed the long-lasting communication between Yangzhou city and Japan. Therefore, the gene flow footprints in the genomes faithfully recorded the history of both nature and human activity.
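The D-test (ABBA-BABA analysis) used for the gene flow signals above compares two discordant site patterns on an assumed topology (((P1, P2), P3), outgroup). A minimal sketch of the statistic D = (ABBA − BABA) / (ABBA + BABA) follows; the function name and the toy sites are hypothetical, and the sketch omits the significance testing (commonly a block jackknife) that a real analysis would need.

```python
def d_statistic(sites):
    """Patterson's D from biallelic site patterns.

    Each site is a tuple of alleles (P1, P2, P3, outgroup) for the
    assumed topology (((P1, P2), P3), outgroup). The outgroup allele
    is treated as ancestral ("A"); the other allele is derived ("B").
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if len({p1, p2, p3, out}) != 2:
            continue  # keep only biallelic sites
        if p1 == out and p2 == p3 and p2 != out:
            abba += 1  # ABBA: P2 and P3 share the derived allele
        elif p2 == out and p1 == p3 and p1 != out:
            baba += 1  # BABA: P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


# Hypothetical toy data: 6 ABBA sites, 2 BABA sites, 5 uninformative sites.
toy_sites = (
    [("A", "G", "G", "A")] * 6
    + [("G", "A", "G", "A")] * 2
    + [("A", "A", "G", "A")] * 5
)
print(d_statistic(toy_sites))
```

Without gene flow, incomplete lineage sorting is expected to produce ABBA and BABA sites in equal numbers, so D stays near zero; an excess of one pattern points to allele sharing between P3 and one of the two sister lineages.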

In addition to the SNP-based gene flow signal, large SVs, such as inversions, also provided evidence of genome introgression. To limit the pseudo-inversions derived from assembly errors, we selected paracentric inversions that appeared at least twice in the 11 genomes. A total of 141 inversion loci were displayed in 2–9 of the 10 radish genome versus Xin-li-mei genome comparisons (Supplemental Table 34). Among the inversions, 11 were longer than 50 kb and harbored more than 5 genes. Such large paracentric inversions arose approximately once every 0.16 million years (Figure 2B), which is 10 times faster than the estimated rate in <i>Oryza</i> (Stein et al., 2018), possibly because the mesohexaploidy of the radish genome confers higher tolerance to inversions and their subsequent accumulation. In addition, the cross-pollination habit and interspecific hybridization-prone nature of radishes may boost inversion events due to the genome shock that commonly occurs in distant hybrid genomes (Bashir et al., 2018). Among the 11 inversions, only 2 showed a pattern perfectly matching that of the phylogenetic tree (Figure 2B). Because large inversions are rare in the genome, dual independent occurrence in the same place is almost statistically impossible. Therefore, we estimated that the inversions displayed on different sub-branches originated from introgression. Four inversion loci showed gene flow from European varieties to East Asian cultivars. One inversion locus showed gene flow between wild and cultivated types (Figure 2B). For the remaining four inversions, estimating the origin and gene flow was too complicated. These results are consistent with the fact that East Asia is the main area of radish cultivation and breeding; European types have been adopted and could be used as genetic resources in the breeding process to improve Asian types. The D-statistic was not significant because of backcrossing and background purifying processes, while the inversions were retained, possibly due to beneficial character linkage and recombination inhibition, and provided sound evidence of genome introgression. The introgression of chromosome inversions frequently leads to the inheritance of complex traits governed by groups of loci tightly linked as a single Mendelian locus, which are termed supergenes (Jay et al., 2018).

The genome composition of the California weedy radish

The weeds that evolved from hybrids between crops and wild species cause not only agricultural problems but also ecosystem problems by influencing local biodiversity and replacing native plants in their habitats. To reveal the genomic composition of California weedy radish, the K<sub>s</sub> values of orthologous genes between RS10 and RS06 and between RS10 and RS09 were compared. Fragments totalling 38.2 Mb were inferred to be derived from <i>R. raphanistrum</i> ssp. <i>raphanistrum</i> because the K<sub>s</sub> of RS10 versus RS09 was significantly lower than that of RS10 versus RS06. Based on the opposite ratio of K<sub>s</sub> between the two pairs, segments totalling 245.2 Mb were inferred to be derived from <i>R. sativus</i> var. <i>radicula</i> (Figure 2C and Supplemental Table 35). Therefore, the overall California weedy radish genome was composed of wild and domesticated genomes at a ratio of 1:6.4, which is close to the theoretical ratio of BC<sub>2</sub> descendants of 1:7. These results support the backcrossing of the inter-species hybrid of <i>R. raphanistrum</i> ssp. <i>raphanistrum</i> and <i>R. sativus</i> var. <i>radicula</i> for two generations with <i>R. sativus</i> var. <i>radicula</i> and acquisition of a competitive advantage under California ecological conditions. The competitive advantage among weedy radishes may be derived from their higher biomass (Campbell et al., 2009). These findings provide a cautionary sign that the genes of domesticated crops invade the genomes of wild relative species on an unexpectedly large scale. Thus, a major challenge in maintaining the valuable gene pool in the wild relatives of crops is to prevent gene flow from domesticated plants.

Shuffling and biased retention among sub-genomes

The radish genome has been triplicated, and 24 AK genome building blocks have been mapped (Jeong et al., 2016). To maintain consistency with previous studies, we used the same method to dissect the AK blocks of our 11 genomes (Supplemental Tables 36 and 37). Excluding block G, which commonly retained only one copy, blocks D, I, P, S, and T retained only two copies in some accessions, and all the other AK blocks accumulated 3 or more copies in each of the 11 radish genomes (Supplemental Table 37 and Figure 3). The distribution of the AK blocks on chromosomes was generally in agreement with that reported previously (Figure 3). The radish and <i>Brassica</i> chromosomes displayed rearrangements when compared with each other (Cheng et al., 2017). We compared the AK arrangement among the 11 radish genomes and found that the overall arrangement was consistent among the genomes. However, a total of 28 entire-block translocation events were observed at 14 non-redundant locations, including 8 inter-chromosomal and 6 intra-chromosomal locations (Figure 3). RS09 and RS10 contained 7 and 6 of these entire-block translocations, respectively, representing higher frequencies than those in the other accessions (Figure 3 and Supplemental Table 38), which is consistent with the finding that RS09 contained the largest number of translocations (Supplemental Table 32). In addition to translocations, blocks B, N, and S were deleted from four locations, while blocks C, I,

</div><span class="text_page_counter">Trang 8</span><div class="page_container" data-page="8">

O, S, W, and X were inserted into 8 loci (Supplemental Table 38). An inversion of blocks K-L was observed on Chr. 1 of black radish (RS03) (Figure 3 and Supplemental Table 38). Of the 27 non-redundant locations exhibiting block variation, 16 (59.3%) were located in the most fractionated (MF2) sub-genome, followed by seven loci (25.9%) in the least fractionated (LF) sub-genome and four loci (14.8%) in the medium fractionated (MF1) sub-genome, indicating that the MF2 sub-genome is still the most unstable. The rearrangement of AK blocks dramatically shaped the chromosome landscape. These findings further indicate the usefulness of the <i>de novo</i> assembly of multiple genomes.

Traced back to the <i>Arabidopsis</i> non-redundant genes, the overall retention rates of the LF, MF1, and MF2 blocks were 31%–36%, 22%–25%, and 19%–21% in each radish genome, respectively (Supplemental Table 39). In the AK blocks, many genes were retained or lost in different genomes (Supplemental Table 36). We incorporated the 11 genomes to obtain a suite of non-redundant pan-AK blocks, in which at least one of the 11 genomes contained the corresponding <i>Arabidopsis</i> homolog. The pan-AK contained a total of 52%, 37.5%, and 33% of the genes in the LF, MF1, and MF2 blocks, respectively, which were much higher than the percentages observed in each individual genome (Figure 4A and Supplemental Table 39). A total of 60.4%–69.4%, 58.6%–67.1%, and 56.6%–62.9% of the LF, MF1, and MF2 pan-AK genes, respectively, were retained <i>in situ</i> in each of the 11 genomes (Figure 4A and Supplemental Figure 13), which indicated that, after establishment of the <i>Raphanus</i> genus, the AK blocks continued to undergo fractionation. The genes were lost from the AK blocks in two ways: biased retention and genome shuffling. We counted the total number of genes that were homologous to those in <i>Arabidopsis</i> AK blocks in each radish genome without considering their chromosomal positions. In total, each genome contained 73%–83% of the total pan-AK genes, which indicated that 17%–27% of the pan-AK genes had been lost in each genome through biased retention. In contrast, 11%–19% of the pan-AK genes translocated from the original blocks to other positions through genome shuffling (Figure 4B and Supplemental Table 40). The rates of biased retention and shuffling of AK blocks diverged significantly within and among different species. Blocks E, N, V, and W experienced a weak biased retention process but showed a high level of genome shuffling. Blocks I, J, L, O, P, Q, S, and U showed high levels of biased retention but experienced relatively low rates of genome shuffling (Figure 4B). Extensive and ongoing genome shuffling and biased retention are the

Figure 3. The 24 AK genome building blocks mapped to and rearranged in the chromosomes of 11 Raphanus plants.

A–Z indicate the AK blocks. LF, least fractionated blocks; MF1, medium fractionated blocks; MF2, most fractionated blocks. The translocation of blocks is indicated by arrows.


</div><span class="text_page_counter">Trang 9</span><div class="page_container" data-page="9">

fundamental evolutionary driving forces of the diploidization of hexaploid genomes (Mandakova and Lysak, 2018). This process generates new divergent genetic combinations in offspring populations but also results in gene loss in individuals. Benefitting from this mechanism, thousands of species of cruciferous plants have evolved and occupy nearly all types of niches globally while maintaining relatively small genomes, despite experiencing several rounds of genome duplication (Mandakova and Lysak, 2018; Huang et al., 2020).
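The distinction drawn in this section between genes lost through biased retention and genes displaced by genome shuffling can be sketched as a three-way classification of each pan-AK gene in a given genome. This is a minimal illustration, not the pipeline used in the study: the function names and the toy gene and block identifiers are hypothetical, and "in situ" is approximated here as the homolog residing in the same AK block as in the pan-AK set.

```python
from collections import Counter


def classify_gene_fates(pan_ak_blocks, genome_blocks):
    """Count the fate of every pan-AK gene in a single genome.

    pan_ak_blocks: dict mapping gene id -> ancestral AK block.
    genome_blocks: dict mapping gene id -> block where the homolog
                   was found in this genome (gene absent if lost).
    """
    fates = Counter({"in_situ": 0, "shuffled": 0, "lost": 0})
    for gene, ancestral_block in pan_ak_blocks.items():
        observed_block = genome_blocks.get(gene)
        if observed_block is None:
            fates["lost"] += 1        # no homolog anywhere: biased retention
        elif observed_block == ancestral_block:
            fates["in_situ"] += 1     # retained at the ancestral position
        else:
            fates["shuffled"] += 1    # present, but moved: genome shuffling
    return fates


def fate_rates(fates):
    """Convert fate counts into fractions of all pan-AK genes."""
    total = sum(fates.values())
    return {fate: count / total for fate, count in fates.items()}


# Toy data (hypothetical identifiers).
toy_pan_ak = {"g1": "A", "g2": "A", "g3": "B", "g4": "B", "g5": "C"}
toy_genome = {"g1": "A", "g2": "C", "g3": "B", "g5": "C"}
toy_fates = classify_gene_fates(toy_pan_ak, toy_genome)
toy_rates = fate_rates(toy_fates)
```

In this framing, the 73%–83% of pan-AK genes found in each genome correspond to the in situ and shuffled classes combined, and the remaining 17%–27% to the lost class.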

The locus- and family-based pan-genomes

We constructed the radish pan-genome using locus- and family-based methods. Using the locus-based method, the 11 genomes were combined into a pan-genome containing 150 757 non-redundant gene loci, which is approximately triple the size of each single genome (Figure 5A and Supplemental Figure 14). The 24 539 gene loci retained in all 11 genomes were termed the core loci, which accounted for only 16.3% of the pan-loci but half of each genome. A further 59 726 loci (39.6% of the pan-loci) were present in 2–10 genomes and were designated dispensable loci, which occupied 27.3%–42.5% of each genome. Each genome harbored 3113–9793 (6.6%–20.1%) loci without alleles in other genomes, which comprised a total of 66 492 lineage-specific loci, accounting for 44.1% of the pan-genome (Figure 5B and 5C). The relatively small fraction of core loci and the unsaturated pan-loci (Figure 5D) indicated the high plasticity of the genomes within <i>Raphanus</i> and explained the significant variation in genome size and the divergence of the total gene number by up to 20% among the 11 genomes (Table 1). Theoretically, each radish genome can be either reduced by half or multiplied several times by genome-assisted crossing and selection. The core loci covered 83.2% of the universal single-copy orthologs based on the BUSCO assessment (Supplemental Table 41), indicating that the core loci were involved in basic life processes, while the dispensable and specific loci were composed of redundant or non-essential genes.
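The core/dispensable/specific partition described above depends only on how many genomes carry an allele at each locus. The following sketch illustrates that rule and re-derives the reported shares from the reported counts. The function name and the toy locus identifiers are hypothetical; how loci were actually delimited and matched across genomes is not reproduced here.

```python
def classify_loci(presence, n_genomes=11):
    """Label each locus by the number of genomes carrying an allele.

    presence: dict mapping locus id -> set of accession ids.
    """
    classes = {}
    for locus, accessions in presence.items():
        count = len(accessions)
        if not 1 <= count <= n_genomes:
            raise ValueError(f"{locus}: {count} genomes is out of range")
        if count == n_genomes:
            classes[locus] = "core"         # present in every genome
        elif count == 1:
            classes[locus] = "specific"     # present in one genome only
        else:
            classes[locus] = "dispensable"  # present in 2..n-1 genomes
    return classes


# Toy example with three genomes (hypothetical identifiers).
toy_classes = classify_loci(
    {"loc1": {"RS00", "RS01", "RS02"}, "loc2": {"RS01"}, "loc3": {"RS00", "RS02"}},
    n_genomes=3,
)

# Locus counts reported in the text.
reported = {"core": 24_539, "dispensable": 59_726, "specific": 66_492}
pan_total = sum(reported.values())
shares = {name: round(100 * n / pan_total, 1) for name, n in reported.items()}
```

The three reported classes sum to the 150 757 pan-loci, and the recomputed shares agree with the percentages quoted in the text.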

To exclude the effect of the multi-copy genes derived from paleoploidization, the pan-genome was reconstructed with a family-based method. A total of 449 856 genes were incorporated into 41 952 families. The 15 147 families (36.11% of the total families) containing 293 006 genes (65.13% of the total genes) were considered core families, which were shared among all 11 genomes (Figure 5E and Supplemental Figure 15). Each accession contained 25 709–28 590 core family genes, accounting for 63.05%–70.19% of the genome

Figure 4. Retention rates of the 24 AK genome building blocks in Raphanus.

(A) The retention of AK blocks of the pan-genome and single genomes in the genus <i>Raphanus</i>. LF, least fractionated blocks; MF1, medium fractionated blocks; MF2, most fractionated blocks.

(B) Gene loss/retention and shuffling rates of 24 AK blocks in 11 <i>Raphanus</i> genomes. The diamonds and their surrounding boxes show the mean and SE of gene loss (blue) and the shuffling (red) rate of the AK blocks in <i>Raphanus</i>. The scattered points show the gene loss and shuffling rate of AKs in each of the 11 accessions. The <i>P</i> value was calculated using the paired <i>t</i>-test.


</div><span class="text_page_counter">Trang 10</span><div class="page_container" data-page="10">

Figure 5. Pan-genome composition and evolution in the genus Raphanus.

(A) A pseudochromosome of the locus-based pan-genome. The red blocks show the alleles within the 11 genomes. The blue blocks show the absence of alleles at the loci in the genomes.

(B) Statistics of the composition of the pan-genome loci.

(C) Statistics of the locus composition of each genome.

(D) Statistics of the pan and core loci. The pan-loci increased while the core loci decreased with the addition of more genomes.

(E) Statistics of the composition of the pan-genome families.

(F) The CDS lengths of core, dispensable, and specific genes. The <i>P</i> value was calculated using the paired <i>t</i>-test.

(G) The K<small>a</small>/K<small>s</small> statistics of the core and dispensable genes among the 11 genomes. The <i>P</i> value was calculated using the paired <i>t</i>-test.

(H) Statistics of the pan-genomes and sub-pan-genomes. The pan-families increased while the core families decreased with the addition of more genomes. The pan-families are more sensitive to population structure than the core families.

(I) Venn diagram showing the intersections and complementary sets of the core and specific genes classed by locus-based and family-based methods.


</div><span class="text_page_counter">Trang 11</span><div class="page_container" data-page="11">

(Supplemental Table 42). The 24 709 families (harboring 147 016 genes) shared by 2–10 genomes were termed dispensable families, accounting for 58.90% of the total families. Each genome contained 77–469 accession-specific families, which included 315–2723 genes, accounting for 0.18%–1.12% of the total families. The CDS lengths of core genes were longer than those of the dispensable and specific genes on average (Figure 5F). The K<small>a</small>/K<small>s</small> ratios of the orthologous genes among the 11 plants were lower in core families than in dispensable families (Figure 5G). These results indicate that the core genes are more conserved and under stronger negative selection pressure. In addition to these families, each genome also contained 3290–6400 singleton genes (Figure 5H and Supplemental Table 42).

We compared the locus- and family-based pan-genomes. A total of 97.3% of the core families and 94.3% of the core loci overlapped between the two methods, indicating that the classification of the core genes was reliable (Figure 5I). However, the intersection of the lineage-specific families (including singletons) and the lineage-specific loci decreased to 75.3% and 70.2% of the corresponding aggregates (Figure 5I). A total of 24.7% of the lineage-specific families and singletons resided in the same loci as other genes, indicating that these genes diverged from their alleles and that their sequence similarity decreased significantly, thus resulting in their recognition as different families. Although we cannot exclude annotation error, the variation speed was significantly higher for the loci with fewer alleles than for the core loci. The variation speed and the allele number per locus showed a power function relationship (Figure 5J). One reason is that the core and the highly retained genes are more functionally important and are therefore under stronger selective pressure against mutations. Another reason is that highly retained genes are more abundant in a population; thus, mutated copies have a higher chance of being restored via crossing. However, 29.8% of the lineage-specific loci could be assigned to one of the core or dispensable families. These loci were derived from translocation, duplication, or biased retention. The frequency of these events showed an inverse linear correlation with the number of genomes sharing the gene family (Figure 5K), indicating that the two pressures on mutations have no effect on translocation. The linear decline in the translocation rate could be a result of the repulsion effect from the existing allele. Based on these two models, of the 46 672 genes assigned to both the specific family/singleton and lineage-specific locus categories, 32.3% experienced sequence differentiation, and 6% experienced translocation.
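The two regression shapes mentioned above, a power function for the variation rate (Figure 5J) and a straight line for the translocation rate (Figure 5K), can be fitted with ordinary least squares. The sketch below is an assumption-laden illustration: the text does not state how the curves were fitted, so the power function is fitted here by linear regression on log-transformed values, and the data points are synthetic rather than taken from the study.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def power_fit(xs, ys):
    """Fit y = a * x**b by least squares on log(x), log(y).

    Requires strictly positive xs and ys.
    """
    exponent, log_a = linear_fit(
        [math.log(x) for x in xs], [math.log(y) for y in ys]
    )
    return math.exp(log_a), exponent


# Synthetic example: x is the number of genomes sharing a gene (1..11).
genome_counts = list(range(1, 12))
synthetic_mutation = [0.5 * x ** -1.2 for x in genome_counts]        # power law
synthetic_translocation = [0.30 - 0.02 * x for x in genome_counts]   # linear

a, b = power_fit(genome_counts, synthetic_mutation)
slope, intercept = linear_fit(genome_counts, synthetic_translocation)
```

A negative exponent `b` corresponds to a rate that falls steeply for the first few shared genomes and then levels off, whereas a negative `slope` describes a decline by a constant amount per additional genome.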

The GO enrichment results for core and dispensable genes are summarized in Supplemental Table 43 and the Supplemental Notes. The core genes were enriched in single-organism process, biological regulation, developmental process, and other fundamental life processes. The dispensable genes were enriched in several metabolic processes of specific substrates, the regulatory networks of many subdivided processes, and the responses to biotic and abiotic stresses (Supplemental Table 43). These functional categories were consistent with those reported in previous pan-genomic studies on other plants (Gordon et al., 2017; Wang et al., 2018; Zhao et al., 2018; Song et al., 2020a; Liu et al., 2020).

We then constructed the sub-pan-genomes of the domesticated and wild radishes. The sub-pan-families increased while the sub-core-families decreased as more genomes were used for sub-pan-genome construction (Figure 5H and Supplemental Table 44). This result is consistent with those of studies on wild soybeans (Li et al., 2014; Liu et al., 2020) and tomatoes (Gao et al., 2019). The addition of wild radish to the domesticated varieties dramatically expanded the pan-families but did not change the trendline of the core families (Figure 5H), indicating that dispensable genes were gained or lost faster than core genes after the divergence between <i>R. raphanistrum</i> and <i>R. sativus</i>. Compared with the whole pan-genome, the wild and domesticated radish sub-pan-genomes contained 3152 and 1368 lineage-specific sub-core gene families, respectively. Compared with those in wild radish, the sub-core genes specific to domesticated lineages were enriched in GO terms related to photosynthesis processes and lipid binding, among others (Supplemental Tables 45 and 46), indicating that domesticated radishes have undergone convergent evolution to obtain a higher carbon assimilation capacity. A similar case was observed for the genome of a tropical fruit, mango (Wang et al., 2020). This finding is consistent with the observation that domesticated radishes commonly have fewer leaves but produce quickly enlarging storage roots, while wild radishes have a creeping and bushy plant architecture without enlarged fleshy roots. We found that 43 families were retained in all three of the wild genomes but absent in the domesticated genomes. These genes were functionally annotated as, for example, the ATPase family, vacuolar protein sorting-associated protein, E3 ubiquitin-protein ligase, ubiquitin-like modifier involved in autophagosome formation, mitogen-activated protein kinase 4-like, senescence-associated protein, mitochondrial import inner membrane, and late embryogenesis abundant protein (Supplemental Table 47). The retention of these genes in the wild radishes points to a conserved mechanism controlling the rapid maturation of seeds and segmental breakage and abscission of the silique. These genes were eliminated in the domesticated radishes because the traits they controlled were unfavorable in agriculture.
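The opposite trends of the pan-families and core families as genomes are added (Figure 5H) follow directly from set union and set intersection. The sketch below shows the computation for one fixed order of genomes. The function name, accession labels, and family identifiers are hypothetical, and any averaging over random genome orders, which such curves usually involve, is an assumption left out of this minimal version.

```python
def pan_core_curve(families_by_genome, order):
    """Return (pan size, core size) after adding each genome in turn.

    families_by_genome: dict mapping accession id -> set of family ids.
    order: sequence of accession ids giving the order of addition.
    """
    pan = set()
    core = None
    curve = []
    for accession in order:
        families = families_by_genome[accession]
        pan |= families  # union: can only grow or stay equal
        # intersection: can only shrink or stay equal
        core = set(families) if core is None else core & families
        curve.append((len(pan), len(core)))
    return curve


# Toy example (hypothetical identifiers).
toy_families = {
    "acc_A": {1, 2, 3},
    "acc_B": {2, 3, 4},
    "acc_C": {3, 5},
}
toy_curve = pan_core_curve(toy_families, ["acc_A", "acc_B", "acc_C"])
```

Adding a divergent genome, such as a wild accession to a set of domesticated ones, enlarges the union sharply while the intersection changes little if the shared families are conserved, which matches the pattern described for Figure 5H.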

Accession-specific blocks and rearrangement-rich islands

We detected accession-specific blocks harboring 5 or more consecutive accession-specific genes. After merging, 229 accession-specific gene blocks (12–47 for each genome) with bordering allelic core genes were obtained (Supplemental Table 48). Among them, at least 67 translocation, 36 inversion,

(J) Mutation rate of genes. The abscissa axis shows the numbers of genomes containing the gene families. The red diamonds indicate the actual mutation rate, and the curve indicates the power regression.

(K) Translocation rate of genes. The abscissa axis shows the numbers of genomes containing the gene loci. The blue diamonds indicate the actual translocation rate, and the oblique line indicates the linear regression.

(L) The co-linearity of orthologous core and dispensable genes. The window shows the two rearrangement-rich islands (blocks) and a conserved chromosome segment on Chr. 3; only the orthologous genes from adjacent accessions are matched.


</div><span class="text_page_counter">Trang 12</span><div class="page_container" data-page="12">

and 40 insertion events were observed, accounting for 29.3%, 15.7%, and 17.5% of the total blocks, respectively (Supplemental Figure 6); that is, 62.4% of the total blocks were accompanied by large SVs. Large SVs can inhibit chromosome synapsis and crossing over in these regions, thereby blocking chromosome recombination-dependent mutation repair (Rowan et al., 2019), which results in significantly more divergence accumulation in the corresponding regions. In weedy radish (RS10), the numbers of inversions, translocations, and species-specific gene blocks were significantly greater than the estimated values (Supplemental Table 49). These results indicated that interspecific hybridization and subsequent gene flow not only led to the exchange of existing genetic variation but also created novel variations and accelerated the generation of new genes.
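The block criterion used in this section, 5 or more consecutive accession-specific genes, amounts to finding runs of flagged genes along a chromosome. The sketch below illustrates that step only. The function name and the toy flags are hypothetical, and the subsequent merging of neighboring blocks and the check for bordering allelic core genes are not reproduced because the text does not detail them.

```python
def find_specific_blocks(is_specific, min_run=5):
    """Find runs of consecutive accession-specific genes.

    is_specific: list of booleans in chromosomal gene order.
    Returns a list of (start_index, end_index) pairs, inclusive,
    for every run of at least `min_run` flagged genes.
    """
    blocks = []
    start = None
    for index, flag in enumerate(is_specific):
        if flag and start is None:
            start = index                      # a run begins
        elif not flag and start is not None:
            if index - start >= min_run:       # run just ended; long enough?
                blocks.append((start, index - 1))
            start = None
    # Handle a run that extends to the end of the chromosome.
    if start is not None and len(is_specific) - start >= min_run:
        blocks.append((start, len(is_specific) - 1))
    return blocks


# Toy chromosome: runs of length 5, 2, and 6 (hypothetical flags).
toy_flags = [False] + [True] * 5 + [False] + [True] * 2 + [False] + [True] * 6
toy_blocks = find_specific_blocks(toy_flags)
```

Only the runs of length 5 and 6 qualify in the toy example; the run of two flagged genes is discarded.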

The percentages of core, dispensable, accession-specific, and singleton genes were non-uniformly distributed on the chromosomes, with the dispensable, accession-specific, and singleton genes accumulating at much higher ratios than the core genes in pericentromeric regions (Supplemental Figure 17A), which is consistent with a previous finding in <i>Brachypodium distachyon</i> (Gordon et al., 2017). Excluding the bottom half of Chr. 3 and the top part of Chr. 7, the chromosome patterns were conserved among the 11 genomes. Significant accession-specific gene-rich islands were observed on the chromosomes, especially Chr. 3, where the chromosomes dramatically diverged among varieties. Chr. 3 contained two variation-rich (accession-specific gene-rich) blocks (length of 1.4–14.9 Mb for block 1 and 4.7–14.9 Mb for block 2) separated by a conserved chromosome segment (7.6–11 Mb in length) in each accession (Figure 5L and Supplemental Figure 17B; Supplemental Table 50). The conserved chromosome segments contained the core genes that were collinear among accessions (Figure 5L and Supplemental Figure 17B). The two variation-rich blocks comprised few core genes but large fractions of dispensable and specific genes. The dispensable genes among these accessions were characterized by many gain and loss, translocation, and inversion events (Figure 5L and Supplemental Figure 17B). Therefore, we call these two blocks "rearrangement-rich islands," which are essentially the same as, but longer than, the hotspots of rearrangements revealed in <i>Arabidopsis</i> (Jiao and Schneeberger, 2020). Excluding block 1 among the two wild radishes and one weedy radish, the K<small>a</small> and K<small>s</small> values among homologous genes were much higher in the two blocks than in the central conserved segments. The K<small>a</small>/K<small>s</small> of genes in block 1 and block 2 reached 0.6–0.8, which was significantly higher than the 0.2–0.3 observed for the genes in the conserved segment (Supplemental Figure 17C), indicating that the genes in the two blocks were under less negative selection pressure and evolved more quickly than the syntenic genes, which is consistent with the findings in <i>Arabidopsis</i> (Jiao and Schneeberger, 2020). We then extracted the transposable elements (TEs) within the block 1-central-block 2 structure, as well as the bordering segments. The counts of total TEs per Mb were significantly lower in the two blocks than in the central region and two bordering segments (Supplemental Figure 18). This is contrary to the findings in other chromosomal regions and the general assumption that rearrangement regions are always accompanied by high TE contents (Gordon et al., 2017; Jiao and Schneeberger, 2020). The majority of the differences originated from DNA transposons, which decreased from 183.8–261.5 copies per Mb in the central and bordering segments to 8.0–24.8 copies per Mb in the blocks. The Copia and Gypsy LTRs showed no significant differences between the blocks and the conserved central segments (Supplemental Figure 18). These results indicated that the deletional translocation of DNA transposons along with their constituent genes played an important role in the formation of these two rearrangement-rich islands. The overall gene densities in the two blocks were comparable with (in four accessions) or higher than (in seven accessions) those in the central segments (Supplemental Table 51). In the two blocks, most of the genes were organized into TE-free tandemly repeated structures (Supplemental Figure 19). These results indicated that not only the deletion but also the multiplication of protein-coding genes occurred in the two blocks. The majority of the genes in these two blocks were functionally uncertain based on homologous annotation (Supplemental Table 52), indicating that they were newly evolved genes or <i>Raphanus</i>-specific genes that have received little attention to date.

Graph-based pan-genome

To construct a graph-based pan-genome of <i>Raphanus</i>, we merged the total SVs (PAVs > 50 bp) from each of the RS01–RS10 genomes compared with RS00 into a set of non-redundant SVs (PAVs). The number of SVs in this study (26 × 10<sup>3</sup> per sample) is similar to the number in soybeans (25.8 × 10<sup>3</sup> per sample) (Liu et al., 2020). Because the genome size (500 Mb) is half of that of soybeans (1 Gb), the SV density of <i>Raphanus</i> is approximately twice that of the soybean genome. Both the non-redundant insertion (presence) and deletion (absence) numbers increased as more genomes were added, with the trend tending to flatten (Figure 6A), which is similar to the findings in soybeans (Liu et al., 2020). The shared SVs (PAVs) declined upon the addition of more genomes, as was revealed in soybeans, but we did not classify the SVs as core, dispensable, or private. The insertions (presences)/deletions (absences) shared by all 10 samples should be considered specific deletions (absences)/insertions (presences) in the reference genome (RS00) instead of core insertions (presences)/deletions (absences) in the population. Therefore, we transformed the 6:5–10:1-frequency insertions (presences)/deletions (absences) to 5:6–1:10-frequency deletions (absences)/insertions (presences) and classed the SVs into high-frequency (4:7 and 5:6), low-frequency (2:9 and 3:8), and specific (1:10) categories (Figure 6B). The number of specific SVs diverged significantly more than the number of high-frequency SVs among different accessions. The wild radishes (RS09, RS08) contained many more specific SVs than the domesticated varieties. These results are consistent with the findings in soybeans (Liu et al., 2020). The specific and low-frequency SVs were biased toward insertions (presence), indicating that insertions played a larger role than deletions in the structural variations in the <i>Raphanus</i> genomes (Figure 6B). After filtering out the PAVs containing more than 90% repetitive DNA, an integrative graph-based pan-genome containing a total of 53.8 × 10<sup>3</sup> PAVs (329.1 Mb in total) was constructed using the Xin-li-mei (RS00) genome as the standard linear base reference genome.
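The frequency transformation described above can be written as a small folding rule over the 11 genomes (10 samples plus the RS00 reference). The sketch below is an illustration under stated assumptions: the function names and the `"INS"`/`"DEL"` labels are hypothetical, and each SV is assumed to be summarized only by its type relative to RS00 and the number of samples carrying it.

```python
def fold_sv(sv_type, carriers, n_samples=10):
    """Re-express an SV so that the minority state is the variant.

    sv_type: "INS" or "DEL" relative to the reference genome.
    carriers: number of non-reference samples carrying the SV.
    Returns (type, minor_count, major_count) over n_samples + 1 genomes.
    """
    if sv_type not in ("INS", "DEL"):
        raise ValueError("sv_type must be 'INS' or 'DEL'")
    if not 1 <= carriers <= n_samples:
        raise ValueError("carriers must be between 1 and n_samples")
    non_carriers = n_samples + 1 - carriers  # includes the reference
    if carriers > non_carriers:
        # e.g. a 10:1 insertion becomes a 1:10 deletion in the reference
        flipped = "DEL" if sv_type == "INS" else "INS"
        return flipped, non_carriers, carriers
    return sv_type, carriers, non_carriers


def frequency_class(minor_count):
    """Map the minority count to the categories named in the text."""
    if minor_count == 1:
        return "specific"        # 1:10
    if minor_count in (2, 3):
        return "low-frequency"   # 2:9 and 3:8
    if minor_count in (4, 5):
        return "high-frequency"  # 4:7 and 5:6
    raise ValueError("minor_count must be between 1 and 5 for 11 genomes")


shared_by_all = fold_sv("INS", 10)  # insertion found in all 10 samples
shared_by_six = fold_sv("INS", 6)
rare_deletion = fold_sv("DEL", 3)
```

With this rule, an insertion called in all 10 samples is counted as a deletion specific to RS00 rather than as a core insertion, which is the reinterpretation the text argues for.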

</div>
