Towards complete and error-free genome assemblies of all vertebrate species

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species1,2,3,4. To address this issue, the international Genome 10K (G10K) con...

Bibliographic Details
Published in: Nature
Main Authors: Rhie, Arang, McCarthy, Shane A., Fedrigo, Olivier, Damas, Joana, Formenti, Giulio, Koren, Sergey, Uliano-Silva, Marcela, Chow, William, Fungtammasan, Arkarachai, Kim, Juwan, Lee, Chul, Ko, Byung June, Chaisson, Mark, Gedman, Gregory L., Cantin, Lindsey J., Thibaud-Nissen, Francoise, Haggerty, Leanne, Bista, Iliana, Smith, Michelle, Haase, Bettina, Mountcastle, Jacquelyn, Winkler, Sylke, Paez, Sadye, Howard, Jason, Vernes, Sonja C, Lama, Tanya M, Grutzner, Frank, Warren, Wesley C., Balakrishnan, Christopher N., Burt, Dave, George, Julia M., Biegler, Matthew T., Iorns, David, Digby, Andrew, Eason, Daryl, Robertson, Bruce, Edwards, Taylor, Wilkinson, Mark, Turner, George, Meyer, Axel, Kautt, Andreas F., Franchini, Paolo, Detrich III, H. William, Svardal, Hannes, Wagner, Maximilian, Naylor, Gavin J. P., Pippel, Martin, Malinsky, Milan, Mooney, Mark, Simbirsky, Maria, Hannigan, Brett T., Pesout, Trevor, Houck, Marlys L., Misuraca, Ann, Kingan, Sarah B., Hall, Richard, Kronenberg, Zev, Sović, Ivan, Dunn, Christopher, Ning, Zemin, Hastie, Alex, Lee, Joyce, Selvaraj, Siddarth, Green, Richard E., Putnam, Nicholas H., Gut, Ivo, Ghurye, Jay, Garrison, Erik, Sims, Ying, Collins, Joanna, Pelan, Sarah, Torrance, James, Tracey, Alan, Wood, Jonathan, Dagnew, Robel E., Guan, Dengfeng, London, Sarah E., Clayton, David F., Mello, Claudio V., Friedrich, Samantha R., Lovell, Peter V., Osipova, Ekaterina, Al-Ajli, Farooq O., Secomandi, Simona, Kim, Heebal, Theofanopoulou, Constantina, Hiller, Michael, Zhou, Yang, Harris, Robert S., Makova, Kateryna D., Medvedev, Paul, Hoffman, Jinna, Masterson, Patrick, Clark, Karen, Martin, Fergal, Howe, Kevin, Flicek, Paul, Walenz, Brian P., Kwak, Woori, Clawson, Hiram, Diekhans, Mark, Nassar, Luis, Paten, Benedict, Kraus, Robert H. S., Crawford, Andrew J., Gilbert, M.Thomas P., Zhang, Guojie, Venkatesh, Byrappa, Murphy, Robert W., Koepfli, Klaus-Peter, Shapiro, Beth, Johnson, Warren E., Di Palma, Federica, Marqués-Bonet, Tomàs, Teeling, Emma C., Warnow, Tandy, Marshall Graves, Jennifer, Ryder, Oliver A., Haussler, David, O’Brien, Stephen J., Korlach, Jonas, Lewin, Harris A., Howe, Kerstin, Myers, Eugene W., Durbin, Richard, Phillippy, Adam M., Jarvis, Erich D.
Other Authors: National Institutes of Health (US), National Human Genome Research Institute (US), Ministry of Health and Welfare (South Korea), Wellcome Trust, European Molecular Biology Laboratory, Howard Hughes Medical Institute, Rockefeller University, Robert and Rosabel Osborne Endowment, European Commission, National Library of Medicine (US) NLM, Korea Institute of Marine Science & Technology KIMST, Ministry of Oceans and Fisheries (South Korea), Alfred P. Sloan Foundation, Max Planck Society, Maine Dept of Inland Fisheries and Wildlife, National Science Foundation (US), University of Queensland, Science Exchange, Northeastern University (US), Federal Ministry of Education and Research (Germany), EMBO, National Key Research and Development Program (China), Qatar Society of Al-Gannas (Algannas), Katara Cultural Village, Government of Qatar, Monash University Malaysia, Hessen State Ministry of Higher Education, Research and the Arts, Ministry of Science, Research and Art Baden-Württemberg, Agency for Science, Technology and Research A*STAR (Singapore), European Research Council, Ministerio de Ciencia, Innovación y Universidades (España), Obra Social la Caixa, Generalitat de Catalunya, Irish Research Council, Danish National Research Foundation, Australian Research Council
Format: Article in Journal/Newspaper
Language: English
Published: Springer Nature 2021
Subjects:
Online Access: http://hdl.handle.net/10261/251905
https://doi.org/10.1038/s41586-021-03451-0
id ftcsic:oai:digital.csic.es:10261/251905
record_format openpolar
institution Open Polar
collection Digital.CSIC (Spanish National Research Council)
op_collection_id ftcsic
language English
description High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species1,2,3,4. To address this issue, the international Genome 10K (G10K) consortium5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences. We thank them for their permission to publish. A.R., S.K., B.P.W. and A.M.P. were supported by the Intramural Research Program of the NHGRI, NIH (1ZIAHG200398). A.R. was also supported by the Korea Health Technology R&D Project through KHIDI, funded by the Ministry of Health & Welfare, Republic of Korea (HI17C2098). S.A.M., I.B. and R.D. were supported by Wellcome Trust grant WT207492; W.C., M. Smith, Z.N., Y.S., J.C., S. Pelan, J.T., A.T., J.W. 
and Kerstin Howe by WT206194; L.H., F.M., Kevin Howe and P. Flicek by WT108749/Z/15/Z, WT218328/B/19/Z and the European Molecular Biology Laboratory. O.F. and E.D.J. were supported by Howard Hughes Medical Institute and Rockefeller University start-up funds for this project. J.D. and H.A.L. were supported by the Robert and Rosabel Osborne Endowment. M.U.-S. received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement (750747). F.T.-N., J. Hoffman, P. Masterson and K.C. were supported by the Intramural Research Program of the NLM, NIH. C.L., B.J.K., J. Kim and H.K. were supported by the Marine Biotechnology Program of KIMST, funded by the Ministry of Oceans and Fisheries, Republic of Korea (20180430). M.C. was supported by a Sloan Research Fellowship (FG-2020-12932). S.C.V. was funded by a Max Planck Research Group award from the Max Planck Society, and a Human Frontiers Science Program (HFSP) Research grant (RGP0058/2016). T.M.L., W.E.J. and the Canada lynx genome were funded by the Maine Department of Inland Fisheries & Wildlife (F11AF01099), including when W.E.J. held a National Research Council Research Associateship Award at the Walter Reed Army Institute of Research (WRAIR). C.B. was supported by the NSF (1457541 and 1456612). D.B. was funded by The University of Queensland (HFSP - RGP0030/2015). D.I. was supported by Science Exchange Inc. (Palo Alto, CA). H.W.D. was supported by NSF grants (OPP-0132032 ICEFISH 2004 Cruise, PLR-1444167 and OPP-1955368) and the Marine Science Center at Northeastern University (416). G.J.P.N. and the thorny skate genome were funded by the Lenfest Ocean Program (30884). M.P. was funded by the German Federal Ministry of Education and Research (01IS18026C). M. Malinsky was supported by an EMBO fellowship (ALTF 456-2016). The following authors’ contributions were supported by the NIH: S. Selvaraj (R44HG008118); C.V.M., S.R.F., P.V.L. 
(R21 DC014432/DC/NIDCD); K.D.M. (R01GM130691); H.C. (5U41HG002371-19); M.D. (U41HG007234); and B.P. (R01HG010485). D.G. was supported by the National Key Research and Development Program of China (2017YFC1201201, 2018YFC0910504 and 2017YFC0907503). F.O.A. was supported by Al-Gannas Qatari Society and The Cultural Village Foundation-Katara, Doha, State of Qatar and Monash University Malaysia. C.T. was supported by The Rockefeller University. M. Hiller was supported by the LOEWE-Centre for Translational Biodiversity Genomics (TBG) funded by the Hessen State Ministry of Higher Education, Research and the Arts (HMWK). H.C. was supported by the NHGRI (5U41HG002371-19). R.H.S.K. was funded by the Max Planck Society with computational resources at the bwUniCluster and BinAC funded by the Ministry of Science, Research and the Arts Baden-Württemberg and the Universities of the State of Baden-Württemberg, Germany (bwHPC-C5). B.V. was supported by the Biomedical Research Council of A*STAR, Singapore. T.M.-B. was funded by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (864203), MINECO/FEDER, UE (BFU2017-86471-P), Unidad de Excelencia María de Maeztu, AEI (CEX2018-000792-M), a Howard Hughes International Early Career award, Obra Social “La Caixa” and Secretaria d’Universitats i Recerca and CERCA Programme del Departament d’Economia i Coneixement de la Generalitat de Catalunya (GRC 2017 SGR 880). E.C.T. was supported by the European Research Council (ERC-2012-StG311000) and an Irish Research Council Laureate Award. M.T.P.G. was supported by an ERC Consolidator Award 681396-Extinction Genomics, and a Danish National Research Foundation Center Grant (DNRF143). T.W. was supported by the NSF (1458652). J. M. Graves was supported by the Australian Research Council (CEO561477). E.W.M. was partially supported by the German Federal Ministry of Education and Research (01IS18026C). 
Complementary sequencing support for the Anna’s hummingbird and several genomes was provided by Pacific Biosciences, Bionano Genomics, Dovetail Genomics, Arima Genomics, Phase Genomics, 10X Genomics, NRGene, Oxford Nanopore Technologies, Illumina, and DNAnexus. All other sequencing and assembly were conducted at the Rockefeller University, Sanger Institute, and Max Planck Institute Dresden genome labs. Part of this work used the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov). We acknowledge funding from the Wellcome Trust (108749/Z/15/Z) and the European Molecular Biology Laboratory. Funding was also provided by the Spanish government through the "Severo Ochoa Centre of Excellence" accreditation (CEX2018-000792-M). Peer reviewed
author2 National Institutes of Health (US)
National Human Genome Research Institute (US)
Ministry of Health and Welfare (South Korea)
Wellcome Trust
European Molecular Biology Laboratory
Howard Hughes Medical Institute
Rockefeller University
Robert and Rosabel Osborne Endowment
European Commission
National Library of Medicine (US) NLM
Korea Institute of Marine Science & Technology KIMST
Ministry of Oceans and Fisheries (South Korea)
Alfred P. Sloan Foundation
Max Planck Society
Maine Dept of Inland Fisheries and Wildlife
National Science Foundation (US)
University of Queensland
Science Exchange
Northeastern University (US)
Federal Ministry of Education and Research (Germany)
EMBO
National Key Research and Development Program (China)
Qatar Society of Al-Gannas (Algannas)
Katara Cultural Village
Government of Qatar
Monash University Malaysia
Hessen State Ministry of Higher Education, Research and the Arts
Ministry of Science, Research and Art Baden-Württemberg
Agency for Science, Technology and Research A*STAR (Singapore)
European Research Council
Ministerio de Ciencia, Innovación y Universidades (España)
Obra Social la Caixa
Generalitat de Catalunya
Irish Research Council
Danish National Research Foundation
Australian Research Council
format Article in Journal/Newspaper
author Rhie, Arang
McCarthy, Shane A.
Fedrigo, Olivier
Damas, Joana
Formenti, Giulio
Koren, Sergey
Uliano-Silva, Marcela
Chow, William
Fungtammasan, Arkarachai
Kim, Juwan
Lee, Chul
Ko, Byung June
Chaisson, Mark
Gedman, Gregory L.
Cantin, Lindsey J.
Thibaud-Nissen, Francoise
Haggerty, Leanne
Bista, Iliana
Smith, Michelle
Haase, Bettina
Mountcastle, Jacquelyn
Winkler, Sylke
Paez, Sadye
Howard, Jason
Vernes, Sonja C
Lama, Tanya M
Grutzner, Frank
Warren, Wesley C.
Balakrishnan, Christopher N.
Burt, Dave
George, Julia M.
Biegler, Matthew T.
Iorns, David
Digby, Andrew
Eason, Daryl
Robertson, Bruce
Edwards, Taylor
Wilkinson, Mark
Turner, George
Meyer, Axel
Kautt, Andreas F.
Franchini, Paolo
Detrich III, H. William
Svardal, Hannes
Wagner, Maximilian
Naylor, Gavin J. P.
Pippel, Martin
Malinsky, Milan
Mooney, Mark
Simbirsky, Maria
Hannigan, Brett T.
Pesout, Trevor
Houck, Marlys L.
Misuraca, Ann
Kingan, Sarah B.
Hall, Richard
Kronenberg, Zev
Sović, Ivan
Dunn, Christopher
Ning, Zemin
Hastie, Alex
Lee, Joyce
Selvaraj, Siddarth
Green, Richard E.
Putnam, Nicholas H.
Gut, Ivo
Ghurye, Jay
Garrison, Erik
Sims, Ying
Collins, Joanna
Pelan, Sarah
Torrance, James
Tracey, Alan
Wood, Jonathan
Dagnew, Robel E.
Guan, Dengfeng
London, Sarah E.
Clayton, David F.
Mello, Claudio V.
Friedrich, Samantha R.
Lovell, Peter V.
Osipova, Ekaterina
Al-Ajli, Farooq O.
Secomandi, Simona
Kim, Heebal
Theofanopoulou, Constantina
Hiller, Michael
Zhou, Yang
Harris, Robert S.
Makova, Kateryna D.
Medvedev, Paul
Hoffman, Jinna
Masterson, Patrick
Clark, Karen
Martin, Fergal
Howe, Kevin
Flicek, Paul
Walenz, Brian P.
Kwak, Woori
Clawson, Hiram
Diekhans, Mark
Nassar, Luis
Paten, Benedict
Kraus, Robert H. S.
Crawford, Andrew J.
Gilbert, M.Thomas P.
Zhang, Guojie
Venkatesh, Byrappa
Murphy, Robert W.
Koepfli, Klaus-Peter
Shapiro, Beth
Johnson, Warren E.
Di Palma, Federica
Marqués-Bonet, Tomàs
Teeling, Emma C.
Warnow, Tandy
Marshall Graves, Jennifer
Ryder, Oliver A.
Haussler, David
O’Brien, Stephen J.
Korlach, Jonas
Lewin, Harris A.
Howe, Kerstin
Myers, Eugene W.
Durbin, Richard
Phillippy, Adam M.
Jarvis, Erich D.
author_sort Rhie, Arang
title Towards complete and error-free genome assemblies of all vertebrate species
title_sort towards complete and error-free genome assemblies of all vertebrate species
publisher Springer Nature
publishDate 2021
url http://hdl.handle.net/10261/251905
https://doi.org/10.1038/s41586-021-03451-0
long_lat ENVELOPE(44.987,44.987,65.619,65.619)
ENVELOPE(67.717,67.717,-70.533,-70.533)
ENVELOPE(-84.767,-84.767,-78.617,-78.617)
geographic Canada
Katara
Loewe
Osborne
Pacific
Queensland
genre Icefish
Lynx
op_relation info:eu-repo/grantAgreement/EC/H2020/750747
info:eu-repo/grantAgreement/EC/H2020/864203
info:eu-repo/grantAgreement/AEI/Programa Estatal de Fomento de la Investigación Científica y Técnica de Excelencia/BFU2017-86471-P
info:eu-repo/grantAgreement/EC/FP7/311000
info:eu-repo/grantAgreement/EC/H2020/681396
Publisher's version
https://doi.org/10.1038/s41586-021-03451-0

Nature 592: 737-746 (2021)
0028-0836
CEX2018-000792-M
http://hdl.handle.net/10261/251905
1476-4687
op_rights openAccess
http://creativecommons.org/licenses/by/4.0/
op_rightsnorm CC-BY
op_doi https://doi.org/10.1038/s41586-021-03451-0
container_title Nature
container_volume 592
container_issue 7856
container_start_page 737
op_container_end_page 746
_version_ 1766032573441507328
spelling ftcsic:oai:digital.csic.es:10261/251905 2023-05-15T16:42:08+02:00 2021-04-29 application/pdf eng Sí artículo 2021 ftcsic 2021-10-12T23:35:56Z