Lichun Sun1,2*, Jun He2, Jing Luo3 and David H Coy1
Received:May 20, 2019; Published: May 29, 2019;
Corresponding author:Lichun Sun, Department of Medicine, School of Medicine, Tulane University Health Sciences Center, Sino US Innovative Bio Medical Center and Hunan Beautide Pharmaceuticals, Hunan, China
It is an explosive era of the big digital data with an exponential growth rate. The data produced in the couple years between 2015 and 2017 are more than all the preceded data produced in the entire human history. Nowadays, over 40 exabytes (EBs) of new data per day are produced all over the world. There were totally 33 zettabytes (ZBs, or 33000 EBs) of digital data worldwide in 2018. It is estimated that there will be almost 175 zettabytes (ZBs) of data in 2025. As the continuous increase of data, scientists realized that, to store these huge data is a mission impossible with current technologies and is becoming a server headache for people from academy, industry and almost all other fields in this digital universe. For instances, the data storage center built by IBM in 2011 just only has 120 petabytes (PBs) (about 0.1 EB) of the data-storing capacity. In 2013, Facebook built a large data storage center as well with only the capacity to store 1 EB of data. The current data storage spaces, technologies and approaches cannot meet the urgent requirement of storing these huge data.
Currently, most digital data are stored on traditional magnetic and optical media such as tapes, DVDs, HDD (hard disk drive), and USB flash drives [1-4]. These media have quick retrieval times, but with very limited data storage capacity. A CD may store several hundred megabytes (MBs) of data. A flash drive can maximally store less than one hundred gigabytes (GBs) of data, with an HDD holding couple terabytes (TBs) of data. Meanwhile, these media also lack long-term durability. They are also easily damaged, resulting in data loss . As the broad use of computers, smart phones, Internets and other electronic devices, we are surrounded in this digital data world. These digital data are stored on the silicon-based chips with the binary numeric system that uses only two digital numbers, or 0 and 1. As reported, the microchip-grade silicon is rare in nature and will be run out in 2040 . A new media or a new technology is needed to fix the coming data storage crisis. Due to its unique advantages , deoxyribonucleic acid (DNA) is expected to be the magic digital data storage medium to eventually solve data storage problem that have long plagued scientists worldwide.
To most of natural organisms, deoxyribonucleic acids (DNAs) are their genetic materials to pass their genetic characteristics from one generation to next generation and has double-stranded helical structures, with each strand having a backbone and a side chain. The former consists of the phosphate groups and the deoxyribose groups. The latter consists of four types of nitrogenous bases including adenine (A), cytosine (C), guanine (G), and thymine (T). Each double-stranded DNA replicate in a semi-conservative manner . Each of two parental strands serves as a template for the synthesis of new daughter DNA molecules. The complementary nucleotides are added to the daughter strand via pairing with the opposite of the bases on the parental strand under the base-pairing rules of A with T and G with C. The new double-stranded DNA molecule consisting of one parental strand and one daughter strand is the same as their parent DNA. Given that DNA is highly conserved with high density, high stability, high replication efficiency, and long-term durability [2,7,8], DNA was considered as the ideal data storage medium with high data fidelity and long-term storage ability. DNA can magically store one EB of data per cubic millimeter (mm3) or 215PBs per gram of DNA. As mentioned above, an entire huge data center built by Facebook in 2013 can only archive one EB of data. And indeed, for the first time, DNA had been demonstrated to be capable of storing digital data in 1988 . Data-storing DNA currently is a single-stranded type of synthetic nucleotide sequences . DNA data storage is the process that encodes and decodes binary data to and from the synthesized DNA sequences. For data encoding, the data information are converted to binary codes with 0 and 1, and then encoded to DNA nucleotide sequences, with the four bases (A, C, G, T) instead of 0 and 1, further synthesize and store the DNA sequences (Figure 1) [4-10]. For data decoding, the DNA sequences stored are amplified by PCR, sequenced, decoded, and eventually converted the sequencing information into the original digital data as required (Figure 1) [2,3,8].
Figure 1: The schematic process of DNA data storage. It is to encode and decode binary data to and from the synthesized DNA sequences. For data-encoding, A text string with 24 bytes (“DNA digital data storage”) is converted to binary bits that are subsequently encoded to an oligo DNA using a 1 bit per base, with purine (A, G) being assigned as 1s and pyrimidine (C, T) as 0s. The oligoDNA is chemically synthesized and the text contents are then stored as an oligo DNA fragment for the long-term storage. For data-decoding, the DNA fragment required will be amplified by PCR, sequenced and decoded to binary bits once, one day, we need to retrieve data in order to output binary data to be readable. Eventually, reading the data from DNA sequence library is to sequence the unique DNA molecules, convert the sequencing information into the original digital data as required [2,3,8]. This figure is cited from reference .
Figure 2: The completed end-to-end automated DNA data storage system that was made by scientists from Microsoft and University of Washington, including synthesis, storage and sequencing. It is the first fully automated DNA data storage system. This device can completely write, store and read DNA data information. It is cited from Dr. Takahashi .
However, the use of DNA as a data storage medium is still on the early stage before being commercially applied. We need to solve several challenges including high cost, low throughput, the limited access to data storage, short synthetic oligoDNA fragments, error rate in synthesis and sequencing [8-18]. For example, there is an error rate of about 1% per nucleotide in DNA synthesis and sequencing, with the occurrence of insertion, deletion, and substitution of nucleotides. Also, it is time-consuming for data to write into and retrieve from oligoDNA library. Access time for traditional media is much quicker. Access cost one millisecond for flash drive and several minutes for tape, but tens of hours for DNA data storage. Data random access is also a tough challenge in DNA data storage. In particular, the use of DNA in data storage is much more expensive than the traditional magnetic and optical media such as flash drive, hard disk drive [3-6]. This seriously hampered its commercial use. For example, to synthesize 2 MBs of data cost $7000, with another $2000 to retrieve it in 2017 . It costs approximately $10,000 to write and read a 5-byte word with the fully automated DNA data storage device developed by Microsoft and University of Washington . However, following with the advances in the development of DNA synthesis and sequencing technology, scientists can eventually go through these challenges and let the magic DNA become the perfect data storage medium. Currently, the handy and portable DNA sequencers are available . The automated DNA data storage device has been invented. Once matured and commercially applied, these technologies can further reduce the cost of DNA sequencing and simplify retrieval of DNA information. Thus, in the near future, DNA serving as a data storage medium will be a golden opportunity in this era of big data.
Bio chemistryUniversity of Texas Medical Branch, USA
Department of Criminal JusticeLiberty University, USA
Department of PsychiatryUniversity of Kentucky, USA
Department of MedicineGally International Biomedical Research & Consulting LLC, USA
Department of Urbanisation and AgriculturalMontreal university, USA
Oral & Maxillofacial PathologyNew York University, USA
Gastroenterology and HepatologyUniversity of Alabama, UK
Department of MedicineUniversities of Bradford, UK
OncologyCirculogene Theranostics, England
Radiation ChemistryNational University of Mexico, USA
Analytical ChemistryWentworth Institute of Technology, USA
Minimally Invasive SurgeryMercer University school of Medicine, USA
Pediatric DentistryUniversity of Athens , Greece
The annual scholar awards from Lupine Publishers honor a selected number Read More...
We know the financial complexity of Individual read more...
The annual scholar awards from Lupine Publishers honor a selected number read more...