WILL DNA BE THE DATA STORAGE IN THE FUTURE?
Every day we create 463 exabytes of data. A single exabyte would fill
100,000 book-sized 10-terabyte hard drives, so one day's output comes to roughly
46 million of them, which is a mind-boggling number. Volumes like these pose a
long-term storage and archiving problem. But biotech may have a solution.
DNA is Data
In each of our cells lie the instructions to make a
whole human being. Those instructions are coded into our DNA molecules, long
strings of chemical nucleotide bases represented by the letters A, T, G and C.
When DNA sequencing allowed us to read these letters in the 1970s, a Japanese
group suggested that an advanced civilization could have left a message for us
in the genome of a common virus such as the bacteriophage phiX174. No such
message was found when the virus was sequenced, but the idea of using DNA to
encode messages stuck. Could DNA be the data storage of the future?
Computers and organic cells have a lot in common. In a
computer, information is encoded in strings of bits, 1s and 0s that, when read,
execute programs. In a cell, information is stored in the four nucleobase
letters, which produce proteins when read. Computer data is measured in bytes:
there are eight bits in a byte, 1,000 bytes in a kilobyte, and so on. Remember
how an exabyte fills shelf after shelf with book-sized hard drives? Now imagine
that each letter of DNA represented two bits of information, where A = 00,
T = 01, C = 10, and G = 11. Encoded this way, an exabyte of data could be
stored in just a cubic millimeter of DNA.
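To make the two-bits-per-letter idea concrete, here is a minimal Python sketch of that mapping. The function names `encode` and `decode` are made up for illustration, and real storage schemes add error handling and avoid troublesome sequences, so treat this as a toy rather than how any lab actually does it.

```python
# Mapping taken from the text: A = 00, T = 01, C = 10, G = 11.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "C", "11": "G"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, two bits per base (illustrative helper)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Turn a DNA string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"hi")
print(strand)          # TCCATCCT
print(decode(strand))  # b'hi'
```

Each byte becomes exactly four letters, so a two-character message needs only eight bases.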
Conversion of Nucleotides to bits
Professor George Church at Harvard took the DNA data
storage idea forward. In 2012, his team converted a 52,000-word book into
strings of DNA. They proved the principle that DNA could store data, but they
also discovered that their method limited the amount of information the DNA
could store. Four letters could in principle carry two bits each, but because
DNA can break and degrade, the theoretical limit is about 1.8 bits of data per
nucleotide. Church's group achieved less than half of this capacity with their
early method.
In 2017, Dr Yaniv Erlich and Dr Dina Zielinski of the
New York Genome Center made a breakthrough. Recognizing the limitations of DNA
synthesis, they converted six files into strings of binary code and developed
an algorithm called DNA Fountain to process the information for DNA coding.
DNA Fountain randomly separated the strings into "droplets", DNA strings 200
base pairs long. That is a reasonable length for DNA synthesis, which tends to
accrue errors in longer strands. The DNA strings were also flanked with tags to
help reassemble the fragments. The digital DNA strands, 72,000 in total, were
then sent to be synthesized.
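The tag-and-reassemble step can be sketched in a few lines of Python. To be clear, this is not the DNA Fountain algorithm itself, which relies on fountain codes and is considerably more involved. It only illustrates the idea of cutting a long sequence into short fragments, labelling each with an index tag, and putting them back in order later. The tag length, the helper names and the choice to put the tag at the front of each fragment are all assumptions made for the example.

```python
import random

BASES = "ATCG"  # digit order follows the article's mapping: A=0, T=1, C=2, G=3
TAG_LEN = 4     # hypothetical tag length: 4 bases can index 4**4 = 256 fragments


def index_to_tag(index: int) -> str:
    """Write a fragment index as a fixed-length base-4 number in DNA letters."""
    digits = []
    for _ in range(TAG_LEN):
        digits.append(BASES[index % 4])
        index //= 4
    return "".join(reversed(digits))


def tag_to_index(tag: str) -> int:
    """Read a DNA tag back as an integer index."""
    index = 0
    for base in tag:
        index = index * 4 + BASES.index(base)
    return index


def fragment(dna: str, payload_len: int) -> list[str]:
    """Cut a long strand into short pieces, each prefixed with its index tag."""
    chunks = [dna[i:i + payload_len] for i in range(0, len(dna), payload_len)]
    if len(chunks) > 4 ** TAG_LEN:
        raise ValueError("too many fragments for the chosen tag length")
    return [index_to_tag(i) + chunk for i, chunk in enumerate(chunks)]


def reassemble(fragments: list[str]) -> str:
    """Sort fragments by their tags and join the payloads back together."""
    ordered = sorted(fragments, key=lambda f: tag_to_index(f[:TAG_LEN]))
    return "".join(f[TAG_LEN:] for f in ordered)


message = "TCCATCCTGGATTACA"
pieces = fragment(message, payload_len=5)
random.shuffle(pieces)  # synthesized strands come back in no particular order
assert reassemble(pieces) == message
```

Because every piece carries its own position, the order in which the strands are sequenced does not matter.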
Twist Bioscience, a leading
large-scale DNA synthesis company, synthesized the DNA and sent the fragments
back two weeks later. Erlich and Zielinski then sent the DNA for sequencing,
and a computer program processed the code back into binary, using the tags as
a guide for reassembly. The result was perfect: the files came back without
errors. Erlich estimated that their approach encoded 1.6 bits of information
per nucleotide.
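To get a feel for what these bits-per-nucleotide figures mean, the short Python sketch below works out how many nucleotides a file would need at different densities. The one-megabyte file size is a made-up example, and the helper name `nucleotides_needed` is hypothetical. The three rates are the ideal 2 bits, the 1.8-bit limit and the 1.6 bits reported above.

```python
from fractions import Fraction
import math


def nucleotides_needed(num_bytes: int, bits_per_nt: str) -> int:
    """Nucleotides required to hold num_bytes at a given density.

    The rate is passed as a string so Fraction can parse it exactly,
    which avoids floating-point rounding surprises.
    """
    rate = Fraction(bits_per_nt)
    return math.ceil(num_bytes * 8 / rate)


file_size = 1_000_000  # hypothetical 1 MB file, not a figure from the study
for rate in ("2", "1.8", "1.6"):
    print(rate, "bits/nt ->", nucleotides_needed(file_size, rate), "nucleotides")
```

At 1.6 bits per nucleotide the example file needs 5,000,000 nucleotides, compared with 4,000,000 at a perfect 2 bits per letter.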
Industrial Approach to Convert Data into DNA
Converting data into DNA requires a lot of DNA, the
synthesis of which is traditionally neither easy nor cheap. Twist Bioscience
developed a scaled-up approach to DNA synthesis that is better suited to meeting
the demand for DNA data storage. In 2016, Microsoft and Twist partnered to set a
record by storing 200 MB of data in DNA. More recently, Microsoft and the
University of Washington demonstrated a completely automated system to store
and retrieve DNA data, in this case the word 'hello', bringing the technology a
step closer to its application in data centers.
These approaches use DNA bases to store information in
strings, like strings of bits in a computer. However, this is still
prohibitively expensive at current DNA synthesis costs. DNA is not infallible
either. If a base is missed, either in assembly or in reading the DNA strand,
the data can become corrupted. If the technology is to be developed for
reading and writing information as easily as computers do, then these issues
will need to be addressed. Fortunately, because DNA is a natural data storage
system for our genetic blueprint, nature has evolved a range of protective
measures to keep our DNA in order, and these have inspired a new approach to
DNA data storage.
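Here is a small Python sketch of why one missing base is so damaging. It assumes the simple two-bits-per-letter mapping (A = 00, T = 01, C = 10, G = 11) with no error correction at all, which is not how the real systems work. The `decode` helper is hypothetical.

```python
BASE_TO_BITS = {"A": "00", "T": "01", "C": "10", "G": "11"}


def decode(dna: str) -> bytes:
    """Naive decoder: two bits per base, no error detection or correction."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    usable = len(bits) - len(bits) % 8  # drop trailing bits that do not fill a byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


intact = "TCCATCCT"                # encodes the two characters "hi"
damaged = intact[:1] + intact[2:]  # the second base is lost

print(decode(intact))   # b'hi'
print(decode(damaged))  # b'a'
```

Losing a single letter shifts every bit after it, so the first character changes and the second can no longer be read at all. This is why practical schemes add redundancy and tags.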
In 2019, DNA data storage company Catalog smashed
Microsoft’s record in DNA data storage by coding all of Wikipedia in English
into DNA. That’s 16 gigabytes of data. They did this by taking a completely
different look at how DNA could store data. Rather than coding each letter as a
combination of two bits of data, Catalog code several DNA letters in
We won’t have DNA-based computers just yet. The key to
improving the technology is bringing down the cost of DNA synthesis and reading
through automation. Writing and reading DNA is currently a slow process, but it
may yet be useful for archiving data and making long-lasting backups. DNA is
structurally suited to storing information for extended periods of time, given
its half-life of 521 years. Perhaps we can build a DNA time capsule containing
all of humanity’s current knowledge and blast it into space or bury it on Mars
for future generations to find.
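For a rough sense of what a 521-year half-life means for a time capsule, the Python sketch below applies the standard half-life formula. It assumes plain exponential decay and takes the 521-year figure at face value. Actual survival depends heavily on temperature and storage conditions, so the numbers are only indicative. The helper name `fraction_intact` is made up.

```python
HALF_LIFE_YEARS = 521  # figure quoted in the article


def fraction_intact(years: float) -> float:
    """Fraction of the original DNA still intact after a number of years,
    assuming simple exponential decay (an idealization)."""
    return 0.5 ** (years / HALF_LIFE_YEARS)


for years in (100, 521, 1000, 10_000):
    print(f"after {years} years: {fraction_intact(years):.6f} of the DNA remains")
```

After one half-life exactly half remains, and after two half-lives a quarter, so a capsule meant to last thousands of years would need many redundant copies or very cold, dry storage.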