After years of being left for dead, SQL today is making a comeback. How come? And what effect will this have on the data community?
Since the dawn of computing, we have been collecting exponentially growing amounts of data, constantly asking more from our data storage, processing, and analysis technology. In the past decade, this caused software developers to cast aside SQL as a relic that couldn’t scale with these growing data volumes, leading to the rise of NoSQL: MapReduce and Bigtable, Cassandra, MongoDB, and more.
In this post we examine why the pendulum today is swinging back to SQL, and what this means for the future of the data engineering and analysis community.
Part 1: A New Hope
To understand why SQL is making a comeback, let’s start with why it was designed in the first place.
Our story starts at IBM Research in the early 1970s, where the relational database was born. At that time, query languages relied on complex mathematical logic and notation. Two newly minted PhDs, Donald Chamberlin and Raymond Boyce, were impressed by the relational data model but saw that the query language would be a major bottleneck to adoption. They set out to design a new query language that would be (in their own words): “more accessible to users without formal training in mathematics or computer programming.”
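That design goal of accessibility is easy to see in a modern SQL query, which reads almost like the English sentence it encodes. As a minimal sketch using Python's built-in sqlite3 module (the table and data here are hypothetical, invented purely for illustration):

```python
import sqlite3

# In-memory database with a hypothetical example table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "Research", 120), ("Grace", "Research", 130), ("Edsger", "Theory", 110)],
)

# "Average salary per department, highest first" -- no formal
# mathematical notation required, just declarative English-like keywords.
rows = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department ORDER BY AVG(salary) DESC"
).fetchall()
print(rows)  # → [('Research', 125.0), ('Theory', 110.0)]
```

The reader states *what* they want; the database figures out *how* to compute it, which is exactly the separation Chamberlin and Boyce were after.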
IBM Research, working with Sony Storage Media Solutions, has announced a capacity breakthrough in tape storage. IBM was able to fit 201 Gb/in^2 (gigabits per square inch) of areal density on a prototype sputtered magnetic tape. This marks the fifth capacity record IBM has set since 2006.
The current buzz in storage typically goes to faster media, like those that leverage the NVMe interface. StorageReview is guilty of focusing on these new emerging technologies without spending much time on tape, mainly because tape is a fairly well known and not terribly exciting storage medium. However, tape remains the most secure, energy-efficient, and cost-effective solution for storing enormous amounts of backup and archival data. And the deluge of unstructured data now appearing everywhere will need to live on something with the capacity to store it.
This newly announced record represents 20 times the areal density of state-of-the-art commercial tape drives such as the IBM TS1155 enterprise tape drive. The technology allows 330TB of uncompressed data to be stored on a single tape cartridge. According to IBM, this is the equivalent of having the texts of 330 million books in the palm of one's hand.
Technologies used to hit this new density include:
Innovative signal-processing algorithms for the data channel, based on noise-predictive detection principles, which enable reliable operation at a linear density of 818,000 bits per inch with an ultra-narrow 48nm wide tunneling magneto-resistive (TMR) reader.
A set of advanced servo control technologies that, when combined, enable head positioning with an accuracy of better than 7 nanometers. Together with the 48nm wide TMR hard disk drive read head, this enables a track density of 246,200 tracks per inch, a 13-fold increase over a state-of-the-art TS1155 drive.
A novel low-friction tape head technology that permits the use of very smooth tape media.
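The headline 201 Gb/in^2 figure follows directly from the linear and track densities quoted above. A quick sanity check, using only the numbers in the announcement:

```python
# Numbers quoted in the IBM announcement above.
linear_density = 818_000   # bits per inch along the tape
track_density = 246_200    # tracks per inch across the tape

# Areal density is simply the product of the two.
areal_density = linear_density * track_density
print(f"{areal_density / 1e9:.1f} Gb/in^2")  # → 201.4 Gb/in^2
```

The product works out to about 201.4 gigabits per square inch, matching the record figure.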
This new technology extends a long list of tape storage innovations from IBM stretching back 60 years. Today's capacity is 165 million times that of their first tape product.
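Those two numbers together imply how small that first product was. Dividing the article's own figures:

```python
# Figures from the article: 330 TB per cartridge today,
# 165 million times the capacity of IBM's first tape product.
today_bytes = 330e12
growth_factor = 165e6

first_product_bytes = today_bytes / growth_factor
print(f"{first_product_bytes / 1e6:.0f} MB")  # → 2 MB
```

That is, IBM's first tape product held roughly 2 MB per reel, a number consistent with early-1950s technology.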
Looking back is more than nostalgia. It helps us see what has changed and what hasn’t, and where we might improve. 2016 has been a momentous year for storage. Here are the top stories.
LEGACY VENDORS AND THE CLOUD
The cloud has devastated the revenue, growth, and margins of legacy vendors. Any CFO can go online and compare the cost of similar capacity and performance, with higher availability, against the huge capital costs of traditional RAID arrays.
2016 saw the world’s largest independent storage company — EMC — bought by Dell, after shopping itself to all the big system vendors. The $60 billion price tag was excessive given the rapid obsolescence of much of EMC’s intellectual property, but a worthy capstone to CEO Joe Tucci’s brilliant leadership of the storage giant.
Tucci saw what many other CEOs denied, which is that the scale-out commodity-based storage systems and the internalization of storage have forever changed the storage industry. EMC needed a system partner to leverage their storage expertise, and Dell needed a robust enterprise sales force.
NetApp acquired SolidFire, a promising flash array vendor, finally getting into the highest-growth area of legacy storage. Plagued by years of flash misfires and infighting, NetApp has done well in the new market, but has had to lay off thousands of employees.
NetApp is touting their integration with Amazon Web Services — cloud — but that is a rear guard action as cloud vendors gobble up more enterprise dollars. Their next big problem: object storage systems are getting faster, offer much better data protection, are much more scalable, and more cost-effective than NetApp’s flagship NAS boxes. I hope their CEO, George Kurian, recognizes the threat and acts decisively in 2017.
LEGACY VENDORS AND THE UPSTARTS
Legacy vendors are getting squeezed between the cloud and aggressive storage startups. Companies like Nimble Storage, Nutanix, and Pure Storage offer modern architectures that leave the RAID paradigm in the dust. All three had successful IPOs, and now have the money to bring the fight to the legacy vendors.
Other startups have been acquired by legacy vendors to remake their product lines. DSSD, supported by Silicon Valley legend Andreas Bechtolsheim, was bought by EMC a couple of years ago. NetApp acquired SolidFire this year. HGST acquired Amplidata last year and is making a solid play for the active archive market. The storage startup scene continues to boil.
NVRAM is the Next Big Thing for servers and notebooks, as support from Intel and Microsoft shows. Some versions — there are around 10 — are almost as fast as DRAM, but use much less power and are much denser. Terabyte DIMMs, anyone? Big Data will especially benefit from high capacity NVRAM equipped servers.
2016 was supposed to be the year that Intel introduced their NVRAM 3D XPoint Optane drives, but like many ambitious engineering projects, they’ve slipped into 2017, and may be one reason the recent MacBook Pros were delayed. But Intel isn’t the only player, and certainly isn’t the first to market.
MRAM vendor Everspin IPO’d this year, raising funds needed to further enhance their NVRAM line. Nantero licensed their NVRAM to a couple of major fabs, putting their carbon nanotube technology on the fast track.
I’ve been a happy Thunderbolt 1 user for years. It’s a great technology that is fast, stable, and low-cost.
2016 saw it get even better: a single Thunderbolt 3 connector now supports 40 Gbit/s of bandwidth with half the power consumption of Thunderbolt 2. That’s enough bandwidth to drive dual 4k displays at 60 Hz, PCIe 3.0, HDMI 2.0, DisplayPort 1.2, as well as 10 Gbit/s USB 3.1. Plus up to 100 watts of power to charge systems and up to 15W for bus-powered devices.
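The dual-4K claim is easy to sanity-check against the 40 Gbit/s figure. A rough back-of-the-envelope estimate, assuming uncompressed 24-bit color and ignoring blanking intervals and protocol overhead (those simplifications are mine, not from the spec):

```python
# One 4K (3840x2160) display at 60 Hz, 24 bits per pixel, uncompressed.
width, height, refresh_hz, bits_per_pixel = 3840, 2160, 60, 24

one_display_gbps = width * height * refresh_hz * bits_per_pixel / 1e9
print(f"one 4K60 display:  {one_display_gbps:.1f} Gbit/s")      # ≈ 11.9 Gbit/s
print(f"two 4K60 displays: {2 * one_display_gbps:.1f} Gbit/s")  # ≈ 23.9 Gbit/s
# Both fit comfortably inside Thunderbolt 3's 40 Gbit/s.
```

Even two uncompressed 4K60 streams use only about 24 Gbit/s, leaving headroom for USB and PCIe traffic on the same connector.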
Using newly available and cheap PCIe switches, Thunderbolt 3 can be stretched to build large clusters at low prices. We’ll see more of that in 2017. On notebooks it offers performance and connectivity undreamed of 10 years ago. External drives with gigabyte per second performance are already here, with more on the way.
THE STORAGE BITS TAKE
I’ve been involved with storage for over 35 years, starting when a disk drive cost $40 a megabyte. For the last 15 years the industry has been on an innovation spree that has upended many companies and delivered incredible capabilities.
Storage is the basis of our digital world. Given the crisis of a post-fact world, I take comfort in the fact that a $100 billion plus industry is working hard to store and protect the data that is critical to the challenges humanity faces.