PCI Express-based next generation storage

25 September 2009

Fusion-io has developed patent-pending techniques to create NAND flash-based storage with reliability equal to disk-based storage. This article describes the advances Fusion-io introduced to achieve this.

The Fusion-io Drive Duo

The founders of Fusion-io identified several technologies advancing in parallel that, if married together, could solve the age-old problem of creating storage that keeps up with processing power that doubles every eighteen months. For many applications this performance mismatch hasn’t been an issue: single-user PCs, even with today’s huge software overheads, still perform more than satisfactorily. This isn’t the case, however, in the world’s large data centres, which are trying to keep up with the incessant demand put upon them by banks, insurance companies, city stock brokers, web servers and on-line shops, and ultimately with consumers’ constant requirement for more and more information. Until now the only solution available has been to incorporate massively parallel hard drives in Storage Area Networks (SANs), and a whole industry has grown up around these monster systems to provide the huge power and air conditioning needed to keep them running.

Performance
The breakthrough made by Fusion-io was to exploit the much faster random access times of today’s NAND Flash and, instead of building ‘me-too’ Solid State Drives (SSDs), to build a drive based on the PCI Express (PCIe) 2.0 bus standard found in today’s computing systems. This solution offers up to 20 Gigabits-per-second of raw throughput: multiple PCIe lanes link virtually directly to the Flash, which is controlled by an on-board programmed microcontroller, with the addition of a high-performance PCIe lane switch from PLX for the Fusion-io Drive Duo.

This removes the need for SCSI, SATA or SAS interfaces, as the drive now communicates directly with the processor. Latency is cut to 50 microseconds. The ‘Holy Grail’ of the storage industry, and of large databases in particular, is IOPS (random reads/writes per second). The SLC-based 160 GB Drive Duo provides over 200,000 IOPS with a read packet size of 4K. In terms of raw bandwidth, the 320 GB SLC-based drive delivers 1.4 Gb/s write and 1.5 Gb/s read (32K packet size).
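To put the IOPS and latency figures above side by side, here is a rough back-of-envelope sketch. It assumes that "4K" means 4,096 bytes and that the 50 microsecond latency still holds under full load; neither assumption is stated by Fusion-io, and the function names are purely illustrative.

```python
# Back-of-envelope check on the figures quoted above.
# Assumptions (not from the source): "4K" means 4,096 bytes, and the
# 50 microsecond latency still applies when the drive is fully loaded.

def bandwidth_mb_per_s(iops, block_bytes):
    """Data rate implied by an IOPS figure, in MB/s (1 MB = 10**6 bytes)."""
    return iops * block_bytes / 1e6


def requests_in_flight(iops, latency_s):
    """Little's law: average concurrency = throughput x latency."""
    return iops * latency_s


read_rate = bandwidth_mb_per_s(200_000, 4096)
concurrency = requests_in_flight(200_000, 50e-6)

print(f"200,000 IOPS at 4K is about {read_rate:.0f} MB/s")
print(f"Requests in flight needed at 50 us: about {concurrency:.0f}")
```

Under these assumptions, 200,000 IOPS corresponds to roughly 819 MB/s of 4K reads, and sustaining that rate at 50 microseconds per request needs about ten requests in flight at once. In other words, the headline IOPS figure depends on the application issuing I/O in parallel.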

The SAN comparison
Let’s compare a state-of-the-art SAN box with the Fusion-io Drive Duo. To get 25K IOPS you need a server and around 150 Enterprise-class Fibre Channel Hard Disk Drives, a high-end host bus adaptor and some fibre channel switches, all housed in at least a 6ft 19in cabinet. The Drive Duo would be installed in a single 2U or 4U rack mount server. Plugged directly into one of the PCIe x8 slots, it would be capable of six times the IO performance, use 1% of the power and could be expanded easily depending on available slots. Popular host systems are the ProLiant 580 and 780 series from HP or the 3860 series from IBM. Sun-based servers are currently in Beta testing. The conventional SAN will cost between £250,000 and £300,000 in the first year (initial cost and power consumption), whereas the largest Fusion-io Drive Duo, at 640 GB, would cost £5,936 plus the cost of the server.
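The comparison above can be reduced to a cost-per-IOPS figure. The sketch below uses only the numbers quoted in this section. It is not a like-for-like comparison, because the SAN figure is a first-year cost including power while the Drive Duo price excludes the host server; the variable names are illustrative.

```python
# Figures from the comparison above. The Drive Duo price excludes the host
# server, and the SAN figure is a first-year cost including power, so the
# two are not strictly like for like.

SAN_IOPS = 25_000
SAN_DRIVES = 150
SAN_COST_GBP = (250_000, 300_000)   # low and high first-year estimates

DUO_IOPS = 6 * SAN_IOPS             # "six times the IO performance"
DUO_COST_GBP = 5_936                # 640 GB Drive Duo, server not included

iops_per_hdd = SAN_IOPS / SAN_DRIVES
san_gbp_per_iops = tuple(cost / SAN_IOPS for cost in SAN_COST_GBP)
duo_gbp_per_iops = DUO_COST_GBP / DUO_IOPS

print(f"IOPS contributed per hard disk: {iops_per_hdd:.0f}")
print(f"SAN cost per IOPS: {san_gbp_per_iops[0]:.2f} to {san_gbp_per_iops[1]:.2f} GBP")
print(f"Drive Duo cost per IOPS (excluding server): {duo_gbp_per_iops:.3f} GBP")
```

On these figures each hard disk contributes about 167 IOPS, and the SAN works out at £10 to £12 per IOPS against roughly £0.04 for the Drive Duo before the server is added.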

Reliability
There is a common misconception that Flash-based technology is inherently unreliable and in the early days this was a fair observation. Today’s NAND Flash-based products are far more reliable than previous generations. And more importantly, in today’s climate, designers know how to efficiently manage Flash mortality.

In Fusion-io’s storage devices, NAND Flash chips are stacked several at a time (to increase density), operated in parallel (to increase throughput) and mounted on a PCB that plugs into a PCIe slot on the server or in the CPU. The Flash media is integrated with the controller onto a single PCIe card.

NAND Flash, as a storage medium, offers a number of benefits in comparison to HDD devices. NAND Flash has no moving parts and is therefore significantly less prone to shock or movement disturbance. It is a high-speed solution in terms of both latency and throughput. Temperature and humidity resistance mean that it can operate in a number of different environments. Finally, NAND Flash consumes significantly less power than HDD devices, particularly when you take into account secondary power requirements for device cooling.

Data integrity
Data integrity means having the confidence that what is put into a storage system is exactly what will come out when the data is requested, and is the most important function of a storage system. While data is being moved from a computer’s RAM or CPU to the Fusion-io device, several approaches are used to ensure its integrity. The CPU, chipset, and RAM use SECDED (Single Error Correct Double Error Detect) or chipkill (method for on-the-fly replacement of a failed chip) to ensure accuracy. Once data is written to the storage medium, it is checked again for accuracy.

When data is read from the storage medium, error correction techniques are again employed to ensure the data being retrieved is correct. The device can correct a substantial portion of the data being read. NAND’s reputation for unreliability is based on studies that show potential data loss when no error correction is used, or when less correction is used than that employed by the Fusion-io device. Using the methods described here, Fusion-io’s devices can achieve an error probability about four times better than the target. The devices also use a patent-pending approach when writing data that allows the data’s path to be reconstructed from information generated during the write process.
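As an illustration of the SECDED idea mentioned above, the sketch below implements a textbook extended Hamming (8,4) code: four data bits, three Hamming check bits and one overall parity bit. It is not Fusion-io’s or any chipset’s actual implementation (memory systems typically protect 64-bit words with 8 check bits), and the function names are hypothetical.

```python
# Textbook SECDED: extended Hamming (8,4). Illustrative only.
# Index 0 holds the overall parity bit; indices 1..7 form a Hamming(7,4)
# codeword with check bits at positions 1, 2 and 4.

def encode(nibble):
    """Encode a 4-bit value as a list of 8 bits."""
    d = [(nibble >> i) & 1 for i in range(4)]      # d[0] is the low bit
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]          # covers 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]          # covers 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]          # covers 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2                    # make total parity even
    return code


def decode(received):
    """Return (value, status); value is None if two bits were flipped."""
    code = list(received)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                        # XOR of set positions
    parity_odd = sum(code) % 2 == 1

    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # One flipped bit: the syndrome is its position (0 means the
        # overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return None, "double-error"

    value = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return value, status


word = encode(0b1011)
word[6] ^= 1                                       # inject a single-bit error
print(decode(word))
```

Any single flipped bit is located and repaired, while any two flipped bits are reported rather than silently miscorrected, which is the property the SECDED name describes.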

Data availability
Data availability means having confidence that data stored will not be lost, either while in transition to the storage device or after it has been written to the media. Generally speaking, NAND Flash is substantially more reliable than rotating magnetic media as it eliminates the chance of mechanical failure. There is, however, a chance of bad chips and chip wear-out. Fusion-io mitigates this risk using a variety of approaches.

Fusion-io’s redundant, patent-pending approach to writing data allows data to be rebuilt at very high speed, ensuring rapid data availability. Data is also regularly moved and checked for accuracy to ensure it does not deteriorate on the Flash chip. This also consolidates good data and reallocates space on the drive to ensure greater data availability. This system also spreads data evenly across the device, ensuring uniform wear across all NAND Flash chips.
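The article does not describe how Fusion-io spreads writes, so the following is only a generic sketch of one common wear-levelling policy: always reuse the free block that has been erased the fewest times. The class and method names are hypothetical.

```python
import heapq

# Generic "least-worn-first" wear levelling. This is an assumed policy for
# illustration, not Fusion-io's patent-pending method.

class WearLeveller:
    """Toy allocator that hands out the free block with the fewest erases."""

    def __init__(self, n_blocks):
        self.erase_counts = [0] * n_blocks
        # Min-heap of (erase_count, block_number) for all free blocks.
        self._free = [(0, block) for block in range(n_blocks)]
        heapq.heapify(self._free)

    def allocate(self):
        """Take the least-worn free block for the next write."""
        _, block = heapq.heappop(self._free)
        return block

    def release(self, block):
        """Erase a block and return it to the free pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self._free, (self.erase_counts[block], block))


leveller = WearLeveller(n_blocks=8)
for _ in range(800):                 # 800 write/erase cycles
    block = leveller.allocate()
    leveller.release(block)

print(leveller.erase_counts)
```

Because the least-worn block is always chosen next, the erase counts stay within one of each other however many cycles are run. A real controller would also have to relocate long-lived data, which this sketch ignores.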

Additionally, Fusion-io uses multiple Error Correction Code (ECC) techniques to identify and correct faulty data. Using ECCs, the device controller can correct up to 11 missing or incorrect bits out of every 240 bytes. One of the biggest benefits of ECC routines is that they allow the device to predict the likelihood of failure on individual chips. When a particular area of a chip has passed a set unreliability threshold, its data can be moved and that area taken out of service. The controller continues to identify and remove bad blocks, regions of chips or even entire chips, so that ordinary wear-out does not cause catastrophic failure but rather a very predictable decline.
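The 11-bits-in-240-bytes figure above gives a simple error budget, and the retirement rule can be sketched as a threshold on how many bits the ECC had to correct. The threshold of 8 bits below is a made-up value for illustration; the article does not say what threshold Fusion-io uses, and the function names are hypothetical.

```python
# ECC budget from the text: up to 11 correctable bits per 240 bytes.
CODEWORD_BYTES = 240
MAX_CORRECTABLE_BITS = 11

# Hypothetical retirement threshold (not from the source): retire an area
# once a read needs this many corrections, before it becomes unreadable.
RETIRE_THRESHOLD_BITS = 8


def correctable_bit_fraction():
    """Largest fraction of bits in a codeword that can be wrong and fixed."""
    return MAX_CORRECTABLE_BITS / (CODEWORD_BYTES * 8)


def classify(corrected_bits):
    """Decide what to do with an area given the bits corrected on a read."""
    if corrected_bits > MAX_CORRECTABLE_BITS:
        return "uncorrectable"
    if corrected_bits >= RETIRE_THRESHOLD_BITS:
        return "retire"          # move the data, take the area out of service
    return "healthy"


print(f"Correctable fraction: {correctable_bit_fraction():.2%}")
for bits in (0, 5, 8, 11, 12):
    print(bits, classify(bits))
```

The budget works out at about 0.57% of the bits in each 240-byte unit. Retiring an area while it is still well inside that budget is what turns wear-out into a gradual, predictable loss of capacity rather than a sudden loss of data.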

Managing long-term availability
NAND Flash wears out at a predictable rate, as shown in Table 1. Effective use of wear-levelling strategies can significantly improve the life expectancy of the drives. Note that the table covers both MLC and SLC NAND-based non-volatile memory technologies. Multi-Level Cell (MLC) NAND and Single-Level Cell (SLC) NAND serve two very different types of application, respectively: those requiring high performance at an attractive cost-per-bit, and those that are less cost-sensitive and seek even higher performance over time.

Table 1: Estimated lifetime by Flash type and write duty

Type / Write Duty             Average Estimated Lifetime
SLC flash @ 40% write duty    25 years
MLC flash @ 20% write duty    10 years
MLC flash @ 40% write duty    5 years
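The two MLC rows in Table 1 suggest that lifetime is inversely proportional to write duty: doubling the duty from 20% to 40% halves the lifetime from 10 years to 5. The sketch below extrapolates on that assumption. The model is inferred from the table, not stated by Fusion-io, and applying it to SLC rests on a single data point.

```python
# Reference points from Table 1: (write duty, estimated lifetime in years).
TABLE_1 = {
    "SLC": (0.40, 25),
    "MLC": (0.20, 10),
}


def estimated_lifetime_years(flash_type, write_duty):
    """Scale a Table 1 lifetime, assuming lifetime is proportional to 1/duty.

    The inverse-proportional model is an assumption inferred from the two
    MLC rows of the table.
    """
    ref_duty, ref_years = TABLE_1[flash_type]
    return ref_years * ref_duty / write_duty


# Reproduces the third row of the table (MLC at 40% write duty).
print(f"MLC @ 40%: {estimated_lifetime_years('MLC', 0.40):.1f} years")
# Extrapolations beyond the table.
print(f"MLC @ 10%: {estimated_lifetime_years('MLC', 0.10):.1f} years")
print(f"SLC @ 80%: {estimated_lifetime_years('SLC', 0.80):.1f} years")
```

The model reproduces the 5-year figure for MLC at 40% write duty, which gives some support to the assumption. Values outside the table’s range should be treated as rough guides only.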

Flashback protection
The primary objection to NAND Flash has been the reliability of the medium. Fusion-io has eliminated this barrier by inventing a self-healing technology for its controllers, known as ‘Flashback Protection’. This instantaneously restores, corrects and resurrects lost data in the Flash-based storage sub-system. Flashback Protection is accomplished by combining advanced bit error correction, proactive integrity monitoring of stored data and the recent addition of a dedicated chip to repair failed devices.

Green issues surrounding data centres
In 2006, U.S. data centres consumed an estimated 61 billion kilowatt-hours (kWh) of energy, about 1.5% of the total electricity consumed in the U.S. that year, up from 1.2% in 2005. That energy cost a total of $4.5 billion. The consumption is more than the electricity used by all colour televisions in the country and is equivalent to the electricity consumption of about 5.8 million average U.S. households.

Data centres’ cooling infrastructure accounts for about half of that electricity consumption. If current trends continue, by 2011 data centres will consume 100 billion kWh of energy at a total annual cost of $7.4 billion, and will necessitate the construction of 10 additional power plants.

Fusion-io Drives consume a minuscule amount of power. In fact, the ioDrive equals the performance of 600 parallel HDDs in a SAN (comprising the HDDs, the redundant power systems, redundant network equipment, HBAs, and more) but requires only the energy of one of those hard disk drives. A single ioDrive can therefore eliminate around 10,000 Watts of power.
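To give the 10,000 Watt figure a cost, the sketch below combines it with the 2006 U.S. energy figures quoted earlier in this section. It assumes the saving is a continuous draw for a whole year and that the implied 2006 average price per kWh applies; both are simplifications, and the variable names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365

# Implied average price per kWh from the 2006 figures quoted above:
# $4.5 billion for 61 billion kWh.
price_per_kwh = 4.5e9 / 61e9

# Assumption: the 10,000 W saving is drawn continuously all year.
saved_watts = 10_000
saved_kwh_per_year = saved_watts / 1000 * HOURS_PER_YEAR
saved_usd_per_year = saved_kwh_per_year * price_per_kwh

# Power attributed to each of the 600 replaced drives, including the
# share of SAN infrastructure counted in the 10,000 W figure.
watts_per_replaced_hdd = saved_watts / 600

print(f"Implied price: ${price_per_kwh:.3f} per kWh")
print(f"Energy saved: {saved_kwh_per_year:,.0f} kWh per year")
print(f"Cost saved: about ${saved_usd_per_year:,.0f} per year")
print(f"Power per replaced drive: {watts_per_replaced_hdd:.1f} W")
```

On these assumptions a single ioDrive saves 87,600 kWh, or roughly $6,500, per year.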

With this superior storage technology, not only can the performance and throughput of the data centre increase but IT managers can reduce the amount of memory installed in a server, and collapse entire storage tiers, thus dramatically reducing, by orders of magnitude, overall energy consumption.

Besides the obvious reduction in costs (including software licences), IT hardware power consumption was reduced by more than 90% for equivalent application performance. Processor utilisation dramatically improved resulting in an increase in the performance per Watt because the ioDrive was able to deliver data at a substantially faster rate. But more importantly, the ioDrive did so at a fraction of the power and without the large typical data centre infrastructure.

Now it is safe to exploit the performance gains and power savings offered by NAND Flash storage. The storage architecture pioneered by Fusion-io ensures predictable, controlled mitigation of early device failure, long-term device attrition and data changes due to external and data transport interference – issues that have up to now limited the adoption of NAND Flash-based storage at the enterprise level.

John Vaines is Managing Director of Diamond Point International Europe, Fusion-io’s UK partner.

