NCSA rolls out largest known SAN


An article from Enterprise IT Planet claims that NCSA (former working home of yours truly) has established the largest known Storage Area Network.

The huge filesystem, containing over 110TB of data (that's more than 110,000,000,000,000 bytes, or the equivalent of roughly 500 of the largest disks you can buy), is available to a cluster of 256 Linux-based systems at the Center.
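For a back-of-the-envelope check of those numbers, here's a quick sketch. It assumes decimal (SI) units, 1TB = 10^12 bytes, and a hypothetical ~220GB top-end drive; the drive size is my assumption, not a figure from the article.

```python
# Sanity check of the 110TB figure.
# Assumes decimal (SI) units: 1 TB = 10**12 bytes.
TB = 10**12
GB = 10**9

total_bytes = 110 * TB       # 110,000,000,000,000 bytes
largest_disk = 220 * GB      # hypothetical ~220GB top-end drive (my assumption)

disks_needed = total_bytes // largest_disk
print(f"{total_bytes:,} bytes is about {disks_needed} drives")
# → 110,000,000,000,000 bytes is about 500 drives
```

So "~500 of the largest disks you can buy" works out if the biggest drives of the day held on the order of 220GB each.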

Although the article claims that CPU-intensive operations had been NCSA's focus since its inception because I/O was so slow, I see it a bit differently.

If you ever wondered why NCSA was providing networking products such as NCSA Telnet (plug, plug) or NCSA Mosaic, the reason is that data storage, retrieval, and control were at the heart of what we were trying to get done there.

From the beginning, we were granted the biggest CPUs that you could find in research organizations short of Colorado's NCAR, the main weather research facility in the US at the time. However, we were also charged with handling simulations from researchers all over the world. For this, we needed ways to move large amounts of unprocessed and processed data back and forth between the researchers and the Center.

To that end, we implemented TCP/IP on our Cray X-MP under CTSS (the Cray Time Sharing System) and decided to deploy UNICOS (the Cray version of UNIX) when we installed our Cray-2. Both moves were designed to make moving data easier. Further, we had an array of large, very fast (and extremely expensive) disk drives at the site.

In those days, when most organizations ran small, slow LANs and 10Mbps WAN links were the norm, we had already implemented a sophisticated fiber-optic network (OK, it was sophisticated at the time) running at 100Mbps for communication between our supercomputers and the workstations and other computers on the network, with even faster links from certain post-processing systems to the Crays.

The move toward more accessible and faster shared storage, such as the systems being created at NCSA, is just the tip of the iceberg. As we all try to deal with the challenges of the enormous amount of available data, these kinds of efforts will continue to grow in importance. Perhaps one future direction for NCSA will be to take the SAN work and push it to a NAS (Network Attached Storage) or WANAS (Wide Area Network Attached Storage) model, which would allow the post-processing and visualization to be done in real time as the data comes off the cluster.