WOS FAQs

Common Questions and Answers About the WOS

Overview

What is Web Object Scaler (WOS)?

WOS is a new Extreme Storage product from DataDirect Networks designed to solve the performance and scalability challenges of managing large amounts of file-based data, serving it quickly and with low latency, and distributing it among multiple data centers. WOS is a globally clustered cloud storage system that uses advanced object storage technology.

Some of the use cases for WOS are:

  • Web content storage and distribution (websites, corporate intranets, social networks, photo sharing sites)
  • Picture Archiving and Communication Systems (PACS) medical imaging storage
  • Document management systems storage (check imaging for banks, PDF statements & bills for financial institutions, energy companies, telecom carriers)
  • Online Game Developers (user profile & state file storage)
  • Geospatial Information Services (map images)

What problems does WOS solve?

WOS has been designed to solve the following problems:

  • It eliminates the need to have multiple storage systems and multiple file systems. WOS Clouds deliver Internet-scale performance and store hundreds of billions of files in a single global object repository. Forget file systems that max out at 16TB or even at Petabytes. Forget running out of inodes. Forget about tweaking your file systems to get better performance. You'll never have to engineer around file system limitations with WOS.
  • It eliminates the need to buy large storage systems up-front to plan for capacity growth. WOS Clouds can start small and grow to Internet scale over time, allowing you to dynamically add capacity as you need it, as well as where you need it.
  • It stores files without wasting capacity. If you tend to store lots of small files such as thumbnail photos, WOS will not over-allocate space like most file systems. You'll never have to deal with "stranded" capacity and will always get the most out of the capacity you purchase.
  • Content delivery is fast and efficient. WOS technology makes it possible to retrieve files from anywhere in the Cloud using only a single disk operation. Object-based WOS Clouds deliver the maximum number of files without bogging down in traditional operations like metadata lookups, extent list fetches, and RAID operations. You'll be truly amazed at how WOS handles massive numbers of file operations.
  • It uses policy-based replication to distribute content where you want it. A single WOS Cloud can have nodes that are geographically dispersed, but operate and are managed as a cohesive system. Do you need a file accessible quickly from Los Angeles, New York, and Tokyo? A WOS policy can distribute the content within the Cloud to these locations and the Cloud will serve the data locally -- all using common identifiers. Never again worry about how to replicate, distribute, and synchronize your data -- WOS does it for you.
  • It is a completely distributed system with no bottlenecks or single points of failure. Anything can fail in WOS -- a disk drive, a storage node, a network connection -- anything, and data remains online and accessible. WOS automatically heals around the problem. You'll never need to think about how to keep your data available again.
  • It is easy to manage from a single web-based GUI. If you work with multiple storage systems, especially spread across multiple data centers, WOS will change your life. No matter how large a WOS Cloud grows, no matter how geographically dispersed it is, you always manage the Cloud as a single entity. No more logging into multiple systems. No more separate management of block-level devices and file systems. No more complex monitoring solutions. WOS is completely integrated and can be set up in minutes.

Is WOS right for me?

Does this sound like you?

  • I need to store millions to hundreds of billions of files
  • My storage is accessed machine-to-machine through our applications
  • I have a database or other mechanism for tracking file names and paths
  • I want to use one big file system but can't because no single file system scales big enough
  • My file system runs out of inodes
  • I have to manage how many files I put into directories and how many directories I put into a file system to get the best performance
  • I've named all my files 1,2,3,4,5... and all my folders 1,2,3,4,5... so I can hash them to facilitate faster lookups
  • I've had to tweak my file systems to perform better
  • I need to replicate content to more than one data center across geographies
  • I have thumbnail images or other small files that wreak havoc on my file and storage systems
  • I am constantly tweaking and engineering around performance and scalability limits
  • I've looked at everything on the market and no storage system delivers enough IOPS to serve my content
  • I have to spend time load balancing my storage environment
  • I feel like I'm spending too much on a CDN, but I have to because I can't origin serve everything
  • I want a single, simple way to manage all this data

If you can relate to even a few of these bullets, then WOS can help. Let's talk -- 1-800-TERABYTE.

How does WOS work?

WOS Clouds intelligently automate the tedious tasks associated with traditional storage and file systems. WOS Clouds are made up of multiple intelligent storage nodes that communicate with each other over standard Ethernet/Internet Protocol (IP) networks. New objects are automatically load balanced across available nodes, and the WOS Cloud automatically rebalances when new nodes are added. As objects are stored, WOS ensures that they are replicated according to their policy settings. When objects are retrieved, WOS serves them locally if possible, providing the best file access and load times. If content is unavailable locally, WOS automatically retrieves it from a remote node in the Cloud. WOS uses a simple API to interface with your applications, providing functions such as GET a file, PUT a file, and DELETE a file. WOS is easy to use whether you have a small system or a global network.
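To make the PUT/GET/DELETE flow described above concrete, here is a toy in-memory model in Python. It is not WOS-LIB: the class and method names (`ToyCloud`, `add_policy`, `put`, `get`, `delete`) are hypothetical, a "policy" is reduced to a list of zones that each receive one replica, and load balancing simply picks the node in each zone holding the fewest objects.

```python
import secrets


class ToyCloud:
    """Hypothetical in-memory stand-in for a WOS Cloud; not the real WOS-LIB API."""

    def __init__(self, zones):
        # zones: {"zone name": ["node name", ...]}
        self.zones = zones
        self.nodes = {node: {} for names in zones.values() for node in names}
        self.policies = {}

    def add_policy(self, name, zone_names):
        # In this sketch a policy is just the list of zones that each get one replica.
        self.policies[name] = list(zone_names)

    def _least_loaded(self, zone):
        # Naive load balancing: the node in the zone holding the fewest objects wins.
        return min(self.zones[zone], key=lambda node: len(self.nodes[node]))

    def put(self, data, policy):
        oid = secrets.token_hex(16)  # random 128-bit identifier stands in for a real OID
        for zone in self.policies[policy]:
            self.nodes[self._least_loaded(zone)][oid] = data
        return oid

    def get(self, oid):
        # A real client tracks where objects live; the toy just scans every node.
        for objects in self.nodes.values():
            if oid in objects:
                return objects[oid]
        raise KeyError(oid)

    def delete(self, oid):
        # Remove every replica of the object.
        for objects in self.nodes.values():
            objects.pop(oid, None)

    def replica_count(self, oid):
        return sum(oid in objects for objects in self.nodes.values())


cloud = ToyCloud({"LA": ["la1", "la2"], "NY": ["ny1"]})
cloud.add_policy("LA+NY", ["LA", "NY"])
oid = cloud.put(b"thumbnail bytes", "LA+NY")
print(cloud.get(oid), cloud.replica_count(oid))
```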

Is WOS Content Addressable Storage (CAS)?

Yes and no, depending on your definition of CAS. Like CAS systems, WOS uses a globally unique object identifier and an API. But from there it is quite different. In contrast to CAS systems, which are intended for compliance and archiving, WOS is designed for global content delivery. As such, it does many things a CAS system doesn't do, such as policy-based multi-site replication, scaling to extreme capacity and performance levels, and optimizing delivery of content for low latency.

How is WOS different from traditional filers and file systems using block storage?

Traditional storage systems "live" in one location. Even if multiple sites are connected through a SAN, through replication software, or through wide area file services (WAFS), the storage and file systems in each location are separately managed. Coordination and synchronization of content between sites is difficult and tedious, if not impossible.

In contrast, WOS is a distributed system managed as a single entity. With WOS it is easy to make multiple copies of content and keep them synchronized. A WOS Cloud can be geographically dispersed, yet managed from a single easy-to-use Web interface. With WOS you'll never have to think about RAID groups, LUNs, consistency groups, or other block-level storage semantics. You'll never have to think about file systems, directories, paths, inodes, or other file-level storage semantics. In short, you can focus on what's really important to you -- growing your business.

How can WOS reduce my total cost of ownership (TCO)?

WOS customers reduce their TCO and achieve high returns on investment in the following ways:

  • They no longer spend time and resources engineering around limitations in their storage environment. They quickly and easily deploy WOS and focus their engineering resources on building their business.
  • WOS Clouds provide several times the performance of traditional storage systems, reducing the amount of equipment to purchase. A small WOS Cloud replaces multiple RAIDs, file servers, switches, and other infrastructure.
  • There is a single management interface for WOS, even if the Cloud is geographically distributed. Setting up and administering WOS is easy and makes it possible for a single part-time administrator with no storage or SAN expertise to manage a multi-Petabyte global content repository.
  • WOS nodes are very dense and power-efficient, so they use less data center rack space and power.
  • WOS helps reduce or eliminate the need for a CDN, saving lots of money. CDNs are used for two reasons -- to distribute content close to the network edge, and to handle high hit rates so that origin storage systems are not overloaded. WOS customers often find that they save substantial amounts of money by building out WOS Clouds across a few new hosted data center sites instead of relying on a CDN. WOS Clouds are typically fast enough to origin-serve all their traffic, eliminating the need for a CDN to achieve the required delivery rates.

WOS Concepts

What is an object?

An object is a group of elements that are managed together by WOS. At the most basic level an object is a file, the user-defined metadata assigned to the file, and the policy (see policy definition below) to which the file should adhere. However, an object can be more than this. It can be a group of related files that need to be stored and accessed together. Think of this formula: Object = file(s) + metadata + policy.
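As a sketch of the Object = file(s) + metadata + policy formula, the hypothetical dataclass below groups related files with user-defined key-value metadata and a single policy name. The field names are illustrative assumptions, not WOS's internal object format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WosObject:
    """Hypothetical model of Object = file(s) + metadata + policy (field names are illustrative)."""

    files: Dict[str, bytes]  # one or more related files stored and retrieved together
    policy: str  # the single policy this object adheres to
    metadata: Dict[str, str] = field(default_factory=dict)  # user-defined key-value pairs

    def size(self) -> int:
        # Total payload across every file in the object.
        return sum(len(data) for data in self.files.values())


photo = WosObject(
    files={"original.jpg": b"\xff" * 1000, "thumb.jpg": b"\xff" * 50},
    policy="LA+NY",
    metadata={"filename": "beach.jpg", "album": "vacation"},
)
print(photo.size())  # 1050
```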

What is an Object ID (OID)?

An OID is a globally unique identifier that WOS assigns to an object when storing it. Even if the object is replicated multiple times, a single OID is used to retrieve it. Think of the OID like the ticket a valet gives you when he takes your car. The ticket contains a unique number that the valet will match when you want your car back. To retrieve an object from WOS, the OID is provided and WOS figures out where the object is stored and delivers it.

Where are OIDs stored?

It's up to you. Most WOS customers run a database that tracks the organization of their content. For example, the database determines which images and text clips will be loaded onto a particular web page when a user requests it. The OIDs for these images and text clips reside in the database so that when the page is requested, the web application knows which OIDs must be retrieved in order to build the page. This is just an example. Some customers store OIDs within their applications, or distribute them. The OID simply replaces the file path you are currently storing.
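A minimal sketch of the database pattern above, using Python's built-in `sqlite3`: the table that once held file paths now holds OIDs. The table name, columns, and placeholder OID strings are hypothetical.

```python
import sqlite3

# Hypothetical schema: the application's database stores OIDs where it used to store paths.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page_assets (page TEXT, slot INTEGER, oid TEXT)")
db.executemany(
    "INSERT INTO page_assets VALUES (?, ?, ?)",
    [
        ("/home", 2, "oid-welcome-text"),
        ("/home", 1, "oid-header-image"),
        ("/about", 1, "oid-team-photo"),
    ],
)


def oids_for_page(page):
    """Return the OIDs the web application must fetch to build a page, in layout order."""
    rows = db.execute("SELECT oid FROM page_assets WHERE page = ? ORDER BY slot", (page,))
    return [oid for (oid,) in rows]


print(oids_for_page("/home"))
```

Each returned OID would then be handed to the Cloud for retrieval, exactly where a file path would previously have been opened.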

What is a Node?

A node is the basic storage element of a WOS Cloud. Nodes are integrated appliances supplied by DataDirect Networks that contain storage for customer content and computing resources that run the WOS Operating System. There are different types and configurations of nodes that deliver various levels of capacity and performance to suit customer needs. Different node types can exist within the same WOS Cloud in any combination.

What is an Enclosure?

An Enclosure is a common physical chassis housing one or more nodes. There are currently two types of WOS Enclosures:

  • 3U/16 drives -- single node
  • 4U/60 drives -- two nodes

What types of WOS enclosures are available?

Currently the following enclosure types are available:

  • 4U, dual-node, 30TB and 1 billion objects per node (2 billion total objects, 60TB total storage) -- 30KB average object size -- optimized for large scale WOS Clouds with lots of data
  • 4U, dual-node, 15TB and 1 billion objects per node (2 billion total objects, 30TB total storage) -- 15KB average object size -- optimized for medium scale and cost effectiveness
  • 4U, dual-node, 13.5TB and 1 billion objects per node (2 billion total objects, 27TB total storage) -- 13.5KB average object size -- optimized for large scale, high-performance delivery of small files.
  • 3U, single-node, 16TB and 1 billion objects per node -- 16KB average object size -- optimized for entry-level clusters.
  • 3U, single-node, 7.2TB and 1 billion objects per node -- 7.2KB average object size -- optimized for high object read and write rates. Ideal for lots of small files/thumbnail image storage.

Remember, different enclosure types may be mixed within the same WOS Cloud. So you can start with smaller 3U, 16TB enclosures (for example) and expand the Cloud later with 4U, 60TB enclosures.
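The average object size quoted for each node type is simply per-node capacity divided by the one-billion-object limit. The sketch below (decimal units assumed) reproduces those figures and, for a hypothetical workload, shows which limit a node reaches first: objects larger than the design point fill the disks before the object count is exhausted, and smaller ones exhaust the object count first.

```python
# Per-node figures from the list above, in decimal units (1 TB = 10**12 bytes).
NODE_TYPES = {
    "4U dual-node, 30TB": (30e12, 1e9),
    "4U dual-node, 15TB": (15e12, 1e9),
    "4U dual-node, 13.5TB": (13.5e12, 1e9),
    "3U single-node, 16TB": (16e12, 1e9),
    "3U single-node, 7.2TB": (7.2e12, 1e9),
}


def design_point_kb(capacity_bytes, max_objects):
    """Average object size (KB) at which capacity and object count run out together."""
    return capacity_bytes / max_objects / 1e3


def limiting_factor(capacity_bytes, max_objects, avg_object_bytes):
    """Which limit a node reaches first for a workload with the given average object size."""
    if avg_object_bytes * max_objects > capacity_bytes:
        return "capacity"
    return "object count"


for name, (capacity, objects) in NODE_TYPES.items():
    print(f"{name}: {design_point_kb(capacity, objects):.1f} KB per object")

# Hypothetical workload: 50 KB photos on a 30TB node fill the disks first.
print(limiting_factor(30e12, 1e9, 50e3))  # capacity
```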

What is a Zone?

A Zone is a collection of nodes in the WOS Cloud that are logically referenced by WOS policies (see below). Zones are most commonly used for two purposes:

  • To define geographically distinct portions of the Cloud. For example, a zone in Los Angeles and a zone in New York.
  • To define functional portions of the Cloud. For example, a zone designed for high performance or a zone designed for archiving content.

What is a Cloud?

A Cloud consists of all Nodes in all Zones managed together as a single entity.

What is a Policy?

A policy is a user-defined set of rules that determine how content should be distributed throughout the WOS Cloud. Policies are referenced at the time an object is stored into the Cloud. For example, a policy can state that two copies of objects be made, one in the WOS zone in Los Angeles, and the other in the WOS zone in New York. Multiple policies can be created in each WOS Cloud, but each object may only adhere to a single policy at a time.

What is "Policy Compliance"?

Policy compliance means that the objects stored in WOS all adhere to the rules mandated by their policies. Normally this is the case. However, in the event of drive, node, or enclosure failures, the replicas stored on the failed hardware go offline, taking those objects out of policy compliance. WOS automatically recognizes this state and creates new copies of the missing replicas to bring the Cloud back into policy compliance. Similarly, changing a policy (for example, deciding that in addition to Los Angeles and New York, objects should also reside in a WOS zone in London) will trigger automatic replication of objects with that policy to the London zone, so that the Cloud remains in policy compliance.
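Bringing an object back into compliance amounts to comparing where its online replicas are against where its policy wants them. The function below is a hypothetical sketch of that comparison, modeling a policy as a count of replicas per zone; it covers both cases above, a failure and a policy change.

```python
from collections import Counter


def compliance_actions(policy, replica_zones):
    """Work out what it takes to bring one object back into policy compliance.

    policy: {zone: number of replicas required there}, e.g. {"LA": 1, "NY": 1}
    replica_zones: zone of each replica that is currently online
    Returns (replicas to create per zone, surplus replicas per zone).
    """
    want = Counter(policy)
    have = Counter(replica_zones)
    missing = want - have  # Counter subtraction keeps only positive counts
    surplus = have - want
    return dict(missing), dict(surplus)


# A New York node failed, taking that replica offline:
print(compliance_actions({"LA": 1, "NY": 1}, ["LA"]))  # ({'NY': 1}, {})
# The policy was changed to add London:
print(compliance_actions({"LA": 1, "NY": 1, "London": 1}, ["LA", "NY"]))  # ({'London': 1}, {})
```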

What is WOS-LIB?

WOS-LIB (the WOS Library) is the API code DataDirect Networks provides that runs on any system that will directly access the WOS Cloud. WOS-LIB communicates with the Cloud and maintains knowledge of where content is stored within it. When content is requested, WOS-LIB automatically retrieves it from the correct WOS node. There is no ping-ponging between nodes to find content and no need for you to track where objects are stored within the Cloud. The process is seamless and automated. WOS-LIB can be installed on multiple machines, enabling parallel access to the WOS Cloud.

Can content in WOS be directly accessed from the Internet (or only from private network)?

A WOS Cloud can be accessed from any place with Internet Protocol (IP) connectivity. If your network is set up so that the WOS nodes are outside your firewall, it is possible to access them directly. Any server accessing the Cloud must run WOS-LIB.

How do I name a file in WOS?

With WOS, you don't name files. When files are created, WOS will create a globally unique Object Identifier (OID) for that object. This OID is in effect the file name. When you want to access the object, the OID is provided to the WOS Cloud and the object is retrieved.

How do I modify a file in WOS?

WOS supports PUT (write a file), GET (read a file), and DELETE commands. To modify a file, first read it from the Cloud. Make the necessary changes and then write it back into the Cloud. A new OID will be assigned. The old version of the file will still exist under the original OID. This object can be maintained as an object-level snapshot, or a delete command can be issued to remove it.
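The read-modify-write cycle can be sketched with a plain dictionary standing in for the Cloud. The `put`/`get`/`delete` helpers and `modify` are hypothetical names, not WOS-LIB calls; the point is that an edit yields a new OID while the old one either survives as a snapshot or is deleted.

```python
import secrets

store = {}  # hypothetical stand-in for the Cloud: OID -> object bytes


def put(data):
    oid = secrets.token_hex(16)  # every PUT produces a brand-new OID
    store[oid] = data
    return oid


def get(oid):
    return store[oid]


def delete(oid):
    del store[oid]


def modify(oid, change, keep_old=False):
    """Read-modify-write: stored objects are never edited in place."""
    new_oid = put(change(get(oid)))
    if not keep_old:
        delete(oid)  # with keep_old=True the old version stays as an object-level snapshot
    return new_oid


v1 = put(b"draft")
v2 = modify(v1, bytes.upper, keep_old=True)
print(get(v1), get(v2))  # b'draft' b'DRAFT'
```

Since the new version lives under a new OID, whatever database or application recorded `v1` would need to be updated to `v2`.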

If an object exists in multiple zones, where will WOS retrieve it from?

WOS intelligently tracks performance in the Cloud using parameters such as load and network latency and will first attempt to retrieve objects from the zone that can provide the lowest latency path, which is generally the zone physically nearest to the requesting server. Alternatively, WOS administrators can configure WOS-LIB to prefer retrieving objects from a particular zone. In both cases, WOS will retrieve objects from other available locations if the primary or preferred zone is unavailable or if the object doesn't exist in that zone. This behavior helps reduce WAN costs, since content only traverses the WAN when necessary.
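The retrieval order can be sketched as a small ranking function. This is a simplification under stated assumptions: it ranks only by measured latency (the real system also weighs load), the latency figures are hypothetical inputs, and an administrator's preferred zone, if it holds the object and is reachable, always goes first.

```python
def read_order(replica_zones, latency_ms, preferred=None, available=None):
    """Order in which to try the zones holding a replica of one object.

    replica_zones: zones that hold a copy of the object
    latency_ms: measured latency from this client to each zone (hypothetical input)
    preferred: optional administrator-configured preferred zone
    available: zones currently reachable; defaults to every zone with a latency figure
    """
    available = set(latency_ms) if available is None else set(available)
    candidates = [zone for zone in replica_zones if zone in available]
    candidates.sort(key=lambda zone: latency_ms.get(zone, float("inf")))
    if preferred in candidates:
        candidates.remove(preferred)
        candidates.insert(0, preferred)
    return candidates


latency = {"LA": 2, "NY": 70, "Tokyo": 110}
print(read_order(["NY", "LA", "Tokyo"], latency))
```

A client would try the zones in the returned order, so a remote copy is fetched across the WAN only when no nearer one can serve the request.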

Can I store metadata with my files?

Yes. WOS supports user-defined metadata in the form of key-value pairs. These fields can be used to store information such as file names, search index criteria, file creation dates, or anything else you'd like.

What metadata does WOS keep about objects?

WOS stores the object's policy, its size, and a signature used to detect corrupt data. Additionally, WOS supports user-defined metadata fields, which can be used for any purpose.

Setup and Management

How difficult is WOS to set up?

Just as WOS provides Extreme Performance and Extreme Scalability, it delivers Extreme Ease-of-Use. Initial WOS setup involves assigning an IP address to each of the nodes and putting them onto the network. Then, simply log into the WOS Web-based Administrator User Interface and complete basic configuration tasks such as establishing Zones and Policies. That's it. You can get a WOS Cloud up and running in minutes.

How does an application integrate with WOS?

Applications communicate with WOS through the WOS Library (WOS-LIB) API. Most customers choose the C++ API because it provides a high-performance, callback-style interface. If performance is a secondary concern, the API is also available for Java and Python. Versions for other languages can be developed based on customer demand. DataDirect Networks supplies the C++ API libraries, which are linked with the customer application. The customer's application uses WOS API calls such as GET, PUT, and DELETE to utilize the WOS Cloud. WOS-LIB must run on any machine that directly accesses the Cloud.

How do I administer WOS?

WOS Clouds are administered through a Web interface from any standard web browser. The Cloud is managed as a single entity -- there is no need to individually log into or manage each storage node. The Web interface makes it easy to expand or upgrade the Cloud, monitor performance and available capacity, review and resolve alerts and error conditions, and set up and modify Zones and Policies.

How do I configure RAID levels in WOS?

You don't need to. WOS automatically handles data protection based on the policies you set up. You don't need to configure RAID groups or RAID levels. With WOS, traditional storage administration tasks are rendered obsolete.

How do I configure LUNs in WOS?

You don't need to. WOS automatically allocates available storage as needed and new capacity as it is added to the Cloud. With WOS, traditional storage administration tasks are rendered obsolete.

How large can a WOS file system grow? How many directories and directory levels does it support? How long can my file names be? How many inodes are there?

WOS provides a single global object repository that can exceed 200 billion objects. There are no traditional file system constructs such as file names, directories, or inodes. With WOS, all data is kept in a flat object space and retrieved using Object IDs. You'll never run out of inodes or have to engineer around limitations such as directory file count limits. WOS Clouds can easily grow to tens of Petabytes, all in a single, easy-to-manage repository.

How are WOS nodes connected to the network?

Currently available WOS nodes are connected by Gigabit Ethernet. Four GigE ports are provided on each node. At least one GigE port must be connected to use the node, and additional GigE ports allow for higher bandwidth scalability. All nodes in the Cloud must have IP connectivity to each other.

How do I upgrade a WOS Cloud?

Upgrading a WOS Cloud is easy. Simply log into the system using the Web interface and browse to the new software image file. Once you approve beginning the upgrade process, WOS will gracefully take individual nodes out of service while continuing to serve objects from other nodes. The node taken out of service will be upgraded and brought back online. Then the process repeats for all remaining nodes. Upgrades are non-disruptive -- the entire WOS Cloud remains operational during upgrades and content continues to be served.

What happens if I change a WOS policy?

If you change a policy, WOS will automatically begin bringing objects corresponding to that policy into policy compliance according to your change. For example, if you had a policy that stored a replica of each object in Los Angeles and New York, and changed it to store a replica in Los Angeles, New York, and London, the Cloud would automatically replicate each object corresponding to that policy to London, thus establishing compliance with the policy change.

How do I add drives to a WOS node?

You don't need to. WOS nodes come preconfigured with their drives. The only reason to open the node is to replace a failed drive. This is a hot-swappable operation that can be done while the node is in service.

How do I add capacity to a WOS Cloud?

Adding capacity is very easy. Simply assign an IP address to the new node and give it the IP address of the existing WOS Cloud. Then, log into the WOS Web management interface. WOS will have discovered the new node. All you need to do is accept the node into the Cloud and designate which Zone the node belongs to. WOS will begin using the new capacity immediately and will also begin balancing the Cloud across the new node.

How do I remove a node from service?

From the Web interface, there is a "Decommission" command. This command will cause WOS to migrate all objects off the node to available capacity elsewhere in the Cloud, ensuring that they are kept in policy compliance. Once this process is complete, the node is gracefully shut down.

Does WOS re-balance itself when new nodes are added?

Yes. As new nodes and new capacity are added to the Cloud, the system automatically balances itself to ensure that no single node runs out of capacity while capacity is still available elsewhere in the Cloud.
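The FAQ does not describe WOS's rebalancing algorithm, so the following is only a generic greedy sketch: repeatedly move an object from the fullest node to the emptiest until they are within a tolerance. It balances raw bytes used; nodes of different sizes would instead be balanced by fill fraction. All names and sizes are illustrative.

```python
def rebalance(nodes, tolerance):
    """Greedy rebalancing sketch; not WOS's actual algorithm.

    nodes: {node name: list of object sizes in bytes}; modified in place
    tolerance: stop once the fullest and emptiest nodes differ by at most this many bytes
    Returns the moves made as (size, from_node, to_node).
    """
    moves = []
    while True:
        used = {name: sum(sizes) for name, sizes in nodes.items()}
        src = max(used, key=used.get)
        dst = min(used, key=used.get)
        gap = used[src] - used[dst]
        if gap <= tolerance:
            return moves
        # Moving an object of size s leaves a gap of |gap - 2s| between src and dst,
        # so only 0 < s < gap helps; prefer the object closest to half the gap.
        helpful = [s for s in nodes[src] if 0 < s < gap]
        if not helpful:
            return moves
        size = min(helpful, key=lambda s: abs(gap - 2 * s))
        nodes[src].remove(size)
        nodes[dst].append(size)
        moves.append((size, src, dst))


# Two full nodes and a newly added empty one (sizes are illustrative).
cloud = {"old-1": [10, 10, 10, 10], "old-2": [10, 10, 10, 10], "new-1": []}
print(rebalance(cloud, tolerance=10))
print({name: sum(sizes) for name, sizes in cloud.items()})
```

Each move strictly lowers the sum of squared node usage, so the loop always terminates.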

What is the performance impact when WOS rebalances?

WOS prioritizes user requests over internal rebalancing operations. The impact to user traffic is minimal. However, a lightly loaded Cloud will complete rebalancing operations faster than a heavily loaded one.

Performance

Does WOS provide high performance for both small and large files?

WOS is optimized for delivering lots of small files (less than a few MB), the types of things typically found on web pages, such as text, photos, thumbnails, embedded animations, etc.

WOS stores large files as well, but is not a replacement for the S2A series of storage systems, which are optimized for delivering high bandwidth (hundreds of MB/s to GB/s) to a single file. The aggregate bandwidth that a WOS Cloud delivers is very large, and single file bandwidth is more than sufficient for high-definition Internet video and progressive downloads, as well as file download services. WOS is not a substitute for the S2A in HPC and Rich Media environments. WOS augments these environments by providing an ideal platform for web-based content delivery, while the S2A environment is used for ingest, transcoding, and preparation of the content to be distributed by WOS.

How many IOPS do I get with WOS?

WOS performance is not measured in IOPS. We prefer terms that are more indicative of real-world performance, namely File Reads Per Second (FRPS) and File Writes Per Second (FWPS). Think about it this way -- in traditional storage and file system environments, it takes multiple IOPS to complete each file operation. The system must perform metadata lookups, retrieve file extent lists, and reassemble the file through RAID operations. These operations all consume IOPS -- as many as 8-10 per file-level operation.

But as the operator of the system, you don't care about all that. What you need to know is how many files can be delivered per second, which is what really impacts the performance of your application. With WOS, we'll tell you how many FRPS and FWPS each node can deliver (each node type is different depending on the number and type of disk drives it contains), which is a much better and more real-world indication of how the system will perform in your environment.
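The IOPS-to-files arithmetic above is easy to check. The numbers below are assumptions for illustration (60 drives, as in a 4U enclosure, at roughly 100 random IOPS each), not DDN performance figures: the same raw disk capability yields 600 to 750 file reads per second when each read costs 8 to 10 disk operations, and up to 6,000 when one disk operation suffices.

```python
def file_reads_per_second(drives, iops_per_drive, disk_ops_per_file_read):
    """Files delivered per second when every file read costs several disk operations."""
    return drives * iops_per_drive / disk_ops_per_file_read


# Assumed figures for illustration only: 60 drives at roughly 100 random IOPS each.
for ops in (10, 8, 1):
    print(ops, "disk ops per read ->", file_reads_per_second(60, 100, ops), "FRPS")
```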

As a rule of thumb, you can count on each 4U/dual-node WOS enclosure delivering multiple thousands of file reads per second (FRPS), even with inexpensive SATA drives. Much higher performance levels are available by using enterprise-class SAS drives. For more detailed WOS performance estimates, please contact DDN at 1-800-TERABYTE.

How much bandwidth do I get with WOS?

Each WOS node contains four Gigabit Ethernet network connections. The node can easily saturate these connections. Because WOS typically delivers smaller files, bandwidth needs remain moderate and this configuration is sufficient.
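As a rough check, four Gigabit Ethernet ports carry at most about 500 MB/s, which puts a line-rate ceiling on how many files per second a node can push. The sketch ignores protocol overhead, and the average file sizes are assumptions chosen for illustration.

```python
def max_files_per_second(ports, gbit_per_port, avg_file_bytes):
    """Line-rate ceiling on files/s for one node (ignores protocol overhead)."""
    bytes_per_second = ports * gbit_per_port * 1e9 / 8
    return bytes_per_second / avg_file_bytes


# Four GigE ports per node; the average file sizes below are assumptions.
for size in (20e3, 100e3, 1e6):
    print(f"{size / 1e3:.0f} KB files: {max_files_per_second(4, 1, size):,.0f} files/s")
```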

How long does it take to format a node?

WOS format operations are nearly instantaneous. You do not need to wait hours, or even minutes, for a new node to format before it can be placed into active service. Nodes can be brought online in just a few seconds. There are no background formatting operations.

Data Protection and Resiliency of the WOS Cloud

How does WOS protect my data?

WOS uses replication to protect data. Because WOS is used to distribute content to multiple data centers, its policy-based replication acts as a built-in disaster recovery mechanism. Even if you plan to use WOS in a single location, it will keep a redundant copy of all information stored in the system. All content replicas are accessed with a common Object ID, and WOS will always attempt to serve content locally first, defaulting to a remote location if the content either doesn't exist locally, or if local resources are offline.

Is WOS replication synchronous?

Yes. Before WOS acknowledges a PUT (object write) command as complete, it first ensures that the replication policy for the object has been met. There is no need to manually reconcile what data exists between WOS Zones, or to otherwise manage the replication process. It is completely automatic.

What happens if I lose network connectivity during WOS replication?

If a portion of your network goes down preventing WOS from reaching a Zone that is the destination for a replication operation, WOS will do three things.

  • It will create additional local or remote (in a different Zone) copies of the object to ensure it is adequately protected to the extent possible given the connectivity loss. At a minimum two local copies of the object will be kept on different nodes.
  • It will return an error through the API so that the system administrator will know that the object was stored out of compliance with its policy settings.
  • When network connectivity is restored, WOS will bring all non-compliant objects back into policy compliance by making replicas in the intended Zones and deleting any extra copies that were made to protect data during the connectivity loss.
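The three behaviors above can be sketched as follows. This is a hypothetical model, not DDN's implementation: `OutOfPolicy` stands in for the API error, only local fallback copies are modeled (not the alternative of copies in another reachable zone), and the later repair step is left out.

```python
class OutOfPolicy(Exception):
    """Hypothetical API error: the object was stored, but not where its policy requires."""

    def __init__(self, placements, missing):
        super().__init__(f"stored out of policy; unreachable zones: {missing}")
        self.placements = placements  # where copies were actually written
        self.missing = missing  # zones still owed a replica once connectivity returns


def synchronous_put(policy_zones, reachable, local_zone, local_nodes):
    """Sketch of the PUT behavior described above; not DDN's implementation.

    policy_zones: zones that must each hold a replica, e.g. ["LA", "NY"]
    reachable: zones this write can currently reach (the local zone always can)
    local_nodes: nodes in local_zone; needs at least two for the fallback copies
    Returns (zone, node) placements; node is None for remote zones in this sketch.
    """
    spare_local = iter(local_nodes)
    placements = []
    for zone in policy_zones:
        if zone == local_zone:
            placements.append((zone, next(spare_local)))
        elif zone in reachable:
            placements.append((zone, None))
    missing = [z for z in policy_zones if z != local_zone and z not in reachable]
    if not missing:
        return placements  # the write is acknowledged only at this point
    # Policy can't be met: keep at least two local copies on different nodes.
    while sum(zone == local_zone for zone, _ in placements) < 2:
        placements.append((local_zone, next(spare_local)))
    raise OutOfPolicy(placements, missing)


try:
    synchronous_put(["LA", "NY"], {"LA"}, "LA", ["la1", "la2"])
except OutOfPolicy as err:
    print(err, err.placements)
```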

What happens if I lose connectivity to a WOS Zone? Does a massive rebuild begin?

No. WOS is intelligent enough to realize that multiple nodes all going down at once is most likely a network connectivity issue. In this case, WOS will alert the system administrator, but will not begin a Zone-level rebuild. Only once an extended timer has elapsed, or if the system administrator manually triggers the process, will a Zone-level rebuild occur.
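The decision described above can be sketched as a small function. The timer length and return labels are hypothetical, and treating a partial failure as an immediate rebuild is an assumption of this sketch rather than something the FAQ states.

```python
def rebuild_decision(down_nodes, zone_nodes, minutes_down, zone_timer_minutes, admin_override=False):
    """Hypothetical heuristic for one zone; the timer length is a parameter of this sketch.

    down_nodes: nodes currently unreachable
    zone_nodes: every node in the zone being evaluated
    """
    down = set(down_nodes) & set(zone_nodes)
    if not down:
        return "ok"
    if down != set(zone_nodes):
        # Assumed: isolated drive or node failures are repaired without waiting.
        return "rebuild affected objects"
    # Whole zone gone at once: most likely a network problem, so alert and wait.
    if admin_override or minutes_down >= zone_timer_minutes:
        return "zone-level rebuild"
    return "alert only"


zone = ["ny1", "ny2", "ny3"]
print(rebuild_decision(zone, zone, minutes_down=5, zone_timer_minutes=60))
```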

How does WOS protect against data corruption?

Every object WOS stores passes through an algorithm that generates a unique signature which is stored with the object. When reading objects, WOS checks this signature to make sure the file has not become corrupted. If WOS detects corruption, it will retrieve a replica of the object from elsewhere in the Cloud for delivery to the user. This replica will be used to repair or replace the corrupted copy of the object.
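A sketch of verify-on-read with repair from a replica. The FAQ does not name the signature algorithm, so SHA-256 from Python's `hashlib` is used here as a stand-in, and the replicas are modeled as a hypothetical dictionary of location to stored bytes.

```python
import hashlib


def signature(data: bytes) -> str:
    # Assumption: SHA-256 stands in for WOS's unspecified signature algorithm.
    return hashlib.sha256(data).hexdigest()


def read_verified(replicas, expected_signature):
    """Serve the first intact replica and repair any corrupt copies found before it.

    replicas: {location: stored bytes}, in the order they should be tried (local first)
    """
    corrupt = []
    for location, data in replicas.items():
        if signature(data) == expected_signature:
            for bad in corrupt:
                replicas[bad] = data  # overwrite the damaged copy from the verified one
            return data
        corrupt.append(location)
    raise OSError("no intact replica of this object is reachable")


stored = b"sunset.jpg bytes"
sig = signature(stored)  # computed once, when the object is written
copies = {"local": b"sunset.jpg bytez", "remote": stored}  # local copy has a corrupted byte
print(read_verified(copies, sig) == stored, copies["local"] == stored)  # True True
```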

When a drive/node/datacenter fails, how do I know which objects are missing/lost?

You don't need to. WOS automatically detects such failures and intelligently recovers lost objects by creating new replicas, bringing all objects back into policy compliance. Because objects are replicated, any objects lost in the failure will simply be served from elsewhere in the system until the Cloud is brought back into policy compliance.

Can someone guess OIDs to retrieve unauthorized data from the WOS Cloud?

No, they cannot. OIDs are unique and include embedded security features that make them non-deterministic. Even if an object is deleted, the same OID will never be handed out again for a new object. Knowing one or more OIDs doesn't provide any information that would allow retrieval of other OIDs.
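The property that matters is that OIDs are drawn at random rather than from a sequence, so one OID reveals nothing about others, and that an issued OID is never reused. A hypothetical issuer using Python's `secrets` module shows the idea; it does not reproduce WOS's actual OID format or its embedded security features.

```python
import secrets


class OidIssuer:
    """Hypothetical OID issuer: random, never reused (not WOS's actual OID format)."""

    def __init__(self):
        self._issued = set()  # remembers every OID ever handed out, even after deletes

    def new_oid(self) -> str:
        while True:
            oid = secrets.token_urlsafe(16)  # 128 random bits from the OS CSPRNG
            if oid not in self._issued:
                self._issued.add(oid)
                return oid


issuer = OidIssuer()
print(issuer.new_oid(), issuer.new_oid())
```

A real system could not keep every OID in memory; the set here just makes the never-reissued rule explicit, and 128 random bits already make an accidental repeat vanishingly unlikely.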

How fast can WOS rebuild a failed drive? A failed node?

Very fast! Exactly how fast depends on the size of the WOS Cloud. Failed drives and nodes are rebuilt in a distributed fashion, with all remaining resources in the Cloud participating to bring the system back into policy compliance. So the larger the WOS Cloud is, the more resources participate in the rebuild, and the faster it completes. Because of its distributed rebuild capability, WOS typically rebuilds failed drives much faster than the speed at which data could be rebuilt to a single drive. A failed 1TB drive could be rebuilt in much less than an hour in a ten node WOS Cloud.
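The effect of a distributed rebuild is straightforward arithmetic. Assuming each participant can contribute about 100 MB/s of rebuild throughput (an illustrative figure, not a WOS specification), re-creating 1TB onto a single replacement drive takes close to three hours, while spreading the work across the nine other nodes of a ten-node Cloud brings it under 20 minutes.

```python
def rebuild_minutes(data_bytes, participating_nodes, bytes_per_s_per_node):
    """Time to re-create a failed drive's data when the work is spread across nodes."""
    return data_bytes / (participating_nodes * bytes_per_s_per_node) / 60


ONE_TB = 1e12
RATE = 100e6  # assumed rebuild throughput each participant can sustain (100 MB/s)
print(f"one target drive : {rebuild_minutes(ONE_TB, 1, RATE):.0f} min")
print(f"nine peer nodes  : {rebuild_minutes(ONE_TB, 9, RATE):.0f} min")
```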

Does WOS have any single points of failure?

No. WOS is a completely redundant and distributed system. Each node contains redundant components such as power supplies and fans. Disk drive and node failures are seamlessly and automatically corrected without loss of access to data.

Does WOS have any bottlenecks?

No. As a completely distributed system, no single component of WOS sits in the middle of all transactions. This allows WOS to scale to incredibly high levels because as nodes are added to the Cloud, all performance capabilities increase linearly.

How do I prevent unauthorized access to data stored in WOS?

Obtaining unauthorized access to data stored on WOS requires several other security breaches to take place first. Organizations generally have protection against these breaches in their existing network and server environments. WOS Clouds are typically run on secure private networks, so first an attacker would have to breach network security protocols to obtain access to the system. Even if this were to happen, the attacker could not access WOS because he would be unable to communicate with the system without running WOS-LIB. If the attacker somehow obtained WOS-LIB and used it to obtain access to the Cloud, he would still have to gain access to the application's database of OIDs to be able to gain access to any files. It is nearly inconceivable that an attacker would be able to break through network security, integrate WOS-LIB, and break through application and server security to make all this happen.

Even if all these unlikely events were to happen, the WOS Cloud can still be protected. The system administrator is able to create a list of authorized WOS-LIB clients which are allowed to connect to the Cloud. An attacker's unauthorized client is thus prevented from gaining access to WOS.
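A minimal sketch of a client allowlist check. The FAQ does not say how WOS identifies a WOS-LIB client, so this example assumes clients are identified by IP address; the addresses are placeholders from private and documentation ranges. Parsing with Python's `ipaddress` means equivalent spellings of the same address match, and malformed input is rejected.

```python
import ipaddress

# Hypothetical allowlist; this sketch assumes clients are identified by IP address.
AUTHORIZED_CLIENTS = {
    ipaddress.ip_address(a) for a in ("10.20.1.11", "10.20.1.12", "2001:db8::5")
}


def accept_connection(client_ip: str) -> bool:
    """Admit only listed WOS-LIB clients; anything else, including junk input, is refused."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False
    return addr in AUTHORIZED_CLIENTS


print(accept_connection("10.20.1.11"), accept_connection("10.20.1.99"))
```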