The Growth Of IBM Storage Ceph – The Ideal Foundation For A Modern Data Lakehouse

It’s been one year since IBM integrated Red Hat storage product roadmaps and teams into IBM Storage. In that time, organizations have faced unprecedented challenges in scaling AI: data is growing rapidly across more locations and formats, while its quality is declining. Helping clients combat this problem has meant modernizing their infrastructure with cutting-edge solutions as part of their digital transformations. Largely, this involves delivering consistent application and data storage across on-premises and cloud environments. Crucially, it also includes helping clients adopt cloud-native architectures to realize public cloud benefits such as cost, speed, and elasticity. IBM Storage Ceph, formerly Red Hat Ceph, is a state-of-the-art open-source software-defined storage platform and a keystone in this effort.

Software-defined storage (SDS) has emerged as a transformative force in data management, offering a host of advantages over traditional legacy storage arrays, including extreme flexibility and scalability that are well suited to modern use cases like generative AI. With IBM Storage Ceph, storage resources are abstracted from the underlying hardware, allowing for dynamic allocation and efficient utilization of data storage. This flexibility not only simplifies management but also enhances agility in adapting to evolving business needs and in scaling compute and capacity as new workloads are introduced. This self-healing and self-managing platform is designed to deliver unified file, block, and object storage services at scale on industry-standard hardware. Unified storage gives clients a bridge from legacy applications running on independent file or block storage to a common platform that includes those and object storage in a single appliance.

Ceph is optimized for large single-site and multisite deployments and can efficiently scale to support hundreds of petabytes of data and tens of billions of objects, which is key for traditional and newer generative AI workloads. The scalability, resiliency, and security of IBM Storage Ceph make it well suited to support data lakehouse and AI/ML open-source frameworks, in addition to more traditional workloads such as MySQL and MongoDB on Red Hat OpenShift or Red Hat OpenStack. It’s one reason why 768 TiB of raw IBM Storage Ceph capacity is included in watsonx.data, IBM’s open, governed, fit-for-purpose data lakehouse architecture optimized for data, analytics, and AI workloads.
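To put the raw capacity figure in context, here is a minimal sketch of how raw capacity translates into usable capacity under the two data protection styles Ceph pools support, replication and erasure coding. The article does not say which scheme is used with watsonx.data, so the replica count and the k+m profiles below are illustrative assumptions, and the arithmetic ignores metadata and free-space headroom.

```python
def usable_replicated(raw_tib: float, replicas: int) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tib / replicas


def usable_erasure_coded(raw_tib: float, k: int, m: int) -> float:
    """Usable capacity for a k+m erasure-coded pool (k data, m coding chunks)."""
    if k < 1 or m < 0:
        raise ValueError("k must be at least 1 and m must not be negative")
    return raw_tib * k / (k + m)


RAW_TIB = 768  # raw capacity figure quoted in the article

# Assumed protection schemes, chosen only for illustration.
print(f"3x replication: {usable_replicated(RAW_TIB, 3):.1f} TiB usable")
print(f"EC 4+2:         {usable_erasure_coded(RAW_TIB, 4, 2):.1f} TiB usable")
print(f"EC 8+3:         {usable_erasure_coded(RAW_TIB, 8, 3):.1f} TiB usable")
```

Under these assumptions, the same 768 TiB of raw capacity yields 256 TiB of usable space with three-way replication and 512 TiB with a 4+2 erasure-coded profile, which is why the protection scheme matters when sizing a lakehouse.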

The Right-fit Foundation for Compute-Intensive and Data-Intensive Workloads

The explosive growth of unstructured data and the rise of generative AI share a symbiotic relationship, each influencing and benefiting the other. In its Top Trends in Enterprise Data Storage 2023 report, Gartner® states that “by 2028, large enterprises will triple their unstructured data capacity across their on premises, edge and public cloud locations, compared to mid-2023.” The proliferation of unstructured data, such as text, images, and videos, provides a vast and diverse source for training generative AI models. In turn, generative AI assists in making sense of and extracting valuable insights from the ever-expanding pool of unstructured data. This synergy results in a feedback loop where generative AI thrives on the abundance of unstructured data, and the continuous generation of realistic data by AI further enriches and refines our understanding of unstructured datasets, fostering innovation and advancements.

“By 2028, 70% of file and object data will be deployed on a consolidated unstructured data storage platform, up from 35% in early 2023,” according to the same Gartner® report. Organizations, therefore, need a storage management solution capable of accelerated data ingest, data cleansing and classification, metadata management and augmentation, and cloud-scale capacity management and deployment, such as software-defined storage. IBM Storage Ceph scales out seamlessly to meet these growing data demands. Its self-managing capabilities ensure that the system continuously adapts to constantly changing conditions, making the solution hassle-free while easily maintaining data integrity.

To accelerate and scale the impact of data and AI across an organization – and ultimately improve business outcomes – companies must be hybrid by design. This includes the ability to consume storage services on-prem with a cloud-native operating model to address issues such as the need for enterprise feature sets unavailable on public cloud, data sovereignty considerations, and cost. The plug-and-play architecture of IBM Storage Ceph simplifies integration with existing infrastructures, including various platforms, cloud environments, hypervisors, open-source data formats such as Apache Iceberg and Apache Parquet, and complete solution stacks such as watsonx.ai, watsonx.data, and others. New nodes or devices can be added to the cluster seamlessly, without disruption or service downtime. It delivers an easy and efficient way for clients to build a data lakehouse with watsonx.data and to support other next-generation AI workloads.

At Snap, our requirement to store more and more data continues to expand, and we need a platform that can scale quickly, satisfy our performance KPIs, and be cost effective all at the same time. IBM Storage Ceph is the platform of choice with its simple scalable architecture, easy to manage interface, and cost-effective software-defined deployment. Having world-class expertise and support from IBM is another important part of our decision to use IBM Storage Ceph for such a critical component of our business. — Snap Inc

Fast Data Access with NVMe over TCP

In the last year, IBM has introduced several important updates to Ceph, including, most recently, IBM Storage Ceph 7.0. This next-generation Ceph platform prepares for NVMe/TCP capabilities, which are designed to enable faster data transfer between storage devices, servers, and cloud platforms by retaining the low latency and high bandwidth characteristics of traditional NVMe. This makes it suitable for applications that demand ultra-fast storage access, such as databases, analytics, and content delivery, and it simplifies the infrastructure because it runs over existing standard network technology investments. These benefits will help clients adopt a software-defined approach designed to deliver a cloud-like experience in terms of speed, agility, and economics.

NVMe/TCP can help Ceph bridge the gap for traditional block storage with scale-out architectures. With NVMe/TCP, Ceph will be designed to integrate with platforms like VMware to help enterprises replicate cloud architectures in their own data center, moving away from expensive and rigid SAN networks and monolithic storage arrays.

Additional new features included in Ceph 7.0:

SEC and FINRA compliance certification for object lock, enabling write-once-read-many (WORM) compliance for object storage
NFS support for CephFS, giving non-native Ceph clients file system access
For more details on features, visit the IBM Storage community.
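The WORM behavior behind object lock can be illustrated with a toy model. The sketch below is not Ceph code and does not call any Ceph or S3 API. The class and method names are hypothetical, and it only models the compliance-style rule commonly associated with S3-style object lock: an object cannot be deleted before its retain-until date, and the retention period can be extended but not shortened.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


class RetentionError(Exception):
    """Raised when an operation would violate the retention period."""


@dataclass
class LockedObject:
    """Hypothetical stand-in for an object version under a retention lock."""
    key: str
    data: bytes
    retain_until: datetime

    def can_delete(self, now: datetime) -> bool:
        """Deletion is allowed only once the retention period has passed."""
        return now >= self.retain_until

    def delete(self, now: datetime) -> None:
        if not self.can_delete(now):
            raise RetentionError(f"{self.key} is locked until {self.retain_until}")

    def extend_retention(self, new_until: datetime) -> None:
        """Retention may move later in time, never earlier."""
        if new_until < self.retain_until:
            raise RetentionError("retention can be extended but not shortened")
        self.retain_until = new_until


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
record = LockedObject("trade-log-0001", b"...", start + timedelta(days=365))

print(record.can_delete(start))                        # still inside retention
print(record.can_delete(start + timedelta(days=366)))  # retention has expired
```

In a real deployment the enforcement happens in the storage service rather than in client code; the sketch only shows the rule a compliant store has to uphold.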

Cloud Economies of Scale with IBM Storage Ceph

Because IBM Storage Ceph stores data as objects within logical storage pools, a single cluster can have multiple pools, each tuned to different performance or capacity requirements. This allows clients to benefit from easier and faster access to data with content and context classifications, storage capacity limited only by the size of an organization’s infrastructure, and cost reductions at scale by removing hardware restrictions compared to traditional and legacy storage array architectures.
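As a rough illustration of the pool model described above, the following sketch shows two pools with different tuning and a deterministic mapping from object name to placement group. Ceph does group the objects in a pool into placement groups, but everything specific here is an assumption made for illustration: the pool names, the placement group counts, the protection labels, and the use of `zlib.crc32`, which stands in for Ceph's own hashing. The later step, where Ceph's CRUSH algorithm maps placement groups to storage devices, is not modeled.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    """A logical storage pool with its own tuning (all values hypothetical)."""
    name: str
    pg_num: int       # number of placement groups in the pool
    protection: str   # e.g. "replica-3" or "ec-4+2"


def placement_group(pool: Pool, object_name: str) -> int:
    """Map an object name to a placement group index within a pool.

    zlib.crc32 is a stand-in: it is deterministic, which is the property
    that matters here, but it is not the hash Ceph uses.
    """
    return zlib.crc32(object_name.encode("utf-8")) % pool.pg_num


# Two pools in one cluster, tuned for different goals.
fast_pool = Pool(name="fast-block", pg_num=128, protection="replica-3")
archive_pool = Pool(name="archive-objects", pg_num=512, protection="ec-4+2")

for pool in (fast_pool, archive_pool):
    pg = placement_group(pool, "invoice-2023-11.parquet")
    print(f"{pool.name}: object lands in PG {pg} of {pool.pg_num} ({pool.protection})")
```

Because the mapping is computed from the object name rather than looked up in a central table, any client can work out where an object belongs, which is part of what lets this style of architecture scale out.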

Faster Time to Value

IBM has also made deployment for Ceph easier than ever before. With IBM Storage Ready Nodes for Ceph, the platform can be deployed as a complete software and hardware solution and comes in a variety of different capacity configurations optimized for running IBM Storage Ceph workloads. We’ve taken all the guesswork out of configuration, making it easier to digest, configure, and administer.

The growth of IBM Storage Ceph is just another example of how IBM’s storage hardware and software portfolio helps provide faster time to value with scaled capacity and performance to optimize costs for clients.