Ceph: What It Is
Ceph is open-source, software-defined distributed storage maintained by Red Hat since its acquisition of Inktank in April 2014. Its object storage supports:
- partial or complete reads and writes
- atomic transactions with features like append, truncate and clone range
- object level key-value mappings
Ceph's block storage (RBD) offers:
- Thin provisioning
- Resizable images
- Image import/export
- Image copy or rename
- Read-only snapshots
- Revert to snapshots
- Ability to mount with Linux or QEMU KVM clients!
For file storage, CephFS offers the following:
- It provides stronger data safety for mission-critical applications.
- It provides virtually unlimited storage to file systems.
- Applications that use file systems can use CephFS with POSIX semantics. No integration or customization required!
- Ceph automatically balances the file system to deliver maximum performance.
Why Ceph is HOT
In many ways Ceph is a unique animal. It’s the only storage solution that delivers four critical capabilities:
- unified storage (object, block, file).
Software-defined means deployment flexibility, faster hardware upgrades, and lower cost
Scale-out means it’s less expensive to build large systems and easier to manage them
Block + Object means more flexibility (most other storage products are block only, file only, object only, or file+block; block+object is very rare)
Enterprise features mean a reasonable amount of efficiency and data protection
Ceph includes many basic enterprise storage features: replication (or erasure coding), snapshots, thin provisioning, auto-tiering (the ability to shift data between flash and hard drives), and self-healing capabilities.
Despite all that Ceph has to offer, there are still two camps: those that love it and those that dismiss it.
I Love Ceph!
The nature of Ceph means some of the storage world loves it, or at least has very high hopes for it. Generally, server vendors love Ceph because it lets them sell servers as enterprise storage, without needing to develop and maintain complex storage software. The drive makers (of both spinners and SSDs) want to love Ceph because it turns their drive components into a storage system. It also lowers the cost of the software and controller components of storage, leaving more money to spend on drives and flash.
On the other hand, many established storage hardware and software vendors hope Ceph will fade into obscurity. Vendors who have already developed richly featured software don’t like it because it’s cheaper competition that puts downward pressure on their software prices. Those who sell tightly coupled storage hardware and software fear it because they can’t revise their hardware as quickly or sell it as cheaply as the commodity server vendors used by most Ceph customers.
To be honest, Ceph isn’t perfect for everyone. It’s not the most efficient at using flash or CPU (but it’s getting better), the file storage feature isn’t fully mature yet, and it is missing key efficiency features like deduplication and compression. And some customers just aren’t comfortable with open-source or software-defined storage of any kind. But every release of Ceph adds new features and improved performance, while system integrators build turnkey Ceph appliances that make it easy to deploy and come with integrated hardware and software support.
What’s Next for Ceph?
Ceph continues to evolve, backed by both Red Hat (which acquired Inktank in 2014) and by a community of users and vendors who want to see it succeed. In every release it gets faster, gains new features, and becomes easier to manage.
Ceph is basically a fault-tolerant distributed clustered filesystem. If it works, that’s like a nirvana for shared storage: you have many servers, each one pitches in a few disks, and then there’s a filesystem that sits on top that is visible to all servers in the cluster. If a disk fails, that’s okay too.
Those are really cool features, but it turns out that Ceph is really more than just that. To borrow a phrase, Ceph is like an onion – it’s got layers. The filesystem on top is nifty, but the coolest bits are below the surface.
If Ceph proves to be solid enough for use, we’ll need to train our sysadmins all about Ceph. That means pretty diagrams and explanations, which we thought would be more fun to share with you.
This is the logical diagram that we came up with while learning about Ceph. It might help to keep it open in another window as you read a description of the components and services.
We’ll start at the bottom of the stack and work our way up.
OSD stands for Object Storage Device, and roughly corresponds to a physical disk. An OSD is actually a directory (e.g. ) that Ceph makes use of, residing on a regular filesystem, though it should be treated as opaque for the purposes of using it with Ceph.
Use of XFS or btrfs is recommended when creating OSDs, owing to their good performance, feature set (support for XATTRs larger than 4KiB) and data integrity.
We’re using btrfs for our testing.
Using RAIDed OSDs
A feature of Ceph is that it can tolerate the loss of OSDs. This means we can theoretically achieve fantastic utilisation of storage devices by obviating the need for RAID on every single device.
However, we’ve not yet determined whether this is awesome. At this stage we’re not using RAID, and just letting Ceph take care of block replication.
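To put rough numbers on the utilisation argument, here is a back-of-the-envelope sketch. It assumes replicated pools only (no erasure coding), RAID-1 mirroring as the RAID level, and a hypothetical 120 TB of raw disk; `usable_fraction` is an illustrative helper, not part of Ceph.

```python
def usable_fraction(replicas: int, raid1_mirrors: int = 1) -> float:
    """Fraction of raw disk capacity left for user data.

    replicas:      copies Ceph keeps of each object (the pool's replica count)
    raid1_mirrors: copies kept by a RAID-1 layer under each OSD (1 = no RAID)
    """
    return 1.0 / (replicas * raid1_mirrors)


raw_tb = 120.0  # hypothetical cluster, e.g. 30 disks x 4 TB

for label, frac in [
    ("Ceph 3x replication, no RAID", usable_fraction(3)),
    ("Ceph 2x replication on RAID-1 OSDs", usable_fraction(2, 2)),
]:
    print(f"{label}: {raw_tb * frac:.1f} TB usable")
```

Both layouts keep several copies of every object, but stacking RAID-1 under 2x replication spends more raw capacity per usable terabyte than letting Ceph keep three copies on plain disks.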
Placement groups, also referred to as PGs, help ensure performance and scalability; as the official docs note, tracking metadata for each individual object would be too costly.
A PG collects objects from the next layer up and manages them as a collection. It represents a mostly static mapping to one or more underlying OSDs. Replication is done at the PG layer: the degree of replication (number of copies) is set higher up, at the pool level, and all PGs in a pool replicate stored objects onto multiple OSDs.
As an example in a system with 3-way replication:
- PG-1 might map to OSDs 1, 37 and 99
- PG-2 might map to OSDs 4, 22 and 41
- PG-3 might map to OSDs 18, 26 and 55
Any object that happens to be stored on PG-1 will be written to all three OSDs (1,37,99). Any object stored in PG-2 will be written to its three OSDs (4,22,41). And so on.
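The example above can be modelled in a few lines. This is a toy sketch that reuses the PG-to-OSD table from the list and treats each OSD as an in-memory dictionary; `PG_TO_OSDS`, `osds` and `write_object` are hypothetical names. In real Ceph the client sends the write to the PG's primary OSD, which forwards it to the other replicas; the sketch skips that step and simply writes every copy directly.

```python
from typing import Dict, List

# Example PG -> OSD mapping from the text (3-way replication).
PG_TO_OSDS: Dict[int, List[int]] = {
    1: [1, 37, 99],
    2: [4, 22, 41],
    3: [18, 26, 55],
}

# Toy "disks": each OSD id maps to a dict of object name -> bytes.
osds: Dict[int, Dict[str, bytes]] = {}


def write_object(pg: int, name: str, data: bytes) -> List[int]:
    """Store one copy of the object on every OSD the PG maps to."""
    targets = PG_TO_OSDS[pg]
    for osd_id in targets:
        osds.setdefault(osd_id, {})[name] = data
    return targets


print(write_object(1, "greeting", b"hello"))  # [1, 37, 99]
print(sorted(osds))                            # [1, 37, 99]
```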
A pool is the layer at which most user interaction takes place. This is where the important operations happen, such as GET, PUT and DELETE actions on objects in a pool.
Pools contain a number of PGs, not shared with other pools (if you have multiple pools). The number of PGs in a pool is defined when the pool is first created, and at the time of writing it couldn’t be changed later (newer Ceph releases can split and merge the PGs of an existing pool). You can think of PGs as providing a hash mapping for objects into OSDs, to ensure that the OSDs are filled evenly when adding objects to the pool.
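A minimal sketch of that hash mapping, assuming SHA-256 as a stand-in for the hash Ceph actually uses and a hypothetical pool with 8 PGs; `object_to_pg` and `PG_NUM` are illustrative names. Because the modulus is the pool's PG count, every object name lands deterministically in exactly one PG, and a good hash spreads names roughly evenly across them.

```python
import hashlib
from collections import Counter


def object_to_pg(name: str, pg_num: int) -> int:
    """Map an object name to a PG id in [0, pg_num).

    Assumption: SHA-256 stands in for Ceph's own object-name hash.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % pg_num


PG_NUM = 8  # fixed when this hypothetical pool is created

# Hash 10,000 object names and count how many land in each PG.
counts = Counter(object_to_pg(f"obj-{i}", PG_NUM) for i in range(10_000))
for pg, n in sorted(counts.items()):
    print(f"PG {pg}: {n} objects")
```

The same dependence on the PG count is why changing it is a significant operation: a different modulus sends many objects to different PGs, so data has to move.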
CRUSH mappings are specified on a per-pool basis, and serve to skew the distribution of objects into OSDs according to administrator-defined policy. This is important for ensuring that replicas don’t end up on the same disk/host/rack/etc., which would defeat the entire point of having replica copies.
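CRUSH itself is more involved than can be shown here, but the core idea of keeping replicas in separate failure domains can be sketched with rendezvous (highest-random-weight) hashing, which is similar in spirit to CRUSH's straw2 buckets. This is not the CRUSH algorithm, and the host layout, `HOSTS` and `place_pg` are hypothetical: the sketch ranks hosts by a per-PG score, takes the top ones, then picks one OSD inside each, so no two replicas share a host.

```python
import hashlib
from typing import Dict, List

# Hypothetical cluster layout: host name -> OSD ids on that host.
HOSTS: Dict[str, List[int]] = {
    "host-a": [0, 1, 2],
    "host-b": [3, 4, 5],
    "host-c": [6, 7, 8],
    "host-d": [9, 10, 11],
}


def _score(*parts: object) -> int:
    """Deterministic pseudo-random weight for a (pg, item) combination."""
    key = "/".join(str(p) for p in parts).encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def place_pg(pg_id: int, replicas: int) -> List[int]:
    """Pick one OSD on each of `replicas` distinct hosts for a PG."""
    if replicas > len(HOSTS):
        raise ValueError("not enough hosts to keep replicas apart")
    # Failure-domain step: the highest-scoring hosts for this PG win.
    hosts = sorted(HOSTS, key=lambda h: _score(pg_id, h), reverse=True)[:replicas]
    # Within each chosen host, the highest-scoring OSD wins.
    return [max(HOSTS[h], key=lambda osd: _score(pg_id, h, osd)) for h in hosts]


for pg in range(3):
    print(pg, place_pg(pg, replicas=3))
```

Because the scores depend only on the PG id and item names, every client computes the same placement without consulting a central lookup table.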
A CRUSH map is written by hand, then compiled and passed to the cluster.
This may not make much sense at the moment, and that’s completely understandable. Someone on the Ceph mailing list provided a brief summary of the components which we found helpful for clarifying things:
Now we’re into the good stuff. Pools full of objects are well and good, but what do you do with them now?
What the lower layers ultimately provide is a RADOS cluster: Reliable Autonomic Distributed Object Store. At a practical level this translates to storing opaque blobs of data (objects) in high performance shared storage.
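To make "opaque blobs of data" concrete, here is a toy in-memory model of a pool of RADOS-style objects with the operations listed at the top of this post: partial reads and writes, append, truncate and per-object key-value (omap) data. `ToyObjectStore` and its methods are hypothetical; they only loosely echo librados and do not model replication or atomic multi-operation transactions.

```python
from typing import Dict


class ToyObjectStore:
    """In-memory stand-in for one pool: object name -> bytes + key/value map."""

    def __init__(self) -> None:
        self._data: Dict[str, bytearray] = {}
        self._omap: Dict[str, Dict[str, bytes]] = {}

    def write(self, name: str, data: bytes, offset: int = 0) -> None:
        """Partial write at `offset`, zero-filling any gap past the current end."""
        buf = self._data.setdefault(name, bytearray())
        if len(buf) < offset:
            buf.extend(bytes(offset - len(buf)))
        buf[offset:offset + len(data)] = data

    def read(self, name: str, length: int, offset: int = 0) -> bytes:
        """Partial read; reading past the end returns fewer bytes."""
        return bytes(self._data[name][offset:offset + length])

    def append(self, name: str, data: bytes) -> None:
        self._data.setdefault(name, bytearray()).extend(data)

    def truncate(self, name: str, size: int) -> None:
        """Shrink or zero-extend the object to exactly `size` bytes."""
        buf = self._data[name]
        if size < len(buf):
            del buf[size:]
        else:
            buf.extend(bytes(size - len(buf)))

    def set_omap(self, name: str, key: str, value: bytes) -> None:
        self._omap.setdefault(name, {})[key] = value

    def get_omap(self, name: str, key: str) -> bytes:
        return self._omap[name][key]


store = ToyObjectStore()
store.write("blob", b"hello world")
store.write("blob", b"W", offset=6)   # partial overwrite
store.append("blob", b"!")
print(store.read("blob", 12))         # b'hello World!'
store.truncate("blob", 5)
print(store.read("blob", 100))        # b'hello'
store.set_omap("blob", "owner", b"alice")
```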
Because RADOS is fairly generic, it’s ideal for building more complex systems on top. One of these is RBD.
As the name suggests, a RADOS Block Device (RBD) is a block device stored in RADOS. RBD offers useful features on top of raw RADOS objects. From the official docs:
- RBDs are striped over multiple PGs for performance
- RBDs are resizable
- Thin provisioning means on-disk space isn’t used until actually required
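The striping and thin-provisioning points can be illustrated with a toy image that splits its byte range into fixed-size backing objects and only allocates an object when something is written into it. The 4 MiB object size matches RBD's default, but `ThinImage` is a hypothetical sketch: real RBD stores each chunk as a RADOS object in a pool, whereas here the chunks are just dictionary entries.

```python
from typing import Dict

OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB backing objects (RBD's default size)


class ThinImage:
    """Toy block image striped over fixed-size objects that are created lazily."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.objects: Dict[int, bytearray] = {}  # object number -> contents

    def _check(self, offset: int, length: int) -> None:
        if offset < 0 or offset + length > self.size:
            raise ValueError("I/O outside the image")

    def write(self, offset: int, data: bytes) -> None:
        self._check(offset, len(data))
        pos = 0
        while pos < len(data):
            objno, within = divmod(offset + pos, OBJECT_SIZE)
            chunk = data[pos:pos + OBJECT_SIZE - within]
            obj = self.objects.get(objno)
            if obj is None:
                # Thin provisioning: an object is only allocated on first write.
                obj = self.objects[objno] = bytearray(OBJECT_SIZE)
            obj[within:within + len(chunk)] = chunk
            pos += len(chunk)

    def read(self, offset: int, length: int) -> bytes:
        self._check(offset, length)
        out = bytearray()
        pos = 0
        while pos < length:
            objno, within = divmod(offset + pos, OBJECT_SIZE)
            n = min(length - pos, OBJECT_SIZE - within)
            obj = self.objects.get(objno)
            # Regions that were never written read back as zeros.
            out += obj[within:within + n] if obj is not None else bytes(n)
            pos += n
        return bytes(out)


img = ThinImage(size=1024 ** 3)      # 1 GiB image, no objects allocated yet
img.write(OBJECT_SIZE - 2, b"abcd")  # straddles objects 0 and 1
print(sorted(img.objects))           # [0, 1]
print(img.read(OBJECT_SIZE - 2, 4))  # b'abcd'
```

In this toy, growing the image only means raising `size`, since no space is reserved up front.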
RBD also takes advantage of RADOS capabilities such as snapshotting and cloning, which would be very handy for applications like virtual machine disks.
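Snapshots and clones suit VM disks because one golden image can back many clones, with each clone storing only the blocks it changes. A hypothetical copy-on-write sketch follows; `Layer`, `snapshot` and the block numbering are illustrative assumptions. Real RBD implements this lazily inside RADOS rather than by copying a block map, and real snapshots are read-only, which the toy does not enforce.

```python
from typing import Dict, Optional


class Layer:
    """Toy copy-on-write image: a block map plus an optional parent layer."""

    def __init__(self, parent: Optional["Layer"] = None) -> None:
        self.parent = parent
        self.blocks: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        # Writes land in this layer only; the parent is never modified.
        self.blocks[block] = data

    def read(self, block: int) -> Optional[bytes]:
        # Walk up the parent chain until some layer holds the block.
        layer: Optional[Layer] = self
        while layer is not None:
            if block in layer.blocks:
                return layer.blocks[block]
            layer = layer.parent
        return None  # never written anywhere

    def snapshot(self) -> "Layer":
        # Point-in-time copy of the block map (read-only is not enforced here).
        snap = Layer(self.parent)
        snap.blocks = dict(self.blocks)
        return snap


golden = Layer()
golden.write(0, b"base-os")
snap = golden.snapshot()

vm1 = Layer(parent=snap)     # clone: starts empty, reads fall through to snap
vm1.write(1, b"vm1-data")
golden.write(0, b"patched")  # later changes to golden don't reach the snapshot

print(vm1.read(0))      # b'base-os'
print(vm1.read(1))      # b'vm1-data'
print(len(vm1.blocks))  # 1
```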
CephFS is a POSIX-compliant clustered filesystem implemented on top of RADOS. This is very elegant because the lower layer features of the stack provide really awesome filesystem features (such as snapshotting), while the CephFS layer just needs to translate that into a usable filesystem.
CephFS isn’t considered ready for prime time just yet, but RADOS and RBD are. (CephFS was later declared stable in the Jewel release.)