• IBM Consulting

    DBA Consulting can help you with IBM BI and Web-related work. IBM Linux is also part of our portfolio.

  • Oracle Consulting

    For Oracle-related consulting, database work, support, and migration, call DBA Consulting.

  • Novell/RedHat Consulting

    For all Novell SUSE Linux and SAP on SUSE Linux questions related to the OS and BI solutions. And of course also for the great Red Hat products, such as Red Hat Enterprise Linux, JBoss middleware, and BI on Red Hat.

  • Microsoft Consulting

    For consulting services related to Windows Server 2012 onwards, Windows 7 and higher on the client, and Microsoft cloud services (Azure, Office 365, etc.).

  • Citrix Consulting

    Citrix VDI-in-a-Box, desktop virtualization, and Citrix NetScaler security.

  • Web Development

    Web development: static websites, CMS websites (Drupal 7/8, WordPress, Joomla), responsive websites, and adaptive websites.

23 December 2016

Software Defined Storage and Ceph - What Is All the Fuss About?


Ceph: What It Is

Ceph is open source, software-defined distributed storage maintained by Red Hat since its acquisition of Inktank in April 2014.

The power of Ceph can transform your organization’s IT infrastructure and your ability to manage vast amounts of data. If your organization runs applications with different storage interface needs, Ceph is for you! Ceph’s foundation is the Reliable Autonomic Distributed Object Store (RADOS), which provides your applications with object, block, and file system storage in a single unified storage cluster—making Ceph flexible, highly reliable and easy for you to manage.
Ceph’s RADOS provides you with extraordinary data storage scalability—thousands of client hosts or KVMs accessing petabytes to exabytes of data. Each one of your applications can use the object, block or file system interfaces to the same RADOS cluster simultaneously, which means your Ceph storage system serves as a flexible foundation for all of your data storage needs. You can use Ceph for free, and deploy it on economical commodity hardware. Ceph is a better way to store data.

OBJECT STORAGE
Ceph provides seamless access to objects using native language bindings or radosgw, a REST interface that’s compatible with applications written for S3 and Swift.

OBJECT STORAGE

Ceph’s software libraries provide client applications with direct access to the RADOS object-based storage system, and also provide a foundation for some of Ceph’s advanced features, including RADOS Block Device (RBD), RADOS Gateway, and the Ceph File System.

LIBRADOS
The Ceph librados software libraries enable applications written in C, C++, Java, Python and PHP to access Ceph’s object storage system using native APIs. The librados libraries provide advanced features, including the following (a brief usage sketch follows the list):
  • partial or complete reads and writes
  • snapshots
  • atomic transactions with features like append, truncate and clone range
  • object level key-value mappings
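
As a concrete taste of the native API, here is a minimal sketch using the Python librados bindings; the pool name and configuration path are assumptions for illustration:

    import rados

    # Connect to the cluster using the standard configuration file (assumed path).
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Open an I/O context on a pool (the pool name 'mypool' is hypothetical).
    ioctx = cluster.open_ioctx('mypool')

    # Write an object, read it back, and attach an extended attribute.
    ioctx.write_full('greeting', b'hello ceph')
    print(ioctx.read('greeting'))
    ioctx.set_xattr('greeting', 'lang', b'en')

    ioctx.close()
    cluster.shutdown()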

REST GATEWAY
RADOS Gateway provides Amazon S3 and OpenStack Swift compatible interfaces to the RADOS object store.
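
Because the gateway speaks the S3 dialect, ordinary S3 tooling can talk to it. Here is a hedged sketch using Python's boto3, where the endpoint URL and credentials are placeholders (7480 is the gateway's default port):

    import boto3

    # Point a standard S3 client at the RADOS Gateway instead of AWS.
    s3 = boto3.client(
        's3',
        endpoint_url='http://rgw.example.com:7480',
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
    )

    s3.create_bucket(Bucket='demo-bucket')
    s3.put_object(Bucket='demo-bucket', Key='hello.txt', Body=b'stored in RADOS')
    print(s3.get_object(Bucket='demo-bucket', Key='hello.txt')['Body'].read())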

BLOCK STORAGE
Ceph’s RADOS Block Device (RBD) provides access to block device images that are striped and replicated across the entire storage cluster.

BLOCK STORAGE

Ceph’s object storage system isn’t limited to native bindings or RESTful APIs. You can mount Ceph as a thinly provisioned block device! When you write data to Ceph using a block device, Ceph automatically stripes and replicates the data across the cluster. Ceph’s RADOS Block Device (RBD) also integrates with Kernel Virtual Machines (KVMs), bringing Ceph’s virtually unlimited storage to KVMs running on your Ceph clients.

HOW IT WORKS
Ceph RBD interfaces with the same Ceph object storage system that provides the librados interface and the Ceph FS file system, and it stores block device images as objects. Since RBD is built on top of librados, RBD inherits librados capabilities, including read-only snapshots and revert to snapshot. By striping images across the cluster, Ceph improves read access performance for large block device images.
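
The Python rbd bindings show how images are managed on top of librados; the pool and image names below are hypothetical:

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')  # 'rbd' is the conventional default pool

    # Create a thinly provisioned 10 GiB image; space is consumed only on write.
    rbd.RBD().create(ioctx, 'vm-disk-1', 10 * 1024 ** 3)

    with rbd.Image(ioctx, 'vm-disk-1') as image:
        image.resize(20 * 1024 ** 3)          # grow the image online
        image.create_snap('before-upgrade')   # take a read-only snapshot

    ioctx.close()
    cluster.shutdown()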

BENEFITS
  • Thinly provisioned
  • Resizable images
  • Image import/export
  • Image copy or rename
  • Read-only snapshots
  • Revert to snapshots
  • Ability to mount with Linux or QEMU KVM clients!

FILE SYSTEM
Ceph provides a POSIX-compliant network file system that aims for high performance, large data storage, and maximum compatibility with legacy applications.

FILE SYSTEM

Ceph’s object storage system offers a significant feature compared to many object storage systems available today: Ceph provides a traditional file system interface with POSIX semantics. Object storage systems are a significant innovation, but they complement rather than replace traditional file systems. As storage requirements grow for legacy applications, organizations can configure their legacy applications to use the Ceph file system too! This means you can run one storage cluster for object, block and file-based data storage.

HOW IT WORKS
Ceph’s file system runs on top of the same object storage system that provides object storage and block device interfaces. The Ceph metadata server cluster provides a service that maps the directories and file names of the file system to objects stored within RADOS clusters. The metadata server cluster can expand or contract, and it can rebalance the file system dynamically to distribute data evenly among cluster hosts. This ensures high performance and prevents heavy loads on specific hosts within the cluster.
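
From a client's perspective this is ordinary file system access. For example, mounting CephFS with the Linux kernel client is a single command; the monitor address, credentials, and mount point below are placeholder values:

    sudo mkdir -p /mnt/cephfs
    sudo mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret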

BENEFITS
The Ceph file system provides numerous benefits:
  • It provides stronger data safety for mission-critical applications.
  • It provides virtually unlimited storage to file systems.
  • Applications that use file systems can use Ceph FS with POSIX semantics. No integration or customization required!
  • Ceph automatically balances the file system to deliver maximum performance.




Red Hat Ceph Storage customer presentation



It’s capable of block, object, and file storage, though only block and object are currently deployed in production. It is scale-out, meaning multiple Ceph storage nodes (servers) cooperate to present a single storage system that easily handles many petabytes (1 PB = 1,000 TB = 1,000,000 GB), increasing both performance and capacity at the same time. Ceph has many basic enterprise storage features, including replication (or erasure coding), snapshots, thin provisioning, tiering (the ability to shift data between flash and hard drives), and self-healing capabilities.
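
To make the replication versus erasure coding trade-off concrete: with 3-way replication, storing 1 PB of data consumes 3 PB of raw capacity, while a hypothetical 4+2 erasure-coded pool stores the same petabyte in 1.5 PB of raw capacity (four data chunks plus two coding chunks per object) and still survives the loss of any two OSDs holding a given object's chunks.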


Why Ceph is HOT

In many ways Ceph is a unique animal—it’s the only storage solution that delivers four critical capabilities:
  • open-source
  • software-defined
  • enterprise-class
  • unified storage (object, block, file).
Many other storage products are open source, scale-out, software-defined, or unified, or have enterprise features, and some combine two or three of these qualities, but almost nothing else offers all four together.

Red Hat Ceph Storage: Past, Present and Future



  • Open source means lower cost.
  • Software-defined means deployment flexibility, faster hardware upgrades, and lower cost.
  • Scale-out means it’s less expensive to build large systems and easier to manage them.
  • Block + object means more flexibility (most other storage products are block only, file only, object only, or file + block; block + object is very rare).
  • Enterprise features mean a reasonable amount of efficiency and data protection.

Quick and Easy Deployment of a Ceph Storage Cluster with SLES 


Ceph includes many basic enterprise storage features, including replication (or erasure coding), snapshots, thin provisioning, auto-tiering (the ability to shift data between flash and hard drives), and self-healing capabilities.

Red Hat Storage Day New York - What's New in Red Hat Ceph Storage



Despite all that Ceph has to offer there are still two camps: those that love it and those that dismiss it.

I Love Ceph!
The nature of Ceph means some of the storage world loves it, or at least has very high hopes for it. Generally server vendors love Ceph because it lets them sell servers as enterprise storage, without needing to develop and maintain complex storage software. The drive makers (of both spinners and SSDs) want to love Ceph because it turns their drive components into a storage system. It also lowers the cost of the software and controller components of storage, leaving more money to spend on drives and flash.

Ceph, Meh!
On the other hand, many established storage hardware and software vendors hope Ceph will fade into obscurity. Vendors who already developed richly featured software don’t like it because it’s cheaper competition and applies downward price pressure on their software. Those who sell tightly coupled storage hardware and software fear it because they can’t revise their hardware as quickly or sell it as cheaply as the commodity server vendors used by most Ceph customers.

Battle of the Titans – ScaleIO vs. Ceph at OpenStack Summit Tokyo 2015 (Full Video)



To be honest, Ceph isn’t perfect for everyone. It’s not the most efficient at using flash or CPU (but it’s getting better), the file storage feature isn’t fully mature yet, and it is missing key efficiency features like deduplication and compression. And some customers just aren’t comfortable with open-source or software-defined storage of any kind. But every release of Ceph adds new features and improved performance, while system integrators build turnkey Ceph appliances that make it easy to deploy and come with integrated hardware and software support.
What’s Next for Ceph?

EMC- Battle of the Titans: Real-time Demonstration of Ceph vs. ScaleIO Performance for Block Storage


Ceph continues to evolve, backed by both Red Hat (which acquired Inktank in 2014) and by a community of users and vendors who want  to see it succeed.  In every release it gets faster, gains new features, and becomes easier to manage.

The Future of Cloud Software Defined Storage with Ceph: Andrew Hatfield, Red Hat



Ceph is basically a fault-tolerant distributed clustered filesystem. If it works, that’s like nirvana for shared storage: you have many servers, each one pitches in a few disks, and there’s a filesystem that sits on top that is visible to all servers in the cluster. If a disk fails, that’s okay too.

Those are really cool features, but it turns out that Ceph is really more than just that. To borrow a phrase, Ceph is like an onion – it’s got layers. The filesystem on top is nifty, but the coolest bits are below the surface.
If Ceph proves to be solid enough for use, we’ll need to train our sysadmins all about Ceph. That means pretty diagrams and explanations, which we thought would be more fun to share with you.

Building exascale active archives with Red Hat Ceph Storage



Diagram
This is the logical diagram that we came up with while learning about Ceph. It might help to keep it open in another window as you read a description of the components and services.



Ceph components
We’ll start at the bottom of the stack and work our way up.

OSDs
OSD stands for Object Storage Device, and roughly corresponds to a physical disk. An OSD is actually a directory (e.g. /var/lib/ceph/osd-1) that Ceph makes use of, residing on a regular filesystem, though it should be assumed to be opaque for the purposes of using it with Ceph.

Use of XFS or btrfs is recommended when creating OSDs, owing to their good performance, featureset (support for XATTRs larger than 4KiB) and data integrity.

We’re using btrfs for our testing.

Using RAIDed OSDs
A feature of Ceph is that it can tolerate the loss of OSDs. This means we can theoretically achieve fantastic utilisation of storage devices by obviating the need for RAID on every single device.

However, we’ve not yet determined whether this is awesome. At this stage we’re not using RAID, and just letting Ceph take care of block replication.


Placement Groups
Also referred to as PGs, the official docs note that placement groups help ensure performance and scalability, as tracking metadata for each individual object would be too costly.

A PG collects objects from the next layer up and manages them as a collection. It represents a mostly-static mapping to one or more underlying OSDs. Replication is done at the PG layer: the degree of replication (the number of copies) is set higher up, at the pool level, and all PGs in a pool will replicate stored objects onto multiple OSDs.

As an example in a system with 3-way replication:


  • PG-1 might map to OSDs 1, 37 and 99
  • PG-2 might map to OSDs 4, 22 and 41
  • PG-3 might map to OSDs 18, 26 and 55
  • Etc.


Any object that happens to be stored on PG-1 will be written to all three OSDs (1,37,99). Any object stored in PG-2 will be written to its three OSDs (4,22,41). And so on.

Pools
A pool is the layer at which most user-interaction takes place. This is the important stuff like GET, PUT, DELETE actions for objects in a pool.

Pools contain a number of PGs, not shared with other pools (if you have multiple pools). The number of PGs in a pool is defined when the pool is first created; it can be increased later, but not decreased. You can think of PGs as providing a hash mapping for objects into OSDs, to ensure that the OSDs are filled evenly when adding objects to the pool.
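
A deliberately simplified Python sketch of that two-step mapping (Ceph actually uses a stable hash function and the CRUSH algorithm rather than a plain modulo and a lookup table; names and numbers are illustrative only, reusing the example OSD triples from the PG section above):

    import zlib

    def object_to_pg(object_name: str, pg_count: int) -> int:
        # Hash an object name to a placement group id (simplified).
        return zlib.crc32(object_name.encode()) % pg_count

    # A hypothetical, mostly-static PG -> OSD mapping with 3-way replication.
    pg_to_osds = {0: (1, 37, 99), 1: (4, 22, 41), 2: (18, 26, 55)}

    pg = object_to_pg('my-object', pg_count=3)
    print(f"'my-object' lands in PG-{pg} and is written to OSDs {pg_to_osds[pg]}")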

The Future of Cloud Software Defined: Andrew Hatfield, Red Hat


CRUSH maps
CRUSH mappings are specified on a per-pool basis, and serve to skew the distribution of objects across OSDs according to administrator-defined policy. This is important for ensuring that replicas don’t end up on the same disk/host/rack/etc., which would defeat the entire point of having replica copies.

A CRUSH map is written by hand, then compiled and passed to the cluster.
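
In practice the round-trip looks roughly like this, using the standard Ceph tooling (file names are arbitrary):

    # Extract the current map and decompile it to editable text.
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # ... edit crushmap.txt by hand ...

    # Recompile and inject the new map into the cluster.
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new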

Focus on: Red Hat Storage big data


Still confused?
This may not make much sense at the moment, and that’s completely understandable. Someone on the Ceph mailing list provided a brief summary of the components which we found helpful for clarifying things:


Ceph services
Now we’re into the good stuff. Pools full of objects are all well and good, but what do you do with them now?

RADOS
What the lower layers ultimately provide is a RADOS cluster: Reliable Autonomic Distributed Object Store. At a practical level this translates to storing opaque blobs of data (objects) in high performance shared storage.

Because RADOS is fairly generic, it’s ideal for building more complex systems on top. One of these is RBD.

Decoupling Storage from Compute in Apache Hadoop with Ceph



RBD
As the name suggests, a RADOS Block Device (RBD) is a block device stored in RADOS. RBD offers useful features on top of raw RADOS objects. From the official docs:

  • RBDs are striped over multiple PGs for performance
  • RBDs are resizable
  • Thin provisioning means on-disk space isn’t used until actually required

RBD also takes advantage of RADOS capabilities such as snapshotting and cloning, which would be very handy for applications like virtual machine disks.

Red Hat Storage Day Boston - Why Software-defined Storage Matters



CephFS
CephFS is a POSIX-compliant clustered filesystem implemented on top of RADOS. This is very elegant because the lower layer features of the stack provide really awesome filesystem features (such as snapshotting), while the CephFS layer just needs to translate that into a usable filesystem.

CephFS isn’t considered ready for prime-time just yet, but RADOS and RBD are.

Kraken Ceph Dashboard



More Information:

http://slides.com/karansingh-1/deck

https://www.redhat.com/en/technologies/storage/ceph

https://scalableinformatics.com/unison

http://storagefoundry.net/collections/nautilus/ceph

http://www.fujitsu.com/global/products/computing/storage/eternus-cd/

http://www.mellanox.com/page/ethernet_switch_overview

http://www.mellanox.com/page/products_overview

http://www.mellanox.com/page/infiniband_cards_overview

http://ceph.com/category/webinars/

http://www.virtualtothecore.com/en/adventures-ceph-storage-part-1-introduction/

http://ceph.com/community/blog/

http://docs.ceph.com/docs/master/architecture/

http://karan-mj.blogspot.nl/2014/01/how-data-is-stored-in-ceph-cluster.html

https://www.redhat.com/en/about/press-releases/red-hat-unveils-red-hat-ceph-storage-2-enhanced-object-storage-capabilities-improved-ease-use

http://www.anchor.com.au/

http://www.anchor.com.au/blog/2012/09/a-crash-course-in-ceph/

https://www.hastexo.com/blogs/florian/2012/03/08/ceph-tickling-my-geek-genes

https://github.com/cholcombe973/ceph-dash-charm

Apache: Big Data North America 2016   https://www.youtube.com/watch?v=hTfIAWhd3qI&list=PLGeM09tlguZQ3ouijqG4r1YIIZYxCKsLp


DISTRIBUTED STORAGE PERFORMANCE FOR OPENSTACK CLOUDS: RED HAT STORAGE SERVER VS. CEPH STORAGE   http://docplayer.net/2905788-Distributed-storage-performance-for-openstack-clouds-red-hat-storage-server-vs-ceph-storage.html


Red Hat Announces Ceph Storage 2  http://www.storagereview.com/red_hat_announces_ceph_storage_2


Red Hat Ceph Storage
https://access.redhat.com/products/red-hat-ceph-storage

22 November 2016

Service Fabric as a microservices platform


What is a microservice?

Introduction to Microservices


There are different definitions of microservices. If you search the Internet, you'll find many useful resources that provide their own viewpoints and definitions. However, most of the following characteristics of microservices are widely agreed upon:

  • Encapsulate a customer or business scenario. What is the problem you are solving?
  • Developed by a small engineering team.
  • Written in any programming language and use any framework.
  • Consist of code and (optionally) state, both of which are independently versioned, deployed, and scaled.
  • Interact with other microservices over well-defined interfaces and protocols.
  • Have unique names (URLs) used to resolve their location.
  • Remain consistent and available in the presence of failures.

You can summarize these characteristics into:

  • Microservice applications are composed of small, independently versioned, and scalable customer-focused services that communicate with each other over standard protocols with well-defined interfaces.
  • Microservices allow code and state to be independently versioned, deployed, and scaled.

However you choose to write your microservices, the code and, optionally, the state should be independently deployable, upgradeable, and scalable. This is actually one of the harder problems to solve, because it comes down to your choice of technologies. For scaling, understanding how to partition (or shard) both the code and state is challenging. When the code and state use separate technologies, which is common today, the deployment scripts for your microservice need to be able to cope with scaling them both. This is also about agility and flexibility, so you can upgrade some of the microservices without having to upgrade all of them at once.
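
As a toy illustration of the sharding problem (purely conceptual; Service Fabric itself offers ranged and named partitioning schemes, and this is not its algorithm), routing state to a partition by key might look like this in Python:

    import hashlib

    def partition_for(key: str, partition_count: int) -> int:
        # Map a key to a stable partition id (conceptual sketch only).
        digest = hashlib.md5(key.encode('utf-8')).digest()
        return int.from_bytes(digest[:8], 'big') % partition_count

    # Route a customer's state to one of 16 partitions; the code and state for
    # each partition can then be deployed, upgraded, and scaled as a unit.
    print(partition_for('customer-42', partition_count=16))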

Azure Service Fabric

Monolithic vs. microservice design approach

All applications evolve over time. Successful applications evolve by being useful to people. Unsuccessful applications do not evolve and eventually are deprecated. The question becomes: How much do you know about your requirements today, and what will they be in the future? For example, let's say that you are building a reporting application for a department. You are sure that the application will remain within the scope of your company and that the reports will be short-lived. Your choice of approach is different from, say, building a service that delivers video content to tens of millions of customers.

Azure Service Fabric 101 - Introduction


Sometimes, getting something out the door as proof of concept is the driving factor, while you know that the application can be redesigned later. There is little point in over-engineering something that never gets used. It’s the usual engineering trade-off. On the other hand, when companies talk about building for the cloud, the expectation is growth and usage. The issue is that growth and scale are unpredictable. We would like to be able to prototype quickly while also knowing that we are on a path to deal with future success. This is the lean startup approach: build, measure, learn, and iterate.

During the client-server era, we tended to focus on building tiered applications by using specific technologies in each tier. The term monolithic application has emerged for these approaches. The interfaces tended to be between the tiers, and a more tightly coupled design was used between components within each tier. Developers designed and factored classes that were compiled into libraries and linked together into a few executables and DLLs.



There are benefits to such a monolithic design approach. It's often simpler to design, and it has faster calls between components, because these calls are often over interprocess communication (IPC). Also, everyone tests a single product, which tends to be more people-resource efficient. The downside is that there's a tight coupling between tiered layers, and you cannot scale individual components. If you need to perform fixes or upgrades, you have to wait for others to finish their testing. It is more difficult to be agile.



Microservices address these downsides and more closely align with the preceding business requirements, but they also have both benefits and liabilities. The benefits of microservices are that each one typically encapsulates simpler business functionality, which you scale up or down, test, deploy, and manage independently. One important benefit of a microservice approach is that teams are driven more by business scenarios than by technology, which the tiered approach encourages. In practice, smaller teams develop a microservice based on a customer scenario and use any technologies they choose.

Exploring Microservices in Docker and Microsoft Azure


In other words, the organization doesn’t need to standardize tech to maintain monolithic applications. Individual teams that own services can do what makes sense for them based on team expertise or what’s most appropriate to solve the problem. In practice, a set of recommended technologies, such as a particular NoSQL store or web application framework, is preferable.

The downside of microservices comes in managing the increased number of separate entities and dealing with more complex deployments and versioning. Network traffic between the microservices increases as well as the corresponding network latencies. Lots of chatty, granular services are a recipe for a performance nightmare. Without tools to help view these dependencies, it is hard to “see” the whole system.

Standards make the microservice approach work by agreeing on how to communicate and being tolerant of only the things you need from a service, rather than rigid contracts. It is important to define these contracts up front in the design, because services update independently of each other. Another description coined for designing with a microservices approach is “fine-grained service-oriented architecture (SOA).”

At its simplest, the microservices design approach is about a decoupled federation of services, with independent changes to each, and agreed-upon standards for communication.

As more cloud apps are produced, people discover that this decomposition of the overall app into independent, scenario-focused services is a better long-term approach.

Returning to the monolithic versus microservice approach for a moment, the following diagram shows the differences in the approach to storing state.

State storage between application styles



  • A monolithic app contains domain-specific functionality and is normally divided by functional layers, such as web, business, and data.
  • You scale a monolithic app by cloning it on multiple servers/virtual machines/containers.
  • A microservice application separates functionality into separate smaller services.
  • The microservices approach scales out by deploying each service independently, creating instances of these services across servers/virtual machines/containers.

Designing with a microservice approach is not a panacea for all projects, but it does align more closely with the business objectives described earlier. Starting with a monolithic approach might be acceptable if you know that you will have the opportunity to rework the code into a microservice design later, if necessary. More commonly, you begin with a monolithic app and slowly break it up in stages, starting with the functional areas that need to be more scalable or agile.



To summarize, the microservice approach is to compose your application of many small services. The services run in containers that are deployed across a cluster of machines. Smaller teams develop a service that focuses on a scenario and independently test, version, deploy, and scale each service so that the entire application can evolve.


The objective of Service Fabric is to reduce the complexities of building applications with a microservice approach, so that you do not have to go through as many costly redesigns. The approach is to start small, scale when needed, deprecate services, add new ones, and evolve with customer usage. We also know that there are many other problems yet to be solved to make microservices more approachable for most developers. Containers and the actor programming model are examples of small steps in that direction, and we are sure that more innovations will emerge to make this easier.

Explore Microservices solutions and Microsoft Azure Service Fabric


Simplify building microservice-based applications and lifecycle management

Fast time to market: Service Fabric lets developers focus on building features that add business value to their application, without the overhead of designing and writing additional code to deal with issues of reliability, scalability, or latency in the underlying infrastructure.

Choose your architecture: Build stateless or stateful microservices—an architectural approach where complex applications are composed of small, independently versioned services—to power the most complex, low-latency, data-intensive scenarios and scale them into the cloud with Azure Service Fabric.

Microservice agility: Architecting fine-grained microservice applications allows continuous integration and development practices and accelerates delivery of new functions into the application.

Visual Studio integration: Includes Visual Studio tooling, as well as command line support, so developers can quickly and easily build, test, debug, deploy, and update their Service Fabric applications on single-box, test, and production deployments.

Service Fabric as a microservices platform

Azure Service Fabric emerged from a transition by Microsoft from delivering box products, which were typically monolithic in style, to delivering services. The experience of building and operating large services, such as Azure SQL Database and Azure DocumentDB, shaped Service Fabric. The platform evolved over time as more and more services adopted it. Importantly, Service Fabric had to run not only in Azure but also in standalone Windows Server deployments.

The aim of Service Fabric is to solve the hard problems of building and running a service and utilize infrastructure resources efficiently, so that teams can solve business problems using a microservices approach.

Service Fabric provides two broad areas to help you build applications that use a microservices approach:

  • A platform that provides system services to deploy, upgrade, detect, and restart failed services, discover service location, manage state, and monitor health. These system services in effect enable many of the characteristics of microservices previously described.
  • Programming APIs, or frameworks, to help you build applications as microservices: reliable actors and reliable services. Of course, you can choose any code to build your microservice. But these APIs make the job more straightforward, and they integrate with the platform at a deeper level. This way, for example, you can get health and diagnostics information, or you can take advantage of built-in high availability.

Exploring microservices in a Microsoft landscape



Service Fabric is agnostic about how you build your service, and you can use any technology. However, it does provide built-in programming APIs that make it easier to build microservices.

Key capabilities

By using Service Fabric, you can:

  • Develop massively scalable applications that are self-healing.
  • Develop applications that are composed of microservices by using the Service Fabric programming model. Or, you can simply host guest executables and other application frameworks of your choice, such as ASP.NET Core 1 or Node.js.
  • Develop highly reliable stateless and stateful microservices.
  • Deploy and orchestrate containers that include Windows containers and Docker containers across a cluster. These containers can contain guest executables or reliable stateless and stateful microservices. In either case, you get mapping from container port to host port, container discoverability, and automated failover.
  • Simplify the design of your application by using stateful microservices in place of caches and queues.
  • Deploy to Azure or to on-premises datacenters that run Windows or Linux with zero code changes. Write once, and then deploy anywhere to any Service Fabric cluster.
  • Develop with a "datacenter on your machine" approach. The local development environment is the same code that runs in the Azure datacenters.
  • Deploy applications in seconds.
  • Deploy applications at higher density than virtual machines, deploying hundreds or thousands of applications per machine.
  • Deploy different versions of the same application side by side, and upgrade each application independently.
  • Manage the lifecycle of your stateful applications without any downtime, including breaking and nonbreaking upgrades.
  • Manage applications by using .NET APIs, Java (Linux), PowerShell, Azure command-line interface (Linux), or REST interface.
  • Upgrade and patch microservices within applications independently.
  • Monitor and diagnose the health of your applications and set policies for performing automatic repairs.
  • Scale out or scale in the number of nodes in a cluster, and scale up or scale down the size of each node. As you scale nodes, your applications automatically scale and are distributed according to the available resources.
  • Watch the self-healing resource balancer orchestrate the redistribution of applications across the cluster. Service Fabric recovers from failures and optimizes the distribution of load based on available resources.
Azure Microservices in Practice - Radu Vunvulea

Deliver low-latency performance and efficiency at massive scale

Deliver fast in-place upgrades with zero downtime, auto-scaling, integrated health monitoring, and service healing. Orchestration and automation for building microservices gives new levels of app awareness and insight to automate live-upgrades with rollback and automatic scale-up and scale-down capabilities.

Microsoft: Building a Massively Scalable System with DataStax and Microsoft's Next Generation PaaS Infrastructure



Plus, Service Fabric solves hard distributed-systems problems such as failover, leader election, and state management, and provides application lifecycle management capabilities so developers don’t have to re-architect applications as usage grows. This includes multi-tenant SaaS applications, Internet-of-Things data gathering and processing, and gaming and media serving.


Proven platform used by Azure and other Microsoft services

Azure Service Fabric was born from years of experience at Microsoft delivering mission-critical cloud services and is production-proven since 2010. It’s the foundational technology on which we run our Azure core infrastructure, powering services including Skype for Business, Intune, Azure Event Hubs, Azure Data Factory, Azure DocumentDB, Azure SQL Database, and Cortana.

This experience allowed us to design a platform that intrinsically understands the available infrastructure resources and needs of applications, enabling an automatically updating, self-healing behavior that is essential to delivering highly available and durable services at hyperscale.

Azure Service Fabric Overview



As software developers, there is nothing new in how we think about factoring an application into component parts. It is the central paradigm of object orientation, software abstractions, and componentization. Today, this factorization tends to take the form of classes and interfaces between shared libraries and technology layers. Typically, a tiered approach is taken with a back-end store, middle-tier business logic, and a front-end user interface (UI). What has changed over the last few years is that we, as developers, are building distributed applications that are for the cloud and driven by the business.

The changing business needs are:

  • A service that's built and operates at scale to reach customers in new geographical regions (for example).
  • Faster delivery of features and capabilities to be able to respond to customer demands in an agile way.
  • Improved resource utilization to reduce costs.

These business needs are affecting how we build applications.

For more information about Azure's approach to microservices, read Microservices: An application revolution powered by the cloud.




More Information:

https://msdn.microsoft.com/en-us/magazine/mt595752.aspx

https://blogs.msdn.microsoft.com/azureservicefabric/2016/04/25/orchestrating-containers-with-service-fabric/

https://channel9.msdn.com/Blogs/Windows-Azure/Azure-Service-Fabric

https://azure.microsoft.com/en-us/blog/microservices-an-application-revolution-powered-by-the-cloud/

https://blogs.msdn.microsoft.com/azureservicefabric/2016/03/15/service-fabric-customer-profile-talktalk-tv/

https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-overview

https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-overview-microservices#service-fabric-as-a-microservices-platform

https://azure.microsoft.com/en-us/services/service-fabric/

25 October 2016

Hyper-Converged OpenStack on Windows Nano Server 2016


Cloudbase Solutions Announces the Industry’s First Platform for Hyper-Converged OpenStack on Windows Nano Server 2016



The Hyper-Converged OpenStack on Windows Server cloud infrastructure distributes data across individual cloud servers while eliminating the need for expensive dedicated storage hardware. In this configuration, every node has compute, storage, and networking roles, raising scalability and fault tolerance to new levels while dramatically reducing overall costs.

Hyper-Converged OpenStack on Windows Nano Server 2016



Cloudbase Solutions’ design for the hyper-converged data center relies on components that are fully distributed and is entirely based on commodity hardware, giving it a remarkably low cost of ownership for the enterprise together with all the IaaS features offered by OpenStack, for both on-premises and public clouds.

Windows in OpenStack


The core components for this solution are OpenStack, Microsoft’s Windows Nano Server 2016, Hyper-V, Storage Spaces Direct (S2D) and Open vSwitch for Hyper-V, deployed starting from the bare metal up with Cloudbase Solutions’ Juju charms for Windows Server.



Cloudbase Solutions offers the platform as managed or unmanaged, with support for OpenStack and Windows Nano Server 2016, along with orchestration solutions based on OpenStack Heat templates or Juju for all Microsoft based workloads, from Active Directory to SharePoint, Exchange and more!



“The Hyper-Converged infrastructure adds simplicity, increased fault tolerance and scalability to your architecture, which is exactly what modern enterprises are looking for in order to compete efficiently. It’s important for OpenStack customers to know they have choices when it comes to their infrastructure, and we see the Hyper-Converged solution as a key to helping them in achieving that architectural freedom” - said Alessandro Pilotti, Cloudbase Solutions CEO

Manage Nano Server and Windows Server 2016 Hyper-V


About Cloudbase Solutions
Cloudbase Solutions™ is dedicated to cloud computing and interoperability. Our mission is to bridge the modern enterprise and cloud computing worlds by bringing OpenStack to Windows-based infrastructures. This effort starts with developing and maintaining all the crucial Windows and Hyper-V OpenStack components and culminates with a product range which includes orchestration for Hyper-V, SQL Server, Active Directory, Exchange and SharePoint Server via Juju charms and Heat templates.

Furthermore, to solve the complexity of cloud migration, Cloudbase Solutions developed Coriolis, a cloud migration-as-a-service product for migrating existing Windows and Linux workloads between clouds. Cloud migration is a necessity for a large number of use cases, especially for users moving from traditional virtualization technologies like VMware vSphere or Microsoft System Center VMM to Azure / Azure Stack, OpenStack, Amazon AWS or Google Cloud.

https://cloudbase.it/hyper-c/

Building Your First Ceph Cluster for OpenStack— Fighting for Performance, Solving Tradeoffs


Ceph is a full-featured, yet evolving, software-defined storage (SDS) solution. It’s very popular because of its robust design and scaling capabilities, and it has a thriving open source community. Ceph provides all data access methods (file, object, block) and appeals to IT administrators with its unified storage approach.

In the true spirit of SDS solutions, Ceph can work with commodity hardware or, to put it differently, is not dependent on any vendor-specific hardware. A Ceph storage cluster is intelligent enough to utilize the storage and compute power of any given hardware, and provides access to virtualized storage resources through the use of Ceph clients or other standard protocols and interfaces.



Ceph storage clusters are based on the Reliable Autonomic Distributed Object Store (RADOS), which uses the CRUSH algorithm to stripe, distribute and replicate data. The CRUSH algorithm originated in Sage Weil's PhD thesis at the University of California, Santa Cruz. Here’s an overview of Ceph’s different ways of accessing stored data:



OpenStack Australia Day 2016 - Andrew Hatfield, Red Hat: The Future of Cloud Software Defined Storage


The Ceph Storage Cluster
A Ceph storage cluster is a heterogeneous group of compute and storage resources (bare-metal servers, virtual machines and even Docker instances), often called Ceph nodes, where each member of the cluster works as either a monitor (MON) or an object storage device (OSD). Ceph clients use the storage cluster to store their data directly as RADOS objects or through virtualized resources like RBDs and other interfaces.
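
As a rough illustration: a minimal test cluster typically runs three MON daemons (an odd number, so the monitors can maintain quorum) and one OSD daemon per data disk, spread across at least three hosts so that 3-way replication can place each copy on a different machine.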

Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar


Windows and OpenStack - What's New in Windows Server 2016



OpenStack is getting big in the enterprise, which is traditionally very Microsoft-centric. This session will show you everything you need to know about Windows in OpenStack! To begin with, we will show how to provision Windows images for OpenStack, including Windows Server 2012 R2, Windows 7, 8.1 and the brand-new Windows Server 2016 Nano Server, for KVM, Hyper-V and ESXi Nova hosts.

Next, we will show how to deploy Windows workloads with Active Directory, SQL Server, SharePoint, Exchange using Heat templates, Juju, Puppet and more.



Last but not least, we'll talk about Active Directory integration in Keystone, Hyper-V deployment, and Windows bare-metal support in Ironic and MaaS. The session will give you a comprehensive view of how well OpenStack and Windows can be integrated, along with a great interoperability story with Linux workloads.

Exploring Nano Server for Windows Server 2016 with Jeffrey Snover



For More Information:

http://superuser.openstack.org/

https://news.microsoft.com/2010/10/22/openstack-is-now-open-for-windows-server/#sm.0001a9okhkyn3dd0uql216jjob0ut

https://www.openstack.org/summit/tokyo-2015/videos/presentation/windows-in-openstack

http://superuser.openstack.org/articles/getting-hyper-v-and-openstack-set-up-quickly/

http://superuser.openstack.org/articles/how-to-introduce-openstack-in-your-organization/

https://blogs.msdn.microsoft.com/virtual_pc_guy/2015/08/25/getting-hyper-v-and-openstack-setup-quickly/

https://www.suse.com/newsroom/post/2016/suse-wins-best-software-defined-solution-for-openstack-cloud-and-ceph-storage-offerings/

http://www.nextplatform.com/2016/05/05/mashing-openstack-hyperconverged-storage/

http://www.stratoscale.com/blog/stratoscale-labs/building-a-hyper-converged-openstack-cloud-with-stratoscale/

http://ceph.com/ceph-storage/

http://thenewstack.io/software-defined-storage-ceph-way/

22 September 2016

IBM Power Systems for Big Data and Analytics


IBM Linux Servers Designed to Accelerate Artificial Intelligence, Deep Learning and Advanced Analytics

New IBM POWER8 Chip with NVIDIA NVLink(TM) Enables Data Movement 5x Faster than Any Competing Platform
Systems Deliver Average of 80% More Performance Per Dollar than Latest x86-Based Servers(1)
Expanded Linux Server Lineup Leverages OpenPOWER Innovations

A quick introduction to the IBM Power System S822LC from the IBM Client Center Montpellier


A major achievement stemming from open collaboration is the new IBM Power System S822LC for High Performance Computing server.

IBM Linux on Power Big Data Solutions


IBM Data Engine for Hadoop and Spark – Power Systems Edition

With more and more intelligent and interconnected devices and systems, the data companies are collecting is growing at unprecedented rates. As much as 90% of that data is unstructured, coming from social media, electronic documents, machine data, connected devices, etc., and growing at rates as high as 50% per year. This is big data.

Extracting insights from big data can make your business more agile, more competitive and provide insights that, in the past, were beyond reach. The emergence of recent technologies such as the real-time analytics processing capabilities of stream computing, high speed in-memory analytics using Apache Spark and the massive MapReduce scale-out capabilities of Hadoop® has opened the door to a world of possibilities. This has also created the need for robust infrastructures that combine computing power, memory and data bandwidth to process and move large quantities of data -- fast.

Understanding the IBM Power Systems Advantage


Based on this need, the IBM Power System S812LC was used to design a solution to create a big data environment built on a heritage of strong resiliency, availability and security -- the IBM Data Engine for Hadoop and Spark - Power Systems Edition.

With a data-centric design, this Linux-based solution offers a tightly-integrated and performance-optimized infrastructure for in-memory Spark and MapReduce-based Hadoop big data workloads. The IBM Data Engine for Hadoop and Spark can be tailored specifically to meet your Big Data workloads by using a simple building block approach to match the mix of memory, networking and storage to application requirements. This approach gives you the best possible infrastructure for your big data workload.

POWER8 Scale-Out: Massive Bandwidth


With a vision for enhanced bandwidth, IBM POWER8 has achieved vast improvements in latency, two and a half times better memory performance, and a lot more.
POWER8 offers 32 channels of DDR memory funneling into the POWER8 processor. This is two times the 16-channel capacity of POWER7, and four times the eight-channel capacity of most competitors.

Move Up to Power8 with Scale Out Servers


The result of a depth and breadth of innovation focused on optimizing for data centers, while increasing efficiency and lowering infrastructure cost, the POWER8 bandwidth contributes to a better system that does more while making technology leadership attainable for customers.

Each POWER8 socket supports up to 1 TB of DRAM in the initial server configurations, yielding 2 TB capacity scale-out systems and 16 TB capacity enterprise systems, and supports up to 230 GB per second of sustained memory bandwidth per socket.
POWER8 is the first processor designed for big data, with massive parallelism and bandwidth for real-time results; coupled with IBM DB2 with BLU Acceleration and Cognos analytics software, POWER8 far outpaces industry-standard options, delivering insights up to 82x faster.

Far more than a function of size, sophisticated innovations in the POWER8 memory organization are designed to enhance both reliability and performance. Key among the innovations:
  • Up to eight high-speed channels, each running at up to 9.6 GHz, for up to 230 GB/s of sustained bandwidth
  • Up to 32 total DDR ports, yielding 410 GB/s peak at the DRAM
  • Up to 1 TB of memory capacity per fully configured processor socket
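
As a rough consistency check on those figures (assuming DDR3-1600 DIMMs, which the initial POWER8 servers used): 32 DDR ports at 12.8 GB/s each gives the 410 GB/s peak at the DRAM, and 230 GB/s sustained across eight channels works out to roughly 28.8 GB/s per memory channel.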

Big Data’s Big Memory requirements call for nothing less than the industry’s most innovative, scalable, and massive bandwidth and capacity. POWER8 thrives on the kinds of complexities that your organization faces in the current environment, with a platform to keep you ahead of the game as unforeseen challenges and opportunities emerge.





Features and benefits

A comprehensive, fully integrated cluster that is designed for ease of procurement, deployment, and operation. It includes all required components for Big Data applications, including servers, network, storage, operating system, management software, Hadoop and Spark software, and runtime libraries.

An application optimized configuration. The configuration of the cluster is carefully designed to optimize application performance and reduce total cost of ownership. The cluster is integrated with IBM Platform™ Cluster Manager, IBM Open Platform with Apache Hadoop and Spark and optionally IBM Spectrum Scale and IBM Spectrum Symphony which include advanced capabilities for storage and resource optimization. This optimized configuration enables users to show results more quickly.
Power S812LC delivers 2.3x better performance per dollar spent for Spark workloads(1)

Advanced technology for performance and robustness. The hardware and software components in this infrastructure are customizable to allow the best performance or the best price/performance ratio.

Big data clusters can start out small and grow as the demands from line of business increase. Choosing an infrastructure that can scale to handle these demands is vital to meeting service level agreements and continuing access to insights. Organizations must also consider the maintenance required. Smart businesses choose Power Systems because they know Power Systems is built for big data workloads that demand high performance and high reliability.


Analytics solutions
Unlock the value of data with an IT infrastructure that provides speed and availability to deliver accelerated insights to the people and processes that need them.

IBM Data Engine for Analytics - Power Systems Edition
A customized infrastructure solution with integrated software optimized for both big data and analytics workloads.

Co-Design Architecture for Exascale


IBM POWER8 as an HPC platform


The State of Linux Containers



IBM Data Engine for NoSQL – Power Systems Edition
Unique technology from IBM delivers dramatic reductions in the cost of large NoSQL databases.

SAP HANA benefits from the enterprise capabilities of Power Systems
SAP HANA runs on all POWER8 servers. Power Systems Solution Editions for SAP HANA BW are easy to order and tailored for quick deployment and rapid-time-to value, while offering flexibility to meet individual client demands.

DB2 with BLU Acceleration on Power Systems
Enable faster insights using analytics queries and reports from data stored in any data warehouse, with a dynamic in-memory columnar solution.

IBM Solution for Analytics – Power Systems Edition
This flexible integrated solution for faster insights includes options for business intelligence and predictive analytics with in-memory data warehouse acceleration.

IBM Data Engine for Hadoop and Spark – Power Systems Edition
A fully integrated Hadoop and Spark solution optimized to simplify and accelerate unstructured big data analytics.

OpenPOWER Update


IBM PureData System for Operational Analytics
Easily deploy, optimize and manage data intensive workloads for operational analytics with an expert integrated system.

IBM DB2 Web Query for i
Help ensure every decision maker across the organization can easily find, analyze and share the information needed to make better, faster decisions.



OpenPOWER Roadmap Toward CORAL


The Quantum Effect: HPC without FLOPS



More Information:

http://www.ibm.com/analytics/us/en/technology/advanced-analytics/

http://www.dataversity.net/ibm-linux-servers-designed-accelerate-artificial-intelligence-deep-learning/

http://www.predictiveanalyticstoday.com/bigdata-platforms-bigdata-analytics-software/

http://www-03.ibm.com/software/products/en/category/bigdata

http://www-03.ibm.com/systems/power/hardware/hpc/

http://www-03.ibm.com/systems/power/solutions/bigdata-analytics/

https://www.ibm.com/big-data/us/en/technology/

https://www.ibm.com/developerworks/community/blogs/f0f3cd83-63c2-4744-9021-9ff31e7004a9/entry/Big_Data_workloads_on_IBM_Power_Systems?lang=en

http://marketrealist.com/2015/03/ibm-continues-lure-investors-share-buybacks/

http://www.ibmbigdatahub.com/around-the-web

http://www.predictiveanalyticstoday.com

https://www-01.ibm.com/marketing/iwm/iwm/web/signup.do?source=stg-web&S_PKG=ov37920

https://www.ibm.com/developerworks/community/blogs/f0f3cd83-63c2-4744-9021-9ff31e7004a9/entry/The_Bandwidth_of_POWER8_is_Critical_for_Big_Data_Workloads_like_Spark?lang=en

https://www-01.ibm.com/marketing/iwm/iwm/web/signup.do?source=stg-web&S_PKG=ov27123&S_CMP=web-ibm-po-_-rcp-hwlinux

http://www.nextplatform.com/2016/04/07/ibm-unfolds-power-chip-roadmap-past-2020/

http://www.nextplatform.com/2015/08/10/ibm-roadmap-extends-power-chips-to-2020-and-beyond/

http://www.nextplatform.com/2016/08/24/big-blue-aims-sky-power9/

http://www.digitaltrends.com/computing/ibm-power9-server-processor-architecture-revealed-hot-chips-28/#/9

21 August 2016

Why Cortana Analytics Suite




Cortana Analytics Suite (CAS), what can it do for you


Microsoft introduced the Cortana Analytics Suite (CAS) in July 2015, at the Worldwide Partner Conference in Orlando. Want to learn more? Then read on.

Cortana Analytics Suite


When Microsoft first announced CAS, it touted the suite as an integrated set of cloud-based services that vaguely promised to be “a huge differentiator for any business.” The suite would be available through a simple monthly subscription and be customizable to fit the needs of different organizations. The company planned to make CAS available that coming fall.

Two months later, Microsoft hosted the first-ever Cortana Analytics Workshop, a gathering of techies that would provide participants with a chance to learn about Microsoft’s advanced analytics vision. The workshop appeared to represent the suite’s official launch.

Microsoft Envision | Impactful analytics using the Cortana Intelligence Suite with EY


At some point during the build-up, Microsoft also set up a slick new website dedicated to the CAS vision (https://www.microsoft.com/en-us/server-cloud/cortana-analytics-suite/). The website featured rolling graphics with stylized icons, and large bold headlines that emphasized the suite’s imminent importance. Cortana Analytics, it would seem, had officially arrived.

As we can see from the above architecture diagram, the following are the key pillars of the Cortana Intelligence Suite:



Information Management: Consists of services which enable us to capture incoming data from various sources, including streaming data from sensors, devices, and other IoT systems; manage the various data sources that are part of the enterprise's data analytics ecosystem; and orchestrate and build end-to-end flows that perform data processing and data preparation operations.
Big Data Stores: Consists of services which enable us to store and manage large scale data. In other words, enables us to store and manage big data. These services offer high degree of elasticity, high processing power, and high throughput with great performance.
Machine Learning and Analytics: Consists of services which enable us to perform advanced analytics, build predictive models, and apply machine learning algorithms on large scale data.  Allows us to perform data analysis on large scale data of different variety using programming languages like R and Python.
Dashboards and Visualizations: Consists of services which enable us to build reports and dashboards to view the insights. It primarily consists of Power BI which allows us to build highly interactive visually appealing reports and dashboards. Apart from this, other tools like SQL Server Reporting Services (SSRS), Excel, etc. can also be used to connect to data from some of these services in Cortana Intelligence Suite.
Intelligence: Consists of advanced intelligence services which enable us to build smart interactive services using advanced text, speech, and other recognition systems.

  • “Take action ahead of your competitors by going beyond looking in the rear-view mirror to predicting what’s next.”
  • “Get closer to your customers. Infer their needs through their interaction with natural user interfaces.”
  • “Get things done with Cortana in more helpful, proactive, and natural ways.”

Modern Data Warehousing with the Microsoft Analytics Platform System

Cortana Intelligence Suite Highlights

Here are the highlights of Cortana Intelligence Suite:

  • A fully managed big data and advanced analytics suite that enables businesses to transform data into intelligent actions.
  • An excellent offering perfectly suited for handling modern day data sources, data formats, and data volumes to gain valuable insights.
  • Offers various preconfigured solutions like Forecasting, Churn, Recommendations, etc.
  • Apart from the big data and analytical services, Cortana Intelligence Suite also includes some of the advanced intelligence services - Cortana, Bot Framework, and Cognitive Services.
  • Contains services to capture the data from a variety of data sources, process and integrate the data, perform advanced analytics, visualize and collaborate, and gain intelligence out of it.
  • Offers all the benefits of Cloud Computing like scale, elasticity, and pay-as-you-go model, etc.

Microsoft Envision | Running a data driven company

Use Cases for the Cortana Intelligence Suite

The Cortana Intelligence Suite can address data challenges in various industries, enabling them to transform their data into intelligent actions and helping them be more proactive in the day-to-day operational aspects of the business. Of the various industries where the Cortana Intelligence Suite can be used, here are a few.

Financial Services: Monitor transactions as they happen in near real-time. Based on analysis of historical data and historical anomalies/trends, the Cortana Intelligence Suite can apply complex machine learning algorithms and predictive models to flag potentially fraudulent transactions and help businesses prevent such transactions in the future, thereby protecting customers' valuable money. The financial services sector is vast, and the Cortana Intelligence Suite can be used in various scenarios, including credit/debit card fraud, electronic transfer fraud, phishing attempts to steal confidential customer data, etc.
Retail: Cortana Intelligence Suite can be used across the Retail Industry in various scenarios including optimizing availability by forecasting demand, enabling businesses to ensure the right products in the right location at the right time. There are numerous use cases in the retail industry and Cortana Intelligence Suite can be used in conjunction with IoT systems. For instance, with the help of sensors (Beacon Technology) we can detect when a customer enters a retail store and based the data that we have in the database about that customer, we can offer them targeted discounts based on customer's demographics, past purchase history, what the customer has been browsing online (this is where bringing in the data from outside the enterprise comes into picture as discussed in this tip on Introduction to Big Data), and other relevant information which can help understand the customer's preferences.
Healthcare: There are various scenarios in Healthcare where the Cortana Intelligence Suite can be used. Historical data on the utilization of various resources (Rooms, Beds, Other Equipment, etc.) and manpower (Doctors, Nurses, general staff, etc.) can be analyzed to predict the future demand thereby enabling the hospitals to mobilize and optimize the resources and manpower accordingly. Historical patient data can be analyzed in conjunction with weather data to identify the patterns and potential illness that might be caused during particular seasons and help the authorities take preventive measures.
Manufacturing: By constantly monitoring the equipment and collecting the data over time, probability of issues occurring can be predicted and accordingly a maintenance schedule can be defined to prevent the potential issues which if occur can hamper the production and day-to-day operations leading to unhappy customers, loss of business, and increased operational costs. Cortana Intelligence Suite fits very well in this scenario and enables end to end data collection, monitoring, alerting, and to take proactive actions/decisions.
Public Sector: There are various areas in the public sector where Cortana Intelligence Suite can be used to improve the overall operational efficiency including Public Transport, Power Grids, Water Supplies, and a lot more. By monitoring the usage of resources in various areas, we can identify the patterns in the usage, predict and forecast the demand, and accordingly ensure the supply so that there is neither shortage nor a waste of resources thereby improving the overall operational efficiency and happy customers.
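To make the fraud scenario above a little more concrete, here is a minimal sketch of unsupervised anomaly scoring over transaction features using scikit-learn's IsolationForest. The feature choice, sample values, and contamination rate are assumptions for illustration, not a production fraud model.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Toy transaction features: [amount, hour_of_day, distance_from_home_km].
    # In practice these would come from the historical transaction store.
    history = np.array([
        [25.0, 12, 2.0], [40.0, 18, 5.0], [12.5, 9, 1.0],
        [60.0, 20, 8.0], [33.0, 14, 3.5], [18.0, 11, 2.2],
    ])

    # Fit on historical behavior; contamination is the assumed anomaly rate.
    model = IsolationForest(contamination=0.01, random_state=42)
    model.fit(history)

    # Score an incoming transaction as it arrives.
    incoming = np.array([[950.0, 3, 410.0]])  # large amount, 3 a.m., far from home
    if model.predict(incoming)[0] == -1:      # -1 means the model flags an outlier
        print("flag for review:", incoming[0])

In a real deployment the scoring step would sit behind a streaming pipeline (events arriving through an ingestion service), with flagged transactions routed to analysts or blocked automatically.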

Microsoft Envision | ZAP presents: It’s all about the data --big, small, or diverse



The above is just a glimpse of the scenarios in each of those sectors; there are many more in each. Beyond these, Cortana Intelligence Suite can be used in countless other sectors such as Education, Insurance, Marketing, Hospitality, Aviation, Research, and so on.

The Azure side of Cortana Analytics Suite



When it comes to the individual Azure services, we can often find more concrete information than we can with Cortana Analytics. That’s not to say we won’t run into the same type of marketing clutter, but we can usually find details that are a bit more specific (even if it means going outside of Microsoft). What we don’t find are many references to Cortana Analytics, although that doesn’t prevent us from building the types of solutions that the CAS marketing material likes to show off.



The first group of CAS-related services has to do with storing and processing large sets of data:

Azure SQL Data Warehouse : A database service that can distribute workloads across multiple compute nodes in order to process large volumes of relational and non-relational data. The service uses Microsoft's massively parallel processing (MPP) architecture, along with advanced query optimizers, making it possible to scale out and parallelize complex SQL queries.
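As a sketch of how an application might query the warehouse from Python, the following assumes a pyodbc connection; the driver, server, database, credentials, and table name are all placeholders.

    import pyodbc  # ODBC-based access to SQL Data Warehouse

    # Placeholder connection values: substitute your own server and credentials.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 13 for SQL Server};"
        "SERVER=yourserver.database.windows.net;"
        "DATABASE=yourdw;UID=youruser;PWD=yourpassword"
    )

    cursor = conn.cursor()
    # A hypothetical sales table; the MPP engine parallelizes the aggregation
    # across compute nodes.
    cursor.execute(
        "SELECT region, SUM(amount) AS total "
        "FROM dbo.Sales GROUP BY region"
    )
    for region, total in cursor.fetchall():
        print(region, total)
    conn.close()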

Azure Data Lake Store: A scalable storage repository for data of any size, type, or ingestion speed, regardless of where it originates. The repository exposes an interface compatible with the Hadoop Distributed File System (HDFS) and offers effectively unlimited storage without restricting file sizes or data volumes.

Azure Data Lake Store is actually part of a larger unit that Microsoft refers to as Azure Data Lake. Not only does it include Data Lake Store, but also Data Lake Analytics and HDInsight, both of which share the CAS label. You can find additional information about the Data Lake services in the Simple-Talk article Azure Data Lake.
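For a feel of how an application touches the store, here is a minimal sketch using the azure-datalake-store Python package; the tenant, client credentials, store name, and paths are placeholders.

    from azure.datalake.store import core, lib, multithread

    # Placeholder Azure AD service-principal credentials.
    token = lib.auth(tenant_id="your-tenant-id",
                     client_id="your-client-id",
                     client_secret="your-client-secret")

    # Connect to the store by name and browse it like a file system.
    adls = core.AzureDLFileSystem(token, store_name="yourdatalakestore")
    print(adls.ls("/"))

    # Upload a local file; the store does not restrict file sizes.
    multithread.ADLUploader(adls, lpath="local_data.csv",
                            rpath="/raw/local_data.csv")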

The next category of services that fall under the CAS umbrella focus on data management:

Azure Data Factory : A data integration service that uses data flow pipelines to manage and automate the movement and transformation of data. Data Factory orchestrates other services, making it possible to ingest data from on-premises and cloud-based sources, and then transform, analyze, and publish the data. Users can monitor the pipelines from a single unified view.
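Conceptually, a Data Factory (v1) pipeline is authored as a JSON document describing a set of activities over input and output datasets. The sketch below shows that general shape as a Python dict; the pipeline, dataset, and activity names are purely illustrative.

    # Illustrative only: the general shape of a Data Factory v1 pipeline
    # definition (normally authored as JSON), expressed as a Python dict.
    pipeline = {
        "name": "CopySalesPipeline",             # hypothetical pipeline name
        "properties": {
            "activities": [{
                "name": "CopyBlobToSqlDw",
                "type": "Copy",                  # built-in copy activity
                "inputs":  [{"name": "BlobSalesDataset"}],
                "outputs": [{"name": "SqlDwSalesDataset"}],
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink":   {"type": "SqlDWSink"}
                }
            }],
            # The active window during which the pipeline's slices run.
            "start": "2016-12-01T00:00:00Z",
            "end":   "2016-12-31T00:00:00Z"
        }
    }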

Azure Data Catalog : A system for registering enterprise data sources, understanding the data in those sources, and consuming the data. The data remains in its location, but the metadata is copied to the catalog, where it is indexed for easy discovery. In addition, data professionals can contribute their knowledge in order to enrich the source metadata.

Azure Event Hubs : An event processing service that can ingest millions of events per second and make them available for storage and analysis. The service can log events in near real time and accept data from a wide range of sources. Event Hubs uses technologies that support low latency and high availability, while providing flexible throttling, authentication, and scalability.
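To make the ingestion side concrete, here is a minimal sketch of publishing telemetry events from Python. It assumes a recent version of the azure-eventhub package; the connection string, hub name, and event payloads are placeholders.

    from azure.eventhub import EventHubProducerClient, EventData

    # Placeholder connection string and hub name.
    producer = EventHubProducerClient.from_connection_string(
        conn_str="Endpoint=sb://yournamespace.servicebus.windows.net/;"
                 "SharedAccessKeyName=send;SharedAccessKey=yourkey",
        eventhub_name="telemetry",
    )

    # Batch events so a single send can carry many readings.
    batch = producer.create_batch()
    batch.add(EventData('{"deviceId": "sensor-01", "temperature": 21.7}'))
    batch.add(EventData('{"deviceId": "sensor-02", "temperature": 19.4}'))
    producer.send_batch(batch)
    producer.close()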

Microsoft Envision | Advantage YOU: Be more, do more, with Infosys and Microsoft on your side


For more information about Event Hubs, refer to the Simple-Talk article Azure Event Hubs. In the meantime, here’s a quick overview of the analytic components included in the CAS package:

Azure Machine Learning : A service for building, deploying, and sharing predictive analytic solutions. The service runs predictive models that learn from existing data, making it possible to forecast future behavior and trends. Machine Learning also provides the tools necessary for testing and managing the models, as well as deploying them as web services.

Azure Data Lake Analytics : A distributed service for analyzing data of any size, including what is in Data Lake Store. Data Lake Analytics is built on Apache YARN, an application management framework for processing data in Hadoop clusters. Data Lake Analytics also supports U-SQL, a new language that Microsoft developed for writing scalable, distributed queries that analyze data.

Azure HDInsight : A fully managed Hadoop cluster service that supports a wide range of analytic engines, including Spark, Storm, and HBase. Microsoft has updated the service to take advantage of Data Lake Store and to maximize security, scalability, and throughput.

Azure Stream Analytics : A service that supports complex event processing over streaming data. Stream Analytics can handle millions of events per second from a variety of sources, while correlating them across multiple streams. It can ingest events in real time, whether from one data stream or many.
I’ve already mentioned how Data Lake Analytics and HDInsight are part of Azure Data Lake, and I’ve pointed you to a related article. If you want to learn more about Stream Analytics, check out the Simple-Talk article Microsoft Azure Stream Analytics.
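For a feel of how a deployed Machine Learning model is consumed, here is a minimal sketch of calling an Azure ML request-response (RRS) web service endpoint over REST. The endpoint URL, API key, and input columns are placeholders for whatever your own published experiment exposes.

    import requests

    # Placeholders: copy these from your published web service's API help page.
    ENDPOINT = "https://your-region.services.azureml.net/workspaces/your-workspace/services/your-service/execute?api-version=2.0"
    API_KEY = "your-api-key"

    # The input schema mirrors the columns your experiment expects.
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["tenure", "monthly_spend"],
                "Values": [["24", "79.5"]],
            }
        },
        "GlobalParameters": {},
    }

    resp = requests.post(
        ENDPOINT,
        json=payload,
        headers={"Authorization": "Bearer " + API_KEY},
    )
    resp.raise_for_status()
    print(resp.json())  # scored labels/probabilities come back as JSON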

Azure Stream Analytics

Cortana Analytics Gallery

Another interesting component of the CAS package is the Cortana Analytics Gallery, formerly the Azure Machine Learning Gallery. The gallery provides an online environment for data scientists and developers to share their solutions, particularly those related to machine learning. Microsoft also publishes its own solutions to the site for participants to consume.

The Cortana Analytics Gallery is divided into the following six sections.

Solution Templates : Templates based on industry-specific partner solutions. Currently, the category includes only the Vehicle Telemetry Analytics solution, published by Microsoft this past December. The solution demonstrates how those in the automobile industry can gain real-time and predictive insights into vehicle health and driving habits.

Experiments : Predictive analytic experiments contributed by Microsoft and by members of the data science community. The experiments demonstrate advanced machine learning techniques and can be used as a starting point for developing your own solutions. For example, the Telco Customer Churn experiment uses classification algorithms to predict whether a customer will churn.

Machine Learning APIs : APIs that provide access to operationalized predictive analytic solutions. Some of the APIs are referenced within the “Perceptual intelligence” section listed in the table above. For example, the Face APIs were published by Microsoft and are part of Microsoft Project Oxford. They provide state-of-the-art algorithms for processing face images. A minimal sketch of calling such an API appears after this list.

Notebooks : A collection of Jupyter notebooks. The notebooks are integrated within Machine Learning Studio and serve as web applications for running code, visualizing data, and trying out ideas. For example, the notebook Topic Discovery in Twitter Tweets demonstrates how a Jupyter notebook can be used for mining Twitter text.

Tutorials : Tutorials on how to use Cortana Analytics to solve real-world problems. For example, the iPhone app for RRS tutorial describes how to create an iOS app that can consume an Azure ML RRS API using the Xamarin development software that ships with Visual Studio.

Collections : A way to group experiments, templates, APIs, and other items within the Cortana Analytics Gallery.
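As a hedged illustration of consuming one of these APIs, the sketch below calls a face-detection REST endpoint with Python's requests library. The endpoint URL, subscription key, and image URL are placeholders, and the exact URL has changed over time as Project Oxford became Cognitive Services.

    import requests

    # Placeholders: the detection endpoint and key come from your subscription.
    FACE_ENDPOINT = "https://api.projectoxford.ai/face/v1.0/detect"
    SUBSCRIPTION_KEY = "your-subscription-key"

    resp = requests.post(
        FACE_ENDPOINT,
        params={"returnFaceAttributes": "age,gender"},
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": "application/json",
        },
        json={"url": "https://example.com/photo.jpg"},  # hypothetical image URL
    )
    resp.raise_for_status()
    # The service returns one JSON object per detected face.
    for face in resp.json():
        print(face["faceRectangle"], face.get("faceAttributes"))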
Although Microsoft has changed the name of the gallery to make it more CAS-friendly, much of the content still focuses on the Machine Learning service. Even so, the gallery could prove to be a valuable resource for organizations jumping aboard the CAS train, particularly once the gallery has gained more momentum.

Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Part 1


Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Part 2

More Information: