19 December 2017

IBM Big Data Platform

What is big data?

Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data.

Enterprise Data Warehouse Optimization: 7 Keys to Success

Big data spans three dimensions: Volume, Velocity and Variety.

Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even petabytes—of information.

Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
Convert 350 billion annual meter readings to better predict power consumption

Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.

Scrutinize 5 million trade events created each day to identify potential fraud
Analyze 500 million daily call detail records in real-time to predict customer churn faster

Variety: Big data is any type of data - structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.

Overview - IBM Big Data Platform

Monitor 100’s of live video feeds from surveillance cameras to target points of interest
Exploit the 80% data growth in images, video and documents to improve customer satisfaction

Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile, and to answer questions that were previously considered beyond your reach. Until now, there was no practical way to harvest this opportunity. Today, IBM’s platform for big data uses state of the art technologies including patented advanced analytics to open the door to a world of possibilities.

IBM big data platform

Data Science Experience: Build SQL queries with Apache Spark

Do you have a big data strategy? IBM does. We’d like to share our know-how with you to help your enterprise solve its big data challenges.

IBM is unique in having developed an enterprise class big data platform that allows you to address the full spectrum of big data business challenges.

The platform blends traditional technologies that are well suited for structured, repeatable tasks together with complementary new technologies that address speed and flexibility and are ideal for adhoc data exploration, discovery and unstructured analysis.
IBM’s integrated big data platform has four core capabilities: Hadoop-based analytics, stream computing, data warehousing, and information integration and governance.

Fig. 1 - IBM big data platform

The core capabilities are:

Hadoop-based analytics: Processes and analyzes any data type across commodity server clusters.
Stream Computing: Drives continuous analysis of massive volumes of streaming data with sub-millisecond response times.
Data Warehousing: Delivers deep operational insight with advanced in-database analytics.
Information Integration and Governance: Allows you to understand, cleanse, transform, govern and deliver trusted information to your critical business initiatives.

Delight Clients with Data Science on the IBM Integrated Analytics System

Supporting Platform Services:

Visualization & Discovery: Helps end users explore large, complex data sets.
Application Development: Streamlines the process of developing big data applications.
Systems Management: Monitors and manages big data systems for secure and optimized performance.
Accelerators: Speeds time to value with analytical and industry-specific modules.

IBM DB2 analytics accelerator on IBM integrated analytics system technical overview

How Big Data and Predictive Analytics are revolutionizing AML and Financial Crime Detection

Big data in action

What types of business problems can a big data platform help you address? There are multiple uses for big data in every industry – from analyzing larger volumes of data than was previously possible to drive more precise answers, to analyzing data in motion to capture opportunities that were previously lost. A big data platform will enable your organization to tackle complex problems that previously could not be solved.

Big data = Big Return on Investment (ROI)

While there is a lot of buzz about big data in the market, it isn’t hype. Plenty of customers are seeing tangible ROI using IBM solutions to address their big data challenges:

Healthcare: 20% decrease in patient mortality by analyzing streaming patient data
Telco: 92% decrease in processing time by analyzing networking and call data
Utilities: 99% improved accuracy in placing power generation resources by analyzing 2.8 petabytes of untapped data

IBM’s big data platform is helping enterprises across all industries. IBM understands the business challenges and dynamics of your industry and we can help you make the most of all your information.

The Analytic Platform behind IBM’s Watson Data Platform - Big Data

When companies can analyze ALL of their available data, rather than a subset, they gain a powerful advantage over their competition. IBM has the technology and the expertise to apply big data solutions in a way that addresses your specific business problems and delivers rapid return on investment.

The data stored in the cloud environment is organized into repositories. These repositories may be hosted on different data platforms (such as a database server, Hadoop, or a NoSQL data platform) that are tuned to support the types of analytics workload that is accessing the data.

What’s new in predictive analytics: IBM SPSS and IBM decision optimization

The data that is stored in the repositories may come from legacy, new, and streaming sources, enterprise applications, enterprise data, cleansed and reference data, as well as output from streaming analytics.

Breaching the 100TB Mark with SQL Over Hadoop

Types of data repositories include:

  • Catalog: Results from discovery and IT data curation create a consolidated view of information that is reflected in a catalog. The introduction of big data increases the need for catalogs that describe what data is stored, its classification, ownership, and related information governance definitions. From this catalog, you can control the usage of the data.
  • Data virtualization:Agile approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data
  • Landing, exploration, and archive: Allows for large datasets to be stored, explored, and augmented using a wide variety of tools since massive and unstructured datasets may mean that it is no longer feasible to design the data set before entering any data. Data may be used for archival purposes with improved availability and resiliency thanks to multiple copies distributed across commodity storage.

SparkR Best Practices for R Data Scientists
  • Deep analytics and modeling: The application of statistical models to yield information from large data sets comprised of both unstructured and semi-structured elements. Deep analysis involves precisely targeted and complex queries with results measured in petabytes and exabytes. Requirements for real-time or near-real-time responses are becoming more common.
  • Interactive analysis and reporting: Tools to answer business and operations questions over Internet-scale data sets. Tools also use popular spreadsheet interfaces for self-service data access and visualization. APIs implemented by data repositories allow output to be efficiently consumed by applications.
  • Data warehousing: Populates relational databases that are designed for building a correlated view of business operation. A data warehouse usually contains historical and summary data derived from transaction data but can also integrate data from other sources. Warehouses typically store subject-oriented, non-volatile, time-series data used for corporate decision-making. Workloads are query intensive, accessing millions of records to facilitate scans, joins, and aggregations. Query throughput and response times are generally a priority.

IBM Power leading Cognitive Systems

IBM offers a wide variety of offerings for consideration in building data repositories:
  • InfoSphere Information Governance Catalog maintains a repository to support the catalog of the data lake. This repository can be accessed through APIs and can be used to understand and analyze the types of data stored in the other data repositories.
  • IBM InfoSphere Federation Server creates consolidated information views of your data to support key business processes and decisions.
  • IBM BigInsights for Apache Hadoop delivers key capabilities to accelerate the time to value for a data science team, which includes business analysts, data architects, and data scientists.
  • IBM PureData™ System for Analytics, powered by Netezza technology, is changing the game for data warehouse appliances by unlocking data's true potential. The new IBM PureData System for Analytics is an integral part of a logical data warehouse.
  • IBM Analytics for Apache Spark is a fully-managed Spark service that can help simplify advanced analytics and speed development.
  • IBM BLU Acceleration® is a revolutionary, simple-to-use, in-memory technology that is designed for high-performance analytics and data-intensive reporting.
  • IBM PureData System for Operational Analytics is an expert integrated data system optimized specifically for the demands of an operational analytics workload. A complete solution for operational analytics, the system provides both the simplicity of an appliance and the flexibility of a custom solution.

IBM Big Data Analytics Concepts and Use Cases

Bluemix offers a wide variety of services for data repositories:

  • BigInsights for Apache Hadoop provisions enterprise-scale, multi-node big data clusters on the IBM SoftLayer cloud. Once provisioned, these clusters can be managed and accessed from this same service.

Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical platform
  • Cloudant® NoSQL Database is a NoSQL Database as a Service (DBaaS). It's built from the ground up to scale globally, run non-stop, and handle a wide variety of data types like JSON, full-text, and geospatial. Cloudant NoSQL DB is an operational data store optimized to handle concurrent reads and writes and provide high availability and data durability.
  • dashDB™ stores relational data, including special types such as geospatial data. Then analyze that data with SQL or advanced built-in analytics like predictive analytics and data mining, analytics with R, and geospatial analytics. You can leverage the in-memory database technology to use both columnar and row-based tables. The dashDB web console handles common data management tasks, such as loading data, and analytics tasks like running queries and R scripts.

IBM BigInsights: Smart Analytics for Big Data

IBM product support for big data and analytics solutions in the cloud

Now that we've reviewed the component model for a big data and analytics solution in the cloud, let's look at how IBM products can be used to implement a big data and analytics solution. In previous sections, we highlighted IBM's end-to-end solution for deploying a big data and analytics solution in cloud.
The figure below shows how IBM products map to specific components in the reference architecture.

Figure 5. IBM product mapping

Ml, AI and IBM Watson - 101 for Business

IBM product support for data lakes using cloud architecture capabilities

The following images show how IBM products can be used to implement a data lake solution. In previous sections, we highlighted IBM's end-to-end solution for deploying data lake solutions using cloud computing.

Benefits of Transferring Real-Time Data to Hadoop at Scale

Mapping on-premises and SoftLayer products to specific capabilities

Figure 7 shows how IBM products can be used to run a data lake in the cloud.

Figure 7. IBM product mapping for a data lake using cloud computing

What is Big Data University?

Big Data Scotland 2017

Big Data Scotland is an annual data analytics conference held in Scotland. Run by DIGIT in association with The Data Lab, it is free for delegates to attend. The conference is geared towards senior technologists and business leaders and aims to provide a unique forum for knowledge exchange, discussion and cross-pollination.

The programme will explore the evolution of data analytics; looking at key tools and techniques and how these can be applied to deliver practical insight and value. Presentations will span a wide array of topics from Data Wrangling and Visualisation to AI, Chatbots and Industry 4.0.


More Information:













0 reacties:

Post a Comment