• IBM Consulting

    DBA Consulting can help you with IBM BI and web-related work. IBM Linux is also part of our portfolio.

  • Oracle Consulting

    For Oracle-related consulting, database work, support, and migration, call DBA Consulting.

  • Novell/RedHat Consulting

    For all Novell SUSE Linux and SAP on SUSE Linux questions related to the OS and BI solutions. And of course also for the great Red Hat products such as Red Hat Enterprise Linux, JBoss middleware, and BI on Red Hat.

  • Microsoft Consulting

    For consulting services related to Windows Server 2012 onwards, Windows 7 and higher on the client, and Microsoft cloud services (Azure, Office 365, etc.).

  • Citrix Consulting

    Citrix VDI-in-a-Box, desktop virtualization, and Citrix NetScaler security.

  • Web Development

    Web development: static websites, CMS websites (Drupal 7/8, WordPress, Joomla), and responsive and adaptive websites.

27 April 2017

Microsoft touts SQL Server 2017 as 'first RDBMS with built-in AI'

The 2017 Microsoft Product Roadmap

Many key Microsoft products reached significant milestones in 2016, with next-gen versions of SharePoint Server, SQL Server and Windows Server all being rolled out alongside major updates to the Dynamics portfolio and, of course, Windows. This year's product roadmap looks to be a bit less crowded, though major changes are on tap for Microsoft's productivity solutions, while Windows 10 is poised for another landmark update. Here's what to watch for in the coming months.

With a constantly changing and increasingly diversifying IT landscape, particularly in terms of heterogeneous operating systems (Linux, Windows, etc.), IT organizations must contend with multiple data types, different development languages, and a mix of on-premises/cloud/hybrid environments, and somehow simultaneously reduce operational costs. To enable you to choose the best platform for your data and applications, SQL Server is bringing its world-class RDBMS to Linux and Windows with SQL Server v.Next.

You will learn more about the SQL Server on Linux offering and how it provides a broader range of choice for all organizations, not just those who want to run SQL on Windows. It enables SQL Server to run in more private, public, and hybrid cloud ecosystems, to be used by developers regardless of programming languages, frameworks or tools, and further empowers ‘every person and every organization on the planet to achieve more.’

Bootcamp 2017 - SQL Server on Linux

Learn More about:

  • What’s next for SQL Server on Linux
  • The Evolution and Power of SQL Server 2016
  • Enabling DevOps practices such as Dev/Test and CI/CD with containers
  • What is new with SQL Server 2016 SP1: Enterprise class features in every edition
  • How to determine which SQL Server edition to deploy based on operation need, not feature set

SQL Server on Linux: High Availability and security on Linux

Why Microsoft for your operational database management system?

When it comes to the systems you choose for managing your data, you want performance and security that won't get in the way of running your business. As an industry leader in operational database management systems (ODBMS), Microsoft continuously improves its offerings to help you get the most out of your ever-expanding data world.

Read Gartner’s assessment of the ODBMS landscape and learn about the Microsoft "cloud first" strategy. In its latest Magic Quadrant report for ODBMS, Gartner positioned the Microsoft DBMS furthest in completeness of vision and highest in ability to execute. (See the Gartner reprint for SQL Server 2017.)

Top Features Coming to SQL Server 2017
From Python to adaptive query optimization to the many cloud-focused changes (not to mention Linux!), Joey D'Antoni takes you through the major changes coming to SQL Server 2017.

Top three capabilities to get excited about in the next version of SQL Server

Microsoft announced the first public preview of SQL Server v.Next in November 2016, and since then we’ve had lots of customer interest, but a few key scenarios are generating the most discussion.

If you’d like to learn more about SQL Server v.Next on Linux and Windows, please join us for the upcoming Microsoft Data Amp online event on April 19 at 8 AM Pacific. It will showcase how data is the nexus between application innovation and intelligence—how data and analytics powered by the most trusted and intelligent cloud can help companies differentiate and out-innovate their competition.

In this blog, we discuss three top things that customers are excited to do with the next version of SQL Server.

1. Scenario 1: Give applications the power of SQL Server on the platform of your choice

With the upcoming availability of SQL Server v.Next on Linux, Windows, and Docker, customers will have the added flexibility to build and deploy more of their applications on SQL Server. In addition to Windows Server and Windows 10, SQL Server v.Next supports Red Hat Enterprise Linux (RHEL), Ubuntu, and SUSE Linux Enterprise Server (SLES). SQL Server v.Next also runs in Linux and Windows Docker containers, opening up even more possibilities to run on public and private cloud application platforms like Kubernetes, OpenShift, Docker Swarm, Mesosphere DC/OS, Azure Stack, and OpenStack. Customers will be able to continue to leverage existing tools, talent, and resources for more of their applications.

Some of the things customers are planning for SQL Server v.Next on Windows, Linux, and Docker include migrating existing applications from other databases on Linux to SQL Server; implementing new DevOps processes using Docker containers; developing locally on the dev machine of choice, including Windows, Linux, and macOS; and building new applications on SQL Server that can run anywhere—on Windows, Linux, or Docker containers, on-premises, and in the cloud.

SQL Server on Linux - March 2017

2. Scenario 2: Faster performance with minimal effort

SQL Server v.Next further expands the use cases supported by SQL Server’s in-memory capabilities, In-Memory OLTP and In-Memory Columnstore. These capabilities can be combined on a single table, delivering the best Hybrid Transactional and Analytical Processing (HTAP) performance available in any database system. Both in-memory capabilities can yield performance improvements of more than 30x, making it possible to perform analytics in real time on operational data.

In v.Next, natively compiled stored procedures (In-Memory OLTP) now support JSON data as well as new query capabilities. For the columnstore, both building and rebuilding a nonclustered columnstore index can now be done online. Another critical addition to the columnstore is support for LOBs (Large Objects).
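To make these two additions concrete, here is a minimal T-SQL sketch. The table, index, and procedure names are hypothetical, and a natively compiled procedure also assumes the database already has a memory-optimized filegroup:

```sql
-- Rebuild a nonclustered columnstore index online (an online rebuild
-- is one of the v.Next columnstore additions described above):
ALTER INDEX ncci_Orders ON dbo.Orders
    REBUILD WITH (ONLINE = ON);

-- A natively compiled stored procedure (In-Memory OLTP) that now
-- may work with JSON data via functions such as JSON_VALUE:
CREATE PROCEDURE dbo.GetOrderTotal
    @orderInfo NVARCHAR(MAX)
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    SELECT CAST(JSON_VALUE(@orderInfo, '$.total') AS DECIMAL(10,2)) AS OrderTotal;
END;
```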

SQL Server on Linux 2017

With these additions, the parts of an application that can benefit from the extreme performance of SQL Server’s in-memory capabilities have been greatly expanded! We also introduced a new set of features that learn and adapt from an application’s query patterns over time without requiring actions from your DBA.

3. Scenario 3: Scale out your analytics

In preparation for the release of SQL Server v.Next, we are enabling the same High Availability (HA) and Disaster Recovery (DR) solutions on all platforms supported by SQL Server, including Windows and Linux. Always On Availability Groups is SQL Server’s flagship solution for HA and DR. Microsoft has released a preview of Always On Availability Groups for Linux in SQL Server v.Next Community Technology Preview (CTP) 1.3.

SQL Server Always On availability groups can have up to eight readable secondary replicas. Each of these secondary replicas can have replicas of its own. Daisy-chained together, these readable replicas can provide massive scale-out for analytics workloads. This scale-out scenario enables you to replicate around the globe, keeping read replicas close to your business analytics users. It is of particular interest to users with large data warehouse implementations. And it is also easy to set up.

In fact, you can now create availability groups that span Windows and Linux nodes, and scale out your analytics workloads across multiple operating systems.
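As a rough illustration of such a read scale-out availability group, here is a hedged T-SQL sketch (server, endpoint, and database names are hypothetical; CLUSTER_TYPE = NONE creates a cluster-less group suited to read scale-out rather than automatic failover):

```sql
-- Sketch: a cluster-less availability group spanning a Windows node
-- and a Linux node, used for read scale-out of analytics queries.
CREATE AVAILABILITY GROUP [ag_analytics]
    WITH (CLUSTER_TYPE = NONE)
    FOR DATABASE [SalesDb]
    REPLICA ON
        N'winnode1' WITH (
            ENDPOINT_URL = N'tcp://winnode1:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)),
        N'linuxnode1' WITH (
            ENDPOINT_URL = N'tcp://linuxnode1:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL,
            SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL));
```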

In addition, a cross-platform availability group can be used to migrate a database from SQL Server on Windows to Linux or vice versa with minimal downtime. You can learn more about SQL Server HA and DR on Linux by reading the blog post “SQL Server on Linux: Mission-critical HADR with Always On Availability Groups.”

To find out more, you can watch our SQL Server on Linux webcast. Find instructions for acquiring and installing SQL Server v.Next on the operating system of your choice at www.microsoft.com/sqlserveronlinux. To get your SQL Server app on Linux faster, you can nominate your app for the SQL Server on Linux Early Adopter Program (EAP). Sign up now to see if your application qualifies for technical support, workload validation, and help moving your application to production on Linux before general availability.

To find out more about SQL Server v.Next and get all the latest announcements, register now to attend Microsoft Data Amp, where data gets to work.

Microsoft announced the name and many of the new features in the next release of SQL Server at its Data Amp Virtual Event on Wednesday. While SQL Server 2017 may not have as comprehensive a feature set as SQL Server 2016, there is still some big news and some very interesting new features. The reason for this is simple -- the development cycle for SQL Server 2017 is much shorter than the SQL Server 2016 development cycle. The big news at Wednesday's event was the release of SQL Server 2017 later this year on both Windows and Linux operating systems.

Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL Server 2016 and R Services

I was able to quickly download the latest Linux release on Docker and have it up and running on my Mac during today's briefing. (I have previously written about the Linux release here.) That speed to development is one of the major benefits of Docker that Microsoft hopes developers will leverage when building new applications. Docker is just one of many open source trends we have seen Microsoft adopt in recent years with SQL Server. Wednesday's soft launch not only introduced SQL Server on Linux, but also included Python support, a new graph engine, and a myriad of other features.

First R, Now Python
One of the major features of SQL Server 2016 was the integration of R, an open source statistical analysis language, into the SQL Server database engine. Users can use the sp_execute_external_script stored procedure to run R code that takes advantage of parallelism in the database engine. Savvy users of this procedure might notice the first parameter of this stored procedure is @language. Microsoft designed this stored procedure to be open-ended, and now adds Python as the second language that it supports. Python combines powerful scripting with eminent readability and is broadly used by IT admins, developers, data scientists, and data analysts. Additionally, Python can leverage external statistical packages to perform data manipulation and statistical analysis. When you combine this capability with Transact-SQL (T-SQL), the result is powerful.
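The call pattern looks like this; the column name is hypothetical, but sp_execute_external_script and its @language parameter are the documented entry point for both R and Python:

```sql
-- Run Python in-database via the same open-ended stored procedure
-- that SQL Server 2016 introduced for R. Requires external scripts
-- to be enabled on the instance.
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
# InputDataSet arrives as a pandas DataFrame; whatever is assigned
# to OutputDataSet is returned to SQL Server as a result set.
OutputDataSet = InputDataSet
',
    @input_data_1 = N'SELECT 1 AS sample_value'
WITH RESULT SETS ((sample_value INT));
```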

SQL Server 2017: Advanced Analytics with Python
In this session you will learn how SQL Server 2017 takes in-database analytics to the next level with support for both Python and R; delivering unparalleled scalability and speed with new deep learning algorithms built in. Download SQL Server 2017: https://aka.ms/sqlserver17linuxyt

Big Changes to the Cloud
It is rare for a Microsoft launch event to omit news about cloud services, and Wednesday's event was no exception. Microsoft Azure SQL Database (formerly known as SQL Azure), which is the company's Database as a Service offering, has always lacked complete compatibility with the on-premises (or in an Azure VM) version of SQL Server. Over time, compatibility has gotten much better, but there are still gaps such as unsupported features like SQL CLR and cross-database query.

SQL Server 2017: Security on Linux

The new solution to this problem is a hybrid Platform as a Service (PaaS)/Infrastructure as a Service (IaaS) solution that is currently called Azure Managed Instances. Just as with Azure SQL Database, the Managed Instances administrator is not responsible for OS and patching operations. However, the Managed Instances solution supports many features and functions that are not currently supported in SQL Database. One such feature is the cross-database query capability. In an on-premises environment, multiple databases commonly exist on the same instance, and a single query can reference separate databases by using database.schema.table notation. In SQL Database, it is not possible to reference multiple databases in one query, which has blocked many migrations to the platform because of the amount of code that would have to be rewritten. Support for cross-database queries in Managed Instances simplifies the process of migrating applications to Azure PaaS offerings, and should thereby increase the number of independent software vendor (ISV) applications that can run in PaaS.
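This is the kind of query that works on-premises (and, per the above, in Managed Instances) but not in Azure SQL Database; the database, schema, and table names here are purely illustrative:

```sql
-- Cross-database query using three-part (database.schema.table) names:
-- two databases on the same instance joined in a single statement.
SELECT o.OrderId, c.CustomerName
FROM   SalesDb.dbo.Orders    AS o
JOIN   CrmDb.dbo.Customers   AS c
       ON c.CustomerId = o.CustomerId;
```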

SQL Server 2017: HA and DR on Linux

SQL Server 2017: Adaptive Query Processing

Microsoft also showcased some of the data protection features in Azure SQL Database that are now generally available. Azure SQL Database Threat Detection detects SQL Injection, potential SQL Injection vulnerabilities, and anomalous login monitoring. This can simply be turned on at the SQL Database level by enabling auditing and configuring notifications. The administrator is then notified when the threat detection engine detects any anomalous behavior.

Graph Database
One of the things I was happiest to see in SQL Server 2017 was the introduction of a graph database within the core database engine. Despite the name, relational databases struggle to manage relationships between data objects. The simplest example of this struggle is hierarchy management. In a classic relational structure, an organizational chart can be a challenge to model -- who does the CEO report to? With graph database support in SQL Server, the concepts of nodes and edges are introduced. Nodes represent entities, edges represent relationships between any two given nodes, and both nodes and edges can be associated with data properties. SQL Server 2017 also adds extensions to the T-SQL language to support join-less queries that use matching to return related values.
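The org-chart example above can be sketched with the new graph extensions like this (table and column names are hypothetical; AS NODE, AS EDGE, and MATCH are the SQL Server 2017 graph syntax):

```sql
-- Nodes represent entities; edges represent relationships.
CREATE TABLE dbo.Person (
    PersonId INT PRIMARY KEY,
    Name     NVARCHAR(100)
) AS NODE;

CREATE TABLE dbo.ReportsTo AS EDGE;

-- Join-less query: who does Alice report to?
SELECT boss.Name
FROM   dbo.Person    AS emp,
       dbo.ReportsTo AS r,
       dbo.Person    AS boss
WHERE  MATCH(emp-(r)->boss)
  AND  emp.Name = N'Alice';
```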

SQL Server 2017: Building applications using graph data
Graph extensions in SQL Server 2017 will facilitate users in linking different pieces of connected data to help gather powerful insights and increase operational agility. Graphs are well suited for applications where relationships are important, such as fraud detection, risk management, social networks, recommendation engines, predictive analysis, dependence analysis, and IoT applications. In this session we will demonstrate how you can use SQL Graph extensions to build your application using graph data. Download SQL Server 2017: Now on Windows, Linux, and Docker https://www.microsoft.com/en-us/sql-server/sql-server-vnext-including-Linux

Graph databases are especially useful in Internet of Things (IoT), social network, recommendation engine, and predictive analytics applications. It should be noted that many vendors have been investing in graph solutions in recent years. Besides Microsoft, IBM and SAP have also released graph database features in recent years.

Adaptive Query Plans
One of the biggest challenges for a DBA is managing system performance over time. As data changes, the query optimizer generates new execution plans, which at times might be less than optimal. With Adaptive Query Optimization in SQL Server 2017, SQL Server can evaluate the runtime of a query and compare the current execution to the query's history, building on some of the technology introduced in the Query Store feature in SQL Server 2016. For the next run of the same query, Adaptive Query Optimization can then improve the execution plan.
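As far as I understand it, these adaptive behaviors are tied to the new database compatibility level rather than to any per-query setting, so opting in is a one-line change (database name hypothetical):

```sql
-- SQL Server 2017's adaptive query processing features are enabled
-- under the new database compatibility level (140).
ALTER DATABASE [SalesDb] SET COMPATIBILITY_LEVEL = 140;
```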

Because a change to an execution plan that is based on one slow execution can have a dramatically damaging effect on system performance, the changes made by Adaptive Query Optimization are incremental and conservative. Over time, this feature handles the tuning that a busy DBA may not have time to perform. This feature also benefits from Microsoft's management of Azure SQL Database, because the development team monitors the execution data and the improvements that adaptive execution plans make in the cloud. They can then optimize the process and flow for adaptive execution plans in future versions of the on-premises product.

Are You a Business Intelligence Pro?
SQL Server includes much more than the database engine. Tools like Reporting Services (SSRS) and Analysis Services (SSAS) have long been a core part of the value proposition of SQL Server. Reporting Services benefited from a big overhaul in SQL Server 2016, and more improvements are coming in SQL Server 2017 with on-premises support for storing Power BI reports in an SSRS instance. This capability is big news for organizations that are cloud-averse for various reasons. In addition, SQL Server 2017 adds support for Power Query data sources in SSAS tabular models. This capability means tabular models can store data from a broader range of sources than they currently support, such as Azure Blob Storage and web page data.

2017 OWASP SanFran March Meetup - Hacking SQL Server on Scale with PowerShell

And More...
Although it is only an incremental release, Microsoft has packed a lot of functionality into SQL Server 2017. I barely mentioned Linux in this article for a reason: From a database perspective SQL Server on Linux is simply SQL Server. Certainly, there are some changes in infrastructure, but your development experience in SQL Server, whether on Linux, Windows or Docker, is exactly the same.

Keep Your Environment Always On with SQL Server 2016 - SQLBits 2017

From my perspective, the exciting news is not just the new features that are in this version, but also the groundwork for feature enhancements down the road. Adaptive query optimization will get better over time, as will the graph database feature which you can query by using standard SQL syntax. Furthermore, the enhancements to Azure SQL Database with managed instances should allow more organizations to consider adoption of the database as a service option. In general, I am impressed with Microsoft's ability to push the envelope on database technology so shortly after releasing SQL Server 2016.

Nordic Infrastructure Conference 2017 - SQL Server on Linux Overview

You can get started with the CTP by downloading the package for Docker (https://hub.docker.com/r/microsoft/mssql-server-windows/) or Linux (https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup-red-hat), or you can download the Windows release at https://www.microsoft.com/evalcenter/evaluate-sql-server-vnext-ctp.


27 March 2017

IBM Power9 CPU: a Game Changer

IBM Power 9 CPU

IBM is looking to take a bigger slice out of Intel’s lucrative server business with Power9, the company’s latest and greatest processor for the datacenter. Scheduled for initial release in 2017, the Power9 promises more cores and a hefty performance boost compared to its Power8 predecessor. The new chip was described at the Hot Chips event.

IBM Power9 CPU

The Power9 will end up in IBM’s own servers, and if the OpenPower gods are smiling, in servers built by other system vendors. Although none of these systems have been described in any detail, we already know that bushels of IBM Power9 chips will end up in Summit and Sierra, two 100-plus-petaflop supercomputers that the US Department of Energy will deploy in 2017-2018. In both cases, most of the FLOPS will be supplied by NVIDIA Volta GPUs, which will operate alongside IBM’s processors.

Power 9 Processor For The Cognitive Era

The Power9 will be offered in two flavors: one for single- or dual-socket servers in regular clusters, and the other for NUMA servers with four or more sockets, supporting much larger amounts of shared memory. IBM refers to the dual-socket version as the scale-out (SO) design and the multi-socket version as the scale-up (SU) design. They basically correspond to the Xeon E5 (EP) and Xeon E7 (EX) processor lines, although Intel is apparently going to unify those lines post-Broadwell.

The SU Power9 is aimed at mission-critical enterprise work and other applications where large amounts of shared memory are desired. It has extra RAS features and buffered memory, and will tend to have fewer cores running at faster clock rates. As such, it carries on many of the traditions of the Power architecture through Power8. The SU Power9 will be released in 2018, well after the SO version hits the streets.

The SO Power9 is going after the Xeon dual-socket server market in a more straightforward manner. These chips will use direct attached memory (DDR4) with commodity DIMMs, instead of the buffered memory setup mentioned above. In general, this processor will adhere to commodity packaging so that Power9-based servers can utilize industry standard componentry. This is the platform destined for large cloud infrastructure and general enterprise computing, as well as HPC setups. It’s due for release sometime next year.

Distilling out the differences between the two varieties, here are the basics of the new Power9 (Power8 specs in parentheses for comparison):

  • 8 billion transistors (4.2 billion)
  • Up to 24 cores (Up to 12 cores)
  • Manufactured using 14nm FinFET (22nm SOI)
  • Supports PCIe Gen4 (PCIe Gen3)
  • 120 MB shared L3 cache (96 MB shared L3 cache)
  • 4-way and 8-way simultaneous multithreading (8-way simultaneous multithreading)
  • Memory bandwidth of 120 or 230 GB/sec (230 GB/sec)

From the looks of things, IBM spent most of the extra transistor budget it got from the 14nm shrink on extra cores and a little bit more L3 cache. New on-chip data links were also added, with an aggregate bandwidth of 7 TB/sec, which is used to feed each core at the rate of 256 GB/sec in a 12-core configuration. The bandwidth fans out in the other direction to supply data to memory, additional Power9 sockets, PCIe devices, and accelerators. Speaking of which, there is special support for NVIDIA GPUs in the form of NVLink 2.0 support, which promises much faster communication speeds than vanilla PCIe. An enhanced CAPI interface is also supported for accelerators that support that standard.

The accelerator story is one of the key themes of the Power9, which IBM is touting as “the premier platform for accelerated computing.” In that sense, IBM is taking a different tack than Intel, which is bringing accelerator technology on-chip and making discrete products out of them, as it has done with Xeon Phi and is in the process of doing with Altera FPGAs. By contrast, IBM has settled on the host-coprocessor model of acceleration, which offloads special-purpose processing to external devices. This has the advantage of flexibility; the Power9 can connect to virtually any type of accelerator or special-purpose coprocessor as long as it speaks PCIe, CAPI, or NVLink.

Understanding the IBM Power Systems Advantage

Thus the Power9 sticks with an essentially general-purpose design. As a standalone processor it is designed for mainstream datacenter applications (assuming that phrase has meaning anymore). From the perspective of floating point performance, it is about 50 percent faster than Power8, but that doesn’t make it an HPC chip, and in fact, even a mid-range Broadwell Xeon (E5-2600 V4) would likely outrun a high-end Power9 processor on Linpack. Which is fine. That’s what the GPUs and NVLink support are for.

IBM Power Systems Update 1Q17

If there is any deviation from the general-purpose theme, it’s in the direction of data-intensive workloads, especially analytics, business intelligence, and the broad category of “cognitive computing” that IBM is so fond of talking about. Here the Power processors have had something of a historical advantage in that they offered much higher memory bandwidth than their Xeon counterparts; in fact, about two to four times higher. The SO Power9 supports 120 GB/sec of memory bandwidth; the SU version, 230 GB/sec. The Power9 also comes with a very large (120 MB) L3 cache, which is built with eDRAM technology that supports speeds of up to 256 GB/sec. All of which serves to greatly lessen the memory bottleneck for data-intensive applications.

IBM Power Systems Announcement Update

According to IBM, Power9 was about 2.2 times faster for graph analytics workloads and about 1.9 times faster for business intelligence workloads. That’s on a per socket basis, comparing a 12-core Power9 to that of a 12-core Power8 at the same 4GHz clock frequency. Which is a pretty impressive performance bump from one generation to the next, although it should be pointed out that IBM offered no comparisons against the latest Broadwell Xeon chips.

The official Power roadmap from IBM does not say much in terms of timing, but thanks to the “Summit” and “Sierra” supercomputers that IBM, Nvidia, and Mellanox Technologies are building for the U.S. Department of Energy, we knew Power9 was coming out in late 2017. Here is the official Power processor roadmap from late last year:

And here is the updated one from the OpenPower Foundation that shows how compute and networking technologies will be aligned:

IBM revealed that the Power9 SO chip will be etched in the 14 nanometer process from Globalfoundries and will have 24 cores, which is a big leap for Big Blue.

That doubling of cores in the Power9 SO is a big jump for IBM, but not unprecedented. IBM made a big jump from two cores in the Power6 and Power6+ generations to eight cores with the Power7 and Power7+ generations, and we have always thought that IBM wanted to do a process shrink and get to four cores on the Power6+ and that something went wrong. IBM ended up double-stuffing processor sockets with the Power6+, which gave it an effective four-core chip. It did the same thing with certain Power5+ machines and Power7+ machines, too.

The other big change with the Power9 SO chip is that IBM is going to allow the memory controllers on the die to reach out directly and control external DDR4 main memory rather than have to work through the “Centaur” memory buffer chip that is used with the Power8 chips. This memory buffering has allowed for very high memory bandwidth and a large number of memory slots as well as an L4 cache for the processors, but it is a hassle for entry systems designs and overkill for machines with one or two sockets. Hence, it is being dropped.

The Power9 SU processor, which will be used in IBM’s own high-end NUMA machines with four or more sockets, will be sticking with the buffered memory. IBM has not revealed what the core count will be on the Power9 SU chip, but when we suggested that, based on the performance needs and thermal profiles of big iron, this chip would probably have fewer cores, possibly more cache, and higher clock speeds, McCredie said these were all reasonable and good guesses without confirming anything about future products.

Linux on Power

The Power9 chips will sport an enhanced NVLink interconnect (which we think will have more bandwidth and lower latency but not more aggregate ports on the CPUs or GPUs than is available on the Power8), and we think it is possible that the Power9 SU will not have NVLink ports at all. (Although we could make a case for having a big NUMA system with lots and lots of GPUs hanging off of it using lots of NVLink ports instead of using an InfiniBand interconnect to link multiple nodes in a cluster together.)

The Power9 chips with SMT8 cores are aimed at analytics workloads that wrestle with lots of data, in terms of both capacity and throughput. The 24-core variant of the Power9 with SMT8 has 512 KB of L2 cache per core, and its 120 MB of L3 cache is shared across the die in 10 MB segments, one per pair of cores. The on-chip switch fabric can move data in and out of the L3 cache at 256 GB/sec. Add in the various interconnects: the memory controllers, the PCI-Express 4.0 controllers, the “Bluelink” 25 Gb/sec ports that attach accelerators to the processors and underpin the NVLink 2.0 protocol coming to next year’s “Volta” GV100 GPUs from Nvidia, and IBM’s own remote SMP links for creating NUMA clusters with more than four sockets, and you have an on-chip fabric with over 7 TB/sec of aggregate bandwidth.

The Power9 chips will have 48 lanes of PCI-Express 4.0 peripheral I/O per socket, for an aggregate of 192 GB/sec of duplex bandwidth. In addition to this, the chip will support 48 lanes of 25 Gb/sec Bluelink bandwidth for other connectivity, with an aggregate bandwidth of 300 GB/sec. On the Power9 SU chips, 48 of the 25 Gb/sec lanes will be used for remote SMP links between quad-socket nodes to make a 16-socket machine, and the remaining 48 lanes of PCI-Express 4.0 will be used for PCI-Express peripherals and CAPI 2.0 accelerators. The Power9 chip has integrated 16 Gb/sec SMP links for gluelessly making the four-socket modules. In addition to the CAPI 2.0 coherent links running atop PCI-Express 4.0, there is a further enhanced CAPI protocol that runs atop the 25 Gb/sec Bluelink ports. It is much more streamlined, and we think it is akin to NVM-Express for flash running over PCI-Express in that it eliminates a lot of protocol overhead from the PCI-Express bus. But that is just a hunch. It doesn’t look like the big bad boxes will be able to support these new CAPI or NVLink ports, by the way, since the Bluelink ports are eaten by NUMA expansion.
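Those aggregate figures can be sanity-checked from the per-lane rates (assuming the commonly used round figure of 2 GB/sec per PCI-Express 4.0 lane per direction, which ignores 128b/130b encoding overhead):

```latex
% PCI-Express 4.0 peripheral I/O: 48 lanes at ~2 GB/s per direction
48 \,\text{lanes} \times 2\,\mathrm{GB/s} \times 2 \;(\text{duplex}) \approx 192\,\mathrm{GB/s}

% Bluelink: 48 lanes at 25 Gb/s per direction
48 \,\text{lanes} \times 25\,\mathrm{Gb/s} = 1200\,\mathrm{Gb/s}
  = 150\,\mathrm{GB/s} \;\text{per direction}
  \Rightarrow 300\,\mathrm{GB/s} \;\text{aggregate (duplex)}
```

Both results match the 192 GB/sec and 300 GB/sec figures quoted above.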


21 February 2017

Why Cloudera's Hadoop and Oracle?

Oracle 12c & Hadoop: Optimal Store and Process of Big Data

How to use the Hadoop ecosystem tools to extract data from an Oracle 12c database, use the Hadoop framework to process and transform the data, and then load the processed data back into an Oracle 12c database.

Oracle big data appliance and solutions

This blog covers basic concepts:

  • What is Big Data? Big Data refers to data sets so large that a single machine cannot store and process them. The data comes in different formats (structured and unstructured), from different sources, and grows with great velocity. 
  • What is Apache Hadoop? It is a framework that allows distributed processing of large data sets across many (potentially thousands of) machines. The concepts behind Hadoop were first introduced in Google's papers on MapReduce and the Google File System. The Hadoop framework consists of HDFS and MapReduce. 
  • What is HDFS? HDFS (Hadoop Distributed File System): the Hadoop File System that enables storing large data sets across multiple machines. 
  • What is MapReduce? The data processing component of the Hadoop framework, which consists of a Map phase and a Reduce phase. 
  • What is Apache Sqoop? Apache Sqoop(TM) is a tool to transfer bulk data between Apache Hadoop and structured data stores such as relational databases. It is part of the Hadoop ecosystem. 
  • What is Apache Hive? Hive is a tool to query and manage large datasets stored in Hadoop HDFS. It is also part of the Hadoop ecosystem. 
  • Where Does Hadoop Fit In? We will use the Apache Hadoop ecosystem (Apache Sqoop) to extract data from an Oracle 12c database and store it in the Hadoop Distributed File System (HDFS). We will then use Apache Hive to transform the data and process it with MapReduce (Java programs can do the same). Finally, Apache Sqoop will be used to load the processed data back into an Oracle 12c database. The following image describes where Hadoop fits in the process. This scenario represents a practical solution for processing big data coming from an Oracle database as a source; the only condition is that the source data must be structured. Note that Hadoop can also process unstructured data such as videos and log files.
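The transform step in the middle of that pipeline can be sketched in HiveQL. All table, column, and path names here are hypothetical, and the sketch assumes Sqoop has already imported the Oracle table into HDFS as comma-delimited files:

```sql
-- Expose the Sqoop-imported files in HDFS as a Hive table.
CREATE EXTERNAL TABLE emp_raw (
    emp_id   INT,
    dept_id  INT,
    salary   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/emp';

-- Transform/aggregate; Hive compiles this into MapReduce jobs.
-- The result table can then be exported back to Oracle 12c with Sqoop.
CREATE TABLE dept_salary AS
SELECT dept_id, AVG(salary) AS avg_salary
FROM   emp_raw
GROUP  BY dept_id;
```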

Why Cloudera + Oracle?
For over 38 years Oracle has been the market leader in RDBMS database systems and a major influencer of enterprise software and hardware technology. Besides leading the industry in database solutions, Oracle also develops tools for software development, enterprise resource planning, customer relationship management, supply chain management, business intelligence, and data warehousing. Cloudera has a long-standing relationship with Oracle and has worked closely with them to develop enterprise-class solutions that enable enterprise customers to quickly get up and running with big data workloads.
As the leader in Apache Hadoop-based data platforms, Cloudera has the enterprise quality and expertise that make them the right choice to work with on Oracle Big Data Appliance.
— Andy Mendelson, Senior Vice President, Oracle Server Technologies
Joint Solution Overview
Oracle Big Data Appliance
The Oracle Big Data Appliance is an engineered system optimized for acquiring, organizing, and loading unstructured data into Oracle Database 12c. The Oracle Big Data Appliance includes CDH, Oracle NoSQL Database, Oracle Data Integrator with Application Adapter for Apache Hadoop, Oracle Loader for Hadoop, an open source distribution of R, Oracle Linux, and Oracle Java HotSpot Virtual Machine.

Extending Hortonworks with Oracle's Big Data Platform

Oracle Big Data Discovery
Oracle Big Data Discovery is the visual face of Hadoop that allows anyone to find, explore, transform, and analyze data in Hadoop. Discover new insights, then share results with big data project teams and business stakeholders.

Oracle Big Data SQL Part 1-4

Oracle NoSQL Database
Oracle NoSQL Database Enterprise Edition is a distributed, highly scalable, key-value database. Unlike competitive solutions, Oracle NoSQL Database is easy-to-install, configure and manage, supports a broad set of workloads, and delivers enterprise-class reliability backed by enterprise-class Oracle support.

Oracle Data Integrator Enterprise Edition
Oracle Data Integrator Enterprise Edition is a comprehensive data integration platform that covers all data integration requirements: from high-volume, high-performance batch loads, to event-driven, trickle-feed integration processes. Oracle Data Integrator Enterprise Edition (ODI EE) provides native Cloudera integration, allowing the use of the Cloudera Hadoop cluster as the transformation engine for all data transformation needs. ODI EE utilizes Cloudera's foundation of Impala, Hive, HBase, Sqoop, Pig, Spark and many others to provide best-in-class performance and value. Oracle Data Integrator Enterprise Edition enhances productivity and provides a simple user interface for creating high-performance processes to load and transform data to and from Cloudera data stores.

Oracle Loader for Hadoop
Oracle Loader for Hadoop enables customers to use Hadoop MapReduce processing to create optimized data sets for efficient loading and analysis in Oracle Database 12c. Unlike other Hadoop loaders, it generates Oracle internal formats to load data faster and use less database system resources.

How the Oracle and Hortonworks Handle Petabytes of Data

Oracle R Enterprise
Oracle R Enterprise integrates the open-source statistical environment R with Oracle Database 12c. Analysts and statisticians can run existing R applications and use the R client directly against data stored in Oracle Database 12c, vastly increasing scalability, performance and security. The combination of Oracle Database 12c and R delivers an enterprise-ready deeply-integrated environment for advanced analytics.

Discover Data Insights and Build Rich Analytics with Oracle BI Cloud Service

Oracle NoSQL Database, Oracle Data Integrator Application Adapter for Hadoop, Oracle Loader for Hadoop, and Oracle R Enterprise will also be available as standalone software products, independent of the Oracle Big Data Appliance.

Learn More
Download details about the Oracle Big Data Appliance
Download the solution brief: Driving Innovation in Mobile Devices with Cloudera and Oracle

Oracle is the leader in developing software to address enterprise data management. Typically known as a database leader, they also develop and build tools for software development, enterprise resource planning, customer relationship management, supply chain management, business intelligence, and data warehousing. Cloudera has a long-standing relationship with Oracle and has worked closely with them to develop enterprise-class solutions that enable end customers to get up and running with big data more quickly.

IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle Big Data Discovery

The Oracle Big Data SQL product will be of interest to anyone who saw our series of posts a few weeks ago about the updated Oracle Information Management Reference Architecture, where Hadoop now sits alongside traditional Oracle data warehouses to provide what’s termed a “data reservoir”. In this type of architecture, Hadoop and its underlying technologies (HDFS, Hive and schema-on-read databases) provide an extension to the more structured relational Oracle data warehouses, making it possible to store and analyse much larger sets of data with much more diverse data types and structures. The issue that customers face when trying to implement this architecture is that Hadoop is a bit of a “wild west” in terms of data access methods, security and metadata, making it difficult for enterprises to come up with a consistent, over-arching data strategy that works for both types of data store.

Bringing Self Service Data Preparation to the Cloud; Oracle Big Data Preparation Cloud Services

Oracle Big Data SQL attempts to address this issue by providing a SQL access layer over Hadoop, managed by the Oracle database and integrated with the regular SQL engine within the database. Where it differs from SQL-on-Hadoop technologies such as Apache Hive and Cloudera Impala is that there’s a single unified data dictionary, a single Oracle SQL dialect, and the full management capabilities of the Oracle database over both sources, giving you the ability to define access controls over both sources and use full Oracle SQL (including analytic functions, complex joins and the like) without having to drop down into HiveQL or other Hadoop SQL dialects. Those of you who follow the blog or work with Oracle’s big data connector products probably know of a couple of current technologies that sound like this: Oracle Loader for Hadoop (OLH) is a bulk loader that copies Hive or HDFS data into an Oracle database, typically faster than a tool like Sqoop, whilst Oracle Direct Connector for HDFS (ODCH) gives the database the ability to define external tables over Hive or HDFS data, and then query that data using regular Oracle SQL.

Storytelling with Oracle Analytics Cloud

Where ODCH falls short is that it treats the HDFS and Hive data as a single stream, making it easy to read once but, like regular external tables, slow to access frequently, as there’s no ability to define indexes over the Hadoop data. OLH is also good, but you can only use it to bulk-load data into Oracle; you can’t use it to query data in place. Oracle Big Data SQL uses an approach similar to ODCH but, crucially, it uses some Exadata concepts to move processing down to the Hadoop cluster, just as Exadata moves processing down to the Exadata storage cells (so much so that the project was called “Project Exadoop” internally within Oracle up to the launch) - but this also means that it's Exadata only, and not available for Oracle Databases running on non-Exadata hardware.

As explained in the launch blog post by Oracle’s Dan McClary (https://blogs.oracle.com/datawarehousing/entry/oracle_big_data_sql_one), Oracle Big Data SQL includes components installed on the Hadoop cluster nodes that provide the same “SmartScan” functionality Exadata uses to reduce network traffic between storage servers and compute servers. In the case of Big Data SQL, this SmartScan functionality retrieves just the columns of data requested in the query (a process referred to as “column projection”) and sends back only those rows that satisfy the query predicate.

Unifying Metadata

To unify metadata for planning and executing SQL queries, we require a catalog of some sort.  What tables do I have?  What are their column names and types?  Are there special options defined on the tables?  Who can see which data in these tables?

Given the richness of the Oracle data dictionary, Oracle Big Data SQL unifies metadata using Oracle Database: specifically as external tables.  Tables in Hadoop or NoSQL databases are defined as external tables in Oracle.  This makes sense, given that the data is external to the DBMS.

Wait a minute, don't lots of vendors have external tables over HDFS, including Oracle?

Yes, but what Big Data SQL provides as an external table is uniquely designed to preserve the valuable characteristics of Hadoop. The difficulty with most external tables is that they are designed to work on flat, fixed-definition files, not on distributed data that is intended to be consumed through dynamically invoked readers. That both causes poor parallelism and removes the value of schema-on-read.

  The external tables Big Data SQL presents are different. They leverage the Hive metastore or user definitions to determine both parallelism and read semantics. That means that if a file in HDFS is 100 blocks, the Oracle database understands there are 100 units which can be read in parallel. If the data was stored in a SequenceFile using a binary SerDe, or as Parquet data, or as Avro, that is how the data is read. Big Data SQL uses the exact same InputFormat, RecordReader, and SerDe classes defined in the Hive metastore to read the data from HDFS.
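
The split-per-block idea can be sketched with local files and Python threads. This is a hypothetical illustration of the planning step only; `plan_splits`, `read_split`, and `parallel_read` are invented names, not Big Data SQL or HDFS APIs, and the block size is shrunk for the example:

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1024  # real HDFS blocks are typically 128 MB; shrunk for illustration

def plan_splits(path, block_size=BLOCK_SIZE):
    """Divide a file into block-sized (offset, length) splits, one per
    parallel reader - mirroring how a 100-block file yields 100 read units."""
    size = os.path.getsize(path)
    return [(offset, min(block_size, size - offset))
            for offset in range(0, size, block_size)]

def read_split(path, offset, length):
    """Read one split; in Big Data SQL this work runs next to the DataNode."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

def parallel_read(path):
    splits = plan_splits(path)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(read_split, path, o, n) for o, n in splits]
        return b"".join(f.result() for f in futures)
```

The key point is that the split plan comes from metadata (here, the file size; in Hadoop, the block map in the Hive metastore and NameNode), so the degree of parallelism is known before any data is read.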

Once that data is read, we need only join it with internal data, providing SQL across both Hadoop and the relational database.

Optimizing Performance

Being able to join data from Hadoop with Oracle Database is a feat in and of itself.  However, given the size of data in Hadoop, it ends up being a lot of data to shift around.  In order to optimize performance, we must take advantage of what each system can do.

In the days before data was officially Big, Oracle faced a similar challenge when optimizing Exadata, our then-new database appliance.  Since many databases are connected to shared storage, at some point database scan operations can become bound on the network between the storage and the database, or on the shared storage system itself.  The solution the group proposed was remarkably similar to much of the ethos that infuses MapReduce and Apache Spark: move the work to the data and minimize data movement.

The effect is striking: minimizing data movement by an order of magnitude often yields performance increases of an order of magnitude.

Big Data Analytics using Oracle Advanced Analytics 12c and Big Data SQL

Big Data SQL takes a play from both the Exadata and Hadoop books to optimize performance: it moves work to the data and radically minimizes data movement.  It does this via something we call Smart Scan for Hadoop.

Oracle Exadata X6: Technical Deep Dive - Architecture and Internals

Moving the work to the data is straightforward. Smart Scan for Hadoop introduces a new service into the Hadoop ecosystem, co-resident with HDFS DataNodes and YARN NodeManagers. Queries from the new external tables are sent to these services to ensure that reads are direct-path and data-local. Reading close to the data speeds up I/O, but minimizing data movement requires that Smart Scan do some things that are, well, smart.

Smart Scan for Hadoop

Consider this: most queries don't select all columns, and most queries have some kind of predicate on them. Moving unneeded columns and rows is, by definition, excess data movement, and it impedes performance. Smart Scan for Hadoop gets rid of this excess movement, which in turn radically improves performance.

For example, suppose we were querying a 100 TB set of JSON data stored in HDFS, but only cared about a few fields (email and status) and only wanted results from the state of Texas.
Once data is read from a DataNode, Smart Scan for Hadoop goes beyond just reading. It applies parsing functions to our JSON data and discards any documents which do not contain 'TX' for the state attribute. Then, for those documents which do match, it projects out only the email and status attributes to merge with the rest of the data. Rather than moving every field of every document, we're able to cut down 100s of TB to 100s of GB.
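
That filter-and-project step can be simulated in a few lines of Python. The field names (state, email, status) follow the example above; this is an illustration of the idea, not the actual Smart Scan implementation:

```python
import json

def smart_scan(raw_lines, state="TX", columns=("email", "status")):
    """Parse JSON records, discard rows that fail the predicate,
    and project only the requested columns before sending them on."""
    for line in raw_lines:
        doc = json.loads(line)
        if doc.get("state") != state:   # predicate filtering: drop non-TX rows
            continue
        yield {col: doc.get(col) for col in columns}  # column projection

raw = [
    '{"state": "TX", "email": "a@example.com", "status": "active", "notes": "..."}',
    '{"state": "CA", "email": "b@example.com", "status": "idle",   "notes": "..."}',
]
print(list(smart_scan(raw)))
# [{'email': 'a@example.com', 'status': 'active'}]
```

Only one slimmed-down row leaves the "DataNode" here; the other row and all unreferenced fields are dropped before any data moves, which is exactly the source of the TB-to-GB reduction described above.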

IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-Time and Predictive Analytics

The approach we take to optimizing performance with Big Data SQL makes Big Data much slimmer.

Data Reduction in the Database:

Oracle In-database MapReduce in 12c (big data)
There is some interest from the field about what the in-database MapReduce option is, and how and why it differs from a Hadoop solution.
I thought I would share my thoughts on it.

In-database MapReduce is an umbrella term that includes two features:

  • "SQL MapReduce", also known as "SQL pattern matching".
  • The in-database container for Hadoop, to be released in a future release.

"SQL MapReduce": Oracle Database 12c introduced a new feature called pattern matching, using the MATCH_RECOGNIZE clause in SQL. This is one of the latest ANSI SQL standards proposed and implemented by Oracle. The new SQL syntax helps to intuitively solve complex queries that are not easy to implement using 11g analytical functions alone. Some of the use cases are fraud detection, gene sequencing, time series calculations, stock ticker pattern matching, etc. I found that most of the use cases for Hadoop can be handled using MATCH_RECOGNIZE in the database on structured data. Since this is just a SQL enhancement, it is available in both Enterprise and Standard Edition databases.
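
MATCH_RECOGNIZE itself is SQL syntax, but the kind of row-pattern logic it expresses can be illustrated procedurally. The sketch below finds the classic "V" pattern (a strictly falling run of prices followed by a strictly rising run) in a ticker series; the data and function name are invented for illustration:

```python
def find_v_shapes(prices):
    """Find (start, bottom, end) index triples where prices strictly fall
    then strictly rise - the classic MATCH_RECOGNIZE 'V' ticker example."""
    matches, i, n = [], 0, len(prices)
    while i < n - 1:
        if prices[i + 1] < prices[i]:              # pattern start: first DOWN row
            start = i
            while i < n - 1 and prices[i + 1] < prices[i]:
                i += 1                             # DOWN+ : falling run
            bottom = i
            if i < n - 1 and prices[i + 1] > prices[i]:
                while i < n - 1 and prices[i + 1] > prices[i]:
                    i += 1                         # UP+ : rising run
                matches.append((start, bottom, i))
        else:
            i += 1
    return matches

ticker = [10, 8, 7, 9, 12, 11, 13]
print(find_v_shapes(ticker))  # [(0, 2, 4), (4, 5, 6)]
```

In SQL, the same intent is declared rather than coded: MATCH_RECOGNIZE lets you define DOWN and UP row classifiers and a `PATTERN (STRT DOWN+ UP+)` clause, and the database finds the matches for you.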

Big Data gets Real time with Oracle Fast Data

"In-database container for Hadoop (beta)": if your development team is more skilled at Hadoop than SQL, or you want to implement complex pre-packaged Hadoop algorithms, you could use the Oracle container for Hadoop (beta). It is a prototype of the Hadoop APIs that runs within the Java virtual machine in the database.

Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)

It implements the Hadoop Java APIs and interfaces with the database using parallel table functions to read data in parallel. One interesting fact about parallel table functions is that they can run in parallel across a RAC cluster and can also route data to specific parallel processes. This functionality is key to making Hadoop scale across clusters, and it has existed in the database for over 15 years. The advantages of in-database Hadoop are:

  • No need to move data out of the database to run MapReduce functions, which saves time and resources.
  • More real-time data can be used.
  • Fewer redundant copies of data, and hence better security and less disk space used.
  • The servers can be used not just for MapReduce work but also to run the database, making for better resource utilization.
  • The output of MapReduce is immediately available to analytic tools, and this functionality can be combined with database features like the in-memory option (beta) to get near real-time analysis of Big Data. 
  • Database features for security, backup, auditing, and performance can be combined with the MapReduce API.
  • The ability to stream the output of one parallel table function as input to the next avoids the need to maintain any intermediate stages.
  • Features like graph, text, spatial, and semantic analysis within the Oracle database can be used for further analysis.

In addition to this, Oracle 12c will support schema-less access using JSON. That will help NoSQL-style big data use cases run on data within the Oracle database as well.

These features help solve MapReduce challenges when the data is mostly within the database, reducing data movement and making better use of available resources.
If most of your data is outside the database, the Oracle SQL connectors for Hadoop and Oracle Loader for Hadoop can be used instead.


25 January 2017

IBM Predictive Analytics

About Big Data Analytics

What's New in IBM Predictive Analytics

The 5 V’s of Big Data

Too often in the hype and excitement around Big Data, the conversation gets complicated very quickly. Data scientists and technical experts bandy around terms like Hadoop, Pig, Mahout, and Sqoop, making us wonder if we’re talking about information architecture or a Dr. Seuss book. Business executives who want to leverage the value of Big Data analytics in their organisation can get lost amidst this highly-technical and rapidly-emerging ecosystem.

Overview - IBM Big Data Platform

In an effort to simplify Big Data, many experts have referenced the “3 V’s”: Volume, Velocity, and Variety. In other words, is information being generated at a high volume (e.g. terabytes per day), with a rapid rate of change, encompassing a broad range of sources including both structured and unstructured data? If the answer is yes, then it falls into the Big Data category, along with sensor data from the “internet of things”, log files, and social media streams. The ability to understand and manage these sources, and then integrate them into the larger Business Intelligence ecosystem, can provide previously unknown insights from data, and this understanding leads to the “4th V” of Big Data – Value.

There is a vast opportunity offered by Big Data technologies to discover new insights that drive significant business value. Industries are seeing data as a market differentiator and have started reinventing themselves as “data companies”, as they realise that information has become their biggest asset. This trend is prevalent in industries such as telecommunications, internet search, and marketing, which see their data as a key driver for monetisation and growth. Insights such as footfall traffic patterns from mobile devices have been used to assist city planners in designing more efficient traffic flows. Customer sentiment analysis through social media and call logs has given new insights into customer satisfaction. Network performance patterns have been analysed to discover new ways to drive efficiencies. Customer usage patterns based on web click-stream data have driven innovation for new products and services to increase revenue. The list goes on.

IBM predictive analytics with Apache Spark: Coding optional, possibilities endless

Key to success in any Big Data analytics initiative is to first identify the business needs and opportunities, and then select the proper fit-for-purpose platform. With the array of new Big Data technologies emerging at a rapid pace, many technologists are eager to be the first to test the latest Dr. Seuss-termed platform. But each technology has a unique specialisation, and might not be aligned to the business priorities. In fact, some identified use cases from the business might be best suited by existing technologies such as a data warehouse while others require a combination of existing technologies and new Big Data systems.

With this integration of disparate data systems comes the 5th V – Veracity, i.e. the correctness and accuracy of information.

Behind any information management practice lies the core doctrines of Data Quality, Data Governance, and Metadata Management, along with considerations for Privacy and Legal concerns.

Big Data & Analytics Architecture

Big Data needs to be integrated into the entire information landscape, not seen as a stand-alone effort or a stealth project done by a handful of Big Data experts.

Big data analytics is the use of advanced analytic techniques against very large, diverse data sets that include different types such as structured/unstructured and streaming/batch, and different sizes from terabytes to zettabytes. Big data is a term applied to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage, and process with low latency. It has one or more of the following characteristics: high volume, high velocity, or high variety. Big data comes from sensors, devices, video/audio, networks, log files, transactional applications, the web, and social media - much of it generated in real time and at very large scale.

What’s new in predictive analytics: IBM SPSS and IBM decision optimization

Analyzing big data allows analysts, researchers, and business users to make better and faster decisions using data that was previously inaccessible or unusable. Using advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, statistics, and natural language processing, businesses can analyze previously untapped data sources, independently or together with their existing enterprise data, to gain new insights resulting in significantly better and faster decisions.

  • Advanced analytics enables you to find deeper insights and drive real-time actions.
  • With advanced analytics capabilities, you can understand what happened, what will happen and what should happen.
  • Easily engage both business and technical users to uncover opportunities and address big issues, and operationalize analytics into business processes.

Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical platform

Prescriptive analytics

What if you could make strategic decisions based not only on what has occurred or is likely to occur in the future, but through targeted recommendations based on why and how things happen? Prescriptive analytics technology recommends actions based on desired outcomes, taking into account specific scenarios, resources and knowledge of past and current events. This insight can help your organization make better decisions and have greater control of business outcomes.

Prescriptive analytics is the next step on the path to insight-based actions. It creates value through synergy with predictive analytics, which analyzes data to predict a future outcome. Prescriptive analytics takes that insight to the next level by suggesting the optimal way to handle that future situation. Organizations that can act fast in dynamic conditions and make superior decisions in uncertain environments gain a strong competitive advantage.

IBM prescriptive analytics solutions provide organizations in commerce, financial services, healthcare, government and other highly data-intensive industries with a way to analyze data and transform it into recommended actions almost instantaneously. These solutions combine predictive models, deployment options, localized rules, scoring and optimization techniques to form a powerful foundation for decision management. For example, you can:

  • Automate complex decisions and trade-offs to better manage limited resources.
  • Take advantage of a future opportunity or mitigate a future risk.
  • Proactively update recommendations based on changing events.
  • Meet operational goals, increase customer loyalty, prevent threats and fraud, and optimize business processes.

The information management big data and analytics capabilities include:

Data Management & Warehouse: Gain industry-leading database performance across multiple workloads while lowering administration, storage, development and server costs; Realize extreme speed with capabilities optimized for analytics workloads such as deep analytics, and benefit from workload-optimized systems that can be up and running in hours.

Hadoop System: Bring the power of Apache Hadoop to the enterprise with application accelerators, analytics, visualization, development tools, performance and security features.

Stream Computing: Efficiently deliver real-time analytic processing on constantly changing data in motion and enable descriptive and predictive analytics to support real-time decisions. Capture and analyze all data, all the time, just in time. With stream computing, store less, analyze more and make better decisions faster.
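
As a toy illustration of analyzing data in motion rather than storing it all first, here is a sliding-window average over a stream of readings; the class name, window size, and sample values are invented for illustration, not part of any IBM streaming product:

```python
from collections import deque

class SlidingWindowAverage:
    """Maintain a running average over the last `size` readings,
    updating incrementally as each new value streams in."""
    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0

    def push(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()  # evict the oldest reading
        return self.total / len(self.window)

avg = SlidingWindowAverage(size=3)
readings = [10, 20, 30, 40]
results = [avg.push(r) for r in readings]
print(results)  # [10.0, 15.0, 20.0, 30.0]
```

The point of the incremental update is that each new reading is analyzed as it arrives, in constant time and bounded memory, instead of being landed in storage and queried later - the "store less, analyze more" idea.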

Content Management: Enable comprehensive content lifecycle and document management with cost-effective control of existing and new types of content with scale, security and stability.

Information Integration & Governance: Build confidence in big data with the ability to integrate, understand, manage and govern data appropriately across its lifecycle.

From insight to action: Predictive and prescriptive analytics

The 5 game changing big data use cases

While much of the big data activity in the market up to now has been experimentation and learning about big data technologies, IBM has also focused on helping organizations understand what problems big data can address.

We’ve identified the top 5 high value use cases that can be your first step into big data:

Big Data Exploration
Find, visualize, understand all big data to improve decision making. Big data exploration addresses the challenge that every large organization faces: information is stored in many different systems and silos and people need access to that data to do their day-to-day work and make important decisions.

What is the Big Data Exploration use case?

Big data exploration addresses the challenge faced by every large organization: business information is spread across multiple systems and silos and people need access to that data to meet their job requirements and make important decisions. Big Data Exploration enables you to explore and mine big data to find, visualize, and understand all your data to improve decision making. By creating a unified view of information across all data sources - both inside and outside of your organization - you gain enhanced value and new insights.

Ask yourself:

  • Are you struggling to manage and extract value from the growing volume and variety of data and need to unify information across federated sources?
  • Are you unable to relate “raw” data collected from system logs, sensors, or click streams with customer and line-of-business data managed in your enterprise systems?
  • Do you risk exposing insecure personal information and/or privileged data due to lack of information awareness?

If you answered yes to any of the above questions, the big data exploration use case is the best starting point for your big data journey.

Introduction to apache spark v3

Enhanced 360º View of the Customer
Extend existing customer views by incorporating additional internal and external information sources. Gain a full understanding of customers—what makes them tick, why they buy, how they prefer to shop, why they switch, what they’ll buy next, and what factors lead them to recommend a company to others.

IBM Watson Analytics Presentation

What is the Enhanced 360º View of the Customer big data use case?

With the onset of the digital revolution, the touch points between an organization and its customers have increased many times over; organizations now require specialized solutions to effectively manage these connections. An enhanced 360-degree view of the customer is a holistic approach that takes into account all available and meaningful information about the customer to drive better engagement, more revenue and long-term loyalty. It combines data exploration, data governance, data access, data integration and analytics in a solution that harnesses the volume, velocity and variety of big data. IBM provides several important capabilities to help you make effective use of big data and improve the customer experience.

Ask yourself:

  • Do you need a deeper understanding of customer sentiment from both internal and external sources?
  • Do you want to increase customer loyalty and satisfaction by understanding what meaningful actions are needed?
  • Are you challenged to get the right information to the right people to provide customers what they need to solve problems, cross-sell, and up-sell?

If you answered yes to any of the above questions, the enhanced 360 view of the customer use case is the best starting point for your big data journey.

With Enhanced 360º View of the Customer, you can:

  • Improve campaign effectiveness
  • Deliver accurate, targeted cross-sell / up-sell
  • Retain your most profitable customers
  • Deliver a superior customer experience at the point of service

Security Intelligence Extension

Lower risk, detect fraud and monitor cyber security in real time. Augment and enhance cyber security and intelligence analysis platforms with big data technologies to process and analyze new types (e.g. social media, emails, sensors, Telco) and sources of under-leveraged data to significantly improve intelligence, security and law enforcement insight.

What is the Security Intelligence big data use case?

The growing number of high-tech crimes - cyber-based terrorism, espionage, computer intrusions, and major cyber fraud - poses a real threat to every individual and organization. To meet the security challenge, businesses need to augment and enhance cyber security and intelligence analysis platforms with big data technologies to process and analyze new data types (e.g. social media, emails, sensors, Telco) and sources of under-leveraged data. Analyzing data in-motion and at rest can help find new associations or uncover patterns and facts to significantly improve intelligence, security and law enforcement insight.

Ask yourself:

  • Do you need to enrich your security or intelligence system with underleveraged or unused data sources (video, audio, smart devices, network, Telco, social media)?
  • Are you able to address the need for sub second detection, identification, resolution of physical or cyber threats?
  • Are you able to follow activities of criminals, terrorists, or persons in a blacklist and detect criminal activity before it occurs?

If you answered yes to any of the above questions, the security intelligence extension use case is the best starting point for your big data journey.
There are three main areas for Security Intelligence Extension:

Enhanced intelligence and surveillance insight. Analyzing data in-motion and at rest can help find new associations or uncover patterns and facts. This type of real or near real-time insight can be invaluable and even life-saving.

Real-time cyber attack prediction & mitigation. So much of our lives is spent online, and the growing number of high-tech crimes, including cyber-based terrorism, espionage, computer intrusions, and major cyber fraud, poses a real threat to potentially everyone. By analyzing network traffic, organizations can discover new threats early and react in real time.

Crime prediction & protection. The ability to analyze internet (e.g. email, VoIP), smart device (e.g. location, call detail records) and social media data can help law enforcement organizations better detect criminal threats and gather criminal evidence. Instead of waiting for crimes to be committed, they can prevent them from happening in the first place and proactively apprehend criminals.

With Security Intelligence Extension, organizations can:

  • Sift through massive amounts of data - both inside and outside your organization - to uncover hidden relationships, detect patterns, and stamp out security threats
  • Uncover fraud by correlating real-time and historical account activity to uncover abnormal user behavior and suspicious transactions
  • Examine new sources and varieties of data for evidence of criminal activity, such as internet, mobile devices, transactions, email, and social media

Operations Analysis
Analyze a variety of machine and operational data for improved business results. The abundance and growth of machine data, which can include anything from IT machines to sensors, meters and GPS devices, requires complex analysis and correlation across different types of data sets. By using big data for operations analysis, organizations can gain real-time visibility into operations, customer experience, transactions and behavior.

What is the Operations Analysis big data use case?

Operations Analysis focuses on analyzing machine data, which can include anything from IT machines to sensors, meters and GPS devices. It’s growing at exponential rates and comes in large volumes and a variety of formats, including in-motion, or streaming data. Leveraging machine data requires complex analysis and correlation across different types of data sets. By using big data for operations analysis, organizations can gain real-time visibility into operations, customer experience, transactions and behavior.

Ask yourself:

  • Do you have real-time visibility into your business operations including customer experience and behavior?
  • Are you able to analyze all your machine data and combine it with enterprise data to provide a full view of business operations?
  • Are you proactively monitoring end-to-end infrastructure to avoid problems?

If you answered no to any of the above questions, the Operations Analysis use case is the best starting point for your big data journey.

Through Operations Analysis, organizations can:

  • Gain real-time visibility into operations, customer experience and behavior
  • Analyze massive volumes of machine data with sub-second latency to identify events of interest as they occur
  • Apply predictive models and rules to identify potential anomalies or opportunities
  • Optimize service levels in real-time by combining operational and enterprise data
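The second and third capabilities above - identifying events of interest in machine data as they occur and applying rules to spot anomalies - can be illustrated with a minimal sliding-window check. This is a sketch, assuming a simple numeric sensor feed; the window size, tolerance, and data are all invented for illustration:

```python
from collections import deque
from statistics import mean

def detect_anomalies(readings, window=5, tolerance=0.5):
    """Flag readings that deviate from the recent sliding-window average
    by more than `tolerance` (expressed as a fraction of that average)."""
    recent = deque(maxlen=window)
    flagged = []
    for t, value in enumerate(readings):
        if len(recent) == recent.maxlen:
            baseline = mean(recent)
            if baseline and abs(value - baseline) / abs(baseline) > tolerance:
                flagged.append((t, value))
        recent.append(value)
    return flagged

# A hypothetical temperature feed with one spike at position 5.
feed = [61, 62, 60, 63, 61, 95, 62, 61]
print(detect_anomalies(feed))  # -> [(5, 95)]
```

Real operations-analysis platforms run this kind of logic continuously over streaming data with sub-second latency; the window-against-baseline pattern scales up naturally from this toy version.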

Data Warehouse Modernization
Integrate big data and data warehouse capabilities to increase operational efficiency. Optimize your data warehouse to enable new types of analysis. Use big data technologies to set up a staging area or landing zone for your new data before determining what data should be moved to the data warehouse. Offload infrequently accessed or aged data from the warehouse and application databases using information integration software and tools.

IBM Big Data Analytics Concepts and Use Cases

What is the Data Warehouse Modernization big data use case?

Data Warehouse Modernization (formerly known as Data Warehouse Augmentation) is about building on an existing data warehouse infrastructure, leveraging big data technologies to ‘augment’ its capabilities. There are three key types of Data Warehouse Modernizations:

  • Pre-Processing - using big data capabilities as a “landing zone” before determining what data should be moved to the data warehouse
  • Offloading - moving infrequently accessed data from data warehouses into enterprise-grade Hadoop
  • Exploration - using big data capabilities to explore and discover new high value data from massive amounts of raw data and free up the data warehouse for more structured, deep analytics.
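The offloading pattern above - routing infrequently accessed rows out of the warehouse - comes down to partitioning data by age. The following is a minimal Python sketch; the rows, the one-year cutoff, and the in-memory CSV "archive" are illustrative stand-ins (a real offload would target HDFS or Hive, not a local file):

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical warehouse rows with a last-accessed date (illustrative only).
rows = [
    {"id": 1, "last_access": date(2017, 4, 1)},
    {"id": 2, "last_access": date(2015, 6, 12)},
    {"id": 3, "last_access": date(2017, 3, 20)},
]

def split_hot_cold(rows, today, max_age_days=365):
    """Offloading: keep recently touched rows in the warehouse (hot),
    route aged rows toward the Hadoop landing zone (cold)."""
    cutoff = today - timedelta(days=max_age_days)
    hot = [r for r in rows if r["last_access"] >= cutoff]
    cold = [r for r in rows if r["last_access"] < cutoff]
    return hot, cold

hot, cold = split_hot_cold(rows, today=date(2017, 4, 27))

# Write the cold rows to an archive (a stand-in for a Hadoop sink).
archive = io.StringIO()
writer = csv.DictWriter(archive, fieldnames=["id", "last_access"])
writer.writeheader()
writer.writerows(cold)

print([r["id"] for r in hot])   # -> [1, 3]
print([r["id"] for r in cold])  # -> [2]
```

Because the cold data remains queryable in the landing zone, the warehouse shrinks (lowering storage and licensing costs) without losing access to history.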

Ask yourself:

  • Are you integrating big data and data warehouse capabilities to increase operational efficiency?
  • Have you taken steps to migrate rarely used data to new technologies like Hadoop to optimize storage, maintenance and licensing costs?
  • Are you using stream computing to filter and reduce storage costs? 
  • Are you leveraging structured, unstructured, and streaming data sources required for deep analysis?
  • Do you have a lot of cold, or low-touch data that is driving up costs or slowing performance?

If any of these questions expose a gap in your current environment, the Data Warehouse Modernization use case is the best starting point for your big data journey.

With Data Warehouse Modernization, organizations can:

  • Combine streaming and other unstructured data sources with existing data warehouse investments
  • Optimize data warehouse storage and provide a queryable archive
  • Rationalize the data warehouse for greater simplicity and lower cost
  • Provide better query performance to enable complex analytical applications
  • Deliver improved business insights to operations for real-time decision-making

Analytics and big data are pointless without good, accurate data. That is why IBM launched the IBM DataFirst initiative: http://www.ibmbigdatahub.com/blog/chief-takeaways-ibm-datafirst-launch-event

More Information: