
Sunday, 9 December 2012

Getting started with Hadoop Configuration # 2

Continued from Getting started with Hadoop Configuration # 1...

The installation of Ubuntu on the Virtual environment begins....



It will take time to install Ubuntu and configure its files in the virtual environment. After a successful installation, the following screen appears for you to log in to the environment..

Log in to the Ubuntu environment with your username and password. After logging in, change the display settings to whichever screen resolution you are comfortable with.
Please note that the user you created is not the "root" (administrator) user; however, root privileges shall be required for the Hadoop configuration and subsequent operations.
  • For the benefit of users new to Unix/Linux, a quick intro to "root": it is the superuser account, similar to the Administrators group in a Windows environment.
  • By default, the root user is disabled in Ubuntu so that no one logs in directly as root; doing so is risky, since accidentally executing a wrong command could destroy the system.
  • "sudo" is the command that lets a user execute certain programs as root without having to know the root password; it is also what we shall use to enable the root user.
Hence the first step in the Hadoop configuration is to enable "root".

Open a Terminal in the Ubuntu environment. If you cannot find the Terminal, search for it by clicking "Dash Home" and typing the keyword "Terminal".  The screen below opens,


Follow the below steps,

ramjir@ubuntu:~$ sudo -i
[sudo] password for ramjir:
root@ubuntu:~#

Enter your password and a root shell opens. To set (and thereby enable) the root user's password, run the following,

root@ubuntu:~# sudo passwd root
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully

The next steps in the Hadoop configuration are to install the prerequisites for Hadoop and then Hadoop itself, which shall follow very soon....
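Ahead of that post, here is a rough sketch of the typical prerequisites on Ubuntu (the package names and the dedicated "hduser" account are assumptions on my part; the exact steps shall be covered in the next post),

```
root@ubuntu:~# apt-get update
root@ubuntu:~# apt-get install openjdk-6-jdk ssh rsync
root@ubuntu:~# java -version
root@ubuntu:~# addgroup hadoop
root@ubuntu:~# adduser --ingroup hadoop hduser
```

Hadoop needs a JDK and (for the pseudo-distributed mode) passwordless SSH to localhost, hence the packages above.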

Catch u later..

Wednesday, 28 November 2012

Getting started with Hadoop Configuration # 1

I was planning to continue my series on Business Performance & OSSM; however, I could not due to various reasons (one being a very hectic spell at work). I am extremely sorry about that.

Recently one of my friends and I were discussing Hadoop configuration on Ubuntu in a VM environment on the Windows platform and sharing our individual experiences, which prompted me to write this post.

Before getting into the configuration details, I would like to cover some basic terminology.

Apache Hadoop
Apache Hadoop is an open-source framework, written in the Java programming language, designed for clusters built out of commodity hardware and supporting data-intensive distributed applications through its two main components - MapReduce and HDFS (Hadoop Distributed File System).  Hadoop is one of the top-level Apache projects, supported by a community of developers.  For more details refer to the following web sites,
  • apache.org
  • wikipedia - Hadoop, MapReduce, HDFS, etc.,
Virtual Machine

A Virtual Machine, as many of you know, is a software simulation of a computer that executes programs like a physical machine. There are two categories of virtualization - hardware and application virtualization.

Hardware virtualization enables multiple OS environments to co-exist on the same computer while remaining completely isolated from each other; for example, Windows and Linux environments running on the same machine.   There are many VM tools available; in this blog, I am referring to the usage of VMware Player.
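As a quick aside (my own addition, not required by the steps below): on any Linux machine you can check whether the CPU exposes the hardware-virtualization extensions that VM tools can take advantage of,

```shell
# Check whether the CPU advertises hardware virtualization extensions:
# Intel VT-x shows up as the "vmx" flag, AMD-V as "svm" in /proc/cpuinfo.
if egrep -q '(vmx|svm)' /proc/cpuinfo; then
    echo "Hardware virtualization extensions present"
else
    echo "No vmx/svm flags found (or running inside a VM that hides them)"
fi
```

Without these extensions the VM still runs, but the hypervisor falls back to slower software techniques.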

Application virtualization enables a normal application to run inside the host OS as a single process, mainly to provide a platform-independent programming environment.  For example, a Java program running on the JVM (Java Virtual Machine).

VMware Player is freeware from VMware Inc.

Ubuntu
Ubuntu is an operating system based on the Debian Linux distribution and distributed as free and open-source software. As a platform, Ubuntu provides multiple versions/categories of OS such as Desktop, Server, Cloud, Ubuntu for Android, Ubuntu TV, etc., One of the key things about Ubuntu is the team's commitment to scheduled releases on a predictable six-month cycle, with every fourth release (i.e., every two years) receiving long-term support (LTS).

The blog shall be primarily about configuration of Hadoop in the following environment,
  • VMware Player 4.0.4
  • Ubuntu 12.04 Desktop version
  • Hadoop 0.23 version
Note: the above configuration is recommended for self-learning purposes only. For any development/production setup, you may need to approach VMware for the appropriate products to set up a virtualized environment, and similarly obtain the respective Ubuntu products (though Ubuntu is free).  Also, while Hadoop is open source, Hadoop-based distributions such as Cloudera etc., are to be considered for such development/production environments.

The following are the high level steps and/or pre-requisites involved for the configuration,
  1. Download VMware Player 4.0.4 from www.vmware.com under "Desktop & End-User Computing", matching your Windows machine configuration (i.e., 32-bit or 64-bit)
  2. Download the Ubuntu 12.04 Desktop version (ubuntu-12.04.1-desktop-i386.iso) from www.ubuntu.com/download
  3. Install VMware Player
  4. On the Windows machine, identify a drive/partition with plenty of free space (say around 60+ GB) and create an empty folder there, for example: G:\UbuntuVM
(Note: the versions above may have been superseded - VMware Player 5.0 is available right now - so you may download whichever is convenient for you, based on availability.)
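Once the ISO download completes, it is worth verifying its integrity before building the VM (my own suggested step, assuming a Linux shell or something like Cygwin on the Windows host); compare the printed value against the MD5 hash published on the Ubuntu release page,

```shell
# Verify the downloaded Ubuntu ISO before creating the VM.
# Compare the printed hash with the MD5 published on ubuntu.com.
iso=ubuntu-12.04.1-desktop-i386.iso
if [ -f "$iso" ]; then
    md5sum "$iso"
else
    echo "Download $iso first, then re-run this check."
fi
```

A corrupted ISO is a common cause of installers that hang part-way through.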

Configuration of Ubuntu on VM environment

Open VMware Player (Windows Start -> Programs -> VMware -> VMware Player); the following screen opens,




Click on "Create a New Virtual Machine"


If you have obtained Ubuntu in the form of a CD, use the first option; otherwise select the "Installer disc image file (iso):" option and choose the already downloaded CD image, i.e., the iso file (ubuntu-12.04.1-desktop-i386.iso)


Click Next


Specify the above details and click Next; please note that this username & password will be your login for the virtual OS.


Specify the virtual machine name and the location (per our example, G:\UbuntuVM, the empty folder created on the drive with 60+ GB free space). Click Next


Click Next

Click Finish.
Note: if you want to customize hardware-specific options such as memory, processors, etc., you can provide them here.

It might take some time, after which your Ubuntu setup is ready..

Further updates shall continue very soon....

Sunday, 17 June 2012

Business Performance Management & OSSM - Part # 2

In my last post, I wrote about the following,
  • What is Enterprise Performance Management?
  • Balanced Scorecard Framework & its four perspectives
  • Scorecard Components

This section of the blog shall cover,

  • Terminologies such as KPIs, Objectives, Initiatives, Strategy Maps, etc.,
  • Oracle BI - EPM Architecture


KPI (Key Performance Indicator) - In general, a Key Performance Indicator represents the result of a business measure (for example, operational cost, attrition rate, etc.,) evaluated against a specific target value associated with that measure.  In the context of the Balanced Scorecard, KPIs are used to assess the progress of the objectives and initiatives that comprise the overall organizational strategy.

Objectives - Objectives are the desired/expected outcomes of the overall organizational strategy; an objective is meaningless unless associated with one or more KPIs that measure its progress. For example, an objective can be improved employee satisfaction within the organization, for which the associated KPIs can be attrition rate, employee satisfaction index, etc.,

Initiatives - Initiatives are sets of tasks, or in some cases even projects, carried out to accomplish the objectives within a stipulated period of time in an organization.  As the tasks/projects within an initiative involve various activities, one initiative might support many objectives.  For example, a Go-Green initiative in a department can reduce operational cost and also improve profitability.

Strategy Maps - A strategy map is a pictorial representation of the strategic goals defined within an organization, depicting the objectives defined for the scorecard aligned with two or more perspectives.  In addition to the BSC framework's four perspectives, users can define their own perspectives if required.

Oracle BI - EPM Architecture

The following diagram depicts the Oracle BI-EPM architecture (original source by Oracle however redefined for this blog)


Oracle BI (Business Intelligence) Foundation Suite consists of the following,
  • Oracle Business Intelligence Enterprise Edition (OBIEE) 11g
  • Oracle BI Publisher
  • Oracle Essbase
  • Oracle Scorecard & Strategy Management
 
In my personal opinion, OBIEE 11g is significantly more competitive with other reporting/BI tools than OBIEE 10g was.
 
Some of my experiences with respect to OBIEE 11g key features are as follows (there are many more :)),
  • Integration of Scorecard & Strategy Management with OBIEE
  • Security is delegated to Oracle Fusion Middleware (i.e., Oracle Enterprise Manager Console) and leveraged using the Oracle Platform Security Services
  • Usage of Flash components for Advanced Visualization
    • Zooming in and out of graphs along the X-axis enables us to define a single report on a specific dimension hierarchy, leveraging the zoom facility to view the report at a specific level
    • Slider facility
  • Integration of Action Framework
  • Mobile integration (iPad & iPhone), with the limitation that reports cannot be drilled into from the device
  • Seamless integration of geospatial content within the dashboards
  • etc.,
OSSM is an integrated component of OBIEE, i.e., bundled with OBIEE 11g along with Answers; the advantage is that the KPIs are defined in the core metadata (the OBIEE RPD) and can easily be correlated with the objectives defined in OSSM. Since the KPIs live in the core metadata, sourcing the actual/target data from operational source systems or data warehouses to measure an objective's progress is done easily.

In the next section of the blog, I shall provide the following,
  • Terminologies within OSSM such as Scorecard Editor, KPI watchlist etc.,
  • Steps to create a Scorecard in OSSM
Until then.. please wait.. I appreciate your patience

Tuesday, 29 May 2012

Business Performance Management & OSSM - Part # 1

Recently I was having a discussion with one of my colleagues on the Oracle BI & EPM technology stack, and it became apparent that for anyone who has not followed Oracle's BI & EPM journey through its acquisitions over the last 2-3 years, the components can be confusing, and some tools may replicate the same functionality across multiple toolsets.

Though my intention is not to clarify the entire Oracle BI & EPM stack, I am trying to break off a tip of the iceberg, both conceptually and from a technology-stack perspective - Oracle Scorecard & Strategy Management (OSSM), an Enterprise Performance Management tool.

Before getting into OSSM as such, it is important to understand the basics of the theory associated with this tool.

What is Enterprise Performance Management?
Enterprise Performance Management is a broad term covering the business methodologies, metrics, processes (such as planning, budgeting, consolidation, etc.,) and systems used to drive the overall performance of an enterprise.

In simple terms, for an enterprise to be successful, there are three main activities,
  1. Identify the Right Goals & Objectives to achieve
  2. Consolidate relevant information on the organization's progress against these goals
  3. Improve Performance and initiate appropriate process improvements for achieving these goals
Synonyms of Enterprise Performance Management include Business Performance Management & Corporate Performance Management.

EPM enables an enterprise to model and change its business processes to meet the specific needs of the business quickly and more cost-effectively by linking the respective strategies into execution.

EPM has been strongly influenced by the Balanced Scorecard Framework.

Balanced Scorecard Framework
In the early 1990s, Dr. Robert S. Kaplan & David Norton developed a management approach called the "Balanced Scorecard Framework" that defines a set of measures linked to the vision and strategy across the following four perspectives,
  • Financial
  • Customers
  • Internal Business Processes
  • Learning & Growth


Balanced Scorecard Framework - Translating Vision & Strategy - Four Perspectives
In the above diagram, the Customer perspective is also referred to as "Stakeholder" within the framework.

 
An enterprise strategy is summarized in a Strategy Map - a visual representation of what the executive team determines will drive its strategy, usually a series of objectives that lead to accomplishing the goals of the above four perspectives defined within the strategy.

Scorecard Components


The above scorecard components are tightly linked together.  Strategic goals link down to objectives, objectives link down to measurements, and measurements link to targets. 

 
In the next section(s) of the blog, I shall cover the following aspects,
  • Sample of Strategy Map
  • Constructing a Strategy Map
  • Role of OSSM within Oracle BI - EPM Stack
  • OSSM Concepts with a sample
Till then.. please wait.. I appreciate your patience, though I shall try my level best not to take another month to complete those sections :)

     
 

Friday, 27 April 2012

Operational Analytics & Hybrid Data Integration

Traditional data warehousing & business intelligence solutions have been designed and built to address strategic decision-making in an enterprise among C-level executives and key decision makers; however, in addition to strategic BI initiatives, the trend is toward developing solutions with operational BI capability.

What is this new buzzword - Operational Analytics?
Operational Analytics primarily provides decision-making ability for mid-level management staff and operational managers to manage and optimize day-to-day business operations with appropriate information on business events.  For example, a store manager making a decision on on-the-spot promos/offers for car-related products such as car perfumes and car-cleaning materials, triggered by the event of the store's car park filling up.

It is important that historical data and ongoing operational data be combined to make operational analytics more effective; this includes various product lookups, inventory status, past promo effectiveness, capturing of events and alerting on them, etc.,   Hence, for any such real-time data warehousing, data acquisition and data integration are critical factors for the success of the initiative.

There are different types of data acquisition for real-time data warehousing,
  • Batch oriented ETL/ELT (with near real-time)
  • EAI
  • Log-based Change Data capture approach
Each type of data acquisition has its own pros & cons; however, since EAI can handle only low-to-medium data volumes, it has always been treated as a low-priority or non-ideal approach for data warehousing.

Log-based change data capture is more of a "push" approach to deliver data from sources to targets. In this approach, changed data is captured from the database transaction logs, which does not impact the performance of the source systems - unlike change data capture that uses database triggers or table scanning.  Oracle GoldenGate uses log-based CDC capabilities to enable real-time data integration and management by capturing and delivering updates of critical information as the changes occur, providing continuously synchronized data across heterogeneous environments.

ELT (Extract, Load & Transform) has been a key factor for real-time or operational data warehousing, as the transformations tend to take place in the data warehouse.

For operational data warehousing, a hybrid approach of log-based CDC & ELT is leveraged to consolidate data from heterogeneous source systems into the appropriate data warehouse/data marts.

The solution shall be designed so that it offers transactional, real-time data capture using the appropriate "push" approach; i.e., as soon as a new database transaction is committed in the source system, the data is immediately captured via the database transaction logs and loaded into the data warehouse (or its staging area).  In this approach, however, data transformation is expected to be kept minimal - in fact, only basic row-level transformations are performed.  For heavy transformation needs, the solution can be integrated with appropriate ETL/ELT components to provide an end-to-end data-integration solution for the data warehouse/data mart.

For example, Oracle GoldenGate can be integrated with Oracle Data Integrator (ODI): GoldenGate handles the data load from the source systems into a real-time data warehouse using log-based CDC with minimal transformations, while ODI's ELT capabilities are leveraged for heavy transformations.

Operational data warehousing & analytics allow users to leverage the underlying historical data and real-time transactional data to access and respond to information in real time, improving business decisions and actions. A continuous, low-latency data capture and delivery infrastructure is a critical success factor for establishing and maintaining such a real-time data warehouse.  It is becoming evident that organizations leveraging the most up-to-date BI in their day-to-day operations significantly improve their operational efficiency, reducing operational costs and thereby enhancing their productivity and overall gross margins.

Wednesday, 8 February 2012

Is Project Management for DW-BI Projects different from traditional application development / maintenance project?

Many of my colleagues have asked me, "Is project management for DW-BI projects different from a traditional application development/maintenance project?"

My personal opinion towards it is "YES" but the principles of Project Management remains the same.

The reasons are pretty simple,
  • Along with the usual project management responsibilities, DW-BI projects require a deep dive into the technical aspects of the data warehousing & business intelligence implementation in order to understand the scope & manage it better
  • A project manager without technical competence has a strong dependency on the technical lead / SME to understand day-to-day scope-related changes
    • A change to one column of a table can impact the ETL & reporting scope to a large extent
  • A complete end-to-end DW-BI implementation involves multiple tools and technologies and, more importantly, multiple skillsets, which requires the project manager to be careful about resource loading and leveling while executing the project
  • Technical risks occur throughout the life cycle of the project (as compared to traditional application development/maintenance); continuous monitoring and proper management of these risks are very critical.
  • It is very important to identify task dependencies well in advance (for example, resolution of data quality issues to be determined and closed as per schedule) and manage them well in schedule management
  • Quality management has been challenging, particularly in areas such as configuration management and release management, as many tools are involved (though many BI tools now provide versioning features), and in requirements traceability; maintenance of the traceability matrix has been challenging in DW-BI projects
The above reasons clearly signify that managing DW-BI projects requires some technical background and/or the ability to appreciate technical aspects. The core nine knowledge areas stated in the PMBOK remain the same; however, focus on certain knowledge areas (such as scope, risk, schedule and quality management) with technical appreciation is very much needed.

Any specific thoughts??