Bigdata Ready Enterprise Open Source Software

Table of Contents

  • License
  • Objective
  • Features
  • Demo videos
      • Data Ingestion
      • Workflow Builder
      • Bulk Data Manufacturing
      • Web Crawler
  • Architecture
  • Installation
  • Operational Metadata Management System
  • How to Contribute

License

Released under the Apache License, Version 2.0. A copy of the license is available at http://www.apache.org/licenses/LICENSE-2.0.

Objective

Big Data Ready Enterprise (BDRE) makes big data technology adoption simpler by optimizing and integrating various big data solutions and providing them as one integrated package. BDRE provides a unified framework for a Hadoop implementation that can substantially reduce development time and fast-track the implementation. It comprises a reusable framework that can be customized to fit the enterprise ecosystem. The components are loosely integrated and can easily be decoupled or replaced with alternatives.

Big Data implementations require specialized skills and significant development effort on data loading, semantic processing, data quality (DQ), code deployment across environments, and similar tasks. The primary goal of BDRE is to accelerate these implementations by supplying the essential frameworks that would otherwise most likely be written from scratch. This can eliminate hundreds of hours of operational framework development.

Features

Demo Videos

Data Ingestion

RDBMS Data Ingestion

BDRE RDBMS data ingestion demo video

Streaming Data Ingestion

BDRE Twitter Ingestion demo video

Directory Monitoring and File Ingestion

BDRE File ingestion demo video

Workflow Builder

BDRE Workflow Designer demo video

Bulk Data Manufacturing

Demo video TBD

Web Crawler

BDRE Web Crawling

Architecture

[Figure: BDRE architecture diagram]

Installation

Overview

This section helps you build BDRE from source. It is intended for developers and architects who want to take part in BDRE framework development or who simply want to evaluate it.

General Prerequisite

For testing and development, and to save time, use a fully loaded Hadoop VM from Cloudera or Hortonworks, because all the required software is typically already installed and configured.

You should be able to do the same on Mac or Windows, but note that setting up a Hadoop cluster on Windows can be tricky and may require more effort. To deploy and run jobs, we recommend a Linux system. In a multi-node cluster, BDRE is typically installed on a Hadoop edge node.
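
Before starting, it can help to confirm that the command-line tools used later in this guide are on the PATH. The sketch below is a convenience check and not part of BDRE: the function name `missing_tools` is hypothetical, and the tool list (git and mvn) is inferred from the commands in the build steps rather than taken from an official requirements list.

```shell
# Hypothetical helper (not shipped with BDRE): print each named tool
# that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command exists
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# git and mvn are assumed from the build steps in this guide.
missing=$(missing_tools git mvn)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found"
fi
```

If anything is reported missing, install it before continuing with the build steps.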

Preparation

Building BDRE from source

  1. Obtain the source code

    • cd to the home directory of the openbdre user.

      [openbdre@sandbox ~]# cd ~
    • Pull the BDRE source from the git repository. To find the repository link, navigate to the repository on GitHub and copy the HTTPS URL.

      [openbdre@sandbox ~]# git clone https://github.com/WiproOpenSourcePractice/openbdre.git
    • cd to the cloned source directory (you should now be in /home/openbdre/openbdre).

      [openbdre@sandbox ~]# cd openbdre
  2. Database Setup

    • Execute the dbsetup.sh script without any parameters as shown below. This example uses MySQL as the BDRE backend because it is already available in the HDP Sandbox. To use another database, select it at the prompt instead.
    [openbdre@sandbox openbdre]# sh dbsetup.sh
    [openbdre@sandbox openbdre]$ sh dbsetup.sh⏎
    Supported DB
    1) Embedded (Default - Good for running BDRE user interface only. )
    2) Oracle
    3) MySQL
    4) PostgreSQL
    
    Select Database Type(Enter 1, 2, 3 , 4 or leave empty and press empty to select the default DB):3⏎
    
    Enter DB username (Type username or leave it blank for default 'root'):⏎
    Enter DB password (Type password or leave it blank for default '<blank>'):⏎
    Enter DB hostname (Type db hostname or leave it blank for default 'localhost'):⏎
    Enter DB port (Type db port or leave it blank for default '3306'):⏎
    Enter DB name (Type db name or leave it blank for default 'bdre'):⏎
    Enter DB schema (Type schema or leave it blank for default 'bdre'):⏎
    Please confirm:
    
    Database Type: mysql
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection URL: jdbc:mysql://localhost:3306/bdre
    Database Username: root
    Database Password:
    Hibernate Dialect: org.hibernate.dialect.MySQLDialect
    Database Schema: bdre
    Are those correct? (type y or n - default y):y⏎
    Database configuration written to ./md-dao/src/main/resources/db.properties
    Will create DB and tables
    Tables created successfully in MySQL bdre DB
  3. Building

    • Now build BDRE using the command below. Note that BDRE may not compile if settings.xml is not passed on the command line, so be sure to use the -s option. When building for the first time, it might take a while as Maven resolves and downloads the jar libraries from different repositories.

      mvn -s settings.xml clean install -P hdp22
    • Note: Selecting hdp22 compiles BDRE with HDP 2.2 libraries and automatically configures BDRE with properties from databases/setup/profile.hdp22.properties. These properties can later be altered from the BDRE Settings page under Administration.

      databases/setup/profile.hdp22.properties looks like this:

      bdre_user_name=openbdre
      name_node_hostname=sandbox.hortonworks.com
      name_node_port=8020
      job_tracker_port=8050
      flume_path=/usr/hdp/current/flume-server
      oozie_host=sandbox.hortonworks.com
      oozie_port=11000
      thrift_hostname=sandbox.hortonworks.com
      hive_server_hostname=sandbox.hortonworks.com
      drools_hostname=sandbox.hortonworks.com
      hive_jdbc_user=openbdre
      hive_jdbc_password=openbdre
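
To show how these properties fit together, here is a minimal sketch that reads a profile file and prints the endpoints it implies. The `prop` helper and the `PROFILE` variable are hypothetical, the `hdfs://` scheme is the conventional one for a NameNode address and is an assumption here, and BDRE itself may combine these values differently.

```shell
# Hypothetical helper: print the value of key $2 from properties file $1.
# Assumes simple key=value lines with no spaces around '='.
prop() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# PROFILE is an illustrative override; the default is the file shown above.
profile=${PROFILE:-databases/setup/profile.hdp22.properties}

if [ -f "$profile" ]; then
    # The hdfs:// scheme is an assumption, not taken from BDRE code.
    echo "NameNode: hdfs://$(prop "$profile" name_node_hostname):$(prop "$profile" name_node_port)"
    echo "Oozie:    $(prop "$profile" oozie_host):$(prop "$profile" oozie_port)"
else
    echo "Profile file not found: $profile"
fi
```

Run it from the root of the cloned source tree so the relative path resolves.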
Building BDRE for Cloudera QuickStart VM
Similarly, you should be able to build with -P cdh52, which configures BDRE for the CDH 5.2 QuickStart VM. During the build, it picks up the environment-specific configuration from databases/setup/profile.cdh52.properties. BDRE works with virtually any Hadoop distribution, including IBM's BigInsights platform in Bluemix.
```shell
$ mvn -s settings.xml clean install -P hdp22
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
    .......blah blah.........
    .......blah blah.........
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:39.479s
[INFO] Finished at: Wed Dec 30 01:50:02 PST 2015
[INFO] Final Memory: 127M/2296M
[INFO] ------------------------------------------------------------------------
```
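
The two profiles mentioned above differ only in the `-P` argument. As a hedged sketch, the hypothetical wrapper below (the function name `bdre_build_cmd` is not part of BDRE) prints the Maven command for a given profile and rejects names other than the two this guide documents; other profiles may exist in the repository.

```shell
# Hypothetical wrapper: print the Maven build command for a documented profile.
bdre_build_cmd() {
    case "$1" in
        hdp22|cdh52)
            echo "mvn -s settings.xml clean install -P $1"
            ;;
        *)
            echo "Unknown profile: $1 (this guide covers hdp22 and cdh52)" >&2
            return 1
            ;;
    esac
}

# Print the command for the Cloudera QuickStart VM. To run it, pipe the
# output to sh from the root of the cloned source tree.
bdre_build_cmd cdh52
```

Printing the command instead of running it keeps the wrapper safe to try before committing to a long build.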
  4. Installing BDRE

    • After building BDRE successfully, run:

      sh install-scripts.sh local
    • This installs the BDRE scripts and artifacts in /home/openbdre/bdre.
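
The steps above can be collected into one script. This is a sketch under assumptions, not a script shipped with BDRE: the `run` helper and the `DRY_RUN` variable are hypothetical, `dbsetup.sh` remains interactive and will still prompt for database details, and the `hdp22` profile is used only because it is the one shown in this guide. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
set -e  # stop at the first failing step

# DRY_RUN is a hypothetical switch: 1 (default) prints only, 0 executes.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

run cd "$HOME"
run git clone https://github.com/WiproOpenSourcePractice/openbdre.git
run cd openbdre
run sh dbsetup.sh                               # interactive prompts
run mvn -s settings.xml clean install -P hdp22
run sh install-scripts.sh local
```

Reviewing the printed commands first makes it easier to adapt the profile or paths before running a real build.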

Using BDRE

 sudo service bdre start

Creating, Deploying and Running a Test Job

Operational Metadata Management System

BDRE provides a complete job/operational metadata management solution for Hadoop. At its core, it acts as a registry and tracker for different types of jobs running in different Hadoop clusters or standalone. It provides APIs to integrate with virtually any job.


BDRE uses an RDBMS to store all job-related metadata. A set of stored procedures interfaces with the tables, and these procedures are exposed via Java APIs to create, update, and manage the static and run-time metadata. Below is the data model for the BDRE operational metadata database.

[Figure: EER diagram of the BDRE operational metadata database]

How to Contribute

Contributions to BDRE are welcome. To contribute, navigate to our GitHub project page and fork the main BDRE repository under your own account. Make changes in your fork, then open a pull request to merge your changes into the main repository.

Go to BDRE@GitHub

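# Clone the main repository and add your fork as a second remote
git clone "https://github.com/WiproOpenSourcePractice/openbdre.git"
cd openbdre
git remote add myrepo https://github.com/<YOUR ACCT NAME>/openbdre.git
# Create a branch, commit your changes, and push the branch to your fork
git checkout -b mybranch
git commit -am "My changes"
git push myrepo mybranch
# Bring your branch up to date with the latest develop, then push again
git checkout develop
git pull origin develop
git checkout mybranch
git merge develop
git push myrepo mybranch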

Developed using IntelliJ IDEA
