
Drill Data with Apache Drill


Introduction

Apache Drill is a low-latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data. Inspired by Google’s Dremel, Drill is designed to scale to several thousands of nodes and query petabytes of data at interactive speeds that BI/Analytics environments require. Apache Drill includes a distributed execution environment, purpose built for large-scale data processing. At the core of Apache Drill is the “Drillbit” service which is responsible for accepting requests from the client, processing the queries, and returning results to the client. When a Drillbit runs on each data node in a cluster, Drill can maximize data locality during query execution without moving data over the network or between nodes.

Drill Architecture:

drill_architecture

User: Providing interfaces such as a command line interface (CLI), a REST interface, JDBC/ODBC, etc., for human- or application-driven interaction.
Processing: Allowing for pluggable query languages as well as the query planner, execution, and storage engines.
Data sources: Pluggable data sources either local or in a cluster setup, providing in-situ data processing.

Note that Apache Drill is not a database but rather a query layer that works with a number of underlying data sources. It is primarily designed to do full table scans of relevant data as opposed to, say, maintaining indices. Just as the MapReduce component of Hadoop provides a framework for parallel processing, Apache Drill provides a flexible query execution framework, enabling a number of use cases from quick aggregation of statistics to exploratory data analysis.

Apache Drill can query data residing in different file formats (CSV, TSV, JSON, Parquet, Avro) and in different data sources (Hive, HBase, HDFS, S3, MongoDB, Cassandra, and others). Drill provides a unified query layer that can interact with different file formats in different data sources, avoiding the ETL that would otherwise be needed to bring data from different places into one location.

Use Case

This use case is based on financial stock data that’s been downloaded in parts to multiple files in different formats such as JSON, CSV, and TSV to demonstrate how Apache Drill can easily infer the schema of these files and enable us to perform joins and aggregations easily. The next blog in this series will persist these files in different data stores such as S3, MongoDB, and HDFS and demonstrate Drill’s capability on querying them on the fly.

What we want to do:

  • Prerequisites
  • About Data Files
  • Self Describing Data Exploration

Solution

Prerequisites

  • Install Apache Drill in Embedded Mode: We will be using Drill version 0.7, which is easy to install on Linux. Follow the instructions at the link below to download and install it.

Note: Drill requires JDK 1.7 and above.

wget http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz
tar xf apache-drill-0.7.0.tar.gz
  • Verify the Drill setup:
    cd apache-drill-0.7.0
    bin/sqlline -u jdbc:drill:zk=local
    show databases;
    verify_drill_setup

About Data Files

  • Understanding Data Files:

The required data files are tailor-made to demonstrate the power of Apache Drill and are attached to this post. Copy these files and store them in a separate directory named "data".

    • energy_overview.tsv: Tab-separated file that contains the basic details of each ticker, including its name, sector, and sub-sector. A quick look at the file follows:
      energy_overview
    • energy_technical.json: A JSON file that contains technical information about the stocks present in energy_overview.tsv. This information includes ATR (Average True Range), several simple moving average (SMA) ranges, RSI (Relative Strength Index), and 52-week high/low percentages. Please note that the SMA and W52High/Low values are percentages.
      energy_technical
    • stock_data.csv: A comma-separated file with price and other basic information for the tickers in the energy_overview file. The headers are removed so that Drill doesn’t treat them as data. The file contains the following columns:

Status,Name,Symbol,LastPrice,Change,ChangePercent,Timestamp,MarketCap,Volume,ChangeYTD,ChangePercentYTD,High,Low,Open

stock_data

Self Describing Data Exploration

Apache Drill can seamlessly infer the schema of files in different formats and lets us join them with ANSI SQL. Drill’s self-describing data exploration becomes even more powerful when these files are persisted in different distributed data stores such as Hive, HBase, HDFS, S3, MongoDB, Cassandra, and local file systems.
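As a minimal sketch of this idea (the storage plugin names dfs and hive are Drill defaults; the paths and table names here are hypothetical), the prefix in the FROM clause tells Drill where each dataset lives and what format to expect:

-- Query a file on the local file system or HDFS through the dfs plugin
SELECT * FROM dfs.`/data/energy_overview.tsv` LIMIT 5;

-- Query a Hive table through the hive plugin
SELECT * FROM hive.energy_db.tickers LIMIT 5;

-- Join across both sources in a single ANSI SQL statement
SELECT t.ticker, f.columns[1] AS name
FROM hive.energy_db.tickers t, dfs.`/data/energy_overview.tsv` f
WHERE t.ticker = f.columns[0];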

  • Drill’s query semantics: Drill uses specific semantics in the FROM clause of a query to determine where each data file is located. Drill can be configured with specific prefixes to indicate different data stores, including the local file system, HDFS, MongoDB, Hive, HBase, Cassandra, etc.
    drill’s_query_semantics
  • Basic Queries: Show first 5 rows from all data files
    cd /opt/drill/apache-drill-0.7.0
    bin/sqlline -u jdbc:drill:zk=local
    
    SELECT * FROM dfs.`/opt/drill/data/energy_overview.tsv` limit 5;
    SELECT * FROM dfs.`/opt/drill/data/energy_technical.json` limit 5;
    SELECT * FROM dfs.`/opt/drill/data/stock_data.csv` limit 5;
    basic_queries
  • Simple Join: Joins all 3 files on the ticker symbol and retrieves ticker, name, sector, sub-sector, MarketCap, Volume, and moving averages. JSON data carries its own schema, but to reference data in TSV and CSV files, column positions can be used as array indices.
    select eo.columns[0] as Ticker, eo.columns[1] as Name, eo.columns[2] as Sector, eo.columns[3] as SubSector, sd.columns[7] as MarketCap, sd.columns[8] as Volume, et.SMA20, et.SMA50, et.SMA200
    from 
    dfs.`/opt/drill/data/energy_overview.tsv` eo,
    dfs.`/opt/drill/data/stock_data.csv` sd,
    dfs.`/opt/drill/data/energy_technical.json` et
    where
    eo.columns[0] = sd.columns[2] and
    et.ticker = eo.columns[0] limit 10;
    simple_join
  • Sum of Volume Traded by each Subsector: Drill allows us to perform casting to different data types as we see fit.
    select eo.columns[3] as subsector, sum(cast(sd.columns[8] as int)) as total_volume
    from  
    dfs.`/opt/drill/data/energy_overview.tsv`  eo,
    dfs.`/opt/drill/data/stock_data.csv` sd
    where eo.columns[0] = sd.columns[2]
    group by eo.columns[3];
    sum_of_volume
  • Top 10 companies in the order of high trading volume:
    select eo.columns[0] as Name, cast(sd.columns[8] as int) as volume
    from  dfs.`/opt/drill/data/energy_overview.tsv`  eo, dfs.`/opt/drill/data/stock_data.csv` sd
    where eo.columns[0] = sd.columns[2]
    order by cast(sd.columns[8] as int) desc limit 10;
    Top_10_companies
  • Top companies that closed higher than their open:
    select eo.columns[0] as Name, cast(sd.columns[3] as float) as current_price, cast(sd.columns[13] as float) as open_price
    from  
    dfs.`/opt/drill/data/energy_overview.tsv`  eo,
    dfs.`/opt/drill/data/stock_data.csv` sd
    where eo.columns[0] = sd.columns[2] and
    cast(sd.columns[3] as float) > cast(sd.columns[13] as float)
    order by cast(sd.columns[3] as float) desc limit 10;
    top_companies_that_closed_higher_than_their_open
  • Top companies that are overbought in recent time:
    select eo.columns[0] as Name, cast(et.RSI as float) as RSI
    from  dfs.`/opt/drill/data/energy_overview.tsv`  eo, dfs.`/opt/drill/data/energy_technical.json` et
    where eo.columns[0] = et.Ticker and cast(et.RSI as float) > 65
    order by cast(et.RSI as float) desc limit 10;
    top_companies_that_are_overbought_in_recent_time
  • Top companies that are oversold in recent time:
    select eo.columns[0] as Ticker, eo.columns[1] as Name, cast(et.RSI as float) as RSI
    from  dfs.`/opt/drill/data/energy_overview.tsv`  eo, dfs.`/opt/drill/data/energy_technical.json` et
    where eo.columns[0] = et.Ticker and cast(et.RSI as float) < 35
    order by cast(et.RSI as float) desc limit 10;
    top_companies_that_are_oversold_in_recent_time
  • Top 10 companies trading within 20% of their 52-week high:
    select eo.columns[1] as Name, sd.columns[3] as LastPrice, concat(et.W52High,'%') as W52High
    from  dfs.`/opt/drill/data/energy_overview.tsv`  eo, dfs.`/opt/drill/data/energy_technical.json` et,
    dfs.`/opt/drill/data/stock_data.csv` sd
    where eo.columns[0] = sd.columns[2] and eo.columns[0] = et.Ticker and  cast(et.W52High as float) > -20
    order by et.W52High desc limit 10;
    top_10_companies_that_are_trading_20_closer_to_their_52_week_high
  • Top 10 companies trading more than 80% below their 52-week high:
    select eo.columns[1] as Name, sd.columns[3] as LastPrice, concat(et.W52High,'%') as W52High
    from  dfs.`/opt/drill/data/energy_overview.tsv`  eo, dfs.`/opt/drill/data/energy_technical.json` et,
    dfs.`/opt/drill/data/stock_data.csv` sd
    where eo.columns[0] = sd.columns[2] and eo.columns[0] = et.Ticker and 
    cast(et.W52High as float) < -80
    order by et.W52High desc limit 10;
    top_10_companies_that_are_trading_80_lower_to_their_52_week_high

Conclusion

  • Why Drill: get started fast with a schema-free JSON model, query complex semi-structured data in situ, use real SQL (not "SQL-like") and leverage standard BI tools, access multiple data sources, write custom user-defined functions, get high performance, and scale from a single laptop to a 1000-node cluster.
  • Drill can be installed in embedded or distributed mode depending on the need.
  • No need to define schema upfront or perform ETL activity to run analytics on different datasets persisted in different datastores.



API Functional Testing in SoapUI


Introduction

SoapUI is a free, open-source, cross-platform functional testing tool implemented in Java. It is stable and robust, and provides an easy-to-use graphical interface. SoapUI’s key features include functional testing, service simulation, security testing, load testing, technology support, automation, analytics, recording, and a broad ecosystem.

SoapUI works very well with SOAP and REST web services for:

  • Web service regression testing
  • Web service test automation
  • Web service load testing

Let’s discuss API functional testing in this blog.

Use Case

Problem definition

Let’s say your project exposes several web service APIs through which its data is consumed by web applications. Today, most APIs are built as web services because of their ease of use, security, and performance.

Testing web services without a proper testing tool is difficult, time-consuming, and error-prone. Regression and load testing become impractical, and testing turns into a manual, repetitive procedure with no chance for automation.

The SoapUI testing tool solves these issues.

Solution

Pre-requisites

Create a new REST service in SoapUI using the graphical interface

  • Open SoapUI
  • Go to File -> Select New REST Project
  • Type a web service URL that is used in your project. Here is a sample URL: http://localhost:8080/restful/getdetails
  • Click OK, a Request window opens that has
    • Method: Get/POST
    • Endpoint: Host of the webservice (http://localhost:8080)
    • Resource: Name of the webservice
    • Parameters: If the web service has fields and values, they are shown under Parameters
  • Click the green button to run the webservice. The screenshot below illustrates the JSON response for the above webservice:

graphical_interface

  • The Headers button at the bottom of the window shows the status of the HTTP request, the content type, data, etc.

Create TestCase for the above webservice

  • Click the ‘+’ button next to the STOP button
  • It prompts you to create a TestSuite, TestCase, and TestStep. For the service mentioned above, we first need to create a TestSuite in the project and then add this TestCase to it

create_testcase

  • Once a TestCase for the service is created, a new window appears with details such as endpoint, resource/method similar to creating a new REST service

get_details

Validating the service responses

Next, validate the web service response using assertions.

Before proceeding with the validation process, let’s understand assertions.

Assertions are used to validate the message received by a TestStep during execution, usually comparing parts of the message (or the entire message) to some expected value. We can add any number of assertions to a TestStep. During execution, if any TestStep fails, the status is marked as “failed” in the TestCase view and a corresponding FAILED entry is shown in the Test Execution Log.

There are two ways to validate the service response:

  • Add Assertions to the REST test step
  • Get the service response using Groovy script and add Assertions to the response

Adding Assertions to the REST test step

At the bottom of the TestStep window is a button labeled “Assertions”.

SoapUI supports many assertion types, such as Contains, HTTP status code validation, XPath match, and script assertions.

Let’s look at the basic assertions: Contains and HTTP status code validation.

  • Click the Assertions button, then click the + button to add assertions to the TestStep
  • In the Add Assertion window, select Property Content, then select Contains and click the Add button
  • Check for the content db_healthy and ignore case in the comparison
  • If the JSON response contains db_healthy, the assertion status turns green and the test case passes; otherwise the test case fails and appears red

REST_test_step

The status of assertion is:

status_of_assertion

Get the service response using Groovy script and add Assertions to the response

SoapUI has many TestStep types, such as REST Test Request, HTTP Test Request, Groovy Script, JDBC Request, and Properties.

Let’s talk about the Groovy Script test step:

Right click TestCase, select Add Step and then select Groovy script

The screenshot below illustrates a Groovy script that validates the HTTP status and parses the JSON response with the JsonSlurper class, highlighting the assertions used to validate the service:

groovy_script
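Since the script itself is only shown as a screenshot, here is a minimal sketch of what such a Groovy Script step might contain; the test step name (getdetails) and the db_healthy content check are assumptions based on the earlier example:

import groovy.json.JsonSlurper

// Grab the raw response of the REST test step (step name is hypothetical)
def response = context.expand( '${getdetails#Response}' )

// Parse the JSON body so individual fields can be inspected
def json = new JsonSlurper().parseText(response)
log.info "Parsed response: " + json

// Assert on the expected content (value assumed for illustration)
assert response.toLowerCase().contains('db_healthy')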

Conclusion

  • SoapUI, the world’s leading and most complete SOA and web services testing tool, allows you to rapidly create and execute automated functional, regression, compliance, and load tests with ease
  • It offers complete test coverage and supports the standard protocols and technologies in a single test environment
  • SoapUI is as essential for web service testing as MS Excel is for creating spreadsheets


Automate Nagios Monitoring with Puppet


Introduction

Nagios

Nagios is an open source monitoring tool that provides a comprehensive monitoring environment. It monitors the entire IT infrastructure, including servers, switches, applications, networks, and services, sending alerts about issues and system recovery.

Technical benefits of Nagios

  • Monitors our entire IT infrastructure
  • Enables problem forecasting
  • Generates immediate alerts on issues
  • Facilitates Information sharing with stakeholders
  • Minimizes downtime and business loss

System Monitoring Tool Voting Results

The screenshot below illustrates the output of the system monitoring tool – “voting results”

system_monitoring_tool_voting_results

Use case

This use case details the procedure to set up Nagios monitoring and install Nagios plugins using Puppet.

Prerequisites

  • Install & Configure Puppet
  • Configure Nagios using Puppet

Solution

Install & Configure Puppet

Numerous blog resources provide information on Puppet installation and configuration procedures.

Click the links below to learn more:

 http://terokarvinen.com/2012/puppetmaster-on-ubuntu-12-04

https://www.digitalocean.com/community/tutorials/how-to-install-puppet-to-manage-your-server-infrastructure

Configure Nagios using Puppet

#puppet resource package puppetdb ensure=latest

Create /etc/puppet/puppetdb.conf

# vi /etc/puppet/puppetdb.conf
[main]
server = ubuntu.server.com 
port = 8081
soft_write_failure = false

Add to /etc/puppet/puppet.conf

# vi /etc/puppet/puppet.conf
[main]
storeconfigs = true
storeconfigs_backend = puppetdb

Create /etc/puppet/routes.yaml

# vi /etc/puppet/routes.yaml
master:
  facts:
    terminus: puppetdb
    cache: yaml

# service puppetdb start
# service puppetmaster restart
# vi nagios/manifests/init.pp

class nagios {
    include nagios::install
    include nagios::service
    include nagios::import
}

# vi nagios/manifests/install.pp

class nagios::install {
    		package { [ 'nagios', 'nagios-plugins', 'nagios-nrpe-plugin' ]:
        		ensure  => present,
    }
}

# vi nagios/manifests/service.pp

class nagios::service {

    exec { 'fix-permissions':
        command    => "find /etc/nagios/conf.d -type f -name '*cfg' | xargs chmod +r",
        refreshonly  => true,
    } 

    service { 'nagios':
        ensure      	=> running,
        enable      	=> true,
        require    	 => Class[ 'nagios::install' ],
    }
}

# vi nagios/manifests/import.pp

class nagios::import {

    Nagios_host <<||>> {
        require 	=> Class[ 'nagios::install' ],
        notify  	=> Class[ 'nagios::service' ]
    }

    Nagios_service <<||>> {
        require 	=> Class[ 'nagios::install' ],
        notify  	=> Class[ 'nagios::service' ]
    }
}

#vi nagios/manifests/nrpe.pp

class nagios::nrpe {

    package { [ 'nagios-nrpe-server', 'nagios-plugins' ]:
        ensure      => present,
    }

    service { 'nagios-nrpe-server':
        ensure      => running,
        enable      => true,
        require     => Package[ 'nagios-nrpe-server', 'nagios-plugins' ],
    }
}

# vi nagios/manifests/export.pp

class nagios::export {

    @@nagios_host { $::hostname :
        ensure              	=> present,
        address            	 => $::ipaddress,
        use                	 => 'generic-host',
        check_command      => 'check-host-alive',
        hostgroups          	=> 'all-servers',
        target             	 => "/etc/nagios/conf.d/${::hostname}.cfg",
    }

    @@nagios_service { "${::hostname}_check-load":
        ensure              	=> present,
        use                 	=> 'generic-service',
        host_name           	=> $::hostname,
        service_description => 'Current Load',
        check_command      => 'check_nrpe!check_load',
        target              	=> "/etc/nagios/conf.d/${::hostname}.cfg",
    }

    @@nagios_service { "${::hostname}_check-disk":
        ensure              	=> present,
        use                 	=> 'generic-service',
        host_name           	=> $::hostname,
        service_description => 'Disk Usage',
        check_command      => 'check_nrpe!check_hda1',
        target              	=> "/etc/nagios/conf.d/${::hostname}.cfg",
    }
}
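For illustration, once these exported resources are collected on the Nagios server, Puppet’s built-in Nagios types write them into /etc/nagios/conf.d/<hostname>.cfg as object definitions roughly like the following (the hostname and address are hypothetical):

define host {
    host_name         web01
    address           192.0.2.10
    use               generic-host
    check_command     check-host-alive
    hostgroups        all-servers
}

define service {
    host_name           web01
    service_description Current Load
    use                 generic-service
    check_command       check_nrpe!check_load
}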

# vi manifests/nodes.pp

node default {
    include nagios::nrpe
    include nagios::export
}
Restart the puppetmaster and puppetdb service

Finally, apply the puppet

# puppet apply nodes.pp

Check http://localhost/nagios – It works…

nagios

Conclusion

  • Nagios provides insight into our network performance and availability. It enables a fast response to errors in our network and, most importantly, notifies us of issues quickly, often before others sense them. It’s a very good example of how the open source community can assist you in network management.


Look, No Elephant in the Stack!


Introduction

Big Data is booming and many businesses have started embracing it to stay competitive, but there is still a lot of misconception that Hadoop, a term often used interchangeably with Big Data, is the silver bullet and the only solution. It’s hard to talk about Big Data without the 3 V’s (Volume, Velocity, and Variety), but we will keep it short: Big Data (call it hyper data or hybrid data if you prefer) involves one or a combination of these V’s. It’s important for businesses to look beyond the Hadoop hype and figure out what tools and tech stack fit their specific business problems. Not all businesses have Facebook- or Yahoo-scale volume problems.

Unfortunately, the complexity of the Hadoop ecosystem and the associated skills shortage are the two main impediments for many businesses adopting Big Data. This blog post lays out some facts about Hadoop, explains why it is not the only Big Data solution and why businesses should consider a tech stack that doesn’t need a zoo, and describes the tech stack we have architected to address the Big Data variety challenge.

Hadoop Facts

“Apache Hadoop, by all means, has been a huge success on the open source front. Thousands of people have contributed to the codebase at the Apache Software Foundation, and the Hadoop project has spawned off into dozens of happy and healthy Apache projects like Hive, Impala, Spark, HBase, Cassandra, Pig, Tez, Ambari, and Mahout. Apart from the Apache Web Server, the Apache Hadoop family of projects is probably the ASF’s most successful project ever.”

“It’s not that Hadoop is just an immature technology – rather, it’s unsuitable for many mainstream Big Data projects.”

“There is no doubt that some companies have gotten great results out of Hadoop and are using it to hammer petabytes of less-structured data into usable insights. But these success stories are predominantly relegated to either the biggest firms in their respective industries, or well-funded startups looking to leverage new Internet business models to disrupt existing industries. By and large, it hasn’t trickled down into the marketplace as a whole, at least not yet.”

“Hadoop will disappear just like other underlying database technologies have disappeared, Cloudera’s chief strategy officer Mike Olson says. Hadoop distributors hope that predictive analytics via machine learning becomes a requirement and that Hadoop-powered analytics get built in and integrated with other offerings. Fast new frameworks, like Apache Spark, can abstract away the complexity and allow organizations to use big data analytic systems without becoming data scientists or brilliant architects themselves. But even if Spark and the rest help abstract away some of the underlying complexity, the complexity is still there under the covers.”

“Despite considerable hype and reported successes for early adopters, 54% of survey respondents report no plans to invest at this time,” said Nick Heudecker, research director at Gartner. “Furthermore, the early adopters don’t appear to be championing for substantial Hadoop adoption over the next 24 months. In fact, there are fewer who plan to begin in the next two years than already have.”

“Only 26 percent of respondents claim to be deploying, piloting or experimenting with Hadoop, while 11 percent plan to invest within 12 months and seven percent are planning investment in 24 months. Responses pointed to two interesting reasons for the lack of intent. First, several responded that Hadoop was simply not a priority. The second was that Hadoop was overkill for the problems the business faced, implying the opportunity costs of implementing Hadoop were too high relative to the expected benefit.”

“A source from Gartner says, Skills gaps continue to be a major adoption inhibitor for 57 percent of respondents, while figuring out how to get value from Hadoop was cited by 49 percent of respondents. The absence of skills has long been a key blocker. While tools are improving, they primarily support highly skilled users rather than elevate the skills already available in most enterprises.”

“71% of Data Scientists feel Taming Data Variety is proving to be more important than Volume according to Paradigm4 research and Hadoop only takes you so far. Hadoop was unrealistically hyped as universal and disruptive Big Data solution, it is a technology not a solution. This is causing concerns for companies to embark on Big Data opportunities.”

No Elephant Big Data Tech Stack

no_elephant_big_data_tech_stack

The components of the stack are described below:

  • Mediation/Routing/Messaging: Apache Camel DSL is used for the mediation/routing logic that invokes the appropriate input adapter in the ingestion layer via Kafka and ActiveMQ message queues.
  • Ingestion: Selenium with Jsoup is used to scrape interactive websites at multiple depth levels. Camel protocol and data adapters ingest JSON/XML/CSV/TSV, and multiple custom plugin-based adapters process Mainframe/DB2 data files.
  • Aggregation/Transformation Process: Talend performs much of the transformation processing, R/Python handle statistical aggregation calculations, and other custom adapters perform user-defined functions (UDFs).
  • Security: All layers go through LDAP authentication for a tighter access control list.
  • Data Stores: ElasticSearch is used as the metadata repository containing ontologies, mappings, and transformation rules. Neo4j contains nodes and relationships between business entities and data records. MongoDB contains unstructured scraped content and geospatial documents transformed by the process layer. Cassandra contains time-series tuples of pre-computed data.
  • Product: A Spring-based Java REST API interacts with the polyglot data stores using Spring XD and other custom adapters to get graph relations, metadata, and time series. A frontend app built with AngularJS, Highcharts, and D3.js provides visualization and other user-based preferences.
  • Data Science Tools: A revamped version of OpenRefine is used by analysts to analyze the raw source data. A custom analytical app built with Shiny + R provides visualizations and different statistical calculations for analysts/data scientists to determine the appropriate computations to apply to the raw source data.
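As a rough illustration of the mediation/routing layer (this is a sketch, not the actual implementation; the endpoint URIs and the format header are hypothetical), a Camel route in the Java DSL that dispatches incoming messages to format-specific ingestion adapters might look like this:

import org.apache.camel.builder.RouteBuilder;

public class IngestionRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Consume ingestion requests from a message queue and route each one
        // to a format-specific adapter based on a "format" header.
        from("activemq:queue:ingest.requests")
            .choice()
                .when(header("format").isEqualTo("json"))
                    .to("direct:jsonAdapter")
                .when(header("format").in("csv", "tsv"))
                    .to("direct:delimitedAdapter")
                .otherwise()
                    .to("direct:customAdapter");
    }
}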

Conclusion

  • Hadoop is a great technology and one of the main catalysts that enabled businesses to adopt Big Data to solve interesting problems. However, it is not THE ONLY solution for all Big Data challenges and opportunities.
  • More and more businesses are facing challenges with taming data variety and should think about the tech stack that best suits their needs.
  • Treselle Systems has been voted one of the top 25 most promising Big Data vendors of 2015 by Outsourcing Gazette Magazine and has strong expertise with Big Data, from strategy to design to implementation to deployment.


Setup LAMP Server in EC2 Instance with Puppet


Introduction

Puppet

Puppet is an open-source tool that helps system administrators manage server configurations. Puppet makes automation easy and simplifies standardizing configurations across multiple Linodes or other servers, for both the front end and the back end.

Why use Puppet?

  • Saves time for deployment
  • Avoids repetitive tasks
  • Manages Physical and Virtual devices including cloud
  • Maintains System Consistency and Integrity

Use case

This use case explains how to create an AWS EC2 instance through AWS CLI commands and set up LAMP (Apache, MySQL, and PHP) services using Puppet.

Prerequisites

  • Install & Configure Puppet
  • Setup AWS CLI tools
  • Create AWS EC2 instance with Puppet
  • Install LAMP services with Puppet

Solution

Install & Configure puppet

Extensive blog resources on how to install and configure Puppet are available.

To explore more, follow the links below:

http://terokarvinen.com/2012/puppetmaster-on-ubuntu-12-04

https://www.digitalocean.com/community/tutorials/how-to-install-puppet-to-manage-your-server-infrastructure

Setup AWS CLI

Our previous blog post “Install Amazon CLI Tools” detailed the installation procedures. Click the link below for further reference:

http://www.treselle.com/blog/install-amazon-cli-tools/

Create EC2 Instances with Puppet

To create EC2 instance, follow the steps below:

  • Create a key pair
  • Create a Security group
  • Create the EC2 instance and ensure the Puppet service is running

Create a key pair

# ec2-create-keypair test-key-pair
create_key_pair

Grant permission

# sudo chmod 600 test-key-pair.pem

Create security groups

Create “test-security-group” for AWS EC2 with the following command:

# ec2-create-group test-security-group -d "Security Group"

create_security_groups

Follow the commands below to add security rules SSH-22 and RDP-3389 into test-security-group:

# ec2-authorize test-security-group -p 22 -s 0.0.0.0/24

# ec2-authorize test-security-group -p 3389 -s 0.0.0.0/24

authorize_test_security

Install puppet service

Let us create a bash script to install puppet service.

#!/bin/bash
set -e -x
apt-get update && apt-get -y upgrade 
# Find the current IP of the puppet slave and add the entry in /etc/hosts file
puppet_slave_ip=$(host ec2-2 | grep "has address" | head -1 | awk '{print $NF}')
echo $puppet_slave_ip ubuntu.server.com >> /etc/hosts
apt-get -y install puppet 
sed -i /etc/default/puppet -e 's/START=no/START=yes/'
service puppet restart

Next, create EC2 instance with the command below:

# ec2-run-instances ami-XXXXX -t m1.medium -k test-key-pair -g test-security-group --user-data-file start_puppet.sh

Finally, we have succeeded in creating a new EC2 instance!!!!

Setup LAMP on EC2 instance through Puppet service

Step 1: Create a module under the Puppet configuration to install the LAMP server.

Step 2: Follow the commands below to create a manifest file:

# cd /etc/puppet/modules
# mkdir -p lamp/manifests
# vi lamp/manifests/init.pp
class lamp {

    # First update the packages through apt-get
    exec { 'apt-update':
        command => '/usr/bin/apt-get update',
    }

    # Install the apache2 service
    package { 'apache2':
        require => Exec['apt-update'],
        ensure  => 'installed',
    }

    # Run the apache2 service
    service { 'apache2':
        ensure => 'running',
    }

    # Install the mysql server
    $password = "password123"
    package { 'mysql-server':
        require => Exec['apt-update'],
        ensure  => 'installed',
    }

    # Run the mysql service
    service { 'mysql':
        enable  => true,
        ensure  => 'running',
        require => Package['mysql-server'],
    }

    # Set the MySQL root password
    exec { 'Set MySQL-Server root password':
        unless  => "mysqladmin -u root -p${password} status",
        path    => ['/bin', '/usr/bin'],
        command => "mysqladmin -u root password ${password}",
        require => Service['mysql'],
    }

    # Install the php service
    package { 'php5':
        require => Exec['apt-update'],
        ensure  => 'installed',
    }

    # Add the info.php file
    file { '/var/www/html/info.php':
        ensure  => 'file',
        content => '<?php phpinfo(); ?>',
        require => Package['apache2'],
    }
}

Using the module in the main manifest

# vi /etc/puppet/manifests/site.pp

node default {}
node 'ec2-XX-XX-XX-XX.compute-1.amazonaws.com' {
    include lamp
}

Now run the Puppet agent on the instance:

# puppet agent --test

Info: Caching certificate for ec2-XX-XX-XX-XX.compute-1.amazonaws.com
Info: Caching certificate_revocation_list for ca 
Info: Caching certificate for ec2-XX-XX-XX-XX.compute-1.amazonaws.com
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for ec2-XX-XX-XX-XX.compute-1.amazonaws.com
Info: Applying configuration version '1415506032' 
Notice: /Stage[main]/Lamp/Exec[apt-update]/returns: executed successfully 
Notice: /Stage[main]/Lamp/Package[php5]/ensure: ensure changed 'purged' to 'present' 
Error: /Stage[main]/Lamp/Service[mysql]: Could not evaluate: Could not find init script or upstart conf file for 'mysql' 
Notice: /Stage[main]/Lamp/Package[mysql-server]/ensure: ensure changed 'purged' to 'present' 
Notice: /Stage[main]/Lamp/File[/var/www/html/info.php]/ensure: defined content as '{md5}d9c0c977ee96604e48b81d795236619a' 
Info: Creating state file /var/lib/puppet/state/state.yaml
Notice: Finished catalog run in 25.34 seconds

Now, the puppet has been successfully applied.

Next, let’s check whether LAMP has been installed on EC2 instance.

Open browser and check http://ec2-XX-XX-XX-XX.compute-1.amazonaws.com

apache2_ubuntu_page

Check the PHP version

php_version

Check the MySQL services.

check_mysql_services

Good job! You have successfully set up LAMP with Puppet! Happy Puppeting…

Conclusion

  • An EC2 instance can be created through AWS CLI commands
  • Amazon provides AWS CLI commands that can manage the entire EC2 instance lifecycle, just like the AWS Management Console
  • Puppet is a remarkable tool for automation and deployment in IT infrastructure


API Load Testing Using Gatling 2


Introduction

Most client-server web applications expose REST APIs that can be accessed using URIs. Gatling is an open source performance testing tool that can be used to test REST API performance. Gatling generates HTML reports from test execution that include several graphs breaking down performance across different metrics, including global sessions and per-API requests.

Use Case

Problem Definition

Measuring API performance manually leads to inaccurate and erroneous results, and testing with only a single user is inadequate. Automating these performance tests resolves both issues: Gatling can simulate multiple concurrent users and vary the API inputs.

Solution

Gatling Test Plan for the Restful API

  • Add Simulation information Package name and Class name
  • Add Encoding format, HTTP Request Defaults, Sampler HTTP Request for REST
  • Add CSV file and configure the Feeders
  • Multi – Scenarios Simulation (User Simulations)
  • Simulation Setup with Injection for the Scenarios
  • Run the Simulation Script and view HTML Report

The following steps explain how to configure Gatling and use it for performance testing of a RESTful API:

Add Simulation information

As a first step, use the recorder batch file to create the simulation with the package name and the class name.

add_simulation_information

Set Encoding format

The encoding is used when compiling simulations and building requests. Configure the proper encoding in the gatling.conf file.

set_encoding_format

Add “HTTP Request Defaults” Config element

Create a Header.scala file in the following location:

“gatling-charts-highcharts-2.0.1\user-files\simulations\rest_api_performance_test”.

In this file add the Domain name or IP address of the web server.

http_request_defaults

Creating\Adding CSV file and Configuring the Feeders

Gatling can use CSV test data dynamically. The advantages of using a CSV file are listed below:

  • Multiple data records for multiple iterations can be stored in a single file
  • To update data values, only the CSV file needs to change; the .scala file does not need to be updated

The feeder file should be placed in the data directory of the Gatling distribution. This location can be overridden in the gatling.conf file.

configuring_the_feeders
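For reference, the relevant setting lives under gatling.core.directory in gatling.conf and looks roughly like this (the path shown is the usual default; adjust it to your layout):

gatling {
  core {
    directory {
      data = user-files/data      # feeder files such as service.csv are resolved from here
    }
  }
}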

The REST API’s authid values are added to the CSV file.

The following sample illustrates the CSV file, saved with the name service.csv:

service

The next step is to configure the Feeder in the simulation. Create the feeders.scala file in the following location:

“gatling-charts-highcharts-2.0.1\user-files\simulations\rest_api_performance_test”

rest_api_performance_test

Every time a virtual user reaches this step, it pops a record out of the feeder and injects it into the user’s session, resulting in a new session instance.

Create multiple User Scenarios

We create multiple scenarios in the given location with the extension .scala:

“gatling-charts-highcharts-2.0.1\user-files\simulations\rest_api_performance_test”

The screenshot below illustrates the scenario with the name “detailPage.scala”:

detailpage

Simulation Setup with Injection for the Scenarios

Now we will define the load for the scenarios that will be injected into the server. The inject method is used to define this load; it takes a sequence of injection steps as arguments, which are processed sequentially.

injection
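Because the feeder, scenario, and injection setup are only shown as screenshots, here is a minimal consolidated sketch of what such a simulation might look like in Gatling 2; the host, path, CSV column name, and user counts are assumptions for illustration:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class RestApiSimulation extends Simulation {

  // Common request defaults (host and headers are hypothetical)
  val httpProtocol = http
    .baseURL("http://localhost:8080")
    .acceptHeader("application/json")

  // Feeder backed by service.csv; "authid" is the assumed column name
  val serviceFeeder = csv("service.csv").circular

  // Scenario: each virtual user pops a record and calls the REST endpoint
  val detailScenario = scenario("Detail Page")
    .feed(serviceFeeder)
    .exec(
      http("get_details")
        .get("/restful/getdetails")
        .queryParam("authid", "${authid}")
        .check(status.is(200))
    )

  // Injection profile: 10 users at once, then 50 more ramped over 30 seconds
  setUp(
    detailScenario.inject(
      atOnceUsers(10),
      rampUsers(50) over (30 seconds)
    )
  ).protocols(httpProtocol)
}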

Run the Simulation Script and view HTML Report

Run the simulation with the help of gatling.bat file available in the gatling-charts-highcharts-2.0.1\bin folder. The execution results are available in the “gatling-charts-highcharts-2.0.1\results” folder.

html_report

Conclusion

  • Gatling makes it easy to simulate load against API services that is quite close to production load on every deployment
  • We can also monitor the performance of individual API services and groups of services, which helps maintain the product’s performance
  • Reports generated by Gatling are simple and easy to maintain


Bootstrap 3 – Tips and Tricks


Introduction

Bootstrap is the most popular HTML, CSS, and JS framework for developing responsive projects on the web, especially for mobile, making front-end web development faster and a lot easier. But how many of us know the full extent of Bootstrap’s benefits? This article aims to unveil some of those secrets of Bootstrap’s extensive capabilities. It is a handy tool for techies of varying skill levels, for devices of all shapes and sizes, and for projects of all sizes.

Bootstrap’s architectural framework is based on HTML and CSS design templates for typography, forms, buttons, tables, navigation, modals, image carousels, and much more, including optional JavaScript plugins. Bootstrap’s ability to easily create responsive designs is yet another interesting feature.

Use case

There might be times when you write your own custom CSS and JavaScript to build a site or application without realizing that Bootstrap already provides capabilities that could have eased your task and saved time.

Certain things come up again and again before you begin to create a website: a five-column layout, hover dropdowns, column ordering, and so on. Now, here is the good news: you need not write custom code for all of these tasks. We provide the ready-made Bootstrap code below, helping you save valuable development time.

The prime motive of this article is to share our learned knowledge about bootstrap and exhibit the framework’s capabilities:

  • Five columns Layout
  • How to Enable Bootstrap 3 Hover Dropdowns
  • Don’t forget Container Fluid for Full Width Rows
  • Column ordering
  • Labels for Screen Readers
  • No Gutter Column

Solution

Five columns Layout

Bootstrap does not provide a grid system that allows a five-column layout by default, but adding one is simple.

This solution works smoothly with Bootstrap 3, and the new classes can be reused alongside the existing Bootstrap classes for responsive design.

First, create a default column definition the same way Bootstrap does. Call the classes col-*-5ths to avoid clashes with other names. Next, define the width of the new classes inside each media query.

CSS:

Add this to your global stylesheet, or even to the bottom of your Bootstrap.css document.

.col-xs-5ths,
.col-sm-5ths,
.col-md-5ths,
.col-lg-5ths {
    position: relative;
    min-height: 1px;
    padding-right: 10px;
    padding-left: 10px;
}
.col-xs-5ths {
    width: 20%;
    float: left;
}
@media (min-width: 768px) {
    .col-sm-5ths {
        width: 20%;
        float: left;
    }
}
@media (min-width: 992px) {
    .col-md-5ths {
        width: 20%;
        float: left;
    }
}
@media (min-width: 1200px) {
    .col-lg-5ths {
        width: 20%;
        float: left;
    }
}

Sample HTML:

For instance, if you want to create a div element that spans one of five columns on medium screens and one of two columns on smaller ones, just use code similar to the one below:

<div class="row">
    <div class="col-md-5ths col-xs-6">
       ...
    </div>
</div>

How to Enable Bootstrap 3 Hover Dropdowns

Bootstrap’s default navigation does not have out-of-the-box hover dropdowns, and the mobile menu is pretty plain. Still, if you prefer using it, here are some tricks to fine-tune it without complete customization.

Some clients, sites, or apps will expect hover dropdowns on desktop. This is not available out of the box with Bootstrap 3.

You can easily add hover dropdowns with the following CSS:

@media only screen and (min-width : 768px) {
    /* Make Navigation Toggle on Desktop Hover */
    .dropdown:hover .dropdown-menu {
        display: block;
    }
}

With the CSS above, the dropdowns open on hover, but clicking a parent link no longer navigates to it. Though not the most ideal solution, the following JavaScript (jQuery) can make the parent link navigate on click:

$('.dropdown-toggle').click(function() {
    var location = $(this).attr('href');
    window.location.href = location;
    return false;
});

Don’t forget Container Fluid for Full Width Rows

When we first used Bootstrap 3, we did not use the container-fluid class for full-width rows; we simply omitted the container. That seems to make sense, and the grid system tends to work without it, but it is problematic because Bootstrap’s row class has a -15px left and right margin.

Please find an example below to know more:

Note: You may view in a full window to understand better.

container_fluid

Column ordering

Alter the order of the grid columns with .col-md-push-* and .col-md-pull-* classes:

The push class will move columns to the right while the pull class will move columns to the left.

The columns in your HTML markup should be in the order you want them to appear on mobile displays; the pushing and pulling then applies only on the larger desktop views.
Let’s study the following example:

We have two columns of equal width which will be pushed and pulled on sm or larger viewports.

<div class="row">
  <div class="col-sm-6 col-sm-push-6">
    <!-- Column A -->
  </div>
  <div class="col-sm-6 col-sm-pull-6">
    <!-- Column B -->
  </div>
</div>

On the larger desktop views (sm or larger) the columns are pushed and pulled.

On the mobile view (xs) the columns are fetched in the natural order of the markup.

column_ordering

Labels for Screen Readers

Even when you use placeholders for your input fields, it is best practice to also add a hidden label for each input so that screen readers can read the page.

The following example helps us to understand how to set that up fast with Bootstrap:

<label for="id-of-input" class="sr-only">My Input</label>
<input type="text" name="myinput" placeholder="My Input" id="id-of-input">

No Gutter Column

Bootstrap lets you customize and compile your own build based on your requirements, covering anything from colors and container sizes to gutter padding. Sometimes, however, you just want a single row without padding. Rather than selecting each column individually and removing the padding with CSS, you can build your own utility class helper: it covers all column sizes and still supports full responsiveness.

Take a look at CSS snippet below:

.no-gutter > [class*='col-'] {
    padding-right:0;
    padding-left:0;
}

And this is how you can use it in your HTML:

<div class="row no-gutter">
    <div class="col-md-4">
        ...
    </div>
    <div class="col-md-4">
        ...
    </div>
    <div class="col-md-4">
        ...
    </div>
</div>

no_gutter_column

Conclusion

  • Bootstrap is a powerful front-end framework with extensive featured tools that speeds up tasks saving your time
  • You can selectively use any piece of it, or build on top of it, to ease your front-end job, though Bootstrap cannot be relied on for the entire job
  • It may not offer a solution for every client requirement; some custom coding might still be required
  • You can try the code tips from this blog article in your own code based on your needs
  • We hope front-end developers gained insight into some of Bootstrap’s lesser-known features


 

Django with MongoDB


Introduction

Django is a high-level python web framework that eases building better web apps more quickly and with less code.

Django MongoDB Engine is a MongoDB backend for Django, the Python Web framework for perfectionists with deadlines.

Features

  • It consists of an object-relational mapper that mediates between data models (defined as Python classes) and a relational database (“Model”), a system for processing requests with a web template system (“View”), and a regular-expression-based URL dispatcher (“Controller”).
  • A serialization system that can produce and read XML and/or JSON representations of Django model instances, and a system for extending the capabilities of the template engine.
  • Django can be run in conjunction with Apache or NGINX using WSGI, or via flup (a Python module). Django also includes the ability to launch a FastCGI server, and other WSGI-compliant web servers can be used as well.

Use case

  • Install Django
  • Install MongoDB
  • Install MongoDB dependencies
  • Start project & app
  • Add our app into project
  • Configure database
  • Create model
  • Create forms
  • Create views & URL
  • Add data with shell
  • Create Templates
  • Run Project

Solution

Install Django

Run the following command to install Django:

$ pip install django==1.6.1

Install MongoDB

Download and install MongoDB using the link below:

http://docs.mongodb.org/manual/installation/#tutorials-installation

Install MongoDB dependencies

$ pip install https://bitbucket.org/wkornewald/django-nonrel/get/tip.tar.gz
$ pip install https://bitbucket.org/wkornewald/djangotoolbox/get/tip.tar.gz
$ pip install https://github.com/django-nonrel/mongodb-engine/tarball/master

Start project & App

Once the MongoDB setup is done and MongoDB server is started, we can start a Django project with the command below:

$ django-admin.py startproject demoproject

The following structure is created showing a typical Django project layout:

typical_Django_project_layout
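For reference, django-admin.py startproject in Django 1.6 generates a layout like this:

demoproject/
    manage.py
    demoproject/
        __init__.py
        settings.py
        urls.py
        wsgi.py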

Run the following command in the project directory. This will create our first app.

$ python manage.py startapp demoapp

This structure will be created showing a typical Django application layout:

typical_Django_application_layout

Django creates the following files for you:

  • demoapp/__init__.py: an empty, but special python file.
  • models.py: where you define the data models that the app manages.
  • views.py: where you’ll define all of the view functions.
  • tests.py: Like bacon, testing is good for you, but we’ll leave that for another day.

Add our app into project

We need to add the app to the project. Open the settings.py file and add the app name to INSTALLED_APPS:

INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'django_mongodb_engine',
    'demoapp',
)

Configure database

The database is the foundation of most web apps and the Django app is no exception. In this blog, we are going to configure the database settings and then use a Django management command to sync the database to the models.

Change the DATABASES dictionary with the following code:

DATABASES = {
   'default' : {
      'ENGINE' : 'django_mongodb_engine',
      'NAME' : 'my_database'
   }
}

Create models

Models are used to generate database tables and they have a rather nifty object relational mapping (ORM) API for getting things in and out of the database. We add the following code into models.py file:

class Company(models.Model):
    company_name = models.CharField(max_length=255)
    store_name = models.CharField(max_length=255, unique=True)
    address = models.CharField(max_length=200, blank=True)
    latitude = models.DecimalField(max_digits=6, decimal_places=3)
    longitude = models.DecimalField(max_digits=6, decimal_places=3)

    class Meta:
        ordering = ['-company_name']

    def __unicode__(self):
        return u'%s - %s' % (self.company_name, self.store_name)

We’ve also given our model class two extras. The first is the __unicode__ method, which returns a Unicode representation of the object. The second is the inner Meta class, which tells the model how its objects should be ordered; in this case, Company objects are ordered by company_name.

Now, you can create your database with the following command:

$ python manage.py syncdb

Note: The Views & URL files can be modified as required for usage at relevant instances in the application.

Add data with shell

The manage.py script provides a shell interface for the application that you can use to insert data into Company. Begin by issuing the following command to load the Python shell:

$ python manage.py shell

Create the first record using the following sequence of operations:

add_data_with_shell
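Since the shell session is only shown as a screenshot, a minimal sketch of the sequence might look like this (the company name, store name, address, and coordinates are made-up values):

>>> from demoapp.models import Company
>>> c = Company(company_name="Acme Corp", store_name="Acme Chennai",
...             address="123 Main Street", latitude=13.067, longitude=80.237)
>>> c.save()
>>> Company.objects.all()
[<Company: Acme Corp - Acme Chennai>]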

Create Views, URL & Templates

We first need to import the demoapp model objects and create a presentation of the data. This means we must add the following import statement at the top of the file.

from demoapp.models import *

Next, the following code adds our new presentation data:

def index(request):
    return render_to_response('index.html')

# Getting JSON response
def company_List(request):
    company_list = []
    for details in Company.objects.all():
        company_list.append({'company_name': details.company_name,
                             'store_name': details.store_name,
                             'lat': details.latitude,
                             'lng': details.longitude,
                             'address': details.address})
    data = simplejson.dumps(company_list)
    return HttpResponse(data, mimetype='application/javascript')

To accomplish this, we created an empty list, looped through all the records, and appended the desired fields for each one. We then used the simplejson.dumps function to convert the list to JSON. The code above covers both the index page view and how to serve the database contents as a JSON response.

To create index page

In the templates folder, we added the following two HTML files. The layout.html file is a base template and index.html derives from layout.html.

In the base template, we define the HTML that will be used by all templates in the demoapp package, including the links to the UI style sheets and the references to the jQuery and Kendo UI JS files:

<!DOCTYPE html>
<html lang="en">
  <head>
    <link href='/static/css/style.css' rel='stylesheet' type='text/css'/>
    <link href='/static/css/kendo.common.min.css' rel='stylesheet' type='text/css'/>
    <link href='/static/css/kendo.default.min.css' rel='stylesheet' type='text/css'/>
  </head>
  <body>
    <div id="content">
      {% block content %}{% endblock %}
    </div>
   <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
   <script type='text/javascript' src='/static/js/kendo.all.min.js'></script>
    {% block scripts %}{% endblock %}
  </body>
</html>

In the index.html template, we extend the layout.html template and add following codes:

{% extends "layout.html" %}
{% block content %}
<h2>Demo App</h2>
<body>
<ul>
		<a href="/demoapp/map/">Stores_location</a>
</ul>
<ul>
		<a href="/demoapp/Add_store/">Add_store</a>
</ul>
</body>
<div id='widgetsGrid'>
</div>
{% endblock %}
{% block scripts %}
<script type="text/javascript">
console.log('ok');
	$(document).ready(function() {
		$('#widgetsGrid').kendoGrid({
			dataSource: {
				transport: {
					read: {
						url: '/demoapp/company_List/',
						contentType: 'application/json; charset=utf-8',
						type: 'GET',
						dataType: 'json'
					}
				},
				pageSize: 10
			},
			columns: [
				{
					field: 'company_name',
					title: 'Company_name'
				},
				{
					field: 'store_name',
					title: 'Store_name'
				},
				{
					field: 'lat',
					title: 'latitude',
				},
				{
					field: 'lng',
					title: 'longitude',
				},
				{
					field: 'address',
					title: 'address',
				},
			],
			height: 500,
			sortable: true,
			pageable: true
	});
	});
</script>
{% endblock %}

The HTML code example demonstrates how we utilize the data passed to the template via its context. We make use of the company_name, store_name, lat, and lng values from our Company objects; the grid reads them from the JSON endpoint and presents the Company records as rows.

To Create URL

Before we can run the application, we need to create the demoapp URLs to display the index.html page.

We add the following code:

urlpatterns = patterns('demoapp.views',
    url(r'^$', 'index'),
    url(r'^company_List/$', 'company_List'),
)

Run the project

Start the development server:

$ python manage.py runserver

Visit the URL, http://127.0.0.1:8000/demoapp/.  You should now be able to see the screenshot below.

run_the_project

Create a form

First, create a forms.py file in the demoapp directory and import the demoapp model objects. As we already have one model defined for demoapp (Company), we will create a ModelForm.

Add the following code:

class CompanyForm(forms.ModelForm):
    company_name = forms.CharField(max_length=128, help_text="Please enter the company_name.")
    store_name = forms.CharField(max_length=200, help_text="Please enter the store_name ")
    address = forms.CharField(max_length=200, help_text="Please enter the address ")
    latitude = forms.DecimalField(max_digits=12, help_text="Please enter the latitude ")
    longitude = forms.DecimalField(max_digits=12, help_text="Please enter the longitude")
    class Meta:
        model = Company
        fields = ('company_name', 'store_name', 'address', 'latitude', 'longitude')

To create an Add New Store on views.py

With our CompanyForm class now defined, we are ready to create a new view that displays the form and handles the posted form data.

To do this, add the following code:

def Add_store(request):
    context = RequestContext(request)
    if request.method == 'POST':
        form = CompanyForm(request.POST)
        if form.is_valid():
            form.save(commit=True)
            return index(request)
        else:
            print form.errors
    else:
        form = CompanyForm()
    return render_to_response('add_new_store.html', {'form': form},
                              context_instance=context)

First, we check the HTTP request method to determine whether it is a GET or a POST. On a POST, we bind the submitted data to the form and validate it; if the form is valid, we save it.

To create add New Store Template

We create an add_new_store.html file on template directory and add the following code:

<html>
    <head>
        <title>Demo App</title>
    </head>
    <body>
        <h1>Add a store</h1>
        <form id="CompanyForm" method="post" action="/widgets/Add_store/">
            {% csrf_token %}
            {% for hidden in form.hidden_fields %}
                {{ hidden }}
            {% endfor %}
            {% for field in form.visible_fields %}
                {{ field.errors }}
                {{ field.help_text }}
                {{ field }}
            {% endfor %}
            <input type="submit" name="submit" value="Create New Store " />
        </form>
    </body>
</html>

To create a URL for Add New Store

We need to create a demoapp URL to display the add_new_store.html page.

We add the following code:

urlpatterns = patterns('demoapp.views',
    url(r'^$', 'index'),
    url(r'^company_List/$', 'company_List'),
    url(r'^Add_store/$', 'Add_store')
)

To display the JSON objects on Google Maps

Create a view for the map function:

We are ready to create a view function that displays the JSON objects on a Google Map using the Company objects (store latitude and longitude values).

Add following code into views.py file:

def map(request):
    return render_to_response('map.html')

To create a Google Map Template

First, we create a map.html file in the template directory. In this template, we fetch the JSON objects and show each store’s exact location on the map.

Add the following code:

<html>
<head>
<title>Demo app for Google Maps Marker using External JSON</title>
<style type="text/css">
html { height: 100% }
body { height: 100%; margin: 0; padding: 0 }
#map_canvas { height: 100% }
</style>
<script type="text/javascript" src="http://code.jquery.com/jquery-latest.min.js"></script>
<script type="text/javascript" src = "http://maps.google.com/maps/api/js">
</script>
<script type="text/javascript">
function initialize() {
var mapOptions = {
center: new google.maps.LatLng(12, 80),
zoom: 5,
mapTypeId: google.maps.MapTypeId.ROADMAP
};
var infoWindow = new google.maps.InfoWindow();
var map = new google.maps.Map(document.getElementById("map_canvas"), mapOptions);
  $.getJSON('/demoapp/company_List/', function(data) {
  //console.log(data);
            $.each( data, function(i, value) {
            //console.log(value);
    var myLatlng = new google.maps.LatLng(value.lat, value.lng);
    //alert(myLatlng);
   // console.log(myLatlng);
    var marker = new google.maps.Marker({
    position: myLatlng,
    map: map,
    title: "text "+value.lon
                });
    });
});
}
</script>
</head>
<body onload="initialize()">
<form id="form1" runat="server">
<div id="map_canvas" style="width: 950px; height: 650px"></div>
</form>
</body>
</html>

The above code fetches the JSON object and places a Google Maps marker at each latitude/longitude pair.

To create URL for Map function

Add the following code to show the map.html page:

urlpatterns = patterns('demoapp.views',
    url(r'^$', 'index'),
    url(r'^company_List/$', 'company_List'),
    url(r'^map/$', 'map'),
    url(r'^Add_store/$', 'Add_store')
)

Visit the URL http://127.0.0.1:8000/demoapp/map/. You should now see something like the screenshot below:

map_function

Conclusion

  • In this blog, we have discussed using the Django framework with MongoDB.
  • We have shown how to plot a JSON object of store locations onto Google Maps.
  • We have walked through the features used along the way.

References


Distributed Message Processing and Monitoring with RabbitMQ, Celery & Celery Flower

$
0
0

Introduction

Celery is an asynchronous task/job queue framework that allows you to create distributed systems, where tasks (execution units) are executed concurrently on multiple workers using multiprocessing. It also supports scheduling and scales really well since you can horizontally scale workers.

Celery feature benefits

Celery is great at firing both synchronous and asynchronous tasks such as sending emails, processing credit cards, writing transactions to a general ledger, etc. One of its most beneficial features is the ability to chain multiple tasks to create workflows.
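
One of the simplest ways to see this in action is Celery's chain primitive. The sketch below assumes the add and mul tasks defined later in this post, and a configured result backend (e.g. CELERY_RESULT_BACKEND = 'amqp' added to celeryconfig.py) so that results can be fetched:

from celery import chain
from tasks import add, mul

# add(2, 2) runs first; its result (4) is passed to mul as the first
# argument, i.e. mul(4, 4). Each step executes asynchronously on a worker.
workflow = chain(add.s(2, 2), mul.s(4))
result = workflow.apply_async()
print(result.get())   # 16, once a worker has processed both steps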

Celery features include:

  • Monitoring
  • Workflows
  • Time & Rate Limits
  • Scheduling
  • Auto reloading
  • Auto scaling
  • Resource Leak Protection
  • User Components

How does Celery work

Celery requires a message broker. The broker acts as a middleman, passing messages to and from Celery workers, which process tasks as they receive them. RabbitMQ is Celery's recommended message broker, and Celery beat is the scheduler that kicks off tasks at regular intervals; those tasks are then executed by the worker nodes available in the cluster.

Here are some points to understand RabbitMQ and Celery Flower:

RabbitMQ is a complete and highly reliable enterprise messaging system based on the AMQP (Advanced Message Queuing Protocol) standard. It is a proven platform offering exceptionally high reliability, availability, and scalability.

Celery Flower is a real-time monitoring tool for Celery events such as task progress, task details, and task statistics, and it also acts as a remote control for the Celery process.

The following diagram illustrates the RabbitMQ process flow:

RabbitMQ_process_flow

Use Case

Let’s continue with the use case of celery with RabbitMQ as a messaging broker:

  • Install Celery and Celery dependencies
  • Install RabbitMQ and Start RabbitMQ server
  • Create a Celery instance using Python
  • Start Celery beat
  • Start Celery workers
  • Install Celery flower

Solution

Install Celery and Celery dependencies

Install Celery

Use the pip command to install Celery:

$ pip install celery

Install Celery dependencies

Use the pip command to install Celery dependencies:

$ pip install pytz
$ pip install Billiard

Install RabbitMQ and Start RabbitMQ server

Install RabbitMQ

Download and install RabbitMQ from this link:

http://www.rabbitmq.com/download.html

Start RabbitMQ server

We need to configure RabbitMQ as the message broker before running Celery. Once RabbitMQ is successfully started, it can be checked using the management web UI (provided by the rabbitmq_management plugin) located at:

http://localhost:15672/

start_RabbitMQ_server

Note: The default username and password are both "guest".

Create a Celery instance using Python

Create a file called tasks.py

from celery import Celery
celery = Celery('tasks')
celery.config_from_object('celeryconfig')

Creating tasks

@celery.task
def add(x, y):
    return x + y

@celery.task
def mul(x, y):
    return x * y
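
These tasks can also be invoked on demand from any Python shell or application once a worker is running (see the worker section below). A minimal sketch, assuming a result backend (e.g. CELERY_RESULT_BACKEND = 'amqp') has been added to celeryconfig.py so that the return value can be fetched:

from tasks import add

result = add.delay(4, 4)          # send the task to the broker
print(result.get(timeout=10))     # a worker computes and returns 8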

Start Celery Beat

Create a celeryconfig.py file for configuration and scheduling. You can schedule task execution, for example, at a particular time of day or day of the week, using the crontab schedule type given below:

from celery.schedules import crontab

BROKER_URL = 'amqp://guest:guest@localhost//'

CELERYBEAT_SCHEDULE = {
    'every-minute_add': {
        'task': 'tasks.add',
        'schedule': crontab(minute='*/1'),
        'args': (2, 2),
    },
    'every-minute_mul': {
        'task': 'tasks.mul',
        'schedule': crontab(minute='*/1'),
        'args': (3, 4),
    },
}

The following command starts Celery beat:

$ celery -A tasks beat

celery_beat

Start Celery Workers

We can now start worker processes that will accept tasks from the broker. They will use the file we just created to learn about the tasks they can perform.

Starting a worker instance is as easy as calling out the application name with the celery command.

$ celery -A tasks worker --loglevel=info

start_celery_workers

The above screenshot shows the task queue with the task list and the "celery ready" console output confirming the RabbitMQ connection state.

Once we have successfully started RabbitMQ with Celery, we can log in to the RabbitMQ management UI to check the "Overview", "Connections", and "Channels" tabs.

The screen shot below explains the celery process with RabbitMQ messaging broker services GUI:

messaging_broker_services

The screenshot also shows "queued messages" and "message rates" with analytic information under the "Overview" tab. Other tabs include "Connections" (task names, protocol, client from/to, timeout), "Channels" (the channel list, virtual host information, and state), "Exchanges", "Queues", and "Admin".

Install Celery Flower

Installation

The following command installs Celery Flower:

$ pip install flower

Usage
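
Flower is typically launched as a Celery sub-command; assuming the tasks module defined earlier, a minimal invocation is:

$ celery -A tasks flower --port=5555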

Once Celery Flower is successfully started, we can check it using the web UI located at:

http://localhost:5555

The following are the output screens:

celery_flower_output_screen

We can also see the Workers tab with worker names and status (online or offline), concurrency, completed tasks, running tasks, and queues.

The screenshot below depicts Celery monitoring with RabbitMQ, showing succeeded tasks and task-time analytics:

analytics_process

Conclusion

  • In this blog, we have discussed Celery as a message processor with RabbitMQ.
  • We have shown how to schedule real-time tasks using Celery beat.
  • We have shown how to easily monitor the running Celery processes using Celery Flower.

References

JAVA and MVEL

$
0
0

Introduction

MVEL is an Apache-licensed, powerful Expression Language (EL) written in Java for Java-based applications, and hence its syntax is mostly similar to Java. In general, a regular expression language is used to match, find, and replace strings and numerals in large content using search patterns.

MVEL is similar to OGNL and provides a variety of expression types, as given below:

  • Property Expressions
  • Boolean Expressions
  • Method invocations
  • Variable assignments
  • Functions definition

Why MVEL

  • When used as an Expression Language, its operators directly support collection, array, and string matching as well as regular expressions.
  • When used as a Templating Language, it can be used for string construction and configuration.
  • MVEL performance is considerably faster compared to OGNL.

Use case

  • Usage of property expressions
  • Usage of Boolean expressions
  • Usage of functions definition
  • Collections usage in MVEL
  • Templating with MVEL

Pre-requisite

  • Setup JDK1.6+
  • MVEL jars

Basic Syntax

  • An MVEL expression may be a simple statement or a complicated script.
  • A script contains multiple statements separated by semicolons, which denote statement termination; a newline is not a substitute for the semicolon.
  • Expression execution is based on operator precedence.
  • By default, an expression or script returns the value of its last statement (the "last out value" principle).

Usage of Property Expressions

A property expression is used to extract a property value from a variable or from model objects. Extracting a property is straightforward and familiar to programmers.

The following example illustrates accessing the property using the MVEL expression:

Example

“Employee” is a Java bean containing firstName, lastName, and age as properties. The example code below extracts a bean property using an MVEL expression:

// Here the input to the MVEL expression is a map.
Map<String, Object> input = new HashMap<String, Object>();
Employee e = new Employee();
e.setFirstName("john");
e.setLastName("michale");
input.put("employee", e);
/*
 * Property Expression - Used to extract the property value out of the variable.
 */
String lastName = MVEL.evalToString("employee.lastName", input);
System.out.println("Employee Last Name:" + lastName);

Developers usually expect to access the object through its Java reference, but that is not the case here. In this example, the Employee object is passed to the MVEL expression under the key "employee". If we try to access the value using its reference "e", the following exception is thrown.

Here, "input" is the object context for the expression; MVEL resolves the "employee" variable, finds the lastName property, and extracts its value.

identifier

Usage of Boolean Expressions

Boolean expressions are used to validate a particular condition using operators. MVEL has numerous operators for checking criteria, including:

  • Comparison operators (>=, <=, ==, >, <, !=)
  • Logical operators (&&, ||)
  • Bitwise and arithmetic operators (&, |, ^, -, /, *, %)
  • contains, instanceof, string concatenation (+), projections (in)

Example

MVEL.evalToBoolean("employee.lastName == \"john\"", input)
input.put("numeric", new Double(-0.253405));
System.out.println(MVEL.evalToBoolean("numeric > 0",input));

Usage of Functions Definition

A function definition in MVEL looks much like a simple method. A function is defined using the keyword "def" or "function"; we can pass parameters to it, and it returns a single output value, namely the value of its last statement.

Example

/*
* Function definitions
*/
System.out.println(MVEL.eval(new String(loadFromFile(new File("concat.mvel"))), new HashMap()));

concat.mvel

def stringcon(input){
    return ($.firstName + $.lastName in input);
}
stringcon(employeeList);

In the above example, the function concatenates the firstName and lastName of each employee bean and returns the list of concatenated strings as output.

The following is the complete program:

MVELController.java

package com.main;

import static org.mvel2.util.ParseTools.loadFromFile;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.mvel2.MVEL;

import com.model.Employee;

/**
 * This class is responsible to show the different features possible by the MVEL.
 */
public class MVELController {
    public static List<Employee> employeeList = populateEmployee();

    public static void main(String args[]) throws IOException {
        // Here input to the MVEL expression is a map.
        Map<String, Object> input = new HashMap<String, Object>();
        Employee e = new Employee();
        e.setFirstName("john");
        e.setLastName("michale");
        input.put("employee", e);
        /*
         * Property Expression - Used to extract the property value out of the expression.
         */
        String lastName = MVEL.evalToString("employee.lastName", input);
        System.out.println("Employee Last Name:" + lastName);
        /*
         * Boolean Expression
         */
        System.out.println("Is employee name is John:" + MVEL.evalToBoolean("employee.lastName == \"john\"", input));
        input.put("numeric", new Double(-0.253405));
        System.out.println(MVEL.evalToBoolean("numeric > 0", input));
        input.put("input", employeeList);
        System.out.println(MVEL.eval("($ in input if $.firstName == \"john\")",input ));
        /*
         * Function definitions
         */
        System.out.println(MVEL.eval(new String(loadFromFile(new File("concat.mvel"))), new HashMap()));

    }

    private static List<Employee> populateEmployee() {
        List<Employee> employees = new ArrayList<Employee>();
        Employee e = new Employee();
        e.setFirstName("john");
        e.setLastName("michale");

        Employee e1 = new Employee();
        e1.setFirstName("merlin");
        e1.setLastName("michale");

        employees.add(e);
        employees.add(e1);
        return employees;
    }
}

Employee.java

package com.model;

public class Employee {
    private String firstName;
    private String lastName;
    private int    age;

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public String toString(){
        return "First Name:"+firstName;
    }

    }

Output

output

Note: Method invocations are possible using the MVEL expression language.

Collections usage in MVEL

Comparing the collection

Using MVEL expressions we can compare collections, applying the operators (>, < and ==) to their elements to produce a filtered output.

Example

To find the numbers in a list that are greater than a given constant:

String operator = ">";
List<Double> computedValue = new ArrayList<Double>();
computedValue.add(new Double(50.0));
computedValue.add(new Double(97.9));
computedValue.add(new Double(68.9));

Map<String, Object> input = new HashMap<String, Object>();
input.put("actual_value", new Double(55.9));
input.put("computed_value", computedValue);
List output = null;

if (operator.equals(">")) {
    output = (List) MVEL.eval("($ in computed_value if $ > actual_value)", input);
} else {
    output = (List) MVEL.eval("($ in computed_value if $ < actual_value)", input);
}

Templating with MVEL

Like other templating languages, MVEL supports the following:

  • Binding dynamic values into static HTML content.
  • Calling static Java methods from the MVEL template to produce output.
  • Using Java code inside the MVEL template.

Binding dynamic values into static content is the classic use case shared with other templating languages. We have used MVEL templating for several use cases: generating dynamic queries, forming API calls, building R script calls, and more.

Example

Here is the complete example explaining the different use cases for MVEL templating:

package com.main;

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.mvel2.templates.CompiledTemplate;
import org.mvel2.templates.TemplateCompiler;
import org.mvel2.templates.TemplateRuntime;
import org.mvel2.util.ParseTools;

/**
 * Class is responsible to showcase MVEL template usage for different use case
 */
public class MVELTemplateController {
    public static void main(String args[]) {
        MVELTemplateController controller = new MVELTemplateController();

        // Usecase1: Injecting the dynamic property to the static HTML content.
        // MVEL supports the decision making tags to place the default values in case of the actual property value is null
        // Input map should contain the key name otherwise it will throw the exception
        System.out.println("***** Usecase1: Injecting the dynamic property Started *****");
        String message = "<html>Hello @if{userName!=null && userName!=''}@{userName}@else{}Guest@end{}! Welcome to MVEL tutorial<html>";
        System.out.println("Input Expression:" + message);
        Map<String, Object> inputMap = new HashMap<String, Object>();
        inputMap.put("userName", "Blog Visitor");
        System.out.println("InputMap:" + inputMap);
        String compliedMessage = controller.applyTemplate(message, inputMap);
        System.out.println("compliedMessage:" + compliedMessage);
        System.out.println("***** Usecase1: Injecting the dynamic property Ended  *****\n");

        // Usecase2: Loading the MVEL expression from the configuration or property file
        // MVEL library have build-in utility to load the expression from the file input.
        // Usually the mvel script,template will be save with .mvel extension
        try {
            String templateExpression = new String(ParseTools.loadFromFile(new File("input/declaretemplate.mvel")));
            System.out.println("templateExpression:" + templateExpression + "\n");
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Usecase3: Accessing the java class methods from the MVEL template to return the output
        System.out.println("***** Usecase3: Accessing static method Started *****");
        String methodExpression = "@if{com.main.MVELTemplateController.validateInput(userName)}Valid Input@else{} Invalid Input@end{}";
        String validateExpression = controller.applyTemplate(methodExpression, inputMap);
        System.out.println("validateExpression:" + validateExpression);
        System.out.println("***** Usecase3: Accessing static method Ended *****\n");

        // Usecase4: Forming dynamic query by binding the dynamic values
        // We can build complex queries by using the decision making tags@if,@else and for loop tags @for
        // We can bind the values from the bean to expression
        System.out.println("***** Usecase4: Forming dynamic Query Started *****");
        String queryExpression = "select * from @{schemaName}.@{tableName} where @{condition}";
        Map<String, Object> queryInput = new HashMap<String, Object>();
        queryInput.put("schemaName", "testDB");
        queryInput.put("tableName", "employee");
        queryInput.put("condition", "age > 25 && age < 30");
        String query = controller.applyTemplate(queryExpression, queryInput);
        System.out.println("Dynamic Query:" + query);
        System.out.println("***** Usecase4: Forming dynamic Query Ended*****\n");

        // Usecase5: Forming dynamic API calls
        System.out.println("***** Usecase5: Forming dynamic API calls Started *****");
        String weatherAPI = "http://api.openweathermap.org/data/2.5/weather?lat=@{latitude}&lon=@{longitude}";
        Map<String, Object> apiInput = new HashMap<String, Object>();
        apiInput.put("latitude", "35");
        apiInput.put("longitude", "139");
        String weatherAPICall = controller.applyTemplate(weatherAPI, apiInput);
        System.out.println("weatherAPICall:" + weatherAPICall);
        System.out.println("***** Usecase5: Forming dynamic API calls Ended *****\n");
    }

    /**
     * Method used to bind the values to the MVEL syntax and return the complete expression to understand by any other engine.
     * 
     * @param expression
     * @param parameterMap
     * @return
     *         Jun 19, 2015
     */
    public String applyTemplate(String expression, Map<String, Object> parameterMap) {
        String executeExpression = null;

        if (expression != null && (parameterMap != null && !parameterMap.isEmpty())) {
            // compile the mvel expression
            CompiledTemplate compliedTemplate = TemplateCompiler.compileTemplate(expression);
            // bind the values in the Map input to the expression string.
            executeExpression = (String) TemplateRuntime.execute(compliedTemplate, parameterMap);
        }

        return executeExpression;
    }

    /**
     * Method used to validate the input
     * 
     * @param input
     * @return
     *         Jun 19, 2015
     */
    public static boolean validateInput(String input) {
        boolean isValid = false;

        if (input != null && input.equals("example")) {
            isValid = true;
        }

        return isValid;
    }
}

Output

final_output

Conclusion

  • MVEL expressions are clean and the syntax is easy to understand.
  • There are a number of use cases where MVEL can be used efficiently and effectively.
  • The MVEL engine performs well compared to other EL engines.
  • Apache Drools uses MVEL templates for dynamic code generation, and Apache Camel uses MVEL templates for message generation.

References

MicroServices on the Move

$
0
0

Introduction

Why Microservice?

Microservices are an approach to developing a single application as a set of lightweight, independent collaborating services. The main benefit of using microservices is that, unlike a monolithic architecture style, a change made to a small part of the application does not require the entire structure to be rebuilt and redeployed.

Microservices have many coordinating moving parts; while the tooling is getting better, it remains a highly complex architecture.

Applications grow more complex over time, and that complexity creates challenges in development. There are two essential strategies to manage this problem: a team can keep everything together (create a monolith) or a team can divide a project into smaller pieces (create microservices).

Microservices, on the other hand, describe a strategy for decomposing a large project into smaller, more manageable pieces.

A microservice is a piece of application functionality factored out into its own code base, speaking to other microservices over a standard protocol.

Microservices can be distributed globally for low latency and can even run multiple versions of the same service simultaneously for redundancy.

A monolithic app is one big program with many responsibilities; microservice-based apps are composed of several small programs, each with a single responsibility.

Use Case

Prerequisites

To accomplish this, first divide your business requirements into related groups like Store management logic, Product logic, Company logic and a web user interface. Write a program to provide each service and connect each service to a language-agnostic protocol like HTTP, AMQP, or Redis. Finally, pass messages via the protocol to exchange data with other services.

What we need to do

In this hands-on we create three Maven projects. Each of them represents back-end functionality, i.e. reusable APIs, and one of them holds a composition, that is, it will be a consumer of the others.

To begin, let's create three simple Maven projects: Store-backend, Product-backend, and Company-backend.

In the poms of the three projects, we add the dependencies for creating our REST services and starting up Spring Boot:

<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>1.2.0.RELEASE</version>
</parent>

<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>

<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jersey</artifactId>
</dependency>
</dependencies>

The first class that we create, named MicroservicesApp, will be identical in all three projects because it only works as an initiator for Spring Boot (as declared by the @SpringBootApplication annotation), bringing up a Spring context and the embedded server.

@SpringBootApplication
public class MicroservicesApp {
  public static void main(String[] args) {
   SpringApplication.run(MicroservicesApp.class, args);
  }
}

The next class is MicroservicesAppConfig. This class uses Spring's @Configuration annotation to indicate to the framework that it is a resource configuration class; in it we set up Jersey, our resource manager responsible for exposing REST services to consumers.

Note: In a real application, this class would also create data sources for access to databases and other resources; here we use mocks to represent data access.

We also instantiate a RestTemplate. RestTemplate is a standardized and very simple interface that facilitates the consumption of REST services. This class is one of the newer features of the Spring Framework.

@Configuration
public class MicroservicesAppConfig {

    @Named
    static class JerseyConfig extends ResourceConfig {
        public JerseyConfig() {
            this.packages("com.microservices.poc.rest");
        }
    }

    @Bean
    public RestTemplate restTemplate() {
        RestTemplate restTemplate = new RestTemplate();
        return restTemplate;
    }
}

Next, we create a DTO class and a REST service. The DTO is a simple POJO:

public class Store {

	private long storeId;
	private String storeName;
	private String storeLoc;
	public long getStoreId() {
		return storeId;
	}
	public void setStoreId(long storeId) {
		this.storeId = storeId;
	}
	public String getStoreName() {
		return storeName;
	}
	public void setStoreName(String storeName) {
		this.storeName = storeName;
	}
	public String getStoreLoc() {
		return storeLoc;
	}
	public void setStoreLoc(String storeLoc) {
		this.storeLoc = storeLoc;
	}
}

The REST service, in turn, has only two capabilities: a search for all stores and a store detail query for a given storeId:

@Named
@Path("/")
public class StoreRest {

	private static List<Store> store = new ArrayList<Store>();

	static {

		Store store1 = new Store();
		store1.setStoreId(101);
		store1.setStoreName("Store 1");
		store1.setStoreLoc("Location 1");

		Store store2 = new Store();
		store2.setStoreId(102);
		store2.setStoreName("Store 2");
		store2.setStoreLoc("Location 2");

		Store store3 = new Store();
		store3.setStoreId(103);
		store3.setStoreName("Store 3");
		store3.setStoreLoc("Location 3");

		Store store4 = new Store();
		store4.setStoreId(104);
		store4.setStoreName("Store 4");
		store4.setStoreLoc("Location 4");

		Store store5 = new Store();
		store5.setStoreId(105);
		store5.setStoreName("Store 5");
		store5.setStoreLoc("Location 5");

		store.add(store1);
		store.add(store2);
		store.add(store3);
		store.add(store4);
		store.add(store5);

	}

	@GET
	@Produces(MediaType.APPLICATION_JSON)
	public List<Store> getStores() {
		return store;
	}

	@GET
	@Path("store")
	@Produces(MediaType.APPLICATION_JSON)
	public Store getStore(@QueryParam("storeId") long storeId) {

		Store st = null;

		for (Store s : store) {

			if (s.getStoreId() == storeId)
				st = s;

		}

		return st;
	}

}

This concludes our REST Store service.

For products, we have methods to search all products or fetch a single product by its id. Finally, we have the company service: its submitCompany method takes the data of a product and a store, whose keys are passed as parameters, and returns a company header. The classes that make up our services are the following:

public class Product {

	private long productId;
	private String productName;
	private String productDesc;

	public long getProductId() {
		return productId;
	}

	public void setProductId(long productId) {
		this.productId = productId;
	}

	public String getProductName() {
		return productName;
	}

	public void setProductName(String productName) {
		this.productName = productName;
	}

	public String getProductDesc() {
		return productDesc;
	}

	public void setProductDesc(String productDesc) {
		this.productDesc = productDesc;
	}

}

Finally, the classes that make up the above-mentioned company service in the Company-backend project are:

public class Company {

	private long companyId;
	private String companyName;
	private String companyLoc;
	private Store store;
	private Product product;
	private long stock;
	private Date stockDate;

	public long getStock() {
		return stock;
	}

	public void setStock(long stock) {
		this.stock = stock;
	}

	public Date getStockDate() {
		return stockDate;
	}

	public void setStockDate(Date stockDate) {
		this.stockDate = stockDate;
	}

	public Store getStore() {
		return store;
	}

	public void setStore(Store store) {
		this.store = store;
	}

	public Product getProduct() {
		return product;
	}

	public void setProduct(Product product) {
		this.product = product;
	}

	public long getCompanyId() {
		return companyId;
	}

	public void setCompanyId(long companyId) {
		this.companyId = companyId;
	}

	public String getCompanyName() {
		return companyName;
	}

	public void setCompanyName(String companyName) {
		this.companyName = companyName;
	}

	public String getCompanyLoc() {
		return companyLoc;
	}

	public void setCompanyLoc(String companyLoc) {
		this.companyLoc = companyLoc;
	}

}

@Named
@Path("/")
public class CompanyRest {

	private long id = 1;

	@Inject
	private RestTemplate restTemplate;

	@GET
	@Path("company")
	@Produces(MediaType.APPLICATION_JSON)
	public Company submitCompany(@QueryParam("idStore") long idStore,
			@QueryParam("idProduct") long idProduct,
			@QueryParam("stock") long stock) {

		Company company = new Company();

		if (isPortInUse("localhost", 8081)) {
			Store store = restTemplate.getForObject(
					"http://localhost:8081/store?storeId={id}", Store.class, idStore);
			company.setStore(store);
		}
		if (isPortInUse("localhost", 8082)) {
			Product product = restTemplate.getForObject(
					"http://localhost:8082/product?id={id}", Product.class, idProduct);
			company.setProduct(product);
		}

		company.setCompanyId(id);
		company.setStock(stock);
		company.setStockDate(new Date());

		id++;

		return company;
	}
	private boolean isPortInUse(String host, int port) {
		// Assume no connection is possible.
		boolean result = false;

		try {
			(new Socket(host, port)).close();
			result = true;
		} catch (IOException e) {
			// Could not connect.
			e.printStackTrace();
		}

		return result;
	}
}

“isPortInUse(<host_name>, <port_number>)” is used to check whether the microservice we depend on is reachable before calling it.

Solution

Create

  • Centralized router in-between all services
  • Throw Exceptions into the queues (bunch of exceptions in memcache)
  • Queues in-between services provide buffers that “smooth internal traffic”

The major benefit of microservices is that these services can work independently and together; however, this also poses recurring challenges.

Since each part works independently, there is the risk of latency when each piece is brought together.

We can create a circuit breaker which acts like a discovery service, where one microservice realizes another is “sick” and notifies the main circuit breaker. From that point on, a microservice will be able to check the service discovery to determine if the microservice it is connected to is broken, in order to prevent calls being made to or from the said microservice. Setting a “time out” for about ten minutes is recommended.
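
The circuit-breaker idea is language-agnostic; the following is a minimal sketch in Python (the class and names are illustrative, not part of the project above):

import time

class CircuitBreaker:
    """Stop calling a service for `timeout` seconds after `max_failures` consecutive failures."""

    def __init__(self, max_failures=3, timeout=600):  # ten-minute timeout, as suggested above
        self.max_failures = max_failures
        self.timeout = timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        # While the breaker is open, fail fast instead of hitting the sick service.
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.timeout:
                raise RuntimeError("circuit open: service marked as sick")
            self.opened_at = None  # timeout elapsed, allow a trial call ("half-open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
        self.failures = 0
        return result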

Conclusion

  • Building and scaling the teams
  • It makes your app more scalable with smaller teams, enables modularity, and is more open to generic changes
  • Each service uses a smaller codebase
  • Creates a gatekeeper so that a given database schema is accessed by a single service
  • A key benefit of microservices is that they ease scaling both your engineering team and your application
  • Microservices often experience compatibility issues, as the services are built by different teams on top of different technologies, leading to an unstable environment where people are building in different directions.
  • There are methods to deal with such problems; the five most common challenges of using microservices are:
    • Building Microservices First
    • Information Barriers to Productivity
    • How Do Your Services Find Each Other?
    • How Can Services Communicate with Each Other?
    • Decentralized Data Isn’t Always a Positive
  • Difference between SOA and Microservices:
    • Microservices also lend themselves to agile, scrum and continuous delivery software development processes with shorter life cycles
    • This architecture distinguishes itself from service-oriented architecture (SOA) because each microservice belongs to one application only and not multiple

References

Polygot Persistence with OrientDB

$
0
0

Introduction

The advent of NoSQL databases has led many application developers, designers, and architects to apply the most appropriate means of data storage to each specific aspect of their systems, and this may involve implementing multiple types of database and integrating them into a single solution. The result is a polyglot solution.

Designing and implementing a polyglot system is not a straightforward task and there are a number of questions that need to be addressed including:

  • How can you implement a uniform data access strategy that is independent of the different databases? The business logic of an application should not be dependent on the physical structure of the data that it processes, as this can introduce dependencies between the business logic and the databases. This issue is especially important if your data is spread across different types of database, where a single business method might retrieve data from a variety of data stores.
  • How can you make the best use of different databases to match the business requirements of your applications? This is the driving force behind building polyglot solutions.
    • If your applications need to store and retrieve structured data very quickly you might consider using a document database
    • If you need to perform more complex analyses on the same data then a column-family database might be more appropriate
    • If you need to track and manage complex relationships between business objects, then a graph database could provide the features that you require
  • Some of the challenges to address are:
    • Maintaining consistency across different databases
    • Increased application complexity
    • Increased deployment complexity
    • Training for developers and operational staff

Why OrientDB?

OrientDB is a tool capable of defining, persisting, retrieving, and traversing information. It can act as a:

Document DB

Similar to MongoDB or Couchbase, OrientDB can take an arbitrary document (e.g., a JSON document) and store it. After it has been stored, we can query it using path expressions, as we would expect from any document database.

documentdb_image1

orientdbsql

Graph DB

Similar to the well-known Neo4j and Titan, OrientDB treats handling relationships as the responsibility of the graph database. Graph databases typically implement relationships as first-class citizens called edges that connect vertices. A vertex, in most graph databases, is a simple cluster of name-value pairs. Is it possible to make each document in the document database a vertex? OrientDB has done exactly that: instead of each node being a flat set of properties, it can be a complete document with nested properties.

address

orientdb_sql

Object DB

Another interesting feature that OrientDB offers is an object-oriented implementation on top of the document DB: with OrientDB we are able to define a hierarchy between tables (called "classes") and thus take advantage of inheritance.

objectdb

OrientDB allows us to define classes that the objects (vertices or documents) must conform to, but does not force us to do so. OrientDB can run in strict schema mode (all objects are typed and must conform to the class definitions), in a hybrid mode (all objects must at least conform to the rules of the classes but may add any other properties not specified in the classes), or in schema-less mode.

Use Case

Since OrientDB is both a document and a graph database, it is a natural multi-model NoSQL data store for the use cases below, which would otherwise leave us dealing with two NoSQL databases and all the challenges mentioned in the previous section.

Use Case 1

An application stores detailed information about each employee as a collection of documents in a document database, and maintains the information about the managers they report to as a graph in a graph database. In this way, the data for each employee can be as complicated as the document database will allow, while the graph database only needs to hold the essential details of each employee required to perform the various graph-oriented queries the application needs.

orientdb_usecase_1

Use Case 2

Wanderu provides the ability for consumers to search for bus and train tickets, with journeys combining legs from multiple different transportation companies. The route data is stored in JSON, making a document storage engine a great solution for their route leg data storage. However, they also needed to be able to find the optimal path from origin to destination. This is perfect for a graph database as they can understand the data relationships between different transit route legs.

orientdb_usecase_2

Conclusion

  • The power of a Distributed Graph Database engine with the flexibility of a Document Database all in one product. That is OrientDB.
  • With OrientDB, ACID transactions are distributed across servers.
  • HTTP Rest + JSON API, Multi-Master Replication, user and roles security, embedded & server deployment, data sharding, and SQL support are some of the key capabilities of OrientDB.

References

Data Team for Data Driven Organization

$
0
0

Introduction

A data-driven organization will use the data as critical evidence to help inform and influence strategy. To be data-driven means cultivating a mindset throughout the business to continually use data and analytics to make fact-based business decisions. Becoming a data-driven organization is no longer a choice, but a necessity. Making decisions based on data-driven approaches not only increases the accuracy of results but also provides consistency in how the results are interpreted and fed back into the business.

However, several challenges continue to hinder businesses from becoming data-driven. The average organization today collects more data than ever before, and the variety of data types that are stored, managed, and analyzed has increased exponentially. Added to this, data is also spread across different locations, databases, the cloud, and so on. Finally, talent with different data skills is needed to ingest, transform, aggregate, model, analyze, and create insights.

To become effective data-driven organization, businesses need to perform the following:

  • Data Collection & Processing:

Data undoubtedly is a key ingredient. It can't just be any data; it has to be the right data: the dataset has to be relevant to the question at hand. It also has to be timely, accurate, clean, and unbiased.

  • Data Access & Modeling:

Data must be accessible and queryable. Having accurate, timely, and relevant data is not sufficient for an organization to be data-driven. It must be properly modeled so that it can be joined, queried, and shared. The data must be in a form that can be joined with other enterprise data when necessary, and there must be a data-sharing culture within the organization so that data can be combined, such as joining master data with transaction data; siloed data will always reduce the scope of what can be achieved. Finally, there must be appropriate tools to query and slice/dice the data, since reporting and analysis require filtering, grouping, and aggregating large amounts of raw data into a smaller, high-value set.

  • Reporting:

Reporting is the process of organizing data into informational summaries in order to monitor how different areas of a business are performing. Reporting tells what happened in the past and also provides a baseline from which to observe changes and trends. It is a fundamentally backward-looking view of the world and not, by itself, sufficient for an organization to be data-driven.

  • Analysis:

Analysis transforms data assets into competitive insights that drive business decisions and actions using people, processes, and technologies. Reporting says what happened; analysis says why it happened. Reporting is descriptive, whereas analysis is prescriptive.

Reporting              | Analysis
Descriptive            | Prescriptive
Backward-looking       | Forward-looking
Raise questions        | Answer questions
Data -> Information    | Data + Information -> Insights
Reports, dashboards    | Findings, predictions, recommendations
What is happening      | Why it is happening

(Reporting vs. analysis, courtesy of Dykes, 2010)

DATA TEAM – WHO & WHAT

data_team

Where do they fit

big_data_architecture

Conclusion

  • The human component of a great data-driven organization is a great analytics organization. Who those people are, and how they should be organized, is the key.
  • Analytics is a team sport. A well-oiled, data-driven organization will have a range of analytical personnel with different roles and complementary skills.
  • Skills and qualities of a data team: numerate, detail-oriented and methodical, appropriately skeptical, curious about and passionate about data, good storytellers, and patient.

Crime Analysis with Zeppelin, R & Spark

$
0
0

Introduction

Apache Zeppelin is a web-based notebook that enables interactive data analytics, including data ingestion, data discovery, and data visualization, all in one place. Zeppelin's interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently, Zeppelin supports many interpreters such as Spark (Scala, Python, R, SparkSQL), Hive, JDBC, and others. Zeppelin can be configured with an existing Spark ecosystem and share the SparkContext across Scala, Python, and R.

We have implemented SFO crime analysis with plain R, Shiny & R, and OpenRefine in the past, and this time we use Zeppelin & R. We will also briefly show how Spark with R can be used on the same crime dataset. We suggest revisiting our previous blog posts, as this one re-uses most of that material in a different environment; please take a look at the Reference section for how to get to the previous posts.

Use Case

This use case is based on San Francisco incidents derived from the SFPD (San Francisco Police Department) Crime Incident Reporting system for calendar year 2013. This dataset contains close to 130K records, each with the type of crime, the date and time of the incident, the day of the week, and the latitude and longitude of the incident. We will analyze this dataset and extract some meaningful insights.

What we want to do:

  • Prerequisites
  • Download Crime Incident Dataset
  • Data Extraction & Exploration
  • Data Manipulation
  • Data Visualization
  • Simple Spark & R Exploration

Solution

Prerequisites

  • Install Apache Zeppelin with Spark

Installation steps are a bit involved and out of the scope of this post. However, there are many articles and blogs that show how to set up Zeppelin with built-in Spark modules or integrate it with an existing Spark environment. Below are some installation references.

http://hortonworks.com/blog/introduction-to-data-science-with-apache-spark/

https://github.com/apache/incubator-zeppelin

http://blog.cloudera.com/blog/2015/07/how-to-install-apache-zeppelin-on-cdh/

Download Crime Incident Dataset

  • Download Dataset: This use case is based on San Francisco Incidents derived from SFPD (San Francisco Police Department) Crime Incident Reporting system for calendar year 2013. Download the dataset by clicking on the below link and unzip into your working directory on a machine where Zeppelin is running.

SFPD dataset: SFPD_Incidents.zip

  • Understanding Dataset: This dataset contains the following columns. We will drop some columns, rename a few, and add a few extra columns that will help in our analysis.
IncidntNum: The incident number for this incident. Unfortunately, some of these incident numbers are duplicated.
Category: The category of the crime, e.g. Robbery, Fraud, Theft, Arson, and so on. This use case will combine similar crimes to keep the number of categories manageable.
Descript: The description of the crime. We won't need this, so this column will be dropped.
DayOfWeek: Which day of the week the incident happened.
Date: Date of the incident.
Time: Hour and minute representation.
PdDistrict: Which district the area corresponds to. We won't need this and it will be dropped.
Resolution: What happened to the culprits in the incident.
Address: Street address of the incident.
X: Longitude of the location. This column will be renamed to longitude.
Y: Latitude of the location. This column will be renamed to latitude.
Location: Comma-separated latitude and longitude. We won't need this and it will be dropped.

Basic Setup

%r

library(chron)
library(ggplot2)
library(googleVis)
setwd("/home/blog")

# Read SFPD_Incidents.csv file into a data frame
sfo_crime_data <- read.csv(file = "SFPD_Incidents.csv")

# Save the data frame as RData for future purpose
save(sfo_crime_data, file="sfo_crime_data.rdata")

# Load the RData into memory
load("sfo_crime_data.rdata")
# Find number of rows
nrow(sfo_crime_data)
basic_setup
  • Install Packages: This use case requires library “chron” to be installed to perform some date and time manipulation, and ggplot2 for visualization purpose
    install.packages("chron")
    library(chron)

    install.packages("ggplot2")
    library(ggplot2)

    install.packages("googleVis")
    library(googleVis)

Structure of the Data

%r
str(sfo_crime_data)
structure_of_the_data

Summary of the Data

%r
summary(sfo_crime_data)
summary_of_the_data

Data Manipulation Step 1 – Order & Remove Duplicates

%r

# Order the data frame by Incident Number

sfo_crime_data <- sfo_crime_data[order(sfo_crime_data$IncidntNum),]

#It is evident from above that IncidntNum is duplicated multiple times. The before and after row count shows the difference.

nrow(sfo_crime_data)

# So, the duplicates can be removed by performing subset along with duplicated function

sfo_crime_data <- subset(sfo_crime_data, !duplicated(sfo_crime_data$IncidntNum))

nrow(sfo_crime_data)
head(sfo_crime_data)
data_manipulation

Data Manipulation Step 2

%r

#Date and Time field are stored separately that needs to be combined into a new field so that proper date operations can be executed on #them. Use paste function to combine Date and Time fields. The Time field from the raw data had only hour and minute but missing #seconds column. Add a default 00 seconds to all Time field

sfo_crime_data$datetime <- paste(sfo_crime_data$Date, paste(sfo_crime_data$Time,":00",sep=''))

#Create a new column Incident_Date by converting the factorial datetime column into Date & Time data type by using POSIXIt function

sfo_crime_data$Incident_Date <- as.POSIXlt(sfo_crime_data$datetime, format="%m/%d/%Y %H:%M")

#Split time from the Incident_Date into Incident_Time to perform analysis on the data by time of the day

sfo_crime_data$Incident_Time <- times(format(sfo_crime_data$Incident_Date, "%H:%M:%S"))

#Split Date from the Incident_Date field to perform analysis on day of the month

sfo_crime_data$Incident_Date <- as.POSIXlt(strptime(sfo_crime_data$Incident_Date, format="%Y-%m-%d"))

#Drop columns that are not needed for this analysis to keep it clean. The chosen columns are stored in a Vector

drops <- c("Descript", "PdDistrict", "Address", "Location", "datetime", "Date", "Time")

sfo_crime_data <- sfo_crime_data[, !(names(sfo_crime_data) %in% drops)]
head(sfo_crime_data)
data_manipulation_step2

Data Manipulation Step 3

%r

#Change the column names of X and Y to longitude and latitude respectively. These columns are in 5th and 6th position in the data #frame.

colnames(sfo_crime_data)[c(5,6)] <- c("longitude", "latitude")

#The frequency of crimes need not be consistent throughout the day as certain crimes happen more in the night than the rest of the #hour. To check this, we can bucket the timestamps into few categories and then analyze the distribution of the crimes across different #time intervals. Let’s create 4 buckets: Early Morning, Morning, Evening, and Night by grouping certain hours together.

time.tag <- chron(times=c("00:00:00", "06:00:00", "12:00:00", "18:00:00", "23:59:59"))

#Create a new column Incident_Time_Tag and use Chron’s cut function to put different hours into the time breaks

sfo_crime_data$Incident_Time_Tag <- cut(sfo_crime_data$Incident_Time, breaks=time.tag, labels=c("Early Morning","Morning", "Evening", "Night"), include.lowest=TRUE)

head(sfo_crime_data)
data_manipulation_step3

Data Manipulation Step 4 – Find Month of the event

%r

#Identify which month of the year a particular crime incident has happened. We already have DayOfWeek from the raw data about #which day of the week but month of the year is missing

sfo_crime_data$Incident_Month <- months(sfo_crime_data$Incident_Date, abbreviate=TRUE)

#Explore different crime categories that comes with the raw data and identify total number of crime categories

head(sfo_crime_data$Category)
length(unique(sfo_crime_data$Category))
data_manipulation_step4

Data Manipulation Step 5 – Group Similar Crimes

%r

#Create a new column Crime_Category from Category column and factorize it
sfo_crime_data$Crime_Category <- as.character(sfo_crime_data$Category)
sfo_crime_data$Crime_Category <- ifelse(sfo_crime_data$Crime_Category %in% c("SEX OFFENSES, FORCIBLE", "PROSTITUTION", "SEX OFFENSES, NON FORCIBLE", "PORNOGRAPHY/OBSCENE MAT"), 'SEX', sfo_crime_data$Crime_Category)
sfo_crime_data$Crime_Category <- ifelse(sfo_crime_data$Crime_Category %in% c("DRIVING UNDER THE INFLUENCE", "DRUG/NARCOTIC", "DRUNKENNESS", "LIQUOR LAWS"), 'DRUGS', sfo_crime_data$Crime_Category)
sfo_crime_data$Crime_Category <- ifelse(sfo_crime_data$Crime_Category %in% c("FORGERY/COUNTERFEITING", "FRAUD", "BAD CHECKS"), 'FRAUD', sfo_crime_data$Crime_Category)
sfo_crime_data$Crime_Category <- ifelse(sfo_crime_data$Crime_Category %in% c("BURGLARY", "ROBBERY", "STOLEN PROPERTY", "EXTORTION"), 'ROBBERY', sfo_crime_data$Crime_Category)
sfo_crime_data$Crime_Category <- ifelse(sfo_crime_data$Crime_Category %in% c("LARCENY/THEFT", "VEHICLE THEFT", "RECOVERED VEHICLE", "EMBEZZLEMENT", "RECOVERED VEHICLE"), 'THEFT', sfo_crime_data$Crime_Category)
sfo_crime_data$Crime_Category <- ifelse(sfo_crime_data$Crime_Category %in% c("VANDALISM", "ARSON"), 'ARSON', sfo_crime_data$Crime_Category)
sfo_crime_data$Crime_Category <- ifelse(sfo_crime_data$Crime_Category %in% c("MISSING PERSON", "KIDNAPPING"), 'KIDNAPPING', sfo_crime_data$Crime_Category)
sfo_crime_data$Crime_Category <- ifelse(sfo_crime_data$Crime_Category %in% c("BRIBERY", "DISORDERLY CONDUCT", "FAMILY OFFENSES", "GAMBLING", "LOITERING", "RUNAWAY", "OTHER OFFENSES", "SUSPICIOUS OCC"), 'OTHER OFFENSES',sfo_crime_data$Crime_Category)
sfo_crime_data$Crime_Category <- ifelse(sfo_crime_data$Crime_Category %in% c("NON-CRIMINAL", "SUICIDE"), 'NON-CRIMINAL', sfo_crime_data$Crime_Category)
head(sfo_crime_data$Crime_Category)
length(unique(sfo_crime_data$Crime_Category))
data_manipulation_step5

Data Visualization

  • Crimes by each Category:
    %r
    #Aggregate crime category to find the count for different crime categories
    crime_count <- aggregate(sfo_crime_data$Crime_Category, by = list(sfo_crime_data$Crime_Category), FUN = length)
    names(crime_count) <- c("Crime_Category", "Count")
    
    crime_count_plot <- gvisColumnChart(data = crime_count, xvar="Crime_Category", yvar="Count", options=list(title="# of Crimes", chartArea="{left:50,top:50,width: '75%', height: '75%'}",  hAxis="{textPosition: 'out'}", vAxis= "{textPosition: 'out'}", width=800, height=320))
    crime_count_plot

data_visualization

  • Crimes by Time of the Day:
    %r
    #Aggregate crime category by different time in a day
    crime_count_time_tag <- aggregate(sfo_crime_data$Crime_Category, by = list(sfo_crime_data$Incident_Time_Tag), FUN = length)
    names(crime_count_time_tag) <- c("Incident Time", "Count")
    
    crime_count_time_tag_plot <- gvisColumnChart(data = crime_count_time_tag, xvar="Incident Time", yvar=("Count"), options=list(title="# of Crimes By Time", chartArea="{left:50,top:50,width: '75%', height: '75%'}",  hAxis="{textPosition: 'out'}", vAxis= "{textPosition: 'out'}", width=600, height=300))
    
    crime_count_time_tag_plot

crimes_by_time_of_the_day

  • Crimes by Day of the Week:
    %r
    #Aggregate crime category by different days in a week
    crime_count_dayofweek <- aggregate(sfo_crime_data$Crime_Category, by = list(sfo_crime_data$DayOfWeek), FUN = length)
    names(crime_count_dayofweek) <- c("Day of Week", "Count")
    
    crime_count_dayofweek_plot <- gvisColumnChart(data = crime_count_dayofweek, xvar="Day of Week", yvar=("Count"), options=list(title="# of Crimes By Day of Week", chartArea="{left:50,top:50,width: '75%', height: '75%'}",  hAxis="{textPosition: 'out'}", vAxis= "{textPosition: 'out'}", width=800, height=300))
    
    crime_count_dayofweek_plot

crimes_by_day_of_the_week

  • Crimes by Month of the Year:
    %r
    #Aggregate crime category by different months in a year
    crime_count_monthofyear <- aggregate(sfo_crime_data$Crime_Category, by = list(sfo_crime_data$Incident_Month), FUN = length)
    names(crime_count_monthofyear) <- c("Incident Month", "Count")
    crime_count_monthofyear_plot <- gvisColumnChart(data = crime_count_monthofyear, xvar="Incident Month", yvar=("Count"), options=list(title="# of Crimes By Day of Week", chartArea="{left:50,top:50,width: '75%', height: '75%'}",  hAxis="{textPosition: 'out'}", vAxis= "{textPosition: 'out'}", width=800, height=300))
    
    crime_count_monthofyear_plot

crimes_by_month_of_the_year

Simple Spark & R Exploration

Please note that we use a different interpreter (%spark.r) instead of (%r) to tell Zeppelin to execute these as Spark jobs instead of ordinary R programs.

  • Spark Dataframe Setup:

Read the CSV file as with a regular R CSV read and create a Spark DataFrame to explore. Caching the Spark DataFrame helps further analyses run faster, since Spark's lazy evaluation then doesn't have to recompute the whole lineage for every action.

%spark.r

# Read SFPD_Incidents.csv file into a data frame
sfo_crime_data1 <- read.csv(file = "/home/blog/SFPD_Incidents.csv")

df <- createDataFrame(sqlContext, sfo_crime_data1)
cache(df)
registerTempTable(df, "SFODF")
spark_dataframe_steup
  • SparkSQL – Group by Day of Week:

Please note that we use a SparkSQL interpreter (%sql) to work on the temp table (SFODF) that was registered in the previous section

%sql

select DayOfWeek, count(*) from SFODF group by DayOfWeek
sparkSQL_group_by_day_of_week
  • SparkSQL – Group by Crime Category:
    %sql
    
    select Category, count(*) from SFODF group by Category

sparkSQL_group_by_crime_category

  • SparkSQL – Group by Parameter:

Zeppelin supports form parameters so that users can check the reports by passing different values. In this example, we have provided two group-by categories (District and Resolution); choosing one from the dropdown changes the pie chart.

%sql

select ${GroupBy=Resolution,Resolution|PdDistrict} , count(*) from SFODF group by ${GroupBy=Resolution,Resolution|PdDistrict}
By Resolution:

sparkSQL_group_by_parameter

By District:

sparkSQL_group_by_parameter_district

 

Zeppelin Dashboard

zeppelin_dashboard_v1

Conclusion

  • Zeppelin provides built-in Apache Spark Integration with Scala and PySpark. With a bit of custom integration, SparkR is also possible.
  • Automatic SparkContext and SQLContext injection, and sharing of the SparkContext across Scala, Python, and SparkR, come in very handy
  • Zeppelin can dynamically create input forms in your notebook, as shown in SparkSQL – Group by Parameter

References

User Analytics using Metabase and MongoDB


Introduction

Metabase, an open source, easy-to-use database visualization tool, is built and maintained by a dedicated Metabase team and comes with a Crate driver. It is written in Clojure and is offered in multiple forms, such as a Mac application, a Docker image, cloud images, and a plain jar file, each suited to particular use cases.

Metabase is mainly used for analyzing your existing data on a daily basis by quickly fetching answers to common queries without dealing with complex workflows. It supports around seven chart types, which can plot data from both NoSQL and SQL databases (such as MongoDB and MySQL) using common, basic queries.

This blog deals with data visualization using Metabase and MongoDB to analyze users' daily access to an application.

Use Cases

Let us consider a use case to analyze the user data using Metabase in connection with MongoDB.

What we need to do

  • Pre-requisites
  • Data Loading into MongoDB
  • Setup MongoDB with Metabase
  • User Account Creation and Access Restriction
  • Dashboard Creation with Different Query Layers

Solution

Pre-requisites

  • JDK 1.6 +
  • Install MongoDB and Metabase on the same node (both must be running)

Install MongoDB and Metabase

A few installation references:

https://docs.mongodb.com/manual/tutorial
http://www.metabase.com/start/jar.html

Data Loading into MongoDB

  • Connect to the MongoDB shell and switch to (or create) a database using the below command:
use user_info

(Note: A new database will be automatically created if not already present)

  • Create a new collection using the below command:
db.createCollection("user_info")

(Note: A new collection will be automatically created if not already present)

  • Import the sample JSON into your MongoDB using the below command:
mongoimport --db user_info --collection user_info --file user_info.json

 Sample Document and its Fields:

{
	"_id": {
		"$oid": "57b2d737d29aba239c4513b1"
	},
	"action": "baby\u0026kids",
	"brand": "Vans",
	"date": {
		"$date": "2016-07-01T15:05:00.000Z"
	},
	"ip": "192.168.0.138",
	"item": "kids footwear",
	"user_agent": "Browser name:Chrome, Version 48.0.2564.116,OS :Windows",
	"useraccount": "isabella@yahoo.in",
	"username": "Isabella"
}
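
To quickly verify the import and preview the kind of grouping Metabase will perform later, a few mongo shell queries can be run against the collection (a minimal sketch; run them after switching to the database with "use user_info", and note the field names follow the sample document above):

// Count the imported documents and inspect one of them
db.user_info.count()
db.user_info.findOne()

// Page views grouped by action, highest first (roughly the kind of
// grouping a Metabase "group by" question computes)
db.user_info.aggregate([
    { $group: { _id: "$action", count: { $sum: 1 } } },
    { $sort: { count: -1 } }
])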

Field – Description

  • id – MongoDB HashID
  • username – Specific name of the user
  • ip – IP address of the machine accessed by the user
  • item – Type of product
  • date – User accessed date and time (it should be in ISODate format, not a string)
  • user_agent – Browser name and version, OS
  • action – Provides information about the page accessed by the user
  • brand – Legal marketing name of the company

Download Sample document: user_info

Setup MongoDB with Metabase

To set up Metabase with MongoDB, perform the following steps:

  • Run the downloaded Metabase jar using the below command:
java -jar metabase.jar
  •  Create your own account by providing details such as First Name, Last Name, Email Address, Password, and Company or Team Name using the below link:

http://localhost:3000/setup/

setup

  • Provide your DB details such as DB Name, Host, Port, DB User Name, Password, and so on.

add_data

User Account Creation and Access Restriction

You can create multiple new users by choosing the People tab and assigning a role to each of them to enable access restriction.

admin_page

You can add multiple databases by choosing the Databases tab.

Data Preprocessing

To analyze your data with Metabase, you frame it as questions. Metabase will provide answers to your questions almost instantly, and it allows you to create custom filters and sorting options to refine or rearrange the resulting answers.

Dashboard Creation with Different Query Layers

You can create your own dashboard by combining your questions and answers, and you can even layer them on top of each other for better comparison.
You can save your frequently used questions in the dashboard and review them at a later point in time.

dashboard

  • Click the “New Question” tab.
  • Connect to the database and table/collection.
  • Add a where condition in the filter-by column by choosing your field and value.

Here are a few sample queries:

Total Record Count

  • Choose the view type as “Count of rows” and execute the query.

total_record_count

Total Page Count by Action 

  • Choose the field in the group-by section as “Action”.

action

Total Page Count by Brand

  • Choose the field in group by as “Brand”.

count_by_brand

User Access Count

  • Choose the field in group by as “user account”.
  • Filter the column/field by not empty.

user_access_count

Total Page Count by Item and Brand

  • Apply the not-empty filter to both columns and add both of them in the group-by column.

pagecount_by_item_brand

Women – Count of User Based on Brand and User Account

  • Apply filter for action column as “women” and group by “user account and brand”.

brand_and_user_account_count

Electronics – Count of User Based on Item

  • Choose action as “electronics” and group by column as “item and user account”.

count_of_user_based_on_item

User Account Based on Brand Sort by Access Count

  • Apply “brand and user account” in the group-by column, and choose “sort” based on “count” in descending order. A roughly equivalent raw query is sketched after the screenshot below.

sortby_access_count
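
For reference, a roughly equivalent raw MongoDB aggregation for this question (a hand-written approximation, not the exact query Metabase generates) would be:

// Group by brand and user account, then sort by access count descending
db.user_info.aggregate([
    { $group: { _id: { brand: "$brand", useraccount: "$useraccount" },
                count: { $sum: 1 } } },
    { $sort: { count: -1 } }
])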

Home Furniture in the Past 7 Days Based on User Account

  • Choose the filter as past 7 days on the “date time” field and the action as “home & furniture”, with group by as “user account”.

user_account

  • Save each query by clicking the “Save” button.
  • Add the graphs to the dashboard.

The dashboard would look similar to the one as shown below:

user_analysis

Note: The Metabase time filter works only on MongoDB versions greater than 3.2.

Limitations

  • Selection of tables/collections in Metabase is limited to two for each database connection.
  • The JVM heap size needs to be increased if the table is very large, so that Metabase can process it.

Conclusion

  • Connect with different types of databases in a comparatively simple manner, without requiring deep technical knowledge.
  • Metabase allows you to easily analyze billions of records in databases and analyze user access on a daily basis.
  • Store billions of users’ data in a more reliable and productive manner in MongoDB.
  • Handle huge request volumes, such as 50,000 requests, with MongoDB and Metabase.



IBM Bluemix with Node-RED IoT


Overview

Bluemix, the latest cloud offering from IBM, is offered as Platform as a Service (PaaS) and is based on Cloud Foundry open technology. It is one of the best programming environments for Internet of Things (IoT) applications using Node-RED, and the many built-in nodes in IBM Bluemix make the programming much simpler.

Bluemix helps companies drive pervasive transformation and provides enterprise-level services that integrate easily with your cloud applications. It enables organizations and developers to quickly and easily create, deploy, and manage applications on the cloud.

In this blog, let us discuss creating simple IoT applications on Bluemix with Node-RED.

Pre-requisites

Use Case

IoT works with the cloud and connects everything around you to the Internet through networks of data-gathering sensors. You can use a temperature sensor to collect data and send it to a centralized system, in this case an IoT application running on Bluemix. You can also use a web simulator as a temperature sensor.

In this use case, the web simulator is used as the temperature sensor to test the IoT application on Bluemix.

Creating IoT Application on Bluemix

To create an IoT application on Bluemix, perform the following:

select

  • On the top right corner of the page, click Catalog to navigate to catalog page.
  • Search “Internet of Things”.
  • Select Boilerplates category and choose Internet of Things Platform Starter as shown in the below diagram:

select

You will be redirected to the Internet of Things Platform Starter page.

  • In the Internet of Things Platform Starter page, provide a valid IoT application name.
  • Select the plan and SDK platform as shown in the below diagram:
    For example, the app name is given as “Treselle-IoT” and Node.js is selected as the SDK platform.

select

  • Click Create button.

Note: You need to wait a few seconds for the application to start before accessing it. On successfully starting the application, its status will be shown in green as in the below diagram:

select

Creating Bluemix IoT Application with Node-RED

To create Bluemix IoT Application with Node-RED, perform the following steps:

  • Once your application starts running, click “Visit App URL” at the top of the page (or click the Route link if you are on the dashboard) to start the Node-RED flow process.
  • Click Next to move on to the next screen.
  • Apply secure Node-RED authentication as shown in the below screen (2) diagram:

select

  • Click Finish.
    You will be redirected to the Node-RED on IBM Bluemix getting started page as shown in the below screen (1) diagram:

select

  • Click “Go to your Node-RED flow editor” in screen (1) to go to the login page (shown in screen (2)) for accessing the Node-RED editor.
  • Provide a valid user name and password to log in to Node-RED.
    You will see the Node-RED flow editor page with a sample IBM IoT application flow model as shown in the below diagram:

select

Note: Click all the nodes, check their preset details, and understand the usage of the nodes and their flows.

Launching Simulator on Bluemix

To launch a simulator on Bluemix, perform the following steps:

select

The Bluemix simulator provides a unique simulator ID on every browser hit. The simulated sensor generates temperature and humidity data and communicates with the Node-RED flow, which produces the output data and dashboard charts.

  • Copy your simulator unique ID and paste it into Node-RED flow ibmiot node as shown in the below diagram:

select

  • Click the simulator buttons, just as you would adjust real temperature hardware, to increase or decrease the temperature and run the Node-RED flow.
    You can view the debug area and the output data results based on the simulator sensor value changes as shown in the below diagram:

select

Note: To create charts based on your simulator sensor data, drag the chart node from the node palette and drop it into the workspace.

Conclusion

In this blog, we discussed IoT application creation with IBM Bluemix and demonstrated the Node-RED IBM IoT app flow with a simulator. In upcoming blog posts, we will discuss some complex use cases with real-time data stream output. To visualize data using Node-RED, refer to our blog on Visualizing Real Time Stream Data using Node-RED.


Visualizing Real Time Stream Data using Node-RED


Overview

Real-time analytics and visualization of stream data have become more vital to transform live data into actionable and valuable insights and to help users focus on their real data needs. Many modern technologies have emerged to manage volume and velocity of big data.

The Node-RED dashboard provides real-time stream data visualization and data streaming analytics. In this blog, let us discuss visualizing and analyzing random car speed data generated by Node-RED function programming, and use it to monitor the speed of the leading car via the Node-RED dashboard analytics.

Let us learn about generating random data for multiple objects, using multiple charts, sharing the results via social media such as Twitter, and providing notifications for speed ranges based on the generated data.

Pre-requisites

Download the following from the provided links:

Getting Started with Node-RED

To start Node-RED, perform the following:

  • Open your command prompt.
  • Start Node-RED by using the below command:
node-red
select

Note: For more details about installing Node-RED, refer to our previous blog – Node-RED on Windows.

  • Browse your Node-RED local host IP with port number to view Node-RED workspace panel as shown in the below diagram:

select

Creating Node-RED Flow

To create a Node-RED flow, perform the following steps:

  • From the left panel, drag the required nodes from the node palette and drop them into the workspace.
  • Connect all the nodes one by one as shown in the below diagram:

select

In this use case, the following nodes are used:

    • Inject node (time stamp) – To set time intervals
    • Function node – To pass each message
    • Debug node (Message payload) – To display payload of the message or entire message
    • Chart node – To create different types of charts
    • Notification node – To notify the leading car’s speed
    • Twitter node – To tweet the speed of a particular car on Twitter

Note: Drag and drop multiple chart nodes as this use case involves multiple charts.

  • Open inject node (timestamp) to set time intervals as shown in the below diagram:
    For example: If the time interval is set as 5 seconds, new data required for visualization and other processes will be uploaded every 5 seconds.

select

  • Open the main function node to create the simple random data generator code as shown in the below diagram (a possible shape of this function body is sketched after the image):

select
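
Since the screenshot only hints at the function body, here is one possible shape of the random speed generator inside the function node (a hypothetical JavaScript sketch; the car names and speed range are made up for illustration, and msg.topic/msg.payload are the fields the dashboard chart node typically expects):

// Node-RED function node: emit one random speed reading per car
var cars = ["Car A", "Car B", "Car C"];
var readings = cars.map(function (name) {
    return {
        topic: name,                                   // chart series name
        payload: Math.round(60 + Math.random() * 60)   // speed between 60 and 120
    };
});
// Wrapping the array in another array sends all readings out of output 1
return [readings];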

  • Open the chart nodes and create different chart types for different nodes as shown in the below diagram:

select

  • Click the notification node and set the time interval to notify the current leading car and its speed range as shown below:

select

Deploying Flow and Debugging

On successfully creating the flow, deploy it by clicking Deploy in the top right corner.

On successfully deploying the flow, debug the flow as shown in the below diagram:

select

Viewing Results in Twitter

On successfully completing the deploy and debug steps, the speed range of that particular car will be tweeted on Twitter as shown in the below diagram:

select

Similarly, the results can be shared via email and other social networks by using the respective nodes.

Performing Data Stream Analytics

The speed ranges of the cars can be monitored in Node-RED dashboard. Multiple charts and multiple values can be displayed in the dashboard as shown in the below diagram for performing data analytics:

select

Conclusion

In this blog, we have discussed random data generator function programming, multiple chart creation, dashboard analytics, notifications, and posting results on Twitter. In our next blog, let us discuss some complex use cases with real-time data stream analytics. To create IoT applications on Bluemix with Node-RED, refer to our blog on IBM Bluemix with Node-RED IoT.


Thingsboard Installation on Windows


Overview

Thingsboard, a leading open-source Internet of Things (IoT) platform, is used for managing devices, collecting data, and processing and visualizing your IoT projects. It aids device connectivity through industry standard IoT protocols such as MQTT, CoAP, and HTTP. It supports both cloud and on-premises deployment.

Thingsboard provides a live demo instance for quick testing, and it can also be installed locally for long-term projects. In this blog, let us discuss the step-by-step procedure for installing Thingsboard on Windows.

Prerequisites

Download and install the following:

Hardware

  • Windows Machine with OS version 7 and above
  • Windows 32-bit and 64-bit (both are compatible) machines
  • RAM 4 GB

Installing Java 8

Install Java 8 on Windows machine. If already installed, check the version of Java by using the below command:

java -version

select

Installing Apache Cassandra

As the Thingsboard service needs Apache Cassandra, you need to install it. In this section, let us discuss installing Cassandra. If Cassandra is already installed, skip this step.

Installing DataStax

To install DataStax, perform the following steps:

select

  • Provide the path of the destination folder in the Destination Folder page as shown in the below diagram (3) and click Next.
  • Check the option in Service Configuration page as shown in the below diagram (4):

select

You will be redirected to the installation setup completed page.

  • Click Finish to close the installation window.

select

  • On successfully installing the DataStax Community Edition, open the installation folder on your local machine and check the Apache Cassandra related files, CQL Shell, and so on.

select

Installing Thingsboard Service

In this section, let us discuss installing and configuring the Thingsboard service.

To install Thingsboard service, perform the following steps:

select

  • Open thingsboard folder.
    You can get the files as shown in the below diagram:

select

  • Open an administrator command prompt on your Windows PC by searching for cmd and pressing “Ctrl + Shift + Enter”.
  • Enter the below command to install the Windows batch file:
install.bat
select

Now, Thingsboard has been successfully installed on your Windows machine.

Provisioning Database Schema

To provision database schema, perform the following steps:

  • Open the Apache Cassandra installation folder on your local machine.
  • Execute the 3 cqlsh scripts shown in the below diagram using “Cassandra CQL Shell”:

select

  • Execute the below list of queries one by one:
    • cqlsh> source 'c:\thingsboard\data\schema.cql';
    • cqlsh> source 'c:\thingsboard\data\system-data.cql';
    • cqlsh> source 'c:\thingsboard\data\demo-data.cql';

Starting Thingsboard Service

To start Thingsboard service, perform the following steps:

  • Click Windows Start button.
  • Search for cmd and press “Ctrl + Shift + Enter” to open an administrator command prompt.
  • Use the below command to start the Thingsboard service:
net start thingsboard
select

Note: To stop and restart the Thingsboard services, use the below commands:

Stop – net stop thingsboard
 Start – net start thingsboard

  • Browse to the default local address (http://localhost:8080/) in your browser.
    You will be redirected to the Thingsboard sign-in page on your local Windows machine as shown in the below diagram:

select

  • Sign in to start accessing Thingsboard.

Conclusion

In this blog, we discussed installing Thingsboard, the leading open-source IoT platform, on a local Windows machine. In upcoming blogs, let us discuss configuring external access to the Thingsboard web UI and connecting devices over its communication protocols with some complex use cases. To know about installing the Thingsboard IoT gateway on Windows, refer to our blog on Thingsboard Gateway IoT.

References

Getting Started: https://thingsboard.io/docs/getting-started-guides/helloworld/

Thingsboard Gateway IoT


Overview

Thingsboard, a leading open-source IoT platform, enables rapid development, management, and scaling of IoT projects. It is a highly scalable, fault-tolerant, robust, efficient, customizable, and durable IoT platform. Thingsboard provides a pre-configured live demo instance for quick testing. The Thingsboard IoT gateway integrates devices connected to third-party and legacy systems. In this blog, let us discuss installing and setting up the Thingsboard IoT gateway on Windows and testing it with the Thingsboard live demo instance.

Pre-requisites

Note: To know more about installing Thingsboard locally, refer to our previous blog on Thingsboard Installation on Windows.

Use Case

Installing Java 8

Install Java 8 on Windows machine. If already installed, check the version of Java by using the below command:

java -version

select

Installing TB Gateway

To install TB Gateway, perform the following:

  • Extract the downloaded tb-gateway archive. You can view the list of files in the tb-gateway folder as shown in the below diagram:

select

  • Click Windows Start button.
  • Search cmd and press “Ctrl + Shift + Enter” keys to open Administrator command prompt.
  • Go to the tb-gateway directory and enter the below command to install TB Gateway:
C:\tb-gateway>install.bat

select

Provisioning IoT Gateway

In this section, let us discuss provisioning the gateway using the live demo instance. To connect the IoT gateway to the Thingsboard server, gateway credentials are required; for simplicity, let us use access token credentials. To provision the gateway, perform the following:

  • Create a tenant administrator account.
  • Log in to Thingsboard.
  • On the left pane, click Device option as shown in the below diagram:

select

  • Name the device.
  • Enable “Is gateway” option as shown in the below diagram:

select

  • On successfully creating the device, open the device card.
  • Click the “COPY ACCESS TOKEN” button to copy your access token, as it is needed during the configuration process.

select
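
As an optional sanity check that the copied token works (this is not part of the gateway setup itself), a single device-style telemetry message can be published straight to the live demo instance over MQTT; it should then appear under the device's latest telemetry. The sketch below assumes the standard Thingsboard device MQTT API (topic v1/devices/me/telemetry, access token as the MQTT user name) and the mqtt npm package:

// Node.js sketch: publish one telemetry value to demo.thingsboard.io
// Requires: npm install mqtt
var mqtt = require("mqtt");

var client = mqtt.connect("mqtt://demo.thingsboard.io", {
    username: "YOUR_ACCESS_TOKEN"   // token copied from the device card
});

client.on("connect", function () {
    client.publish("v1/devices/me/telemetry",
        JSON.stringify({ temperature: 23.5 }),
        function () { client.end(); }  // disconnect after publishing
    );
});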

Configuring IoT Gateway

To configure IoT gateway, perform the following steps:

  • Open tb-gateway configuration folder as shown in the below diagram:

select

  • Open tb-gateway.yml file.
  • Change the gateway.connection.host property to the Thingsboard host (use “demo.thingsboard.io”, as the live demo instance is used here).
  • Change the gateway.security.accessToken property to the newly created device token that was copied during IoT gateway provisioning. The tb-gateway.yml configuration setup looks similar to the below diagram:

select

  • On completing edit, save the file and close it.

Launching IoT Gateway

To launch IoT Gateway, perform the following:

  • Open the administrator command prompt.
  • Execute the below command to start the tb-gateway process:
net start tb-gateway
select

Note: To stop or restart the tb-gateway process, use the below commands:
Stop – net stop tb-gateway
Start – net start tb-gateway

IoT Gateway Statistics

To view the IoT gateway statistics, perform the following:

  • On successfully starting the tb-gateway, open your device.
  • Click “LATEST TELEMETRY“.
  • Check “devicesOnline, attributesUploaded, and telemetryUploaded” Keys.
  • Ensure that the values of all the keys are 0, as shown in the below diagram, to confirm that the IoT gateway has successfully connected to the Thingsboard server.

select

Conclusion

In this blog, we discussed installing and configuring the Thingsboard IoT gateway with the relevant commands. In our next blog, let us discuss connecting an IoT gateway extension to an external communication protocol broker, mapping JSON with custom protocol message mappers, and so on.

References

Getting Started Thingsboard: https://thingsboard.io/docs/getting-started-guides/helloworld

Importing and Analyzing Data in Datameer


Overview

Datameer, an end-to-end big data analytics platform, is built on Apache Hadoop to perform integration, analysis, and visualization of massive volumes of both structured and unstructured data. It can be rapidly integrated with any data source, new or existing, to deliver an easy-to-use, cost-effective, and sophisticated solution for big data analytics.

It simplifies data extraction, data transformation, data loading, and real-time data retrieval, and helps to gain actionable insights from complex organizational data through data preparation and analytics. In this blog, let us discuss importing, analyzing, and visualizing a large volume of financial or banking data in Datameer.

Pre-requisite

Download and install Datameer 6.1.14 from the below link:
https://www.datameer.com/direct/

Use Case

A financial data file (CSV, Excel, and so on) is imported into Datameer before starting the data analysis. A workbook is created to work with the data, and a database connection is established to link the data with a database.

Importing Data into Datameer

In this section, let us discuss importing the data into Datameer.

Uploading Files

To upload a file, perform the following steps:

  • Open Datameer.
  • In the left panel, click FileUploads –> Create new –> File upload to upload a file into Datameer as shown in the below diagram:

select

  • Click Browse and upload the required file.
  • Choose File Type and click Next.
  • Enter Data Details and Define Fields in the subsequent tabs.
  • Configure the file and Save it.

Adding Data to Workbook

To add data into a workbook, right-click on the uploaded file and choose Add Data To New Workbook. The data will be added to the workbook as shown in the below diagram: select

Establishing Database Connection

You can create a connection with any type of database, such as DB2, MySQL, or Oracle. To establish a database connection, add the appropriate database drivers to the Datameer installation.

Adding Database Connection

To add a database connection, perform the following steps:

  • In the left panel, click Connections –> Connection as shown in the below diagram:

select You will be redirected to the New Connection page.

  • Choose the required Type of database.
  • Provide Connection Details and Save it. The newly added connection will be displayed under the Connections menu as shown in the below diagram:

select

Adding Jar File

To add a jar file, perform the following steps:

  • Click View –> Admin Tab.
  • In the left panel, click Database Drivers –> New as shown in the below diagram:

select

  • Provide database driver details to add a new database driver.
  • Click Save to save the details. The new database driver will be added and will be listed in the Database Drivers tab as shown in the below diagram:

select

Fetching Data from Database

To fetch data from the database, perform the following steps:

  • In the left panel, click FileUploads –> Create New –> Import Job as shown in the below diagram:

select You will be redirected to the New Import Job tab.

  • Choose the Connection by clicking Select Connection.
  • Select the required connection and click Next.
  • Provide Data Details and click Next.
  • Select the required Data Fields as shown in the below diagram:

select

  • Click Next.
  • Provide Schedule details to schedule the data import and click Next.
  • Provide the required location to Save the data as shown in the below diagram:

select The file will be saved in the destination folder as shown in the below diagram: select

Analyzing Data in Datameer

In this section, let us discuss analyzing the data in Datameer.

Data Description

Yearly loan data of a financial institution is used as a data source for analysis. The dataset is as follows: select

Setting up Data for Analysis

To set up the data for analysis, Datameer has provided the following four capabilities:

  • Formulas
  • Filtering
  • Joining
  • Sorting

Using the above capabilities, you can locate the numbers, trends, or other information needed for analysis. In this section, let us discuss the formulas and joining capabilities in Datameer. To set up the data for analysis using formulas, perform the following steps:

  • Log in to Datameer using your login credentials.
  • In the left pane, click Connection –> Workbook.
  • Open the required workbook. A popup window with Formula Builder tab will be opened as shown below:

select

Setting up Data using Formulas

Formulas – Grouping Records with GROUPBY

This function is used to create groups of records based on the column selected. In the left pane of Formula Builder, select Grouping and choose GROUPBY in the relevant right pane to group the records in a column as shown below: select The grouped records will be displayed as shown in the below diagram: select

Formulas – Counting Records with GROUPCOUNT

This function is used to count the records in a group. In the left pane of Formula Builder, select Grouping and choose GROUPCOUNT in the relevant right pane to count the records in a group as shown below: select

Formulas – Comparing Records with COMPARISON

This function is used to compare records in two different columns. In the left pane of Formula Builder, select Comparison and choose COMPARE in the relevant right pane to compare the records in the two selected columns as shown below: select A few comparison data types are as follows: select

Setting up Data using Data Joins

To join data from two columns, perform the following steps:

  • Open the saved workbook.
  • Click Join to start joining data from two different sheets as shown below:

select

  • Click join type to join data as shown below:

select

Visualizing Data

After setting up the data, visualizations can be easily created in the form of graphs and charts for performing analysis. To visualize the data, click the Add Tab icon and choose Infographic as shown below: select

Conclusion

In this blog, we discussed importing data into Datameer, setting up data for analysis, and visualizing data in Datameer.

