Flamingo

Sales Data Analysis using Dataiku DSS

June 15, 2017, 5:01 am

Overview Dataiku Data Science Studio (DSS), a complete data science software platform, is used to explore, prototype, build, and deliver data products. It significantly reduces the time taken by data...

View Article

Apache Spark on YARN – Performance and Bottlenecks

June 20, 2017, 12:19 am

Overview Apache Spark 2.x ships with the second-generation Tungsten engine. This engine is built upon ideas from modern compilers to emit optimized code at runtime that collapses the entire query...

View Article

Apache Spark on YARN – Resource Planning

June 20, 2017, 12:22 am

Overview This is the second article of a four-part series about Apache Spark on YARN. As Apache Spark is an in-memory distributed data processing engine, the application performance is heavily...

View Article

Apache Spark Performance Tuning – Degree of Parallelism

June 20, 2017, 12:23 am

Overview This is the third article of a four-part series about Apache Spark on YARN. Apache Spark allows developers to run multiple tasks in parallel across machines in a cluster or across multiple...

View Article

Apache Spark Performance Tuning – Straggler Tasks

June 20, 2017, 12:24 am

Overview This is the last article of a four-part series about Apache Spark on YARN. Apache Spark distinguishes two types of “transformation” operations: “narrow” and “wide”. This...

View Article

Protractor with Cucumber

June 27, 2017, 4:44 am

Overview Protractor, an end-to-end testing framework, supports Jasmine and is specifically built for AngularJS applications. It is highly flexible with different Behavior-Driven Development (BDD)...

View Article

Distributed Load Testing using Apache JMeter

June 27, 2017, 5:00 am

Overview Distributed load testing is the process of simulating a very high workload from an enormous number of users using multiple systems. As a single system cannot generate a large number of threads (users),...

View Article

Data Normalization and Filtration Using Drools

July 4, 2017, 2:30 am

Overview Drools, a Rule Engine, is used to implement an expert system using a rule-based approach. It is used to convert both structured and unstructured data into transient data by applying business...

View Article

Building a RESTful API Using LoopBack

July 4, 2017, 2:42 am

Overview LoopBack, an open-source Node.js framework that is easy to learn and understand, allows you to create end-to-end REST APIs with less code compared to Express and other frameworks. It allows you to...

View Article

Pivoting and Unpivoting Multiple Columns in MS SQL Server

July 4, 2017, 2:58 am

Overview MS SQL Server, a Relational Database Management System (RDBMS), is used for storing and retrieving data. Data integrity, data consistency, and data anomalies play a primary role when storing...

View Article

Data Flow Pipeline using StreamSets

July 4, 2017, 3:14 am

Overview StreamSets Data Collector, an open-source, lightweight, powerful engine, is used to stream data in real time. It is a continuous big data ingest and enterprise-grade infrastructure used to...

View Article

Database Performance Testing with Apache JMeter

July 25, 2017, 3:22 am

Overview Database performance testing is used to identify performance issues before deploying database applications for end users. Database load testing is used to test the database applications for...

View Article

Visualize IoT data with Kaa and MongoDB Compass

July 25, 2017, 3:56 am

Overview Kaa is a highly flexible, open source middleware platform for Internet of Things (IoT) product development. It provides a scalable, end-to-end IoT framework for large cloud-connected IoT...

View Article

Nginx with GeoIP MaxMind Database to Fetch User Geolocation Data

July 25, 2017, 4:16 am

Overview A user's geolocation data plays a significant role in business marketing. This data is used to promote or market a brand, product, or service in the specific area to which the user...

View Article

Apache NiFi – Data Crawling from HTTPS Websites

July 25, 2017, 4:23 am

Overview Apache NiFi, a very effective, powerful, and scalable dataflow building platform, is used to process and distribute data and to automate data flow between systems. In this blog, let us discuss...

View Article

Airflow to Manage Talend ETL Jobs

July 25, 2017, 4:46 am

Overview Airflow, an open-source platform, is used to orchestrate workflows as Directed Acyclic Graphs (DAGs) of tasks in a programmatic manner. An Airflow scheduler is used to schedule workflows and...

View Article

Nginx with GeoIP2 MaxMind Database to Fetch User Geolocation Data

July 26, 2017, 10:59 pm

Overview This is the second part about fetching user geolocation data using Nginx and the MaxMind database. In our previous blog on Nginx with GeoIP MaxMind Database to Fetch User Geolocation Data, we...

View Article

MySQL to Amazon Aurora – Diverse Ways of Data Migration

July 26, 2017, 11:28 pm

Overview Amazon Aurora, a simple and cost-effective relational database engine, is used to set up, operate, and scale MySQL deployments. It offers the speed and reliability of high-end commercial...

View Article

Drill Data with Apache Drill – Part 2

July 27, 2017, 12:16 am

Overview This is the second part about drilling data with Apache Drill. Apache Drill is an open-source, low-latency SQL-on-Hadoop query engine for large datasets. The latest version of Apache Drill is 1.10...

View Article

Data Analysis Using Apache Hive and Apache Pig

August 11, 2017, 6:01 am

Overview Apache Hive, an open-source data warehouse system, is used with Apache Pig for loading and transforming unstructured, structured, or semi-structured data for data analysis and getting better...

View Article