Flamingo

Apache Drill vs Amazon Athena – A Comparison on Data Partitioning

September 1, 2017, 6:25 am

Overview Big data exploration in almost all fields has led to the development of multiple big data technologies such as Hadoop (Hive, HDFS, Pig, HBase), NoSQL databases (MongoDB), and so on for...

View Article

Amazon Athena & Tableau – Serverless Interactive Query Service and Business...

September 1, 2017, 6:32 am

Overview Amazon Athena is a serverless, pay-per-query service for analyzing data in Amazon Simple Storage Service (S3) using standard SQL. It has a very high query...

View Article

Self Service Analytics using Dremio

September 22, 2017, 4:34 am

Overview Dremio, a self-service data platform, helps data analysts and data scientists to determine, organize, accelerate, and share any data at any time irrespective of volume, velocity, location, or...

View Article

Data Quality Checks with StreamSets using Drift Rules

September 22, 2017, 4:53 am

Overview In the world of big data, data drift has emerged as a critical technical challenge for data scientists and engineers in unleashing the power of data. It delays businesses from gaining...

View Article

Handle Class Imbalance Data with R

September 28, 2017, 4:08 am

Overview Imbalanced data refers to classification problems where one class outnumbers the other by a substantial proportion. Imbalanced classification occurs more frequently in binary classification...

View Article

API Response Tracking with StreamSets, Elasticsearch, and Kibana

October 6, 2017, 4:51 am

Overview RESTful API JSON response data can be used to view various aspects such as pipeline configuration or monitoring information of the StreamSets Data Collector. This API response information can...

View Article

Import and Ingest Data into HDFS using Kafka in StreamSets

October 13, 2017, 4:14 am

Overview StreamSets provides state-of-the-art data ingestion to easily and continuously ingest data from various origins such as relational databases, flat files, AWS, and so on, and write data to various...

View Article

Kylo – Self-Service Data Ingestion, Cleansing, and Validation (No Coding...

October 22, 2017, 11:34 pm

Overview Kylo, a feature-rich data lake platform, is built on Apache Hadoop and Apache Spark. Kylo provides a business-friendly data lake solution and enables self-service data ingestion, data...

View Article

Predict Lending Club Loan Default Using Seahorse and SparkR

October 23, 2017, 12:16 am

Overview Data scientists are using Python and R to solve data problems due to the ready availability of packages for these languages. These languages are often limited as the data is processed on a single machine,...

View Article

Data Quality Metrics using Talend Data Quality Management

October 30, 2017, 12:16 am

Overview Data Quality is the process of examining data in different data sources according to predefined business goals. It helps to improve the quality of the data and collect statistics and...

View Article

Kylo – Automatic Data Profiling and Search-based Data Discovery

October 30, 2017, 12:33 am

Overview Data profiling is the process of assessing data values and deriving statistics or business information about the data. It allows data scientists to validate data quality and business analysts...

View Article

Sensor Data Quality Management using PySpark & Seaborn

November 14, 2017, 5:31 am

Overview Data Quality Management (DQM) is the process of continuously analyzing, defining, monitoring, and improving the quality of data. A few data quality dimensions widely used by data practitioners...

View Article

Predict Bad Loans with H2O Flow AutoML

November 15, 2017, 2:41 am

Overview Machine learning algorithms play a key role in accurately predicting loan data of any bank. The greatest challenge in machine learning is to employ the best models and algorithms to accurately...

View Article

Crime Analysis Using H2O Autoencoders – Part 1

December 8, 2017, 4:51 am

Overview Nowadays, Deep Learning (DL) and Machine Learning (ML) are used to analyze and accurately predict data. Machine Learning models are used to accurately predict crimes. Crime prediction not only...

View Article

Streaming Analytics using Kafka SQL

December 8, 2017, 4:57 am

Overview Kafka SQL, a streaming SQL engine for Apache Kafka by Confluent, is used for real-time data integration, data monitoring, and data anomaly detection. KSQL is used to read, write, and process...

View Article

Crime Analysis Using H2O Autoencoders – Part 2

December 8, 2017, 5:10 am

Overview This is the second part of a two-part series of Crime Analysis using H2O Autoencoders. In our previous blog on Crime Analysis Using H2O Autoencoders – Part 1, we discussed building the...

View Article

Ingest IoT Sensor Data into S3 with Raspberry Pi3 & StreamSets Data Collector...

December 26, 2017, 5:04 am

Overview Due to the increasing amount of data produced outside of source systems, enterprises are facing difficulties in reading, collecting, and ingesting data into a desired, central database...

View Article

Custom Partitioning and Analysis using Kafka SQL Windowing

January 29, 2018, 1:51 am

Overview Apache Kafka produces messages to multiple partitions in a round-robin fashion. A custom partitioning technique is used to produce a particular type of message to a defined partition and to...

View Article

Customer Churn – Logistic Regression with R

February 21, 2021, 12:37 am

1 Overview 2 Learning/Prediction Steps 2.1 Data Description 2.2 Data Preprocessing 2.3 Partitioning the Data & Logistic Regression 2.4 Model Summary 2.5 Prediction Accuracy 3 References Overview...

View Article

Embrace Relationships with Neo4J, R & Java

February 21, 2021, 12:40 am

2 Use Case 3 Solution 3.1 Prerequisites 3.2 Download StackOverflow Dataset 3.3 Data Manipulation with R 3.4 Create Nodes and Relationship file with Java 3.5 Create GraphDB with Batch Importer 3.6...

View Article