Apache Spark: What? Why? How? But… why a new framework (don’t we have MapReduce already)? MapReduce requires data to be serialized to disk between steps, which makes the I/O cost of a MapReduce job high and makes interactive analysis and iterative algorithms very expensive.
Splunk is a platform used to search, monitor, analyze and visualize machine data. Data inputs are used to write data into Splunk for later analysis. Splunk provides many default data input settings for common use cases like reading data from files/directories and listening on ports, but these are not useful in certain scenarios, e.g. when you have […]
This post describes the design and implementation of a scalable architecture to monitor and visualize sentiment against a Twitter hashtag in real time. The project streams live tweets from Twitter against a hashtag, performs sentiment analysis on each tweet, and calculates the rolling mean of sentiments. This sentiment mean is continuously sent to connected browser clients and displayed in […]
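As a rough illustration of the rolling mean step mentioned in the excerpt above, here is a minimal Python sketch. It is not the post's implementation: the class name `RollingMean`, the fixed-size window, and the idea that each tweet yields a numeric sentiment score are all assumptions made for the example.

```python
from collections import deque


class RollingMean:
    """Mean of the most recent `window_size` sentiment scores (hypothetical helper)."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # deque with maxlen drops the oldest score automatically on append
        self._window = deque(maxlen=window_size)
        self._total = 0.0

    def add(self, score):
        """Record one score and return the current rolling mean."""
        if len(self._window) == self._window.maxlen:
            # The oldest score is about to be evicted, so remove it from the sum
            self._total -= self._window[0]
        self._window.append(score)
        self._total += score
        return self._total / len(self._window)


# Assumed score scale for the example: -1.0 (negative) to 1.0 (positive)
rolling = RollingMean(window_size=3)
for score in [1.0, 0.0, -1.0, 1.0]:
    latest_mean = rolling.add(score)
```

Keeping a running total means each new score updates the mean in constant time, which suits a stream where the value is pushed to clients after every tweet.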
R is one of the most widely used programming languages for data and statistical analysis. At eMumba we use R heavily to make sense out of data, to find patterns and for general exploratory data analysis. Generally, results of these analyses are fed into machine learning models to solve various classification and regression problems. In […]
In this post, I demonstrate how you can use Apache Spark’s machine learning libraries to perform binary classification using logistic regression. The dataset I am using for this demo is taken from Andrew Ng’s machine learning course on
Apache Storm Continuing on the log analysis journey, in this post I explore Apache Storm. Apache Storm is a framework for real-time, distributed, fault-tolerant computation. Storm gives you a set of abstractions to help build systems that can analyze a large volume of streaming data in real time. Here is an excellent talk […]
Big data processing in general, and log analysis in particular, is not new, but that doesn’t mean it is easy. It’s a hard problem to deal with. A lot of companies have written extensively on how they developed solutions around their needs. So the good news is that you can avoid mistakes by learning from […]
Google’s reCAPTCHA is an industry standard when it comes to fighting bots. Integrating it in a regular web app has almost become a no-brainer, thanks to plugins available on almost every platform to do the job. With this expectation, I started integrating it in my latest app built on ReactJS but encountered many roadblocks. Without […]