Thursday, July 24, 2014

Hadoop 101: Spring Batch with Spring Hadoop

Everything about Apache Hadoop seems big. First, it's all about big data. Its users are internet giants including Facebook, Yahoo! and Google. And its ecosystem is also large.

Our Hadoop 101 series of posts is meant for newcomers looking for pointers and primers on where to start learning. It also provides a comprehensive overview of the technologies that help slim down that critical Time-to-Insight (TTI).

In a previous post, we explained the MapReduce framework, covered how a word count program fits within it, and then compared a basic word count program in Hadoop, Pig, Hive, and Cascading.

Today we are going to look at how developers can speed up Java development using Spring Hadoop. We will cover examples of how Spring Hadoop interfaces with the rest of the Spring framework, and show you how to code and configure Spring Hadoop with Spring Batch.

Read more here.

MLlib: Apache Spark component for machine learning

The Hadoop Summit 2014 in San Jose (June 3-5) brought many innovations to the Hadoop ecosystem, but the one I was most eager to hear about was what was happening with the MLlib component of Apache Spark. Spark 1.0.0 was released just before the conference on May 30 (and a new 1.0.1 release found its way out on July 11).

In case you’re not familiar with Spark and MLlib, let me get you quickly up to speed. Spark is a distributed in-memory computation framework, and the project is almost a year old. Apache Spark provides primitives for in-memory cluster computing, which is well suited to large-scale machine learning. MLlib is the Spark component focused on machine learning, and many developers now build practical machine learning pipelines with it. It became a standard component of Spark in version 0.8 (September 2013). The initial contribution for the Spark subproject came from UC Berkeley AMPLab. Due to the rapid adoption of Spark, MLlib has received more and more attention and contributions from the open source machine learning community. At this time, 50+ developers from the open source community have contributed to its codebase. MLlib has features for classification, regression, collaborative filtering, clustering, and decomposition (SVD and PCA).

Read more here.

DBAccess: a Thread-safe, Efficient Alternative to Core Data

DBAccess is a new ORM for iOS that promises to improve on Apple's Core Data by providing thread-safety and high performance.

DBAccess claims to provide three key benefits over Core Data:
Thread-safety.
High performance and support for fine-tuning query performance.
An event model that enables binding data objects to UI controls and keeps them updated with changes made in the database.
