When I was working for Zaloni, we needed to create a custom data source for Spark so we could connect to a REST endpoint. The team worked hard to build it in Java. I presented this work at Spark Summit in Dublin last year, and it is now one of the topics of chapter 9 of Spark with Java, available through MEAP (Manning Early Access Program) from Manning Publications.
In this chapter, you will see how to find third-party data sources for ingestion, understand the benefits of building your own data source, build one, and finally build a JavaBean data source, which lets you use just about anything as a data source for Spark. Sweet, no? The practical example I chose is to ingest EXIF data (photo metadata) into a dataframe. There’s a lot of code in this chapter, and it’s all Java.
Appendix S is out too: S as in “enough of Scala.” No, it is not a diatribe against Scala. I promise.
This chapter is being released as the first set of reviews for the book is coming in (covering roughly the first third of the book). I must admit I am pretty pleased with the feedback, despite some technical issues that hampered the review. Here are a few of the quotes.
I would say that this is the best book on Spark I’ve read. -Kelvin Johnson
One of the most simple, but powerful introductions and dive-ins that you can ever have on an Apache library! -Igor Franca
A great book for beginners and prospective experts. -Markus Breuer
Could be worse, right? I promise I did not pay them. Overall, the group of twelve reviewers gave the book a 4.0 Amazon-like rating… Not so bad. What would you rate it?
On a side note, I will be speaking at NC Tech on August 22nd, 2018, and at All Things Open, the great open source event, on October 21st to 23rd, 2018. You probably guessed the topic…