Focused crawls are collections of frequently updated web crawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
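A focused crawl stays on a single domain and its subdomains. The minimal sketch below is illustrative only and is not taken from any specific crawler. It shows how a crawler might apply that restriction to its URL frontier. The helper names `in_scope` and `filter_frontier` are hypothetical.

```python
from urllib.parse import urlparse


def in_scope(url: str, domain: str) -> bool:
    """Return True if the URL's host is `domain` itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    # The leading dot prevents "notexample.com" from matching "example.com".
    return host == domain or host.endswith("." + domain)


def filter_frontier(urls, domain):
    """Keep only the URLs that a crawl focused on `domain` would fetch."""
    return [u for u in urls if in_scope(u, domain)]
```

For example, `filter_frontier(["https://example.com/", "https://other.org/"], "example.com")` keeps only the first URL. Checking the dot-prefixed suffix, rather than a bare `endswith(domain)`, keeps look-alike hosts out of scope.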
Enables data scientists to compose analysis pipelines consisting of data manipulation, exploratory analysis and reporting, and modeling steps. Data scientists can use the tools of their choice through an R interface and compose interoperable pipelines across R, Spark, and Python.
This tool parses log data and allows users to define analysis pipelines for anomaly detection. It was designed to run the analysis with limited resources and the lowest possible permissions, making it suitable for use on production servers.
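A log-based anomaly detection pipeline usually has three stages: parse each line, feed the parsed event to a detector, and collect the alerts it raises. The minimal sketch below shows that shape. It is not the tool's actual API. The log format, `NewValueDetector`, and `run_pipeline` are hypothetical. The detector learns which values appear during a training phase and then flags any unseen value. The parser handles one line at a time, which fits the constrained-resource goal described above.

```python
import re
from typing import Iterable, Iterator, Optional

# Hypothetical log format: "<timestamp> <host> <service>: <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+) (?P<host>\S+) (?P<service>[\w-]+): (?P<msg>.*)$")


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Lazily parse raw lines into dicts; non-matching lines are marked as unparsed."""
    for line in lines:
        line = line.rstrip("\n")
        m = LINE_RE.match(line)
        yield m.groupdict() if m else {"unparsed": line}


class NewValueDetector:
    """Flags events whose `field` value was never seen during training."""

    def __init__(self, field: str):
        self.field = field
        self.known: set = set()
        self.training = True

    def process(self, event: dict) -> Optional[str]:
        if "unparsed" in event:
            return f"unparsed line: {event['unparsed']!r}"
        value = event[self.field]
        if self.training:
            self.known.add(value)  # learn normal values
            return None
        if value not in self.known:
            return f"new {self.field}: {value!r}"
        return None


def run_pipeline(lines: Iterable[str], detector: NewValueDetector) -> list:
    """Feed each parsed event through the detector and collect any alerts."""
    alerts = []
    for event in parse(lines):
        alert = detector.process(event)
        if alert is not None:
            alerts.append(alert)
    return alerts
```

To use it, run the pipeline once over known-good logs with `training = True`, then set `training = False` and process live logs. A line from a service that never appeared in training, such as `nc`, would produce the alert `new service: 'nc'`.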
Shows what people are tweeting about right now in a location of your choice. Streams live Twitter data into Spark and analyzes tweets for various trends, e.g., trending hashtags and trending mentions. Location-based features are supported.
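Trending hashtags and mentions come down to counting tokens over a recent window of tweets. The project itself does this in Spark. The minimal sketch below uses plain Python instead, so it shows only the counting logic and not the streaming infrastructure. `TrendTracker` and its window size, measured in tweets, are hypothetical. Location filtering is omitted.

```python
import re
from collections import Counter, deque

HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


class TrendTracker:
    """Counts hashtags and mentions over the most recent `window` tweets."""

    def __init__(self, window: int = 1000):
        self.size = window
        self.window = deque()  # (hashtags, mentions) per tweet, oldest first
        self.hashtags = Counter()
        self.mentions = Counter()

    def add(self, text: str) -> None:
        tags = [t.lower() for t in HASHTAG_RE.findall(text)]
        users = [u.lower() for u in MENTION_RE.findall(text)]
        self.window.append((tags, users))
        self.hashtags.update(tags)
        self.mentions.update(users)
        # Slide the window: forget the oldest tweet's tokens.
        if len(self.window) > self.size:
            old_tags, old_users = self.window.popleft()
            self.hashtags.subtract(old_tags)
            self.mentions.subtract(old_users)

    def top_hashtags(self, n: int = 10):
        # Unary + drops entries whose count fell to zero after subtraction.
        return (+self.hashtags).most_common(n)

    def top_mentions(self, n: int = 10):
        return (+self.mentions).most_common(n)
```

Subtracting the evicted tweet's tokens keeps each update proportional to the size of one tweet, so the tracker never recounts the whole window. A Spark Streaming job would typically get the same effect with a windowed aggregation.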