The Wayback Machine - https://web.archive.org/web/20200720064040/https://github.com/topics/bigdata
Here are 1,264 public repositories matching this topic...
An open-source big data platform designed and optimized for the Internet of Things (IoT).
A curated list of awesome big data frameworks, resources and other awesomeness.
Updated Jun 24, 2020 · Java
Out-of-Core DataFrames for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
Updated Jul 18, 2020 · Python
Python clone of Spark, a MapReduce-like framework in Python
Updated Jan 23, 2019 · Python
Big data interview questions: start your path to big data mastery... Flink/Spark/Hadoop/Hbase/Hive...
An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.
Updated May 31, 2020 · Java
Apache Avro is a data serialization system.
Updated Jul 17, 2020 · Java
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Updated Jul 17, 2020 · Java
Distributed Big Data Orchestration Service
Updated Jul 16, 2020 · Java
Upserts, Deletes And Incremental Processing on Big Data.
Updated Jul 20, 2020 · Java
GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Updated Sep 6, 2017 · Jupyter Notebook
The Programming Language Designed For Big Data and AI
Updated Jul 16, 2020 · JavaScript
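Built on open-source Flink, extending its real-time SQL; it mainly implements joins between streams and dimension tables and supports all native Flink SQL syntax.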
Updated Jul 15, 2020 · Java
A Kubernetes Native Batch System (Project under CNCF)
C# and F# language binding and extensions to Apache Spark
🚚 Agile Data Science Workflows made easy with Pyspark
Updated Jul 19, 2020 · Jupyter Notebook
Google, Naver multiprocess image web crawler (Selenium)
Updated Jun 25, 2020 · Python
Lightweight real-time big data streaming engine over Akka
Updated Jun 23, 2020 · Scala
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Updated Jul 18, 2020 · Jupyter Notebook
A Kubernetes batch scheduler for high-performance workloads, e.g. AI/ML, big data, and HPC
Detect threats with log data and improve cloud security posture
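Study notes, plus eBooks I have read, video resources, and blogs, websites, and tools I have collected and consider worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.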
Updated Apr 24, 2020 · Python
A book about running Elasticsearch
Fast topic modeling platform
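🎯 🌟 [Big data interview questions] Big data interview questions I have collected online, along with summaries of my own answers. Currently covers interview questions on the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.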
A serverless cluster computing system for the Go programming language
d3 library to build circular graphs
Updated Jul 17, 2020 · JavaScript