The Wayback Machine - https://web.archive.org/web/20211125184642/https://github.com/topics/big-data

big-data

Here are 2,704 public repositories matching this topic...

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
  • Updated Nov 4, 2021
  • Python
Bluenix2
Bluenix2 commented Aug 7, 2021

Is your feature request related to a problem? Please describe.
Many static type checkers have issues finding Cython's stubs.
Here is the output from running mypy on my current project:

error: Skipping analyzing "cython": found module but no type hints or library stubs

The same issue can be seen when using import Cython as cython:

error: Skipping analyzing "Cython": found module but 
nickhuangxinyu
nickhuangxinyu commented Sep 25, 2021

Usually, after training a model, I save it in C++ format with this code:

cat_model.save_model('a', format="cpp")
cat_model.save_model('b', format="cpp")

But my C++ program needs to use multiple models.

In my main.cpp:

#include "a.hpp"
#include "b.hpp"

int main() {
  // do something
  double a_pv = ApplyCatboostModel({1.2, 2.3});  // I want to use a.hpp's model here
  double b_pv 

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
  • Updated Nov 25, 2021
  • Jupyter Notebook
jovanpop-msft
jovanpop-msft commented Aug 18, 2021

Could we clarify that delta-log files are JSON line-delimited files in https://github.com/delta-io/delta/blob/master/PROTOCOL.md#delta-log-entries ?

The PROTOCOL.md file does not make the JSON format clear. Every delta-log entry file is a newline-delimited JSON file, but the file does not say so. The protocol does not explicitly specify that every action is stored as a single-lin
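
To illustrate what "newline-delimited JSON" means in practice, here is a minimal Python sketch that reads one JSON object per line. The helper name `parse_delta_log_entry` and the sample content are hypothetical and only loosely modeled on delta-log actions; this is not an excerpt from the Delta protocol or any Delta library.

```python
import json


def parse_delta_log_entry(text):
    """Parse newline-delimited JSON: one action object per non-empty line.

    Hypothetical helper for illustration; not part of any Delta library.
    """
    actions = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            actions.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
    return actions


# Made-up sample content: each action is serialized on its own line.
sample_entry = "\n".join([
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
    json.dumps({"add": {"path": "part-00000.parquet", "size": 1024, "dataChange": True}}),
])

actions = parse_delta_log_entry(sample_entry)
# Each action is assumed to be an object with a single top-level key naming its type.
action_types = [next(iter(action)) for action in actions]
print(action_types)
```

A reader that instead called `json.loads` on the whole file would fail as soon as the file held more than one action, which is why the format is worth stating explicitly.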

vespa
abhimech001
abhimech001 commented Oct 6, 2021

Hi Team,

I have created a vespa multinode stack and performed a load test for search query

  1. search with single field with 60s timeout - median was consistent around 150 to 200ms
    endpoint : /search/
    http request : POST
    Body:
    {
    "yql": "select * from sources * where table contains \"testing.vespa.search\""
    }
  2. search with more fields and conditions - median was 100+ secs which is very
proddata
proddata commented Nov 17, 2021

Use case:
Selecting a subset of array elements of an existing array inside the database without the need to unnest the array first or use a user-defined function.

CREATE TABLE t1 (id INTEGER, tags ARRAY(TEXT));
INSERT INTO t1 (id, tags) VALUES (1, ['database','search engine','document store']);
SELECT array_slice(tags,2,3) FROM t1;
--> ['search engine','document s
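
The requested behavior can be sketched in Python. This assumes `array_slice(tags, 2, 3)` uses 1-based, inclusive bounds, which is what the truncated output above suggests; the function below is an illustration of those assumed semantics, not the database's implementation.

```python
def array_slice(values, start, end):
    """Return the elements from position start to end, 1-based and inclusive.

    Assumed semantics for illustration only; out-of-range or inverted
    bounds yield an empty list here, which is also an assumption.
    """
    if start < 1 or end < start:
        return []
    # Convert the 1-based inclusive range to Python's 0-based half-open slice.
    return values[start - 1:end]


tags = ['database', 'search engine', 'document store']
result = array_slice(tags, 2, 3)
print(result)
```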
