The Wayback Machine - https://web.archive.org/web/20200714150703/https://github.com/topics/big-data

big-data

Here are 2,109 public repositories matching this topic...

ines commented Sep 29, 2019

I was going through the existing enhancement issues again and thought it'd be nice to collect ideas for spaCy plugins and related projects. There are always people in the community who are looking for new things to build, so here's some inspiration. For existing plugins and projects, check out the spaCy universe.

If you have questions about the projects I suggested,

presto
rongrong commented Dec 12, 2019

Currently array_position only returns the position of the first occurrence of the given element. We want to extend array_position to take an additional parameter, instance, similar to strpos.

array_position(x, element, instance) -> bigint

For example, for array x: [1, 3, 2, 1, 4, 3, 2, 5, 4, 1]:

array_position(x, 1) = 1 -- existing function
array_position(x, 1, 1) = 1 -- same as exist
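
The proposed semantics can be sketched outside Presto as a small Python function. This is only an illustration of the request, not Presto's implementation: it assumes positions are 1-based, that 0 is returned when there is no such occurrence, and that only positive instance values are accepted (how zero or negative values should behave is not stated in the issue, so this sketch simply rejects them).

```python
def array_position(x, element, instance=1):
    """Return the 1-based position of the instance-th occurrence of element in x.

    Returns 0 when x contains fewer than `instance` occurrences.
    Rejecting instance < 1 is an assumption made for this sketch only.
    """
    if instance < 1:
        raise ValueError("instance must be a positive integer in this sketch")
    seen = 0
    for index, value in enumerate(x, start=1):
        if value == element:
            seen += 1
            if seen == instance:
                return index
    return 0


x = [1, 3, 2, 1, 4, 3, 2, 5, 4, 1]
print(array_position(x, 1))     # first occurrence, matches the existing function
print(array_position(x, 1, 2))  # second occurrence
print(array_position(x, 1, 4))  # no fourth occurrence
```

With the default instance of 1 the function behaves like the existing two-argument form, so the extension stays backward compatible.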
erkankarabulut commented Apr 15, 2020

I am trying to deploy the app with the given ./sbt clean dist command but I got this error:

Downloading sbt launcher for 1.3.8:
  From  https://repo.scala-sbt.org/scalasbt/maven-releases/org/scala-sbt/sbt-launch/1.3.8/sbt-launch-1.3.8.jar
    To  /root/.sbt/launchers/1.3.8/sbt-launch.jar
Downloading sbt launcher 1.3.8 md5 hash:
  From  https://repo.scala-sbt.org/scalasbt/maven-releas
ram-bv commented Apr 12, 2020

Stefan Behnel wrote:

No. "@cython.cfunc" declares a function or method as a pure C function,
without a Python interface to it, and for methods, it only applies to
extension types and not regular Python classes.

It's interesting that Cython allowed you to set it on the "__iter__" method
which cannot, in fact, be a C method because it's one of Python's special
methods. We s

Open Source Fast Scalable Machine Learning Platform For Smarter Applications: Deep Learning, Gradient Boosting & XGBoost, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA, Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
  • Updated Jul 14, 2020
  • Jupyter Notebook
janl commented Mar 4, 2020

Summary

CouchDB keeps a list of purge infos to ensure that purges can be applied on a cluster without purged documents being re-introduced by internal replication.

It would be useful to make this list available to replication clients like PouchDB, which could then apply local purges on their own. I know PouchDB doesn’t implement purge just yet, but it’s something that folks will need befor

ncherkas commented Apr 2, 2020

In a cloud-native Kubernetes environment, the logging architecture almost always assumes that all needed logs are sent to stdout. It works as a unified source of logs from which different tools read them, reorganize them if needed, and route them to destinations such as analytics dashboards.

Hazelcast Diagnostics are very useful when troubleshooting performance and stability issues, but currently it

yiheng commented Jul 11, 2018

Spark 2.3 officially supports running on Kubernetes, but our "Run on Kubernetes" guide is still based on a special version of Spark 2.2, which is out of date. We need to:

  1. update that document to Spark 2.3
  2. release the corresponding docker images.
vespa
pinankg commented May 24, 2019

Hello Vespa Team,
Could you please consider supporting a properties file that is available at run time along with the model, so that metadata such as a threshold or label can be associated with the model?
This is for the stateless evaluation of models (XGBoost, TensorFlow, ONNX, etc.) that Vespa supports.

Thank you,
Pinank

ramkumarkb commented Feb 5, 2020

I have noticed a small error in the documentation around S3 configurations:
https://docs.delta.io/latest/delta-storage.html#amazon-s3

On the read part, it should be load and not save:
spark.read.format("delta").load("s3a://<your-s3-bucket>/<path>/<to>/<delta-table>")

Also, I have successfully tested Delta 0.5.0 with on-premise S3 - https://min.io
There were some quirks around the

xbyang18 commented Nov 19, 2019

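The data source connects to Presto over JDBC. The Presto SQL is not complex, but when one task feeds multiple dashboards, the returned data is null. Looking at the code:
synchronized (context) {
    context.wait(10 * 60 * 1000);
}
Wake-up code:
synchronized (context) {
    context.setData(data);
    context.notify();
}
The returned data is null.
I increased the wait timeout to 30 * 60 * 1000, but the error is still the same.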
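
The issue does not identify the cause. One thing worth ruling out with this kind of wait/notify pairing is a waiter that never sees the data: either because the notify happened before the wait started, or because a single notify woke only one of several waiters sharing the same context. The Python sketch below shows the usual guard against both, using a ready flag checked under the lock and a notify to all waiters. The ResultContext class and its method names are hypothetical stand-ins for the context object in the snippet, not the project's actual code.

```python
import threading


class ResultContext:
    """Hypothetical stand-in for the shared `context` object."""

    def __init__(self):
        self._cond = threading.Condition()
        self._data = None
        self._ready = False  # guards against missed and spurious wake-ups

    def set_data(self, data):
        with self._cond:
            self._data = data
            self._ready = True
            # Wake every waiter, since several dashboards may share one task.
            self._cond.notify_all()

    def wait_for_data(self, timeout):
        with self._cond:
            # Returns at once if the data is already there; otherwise blocks
            # until it arrives or the timeout (in seconds) expires.
            self._cond.wait_for(lambda: self._ready, timeout=timeout)
            return self._data


context = ResultContext()
context.set_data(["row-1", "row-2"])       # producer finishes first
print(context.wait_for_data(timeout=1.0))  # consumer still gets the data
```

In Java the same idea is a loop around wait() that rechecks a condition, combined with notifyAll() when more than one thread may be waiting on the same object.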
