
apache-spark

Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 954 public repositories matching this topic...

PhilipMay commented Jun 12, 2020

MLflow seems to enforce a length limit of 5000 characters on tag values (see below).

[...]
  File "/home/smay/miniconda3/envs/py38/lib/python3.8/site-packages/mlflow/utils/validation.py", line 136, in _validate_length_limit
    raise MlflowException(
mlflow.exceptions.MlflowException: Tag value '[0.8562690322984875, 0.8544098885636596, 0.8544098885636596, 0.8544098885636596, 0.85440988856365
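
One way to stay under the limit reported above is to shorten the value before setting the tag. The following is a minimal sketch in plain Python, assuming the 5000-character limit from the report. `fit_tag_value` and `MAX_TAG_LENGTH` are hypothetical names used for illustration and are not part of MLflow.

```python
# Limit taken from the report above; an assumption, not read from MLflow itself.
MAX_TAG_LENGTH = 5000


def fit_tag_value(value, limit=MAX_TAG_LENGTH, marker="..."):
    """Return str(value), shortened so that it is at most `limit` characters.

    When shortening is needed, the tail is replaced by `marker` so that a
    reader can tell the value was cut.
    """
    text = str(value)
    if len(text) <= limit:
        return text
    if limit <= len(marker):
        # No room for the marker: fall back to a plain cut.
        return text[:limit]
    return text[: limit - len(marker)] + marker
```

The shortened string could then be passed to `mlflow.set_tag` in place of the raw value. For long lists of scores like the one in the traceback, logging the values as metrics or as an artifact may be a better fit than a tag, since truncation loses data.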
thrixton commented Jul 13, 2020

This is more a question than a feature request.

When parsing JSON files, I need to sanitize the field names so that "field with spaces" becomes "field_with_spaces".
I also want to preserve the original name, as metadata about the column if you like :)

There is a metadata field on StructField, but it is internal.
Why is this internal? Is it possible or desirable to expose it?
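
Whether or not the metadata field gets exposed, the renaming can be done while keeping a lookup from each sanitized name back to its original. The following is a minimal sketch in plain Python. `sanitize_field_names` is a hypothetical helper, and the rule of replacing each run of non-word characters with a single underscore is an assumption about what "sanitize" should mean here.

```python
import re


def sanitize_field_names(names):
    """Map each sanitized field name to the original name it came from.

    Runs of characters other than letters, digits, and underscores are
    replaced by a single underscore. If two originals sanitize to the same
    name, later ones receive a numeric suffix so that no name is lost.
    """
    mapping = {}
    for original in names:
        clean = re.sub(r"\W+", "_", original.strip())
        candidate, n = clean, 1
        while candidate in mapping:
            n += 1
            candidate = f"{clean}_{n}"
        mapping[candidate] = original
    return mapping


# Example: build the lookup for two JSON field names.
lookup = sanitize_field_names(["field with spaces", "id"])
```

The keys of the returned dictionary can serve as the new column names, and the values are what one would store as per-column metadata if the API allows it.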

Created by Matei Zaharia

Released May 26, 2014

Repository
apache/spark
Website
spark.apache.org
Wikipedia
Wikipedia

Related Topics

hadoop scala