data-science
Here are 9,084 public repositories matching this topic...
Currently, using OrdinalEncoder with a string-valued feature, and without categories
explicitly specifying an order, means that OrdinalEncoder will number the categories according to their lexicographic ordering.
This is not appropriate if the categories have a natural ordering (e.g. ['Green', 'Amber', 'Red']) that can be harnessed by the downstream estimator.
Rather, we should allow the u
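The ordering problem described above can be illustrated with a minimal pure-Python sketch. It does not call scikit-learn; the helper `ordinal_encode` is hypothetical and only mimics the two behaviours: lexicographic numbering when no order is given, and numbering by a caller-supplied order otherwise.

```python
def ordinal_encode(values, categories=None):
    """Map each value to an integer code.

    Hypothetical helper, not scikit-learn's implementation: with no
    `categories`, codes follow sorted (lexicographic) order; otherwise
    they follow the order the caller supplies.
    """
    order = sorted(set(values)) if categories is None else list(categories)
    codes = {category: index for index, category in enumerate(order)}
    return [codes[value] for value in values]


values = ["Green", "Amber", "Red", "Green"]

# Lexicographic default: Amber=0, Green=1, Red=2
default_codes = ordinal_encode(values)

# Natural order supplied by the caller: Green=0, Amber=1, Red=2
natural_codes = ordinal_encode(values, categories=["Green", "Amber", "Red"])

print(default_codes)
print(natural_codes)
```

With the default, 'Amber' sorts before 'Green', so the codes no longer reflect the traffic-light ordering; passing the order explicitly restores it.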
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
Updated Nov 23, 2019 - Jupyter Notebook
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Updated Nov 23, 2019 - Python
I got a CoNLL-U file from my university where the head column is filled with .
Processing such a file with the cli.convert method results in an int cast error in
https://github.com/explosion/spaCy/blob/master/spacy/cli/converters/conllu2json.py line 73
in the read_conllx method (head = (int(head) - 1) if head != "0" else id).
In the format documentation on https://universaldependencie
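One way to avoid the cast error is to treat a non-numeric head as "not annotated" instead of passing it to int(). The sketch below is not spaCy's code: `parse_head` is a hypothetical helper, and the placeholder is assumed to be `_`, the character CoNLL-U uses for unspecified fields.

```python
def parse_head(head, token_index):
    """Return a zero-based head index, or None when no head is annotated.

    Hypothetical helper: `token_index` is the token's own zero-based index,
    returned when the head is "0" (root), mirroring the expression quoted
    in the report.
    """
    if head == "0":
        return token_index
    try:
        return int(head) - 1
    except ValueError:
        # Non-numeric placeholder (assumed to be "_"): no head available.
        return None


print(parse_head("3", 0))
print(parse_head("0", 4))
print(parse_head("_", 1))
```

Whether a missing head should become None, the token itself, or an error is a design decision for the converter; the sketch only shows where the check would go.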
Since #11953 was merged, a couple of extra simplifications can be done:
See in particular this comment.
The value DICT_IS_ORDERED in IPython/lib/pretty.py is always True; any code that relies on it can be simplified, and the value should be documented for future removal.
This is a good issue for a first time contri
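The kind of simplification meant here can be sketched as follows. Since Python 3.7 the built-in dict is guaranteed to preserve insertion order, so a branch guarded by an always-True flag is dead code. The functions `keys_with_flag` and `keys_simplified` are hypothetical and are not taken from IPython/lib/pretty.py.

```python
DICT_IS_ORDERED = True  # stands in for the constant named in the issue


def keys_with_flag(mapping):
    """Hypothetical 'before' shape: branch on the flag."""
    if DICT_IS_ORDERED:
        return list(mapping)
    # Unreachable while the flag is always True.
    return sorted(mapping)


def keys_simplified(mapping):
    """Hypothetical 'after' shape: the dead branch is removed."""
    return list(mapping)


sample = {"zeta": 1, "alpha": 2, "mid": 3}
print(keys_with_flag(sample))
print(keys_simplified(sample))
```

Both functions return the keys in insertion order, which is why the flag can be kept only as a documented constant until it is removed.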
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Updated Nov 23, 2019 - Python
Your new Mentor for Data Science E-Learning.
Updated Nov 23, 2019 - Jupyter Notebook
An awesome Data Science repository to learn and apply for real-world problems.
Updated Nov 23, 2019
latex support
I tried to use LaTeX in Dash, but it is not working.
It seems that the MathJax JavaScript library is not loaded.
Blueprint
The "Python Machine Learning (1st edition)" book code repository and info resource
Updated Nov 23, 2019 - Jupyter Notebook
The usage example in the word2vec.py doc-comment regarding KeyedVectors
uses inconsistent paths and thus doesn't work.
If vectors were saved to a tm
Dive into Machine Learning with Python Jupyter notebook and scikit-learn!
Updated Nov 23, 2019
VIP cheatsheets for Stanford's CS 229 Machine Learning
Updated Nov 22, 2019
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- Ray installed from (source or binary): pip install ray
- Ray version: 0.7.4
- Python version: 3.6
Describe the problem
After having successfully trained and restored an agent, one very common use case might be to take a deterministic action given a state. After training, or wh
Deep learning library featuring a higher-level API for TensorFlow.
Updated Nov 23, 2019 - Python
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
Updated Nov 23, 2019
A curated list of awesome big data frameworks, resources and other awesomeness.
Updated Nov 23, 2019
Question
- Can PretrainedTransformerTokenizer track character offsets like WordTokenizer?
Character offsets are important for calculating the answer span after wordpiece tokenization.
I'm a newbie in programming. I am trying to use this library, and it's very useful for me.
I want to show the centroids in K-means clustering. How can I show them? Thank you so much.
SVR training error
When pressing the Enter key in the Wikidata login form from the Wikidata extension, one would expect the form to be submitted, which currently does not happen.
Description
from a conversation with @anargyri:
It would be more appropriate to have a folder called tuning and, under that folder, azureml and nni and the spark tuning code. This would require testing again all the tuning notebooks, so I would leave it for a separate PR. Hyperdrive and NNI rely on several path names, so they will brea
I cannot find a guide on choosing TPOT parameters. I know the API is explained in the documentation, but it's too brief. TPOT seems made for users unskilled in ML and GP. I made another issue with my many questions. The sentence "We recommend using the default parameter unless you understand how the mutation rate affects GP algorithms." should have a link.
Open Machine Learning Course
Updated Nov 23, 2019 - Python
Tutorials, assignments, and competitions for MIT Deep Learning related courses.
Updated Nov 23, 2019 - Jupyter Notebook
Today you can put Streamlit in "wide mode" via the Settings dialog in the UI. However, it would be great if the wide mode setting were sticky.
Option 1: just make Wide Mode sticky by persisting it in local storage!
Option 2: Provide a config option that toggles wide mode:
[browser]
wideMode = true
(for this we'd have to replicate much of the code used to propagate settin
The "Python Machine Learning (2nd edition)" book code repository and info resource
Updated Nov 23, 2019 - Jupyter Notebook
It would be great to have a new option in Pool that, just like cat_features, takes a list of numbers or column names.
There's a small mistake in the description of the embedding layer. It says
'Turns positive integers (indexes) into dense vectors of fixed size.'
but it should read
'Turns non-negative integers (indexes) into dense vectors of fixed size.'
as it expects indexes ranging from 0 to input_dim - 1.
See https://keras.io/layers/embeddings/
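The point about index 0 can be made concrete with a minimal pure-Python sketch of an embedding lookup. This is not Keras code: `make_embedding_table` and `embed` are hypothetical helpers, and the random initialisation range is an arbitrary assumption. It only shows that valid indexes run from 0 to input_dim - 1 inclusive.

```python
import random


def make_embedding_table(input_dim, output_dim, seed=0):
    """Build an input_dim x output_dim table of small random weights."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.05, 0.05) for _ in range(output_dim)]
        for _ in range(input_dim)
    ]


def embed(table, indexes):
    """Look up one dense vector per index; valid indexes are 0..input_dim-1."""
    input_dim = len(table)
    for index in indexes:
        # Explicit check: a negative index would otherwise wrap around
        # silently in a Python list.
        if not 0 <= index < input_dim:
            raise IndexError(f"index {index} outside [0, {input_dim - 1}]")
    return [table[index] for index in indexes]


table = make_embedding_table(input_dim=4, output_dim=3)
vectors = embed(table, [0, 3, 1])  # 0 is a valid index
print(len(vectors), len(vectors[0]))
```

Index 0 selects the first row of the table, so "non-negative" describes the accepted range more accurately than "positive".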