Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
#data-cleaning
Repositories 290
General Assembly's 2015 Data Science course in Washington, DC
data-science
machine-learning
scikit-learn
data-analysis
pandas
jupyter-notebook
python
course
linear-regression
logistic-regression
model-evaluation
naive-bayes
natural-language-processing
decision-trees
ensemble-learning
clustering
regular-expressions
web-scraping
data-visualization
data-cleaning
Jupyter Notebook
Updated Apr 18, 2016
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
data-wrangling
data-forge
data
data-analysis
javascript
nodejs
linq
pandas
visualization
data-visualization
data-management
data-manipulation
data-munging
data-cleaning
data-cleansing
csv
json
TypeScript
Updated Mar 20, 2019
simple tools for data cleaning in R
kaggle
data-science
machine-learning
big-data
python
eda
deep-learning-tutorial
data-visualization
data-analysis
data-cleaning
feature-extraction
numpy-tutorial
pandas-tutorial
seaborn
sklearn
kaggle-competition
jupyter-notebook
deep-neural-networks
deep-learning
deep-learning-algorithms
Jupyter Notebook
Updated Mar 20, 2019
Jupyter notebook and datasets from the pandas Q&A video series
Jupyter Notebook
Updated Mar 12, 2019
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Professional data validation for the R environment
HTML
Updated Mar 18, 2019
Encoding methods for dirty categorical variables
Python
Updated Jan 9, 2019
A Machine Learning System for Data Enrichment.
Python
Updated Sep 15, 2018
A comprehensive set of ML techniques with Python: Define the Problem, Specify Inputs & Outputs, Data Collection, Exp…
iris
iris-dataset
machine-learning-algorithms
python
jupyter-notebook
kaggle
kmeans
adaboost
gradient-boosting
data-visualization
data-cleaning
feature-extraction
feature-engineering
machine-learning-workflow
titanic-kaggle
house-price-prediction
machine-learning
workflow
courses
kaggle-competition
Jupyter Notebook
Updated Feb 16, 2019
Cluster and merge similar character values: an R implementation of OpenRefine clustering algorithms
openrefine
fuzzy-matching
ngram
approximate-string-matching
data-cleaning
data-clustering
clustering
cran
r
rstats
C++
Updated Sep 14, 2018
An R package for data screening
HTML
Updated Mar 7, 2019
Make sense of your data
JavaScript
Updated Mar 22, 2019
DTCleaner: data cleaning using multi-target decision trees.
Java
Updated Jun 21, 2016
voice
voice-assistant
voice-recognition
voice-recording
transcription
featurization
data
data-cleaning
visualization
generation
voice-activity-detection
voice-control
server
security
encryption-decryption
python3
machine-learning
wake-word-detection
voice-computing
Python
Updated Feb 7, 2019
taxonomic classes for R
A simple command line interface to the datamade/dedupe library.
Jupyter Notebook
Updated Apr 2, 2018
Clean species occurrence records
R
Updated Nov 19, 2018
Grateful Data isn't programming code, but an online tutorial about data acquisition, cleaning and enriching, using pu…
Updated Sep 11, 2018
Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle
data-science
data-analysis
data-visualization
data-cleaning
data-cleansing
data-wrangling
data-science-python
data-analytics
data-analysis-python
eda
exploratory-data-analysis
kaggle-competition
kaggle-dataset
kaggle-used-cars-dataset
Jupyter Notebook
Updated Jan 2, 2019
A Comprehensive ML Workflow for the House Prices data set. I have tried to help **Fans of Machine Learning** in…
machine-learning
house-price-prediction
kaggle-competition
regression
kaggle
kernel
eda
data-science
data
data-cleaning
data-analysis
data-visualization
visualization
python
jupyter-notebook
notebook
ipynb
toturial
regression-models
Jupyter Notebook
Updated Jan 6, 2019
Foofah: programming-by-example data transformation program synthesizer
data-transformation
data-wrangling
inductive-program-synthesis
programming-by-example
combinatorial-search
heursitic
data-preparation
data-cleaning
CSS
Updated Apr 23, 2018
Vue
Updated Sep 24, 2018
Research to help boost text sentiment analysis using facial features from video using machine l…
data-cleaning
multimodal-sentiment-analysis
pandas
bag-of-words
nltk
svm
logistic-regression
sparse-matrix
classification-report
cross-validation
recursive-feature-elimination
count-vectorizer
tf-idf
machine-translation
Jupyter Notebook
Updated Jan 12, 2018
[Project moved to the team's GitHub] This repository will therefore only sync the latest README.md; to watch, star, or fork, please go to the team's GitHub. Thank you.
Vue
Updated Jun 16, 2017
Find and replace erroneous fields in data using validation rules
R
Updated Sep 13, 2018
Fill missing values in a pandas DataFrame using a Restricted Boltzmann Machine
Python
Updated Dec 13, 2018
Interactive cleaning for pandas DataFrames
Python
Updated Dec 13, 2018
Reduce, filter, and anonymize Moodle data for non-production environments
PHP
Updated Mar 18, 2019