A curated guide to the best tools, resources, and technologies for data visualization.

Data Cleaning

OpenRefine

OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.

Trifacta Wrangler

Trifacta Wrangler is designed to make data preparation easier and faster. It is a connected desktop application for visually exploring, structuring, and publishing dashboard-ready datasets, which helps analysts deliver faster and more accurate analysis.

CSV Fingerprints

CSV Fingerprints aims to make it easy to spot errors in your dataset by providing a bird's-eye view of the file without too much distracting detail.
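To illustrate the idea of a bird's-eye view, here is a minimal sketch that reduces each CSV cell to a one-character code. It is not the tool's implementation; the `classify` and `fingerprint` helpers, the character codes, and the sample data are all assumptions made for illustration.

```python
import csv
import io


def classify(cell: str) -> str:
    """Return a one-character code for the kind of value in a cell (hypothetical codes)."""
    text = cell.strip()
    if text == "":
        return "."  # empty cell
    try:
        float(text)
        return "#"  # parses as a number
    except ValueError:
        return "a"  # anything else is treated as text


def fingerprint(csv_text: str) -> list[str]:
    """Map every row of the CSV text to a string of cell codes."""
    rows = csv.reader(io.StringIO(csv_text))
    return ["".join(classify(cell) for cell in row) for row in rows]


# Made-up sample data: one missing age, one non-numeric age, one missing city.
sample = "name,age,city\nAda,36,London\nBob,,Paris\nCy,n/a,\n"
for line in fingerprint(sample):
    print(line)
```

Rows whose pattern differs from their neighbours (here, the last two) stand out at a glance, which is the kind of overview the description refers to.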

DataCleaner

Find the patterns, missing values, character sets and other characteristics of your data values.

The heart of DataCleaner is a strong open-source data profiling engine for discovering and analyzing the quality of your data.

Pandas for Python

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
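As a rough sketch of what data cleaning with pandas can look like, the example below trims whitespace, converts a text column to numbers, drops duplicate rows, and counts what is still missing. The column names and values are made up for illustration, and the steps shown are one possible cleaning pass, not a recommended recipe.

```python
import pandas as pd

# Made-up messy data: stray whitespace, a text placeholder for a
# missing number, a duplicated row, and a missing name.
raw = pd.DataFrame({
    "name": [" Ada ", "Bob", "Bob", None],
    "age": ["36", "n/a", "n/a", "41"],
})

clean = (
    raw.assign(
        name=raw["name"].str.strip(),                    # trim whitespace
        age=pd.to_numeric(raw["age"], errors="coerce"),  # non-numeric text becomes NaN
    )
    .drop_duplicates()         # remove exact duplicate rows
    .dropna(subset=["name"])   # drop rows that have no name
    .reset_index(drop=True)
)

# Count the values still missing in each column after cleaning.
missing_per_column = clean.isna().sum()
print(clean)
print(missing_per_column)
```

Using `errors="coerce"` turns unparseable values into NaN instead of raising, so the remaining gaps can be counted and handled explicitly.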

dedupe.io

Dedupe.io is a software-as-a-service platform for quickly and accurately identifying clusters of similar records across one or more files or databases.
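To make "clusters of similar records" concrete, here is a deliberately simple sketch that groups near-identical strings using Python's standard `difflib` module. It is a stand-in for illustration only and says nothing about how Dedupe.io works internally; the `similar` and `cluster` helpers, the 0.85 threshold, and the sample names are assumptions.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Case-insensitive string similarity check (the threshold is an arbitrary assumption)."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


def cluster(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedily assign each record to the first cluster whose first member it resembles."""
    clusters: list[list[str]] = []
    for record in records:
        for group in clusters:
            if similar(record, group[0], threshold):
                group.append(record)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([record])
    return clusters


# Made-up records containing two pairs of near-duplicates.
names = ["Acme Corp.", "ACME Corp", "Globex Inc", "Globex Inc."]
print(cluster(names))
```

A greedy pass like this compares each record only with the first member of each cluster, which keeps the sketch short but makes the result depend on input order.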