A curated guide to the best tools, resources and technologies for data visualization

Data Scraping

Convextra

A browser plugin that scrapes data from websites.

Scrapy

An open source and collaborative framework for extracting the data you need from websites.

Artoo.js

A powerful script that can be run from your browser’s bookmark bar to scrape a website and return the data in JSON format.

Beautiful Soup

Beautiful Soup is a Python library designed for quick turnaround projects like screen-scraping.
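
The kind of quick screen-scraping task described here can be sketched with only Python's standard library. The example below is a minimal illustration, not Beautiful Soup's own API: it uses `html.parser` (one of the parsers Beautiful Soup can sit on top of) to pull the rows out of an HTML table. The sample markup, the `TableScraper` class and the `scrape_table` helper are all hypothetical names made up for this sketch.

```python
from html.parser import HTMLParser

# Hypothetical sample page, invented for this sketch (not from a real site).
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>apple</td><td>1.20</td></tr>
  <tr><td>pear</td><td>0.95</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return the table in `html` as a list of rows (lists of strings)."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
# Turn each data row into a dict keyed by the header cells.
data = [dict(zip(header, record)) for record in records]
print(data)
```

A dedicated library such as Beautiful Soup replaces the hand-written state tracking above with search methods over a parsed tree, which is what makes it suited to quick-turnaround work.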

Zanran

Uses computer vision to identify graphs and tables for its data search engine, and finds and extracts content from PDF documents.

Web Scraper

A Chrome extension for converting elements on a webpage into manipulatable data. Web Scraper requires no coding, just an understanding of HTML sitemaps.

QuickCode

Formerly known as ScraperWiki, QuickCode is a Python and R data analysis environment, ideal for economists, statisticians and data managers who are new to coding.

PDF Tables

Accurately and quickly convert PDF tables to Excel.

Tabula

If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is: you can’t easily copy and paste rows of data out of PDF files. Tabula lets you extract that data in CSV format through a simple interface, and you can download it and run it on your own computer. Tabula was developed by ProPublica, La Nacion DATA and Knight-Mozilla OpenNews.

ParseHub

You can extract data from anywhere. ParseHub works with single-page apps, multi-page apps and just about any other modern web technology. ParseHub can handle JavaScript, AJAX, cookies, sessions and redirects. You can easily fill in forms, loop through dropdowns, log in to websites, click on interactive maps and even deal with infinite scrolling.

OutWit

OutWit Hub explores the depths of the Web for you, automatically collecting and organizing data and media from online sources.

TAGS

Build your own dataset or API, without writing code.

ABBYY

PDF Transformer+ offers everything you need for your daily work with PDF files. Whether you wish to edit or comment, add password protection, share PDFs with colleagues, create, convert, or simply read PDFs, PDF Transformer+ lets you handle it all with ease. This versatile PDF software combines an intuitive interface and collaboration tools with ABBYY’s Optical Character Recognition (OCR) technology and Adobe® PDF Library technology, so that you can easily work with any type of PDF.

Able2Extract

“Need image (scanned) PDF conversion to Excel, Word, and PowerPoint? Able2Extract Professional combines leading edge technology with our proprietary PDF conversion algorithm to deliver high quality conversions every time. This is great for people working with paper documents and wanting to access them electronically.”

Import.io

Easily extract structured data from almost any website simply by copying and pasting its URL, then accessing the data either as a .CSV download for spreadsheet applications or via API.
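
Once a tool like this has produced a .CSV download, the data can be loaded with a few lines of Python. The sketch below is only an illustration of that last step: the column names and values in `EXPORT` are hypothetical, and `load_rows` is a helper invented here, not part of any tool's API.

```python
import csv
import io

# Hypothetical contents of a CSV export; real exports will have
# whatever columns the scraped page provided.
EXPORT = "title,price\nWidget,9.99\nGadget,24.50\n"


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(EXPORT)
# Values arrive as strings, so convert before doing arithmetic.
total = sum(float(row["price"]) for row in rows)
print(len(rows), "rows, total price", round(total, 2))
```

For a file on disk, the same `csv.DictReader` call works on a handle opened with `open(path, newline="")`.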
