Critical Data Overload: Introduction
Plan for the day
This morning:
- introduction (me, you, this workshop)
- thinking about data, practically and critically
- examples of work
This afternoon:
- data collection techniques: scraping and requesting
- scraping in HTML: the craigslist example
Introduction
Pierre Depaz ( pierredepaz.net )
political science, game design, comparative literature
I'm curious about how code affects our ways of thinking and acting.
You? Name, experience with programming, why did you come to this workshop?
What
This workshop focuses on critically thinking about data through direct engagement.
Collecting datasets to better understand what data is, where it comes from, and what it can mean.
This means understanding and working with data in a very practical way: through the systems we often interface with.
Why
To provide concrete answers to some questions:
- what is data? when does it become information? what is this information about?
- how is data distributed? how is it taken apart and recombined? what do technical systems reveal about data providers?
- how do we re-present data in order to change its interpretation?
(also as a starting point for Helin's class next semester)
How
Until Monday, we will collaboratively build new corpora of online data.
- Identifying ways to think about data as a kind of corpus (research, conceptual thinking)
- Using the technical tools to acquire this data, and render it usable (programming, JS, Python)
- Reflecting on how to interface this data (sketching)
In other words: we create our own datasets, in groups.
Thinking about data
What is data?
Data is a pattern of symbols which can be made meaningful.
What is metadata?
Data about data: sketching things in the negative.
Data is the new oil?
Airbnb and Facebook started by scraping
Data as the by-product of systems: data vs. capta.
Data comes from the Latin for "given", but it's not always given; rather, it's "taken", captured: capta.
What is the difference between data and information?
The mathematical theory of communication and the disappearance of meaning.
So we need to represent in order to interpret.
What agency do we have when faced with a deluge of data? How do we make sense of it?
Distance, or pivoting, is one way of doing it. Art is one way of providing a different perspective.
Data is the raw material of digital technologies. It only becomes information through representation, which facilitates interpretation.
Examples
Politics:
- Sam Lavigne's Get Well Soon, C-Span 5, Occupied B&B (and the accompanying piece)
- Winnie Soon's Unerasable Characters
- Miao Ying's Blind Spot
- Jack Sweeney's ElonJet
- Lawrence Abu Hamdan's Air Pressure
- Emma Sheffer's Insta Repeat
- Luke DuBois's A More Perfect Union and Billboard
- Mark Hansen and Ben Rubin's Moveable Type
- Josh Begley's Prison Map and Drone Stream and Condolences
Visual Culture:
- Lev Manovich and his team's Selfie City
- Jason Salavon's 100 Special Moments
- Nicolas Malevé's 12 Hours of ImageNet
- Aaron Swartz and Taryn Simon's Image Atlas
Multimedia:
- Truth and Quantity
- Riley Walz's IMG_0001
- Pierre Depaz's Real Time Sound Cloud
- Hatnote's Listen to Wikipedia
- Kyle McDonald's Spotify Serendipity
- Disnovation.org's Pirate Cinema
Misc:
- Mimi Onuoha's Missing Datasets
- Flickr's Data Lifeboat
Critical data exploration is shaping a material to represent something about a system (critically, poetically, or both).
MAHLZEIT (lunch break)
Gathering data
Scraping is the process of taking apart a webpage.
If you see it on your browser, it's already in your computer.
Interfacing is the process of asking for data from a specialized service (an Application Programming Interface).
Both can be automated!
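A minimal sketch of both approaches in Python, assuming the requests library is installed; the URLs are placeholders, not real endpoints:

    import requests

    # Scraping: ask for the page itself and receive raw HTML, just like a browser does.
    page = requests.get("https://example.org/listings")
    print(page.text[:500])    # the markup is already on your machine

    # Interfacing: ask a dedicated API endpoint and receive structured data (usually JSON).
    api = requests.get("https://example.org/api/listings", params={"page": 1})
    print(api.json())         # already machine-readable, no markup to parse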
Working with web data
Data markups
In HTML, data is wrapped in tags that describe it, e.g. <p class="title">some text</p>
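A minimal sketch of turning that markup back into data, assuming BeautifulSoup is installed (pip install beautifulsoup4); the snippet is the toy example above, not a real page:

    from bs4 import BeautifulSoup

    snippet = '<p class="title">some text</p>'
    soup = BeautifulSoup(snippet, "html.parser")

    element = soup.find("p", class_="title")   # locate the tag by name and class
    print(element.get_text())                  # -> some text (the data, without the markup)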
Anatomy of a webpage, a.k.a. Developer Tools are your best friends!
(please don't use Safari)
- menu > more tools > developer tools > elements
Craigslist example
First things first, installing tools:
- Python
- NodeJS
Then, the scraping workflow in three steps:
- Analysis
- Testing
- Navigating
Analysis
- what does the site architecture look like?
- how do i get from one page to another?
- what does the markup look like?
- are there any class or id attributes we can use?
- is there any pattern we can build on? (see the sketch below)
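A minimal sketch for this analysis step, assuming requests and BeautifulSoup; the URL is an example search page, and craigslist's markup and class names change over time, so treat every selector here as an assumption to verify in Developer Tools:

    import requests
    from collections import Counter
    from bs4 import BeautifulSoup

    url = "https://berlin.craigslist.org/search/apa"   # example search page (assumption)
    soup = BeautifulSoup(requests.get(url).text, "html.parser")

    # Count every class attribute on the page: recurring classes usually mark recurring data.
    classes = Counter(
        cls
        for tag in soup.find_all(True)      # every element in the page
        for cls in tag.get("class", [])     # its classes, if any
    )
    for cls, count in classes.most_common(15):
        print(count, cls)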
Testing
- write a small page scraper that goes directly to a specific page
- log the elements as we find them
- save them to a file (see the sketch below)
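A minimal sketch of this testing step, again with requests and BeautifulSoup; the URL and the CSS selectors are assumptions based on craigslist's static search results and may need adjusting to what Developer Tools shows you:

    import json
    import requests
    from bs4 import BeautifulSoup

    url = "https://berlin.craigslist.org/search/apa"
    soup = BeautifulSoup(requests.get(url).text, "html.parser")

    results = []
    for item in soup.select("li.cl-static-search-result"):   # one element per listing (assumed selector)
        title = item.select_one(".title")
        price = item.select_one(".price")
        entry = {
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        }
        print(entry)                                          # log the elements as we find them
        results.append(entry)

    with open("listings.json", "w", encoding="utf-8") as f:   # save them to a file
        json.dump(results, f, ensure_ascii=False, indent=2)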
Navigating
- write a small piece of code that visits every page
- connect it to our page scraper (see the sketch below)
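A minimal sketch of the navigating step, connecting a crawler to the page scraper above; scrape_page() stands in for the previous sketch, and the "next page" link selector is an assumption to check in Developer Tools:

    import time
    import requests
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup

    def scrape_page(soup):
        # stand-in for the page scraper sketched in the testing step
        return [item.get_text(strip=True) for item in soup.select("li.cl-static-search-result")]

    url = "https://berlin.craigslist.org/search/apa"
    all_results = []

    while url:
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        all_results.extend(scrape_page(soup))          # connect the crawler to our page scraper
        next_link = soup.select_one("a.next")          # link to the next results page (assumed selector)
        url = urljoin(url, next_link["href"]) if next_link else None
        time.sleep(1)                                  # be polite: pause between requests

    print(len(all_results), "items collected")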
Exercise
find a news website and extract all the headlines you can find with one or more particular words (e.g. "says", "trial").