Critical Data Overload: Restitutions

Plan for the day

Cleaning and formatting data

Modes of representation

Work session

Presentations



Plan for the day


This morning:


This afternoon:


Cleaning and formatting data


Cleaning and formatting data are essential for its long-term sustainability.


Diagram representing the scraping lifecycle
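As a rough illustration of what a cleaning pass can look like, here is a minimal sketch that assumes scraped rows with inconsistent spacing and casing, a duplicate entry and a missing year. The rows, the field names and the helpers `clean_row` and `clean` are hypothetical, made up for this example.

```python
# Hypothetical scraped rows: inconsistent spacing and casing,
# one duplicate entry and one missing year
raw_rows = [
    {'name': '  Ikiru ', 'location': 'JP', 'date': '1952'},
    {'name': 'ikiru', 'location': 'jp ', 'date': '1952'},
    {'name': 'Parasite', 'location': 'KR', 'date': '2019'},
    {'name': 'Mauvais Sang', 'location': 'fr', 'date': ''},
]


def clean_row(row):
    """Normalize one row: trim whitespace, lowercase text, parse the year."""
    date = row['date'].strip()
    return {
        'name': row['name'].strip().lower(),
        'location': row['location'].strip().lower(),
        # A missing year becomes None instead of an empty string
        'date': int(date) if date else None,
    }


def clean(rows):
    """Clean every row and drop duplicates, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for row in rows:
        row = clean_row(row)
        key = (row['name'], row['location'], row['date'])
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


cleaned = clean(raw_rows)
print(cleaned)
```

Normalizing before deduplicating matters here: the two "ikiru" rows only become identical once whitespace and case have been fixed.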


Cleaning involves:


Formatting involves:


CSV (comma-separated values) is a plain-text table format that spreadsheet software such as Excel can open and export: very popular, not very expressive

name,location,date
"paris, texas",us,1984
ikiru,jp,1952
mauvais sang,fr,1986
parasite,kr,2019

import csv

# Each dictionary is one row; its keys match the CSV column names
films = [
    {
        'name': 'paris, texas',
        'location': 'us',
        'date': 1984
    },
    {
        'name': 'ikiru',
        'location': 'jp',
        'date': 1952
    },
    {
        'name': 'mauvais sang',
        'location': 'fr',
        'date': 1986
    },
    {
        'name': 'parasite',
        'location': 'kr',
        'date': 2019
    },
]

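# newline='' lets the csv module handle line endings itself
with open('data.csv', 'w', newline='') as csvfile:
    fieldnames = ['name', 'location', 'date']
    # DictWriter writes to the file object, not to the list of films
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(films)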
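To see both what the csv module does with the comma in "paris, texas" and why CSV is "not very expressive", the sketch below writes two of the films to an in-memory buffer (`io.StringIO`, used instead of data.csv only so it runs anywhere) and reads them back with `csv.DictReader`.

```python
import csv
import io

films = [
    {'name': 'paris, texas', 'location': 'us', 'date': 1984},
    {'name': 'ikiru', 'location': 'jp', 'date': 1952},
]

# Write to an in-memory buffer instead of a file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['name', 'location', 'date'])
writer.writeheader()
writer.writerows(films)

text = buffer.getvalue()
# The comma inside "paris, texas" is protected by double quotes
print(text)

# Read it back: DictReader uses the header row as dictionary keys
rows = list(csv.DictReader(io.StringIO(text)))
# Every value comes back as text: the year is the string '1984', not an int
print(rows[0])
```

The quoting survives the round trip, but the types do not: anything that is not a flat table of strings has to be rebuilt by hand after reading.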

JSON is more expressive than CSV (it can store numbers, booleans, lists and nested objects), but a bit more complicated

import json

data = {
    #...
}

with open('data.json', 'w') as jsonfile:
    json.dump(data, jsonfile)
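A small sketch of that extra expressiveness: the films below carry a nested list, which CSV has no natural place for. The `directors` field is a hypothetical addition for illustration. `json.dumps`/`json.loads` work on strings, while `json.dump` (as above) writes to a file.

```python
import json

# The 'directors' field is a hypothetical addition: a list inside each record
films = [
    {'name': 'paris, texas', 'location': 'us', 'date': 1984,
     'directors': ['wim wenders']},
    {'name': 'ikiru', 'location': 'jp', 'date': 1952,
     'directors': ['akira kurosawa']},
]

# indent makes the output readable; ensure_ascii=False keeps accented text as-is
text = json.dumps(films, indent=2, ensure_ascii=False)
print(text)

loaded = json.loads(text)
# Types survive the round trip: date is still an int, directors still a list
print(type(loaded[0]['date']), loaded[0]['directors'])
```

Unlike the CSV round trip, the loaded data is equal to what was written, with no manual conversion.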

Modes of representation


We have data. Now what?


Representation shapes data to express something.


Transmedia to reveal patterns in an engaging way.


Zoomed out, to reveal the sheer quantity of small, repeating elements


Zoomed in, to focus on each piece of data, with the quantity as a background


Focusing on the mass or on the sample?

Giving a lot of context, a little context, or a whole new context?

Is it more of a document, or a performance?

Is it a restitution, or an interpretation?


Work session



Presentations

Questions: