Exercise

First Steps in Visualization

1. Today we will start by walking through a Jupyter Notebook that introduces a bit of Python and shows how to start working with data.

Binder

Completed Notebook

2. Next, we will explore data related to college mobility. We will first describe the distributions of access, success, and mobility rates across institutions. We define these terms as they are defined in the paper and described in lecture:

  • access: the percentage of enrolled students who are ‘low income’, that is, whose parents' income is in the bottom quintile (bottom 20%) of the parental income distribution. Note: values range from 0 to 100.

  • success: the percentage of low income students with post-graduation incomes in the top quintile (top 20%) of the student income distribution, measured at age 32-34.

  • mobility: the percentage of enrolled students who are both ‘low income’ and later have earnings in the top quintile (top 20%) of the student income distribution.

Recall that mobility = access × success (when all three rates are expressed as percentages, divide the product by 100). Hence, institutions with high mobility will tend to enroll more low income students and to have high 'success' rates with those students.
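As a small illustration of the relationship above, the sketch below computes mobility from access and success for a few institutions and summarizes the result. The institution names and numbers are invented for illustration and are not taken from the course data files. The helper `mobility_rate` is a hypothetical name, and it assumes all three rates are percentages on a 0 to 100 scale, so the product is divided by 100.

```python
from statistics import mean, quantiles

# Hypothetical institutions; every value is a made-up percentage (0-100).
colleges = [
    {"name": "College A", "access": 10.0, "success": 40.0},
    {"name": "College B", "access": 4.0, "success": 60.0},
    {"name": "College C", "access": 25.0, "success": 12.0},
    {"name": "College D", "access": 16.0, "success": 25.0},
]


def mobility_rate(access_pct, success_pct):
    """Return mobility as a percentage, given access and success as percentages."""
    # Both inputs are on a 0-100 scale, so the product must be rescaled by 100.
    return access_pct * success_pct / 100.0


# Add a mobility value to each institution.
for college in colleges:
    college["mobility"] = mobility_rate(college["access"], college["success"])

mobility_values = [c["mobility"] for c in colleges]

# max() returns the first institution with the largest mobility if there is a tie.
top = max(colleges, key=lambda c: c["mobility"])

print("mean mobility (%):", mean(mobility_values))
print("quartile cut points (%):", quantiles(mobility_values, n=4))
print("highest mobility:", top["name"])
```

Note that College B has the highest success rate and College C the highest access, yet neither has the highest mobility: the product rewards institutions that do reasonably well on both.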

Binder

Data files:

Notebook file:

3. Learn about other visualization methods and tools.

  • matplotlib - matplotlib is the original data visualization library for Python. Its functions closely match the plotting functions in MATLAB.
    Many other libraries work with matplotlib, e.g., pandas and seaborn integrate with matplotlib and give access to its methods.

  • seaborn - seaborn is built on top of matplotlib, but is designed to give its plots a more modern look.

  • Plotnine - Plotnine is a Python implementation of ggplot2, a plotting package for R. Just as MATLAB users may initially gravitate to matplotlib in Python, R users may want to use Plotnine for visualization.

  • Bokeh - Bokeh is also based on the ideas of the Grammar of Graphics (like ggplot2 and Plotnine), but it was created from the ground up in Python.
    Bokeh provides the ability to create interactive, web-ready plots.

  • Plotly - Plotly is well-known as an online platform for data visualization, but it can also be used in a Python notebook. Like Bokeh, Plotly can make interactive plots.

  • Gleam - Gleam is a library that takes its ideas from R's Shiny package. Gleam allows the creation of interactive web apps.

  • Altair - Altair is a declarative visualization library (similar to seaborn), but it is not built on matplotlib; it is built on Vega-Lite instead.

  • Folium - Folium combines Python with the Leaflet.js mapping library to visualize geospatial data.
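
To give a feel for the first library in the list, here is a minimal matplotlib sketch that draws a histogram of access rates. It assumes matplotlib is installed. The values in `access` are made up for illustration and are not from the course data files. In a Jupyter Notebook the figure should render when the cell finishes; in a script you would call `plt.show()` or `fig.savefig(...)` yourself.

```python
import matplotlib.pyplot as plt

# Hypothetical access rates (percent of students from the bottom income quintile).
access = [3.5, 4.1, 5.0, 6.2, 7.8, 8.4, 9.9, 12.3, 15.0, 21.7]

# Create one figure with a single set of axes.
fig, ax = plt.subplots(figsize=(6, 4))

# hist() returns the count in each bin, the bin edges, and the drawn bars.
counts, bin_edges, _ = ax.hist(access, bins=5, edgecolor="black")

# Label the axes so the units are unambiguous.
ax.set_xlabel("Access (% of students from the bottom income quintile)")
ax.set_ylabel("Number of institutions")
ax.set_title("Distribution of access across institutions (made-up data)")
```

The same data could be plotted with any of the other libraries listed above; matplotlib is used here only because several of the others build on it.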