Focusing a data visualization class on C-19

05-04-2020 05:38:02 PM

With one of our doctoral students, Cassie Collier, I co-taught a DataViz course starting in January 2020. Cassie taught the technical bits using TIBCO Spotfire (but you might elect to use Tableau, Microsoft’s Power BI, or something else), while my half of each lecture focused on the choice of chart type, storytelling with visuals, how humans process images, layout, labeling, identifying the big idea, color (Cassie did that one), fonts, chart junk, misleading visuals, unethical uses, the perils of pie and 3D charts, and so on. Each three-hour class session was broken into a lecture or visiting speaker and a Spotfire lab. Much of my half of the class was spent looking at and discussing examples of dataviz and the stories that accompanied them, some brought in by Cassie and me, and some via mini assignments to the students.

In the second class, students were required to identify an interesting visualization in the news. In the first few days of February, one of the students drew our attention to the Johns Hopkins Covid-19 Dashboard. Each week thereafter we revisited the site as well as other C-19 visuals that were increasingly appearing in the news, in particular in The New York Times, The Economist, The Washington Post, The Wall Street Journal, and a site here in Houston. Here are some examples from Vox that are not behind a paywall:

Charting the coronavirus pandemic state by state 
11 charts that explain the coronavirus pandemic
Why the Covid-19 economy is devastating to millennials, in 14 charts

or Randy Krum's fantastic C-19 data pack on Information is Beautiful

We discussed where the data was coming from, the challenges in gathering it, the strengths and weaknesses of the designs, the potential biases of the sources, and the questions being asked of the data. Unexpectedly, the class suddenly had a very relevant new major theme, or at least a context, and one that became even more relevant when, halfway through the course, we all were forced to leave campus and go online.

The students had already been given a final project assignment and by week three or four they had selected the context for their projects. Required was a presentation on the last day of class (via Zoom) and a written article, both related to some problem of interest to them or, if they could find one, a client. They also were asked to provide a brief description of the process they had gone through. Unfortunately, it was too late for me to push them towards C-19 as I had already given them the freedom to pursue something that interested them personally. Too bad, because it would have been really fun to give them all the same context and see what they came up with. Plus, by then, C-19 interested, for better or worse, all of us - and all of you!

Two students did use C-19 as their context and one, an undergraduate honors student, did a good job looking at the relationship between the virus and the economic consequences.  I have reproduced her two reports at the end of the posting. First, though, let's look at the assignment.

The Assignment

Data Visualization Final Project

The final project is intended as an opportunity to demonstrate what you have learned in class while exploring a data intensive topic of interest to you and, if possible, to a client. While you need not identify a client, you will get more out of the exercise if you do so. The client might be someone you have worked, or currently work, for, a mentor, a friend or a relative. Alternatively, your context might be a social or political issue that you wish to learn more about and to influence. Or, perhaps your investigation might be based on public data you have found that might make a newsworthy story.

This will require that you:

1. Identify the problem, issue or newsworthy story that interests you (this may be driven by the data you have access to)
2. Find the data required to address the problem
3. As necessary, assemble the data in machine readable tables
4. As necessary, clean, pivot, and/or join the tables
5. Begin to formulate your story
6. Select visuals appropriate for your story (a minimum of five).
7. Create the visuals using Spotfire.
8. Complete the story as both a presentation and as an article/report/memo.
9. Also, write up the process you went through to assemble, clean and organize your data and to select your visuals.

Deliverables:

  1. In class 6 (February 18th), come to class prepared to provide orally a brief (2-3 minute) description of the problem you intend to address, the sources of the data, and any problems you foresee (just you talking - no visual aids). Also, by noon on that day, post the problem description on your Padlet page.

  2. For class 9 (March 17th), make an entry on your Padlet page describing the data you will require for your final project. Where are you getting it from, what challenges are you facing in preparing it, how much data is it? How many files? (Due by noon on day of class).

  3. For class 13 (April 14th), be prepared to give a rough run through of your project. It need not be completed, but the final deliverable will benefit from class feedback.

  4. Prepare a “slide” presentation of no more than 10 slides that tells your story - including your five Spotfire-produced* visuals (which you are encouraged to enhance with other tools).

  5. On the last day of class (April 21st), deliver a presentation to the class of that story using those slides.

  6. Prepare and turn in, in addition to the slides, a written “article” presenting that same story, as if for publication.

  7. Also, prepare and turn in a 500-1,000 word description of the process you followed with particular attention to the source of your data, how you obtained and cleaned it, any challenges you had to overcome along the way, and the imagined audience (no, not me) that you have prepared and tailored your presentation for, as well as any design choices you made given your intended audience. Make it clear what your objective is: are you informing or are you selling?

    *If you choose to use a visualization that is more complex - say, something interactive or an infographic - you may depict it by sketching and verbally describing it rather than creating it in Spotfire (not more than one of your visualizations may be handled in this manner).


Here is how Jacqueline Pillar, a University of Houston honors student, attacked the assignment.

Process Description

For my project, I used government sources for all the data except the COVID-19 numbers. The data compiled in the New York Times’ GitHub seemed more complete as it was updated every day, versus the CDC numbers which could only be updated for states that supplied the information to them.

I started with the COVID-19 data, which consisted of states, dates, the number of cases, the number of deaths, and the state codes. I removed the state codes column, because I considered it extraneous information. I loaded the data into Spotfire and displayed it on a world map. I showed the number of cases and deaths by state over time. I wanted to explore this data further, so I created a calculated column that showed fatality rate by state (Deaths divided by Cases). I used a tree map to size states by the number of cases and colored it by the death rate. I wanted to see if this correlated to the number of unemployment claims by state. I downloaded the unemployment claims information from the US Department of Labor, and it seemed that the unemployment claims were in fact higher in states that had more cases. However, I wanted to make sure this was causation, and not just correlation, so I used the 2019 State Population Estimates from the US Census Bureau and looked at unemployment claims per 1,000 people of the state population. This gave me a completely different view, where Hawaii, Michigan, and Pennsylvania were now leading state unemployment claim numbers. I wanted to investigate why this was the case and added Texas just for the sake of curiosity. I researched these states’ population demographics, their responses to the pandemic, and the main industries that make up their economies. Then, I drew my conclusions from the information gathered.

I had the most issues with finding the data for my project. I wanted to make sure the data I presented was accurate, but also complete. The New York Times seemed to have the most complete, up-to-date information on COVID-19 case and death numbers. The US Labor Department had the unemployment claims filed by week. Unfortunately, they only have the data for who has filed, so not the people that have lost their jobs but may not have filed yet. It also does not differentiate between a layoff due to COVID-19 versus a seasonal worker out of work or an employee that was fired. I used the population data from the US Census Bureau, which is currently only estimates. I believe it is as close to exact I could get considering the census only happens once every ten years.

The intended audience of my presentation was the American public. The issues currently present on everyone’s mind are COVID-19 and the economy. I wanted to create a presentation that gives a concise analysis of what is going on, without any political or other bias present – just the facts. I used a consistent blue-colored theme as a salute to the healthcare workers working to help people during this pandemic.
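For readers who do not use Spotfire, the two derived measures described in the process write-up above, fatality rate (deaths divided by cases) and unemployment claims per 1,000 residents, can be sketched in a few lines of Python. This is only an illustration and not what Jackie actually ran: the function names, field names, and the two placeholder states are all assumptions, and the numbers are made up purely to show how the ranking can flip once population is taken into account.

```python
# Illustrative sketch only: the rows below are made-up placeholder values,
# not figures from the New York Times, Department of Labor, or Census data.

def fatality_rate(cases, deaths):
    """Deaths divided by cases; 0.0 when no cases have been reported."""
    return deaths / cases if cases else 0.0


def claims_per_1000(claims, population):
    """Unemployment claims per 1,000 residents."""
    return claims * 1000 / population


def summarize(covid_rows, claims_by_state, population_by_state):
    """Join the three sources on state name and add the derived columns."""
    summary = []
    for row in covid_rows:
        state = row["state"]
        summary.append({
            "state": state,
            "cases": row["cases"],
            "fatality_rate": fatality_rate(row["cases"], row["deaths"]),
            "claims": claims_by_state[state],
            "claims_per_1000": claims_per_1000(
                claims_by_state[state], population_by_state[state]
            ),
        })
    return summary


# Hypothetical inputs standing in for the three downloaded tables.
covid_rows = [
    {"state": "State A", "cases": 1000, "deaths": 50},
    {"state": "State B", "cases": 200, "deaths": 4},
]
claims_by_state = {"State A": 30_000, "State B": 12_000}
population_by_state = {"State A": 10_000_000, "State B": 1_000_000}

summary = summarize(covid_rows, claims_by_state, population_by_state)

# Ranking by raw claims and by claims per 1,000 residents can disagree.
by_raw = [r["state"] for r in sorted(summary, key=lambda r: r["claims"], reverse=True)]
by_rate = [r["state"] for r in sorted(summary, key=lambda r: r["claims_per_1000"], reverse=True)]
print(by_raw)
print(by_rate)
```

With these placeholder values the larger state leads on raw claims while the smaller one leads per 1,000 residents, which is the same kind of shift in view the write-up describes after normalizing by population.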

Jackie's Article 

COVID-19 & Unemployment

An Analysis by Jacqueline Pillar

I was looking at COVID-19 numbers when I heard a news report talking about the devastation done to the stock market. Then, the layoffs started rolling in. My hypothesis was that states with the most cases would have the most unemployment claims. I created an interactive map in Spotfire that showed the number of cases by size and the number of deaths by color, with a date filter that started at January 21, 2020, which is when the first case of coronavirus was reported in the United States.

Diving deeper, I made the visual below to look more into deaths by state. As New York is the state with the most cases, it made sense that they have the most deaths. However, Michigan has only an eighth of the cases that New York has but has the highest COVID-19 fatality rate in the nation. This is because over one-third of the Michigan state population lives below the poverty line, and many of those do not have health insurance or access to clean water. Also, much of the population suffers from pre-existing conditions such as diabetes, hypertension, and high blood pressure (and Michigan’s average is higher than the national average) which puts them at a higher risk of dying from COVID-19. 

Deaths Per State

I thought that states with the most cases and/or deaths would have more unemployment cases. Though the correlation is there, I do not believe it is causation. As you can see in the visual below, California, New York, and Texas are some of the states that lead the nation in unemployment claims. This is more so due to their also being the most populated states in the country.

Bar Chart



The chart below accounts for state population and shows unemployment claims by state per 1,000 people. The US Labor Department reports unemployment by the week, so I started the chart at the week ended January 25th to include January 21st.

Line chart

At the week ended March 14, there was a steep increase in the number of unemployment claims. This is due to President Trump officially declaring a national emergency on March 13. I did want to look more into why Hawaii and Pennsylvania had the most unemployment claims and included Texas just for curiosity’s sake.

Hawaii had the highest unemployment claims per 1,000 people. This is because tourism makes up almost a quarter of their state economy. When China locked down their country, it impacted Hawaii, because, though Chinese tourists account for a small percentage of the state’s annual visitors, they spend more money per person than visitors from any other region of the world including the United States. 

The steep incline of unemployment claims for Pennsylvania was due to their early response to the pandemic. The governor closed all non-essential businesses on March 19, less than a week after President Trump declared a national emergency. Doctors and healthcare experts also believe this is how they managed to keep their number of cases, and, consequently, their death rate, lower than average.

I wanted to include Texas to specifically touch on the fact that yesterday, oil prices were around negative $37. As in there is such a surplus that oil producers were paying people to take it off their hands. This is the first time in history that oil prices have ever been negative. Texas will feel this being that it is the energy capital. Oil and gas make up about 40% of the state economy, and companies have already started laying off. Weatherford, for example, let go of 6,000 employees last week (25% of their workforce) – all on the same day.

Column chart

 

COVID-19 does physically affect older people at a higher rate, but younger age groups feel the financial effects. The service sector has had to lay off a lot of people due to many businesses being considered non-essential, such as salons and clothing stores. About 40% of the service sector consists of people aged 16 to 24. As shown in the visual above, they are the most affected. 

Job loss has increased astronomically over the past few weeks, but I wanted to use a benchmark so we could understand the enormity of it. During the 2008 Recession, from its peak to the end, 8.7 million jobs were lost. During just the 4 weeks ended April 11th, 22 million unemployment claims were filed. Also, as job loss is defined as someone that was currently working for their employer when they were laid off, the 22 million does not take into consideration the thousands, possibly tens of thousands, that had accepted offers for internships and full-time jobs this summer but were ultimately cancelled or rescinded. 

Economists at the St. Louis Federal Reserve project the coronavirus economic freeze could cost 47 million jobs and jump unemployment past 32%. To better predict the damage that could be caused by the pandemic, we would need to know when it will be over. It took about 5-6 years to replace the 8.7 million jobs lost during the 2008 Recession. We are currently at three times that amount. How long will it take to get the US economy back to its pre-COVID-19 standing? I do not have an answer for that, but I am interested to see how we will try to restart the economy when all of this is over.


The 10 PowerPoint slides are not shown, but here is an example:

 Example PPT


Blake's Observations

- Too many slides taken directly from Spotfire. Generally, the labels are far too small.
+ Consistent use of color, font and layout throughout the slides
+ A fresh angle on the analysis
- What's the big idea on each visualization - not just what you are showing but why!
+ Story and visuals nicely tied together

Comments

05-06-2020 09:56:56 AM

Blake -- thanks for posting this. Your "on-the-fly" stuff is still so good - and for most of us, a great idea for adaptation going forward. Stay safe!
