Data Visualization Final Project
The final project is an opportunity to demonstrate what you have learned in class while exploring a data-intensive topic of interest to you and, if possible, to a client. While you need not identify a client, you will get more out of the exercise if you do so. The client might be someone you have worked for or currently work for, a mentor, a friend, or a relative. Alternatively, your context might be a social or political issue that you wish to learn more about and to influence. Or your investigation might be based on public data you have found that could make a newsworthy story.
This will require that you:
1. Identify the problem, issue, or newsworthy story that interests you (this may be driven by the data you have access to)
2. Find the data required to address the problem
3. As necessary, assemble the data in machine-readable tables
4. As necessary, clean, pivot, and/or join the tables
5. Begin to formulate your story
6. Select visuals appropriate for your story (a minimum of five).
7. Create the visuals using Spotfire.
8. Complete the story as both a presentation and as an article/report/memo.
9. Also, write up the process you went through to assemble, clean and organize your data and to select your visuals.
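Steps 3 and 4 above can also be prototyped outside Spotfire. The following is a minimal sketch in Python with pandas, not a required part of the assignment. The table layouts, column names, and every number in it are made up for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per state, one column per week (made-up values).
claims_wide = pd.DataFrame({
    "state": [" Texas", "Hawaii "],
    "2020-03-14": [16000, 1500],
    "2020-03-21": [155000, 8800],
})

# Hypothetical lookup table keyed by state (made-up values).
population = pd.DataFrame({
    "state": ["Texas", "Hawaii"],
    "population": [29000000, 1400000],
})

def tidy_and_join(wide, pop):
    """Clean the join key, unpivot the weekly columns, and attach population."""
    # Clean: strip stray whitespace so the state names match across tables.
    cleaned = wide.assign(state=wide["state"].str.strip())
    # Pivot (wide to long): one row per state and week.
    long = cleaned.melt(id_vars="state", var_name="week", value_name="claims")
    long["week"] = pd.to_datetime(long["week"])
    # Join: a left join keeps every claims row; validate guards against
    # duplicate states in the population table.
    return long.merge(pop, on="state", how="left", validate="many_to_one")

tidy = tidy_and_join(claims_wide, population)
print(tidy)
```

A long table with one row per state and week is usually the easier shape to filter by date and to join against other state-level tables.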
Here is how Jacqueline Pillar, a University of Houston honors student, attacked the assignment.
Process Description
In class 6 (February 18th), come prepared to give a brief (2-3 minute) oral description of the problem you intend to address, the sources of the data, and any problems you foresee (just you talking, no visual aids). Also, by noon on that day, post the problem description on your Padlet page.
For class 9 (March 17th), make an entry on your Padlet page describing the data you will require for your final project. Where are you getting it from? What challenges are you facing in preparing it? How much data is it? How many files? (Due by noon on the day of class.)
For class 13 (April 14th), be prepared to give a rough run through of your project. It need not be completed, but the final deliverable will benefit from class feedback.
Prepare a “slide” presentation of no more than 10 slides that tells your story, including your five Spotfire-produced* visuals (which you are encouraged to enhance with other tools).
On the last day of class (April 21st), deliver a presentation of that story to the class using those slides.
Prepare and turn in, in addition to the slides, a written “article” presenting that same story, as if for publication.
Also, prepare and turn in a 500-1,000 word description of the process you followed with particular attention to the source of your data, how you obtained and cleaned it, any challenges you had to overcome along the way, and the imagined audience (no, not me) that you have prepared and tailored your presentation for, as well as any design choices you made given your intended audience. Make it clear what your objective is: are you informing or are you selling?
*If you choose to use a visualization that is more complex (say, something interactive or an infographic), you may depict it with a sketch and a verbal description rather than creating it in Spotfire. No more than one of your visualizations may be handled in this manner.
For my project, I used government sources for all the data except the COVID-19 numbers. The data compiled in the New York Times’ GitHub repository seemed more complete because it was updated every day, whereas the CDC numbers could only be updated for states that supplied the information to them.
I started with the COVID-19 data, which consisted of states, dates, the number of cases, the number of deaths, and the state codes. I removed the state codes column, because I considered it extraneous information. I loaded the data into Spotfire and displayed it on a map. I showed the number of cases and deaths by state over time. I wanted to explore this data further, so I created a calculated column that showed fatality rate by state (Deaths divided by Cases). I used a tree map to size states by the number of cases and colored it by the fatality rate. I wanted to see if this correlated with the number of unemployment claims by state. I downloaded the unemployment claims information from the US Department of Labor, and it seemed that the unemployment claims were in fact higher in states that had more cases. However, I wanted to make sure this pattern was not simply a reflection of state population size, so I used the 2019 State Population Estimates from the US Census Bureau and looked at unemployment claims per 1,000 people of the state population. This gave me a completely different view, where Hawaii, Michigan, and Pennsylvania were now leading state unemployment claim numbers. I wanted to investigate why this was the case and added Texas just for the sake of curiosity. I researched these states’ population demographics, their responses to the pandemic, and the main industries that make up their economies. Then, I drew my conclusions from the information gathered.
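The two derived measures described above (fatality rate and claims per 1,000 residents) were built as Spotfire calculated columns. As a rough equivalent, here is a minimal Python sketch with pandas. The column names are assumptions and the figures are made up, not the project's data.

```python
import pandas as pd

# Made-up figures for two hypothetical states, for illustration only.
states = pd.DataFrame({
    "state": ["A", "B"],
    "cases": [200000, 25000],
    "deaths": [10000, 2000],
    "claims": [1200000, 100000],
    "population": [20000000, 1000000],
})

def add_rates(df):
    """Add fatality rate and unemployment claims per 1,000 residents."""
    out = df.copy()
    # Fatality rate: deaths divided by cases.
    out["fatality_rate"] = out["deaths"] / out["cases"]
    # Claims per 1,000 people: normalize raw claims by population.
    out["claims_per_1000"] = out["claims"] / out["population"] * 1000
    return out

rates = add_rates(states)
print(rates.sort_values("claims_per_1000", ascending=False))
```

In this made-up example, state A has far more raw claims, but state B ranks first once claims are divided by population. The same kind of reordering is what the per-1,000 view produced in the project.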
I had the most issues with finding the data for my project. I wanted to make sure the data I presented was accurate, but also complete. The New York Times seemed to have the most complete, up-to-date information on COVID-19 case and death numbers. The US Labor Department had the unemployment claims filed by week. Unfortunately, they only have data on people who have filed, not on people who have lost their jobs but have not filed yet. The data also does not differentiate between a layoff due to COVID-19, a seasonal worker out of work, and an employee who was fired. I used the population data from the US Census Bureau, which is currently only an estimate. I believe it is as close to exact as I could get, considering the census only happens once every ten years.
The intended audience of my presentation was the American public. The issues currently present on everyone’s mind are COVID-19 and the economy. I wanted to create a presentation that gives a concise analysis of what is going on, without any political or other bias present – just the facts. I used a consistent blue-colored theme as a salute to the healthcare workers working to help people during this pandemic.
COVID-19 & Unemployment
An Analysis by Jacqueline Pillar
I was looking at COVID-19 numbers when I heard a news report talking about the devastation done to the stock market. Then, the layoffs started rolling in. My hypothesis was that states with the most cases would have the most unemployment claims. I created an interactive map in Spotfire that showed the number of cases by size and the number of deaths by color, with a date filter that started at January 21, 2020, which is when the first case of coronavirus was reported in the United States.
Diving deeper, I made the visual below to look more into deaths by state. As New York is the state with the most cases, it made sense that it has the most deaths. However, Michigan has only an eighth of the cases that New York has but has the highest COVID-19 fatality rate in the nation. This is because over one-third of the Michigan state population lives below the poverty line, and many of those people do not have health insurance or access to clean water. Also, much of the population suffers from pre-existing conditions such as diabetes and hypertension (and Michigan’s average is higher than the national average), which puts them at a higher risk of dying from COVID-19.
In the week ended March 14, there was a steep increase in the number of unemployment claims. This is due to President Trump officially declaring a national emergency on March 13. I wanted to look more into why Hawaii and Pennsylvania had the most unemployment claims, and I included Texas just for curiosity’s sake.
Hawaii had the highest unemployment claims per 1,000 people. This is because tourism makes up almost a quarter of their state economy. When China locked down their country, it impacted Hawaii, because, though Chinese tourists account for a small percentage of the state’s annual visitors, they spend more money per person than visitors from any other region of the world including the United States.
The steep incline of unemployment claims for Pennsylvania was due to their early response to the pandemic. The governor closed all non-essential businesses on March 19, less than a week after President Trump declared a national emergency. Doctors and healthcare experts also believe this is how they managed to keep their number of cases, and, consequently, their death rate, lower than average.
I wanted to include Texas specifically to touch on the fact that yesterday, oil prices were around negative $37. In other words, there is such a surplus that oil producers were paying people to take it off their hands. This is the first time in history that oil prices have been negative. Texas will feel this because it is the energy capital. Oil and gas make up about 40% of the state economy, and companies have already started laying off workers. Weatherford, for example, let go of 6,000 employees last week (25% of their workforce), all on the same day.
COVID-19 does physically affect older people at a higher rate, but younger age groups feel the financial effects. The service sector has had to lay off a lot of people due to many businesses being considered non-essential, such as salons and clothing stores. About 40% of the service sector consists of people aged 16 to 24. As shown in the visual above, they are the most affected.
Job loss has increased astronomically over the past few weeks, but I wanted to use a benchmark so we could understand the enormity of it. During the 2008 Recession, from its peak to the end, 8.7 million jobs were lost. During just the 4 weeks ended April 11th, 22 million unemployment claims were filed. Also, because a job loss counts only someone who was working for their employer when they were laid off, the 22 million does not take into consideration the thousands, possibly tens of thousands, who had accepted offers for internships and full-time jobs this summer that were ultimately cancelled or rescinded.
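The benchmark comparison above comes down to a single ratio. A minimal Python sketch, using only the two figures quoted in the paragraph (the variable names are illustrative):

```python
# Figures quoted in the article text above.
jobs_lost_2008_recession = 8.7e6   # jobs lost, peak to end of the 2008 Recession
claims_four_weeks_2020 = 22e6      # unemployment claims, 4 weeks ended April 11

# Claims filed are used here as a rough stand-in for jobs lost.
ratio = claims_four_weeks_2020 / jobs_lost_2008_recession
print(f"{ratio:.1f}x")  # prints "2.5x"
```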
Economists at the St. Louis Federal Reserve project the coronavirus economic freeze could cost 47 million jobs and push unemployment past 32%. To better predict the damage that could be caused by the pandemic, we would need to know when it will be over. It took about 5-6 years to replace the 8.7 million jobs lost during the 2008 Recession. We are currently at roughly two and a half times that amount. How long will it take to get the US economy back to its pre-COVID-19 standing? I do not have an answer for that, but I am interested to see how we will try to restart the economy when all of this is over.
10 PowerPoint slides not shown, but here is an example:
- Too much use of slides directly from Spotfire. Generally, the labels are far too small.
+ Consistent use of color, font, and layout throughout the slides
+ A fresh angle on the analysis
- What's the big idea of each visualization? State not just what you are showing but why!
+ Story and visuals nicely tied together