Excel: Data Cleaning and Visualization
- uzan baroto
- Apr 16, 2022
- 3 min read
Updated: Aug 22, 2022
Table of Contents:
Dataset: 1976-2008 Summer Olympic Medals
In this project, I sought out to find a reasonably dirty data to practice my data cleaning skills. In this project I practiced my problem solving skills with dirty and incomplete data. Along with my analytical and visualization skills with a data with a lot of variables.
Source: https://www.kaggle.com/code/vishnuarun22/summer-olympics-dataset-beginner-eda
Questions:
How many medals has each country won each year?
Who is the country leading in each sport?
What medals are won each year by each country?
TL:DR (Summary)
I transformed this:

into this:

Using a various formulas such as XLOOKUP, TEXT, and other. While solving issues such as missing data, and visualization issues.
But if you are curious about my workflow, lets get into the details!
Step 1: Understanding Data

This dataset had 11 Columns and 15317 rows. When understanding data, its important to take note which columns are worth keeping, and which ones we should drop. Lets run through each columns and its uses.
City: Keep, will be useful for categorization
Year: Keep, will be useful for categorization
Sport: Keep
Discipline: Optional, might be useful
Event: Optional, might be useful
Athlete: Drop, we are trying to find trends over the course of 32 Years, individual athlete names are not necessary
Gender: Keep, might be useful
Country: Keep, will be useful for categorization
Event gender: Drop, not necessary
Medal: Keep, this will be the focus of this analysis
Step 2: Data Cleaning
This data was reasonably clean. The only thing I really had to clean up was the "year" column because keeping it in general format will limit the visualization possibilities since excel won't recognize it as a "date". So I created a second table, detailing the exact date of the event. Then, I used XLOOKUP to fill in the "date" column in reference to the new table.
Issue #1:

In the data cleaning stage I across my first issue, keeping the geographical data sorted by countries causes the visualization to be extremely disproportionate because of the number of variables.
As shown in the picture above.
Solution:

Create a new column to group the countries by their continent. To do this I gathered data on what countries are in each continent. Then, I used XLOOKUP to sort the big data in reference to the new table.
This resulted in a much more manageable chart that shows how many medals each continent had won, and if you hover over the country colors, you can also tell how many medals each country had won. As shown below:

Step 3: Visualization
The last step was visualizing the data in a digestible way to answer the aforementioned 3 questions. I think we can skip the pivot table creation because I believe a messy pivot table can produce the best charts.
Here's the overview of the dashboard I made:

Lets highlight each section of the dashboard separately:
Slicers:

Slicers are a great way to navigate the dashboard in order to show the data you want to see.
There are 3 slicers in this dashboard, Sports, Year, and Country.
Total Medals Won Chart

This is the chart showing how many total medals a country had won.
This chart can be set to show data based on: Country, Sport, or Year.
Sport Leader Chart

This is the chart showing how many medals each country had won in a certain sport.
This chart can be set to show data based on: Sport and Year.
Medal Type Chart

This chart shows the types of medals a certain country hadwon in a particular sport.
This chart can be set to show data based on: Sport, Year, and Country.
Comments