top of page

Excel: Data Cleaning and Visualization

  • Writer: uzan baroto
    uzan baroto
  • Apr 16, 2022
  • 3 min read

Updated: Aug 22, 2022

Table of Contents:


Dataset: 1976-2008 Summer Olympic Medals

In this project, I sought out to find a reasonably dirty data to practice my data cleaning skills. In this project I practiced my problem solving skills with dirty and incomplete data. Along with my analytical and visualization skills with a data with a lot of variables.

Source: https://www.kaggle.com/code/vishnuarun22/summer-olympics-dataset-beginner-eda

Questions:

  1. How many medals has each country won each year?

  2. Who is the country leading in each sport?

  3. What medals are won each year by each country?

TL:DR (Summary)

I transformed this:


into this:


Using a various formulas such as XLOOKUP, TEXT, and other. While solving issues such as missing data, and visualization issues.



But if you are curious about my workflow, lets get into the details!



Step 1: Understanding Data

This dataset had 11 Columns and 15317 rows. When understanding data, its important to take note which columns are worth keeping, and which ones we should drop. Lets run through each columns and its uses.

  1. City: Keep, will be useful for categorization

  2. Year: Keep, will be useful for categorization

  3. Sport: Keep

  4. Discipline: Optional, might be useful

  5. Event: Optional, might be useful

  6. Athlete: Drop, we are trying to find trends over the course of 32 Years, individual athlete names are not necessary

  7. Gender: Keep, might be useful

  8. Country: Keep, will be useful for categorization

  9. Event gender: Drop, not necessary

  10. Medal: Keep, this will be the focus of this analysis

Step 2: Data Cleaning


This data was reasonably clean. The only thing I really had to clean up was the "year" column because keeping it in general format will limit the visualization possibilities since excel won't recognize it as a "date". So I created a second table, detailing the exact date of the event. Then, I used XLOOKUP to fill in the "date" column in reference to the new table.


Issue #1:



In the data cleaning stage I across my first issue, keeping the geographical data sorted by countries causes the visualization to be extremely disproportionate because of the number of variables.

As shown in the picture above.


Solution:

Create a new column to group the countries by their continent. To do this I gathered data on what countries are in each continent. Then, I used XLOOKUP to sort the big data in reference to the new table.


This resulted in a much more manageable chart that shows how many medals each continent had won, and if you hover over the country colors, you can also tell how many medals each country had won. As shown below:


Step 3: Visualization

The last step was visualizing the data in a digestible way to answer the aforementioned 3 questions. I think we can skip the pivot table creation because I believe a messy pivot table can produce the best charts.


Here's the overview of the dashboard I made:


Lets highlight each section of the dashboard separately:

Slicers:

Slicers are a great way to navigate the dashboard in order to show the data you want to see.


There are 3 slicers in this dashboard, Sports, Year, and Country.


















Total Medals Won Chart

This is the chart showing how many total medals a country had won.


This chart can be set to show data based on: Country, Sport, or Year.




Sport Leader Chart

This is the chart showing how many medals each country had won in a certain sport.


This chart can be set to show data based on: Sport and Year.





Medal Type Chart

This chart shows the types of medals a certain country hadwon in a particular sport.


This chart can be set to show data based on: Sport, Year, and Country.






Comments


bottom of page