top of page

SQL: Exploratory Data Analysis on how Indonesia really handled COVID-19

  • Writer: uzan baroto
    uzan baroto
  • May 7, 2022
  • 5 min read

Updated: Aug 22, 2022

Table of contents

  • TL:DR

  • The data's story:

  • Step 1: asking the right questions

  • Step 2: Preparing the data

  • Step 3: Processing the data

  • Step 4: Analyzing data

  • Step 5: Sharing the data

TL:DR



Description:

Let's end the debate with Data!


As the pandemic looks to be coming to an end. I feel curious on how Indonesia really handled COVID-19. And what better way to answer that question than Data!


I found the dataset from Ourworldindata which is a beautifully organized website with all kinds of data. I did an Exploratory Data Analysis with SQL to answer 4 specific questions as seen in the infographic.


After I spent the time in SQL to explore the data, then I went to work in tableau to create the visualizations.


And the results baffled me. Here are some of my findings.

1. Indonesia is ranked 62 out of 209 globally in countries with the lowest infection rates with only 2.2% of the population infected (not bad!)

2. Indonesia is ranked 93 out of 209 globally in mortality rate

3. Indonesia is also ranked 18 in most total cases while being ranked 4th in population size and ranked 63 in population density (not good)

4. Indonesia is ranked 25th globally in CFR at 2.6% (2.6% of infected people die) (Terrible)

5. Indonesia is ranked 9th in total deaths at 156.240 deaths (Terrible)

6. Social limitations had a negligible impact on new cases


Check out my SQL code here: https://bit.ly/OjanGithub


The data's story:


Step 1: Asking the right questions

Any data analysis project starts in the same place, the problem domain. This is the area where we inspect problems to define specific, measurable, relevant, and time-bound questions. In this project, the theme that I have selected is "how did Indonesia really handle covid".


Breaking down that theme into 4 questions.

  1. What are the infection rate and mortality rates of covid in Indonesia

  2. How does Indonesia rank in global case fatality rate (death rate)

  3. Is there a correlation between government policies and infection rates?

  4. How does Indonesia rank against other countries with similar population size and density?

These questions are defined based on the SMART methodology. This makes questions based on Specificity, Measurability, Action-oriented, Relevancy, and time-bound.


The problem that I am trying to solve here can be categorized as "categorizing things" and "discovering connections"


The tool that will be used in this EDA is SQL. Because I feel it's easier and faster to explore the dataset of this scale using SQL rather than pivot tables in excel. The SQL database that I'm using is Postgresql with DBeaver editor.


Because of the size of the dataset, I had to establish the scope of work. This analysis will not include:

  1. Pre-existing health conditions data

  2. Demographic data

  3. Vaccination data

With all of that establish we can hop on to the next step, Preparing the Data.


Step 2: Preparing the data

The data that I analyzed is from Ourworldindata.com. To prevent bias and establish as much context as possible, I should first answer 6 questions on the data:

  1. What is the data about? Covid data, including cases, deaths, vaccination, testing, demographics, and pre-existing health conditions

  2. Where was the data collected? The data was collected globally, but the company is based in the UK

  3. When was the data collected? Daily, but this analysis was done with the latest data entry in march 2022

  4. Who collected the data? The OWID team is a team of researchers and software engineers spread around the world.

  5. How was the data collected? The data was collected from first-party researchers and third parties.

  6. Why was the data collected? This data was verified against the publicly available data and is completely free for use. This ensures credibility and vouches for OWIDs goal of democratizing data

After making sure I stay objective during this analysis I downloaded the full covid data and the "government policies response" dataset.


Step 3: Processing the data

This is the step where I explore the data with SQL. I begin by understanding how each attribute of this data correlates and the attributes that I will explore.


This includes:

  1. Date: The date of entries

  2. Continent: Blank entries mean that this record is about a particular continent, or income group

  3. Location: Country, continent, or income group

  4. Total cases: Total cases to date

  5. New cases: total cases - previous total cases

  6. Total deaths: Total deaths to date

  7. New deaths: Total deaths - previous total deaths

  8. Government policy response: 0, 1, 2 (no measures, recommended to stay at home, required to stay at home other than "important activities")

Then I started exploring what tables might answer my questions.


Check out my SQL code here: https://bit.ly/OjanGithub


Step 4: Analyzing the data

In this step, I will start answering each questions. While visualizing each point of interest.

But first I would like to see the Infection and Death rate of each country in the world.


This is the infection map, this map shows total number of cases (circle size) and the death rate of each country(circle color)


Q1: What are the infection rates and mortality rates of covid in Indonesia?





INSIGHTS :

Indonesia is ranked 62 out of 209 globally in lowest infection rates (ranked from lowest to highest) with only 2.2% of the population infected.


indonesia is ranked 93 out of 209 globally in lowest mortality rate (ranked from lowest to highest).


This means that although Indonesia's case fatality rate (risk of dying when infected) is relatively high.


That number is affected by Indonesia's low infection count relative to population size.


Indonesia is also ranked 18 in most total cases, while being ranked 4th in population size and ranked 63 in population density



Q2: How does Indonesia rank in global case fatality rate (death rate)


INSIGHTS:

Indonesia is ranked 25th globally in CFR at 2.6% (2.6% of infected people die).


Indonesia is ranked 9th in total deaths at 156.240 deaths


This is a relatively high case fatality rate for the 6 million total cases in Indonesia


Research shows that CFR is affected minimally by healthcare. And is most affected by population density, testing numbers, and airport traffic, followed by higher age groups


Q3: Is there a correlation between government policies and infection rates?

This analysis is done by joining the overall covid data table with the government response table. Before creating a timeline out of it.


INSIGHTS:

There is little to no correlation between stay at home requirements and infection rate growth in Indonesia


The data shows that infection rates peak when stay at home requirements are at their highest--The first peak is in January 2021, this is after 9 months of high level social limitation


The second peak is in July 2021, after 5 months of high level social limitation after the first peak


The third peak happened in february 2022, after 3 months of high level social limitation followed by 3 months of medium level limitation


Q4: How does Indonesia rank against other countries with similar population size and density?

This question is important to add further context to the data, because its not enough to just compare indonesia globally with countries that has different population size, density, and total cases.




INSIGHTS:

Indonesia has the highest case fatality rate in countries with similar population density (+-25)


Indonesia has the highest case fatality rate in countries with similar population size (+-100.000.000)


Indonesia has the 5th highest case fatality rate in asia


And Indonesia has the 2nd highest case fatality rate in countries with similar total cases (+-1.000.000)--This shows that Indonesia's CFR is very high in most contexts


Step 5: Sharing the results

The final step is the visualization of this data. I wanted to make this data as visually pleasing as possible with Tableau. So lo and behold, the final result.




Comentarios


bottom of page