I was chatting with a dear friend at AWS (Amazon Web Services), as we often do, when he casually mentioned that they had released an amazing data lake for COVID-19 that can help researchers and organizations stay up to date.

There are dozens of COVID-19 (coronavirus outbreak) monitors out there, but datasets on its spread and its impact on countries are limited, and, of course, up-to-date data is lacking. This is where the AWS COVID-19 Data Lake comes in.

The AWS COVID-19 Data Lake is a centralized data repository for datasets related to the spread of COVID-19 (the novel coronavirus). It is already hosted on the AWS cloud and free for researchers around the world to use. Furthermore, it is kept up to date.

Where are the data coming from?

In AWS's own words: "We have seeded our curated data lake with COVID-19 case tracking data from Johns Hopkins and The New York Times, hospital bed availability from Definitive Healthcare, and over 45,000 research articles about COVID-19 and related coronaviruses from the Allen Institute for AI. We will regularly add to this data lake as other reliable sources make their data publicly available."

The impact of this effort on researchers and organizations is huge, because they no longer have to spend time collecting and validating data before putting it to use. The AWS COVID-19 Data Lake saves them time and even provides the tools to put the data to active use.

What do you need to use AWS COVID-19 Data Lake?

To use the data lake, you need the following:

  • Access to an active AWS account
  • Permissions to create an AWS CloudFormation stack
  • Permissions to create AWS Glue resources (catalog databases and tables)

I am really impressed by the product, the effort, and the ease of use. However, the data lake and its tools will be most useful to experienced AWS users.

If you have an active AWS account, you can browse the actively updated COVID-19 data catalog in your account:

Amazon AWS COVID-19 Data Lake
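
For readers who prefer code to the console, here is a minimal sketch of how the catalog tables could be listed once the AWS Glue resources exist in your account. It assumes the boto3 library is installed and AWS credentials are configured. The helper name list_catalog_tables is my own, and the database name "covid-19" is an assumption; use whatever name your CloudFormation stack actually created.

```python
from typing import Any, List


def list_catalog_tables(glue_client: Any, database_name: str) -> List[str]:
    """Return the names of all tables in one Glue catalog database.

    glue_client is expected to behave like boto3.client("glue").
    Results are paginated, so we follow NextToken until it runs out.
    """
    names: List[str] = []
    kwargs = {"DatabaseName": database_name}
    while True:
        page = glue_client.get_tables(**kwargs)
        names.extend(table["Name"] for table in page.get("TableList", []))
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token


if __name__ == "__main__":
    # Imported here so the helper above can be used without boto3 installed.
    import boto3

    glue = boto3.client("glue")
    # "covid-19" is an assumed database name; adjust it to match your account.
    for name in list_catalog_tables(glue, "covid-19"):
        print(name)
```

Running the script prints one table name per line. The same names can then be queried with whichever AWS analytics tool you prefer.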

Real time monitor with AWS COVID-19 Data Lake

If you are interested in what can be done with these datasets, here are some screenshots of the data lake results in action:

Current COVID-19 Stats

Active cases by country

Confirmed and Active by Date

Here is a link to the sample.


It's a huge effort, but most researchers and developers don't have an AWS (Amazon Web Services) account, nor are they familiar with the required services. Still, we hope to see some products built on this rich data lake.

Photo by EVG photos from Pexels