Container Orchestration, Cloud, and Petabytes of Data: The Rubin Observatory Example

Abstract:

The Rubin Observatory is expected to revolutionize astronomy by conducting the largest and deepest survey of the universe to date. Celestial objects and their physical properties will be identified and stored in a database that will eventually hold trillions of entries. At several tens of petabytes, this catalog will play a major role in the scientific exploration of the data produced by the telescope.

To meet this need, a purpose-built distributed database named Qserv is being actively developed by a team of ten experts based in France and at Stanford University in the United States. During this presentation, we will detail the DevOps techniques implemented by the team, which allow Qserv and its data to be readily deployed and orchestrated on hundreds of nodes, both in conventional academic data centers and on Google’s cloud-computing platform.

We will also present designs that allow terabytes of data to be loaded into Qserv in a few minutes, along with strategies for scaling into the petabyte range. In addition, we will explain how Kubernetes has drastically accelerated Qserv’s large-scale operations and enabled the solution to be deployed internationally, whether in the cloud, on OpenStack, or on conventional clusters.

Since 2019, our team has been working closely with Google Cloud Platform teams to facilitate the deployment of Qserv on modern cloud-computing infrastructure. We will also share feedback on this collaboration and on how we manage and secure petabytes of astronomical data in the cloud.

Speakers:

Fabrice Jammes | Senior Cloud Architect | Rubin Observatory Data Management Group
@VRubinObs @FabriceJammes | linkedin.com/in/fabrice-jammes-5b29b042

Fritz Mueller | Technical Manager | SLAC National Accelerator Laboratory
@SLAClab | linkedin.com/in/fritz-mueller-062679a