My Summer with Supercomputers
By Tianyang Chen.
Editors’ note: Tianyang Chen is a second-year PhD student and CAGIS research assistant advised by Dr. Wenwu Tang. This past summer, Chen attended the SDSC Summer Institute, a week-long workshop covering a broad spectrum of introductory-to-intermediate topics in high-performance computing and data science.
The weather was beautiful, sunny and cool, when I landed in San Diego and took a Lyft to the San Diego Supercomputer Center (SDSC). I was there to attend SDSC’s Summer Institute 2019, a program known for bringing high-performance computing to a wide range of computational settings, not just the biggest or best-funded projects. SDSC is a leader in data-intensive computing and cyberinfrastructure[1], and has provided resources, services, and expertise to the national research community, including both industry and academia, since 1985.
Housed at the University of California San Diego, the program is aimed at researchers in academia and industry, especially in domains not traditionally engaged in supercomputing, whose problems cannot typically be solved with local computing resources. I felt I had a head start, having worked with CAGIS’s high-performance computing (HPC) clusters, but I wanted to learn more about machine learning and big data processing. Soon we were going through orientation and introductions led by Dr. Robert Sinkovits, SDSC’s Director of Scientific Computing Applications.
I found I was part of a very interesting cohort. We came from many corners of academia: physics, biology, medicine, biochemistry, geography, and more. I enjoyed hearing everyone introduce their research, and the Lightning Rounds, a session in which each attendee shared their work, really opened my eyes to new ways researchers use HPC in their studies. I was amazed to see how supercomputing plays such a varied and important role across a broad range of contemporary scientific research.
My favorite part was when Dr. Sinkovits introduced the history of the center and gave us a tour of the Comet supercomputer. It was amazing to be in the same room as a petascale supercomputer with almost 2,000 nodes.
The five-day training exposed me to a number of topics, including version control with Git, science gateways, containers for scientific and high-performance computing, scientific visualization, GPU computing, parallel computing with MPI, and research data management. Together, these sessions provided a “big picture” of parallel computing and the data management, data visualization, and version control practices that go with it.
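For readers who have not seen MPI before, a minimal “hello world” of the kind such a course typically begins with looks something like the following. This is my own illustrative sketch in C, not taken from the workshop materials: every process reports its rank, and rank 0 sums a value contributed by all of them.

    /* hello_mpi.c -- illustrative sketch; compile with `mpicc hello_mpi.c -o hello_mpi`
       and run with `mpirun -np 4 ./hello_mpi` (file and command names are hypothetical). */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);               /* start the MPI runtime */

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

        printf("Hello from rank %d of %d\n", rank, size);

        /* A trivial collective operation: sum every rank's id onto rank 0. */
        int total = 0;
        MPI_Reduce(&rank, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("Sum of ranks 0..%d: %d\n", size - 1, total);

        MPI_Finalize();                       /* shut the runtime down */
        return 0;
    }

Launched on four processes, this prints one greeting per process; the same programming model scales up to the thousands of nodes in a machine like Comet.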
Even though the courses were introductory surveys of the cutting-edge technology and science surrounding parallel computing, I came away with a great deal of new, powerful, and useful knowledge. Back at CAGIS, I feel I can now better leverage our computing resources to study spatial problems in geographic information science (GIS), which face similar computational and big-data challenges.
Thanks to Dr. Tang’s grant, DeepHyd: A Deep Learning-based Artificial Intelligence Approach for the Automated Classification of Hydraulic Structures from LiDAR and Sonar Data, funded by NCDOT, for supporting my travel.
[1] Cyberinfrastructure refers to an accessible, integrated network of computer-based resources and expertise, focused on accelerating scientific inquiry and discovery (https://www.sdsc.edu).