The Arecibo Observatory’s Big Data Program: Award-Winning Preservation of AO’s Historic Dataset

The Arecibo Observatory was part of a coalition of institutions that won HPCwire’s 2021 Readers’ Choice Award for Best High-Performance Computing Collaboration for its efforts to “transfer petabytes of irreplaceable observation data to a safe place in proximity to capability-class computing to foster analysis” within weeks following the collapse of the observatory’s 305-meter radio telescope.

The Arecibo Observatory was well-situated to facilitate this critical task thanks to the creation of its Big Data Program in 2019 through a supplemental grant funded by the National Science Foundation. The program was created with the goal of improving data management and analysis by taking advantage of new cloud storage and computing capabilities.

Although the data storage project had already been in operation for several months, moving the data quickly became urgent after the first of the telescope’s cables failed in August 2020, because staff were not sure which areas of the facility would be affected in the event of a total collapse.

That is when several institutions, including the Texas Advanced Computing Center, the Engagement and Performance Operations Center, the NSF Cyberinfrastructure Center of Excellence Pilot, and Globus at the University of Chicago, offered to assist Arecibo and the University of Central Florida. The coalition was able to accelerate the transfer and successfully secure decades of collected datasets.

“For us it’s an honor to be part of this great community that worked tirelessly to salvage decades of historical data,” says Mr. Francisco Córdova, director of the Arecibo Observatory. “We are incredibly thankful to everyone that helped us throughout this process.”

Mr. Córdova notes that the telescope’s ultimate collapse reinforces the need for programs like AO’s Big Data Program. “As computing power and tools have become more accessible, a more structured approach to data management and analysis - especially at large facilities that are continuously accumulating large amounts of observational data - is needed.”

In addition to storing the data and making it accessible to all Arecibo users, the Big Data Program was designed to search for newly discovered phenomena in datasets acquired over the telescope’s 50+ years of observations. It aims to do so by reducing data processing time, incorporating real-time analysis of some of the recurring datasets, and implementing advanced machine learning techniques.

“There is great potential in the AO dataset,” says Mr. Córdova. “At the time much of these data were taken, we still did not know what an exoplanet was, or what a Fast Radio Burst detection would look like, so there is a good probability that signals of these phenomena are hidden in the data but we just weren’t able to recognize them with the technology we had at the time.”

As for the future of Arecibo’s Big Data Program, Mr. Francisco Torres, a software engineer on the Big Data team, says, “We have several other projects in the works. There is still so much we plan to do to help the scientific community access and utilize the wealth of data contained in the Arecibo archives so that the data can continue to be used to make significant contributions to the exploration of our universe.”

Eng. Francisco Torres
Big Data Program
Arecibo Observatory
787-878-2612 ext. 252
francisco.torres@ucf.edu

Keywords: big, data, program, arecibo, observatory, cordova, torres