The Big Data Program: Arecibo Observatory Data Archive
By admin, 19 July 2021 #AOScienceNow
Through the Big Data Program at the Arecibo Observatory (AO), we are developing the Arecibo Archives Data Catalog to facilitate access to AO's projects, observations, datasets, and attributes. Approximately half of the AO database is currently available in the catalog: www.naic.edu/datacatalog/
The purpose of the Data Catalog is to provide a user-friendly portal where users can browse, query, and explore the projects observed at Arecibo over more than 55 years. The catalog consolidates multiple data sources that were built up over the course of AO's operation. Its main component is the Projects Catalog, which provides all of the technical information about a proposal or project, essentially what scientists submitted in a proposal to receive Arecibo observing time. The Projects Catalog is complemented by the Observations Log, a Files Catalog, and an Attributes Catalog. The Observations Log provides a detailed log recorded by the observing scientists for each project. The Files and Attributes catalogs record all of the raw data files captured during the observations, together with key metadata for those files.
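To make the relationships between the four catalogs concrete, here is a minimal sketch of how they could be laid out as related tables using Python's built-in `sqlite3` module. The table and column names (`projects`, `observations`, `files`, `attributes`, `project_id`, and so on) and the sample values are hypothetical illustrations, not the actual schema of the Arecibo Data Catalog: each observation and file points back to a project, and each attribute points back to a file.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not the actual Arecibo Data Catalog design.
SCHEMA = """
CREATE TABLE projects (
    project_id TEXT PRIMARY KEY,  -- proposal identifier
    title      TEXT,
    pi_name    TEXT
);
CREATE TABLE observations (
    obs_id     INTEGER PRIMARY KEY,
    project_id TEXT NOT NULL REFERENCES projects(project_id),
    obs_date   TEXT,
    log_notes  TEXT               -- entry recorded by the observing scientist
);
CREATE TABLE files (
    file_id    INTEGER PRIMARY KEY,
    project_id TEXT NOT NULL REFERENCES projects(project_id),
    location   TEXT NOT NULL,     -- path on archive storage
    size_bytes INTEGER
);
CREATE TABLE attributes (
    file_id    INTEGER NOT NULL REFERENCES files(file_id),
    name       TEXT NOT NULL,     -- header keyword
    value      TEXT,
    PRIMARY KEY (file_id, name)
);
"""


def create_catalog(path=":memory:"):
    """Create an empty catalog database and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES links
    conn.executescript(SCHEMA)
    return conn


def files_for_project(conn, project_id):
    """Return (location, size_bytes) for every file recorded under a project."""
    rows = conn.execute(
        "SELECT location, size_bytes FROM files WHERE project_id = ? ORDER BY location",
        (project_id,),
    )
    return rows.fetchall()


# Usage with made-up sample values:
conn = create_catalog()
conn.execute("INSERT INTO projects VALUES (?, ?, ?)", ("P0001", "Example project", "Example PI"))
conn.execute(
    "INSERT INTO files (project_id, location, size_bytes) VALUES (?, ?, ?)",
    ("P0001", "/archive/P0001/scan1.fits", 1024),
)
print(files_for_project(conn, "P0001"))  # [('/archive/P0001/scan1.fits', 1024)]
```

Keeping attributes in a separate name/value table is one design choice among several; it lets files from different instruments carry different header keywords without changing the schema.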
To build this catalog, the Big Data team first identified and catalogued every project carried out at Arecibo. This was no easy task, since project information had been stored in many different formats over the years. For each format, the team wrote scripts that scraped or extracted the technical information from the documents and saved it to a database. This first step is the foundation of the Data Catalog.
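The per-format extraction step can be pictured as a small registry that maps each document format to its own parser. The sketch below assumes two made-up formats (plain-text `key: value` cover sheets and JSON records); the real proposal formats, the field names, and the `extract_project` helper are all hypothetical.

```python
import json
from pathlib import Path

# Registry of extractors keyed by file suffix. The formats handled here
# (".txt" key/value sheets and ".json") are hypothetical stand-ins for
# the many proposal formats used over the years.
EXTRACTORS = {}


def extractor(suffix):
    """Decorator that registers a parser for one document format."""
    def register(func):
        EXTRACTORS[suffix] = func
        return func
    return register


@extractor(".txt")
def parse_key_value(text):
    """Parse lines like 'Title: Pulsar timing' into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines with no 'key: value' structure
            fields[key.strip().lower()] = value.strip()
    return fields


@extractor(".json")
def parse_json(text):
    """Lower-case the top-level keys of a JSON record."""
    return {k.lower(): v for k, v in json.loads(text).items()}


def extract_project(path):
    """Pick the extractor for a document's format and return its fields."""
    path = Path(path)
    parser = EXTRACTORS.get(path.suffix.lower())
    if parser is None:
        raise ValueError(f"no extractor for format {path.suffix!r}")
    return parser(path.read_text(encoding="utf-8"))
```

Adding support for another historical format then means writing one more decorated parser, without touching the dispatch logic.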
In a similar way, the team extracted and compiled the Observations Log from log information stored in different locations. Most of the observations were already saved in a database, which made them easier to integrate into the catalog. The Files Catalog is being built as the datasets are copied to the Texas Advanced Computing Center (TACC). Once a dataset is copied, the team catalogs it and creates a record for it in the Catalog Database, noting the file's location, corresponding project, and size. Finally, the Attributes Catalog is being actively populated by extracting headers, metadata, and attributes from the raw files. This is done with scripts that navigate through the server's directory paths and extract the attributes from each file. The results are catalogued and saved to a database that records every scientific attribute together with its file name and project.
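The file and attribute cataloging steps can be sketched as a single directory walk. This minimal sketch assumes that each top-level directory under the archive root is named after its project, and that the raw files carry FITS-style headers (80-character `KEYWORD = value / comment` cards in 2880-byte blocks, a format widely used in astronomy). The function names and record fields are hypothetical, and the header parser deliberately ignores FITS details such as escaped quotes and continuation cards.

```python
from pathlib import Path

CARD = 80     # a FITS header card is 80 ASCII characters
BLOCK = 2880  # headers are stored in 2880-byte blocks of 36 cards


def _card_value(field):
    """Return the value part of a card, dropping quotes and any '/ comment'."""
    field = field.strip()
    if field.startswith("'"):  # quoted string value
        end = field.find("'", 1)
        return (field[1:end] if end != -1 else field[1:]).rstrip()
    return field.split("/", 1)[0].strip()


def read_fits_header(path, max_blocks=100):
    """Return {keyword: value string} from the primary header of a FITS-style file."""
    attrs = {}
    with open(path, "rb") as f:
        for _ in range(max_blocks):  # stop early if no END card is found
            block = f.read(BLOCK)
            if len(block) < BLOCK:
                break  # truncated or not FITS: keep what was read
            for i in range(0, BLOCK, CARD):
                card = block[i:i + CARD].decode("ascii", errors="replace")
                keyword = card[:8].strip()
                if keyword == "END":
                    return attrs
                if card[8:10] == "= ":  # value indicator in columns 9-10
                    attrs[keyword] = _card_value(card[10:])
    return attrs


def catalog_tree(root):
    """Walk root/<project>/... and build file and attribute records.

    Assumes (hypothetically) that each top-level directory is named after its project.
    """
    root = Path(root)
    files, attributes = [], []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        project = rel.parts[0] if len(rel.parts) > 1 else None
        # File record: location, corresponding project, and size.
        files.append({"location": str(path), "project": project,
                      "size": path.stat().st_size})
        # Attribute records: one per header keyword, tied to file and project.
        if path.suffix.lower() in (".fits", ".fit"):
            for name, value in read_fits_header(path).items():
                attributes.append({"file": rel.as_posix(), "project": project,
                                   "name": name, "value": value})
    return files, attributes
```

Bounding the header read with `max_blocks` keeps the walk from scanning an entire large data file when a file lacks a proper `END` card.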
This catalog is essential: it is the stepping stone to making Arecibo's datasets accessible to the community and to curious minds. The Data Catalog project is a computing strategy that will make the necessary data and resources widely available to the scientific community, continuing the Arecibo Observatory’s legacy of enabling groundbreaking new results about our atmosphere, our Solar System, and our universe.
Article written by Eng. Julio Alvarado Negrón, Big Data Manager