-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathhats-ivoa.tex
95 lines (69 loc) · 8 KB
/
hats-ivoa.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
\documentclass[11pt,a4paper]{ivoa}
\input tthdefs
\input gitmeta
\title{HATS: A Standard for the Hierarchical Adaptive Tiling Scheme in the Virtual Observatory}
\ivoagroup{Applications Group}
%\author[https://www.ivoa.net/authors/caplar]{Neven Caplar}
%\author[https://www.ivoa.net/authors/usher]{Melissa DeLucchi }
%\author[https://www.ivoa.net/authors/offline]{Mario Juric}
\author[https://www.ivoa.net/authors/offline]{LINCC team...}
\editor{editor here}
\previousversion{This is the first public release}
\begin{document}
\begin{abstract}
The increasing complexity and volume of astronomical datasets necessitate efficient spatial indexing and query strategies within the Virtual Observatory (VO). The Hierarchical Adaptive Tiling Scheme (HATS) is a framework designed to facilitate scalable queries, filtering operations, and efficient data retrieval across large astronomical surveys. Traditional spatial indexing methods often struggle with the massive scale of modern astronomical datasets, leading to inefficient query execution and storage overhead. HATS provides a flexible, hierarchical approach that balances computational efficiency and adaptability to non-uniform data distributions.
This document describes the structure, implementation, and best practices for integrating HATS within the VO ecosystem, ensuring interoperability and performance optimization for distributed astronomical datasets. The reference implementation of HATS can be found at \url{https://github.com/astronomy-commons/hats}. Additionally, we discuss how HATS enhances existing indexing schemes, its role in federated data access, and its potential applications for time-domain astronomy, large-scale surveys, and cross-matching of astronomical catalogs.
\end{abstract}
\section*{Acknowledgments}
The authors thank the IVOA Applications Working Group and various contributors from the astronomical community for their feedback and discussions that shaped this standard. We acknowledge the key collaborations with Space Telescope Science Institute (STScI) and IPAC, whose expertise and contributions have been invaluable in refining the HATS framework. Additionally, we extend our gratitude to CDS for their assistance in metadata structuring and interoperability support. This work has also benefited from insights provided by the S-Plus survey, Linea, ESA, and CDAC, whose contributions have helped enhance the applicability and robustness of HATS in large-scale astronomical data analysis.
\section*{Conformance-related definitions}
The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
``OPTIONAL'' (in upper or lower case) used in this document are to be
interpreted as described in IETF standard RFC2119 \citep{std:RFC2119}.
The \emph{Virtual Observatory (VO)} is a
general term for a collection of federated resources that can be used
to conduct astronomical research, education, and outreach.
The \href{https://www.ivoa.net}{International
Virtual Observatory Alliance (IVOA)} is a global
collaboration of separately funded projects to develop standards and
infrastructure that enable VO applications.
\section{Introduction}
The rapid expansion of astronomical data from large surveys like LSST, Euclid, and Roman Space Telescope necessitates innovative solutions for spatial indexing and efficient data retrieval. These surveys generate vast amounts of high-resolution imaging and time-domain data, requiring efficient methods for organizing, querying, and cross-matching data across multiple archives. Traditional approaches to spatial indexing, such as hierarchical pixelization (e.g., HEALPix) or static tiling schemes, often exhibit inefficiencies when handling dynamic, multi-resolution datasets.
The Hierarchical Adaptive Tiling Scheme (HATS) is a novel approach designed to optimize spatial data partitioning while maintaining flexibility in accommodating varying data densities. Unlike fixed spatial partitioning methods, HATS dynamically adjusts tile sizes based on local data characteristics, ensuring an optimal balance between resolution, query efficiency, and storage management. By leveraging a hierarchical structure, HATS allows users to perform efficient multi-resolution queries while preserving high precision in regions of interest.
The HATS standard aims to define best practices for implementing and utilizing this tiling scheme within the Virtual Observatory framework. This document outlines the principles behind HATS, describes its data model, and provides recommendations for integrating it into existing VO services. Additionally, we discuss how HATS can facilitate efficient cross-matching of astronomical catalogs, accelerate large-scale spatial queries, and enhance interoperability between diverse astronomical datasets.
This note [does what?]
\section{Motivation and Goals}
The primary motivation behind HATS is to address the following challenges in astronomical data management:
\begin{itemize}
\item \textbf{Scalability:} Modern astronomical surveys generate petabytes of spatially distributed data, requiring an indexing scheme that scales efficiently with dataset size.
\item \textbf{Adaptive Resolution:} Fixed grid-based partitioning often leads to inefficient storage and query execution, particularly in non-uniformly distributed datasets. HATS dynamically adjusts tile sizes to accommodate varying data densities.
\item \textbf{Efficient Query Execution:} Spatial queries such as nearest-neighbor searches and cross-matching must be executed efficiently across distributed data repositories. HATS enables rapid indexing and retrieval of relevant data subsets.
\item \textbf{Interoperability:} Astronomical data is collected from diverse instruments and observatories, often using different spatial reference frames. HATS provides a standardized framework for integrating and harmonizing spatial data across multiple sources.
\end{itemize}
\section{HATS Design and Implementation}
The HATS framework consists of the following core components:
\subsection{Hierarchical Structure}
HATS employs a multi-level hierarchy where each level represents a progressively finer spatial resolution. This structure enables efficient spatial indexing, allowing queries to traverse different levels of granularity based on computational needs.
TODO: Explain how the structure works. Explain datasets folder. Leafs at the end, separated in directories at healpix level. LEafs can be folders.
\subsection{Adaptive Tiling Algorithm}
Unlike static partitioning schemes, HATS dynamically subdivides spatial regions based on data density. In regions with sparse data, larger tiles minimize storage overhead, whereas high-density regions are subdivided into smaller tiles to improve query efficiency. TODO: How is this done, split in 4 when above a certain limit, split either by number of rows or size
\subsection{Metadata and Auxiliary Files}
HATS implementations utilize auxiliary metadata files to store relevant information about the tiling structure, including:
\begin{itemize}
\item \textbf{partition\_info.csv:} TODO: what is inside
\item \textbf{point\_map.json:} TODO: what is inside
\item \textbf{properties:} TODO: what is inside
\item \textbf{metadata.json:} (Inside dataset folder)TODO: what is inside
\item \textbf{common\_metadata.json:} (Inside dataset folder) TODO: what is inside
\end{itemize}
\subsection{Structure of files}
TODO: Do we request parquet? Explain hats\_29 index. Explain not unique.
\subsection{Integration with Existing VO Standards}
HATS is designed to be compatible with existing VO spatial indexing frameworks, such as HEALPix and MOC (Multi-Order Coverage maps). TODO: explain more how is it compatible (if it s). Add how will it work with TAP
\subsection{Performance Considerations}
TODO: parquet enables columns. Rowgroups. Server-side filtering. Benchmarking plots
\appendix
\section{Changes from Previous Versions}
No previous versions yet.
\bibliography{ivoatex/ivoabib,ivoatex/docrepo}
\end{document}