\documentclass[11pt,a4paper]{ivoa}
\input tthdefs
\usepackage{todonotes}
\usepackage{array}
\marginparwidth=4cm
\title{Data Origin in the VO}
% see ivoatexDoc for what group names to use here
\ivoagroup{DCP}
%\author[????URL????]{G.Landais}
\author{G.Landais}
\author{A.Muench}
\author{M.Demleitner}
\author{R.Savalle}
%\author{looking for contributors}
%\author{????Fred Offline????}
\editor{G.Landais}
%\previousversion{NOTE-20230522}
\previousversion{NOTE-DataOrigin-1.0-20230522}
%\previousversion{This is the first public release}
\begin{document}
\begin{abstract}
Data Origin in the VO specifies a set of metadata items that define basic
provenance information, as well as their representation in documents produced
by Virtual Observatory (VO) services. This will improve traceability for VO
users, help them to understand result sets and facilitate data reuse and citation.
\end{abstract}
\section*{Acknowledgments}
Alberto Accomazzi (ADS), Anne Catherine Raugh (University of Maryland), Rafaele d'Abrusco (CfA), Mihaela Buga (CDS)
\section*{Conformance-related definitions}
\section{Introduction}
Information on the origin of a piece of data is important for end users to understand the data, for meaningful data citation, and to improve its reusability. It is a part of provenance, which in turn is a mandatory criterion in the GOFair\footnote{https://www.go-fair.org/} or RDA FAIR definition\footnote{https://doi.org/10.15497/rda00050}.
The Virtual Observatory (VO) provides an advanced framework to search for, query, and consume astronomical data. The specification of Data Origin proposed here for VOTable output includes both metadata originating at the data producer (e.g., author, space agency, observatory) and at the data centre (publisher) hosting the resource.
At this point, depending on the implementation, users can find the information conveyed in Data Origin in the data centre web pages (landing pages) or in the VO Registry. For citation, the ADS (NASA Astrophysics Data System) offers comprehensive bibliographic capabilities, including the production of BibTeX records for publications known to ADS. However, there are no VO standards to communicate this type of information yet.
A list of basic added metadata, reliably findable in a convenient location (i.e.,
the VOTable produced by a query) will help users to properly cite or
acknowledge the data resources contributing to new or derived works.
Tracing Data Origin, from the producer of the query to the production of the response, also allows end users to determine the different agents involved in data preservation (authors, data centre, space agencies, journal), which is particularly helpful when debugging. A typical scenario here is when mirrored data is subject to potentially differing curation actions in the different publication processes.
The list of metadata items proposed here is designed to meet the needs of basic provenance
tracking when using current VO protocols.\\
The remainder of this note first presents the use cases guiding this
effort, then briefly lists the kind of metadata Data Origin concerns.
The core of the specification is in
section~\ref{sec:data-origin-in-votable}, which
describes the VOTable serialisation.
To complete the picture, the appendices include a mapping of Data Origin items to the IVOA Registry schema (see appendix~\ref{sec:appendixB})
and a citation example (appendix~\ref{sec:appendixC}) that illustrates how this information can be used.
\section{Use cases}
\subsection{Data Origin information}
Scenario: Researchers have data in a VOTable that shows an odd feature. They would now like to investigate whether that feature is physical or an artefact.
Derived requirements:
\begin{itemize}
\item Contact information for producers must be present.
\item Researchers would like to clearly identify defined roles in well-known places regardless of the service which generated the result.
\item When data provided by the service is derived from external resources, those external resources are clearly identified. In that case, additional curation applied by the publisher can be detected.
\end{itemize}
For instance, a table published in a journal or by a space agency may also be hosted in multiple data centres. The details of the table schema may depend on the data centre, which can add associated data, enrich metadata, or make a sub-selection of columns.
\subsection{Reproducibility}
A researcher revisits work they did six months earlier in an ad-hoc fashion and would now like to reproduce it in a more structured way. To do that, they need to know, say, which queries against which services, or perhaps which programs, produced the files.
Derived requirement: The request parameters and a service identification
(perhaps an ivoid in a narrower VO context, an access URL beyond that) must be available.
\subsection{Citation}
\label{sec:req-citation}
While preparing a publication, a researcher would like to properly cite the software and data that went into their results. They now run a program to extract that information from the digital artefacts going into the publication -- perhaps even separately for citations and acknowledgments.
The type of information that would go into such a
citation can be assessed from this acknowledgement following a
convention of the American Astronomical Society:
\begin{quotation}
We searched optical astrometric data of these sources from the Gaia (Gaia Collaboration et al. 2016) Early Data Release 3 (Gaia Collaboration et al. 2021) via the CDS archive.
\end{quotation}
Derived requirement: The Data Origin must indicate requests for citation
and/or acknowledgment in a machine-readable way, preferably in a way
that machines can generate BibTeX entries for whatever they specify.
\subsection{Workflow bibliography}
A researcher has used a workflow engine to solve a complex science
problem. They now want to create a bibliography of everything that was
used to obtain the end result of the computation.
Derived requirement: Essentially as for use case~\ref{sec:req-citation},
except that the metadata extracted needs a higher level of homogeneity
and that relationships declared between different parts of Data Origin
might be useful.
\section{State of the Art}
Neither VOTable \citep{2019ivoa.spec.1021O} nor IVOA data access protocols at this point provide standard facilities for conveying Data Origin information. While protocols such as TAP \citep{2019ivoa.spec.0927D} have standard interfaces to retrieve table metadata (e.g., unit, type and description of columns) or metadata on service endpoints (``capabilities'') by virtue of providing VOSI \citep{2017ivoa.spec.0524G} endpoints, for basic metadata like authors or publication dates, clients have to consult the VO Registry. Even that may be difficult, because there is not even a standard way to obtain an identifier from a service itself.
HiPS \citep{2017ivoa.spec.0519F} is a more recent protocol which includes a list of standardized metadata for each dataset. HiPS metadata includes authors, publication year, data centre identifier, and licences.
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{fig-ext-ids.pdf}
\caption{Usage of DOIs, ORCIDs, and Bibcodes in a sketch of a
VOResource instance document.}
\label{fig:extids}
\end{figure}
\subsection{Data Origin in IVOA Registry}
The IVOA Registry contains metadata relevant for Data Origin such as authors,
publication dates,
or relationships to other resources. Of particular relevance for Data
Origin are those latter relationships using external identifier schemes
more persistent than the IVOA's own: DOIs, ORCIDs, or Bibcodes (see
Fig.~\ref{fig:extids}). The metadata schema is mainly defined by
VOResource \citep{2018ivoa.spec.0625P} and VODataService
\citep{2021ivoa.spec.1102D}, which also define an XML serialisation for it.
The Registry makes this information available through several interfaces, partly
hosted by the data centres themselves, partly provided by a central
infrastructure.
The VO Registry is an open framework without any moderators.
The IVOA hence does not guarantee the resources' continued availability.
In consequence, there are no guarantees as to the continued availability
of any metadata in central IVOA infrastructure; the Registry is
specifically \emph{not} designed as a persistent metadata repository
that artefacts could reliably refer to as metadata sources.
The IVOA Registry uses a unique identifier, the IVOID
\citep{2016ivoa.spec.0523D}, as the primary key for its resource
collection. By the above considerations, this IVOID is not suitable as a means of citation, because it is a technical identifier with no provisions for persistence.
Both the Registry's metadata schema and the DataCite
\citep{std:DataCite40} metadata schema have been
derived from Dublin Core \citep{std:DUBLINCORE}. While the extensions differ in detail, it is not
hard to write a (not unreasonably lossy) mapping between the two metadata schemas. This can enable
sustainable citation by enrolling VO resources in DataCite's persistence
mechanisms. Even so, the provision of extensive in-document metadata
has clear usability and practicality advantages, not to mention that many
pieces of data origin are unsuitable for a general metadata schema like
DataCite's.
\subsection{Data Origin and Provenance}
%The Provenance \citep{2020ivoa.spec.0411S} and Dataset Data Models can
%be used to express Data Origin.
The Provenance Data Model \citep{2020ivoa.spec.0411S} is based on Entities, Agents and Activities as defined in the W3C Provenance model. The model's main focus is the detailed documentation of workflows.
For the serialisation of ProvDM instances within VOTables, MIVOT \citep{2023ivoa.spec.0620M} is available. At this point, however, the relatively complex model and many free parameters (for instance: serialisation) are obstacles for a wide and direct adoption of ProvDM+MIVOT to represent Data Origin, in particular when compared to the very straightforward mechanisms proposed here.
%``Last-Step-Provenance'' is a Provenance extension currently under discussion which would define a list of metadata corresponding to Data Origin. Its output will not be recursive and could be easily serialized in a table.\todo{If we write this here, everyone will ask: Well, so why don't we wait for that? Perhaps we ought to just drop this?}
Other initiatives that are still work in progress, such as ``DatasetDM'' or
``Last-Step-Provenance'', show the growing interest in conveying basic
provenance. The metadata listed in Data Origin can serve as a reference for current
and future models interested in this information.
\subsection{DALI}
DALI \citep{2017ivoa.spec.0517D} defines common conventions for all
modern IVOA data access protocols.
%A part of it defines in-band signalling of error conditions or overflows -- an important part of Data Origin -- for VOTables.
It also defines bespoke names for \xmlel{INFO} elements used to convey
Data Origin-type metadata, in particular \emph{citation} and \emph{standardID}. In a sense,
this specification is an extension of the mechanism defined in DALI.
\section{Expected Data Origin}
% added 23-nov-2023
This document lists atomic metadata items that server software should
generate when producing query responses for VO clients.
It includes reproducibility metadata (see Table~\ref{tab:query-names})
that reflects the context in which a query was executed. The information
includes parameters allowing users to execute the query again as well as
parameters that will aid debugging in case a later execution of the
query yields different results, such as the software version or the execution
date; the latter is particularly relevant when the resources' data content
can evolve.
The information is complemented by provenance metadata (see Table~\ref{tab:origin-names}) like authors, licence, references, or identifiers
for the resource or related resources (which could be IVOIDs, Bibcodes, or DOIs).
Most provenance metadata is generally provided through the Registry.
While giving the relevant IVOID(s) would therefore in principle be
sufficient to define the metadata, the Registry does not guarantee
persistent availability of that metadata, as discussed above.
A serialisation directly into the VO output is therefore preferred
(see section~\ref{sec:data-origin-in-votable}).
% end
% please don't comment out things in version control; it only makes
% changes harder to follow -- and you can always recover deleted
% material from the history if necessary.
% remove 23-nov-2023
\subsection{Condition for citation}
% MD: since I don't know what to do with this section, I've not done
% any editorial work on it.
The DOI is the preferred persistent identifier for citing resources.\\
%Data Citation requires a sustainable URL which is not guarantied in IVOA resources.
%Unlike ivoid, the DOI guaranties a sustainable URL and should be used for citation. \\
Data citation requires a persistent identifier and a sustainable URL.
Both are guaranteed by a DOI, whereas a resource identified only by an ivoid (the IVOA identifier)
is not guaranteed to be sustainable.\\
Producing a BibTeX record requires curated metadata such as identifier, authors, title, publisher, and date of publication.
ADS (NASA) provides a citation capability for the resources it indexes. Its records should be preferred where available, and their curation quality can serve as an example for data providers and users.
DOI providers like DataCite also offer a BibTeX capability; the quality of the resulting BibTeX depends on the DOI metadata supplied by the data producers and publishers.\\
The IVOA Registry, which contains metadata for all registered resources, could provide the quality expected for citation if the following conditions are met:
\begin{itemize}
\item the registry resource includes a persistent identifier (DOI), typically in an \xmlel{altIdentifier} element
\item the registry resource includes the metadata which meets the BibTeX requirements
\end{itemize}
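
As a sketch of what these two conditions could look like in practice, the
following fragment of a registry record reuses the values of the example in
appendix~\ref{sec:appendixA}. It is deliberately incomplete: the enclosing
resource element, its namespace declarations, and further mandatory
VOResource children are left out, and taking the resource title from the
\xmlel{DESCRIPTION} of that example is an assumption made here for
illustration only.

\begin{verbatim}
<title>117 exoplanets in habitable zone with Kepler DR25</title>
<identifier>ivo://cds.vizier/j/aj/161/36</identifier>
<!-- condition 1: a persistent identifier usable for citation -->
<altIdentifier>doi:10.26093/cds/vizier.51610036</altIdentifier>
<curation>
  <!-- condition 2: metadata needed to build a BibTeX record -->
  <publisher>CDS</publisher>
  <creator>
    <name>Bryson S.</name>
  </creator>
</curation>
\end{verbatim}

A date of publication, which a BibTeX record also needs, would come from the
curation dates of the record (see appendix~\ref{sec:appendixB}).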
% move metadata list : 23-nov-2023
\section{Data Origin in VOTable}
\label{sec:data-origin-in-votable}
The metadata listed below combines terms from DALI \citep{2017ivoa.spec.0517D}, Dublin Core \citep{std:DUBLINCORE} and extensions in order to provide Data Origin information to end users.
\subsection{Query information}
Table~\ref{tab:query-names} lists the metadata items defined here to
convey query-related information in Data Origin.
These pieces of information enable linking Registry records to the
current result and, to some extent, reproducing the query executed.
For queries against evolving datasets, the request\_date item clearly is
particularly important.
\begin{table}
\hbox to\textwidth{\hss
\begin{tabular}{|l|>{\raggedright}p{6cm}|l|l|} \hline
\textbf{\vrule width0pt height 12pt depth 7pt Key} & \textbf{Description} & \textbf{Level} & \textbf{Dublin Core}\\ \hline
ivoid & IVOID of underlying data collection & R & \\ \hline
publisher & Data centre that produced the VOTable & R & publisher\\ \hline
%rename 23-nov-2023 version & Software version (*) & & \\ \hline
server\_software & Software version (*) & & \\ \hline
service\_protocol & IVOID of the protocol through which the data was
retrieved& R& \\ \hline
request & Full request URL including a query string (**)& R& \\ \hline
query & An input query in a formal language (e.g., ADQL) & & \\ \hline
% removed in 23-nov-2023
%request\_post & (POST Request) POST arguments & & \\ \hline
% end
request\_date & Query execution date & R&\\ \hline
contact & Email or URL to contact publisher & & \\ \hline
\multicolumn{4}{p{\textwidth}}{\vskip 2pt\footnotesize(*) Operators are
encouraged to follow \citet{note:opid} in this item} \\
\multicolumn{4}{p{\textwidth}}{\footnotesize(**) For ``Simple''
protocols (regardless of the HTTP method), put the
application/x-www-form-urlencoded form of the query parameters in the
query part of the URL here.
More complex scenarios like UWS are not covered by this document.}
\end{tabular}\hss}
\caption{\xmlel{INFO} names available for specifying the query that
generated a VOTable}
\label{tab:query-names}
\end{table}
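
As an illustration of Table~\ref{tab:query-names}, the following fragment
sketches the query information a service might attach to the result of a
synchronous ADQL query. The access URL, the IVOID, the publisher name, the
contact address, and the query are hypothetical. Applying the form-encoding
rule of footnote (**) to a synchronous TAP request is an assumption of this
example rather than a requirement of this note. Since the values are XML
attributes, ampersands in the request URL are written as \verb|&amp;|.

\begin{verbatim}
<!-- hypothetical values throughout; only the INFO names are normative -->
<INFO name="ivoid" value="ivo://org.example/ex">Data collection</INFO>
<INFO name="publisher" value="Example Data Centre">Data centre</INFO>
<INFO name="service_protocol" value="ivo://ivoa.net/std/TAP"/>
<INFO name="query" value="SELECT TOP 5 ra FROM ex.main">ADQL query</INFO>
<INFO name="request"
 value="https://example.org/tap/sync?LANG=ADQL&amp;QUERY=SELECT+TOP+5+ra+FROM+ex.main"/>
<INFO name="request_date" value="2023-11-23T10:15:00">
  Query execution date
</INFO>
<INFO name="contact" value="help@example.org">Publisher contact</INFO>
\end{verbatim}

Giving both \emph{query} and \emph{request} is redundant in this case, but
it spares clients from decoding the URL when they only need the ADQL text.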
\subsection{Dataset Origin}
Dataset origin complements the query-related information to improve the
understandability of the underlying data. This information is intended
for end users. If the resource is also described in the Registry, care
must be taken that the in-response metadata reflects the metadata
available there at the time the response is produced.
Table~\ref{tab:origin-names} lists the origin-related metadata items
defined here.
\begin{table}
\hbox to\textwidth{\hss
\begin{tabular}{|l|>{\raggedright}p{6cm}|l|l|} \hline
\textbf{\vrule width0pt height 12pt depth 7pt Key} & \textbf{Description} & \textbf{Level} & \textbf{Dublin Core}\\ \hline
% removed 23-nov-2023 publication\_id & Dataset identifier that can be used for citation& M & identifier\\ \hline
citation & Dataset identifier that can be used for citation (e.g. dataset DOI) & R & identifier\\ \hline
reference\_url & Dataset landing page & & \\ \hline % previously landing_page
% removed in 23-nov-2023
%curation\_level & Controled vocabulary
% (IVOA rdf, content\_level) & & \\ \hline
% end
% modifier 23-nov-2023 resource\_version & Dataset version or last release & R & \\ \hline
resource\_version & Dataset version & R & \\ \hline
%rename 23-nov-2023 rights & (*) Licence URI & R & rights\\ \hline
rights\_uri & Licence URI (*) & R & rights\\ \hline
% removed 23-nov-2023 rights\_type & (*) Licence type (eg: CC-by, CC-0, private, public) & & \\ \hline
%rename 23-nov-2023 copyrights & Copyright text & & \\ \hline
rights & Licence or Copyright text & & rights\\ \hline
creator & \raggedright The person(s) mainly involved in the
creation of the resource; generally, the author(s).
& R & creator\\ \hline
editor & Editor name (article)& & \\ \hline
% removed 15-dec-2023 to use cites or is_derived_from
%relation\_type & An identifier of a second resource and its relationship to the
% present resource.
% Controlled vocabulary (**)& & relation\\ \hline
%related\_resource & Information about a second resource from which the present resource
% is derived. The source is an identifier that can be prefixed with the identifier type: eg: bibcode:, doi:, ror: & & source\\ \hline
article & Bibcode or DOI of a reference article & & relation\\ \hline
cites & Identifier (ivoid, DOI, bibcode) of a second resource,
  related by the relation type ``cites'' (**)& & relation\\ \hline
is\_derived\_from & Identifier (ivoid, DOI, bibcode) of a second resource,
  related by the relation type ``is\_derived\_from'' (**)& & relation\\ \hline
% remove 23-nov-2023
%publication\_date & Date of publication (DALI timestamp) & R & \\ \hline
%resource\_date & Date of the original publication (DALI timestamp) & R & date\\ \hline
%
original\_date & Date of the original resource from which the present resource is derived (DALI timestamp) & & \\ \hline
publication\_date & Date of first publication in the data centre (DALI timestamp) (***) & R & \\ \hline
last\_update\_date & Last data centre update (DALI timestamp) (****) & R & date\\ \hline
\multicolumn{4}{p{\textwidth}}{\vskip2pt\footnotesize(*) Following Registry
practice, this should come from
SPDX \url{https://spdx.org/licenses/}, though Creative Commons URLs
\url{https://creativecommons.org} are also admitted}\\
\multicolumn{4}{p{\textwidth}}{\footnotesize(**) \url{https://www.ivoa.net/rdf/voresource/relationship\_type/}}\\
\multicolumn{4}{p{\textwidth}}{\footnotesize(***) Equivalent to curation/date[@role='created'] in registry}\\
\multicolumn{4}{p{\textwidth}}{\footnotesize(****) Equivalent to curation/date[@role='updated'] in registry}
\end{tabular}\hss}
\caption{\xmlel{INFO} names available for specifying information
related to the origin of the data set(s) a VOTable was generated from}
\label{tab:origin-names}
\end{table}
\subsection{VOTable serialization}
This document focuses on a basic serialization that allows data
providers to describe individual tables.
This output is particularly suitable for protocols like Simple Cone Search.
The basic serialization uses INFO tags to populate Data Origin (see the example of a ConeSearch result in appendix \ref{sec:appendixA}).
INFO tags are allowed in VOTable under \xmlel{VOTABLE} or in \xmlel{RESOURCE} elements.
Thus, it becomes possible to annotate a collection of \xmlel{TABLE} elements that are located in different resources.
This specification at this point does not constrain the multiplicities of individual INFO items, and clients should not fail hard if any given INFO item occurs multiple times.
Complex queries (for instance, resulting from ADQL JOINs) need an advanced output serialization to gather metadata by resource.
Mechanisms to manage this requirement are being developed in the IVOA
(MIVOT).
The mechanisms defined here are generally still applicable in these
cases, but the authors acknowledge that they are certainly stretched to
their limits there.
As a service to human readers, it is recommended to put descriptions, possibly derived from definitions provided in this document, into the bodies of the INFO elements.
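
The placement rules above can be sketched as follows: query information
that applies to the whole response sits directly under \xmlel{VOTABLE},
origin metadata is attached to the \xmlel{RESOURCE} it describes, and an
item such as \emph{creator} is simply repeated when several values apply.
All identifiers, names, and dates in this fragment are hypothetical
placeholders, and the layout is only one possible arrangement under the
rules given in this section.

\begin{verbatim}
<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <!-- query information, valid for the whole document -->
  <INFO name="request_date" value="2023-11-23T10:15:00">
    Query execution date
  </INFO>
  <INFO name="publisher" value="Example Data Centre">Data centre</INFO>
  <RESOURCE name="first">
    <INFO name="ivoid" value="ivo://org.example/first"/>
    <!-- the same INFO name may occur more than once -->
    <INFO name="creator" value="Doe J.">Author</INFO>
    <INFO name="creator" value="Roe R.">Author</INFO>
    <INFO name="rights_uri"
      value="https://spdx.org/licenses/CC-BY-4.0.html"/>
    <TABLE> .... </TABLE>
  </RESOURCE>
  <RESOURCE name="second">
    <INFO name="ivoid" value="ivo://org.example/second"/>
    <INFO name="is_derived_from" value="ivo://org.example/first">
      Resource from which this one is derived
    </INFO>
    <TABLE> .... </TABLE>
  </RESOURCE>
</VOTABLE>
\end{verbatim}

A client reading such a document would therefore collect the values of each
INFO name into a list instead of expecting a single value.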
\section{Data Origin in Registry}
%The ivo-id, now available in VOTable, allows to query the resource metadata which are in the VO registry.\\
%The registry schema \citep{2018ivoa.spec.0625P} can be mapped with Data Origin items.
The VO registry schema, which contains most of the Data Origin information, is completed by metadata described in VOResource \citep{2018ivoa.spec.0625P}.
This information (assuming it has been filled in by the issuer of the registration) can be requested via the ivoid now available in the VOTable.
The mapping between VOResource and Data Origin items is given in appendix~\ref{sec:appendixB}.
% may be in an other note?
%
% UPDATE - licenses : type, uri as Datacite
% ADD - copyrights
% ADD - akcnowledgement
% ADD - curation text.. added values, column selection...., extract....
% list of Data Origin metadata in registry
\appendix
\section{Appendix, Cone search serialization}\label{sec:appendixA}
Example of a Simple Cone Search response in its VOTable serialization. Data Origin items are specified using \xmlel{INFO} elements.
\begin{verbatim}
<VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.ivoa.net/xml/VOTable/v1.1" xsi:schemaLocation=...>
<INFO name="service_protocol" value="ivo://ivoa.net/std/ConeSearch"/>
<INFO name="request_date" value="2022-10-30T12:08:00">Query execution date</INFO>
<INFO name="request" value="
https://vizier.cds.unistra.fr/viz-bin/conesearch/J/AJ/161/36/table8?
RA=28.4&amp;DEC=39.3&amp;SR=1"/>
<INFO name="contact" value="cds-question@unistra.fr">Publisher contact</INFO>
<INFO name="server_software" value="7.294">Software version</INFO>
<RESOURCE ID="yCat_51610036" name="J/AJ/161/36">
<DESCRIPTION>117 exoplanets in habitable zone with Kepler DR25</DESCRIPTION>
<INFO name="ivoid" value="ivo://cds.vizier/j/aj/161/36">
ivoid identifier to link registry
</INFO>
<INFO name="publisher" value="CDS">data centre</INFO>
<INFO name="reference_url"
value="https://cdsarc.cds.unistra.fr/viz-bin/cat/J/AJ/161/36"/>
<!-- Extra information from Data Origin (basic info) -->
<INFO name="citation" value="doi:10.26093/cds/vizier.51610036">
Identifier that can be used for citation
</INFO>
<INFO name="last_update_date" value="2022-10-07">Last Data Centre update</INFO>
<INFO name="rights_uri"
value="https://cds.unistra.fr/vizier-org/licences_vizier.html"/>
<INFO name="creator" value="Bryson S.">Author</INFO><!-- ORCID ? -->
<INFO name="cites" value="2021AJ....161...36B">
Reference article
</INFO>
<INFO name="editor" value="Astronomical Journal">
Journal of the reference article
</INFO>
<INFO name="publication_date" value="2021-03-16">
Date of first publication in the data centre
</INFO>
<INFO name="original_date" value="2021">Publication Date of the article</INFO>
....
<TABLE> .... </TABLE>
</RESOURCE>
</VOTABLE>
\end{verbatim}
\section{Appendix, VOResource and Data origin}\label{sec:appendixB}
Expected metadata (VOResource) with their equivalents in the DataCite schema (version 4.4) to provide Data Origin in the Registry.\\
%\begin{table}
\label{tab:voresourcemapping}
\begin{tabular}{|p{3cm}|p{3cm}|p{3cm}|p{5cm}|} \hline
\textbf{VOResource} & \textbf{Data Origin} & \textbf{DataCite} & \textbf{Explanation} \\ \hline
identifier &ivoid & Identifier & ivoid of resource(s) hosted by the service\\ \hline
title & & Title& resource title\\ \hline
shortName &&& Resource short name\\ \hline
altIdentifier & & AlternateIdentifier&
Alternate identifier accepts bibcode, DOI or URL. A DOI should be preferred to facilitate citation and linking with DataCite or Crossref. \\ \hline
\end{tabular}
\begin{tabular}{|p{3cm}|p{3cm}|p{3cm}|p{5cm}|} \hline
\multicolumn{4}{|l|}{\textbf{curation}} \\ \hline
publisher & publisher & Publisher &publisher (*)\\ \hline
creator & creator & Creator & author(s) (*)\\ \hline
contributor & & Contributor & contributor(s) (*)\\ \hline
date [Created]& publication\_date & Dates [created] & creation date (in data centre)\\ \hline
date [Updated]& last\_update\_date & Dates [updated] & last modification\\ \hline
? & original\_date & PublicationYear & publication year in data centre\\ \hline
version & resource\_version & Version &\\ \hline
contact & contact &&\\ \hline
\multicolumn{4}{l}{\small \footnotesize(*) terms allowing name and ORCID (\xmlel{altIdentifier} in VOResource)} \\
\end{tabular}
\begin{tabular}{|p{3cm}|p{3cm}|p{3cm}|p{5cm}|} \hline
\multicolumn{4}{|l|}{\textbf{content} } \\ \hline
source & article & RelatedIdentifier (*) & bibcode\\ \hline
referenceURL & reference\_url & & Landing page \\ \hline
type & & ResourceType & Resource type (catalog, etc)\\ \hline
description & & Description & Abstract\\ \hline
relationship & & RelatedIdentifiers &Link to a remote resource (recommended for linking to the original data centre) \\ \hline
relationshipType & cites, is\_derived\_from & relationType &\\ \hline
relatedResource & cites, is\_derived\_from & RelatedIdentifier & \\ \hline
\multicolumn{4}{l}{\small \footnotesize(*) DataCite sub-properties type=bibcode, relationType=IsSupplementTo} \\
\end{tabular}
\begin{tabular}{|p{3cm}|p{3cm}|p{3cm}|p{5cm}|} \hline
\multicolumn{4}{|l|}{\textbf{rights}} \\ \hline
rights & rights & Rights& License.
The \xmlel{rights} element accepts free text. However, it is preferable to provide a machine-readable license (*)
\\ \hline
URI & rights\_uri& rightsURI & license URL\\ \hline
& & rightsIdentifier & Standard license name, e.g., CC-BY.
Copyright is accepted by the FAIR principles, but a copyright statement is only a link to the data producer: it gives a contact point to users who would like to use the data. Copyright is simpler to implement for a data centre that provides a copy of the original resource, but its use is not well integrated into an interoperable workflow.
\\ \hline
\multicolumn{4}{p{\textwidth}}{\small \footnotesize(*) See SPDX list \url{https://spdx.org/licenses/} or Creative Commons licenses \url{https://creativecommons.org}}
\end{tabular}\\
%\caption{Expected metadata (VOResource) with their equivalent in Datacite schema (version 4.4) to provide Data Origin in the registry.}
%\end{table}
%%\textbf{Examples}
%%
%%Examples of rights serialization:
%%
%%\begin{verbatim}
%%<rights rightsURI="https://spdx.org/licenses/ODbL-1.0.html">
%% ODbL-1.0
%%</rights>
%%\end{verbatim}
%%
%%\begin{verbatim}
%%<rights rightsURI="https://creativecommons.org/licenses/by/4.0/">
%% Creative Commons Attribution 4.0
%%</rights>
%%\end{verbatim}
%%
%%
%%Example or relation ship :
%%Cite the original dataset using "source" (to link a bibliographic reference) or "relatedIdentifier" (to link a dataset)
%%
%%e.g.:
%%\begin{verbatim}
%%<relationship>
%% <relationshipType>Cites</relationshipType>
%% <relatedResource>doi: 10.5270/esa-qa4lep3 : Gaia DR3 ESA</relatedResource>
%%</relationship>
%%\end{verbatim}
%%
%%Example of Creator:
%%\begin{verbatim}
%%<creator>
%% <name>Bryson S.</name>
%% <altIdentifier>orcid:0000-0003-0081-1797<altIdentifier>
%%</creator>
%%\end{verbatim}
\section{Appendix, Citation Template} \label{sec:appendixC}
Example of a citation template that could be included in authors' articles. The template exploits the metadata described in this document:\\
\textbf{Template}:\\
We extract data published in <article> (<creator>, <original\_date>),
via <publisher> services (ivoa resource=<ivoid>, <publication\_date>)
using <service\_protocol> (version <server\_software>, executed at <request\_date>)\\
\textbf{Example}:\\
We extract data published in bibcode:2021AJ....161...36B (Bryson S., 2021),
via CDS services (ivoa resource=ivo://cds.vizier/j/aj/161/36, 2021-03-16)
using Simple Cone Search 1.03 (version 7.294, executed at 2022-10-30)
\section{Appendix, Changes from Previous Versions}
%No previous versions yet.
% these would be subsections "Changes from v. WD-..."
% Use itemize environments.
%\subsection{Data Origin in the VO Version 1.0}
\begin{itemize}
\item Changed vocabulary: \\
\textit{resource\_date} becomes \textit{last\_update\_date},
\textit{rights} and \textit{copyrights} become \textit{rights\_uri} and \textit{rights} (VOResource),
\textit{version} becomes \textit{server\_software} (https://ivoa.net/documents/Notes/softid/),
\textit{publication\_id} becomes \textit{citation} (DALI),
\textit{landing\_page} becomes \textit{reference\_url} (VOResource),
\textit{relation\_type} and \textit{related\_resource} are replaced by the specific relations \textit{cites} and \textit{is\_derived\_from} (VOResource)
\item Removed \textit{curation\_level},
\textit{request\_post},
\textit{rights\_type}
\item New items: \textit{original\_date}, \textit{article}
\item Clarified date items and multiple INFO in VOTable.
\item Proposed adding human-readable text to the VOTable serialization.
\item Language smoothing
\end{itemize}
\bibliography{ivoatex/ivoabib,ivoatex/docrepo,local}
\end{document}