Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error converting xml to dataless #53

Open
anss-sis opened this issue Jul 28, 2020 · 35 comments
Open

Error converting xml to dataless #53

anss-sis opened this issue Jul 28, 2020 · 35 comments

Comments

@anss-sis
Copy link

The stationxml seed converter failed to generate dataless from https://files.anss-sis.scsn.org/preview/WR_SHER.xml . Note this file is available only for a day. I was not able to attach it to the ticket, I can email it to you if you prefer.

Command ran:
/usr/bin/java -jar stationxml-seed-converter-2.1.0.jar -s -o WR_SHER.dataless WR_SHER.xml

Output:
java.lang.NumberFormatException: Couldn't format number!-150.1 -150.1 [5 6 ]
at edu.iris.dmc.io.SeedStringBuilder.appendLocalDepth(SeedStringBuilder.java:204)
at edu.iris.dmc.seed.control.station.B052.toSeedString(B052.java:296)
at edu.iris.dmc.seed.AbstractBlockette.getSize(AbstractBlockette.java:72)
at edu.iris.dmc.seed.Volume.add(Volume.java:266)
at edu.iris.dmc.station.converter.XmlToSeedDocumentConverter.convert(XmlToSeedDocumentConverter.java:112)
at edu.iris.dmc.station.converter.XmlToSeedDocumentConverter.convert(XmlToSeedDocumentConverter.java:48)
at edu.iris.dmc.station.converter.XmlToSeedFileConverter.convert(XmlToSeedFileConverter.java:426)
at edu.iris.dmc.station.converter.XmlToSeedFileConverter.convert(XmlToSeedFileConverter.java:70)
at edu.iris.dmc.station.Application.convert(Application.java:189)
at edu.iris.dmc.station.Application.run(Application.java:127)
at edu.iris.dmc.station.Application.main(Application.java:70)
edu.iris.dmc.station.mapper.MetadataConverterException: java.lang.NumberFormatException: Couldn't format number!-150.1 -150.1 [5 6 ]
at edu.iris.dmc.station.converter.XmlToSeedDocumentConverter.convert(XmlToSeedDocumentConverter.java:329)
at edu.iris.dmc.station.converter.XmlToSeedDocumentConverter.convert(XmlToSeedDocumentConverter.java:48)
at edu.iris.dmc.station.converter.XmlToSeedFileConverter.convert(XmlToSeedFileConverter.java:426)
at edu.iris.dmc.station.converter.XmlToSeedFileConverter.convert(XmlToSeedFileConverter.java:70)
at edu.iris.dmc.station.Application.convert(Application.java:189)
at edu.iris.dmc.station.Application.run(Application.java:127)
at edu.iris.dmc.station.Application.main(Application.java:70)
Caused by: java.lang.NumberFormatException: Couldn't format number!-150.1 -150.1 [5 6 ]
at edu.iris.dmc.io.SeedStringBuilder.appendLocalDepth(SeedStringBuilder.java:204)
at edu.iris.dmc.seed.control.station.B052.toSeedString(B052.java:296)
at edu.iris.dmc.seed.AbstractBlockette.getSize(AbstractBlockette.java:72)
at edu.iris.dmc.seed.Volume.add(Volume.java:266)
at edu.iris.dmc.station.converter.XmlToSeedDocumentConverter.convert(XmlToSeedDocumentConverter.java:112)

@timronan
Copy link
Collaborator

-150.1 used as a a local depth value, -150.1, is too long to be formatted in dataless seed. The SEED manual states that Blockette 52 Field 13 can only have a length of 5 characters and this value is a length of 6. The conversion cannot occur because the output dataless would be off by 1 byte which would create a corrupt output dataless file.

The stationmaster-seed-converter is working as expected in this case. The stationxml is not incorrect but it is incompatible with dataless SEED conversions. If this file needs to be converted consider removing the '-' or '.1' from -150.1.

In general, the converter cannot convert from stationxml fields to dataless that have values which do not fit in the dataless predefined byte count. Refer to the SEED manual when looking at specific fields.

@yazan-iris
Copy link
Contributor

The converter should take care of this.

@yazan-iris yazan-iris reopened this Jul 29, 2020
@chad-earthscope
Copy link

I'm not sure what a negative value for a depth means at this scale, is the sensor on a 150.1m tower?

This field is "The depth of the sensor under the local surface ground level, in positive meters." Negative is allowed if defining something above the surface (on a pier, platform, etc.), but it's rarely used for that because it's usually not at all important for whatever is being recorded.

@yazan-iris
Copy link
Contributor

The error you received is due to invalid input value (- 150), it is not because the number is too long even though the message states so. A more appropriate exception should be thrown. The Seed manual has this for description:
B052: 13 Local depth (m) D 5 “###.#”

I will leave this open so we fix the exception description.

@yazan-iris
Copy link
Contributor

Seed does not allow negative numbers for depth.

@timronan
Copy link
Collaborator

That's true, it seems like we should fix the exception rather than the metadata. The converter should make minimal metadata alterations. Users should fix the data after the exception is thrown.

@chad-earthscope
Copy link

Seed does not allow negative numbers for depth.

I agree, that that is best interpretation based on the field pattern. Unfortunately, the text description mentions negative numbers, so there is a conflict in the spec. The wording mentions 'elevations' but the values were adjusted a bit, clearly wasn't very thoroughly edited. So who knows what the consensus intention was.

@timronan
Copy link
Collaborator

@chad-iris it is true that the SEED manual is contradictory and that is why I was initially confused about whether the length or the negative sign caused this exception. I just replaced <Depth>-150.1</Depth> with <Depth>-150</Depth> and received the error below. So we know it is not a length issue. Maybe we should strip negative sign on depth when converting from StationXML to SEED. @chad-iris is correct that a negative depth is confusing and should maybe reported as an elevation or is a data error. Stripping the negative sign could change the position that the data creator intended to use.

java.lang.NumberFormatException: Couldn't format number!-150.0 -150.0 [5 6 ]
at edu.iris.dmc.io.SeedStringBuilder.appendLocalDepth(SeedStringBuilder.java:204)
at edu.iris.dmc.seed.control.station.B052.toSeedString(B052.java:296)
at edu.iris.dmc.seed.AbstractBlockette.getSize(AbstractBlockette.java:72)
at edu.iris.dmc.seed.Volume.add(Volume.java:266)
at edu.iris.dmc.station.converter.XmlToSeedDocumentConverter.convert(XmlToSeedDocumentConverter.java:112)
at edu.iris.dmc.station.converter.XmlToSeedDocumentConverter.convert(XmlToSeedDocumentConverter.java:48)
at edu.iris.dmc.station.converter.XmlToSeedFileConverter.convert(XmlToSeedFileConverter.java:426)
at edu.iris.dmc.station.converter.XmlToSeedFileConverter.convert(XmlToSeedFileConverter.java:70)
at edu.iris.dmc.station.Application.convert(Application.java:189)
at edu.iris.dmc.station.Application.run(Application.java:127)
at edu.iris.dmc.station.Application.main(Application.java:70)
edu.iris.dmc.station.mapper.MetadataConverterException: java.lang.NumberFormatException: Couldn't format number!-150.0 -150.0 [5 6 ]
at edu.iris.dmc.station.converter.XmlToSeedDocumentConverter.convert(XmlToSeedDocumentConverter.java:329)
at edu.iris.dmc.station.converter.XmlToSeedDocumentConverter.convert(XmlToSeedDocumentConverter.java:48)
at edu.iris.dmc.station.converter.XmlToSeedFileConverter.convert(XmlToSeedFileConverter.java:426)
at edu.iris.dmc.station.converter.XmlToSeedFileConverter.convert(XmlToSeedFileConverter.java:70)
at edu.iris.dmc.station.Application.convert(Application.java:189)
at edu.iris.dmc.station.Application.run(Application.java:127)
at edu.iris.dmc.station.Application.main(Application.java:70)
Caused by: java.lang.NumberFormatException: Couldn't format number!-150.0 -150.0 [5 6 ]
at edu.iris.dmc.io.SeedStringBuilder.appendLocalDepth(SeedStringBuilder.java:204)
at edu.iris.dmc.seed.control.station.B052.toSeedString(B052.java:296)
at edu.iris.dmc.seed.AbstractBlockette.getSize(AbstractBlockette.java:72)
at edu.iris.dmc.seed.Volume.add(Volume.java:266)
at edu.iris.dmc.station.converter.XmlToSeedDocumentConverter.convert(XmlToSeedDocumentConverter.java:112)
... 6 more
[2020-07-29 12:31:24] [SEVERE] edu.iris.dmc.station.Application run: edu.iris.dmc.station.Application.exitWithError(Application.java:204)
edu.iris.dmc.station.Application.convert(Application.java:194)
edu.iris.dmc.station.Application.run(Application.java:127)
edu.iris.dmc.station.Application.main(Application.java:70)

@yazan-iris
Copy link
Contributor

That's true, it seems like we should fix the exception rather than the metadata. The converter should make minimal metadata alterations. Users should fix the data after the exception is thrown.

Don't mean to drag this further but we do truncate numbers/text to fit Seed required field lengths, I believe its a good practice. It is not the case here though. This is about Seed not accepting negative numbers, the user should fix the number and we should log a better message. if(depth<0)throw new SeedException("B052: Field:13 Local depth cannot be negative");

@yazan-iris
Copy link
Contributor

We do not however change number signs, throwing a better exception is the way to go (I think).

@chad-earthscope
Copy link

I agree with @yazan-iris, the converter should not be changing the sign of values. Certainly when it's not a clear violation of StationXML and SEED, which is not the case currently.

@timronan timronan reopened this Jul 29, 2020
@anss-sis
Copy link
Author

The field in question is not depth. It is elevation blockette 52 field 12. Negative values are permitted here. It is listed as type D, length 7 and format “-####.#” . In fact the number -150.1 is valid per SEED.

In case a longer value was present in the xml I think the converter should truncate it. It does it for latitude / longitude currently. I confirmed the behavior by publishing a station in SIS that has 7 digits after the decimal point for longitude. https://files.anss-sis.scsn.org/preview/BK_DCMP.xml and the converter truncated it to fit the SEED convention: https://files.anss-sis.scsn.org/preview/BK_DCMP.dataless

Prabha

@chad-earthscope
Copy link

The document referenced at https://files.anss-sis.scsn.org/preview/WR_SHER.xml contains:

<Channel code="HNN" startDate="2013-05-16T22:55:00+00:00" endDate="2014-10-14T15:30:00+00:00" locationCode="03">
...
  <Depth>-150.1</Depth>

So there is definitely a large negative depth value in what we are looking at converting.

@yazan-iris
Copy link
Contributor

The error you referenced above:

java.lang.NumberFormatException: Couldn't format number!-150.1 -150.1 [5 6 ]
at edu.iris.dmc.io.SeedStringBuilder.appendLocalDepth(SeedStringBuilder.java:204)

is localDepth specific and not Elevation, please confirm.

@timronan
Copy link
Collaborator

timronan commented Jul 29, 2020

The stationxml is <Depth>-150.1</Depth> not elevation. It translates to B52:F13 Local depth.

To be clear, this is a bug in the metadata. This issue originates from B52:F13 . Keep searching for the value -150.1 and you will find it. Use vim to search, it works well.

@anss-sis
Copy link
Author

Oh now I see that the depth is set to -150.1 for 3 channels. It is also set as the elevation for other channels and that threw me off. If you changing what error is thrown will you be able to write out the line number in the file where the error occurred.

@timronan
Copy link
Collaborator

We probably won't be able to write out a line number because of how the converter works. This error is thrown during the edu.iris.dmc.io.SeedStringBuilder.appendLocalDepth process which happens after the stationxml has already been unmarashelled. Writing a line number would probably require us to do a preemptive validation check, which would probably make the converter significantly less performant. We will work on the exception and see what we can do. For now, regex searching using the error messages has proven to be a robust method for finding isolating issues.

@chad-earthscope
Copy link

I agree a line number is not likely useful, an XML document can all be on one "line" for example.

How about including more context like the name of the element?

@anss-sis
Copy link
Author

I take back the line number request. But any extra information would be helpful. I have been in communication with the person who added this site to SIS. He said that he has successfully published sites in SIS with negative depth which he uses for structures (buildings, towers etc). And yes I can confirm that your converter has writted out negative depth for other sites like this one https://files.anss-sis.scsn.org/production/dataless/WR/WR_DCS.dataless
The SEED manual for blockette 52 field 13 notes have more detail which sort of contradicts the mask in the table. "The local depth or overburden of the instrument’s location. For downhole instruments, the depth of the instrument under the surface ground level. For underground vaults, the distance from the instrument to the local ground level above. Surface instruments can use zero. If negative elevations are less than -999 m, the decimal value should not be used. This allows depths up to -9999 m to be accommodated."

Can the converter truncate the depth to fit in the space available ? In this case it would have written out -150.

@yazan-iris
Copy link
Contributor

No negative values for depth per the mask in the Seed manual.

@timronan
Copy link
Collaborator

@anss-sis you are correct. -15 correctly converts. So the issue is actually a length issue and is not associated with the -. The converter adds .0 to a int depth value maintain the correct number of bytes. The problem is that we can't cover the use case, "This allows depths up to -9999 m to be accommodated." We can't deal with negative numbers with 3 decimal places or positive numbers with 4 decimal places. The problem about positive numbers with 4 decimal places is that that some filler value, probably a 0 would have to be inserted. As I originally stated, the converter is throwing an accurate error.

Truncating the negative sign - is dangerous because it changes the meaning of the data. If we truncate this negative than the users intended elevation value once again becomes a depth. The example above showed that the user wanted a negative depth value. Users need to purposely alter values. In general we try to avoid automatic data alterations because they confuse users.

Truncating decimal places to accommodate for negative values also seems dangerous. The outstanding questions are: do we want to accept a negative sign for B52:F13? I personally think this is not robust and may lead to problems in dependent programs, but the SEED manual description is confusing.
If we do accept a negative sign do we truncate decimal places to accommodate for it?
How do we correctly 0 pad values so we don't alter our byte sequence?

@yazan-iris
Copy link
Contributor

Negative depth is not allowed as described by the Seed manual, if the converter allows it then it is a bug and should be fixed accordingly.

@ghost
Copy link

ghost commented Jul 30, 2020

Sorry to butt in, but I feel as a daily user, I have a horse in this race.

The original XML is erroneous.

It appears that elevation and depth are getting mixed up several times.

Look at the two below examples:

<Channel code="HNE" startDate="2013-01-18T16:15:00+00:00" endDate="2013-05-16T22:54:00+00:00" locationCode="02">
...
<Latitude>38.030591</Latitude>
<Longitude>-121.745258</Longitude>
<Elevation>-14.8</Elevation>
<Depth>17.1</Depth>
<Channel code="HNE" startDate="2013-05-16T22:55:00+00:00" endDate="2014-10-14T15:30:00+00:00" locationCode="02">  
<Comment subject="Install a configured logger and its instruments">  
...  
<Latitude>38.030591</Latitude>
<Longitude>-121.745258</Longitude>
<Elevation>17.1</Elevation>
<Depth>-14.8</Depth>

It then switches back in later epochs to the first one.

Similar with 03-HNE the channel that started this conversation.

Went from:

<Elevation>-150.1</Elevation>
<Depth>152.4</Depth>

To

<Elevation>152.4</Elevation>
<Depth>-150.1</Depth>

Either the ground elevation rose 300 meters in the Bay area for a year and a half, while the seismometer was mounted on a 500ft tower, then sunk back down to exactly the original depths OR more likely its a typo and the person entering these numbers simply mixed up depth/elevation.

Perhaps the converter should flag the exception better, but this is clearly a metadata issue, not seismometers being put on towers.

@rcasey-earthscope
Copy link
Contributor

rcasey-earthscope commented Jul 30, 2020 via email

@anss-sis
Copy link
Author

@jholland-usgs - I will forward your comments to the user who maintains the metadata. He has made several changes to the channel metadata since the initial error and has published the stationxml and dataless successfully. The metadata problem is a red herring and so I would like to bring the conversation back to the converter.
@yazan-iris although the mask indicates no negative sign, the detailed notes that I quoted from the SEED manual indicates that negative sign is allowed for depth. Negative depth has been used in other channels and the stationxml-seed-converter handles it properly as long as it is > -100.
@timronan - The negative sign cannot be truncated since that would change the meaning of that value. Would writing it out as "-0150" or "-150." be allowed in SEED?

Prabha

@timronan
Copy link
Collaborator

@anss-sis we will follow the SEED manual mask definition because it is the most concise and leaves little room for interpretation. Furthermore, this is the historic standard. Negative signs will not be allowed in B52:F13. An error will be thrown if a negative value is encountered when converting from <depth></depth> to B52:F13.

IF depth < 0 THEN THROW depth exception THEN EXIT with ERROR

We are not going to write depth in SEED as "-0150" or "-150.". A hanging . seems like a very dangerous character. It likely changes the type of the value from a float to something else (ex. string) depending on the program reading the dataless. Because we have to consider dependent programs, we must follow the SEED manual mask and are not going to reformat dataless fields based on interpretation.

Any previous data that was converted with a negative depth values exploited a bug in the code, is incorrect metadata, and should be identified and corrected. Negative depth values should be reported in elevation B52:F12.

@anss-sis
Copy link
Author

anss-sis commented Aug 3, 2020

I'd like to request that this workaround --- adjusting the elevation that the depth is >=0 -- be done on part of the the converter and that the converter report a warning.
Modifying the source data is not ideal. The main reason is that it now makes the fields of depth and elevation ambiguous. Elevation could now have a mixture of the two values. It's one thing that dataless cannot handle negative depths - but should we push this deficiency onto FDSN StationXML - which can handle it? - Ellen Yu

@timronan
Copy link
Collaborator

timronan commented Aug 5, 2020

@anss-sis The converter cannot adjust data in this way. It is the users responsibility. We could potentially throw an accurate error that that will alert the user if the elevation or depth are less then 0, but we cannot change any data.

Can you clarify the logic of

adjusting the elevation that the depth is >=0.

Does that mean:

Elevation must be greater or equal to 0

Depth must be greater or equal to 0

or something else? Also can you clarify the conversion path that you want adjusted. StationXML to Dataless or Dataless to StationXML. The direction of conversion matters.

A negative depth is generally confusing and should be avoided.

@timronan
Copy link
Collaborator

@anss-sis have you thought about the logical statements that we are trying to catch? I would like to start working on this issue.

@anss-sis
Copy link
Author

Hi Tim,

We discussed it at length here. First of all - I should say that at some point SIS will stop creating dataless files as we push networks to use stationXML as the primary metadata format. So the behavior of the converter will stop having impact on SIS operations in the intermediate future.

While negative depth might be unusual or perhaps unwanted in weak motion seismic installations, they might be more common in building arrays. Also some of our user base are with volcano observatories -which may deploy environmental sensors and may want to describe their response. So whether they are accepted in dataless or not, we will be dealing with these type of installations. It would be more confusing to only list depths as positive values for these sites.

My original suggestion was if < 0, then set channel depth to 0 and make = + -- My understanding is in dataless, the channel depth cannot be negative but channel elevation could. This logic would keep channel depth >=0. Yes, it could be viewed as changing the data - but the change is consistent and I would argue similar to other transformation the converter does - such as truncating extra decimal places or changing the case of mixed case strings. The converter is simply taking what is legal in stationXML and converting it to its legal value in dataless.

Does that answer your question about the logic requested? - Ellen

@timronan
Copy link
Collaborator

That answers my question, thanks for the clarification. I'll get back to you with a response. We need to talk this over.

@timronan
Copy link
Collaborator

Hi Ellen,

This is the logic we thought may work for the conversion.

Conversion: Stationxml -> Dataless

IF (depth < 0) THEN depth = 0 AND elevation=elevation+depth

The SEED manual (pg 68) states:
To find the local ground depth, add the depth field to the instrument elevation.

Following this logic no information is lost but it is reported differently. What do you think?

@anss-sis
Copy link
Author

Yes - that is what I was thinking. And then maybe the code could report a warning about how it mapped the channel depth.
-Ellen

@timronan
Copy link
Collaborator

We can do this, it seems like the best solution. I'll work on the update and let you know when it is ready to test. Thanks for working with us to figure out how to handle this metadata conversion correctly.

@chad-earthscope
Copy link

On further reflection and internal discussion at the DMC, we are going to convert negative depth values in StationXML to negative values in dataless SEED. When necessary to fit into the width of the SEED field, the converter will reduce the resolution of the depth by rounding in order to match the original value as closely as possible. Specifically, values will be rounded to single decimal precision, the maximum resolution allowed by the field, and if still too large will be rounded to integer precision.

Rationale: The SEED manual is inconsistent in the description of a channel's depth, the "mask" defines an unsigned value whereas the description includes discussion of negative values. Either signed or unsigned values can be justified by the standard. Furthermore, there is clearly a use for network operators to document negative depths for sensors in buildings or on piers/platforms that are above the ground surface. Finally, a review of common SEED-reading software shows that the significant majority read this value as signed, including rdseed, and have no problems with negative values. The PDCC software was the only notable exception and care should be taken when using it with metadata that includes negative depth values.

As a general rule, we do not want to modify a metadata author's values, and to minimize modification when it cannot be avoided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants