Use "mihi" to enhance scientific name finding and parsing #230
Comments
Searching gnverifier database got 20 names with mihi:
|
Looks like
|
I don't worry about Union, uBio, ION, and EOL, as they are not human-curated, but AlgaeBase, CoL and WoRMS seem to have names with legitimate use of |
Many thanks, Dima! Name deduplication: I believe that for the sake of counting potentially affected names, the 20 name instances that you found can be deduplicated down to 15, as follows:
Deduplicated list of names:
|
Names by Plazi:
Scientific name: Lithobius (Polyrbothrus) caesar mihi
Scientific name: Lithobius (Polybothrus) leostygis subsp. mihi
Scientific name: Lithobius leostygis mihi
Result: The three (two when deduplicated) scientific name instances contributed by Plazi are false-positive digitization artifacts, including a misspelling.
Deduplicated list of names v.2 (Plazi names cleared):
|
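To make the counting step concrete, here is a minimal sketch of how the three Plazi strings quoted above collapse to two once the subgenus, rank marker, and `mihi` are stripped. The helper `rough_canonical` is hypothetical and written only for this illustration; it is not part of gnparser or gnverifier, and real deduplication would rely on the parser's canonical forms.

```python
import re

# Hypothetical helper for illustration only; not a gnparser/gnverifier function.
def rough_canonical(name):
    """Reduce a name string to a crude 'Genus epithet' key for counting duplicates."""
    s = re.sub(r"\([^)]*\)", " ", name)        # drop a subgenus given in parentheses
    s = re.sub(r"\b(subsp|var|f)\.", " ", s)   # drop infraspecific rank markers
    s = re.sub(r"\bmihi\b\.?", " ", s)         # drop the 'mihi' annotation
    return " ".join(s.split())                 # collapse leftover whitespace

plazi_names = [
    "Lithobius (Polyrbothrus) caesar mihi",
    "Lithobius (Polybothrus) leostygis subsp. mihi",
    "Lithobius leostygis mihi",
]

# dict.fromkeys keeps the first occurrence of each key, in input order.
unique_keys = list(dict.fromkeys(rough_canonical(n) for n in plazi_names))
print(unique_keys)
```

The misspelled subgenus disappears along with the parentheses here, which is why the sketch does not need to correct it.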
Anomaly: The name "Odonthophagus var. c mihi", coming from ION, has so many anomalies that it seems irrelevant to GNA for name finding. Overall, the name can be considered a false positive for mihi and can be deleted from the list.
Deduplicated list of names v.3:
@dimus, could someone check the "algal" and fungal names for you, so that we can know if they are true or false positives? A copy of the original publication would be desirable. |
Word Conferva geminata var. mihi Schwabe:
|
"Conferva geminata var. mihi Schwabe" may be hard to match. The combination is uncurated in AlgaeBase and there is no guarantee that it is an original combination. There are no recorded references for that combination. Confirming whether these are two combinations of the same name and whether the "mihi" is an artifact would require consulting with specialists familiar with the historical literature on Conferva and Oscillatoria. However, that is likely the case, as the author matches and there are currently combinations under both genera for a few species. |
So my understanding is that really we have only these known exceptions for the parsing rule:
|
Aeolesthes inhirsutus mihi seems another false positive. |
About "Eucyclops serrulatus mihi Dussart, Graf & Husson, 1966": The "author" is only Dussart, as he is solely responsible for the Copepoda in that publication. The name string "Eucyclops serrulatus var. mihi" is apparently styled correctly (pages 270 and 278). However, this is a printing artifact that became a database artifact. Dussart stated on pp. 270-271 (translated): "The differences existing between these two forms are not sufficient to give a name to the variety with the spine of P5 slender. I need only mention its existence...". Also, as per the first edition of the International Code of Zoological Nomenclature (1961), "Article 15. Names published after 1960. — After 1960, a new name proposed conditionally, or one proposed explicitly as the name of a "variety" or "form" [Art. 45e], is not available." (https://www.biodiversitylibrary.org/page/34584570). This further points to an unnamed form in Dussart (1966), so the "mihi" in this case is also a false positive that does not need to be added to the exceptions, at least from the nomenclatural point of view. |
Hmm, looks like situation is even more interesting with https://www.biodiversitylibrary.org/item/181042#page/535/mode/1up
I wonder if a better approach to |
Thank you @Archilegt for interesting information about |
Hi @dimus |
Hi @dimus
"...in zoology old names with var. or f. sometimes are promoted to subspecies rank?" |
Hi @dimus |
I do not have yet https://github.com/gnames/gnparser/blob/master/testdata/test_data.md#names-with-mihi I think it is reasonable enough to close the ticket for now, especially because the parser does not deal with names that happen in biological texts, and it is extremely rare to have If more concerns will appear about |
Dima, please note that |
Ah thank you for spotting it @Archilegt!
Making gnfinder ticket about it gnames/gnfinder#125 |
Ok. If the parsing of |
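@Archilegt, do you think it makes better sense to parse Characium obovatum mihi. var. longipes mihi as Characium obovatum with var. longipes mihi as an unparseable tail? The parser does assume that a string must have only one name. I tend to think about this string as an indication of implicit authorship in two places, kind of similar to Aus bus L. cus K. |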
"do you think it makes better sense to parse Characium obovatum mihi. var. longipes mihi as Characium obovatum with var. longipes mihi as an unparseable tail? The parser does assume that a string must have only one name." "I tend to think about this string as an indication of implicit authorship in two places, kind of similar to Aus bus L. cus K."
Example for Fig 3.
Does it make sense? |
I think what you say is more of a job for Let's say with lowest parsing quality 4 and 2 warnings:
{
"parsed": true,
"quality": 4,
"qualityWarnings": [
{
"quality": 4,
"warning": "Unparsed tail"
},
{
"quality": 3,
"warning": "Ignored annotation `mihi`"
}
],
"verbatim": "Characium obovatum mihi. b. var. longipes mihi",
"normalized": "Characium obovatum",
"canonical": {
"stemmed": "Characium obouat",
"simple": "Characium obovatum",
"full": "Characium obovatum"
},
"cardinality": 2,
"tail": " b. var. longipes mihi",
"details": {
"species": {
"genus": "Characium",
"species": "obovatum"
}
},
"words": [
{
"verbatim": "Characium",
"normalized": "Characium",
"wordType": "GENUS",
"start": 0,
"end": 9
},
{
"verbatim": "obovatum",
"normalized": "obovatum",
"wordType": "SPECIES",
"start": 10,
"end": 18
}
],
"id": "e65f7279-c3f1-5719-9058-a3c024719fde",
"parserVersion": "v1.6.7"
} |
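As a rough sketch of how a downstream consumer might use output shaped like the proposal above, the snippet below reads such a record and reports the canonical name, the unparsed tail, and whether a `mihi` annotation was ignored. The field names are taken from the JSON in this thread; the function `summarize` and the trimmed sample record are assumptions made for illustration, and the behaviour of a released parser version may differ.

```python
import json

# Hypothetical helper; it only inspects fields that appear in the JSON proposed above.
def summarize(parsed_json):
    rec = json.loads(parsed_json)
    warnings = [w["warning"] for w in rec.get("qualityWarnings", [])]
    tail = rec.get("tail", "")
    return {
        "canonical": rec.get("canonical", {}).get("simple"),
        "tail": tail,
        "mihi_ignored": any("mihi" in w for w in warnings),
        "needs_review": bool(tail.strip()),  # a non-empty tail means part of the string was not parsed
    }

# Trimmed copy of the record proposed in this thread.
sample = json.dumps({
    "parsed": True,
    "quality": 4,
    "qualityWarnings": [
        {"quality": 4, "warning": "Unparsed tail"},
        {"quality": 3, "warning": "Ignored annotation `mihi`"},
    ],
    "verbatim": "Characium obovatum mihi. b. var. longipes mihi",
    "canonical": {"simple": "Characium obovatum"},
    "tail": " b. var. longipes mihi",
})

print(summarize(sample))
```

Flagging any record with a non-empty tail for review keeps the second name (var. longipes) from being silently lost, which is the concern raised earlier in the discussion.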
The Latin word "mihi" (literally "to me") was used by authors when proposing new scientific names, to mark a name as their own. The word could be used as a marker for "scientific name ends here", and could enhance scientific name finding if coupled to "search for scientific name 1, 2, 3 words ahead".
The word could also be used for adding "interpreted authorship" (author+date) to scientific name instances if coupled to the metadata of the publication (book, article) where the scientific name instance is matched, therefore potentially helping to disambiguate homonyms.
A quick glance at the occurrence of the word in BHL: https://www.biodiversitylibrary.org/search?searchTerm=mihi&stype=F#/titles
Maybe it would be worth trying at least the "scientific name ends here" suggestion? :)
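A minimal sketch of the "scientific name ends here" idea, under the assumption that the candidate name is found in the one to three words immediately before `mihi` and begins at the nearest capitalised, genus-like word. The function name `candidates_before_mihi` and the `max_words` parameter are hypothetical, and this is not how gnfinder works internally; a real finder would still need to verify each candidate.

```python
import re

# Hypothetical heuristic for illustration; not the gnfinder algorithm.
def candidates_before_mihi(text, max_words=3):
    """Return name candidates taken from the words immediately preceding 'mihi'."""
    tokens = re.findall(r"\S+", text)
    found = []
    for i, tok in enumerate(tokens):
        if tok.strip(".,;:()").lower() != "mihi":
            continue
        window = tokens[max(0, i - max_words):i]
        # Walk backwards to the nearest capitalised word and treat it as the genus.
        for start in range(len(window) - 1, -1, -1):
            if window[start][:1].isupper():
                found.append(" ".join(window[start:]))
                break
    return found

print(candidates_before_mihi("The species Lithobius leostygis mihi was collected near the cave."))
print(candidates_before_mihi("Characium obovatum mihi. b. var. longipes mihi"))
```

In the second string the later `mihi` yields no candidate because no capitalised word sits within three words of it, which illustrates why infraspecific names with two `mihi` markers need separate handling.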