The goal of this project is to extract scientific names from Ruhoff 1980.

- Cleaned up data file
- Make OCR
- Fix spaces in species names
- Fix commas which were recognized as periods
- Extract name part (the first column of `06-names.csv` is the place to fix errors)

Names | Number | Percentage |
---|---|---|
Total | 35487 | 100% |
All Matches | 26799 | 75.5% |
No Match | 8688 | 24.5% |
Canonical + Auth. Match | 22311 | 62.9% |
Canonical Match | 3448 | 9.7% |
Fuzzy Canonical Match | 1040 | 2.9% |
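
The percentage column can be rechecked from the counts alone. The following is a minimal sketch, not part of the project's pipeline: the `counts` dictionary and the `summarize` helper are hypothetical names introduced here, and the counts are copied by hand from the table. It derives the total and the number of matches, then rounds each share to one decimal place.

```python
# Counts copied by hand from the table above (assumption: these four
# categories are disjoint and together cover every extracted name).
counts = {
    "Canonical + Auth. Match": 22311,
    "Canonical Match": 3448,
    "Fuzzy Canonical Match": 1040,
    "No Match": 8688,
}


def summarize(counts):
    """Return {row name: (count, percentage of total)} for each table row."""
    total = sum(counts.values())
    matched = total - counts["No Match"]
    rows = {"Total": total, "All Matches": matched, **counts}
    # Percentages are rounded to one decimal place.
    return {name: (n, round(100 * n / total, 1)) for name, n in rows.items()}


if __name__ == "__main__":
    # Print the rows in the same pipe-separated layout as the table.
    for name, (n, pct) in summarize(counts).items():
        print(f"{name} | {n} | {pct}% |")
```

Rerunning a check like this after correcting entries in `06-names.csv` keeps the counts and percentages in step with each other.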