As mentioned in a previous post, the Entomological Society of BC has been working hard to improve the Society’s impact by enhancing our online presence to reach more people in BC, as well as the rest of Canada and the world. We’ve undertaken a massive project to digitize our entire journal archive, and are in the process of moving all 108 volumes of the Journal of the ESBC online (plus occasional papers, supplementary reports, and the Quarterly Bulletins dating back to 1906!).
This project will provide unprecedented public access to the Society’s publications, and you can help! With scanning and OCR completed, we now need to extract the metadata for each article, including the abstracts and references, so that we can import them into our new online journal system (http://journal.entsocbc.ca). We are recruiting volunteers to assist with this step, as well as with creating cover images from the scanned volumes.
- Flexible and easy;
- Can be done from anywhere (you don’t have to live in BC!);
- Opportunity to explore the history of entomology in BC;
- Work closely with ESBC journal and web editors;
- Contribute directly to the establishment of a permanent, online, open-access repository of entomological knowledge in BC;
- Ideal for students or anyone with an interest in entomology and community service.
Contact Alex Chubaty (webmaster@entsocbc.ca) if you are interested in contributing time towards this project.
Some of this manual effort could be eliminated with a tool such as https://github.com/CrossRef/pdfextract. I’ve used it before, and it does a fairly good job of extracting article-level metadata and the references section. It won’t be perfect, but it should speed up the process. Once you have the references, you can also parse each one using the web service associated with http://biblio.globalnames.org.
…and further to my previous comment, if you are interested in extracting the scientific names in each article, you are welcome to make use of the web services for http://gnrd.globalnames.org and http://resolver.globalnames.org.
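As a rough sketch of how the name-finding and name-resolution services mentioned above might be called from Python: the endpoint paths (`name_finder.json`, `name_resolvers.json`), the `text` and `names` parameters, and the `|` separator are assumptions based on my recollection of those services, so please check them against the current API documentation before relying on them. The helper names `gnrd_url` and `resolver_url` are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoints and parameter names; verify against each service's API docs.
GNRD_ENDPOINT = "http://gnrd.globalnames.org/name_finder.json"
RESOLVER_ENDPOINT = "http://resolver.globalnames.org/name_resolvers.json"


def gnrd_url(article_text):
    """Build a name-finding request URL for a short block of OCR'd text."""
    return GNRD_ENDPOINT + "?" + urlencode({"text": article_text})


def resolver_url(names):
    """Build a name-resolution request URL for a list of candidate names."""
    # Assumption: the resolver accepts several names joined by "|".
    return RESOLVER_ENDPOINT + "?" + urlencode({"names": "|".join(names)})


if __name__ == "__main__":
    # Print the URLs only; no network request is made here.
    print(gnrd_url("Apis mellifera was collected near Victoria."))
    print(resolver_url(["Apis mellifera", "Bombus terricola"]))
```

The resulting URLs could then be fetched with `urllib.request.urlopen` and the responses decoded as JSON. For a full article, a GET query string would likely be too long, so sending the text in a POST body (if the service supports it) would be the more sensible route.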