Human language, whether text or speech, is varied and nuanced, and it is not the best exclusive medium for learning complex systems, concepts, or processes.
Curate a corpus of graphics and diagrams that illustrate how blockchains, especially Cardano, work, and annotate them with machine-translated captions in 58 languages.
This is the total amount allocated to A Graphic Cardano Knowledge Corpus.
Project Goals:
- Cultivate and curate a Cardano-related graphic knowledge corpus
The Primary Issues:
- Learning about Cardano shouldn't revolve exclusively around text
- There are too many languages to close Cardano's non-English gap one language at a time, one text at a time
- Image-based artifacts produced by language-specific endeavors are often not shared across other languages
This Project:
We will cultivate and curate a Cardano-related graphic knowledge corpus. We will then use OCR to recognize the text embedded in each image and generate machine-translated captions for each image in 58 languages. Artifacts in the corpus will be available as supplementary materials for other language-specific endeavors, and those endeavors will be invited to add their images to the corpus. Finally, multilingual Cardano community members will be paid to confirm or improve the image captions and to verify the quality of the images. Priority for human vetting of machine translation will be given to Chinese, Spanish, French, Japanese, Portuguese, Hindi, Arabic, Bengali, and Amharic.
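The captioning flow described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `ocr` and `translate` callables stand in for whatever OCR and machine translation services are eventually chosen, and the class names, function names, and ISO 639-1 language codes are assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Languages the proposal prioritizes for human vetting (ISO 639-1 codes,
# chosen here for illustration): Chinese, Spanish, French, Japanese,
# Portuguese, Hindi, Arabic, Bengali, Amharic.
PRIORITY_LANGUAGES = ["zh", "es", "fr", "ja", "pt", "hi", "ar", "bn", "am"]


@dataclass
class Caption:
    language: str
    text: str
    human_verified: bool = False  # set to True once a community member confirms it


@dataclass
class CorpusImage:
    image_id: str
    embedded_text: str = ""
    captions: Dict[str, Caption] = field(default_factory=dict)


def caption_image(
    image_id: str,
    image_bytes: bytes,
    ocr: Callable[[bytes], str],             # hypothetical OCR backend
    translate: Callable[[str, str], str],    # hypothetical MT backend: (text, lang) -> text
    languages: Iterable[str],
) -> CorpusImage:
    """Run OCR once, then machine-translate the recognized text per language."""
    text = ocr(image_bytes)
    record = CorpusImage(image_id=image_id, embedded_text=text)
    for lang in languages:
        record.captions[lang] = Caption(language=lang, text=translate(text, lang))
    return record


def vetting_queue(record: CorpusImage) -> List[Caption]:
    """Return captions still awaiting human review, priority languages first."""
    pending = [c for c in record.captions.values() if not c.human_verified]

    def rank(caption: Caption) -> int:
        if caption.language in PRIORITY_LANGUAGES:
            return PRIORITY_LANGUAGES.index(caption.language)
        return len(PRIORITY_LANGUAGES)

    return sorted(pending, key=rank)


# Stub backends so the sketch runs without any cloud service.
fake_ocr = lambda data: "Stake pool"
fake_translate = lambda text, lang: f"[{lang}] {text}"

record = caption_image("img-001", b"", fake_ocr, fake_translate, ["de", "es", "zh"])
print([c.language for c in vetting_queue(record)])  # ['zh', 'es', 'de']
```

Passing the OCR and translation steps in as callables keeps the sketch independent of any one provider, so a real service could be swapped in later without changing the queueing logic.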
Project Timeline:
In the first 30 days:
- Establish initial criteria for Graphic Corpus inclusion
- Establish initial format for Graphic Corpus storage/location cataloging
- Establish initial Graphic Corpus metadata
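As one possible starting point for the metadata and cataloging items above, a corpus entry could be modeled as a small record that serializes to JSON. Every field name, the list of required fields, and the sample values below are assumptions made for illustration; the actual schema is one of the deliverables of this phase.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class GraphicRecord:
    """Hypothetical catalog entry for one graphic in the corpus."""
    graphic_id: str
    title: str
    storage_uri: str      # where the image file is stored
    source_url: str       # where the graphic was originally published, if known
    license: str
    contributor: str
    topics: List[str] = field(default_factory=list)


# Assumed minimum needed before an entry can be cataloged.
REQUIRED_FIELDS = ("graphic_id", "title", "storage_uri", "license")


def missing_fields(record: GraphicRecord) -> List[str]:
    """List the required fields that are empty or whitespace-only."""
    data = asdict(record)
    return [name for name in REQUIRED_FIELDS if not data[name].strip()]


record = GraphicRecord(
    graphic_id="g-0001",
    title="How staking delegation works",
    storage_uri="corpus/g-0001.png",
    source_url="",
    license="CC-BY-4.0",
    contributor="example-contributor",
    topics=["staking"],
)

print(missing_fields(record))                 # []
print(json.dumps(asdict(record), indent=2))   # JSON form suitable for a catalog file
```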
In the first 90 days:
- Build Corpus with Seed Graphics and Diagrams
- Test Auto-Translate Capabilities on Seeded Corpus
- Establish vetting protocols and processes for connecting and adding graphics to the corpus
- Build initial Cardano Corpus interface across platforms (e.g., web, Reddit, MMS...)
In the first 180 days:
- Connect initial Corpus to platforms
- Invite existing communities and projects to connect their graphic content to the corpus
- Establish and test processes for graphic and translated caption quality confirmation/improvement via multilingual Cardano community members
- Public Beta Release
By the end of a year:
- Integrate community-contributed/connected graphics into the seeded corpus
- Implement community graphic and translation quality confirmation/improvement process
- Connect Cardano graphic corpus with vetted community-contributed/connected graphics to platforms
- Integrate corpus into larger existing service to ensure project continuance beyond funding period
Project Budget:
Design and Development:
- Corpus Planning, Design, Development $50 per hr * 100 hrs
APIs, Compute-Time, Storage:
- Auto-translation $10 per million characters of translation * 12 months (MS Azure)
- Cloud App Service $70 per month * 11 months (MS Azure)
- Cloud Storage Service $30 per month * 11 months (MS Azure)
- Auto-Caption Generation (Machine Vision OCR + Description) $2.50 per thousand images * 100 image blocks (1000 each)
Ingestion, Evaluation, and Translation:
- Graphic community resources ingestion $100 (ADA equivalent) per month split between all contributors * 9 months
- Graphic quality evaluation $200 (ADA equivalent) per month split between all contributors * 9 months
- Graphic caption translation $500 (ADA equivalent) per month split between all contributors * 9 months
Total: $13,670
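The total can be checked by summing the line items above. The short script below is only a verification aid; it assumes one million characters of translation per month for the auto-translation line, which is the reading under which the items add up to the stated total.

```python
# Each entry: (label, unit cost in USD, quantity)
LINE_ITEMS = [
    ("Corpus planning, design, development", 50.00, 100),  # $ per hour * hours
    ("Auto-translation", 10.00, 12),         # assumes 1M characters per month * 12 months
    ("Cloud app service", 70.00, 11),        # $ per month * months
    ("Cloud storage service", 30.00, 11),    # $ per month * months
    ("Auto-caption generation", 2.50, 100),  # $ per 1000 images * 100 blocks
    ("Graphic resource ingestion", 100.00, 9),   # $ per month * months
    ("Graphic quality evaluation", 200.00, 9),   # $ per month * months
    ("Graphic caption translation", 500.00, 9),  # $ per month * months
]


def budget_total(items):
    """Sum unit cost times quantity over all line items."""
    return sum(cost * qty for _, cost, qty in items)


for label, cost, qty in LINE_ITEMS:
    print(f"{label:<40} ${cost * qty:>9,.2f}")

total = budget_total(LINE_ITEMS)
print(f"Total: ${total:,.0f}")  # Total: $13,670
```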
PhD Learning Technologies
Teaches Natural Language Processing grad course
Designs multimodal web apps
10+yr dev experience