For my link archive: [Archive.is] Flexible and Economical UTF-8 Decoder.
Be sure to read the whole article there: the explanation of the initial algorithm is important, and the final algorithm is towards the end.
The foundation is a state machine driven by a lookup table: the table classifies each input byte, and that class selects both the state entered from the initial state and every transition to the subsequent states.
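To make the idea concrete for myself, here is a minimal sketch in C of a table-driven UTF-8 state machine. It is not the table from the article: the class numbering, the state numbering and all names (`utf8_init`, `utf8_step`, `utf8_count`, `byte_class`, `transition`, `lead_mask`) are my own assumptions for illustration, and the class table is filled at run time instead of being packed into one constant array as the article does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state numbering for this sketch (not the article's):
   0 accept, 1 reject, 2/3/4 = one/two/three continuation bytes still needed,
   5 after E0, 6 after ED, 7 after F0, 8 after F4 (restricted second byte). */
enum { UTF8_ACCEPT = 0, UTF8_REJECT = 1, UTF8_STATES = 9, UTF8_CLASSES = 12 };

static uint8_t byte_class[256]; /* byte -> class, filled by utf8_init() */

/* Payload bits kept from a lead byte, indexed by class. */
static const uint8_t lead_mask[UTF8_CLASSES] = {
    0x7F, 0, 0, 0, 0, 0x1F, 0x0F, 0x0F, 0x0F, 0x07, 0x07, 0x07
};

/* transition[state][class] -> next state */
static const uint8_t transition[UTF8_STATES][UTF8_CLASSES] = {
    /* class:       0  1  2  3  4  5  6  7  8  9 10 11 */
    /* 0 accept */ {0, 1, 1, 1, 1, 2, 5, 3, 6, 7, 4, 8},
    /* 1 reject */ {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},
    /* 2 need 1 */ {1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1},
    /* 3 need 2 */ {1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1},
    /* 4 need 3 */ {1, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1},
    /* 5 E0     */ {1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1},
    /* 6 ED     */ {1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1},
    /* 7 F0     */ {1, 1, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1},
    /* 8 F4     */ {1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},
};

/* Fill the byte -> class table; call once before decoding. */
static void utf8_init(void) {
    for (int b = 0; b < 256; b++) {
        uint8_t c;
        if (b <= 0x7F) c = 0;        /* ASCII */
        else if (b <= 0x8F) c = 1;   /* continuation 80..8F */
        else if (b <= 0x9F) c = 2;   /* continuation 90..9F */
        else if (b <= 0xBF) c = 3;   /* continuation A0..BF */
        else if (b <= 0xC1) c = 4;   /* C0, C1: always invalid */
        else if (b <= 0xDF) c = 5;   /* 2-byte lead */
        else if (b == 0xE0) c = 6;   /* 3-byte lead, overlong guard */
        else if (b == 0xED) c = 8;   /* 3-byte lead, surrogate guard */
        else if (b <= 0xEF) c = 7;   /* other 3-byte leads */
        else if (b == 0xF0) c = 9;   /* 4-byte lead, overlong guard */
        else if (b <= 0xF3) c = 10;  /* other 4-byte leads */
        else if (b == 0xF4) c = 11;  /* 4-byte lead, max U+10FFFF */
        else c = 4;                  /* F5..FF: always invalid */
        byte_class[b] = c;
    }
}

/* Feed one byte; returns the next state and updates *cp.
   *cp holds a complete code point whenever UTF8_ACCEPT is returned. */
static uint32_t utf8_step(uint32_t state, uint32_t *cp, uint8_t byte) {
    assert(state < UTF8_STATES);
    uint8_t cls = byte_class[byte];
    *cp = (state == UTF8_ACCEPT)
              ? (uint32_t)(byte & lead_mask[cls])
              : (*cp << 6) | (uint32_t)(byte & 0x3F);
    return transition[state][cls];
}

/* Count code points in s[0..n); returns -1 on invalid or truncated input. */
static long utf8_count(const char *s, size_t n) {
    uint32_t state = UTF8_ACCEPT, cp = 0;
    long count = 0;
    for (size_t i = 0; i < n; i++) {
        state = utf8_step(state, &cp, (uint8_t)s[i]);
        if (state == UTF8_REJECT) return -1;
        if (state == UTF8_ACCEPT) count++;
    }
    return state == UTF8_ACCEPT ? count : -1;
}
```

The per-byte work in `utf8_step` is two table lookups and a shift/mask without branching on the byte value; all the validity rules (overlong forms, surrogates, values above U+10FFFF) live in the tables. A caller runs `utf8_init()` once, then feeds bytes through `utf8_step` and reads `cp` each time the state returns to `UTF8_ACCEPT`.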
Related (and reminder to check what David did):
- [WayBack part 1, WayBack part 2] Having been a little underwhelmed by the performance of TStreamReader when reading huge text files line by line, I attempted to roll my own. I managed … – David Heffernan – Google+
- [WayBack] More FAQs about Unicode in Tiburón – Community Blogs – Embarcadero Community (TFDD, Text File Device Drivers, are likely a no-no)
- [WayBack] FDCFastTextFile.pas – Text File Device Driver example – Pastebin.com
–jeroen