Captions and TV Archives // Tracey Jaquith, Internet Archive
Notes:
- Archive.org/TV
- 2 million news shows online, with searchable captions
- The "Third Eye": reading and analyzing the "lower third" of the screen. What are the channels reporting? What do they summarize, and how?
- Uses tesseract-ocr and simhash to pull lines from multiple news channels
- www.twitter.com/tvThirdEye: watches CNN and tweets the headlines.
- CLIPS: small JSON annotations that set start/end points. Uses JSONPatch.
- Popcorn.js at IA: http://archive.org/pop/
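The notes don't say exactly how simhash is applied in the Third Eye. One plausible role, assumed here, is collapsing the same lower-third line, OCR'd from many consecutive frames or channels, into a single entry. The sketch below is a generic 64-bit simhash over character trigrams, with MD5 as the per-shingle hash. The trigram size, the hash choice, the 3-bit threshold, and the helper names are illustrative assumptions, not the Internet Archive's settings.

```python
import hashlib

BITS = 64  # fingerprint width; the first 8 bytes of MD5 give exactly 64 bits


def _shingles(text: str, n: int = 3) -> list[str]:
    # Normalize case and whitespace, then take overlapping character n-grams,
    # which tolerate single-character OCR errors better than whole words do.
    norm = " ".join(text.lower().split())
    if len(norm) < n:
        return [norm] if norm else []
    return [norm[i:i + n] for i in range(len(norm) - n + 1)]


def _hash64(token: str) -> int:
    # 64-bit token hash taken from the first 8 bytes of an MD5 digest.
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str) -> int:
    # Each shingle votes +1/-1 on every bit position; the sign of the tally
    # becomes that bit of the fingerprint.
    tally = [0] * BITS
    for shingle in _shingles(text):
        h = _hash64(shingle)
        for i in range(BITS):
            tally[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if tally[i] > 0)


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")


def dedupe_lines(lines: list[str], max_distance: int = 3) -> list[str]:
    # Keep a line only if its fingerprint is more than max_distance bits away
    # from every fingerprint already kept (a linear scan, fine for a sketch).
    kept: list[str] = []
    fingerprints: list[int] = []
    for line in lines:
        fp = simhash(line)
        if all(hamming(fp, seen) > max_distance for seen in fingerprints):
            kept.append(line)
            fingerprints.append(fp)
    return kept


if __name__ == "__main__":
    ocr_lines = [
        "BREAKING NEWS: Senate passes bill",
        "breaking news:  senate passes bill",  # same line, different case/spacing
        "Weather: storm moves up the coast",
    ]
    print(dedupe_lines(ocr_lines))
```

Here the second line normalizes to the same text as the first, so it gets the same fingerprint and is dropped. A production pipeline would more likely index fingerprints by bit blocks instead of scanning linearly, and the threshold would need tuning against real OCR noise.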
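The clip mechanism (a small JSON annotation whose start/end points are changed with JSON Patch) can be illustrated with a minimal RFC 6902 applier. The clip fields (`identifier`, `start`, `end`, `title`) and the `apply_patch` function are hypothetical. This sketch handles only `add`, `remove`, and `replace` on JSON objects, not arrays or the `move`/`copy`/`test` operations, and it is not the Internet Archive's implementation.

```python
import copy


def _parse_pointer(pointer: str) -> list[str]:
    # RFC 6901 JSON Pointer: "/a/b" -> ["a", "b"]; decode "~1" before "~0".
    if not pointer.startswith("/"):
        raise ValueError(f"unsupported JSON Pointer: {pointer!r}")
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply_patch(doc: dict, patch: list[dict]) -> dict:
    """Apply add/remove/replace operations to a copy of doc.

    Works on a deep copy, so if any operation fails the original is untouched,
    matching RFC 6902's all-or-nothing semantics.
    """
    result = copy.deepcopy(doc)
    for op in patch:
        tokens = _parse_pointer(op["path"])
        parent = result
        for tok in tokens[:-1]:
            parent = parent[tok]  # walk down through nested objects
        if not isinstance(parent, dict):
            raise TypeError("this sketch only patches JSON objects, not arrays")
        key, kind = tokens[-1], op["op"]
        if kind == "add":
            parent[key] = op["value"]  # adds or overwrites the member
        elif kind == "remove":
            del parent[key]  # KeyError if the member is missing
        elif kind == "replace":
            if key not in parent:
                raise KeyError(key)  # replace requires an existing target
            parent[key] = op["value"]
        else:
            raise ValueError(f"unsupported op in this sketch: {kind!r}")
    return result


if __name__ == "__main__":
    clip = {"identifier": "example_show", "start": 120.0, "end": 150.5}
    patch = [
        {"op": "replace", "path": "/start", "value": 118.0},
        {"op": "add", "path": "/title", "value": "Senate vote"},
    ]
    print(apply_patch(clip, patch))
```

Sending a short patch instead of the whole annotation keeps each edit small and makes the change itself easy to store or replay.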