Text-based audio editing in radio production, BBC Dialogger // Chris Baume, BBC R&D


  • 9 stations, 34 million listeners: the BBC is pretty big
  • Transcription is a rough, manual process (just enough for producers to understand what's going on), slow and a "waste of their time"
  • Producers who have the money pay other people to transcribe, for example in Australia, where the time difference means the work gets done overnight
  • 2002: SCANMail used timed transcripts to navigate voicemail using text
  • Also 2002: SILVER, a video editing program
  • 2004: follow-up to SCANMail that let users edit the speech by editing the text
  • 2012: an interview video editor that works from transcripts across different camera angles and places the cuts automatically, so there are no awkward jump cuts
  • All of these examples were published by the ACM (Association for Computing Machinery): https://www.acm.org/publications/journals
  • Removing "ums" and other sounds
  • 2016: prototype to quickly revise spoken comments
  • Chris is now demo'ing some prototypes out of BBC R&D
  • HTML5Compositor
  • Demonstrates the Magic Pen tool: transcripts are printed and written on with a special pen that has a camera, and the edits are uploaded back to the document, so producers can work at their leisure, on the go, or out of the office
  • "Speech to text is a lossy process"
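
The idea running through the examples above (a timed transcript where editing the text edits the audio) can be illustrated with a minimal sketch. This is not how Dialogger or any of the cited systems is implemented; the `Word` type, the `FILLERS` set and the `keep_segments` function are hypothetical names, and the word timings are assumed to come from some speech-to-text system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcript word with its position in the audio (seconds)."""
    text: str
    start: float
    end: float


# Assumption: a tiny, hand-picked filler list, for illustration only.
FILLERS = {"um", "umm", "uh", "er"}


def keep_segments(words, deleted=frozenset(), drop_fillers=True):
    """Return (start, end) audio ranges to keep after text edits.

    `deleted` holds the indices of words the editor removed from the text.
    Runs of consecutive kept words are merged into a single range.
    """
    segments = []
    last_kept = None
    for i, word in enumerate(words):
        is_filler = drop_fillers and word.text.lower().strip(".,!?") in FILLERS
        if i in deleted or is_filler:
            continue
        if last_kept is not None and last_kept == i - 1:
            # Previous word was kept too: extend the current range.
            segments[-1] = (segments[-1][0], word.end)
        else:
            segments.append((word.start, word.end))
        last_kept = i
    return segments


transcript = [
    Word("So", 0.0, 0.2),
    Word("um", 0.2, 0.6),
    Word("radio", 0.6, 1.0),
    Word("is", 1.0, 1.1),
    Word("great", 1.1, 1.5),
]
print(keep_segments(transcript))
```

Deleting a word in the text simply drops its time range, which is why the accuracy of the word timings matters as much as the accuracy of the words themselves.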

Academic references:

  • Whittaker, Steve, et al. "SCANMail: a voicemail interface that makes speech browsable, readable and searchable." Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 2002.
  • Casares, Juan, et al. "Simplifying video editing with SILVER." CHI'02 Extended Abstracts on Human Factors in Computing Systems. ACM, 2002.
  • Whittaker, Steve, and Brian Amento. "Semantic speech editing." Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 2004.
  • Berthouzoz, Floraine, Wilmot Li, and Maneesh Agrawala. "Tools for placing cuts and transitions in interview video." ACM Trans. Graph. 31.4 (2012): 67-1.
  • Rubin, Steve, et al. "Content-based tools for editing audio stories." Proceedings of the 26th annual ACM symposium on User interface software and technology. ACM, 2013.
  • Sivaraman, Venkatesh, Dongwook Yoon, and Piotr Mitros. "Simplified Audio Production in Asynchronous Voice-Based Discussions." Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 2016.
  • Shin, Hijung Valentina, Wilmot Li, and Frédo Durand. "Dynamic Authoring of Audio with Linked Scripts." Proceedings of the 29th Annual Symposium on User Interface Software and Technology. ACM, 2016.

Ideas for next steps

  • Common base UI element for timed transcript editing
  • Google Docs style collaborative timed transcript editor/player
  • Better, more meaningful annotations (e.g. rate segments, then export those rated above 4 stars)
  • Template for EDL file generation
  • Embed transcript and annotations in audio file
  • Umm detection/removal (STT with umms?)
  • Automatic segmentation with tagging and summaries
  • Better time compression
  • Tools for recording multiple versions of a script
  • Digital pen with audio playback, natural annotation and live bidirectional sync
  • Smart correction by exposing STT graphs
  • Fast clipping of a live audio stream using transcripts
  • Bidirectional integration with a proper audio editing system
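
As a rough illustration of the "template for EDL file generation" idea, the sketch below turns a list of kept audio ranges into a plain-text edit decision list. The layout is a hypothetical one, only loosely inspired by timecode-based event lists, and is not guaranteed to import into any particular audio editor; the 25 fps timecode rate and the function names `frames_to_timecode` and `build_edl` are assumptions as well.

```python
FPS = 25  # Assumption: timecodes counted at 25 frames per second.


def frames_to_timecode(frames, fps=FPS):
    """Format a frame count as HH:MM:SS:FF."""
    total_seconds, ff = divmod(frames, fps)
    hh, rem = divmod(total_seconds, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def build_edl(segments, title="UNTITLED", fps=FPS):
    """Build a simple EDL from (start, end) ranges given in seconds.

    Each event line holds: event number, source in, source out,
    record in, record out. Events are laid end to end on the output.
    """
    lines = [f"TITLE: {title}"]
    record_pos = 0  # Current position on the output timeline, in frames.
    for number, (start, end) in enumerate(segments, start=1):
        src_in = round(start * fps)
        src_out = round(end * fps)
        rec_in = record_pos
        rec_out = record_pos + (src_out - src_in)
        lines.append(
            f"{number:03d}  "
            f"{frames_to_timecode(src_in, fps)} {frames_to_timecode(src_out, fps)} "
            f"{frames_to_timecode(rec_in, fps)} {frames_to_timecode(rec_out, fps)}"
        )
        record_pos = rec_out
    return "\n".join(lines)


print(build_edl([(0.0, 0.2), (0.6, 1.6)], title="DEMO"))
```

Keeping the edit as a list of source ranges rather than rendered audio is what would make the "bidirectional integration" idea plausible: the same list could be handed to an audio editor and read back after further changes.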
