Dysvideo: Diagnosis of Dyslexia Using Video Analysis Technology

The Problem: Data Management and Repeatable Assessment

PhD subject, Erika Doyle (edoyle4@tcd.ie)

Dyslexia Research Group, Dept. of Psychology
Trinity College
Dublin 2, Ireland
Recent research [6] has indicated a connection between Developmental Dyslexia and the retention of primary reflexes. The Dysvideo project at Trinity College was established to investigate this connection by observing the development of 400 children aged 4-7.

The essential idea is that children with a propensity to develop Dyslexia are unable to execute particular movements without some unavoidable associated reflex. Figure 1 below shows an example of one such movement.

test10.jpg

Figure 1: The experimenter rotates the head of a child right and left. While doing so, any involuntary bend in the arms is noted. The idea is that the presence of this reflex is in some way correlated with the presence of Dyslexia.

Video recordings are made of children observed through 3 sessions, each of 20 mins duration and 6 months apart. Unfortunately, in each 20 min recording, less than 5 mins is useful material. Children may take a long time to settle down and may need to be cajoled through each session. The Dysvideo project exploits automated content based audio and video analysis to allow Psychologists to index the useful portion of the video directly.

Preliminary work on automated parsing was presented in Joyeux et al. [7]. The focus there was the framework design and the algorithms for coarse parsing and encoding of metadata by exploiting the audio stream. The parsing was accurate enough that the estimated index points were guaranteed to contain the useful visual information. The indexed portion typically contained about 20 seconds of material before the start and after the end of the actual motion experiment itself. This was sufficient for the psychologist to browse quickly to the start of the useful recording and perform the subjective motion measurement.

Recall that the point of the programme is to measure the presence of certain motion based reflexes in children. Currently the only reliable way of doing this is for a human to assess the degree to which the child cannot hold a particular position. Motion tracking equipment exists for this purpose, but magnetic tracking is expensive, while visual marker based tracking would require the cooperation of the child. Quantitative visual motion measures can instead be designed by estimating the motion of the particular limb in the field of view. The important observation is that this is only possible if the motion measurement is made on the correct limb portion and starts at the right time. Therefore, for quantitative assessment of motion, a finer granularity of parsing is needed.
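
To make the idea of a quantitative visual motion measure concrete, the following Python sketch (a rough illustration only, not the project's algorithm) computes a per-frame motion energy signal from frame differences inside a manually chosen limb region using OpenCV. The video file name, region coordinates and threshold are hypothetical.

    import cv2
    import numpy as np

    def motion_energy(video_path, roi):
        """Mean absolute frame difference inside a region of interest (x, y, w, h).

        A crude per-frame motion measure: high values suggest the limb inside
        the region is moving, low values suggest it is holding its position.
        """
        x, y, w, h = roi
        cap = cv2.VideoCapture(video_path)
        ok, prev = cap.read()
        if not ok:
            raise IOError("cannot read " + video_path)
        prev = cv2.cvtColor(prev[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
        energies = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            cur = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
            energies.append(float(np.mean(cv2.absdiff(cur, prev))))
            prev = cur
        cap.release()
        return np.array(energies)

    # Hypothetical usage: a region roughly covering the child's left arm.
    # energy = motion_energy("test10_session1.avi", roi=(120, 200, 80, 160))
    # flagged = np.where(energy > energy.mean() + 2 * energy.std())[0]

A quiet, low-energy stretch followed by a sharp rise in such a signal would be one crude indicator of an involuntary movement; in practice it would complement, not replace, the psychologist's assessment.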

parsing.jpg

Figure 2: An example of the content throughout a typical recording of 60 mins. The location of the audio markers coarsely delineates the relevant video for one Test; this is shown at mins 13 and 14. The actual useful content lies inside this period, indicated by two typical frames. The work focuses on using motion based features to delineate the exact start and end of the experiment within the coarse audio marker period. Note that the idea of coarse indexing is to quickly reject the outlier content, e.g. when the child is not in view, not cooperating, or not assuming the start position. Example content in these areas is shown in the remaining (smaller) images on the timeline.

Figure 2 shows the difference between the granularity provided by audio parsing [7] and the exact start and end of a particular session, Test 10. The audio parsing is achieved by allowing the user to insert a specific audio tone into the recording using a handheld PC (PalmPilot). Postprocessing the audio signal allows these DTMF audio tones to be detected [7]. The figure illustrates the importance and reliability of using audio markers to reject unwanted content. In this Test, the experimenter firmly rotates the head of the child while the child is on all fours. The hypothesis is that, in children with a retained reflex, one of the arms will bend at the elbow involuntarily during the motion of the head. To isolate the relevant content for analysis it is necessary to i) locate the arms during that motion and ii) identify the video portions during which the head is rotated. For some time before the start of the actual experiment the child is coached and may undergo a few trials; in addition, the child may move around and simply not be in the field of view. During the relevant video portions the arm location is more stable and the rotation of the head more coherent, so these two features can be used to index the video with increased granularity.
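
To make the audio marker idea concrete, here is a minimal sketch (not the project's implementation, whose details are in [7]) of detecting a DTMF digit in a short audio window using the Goertzel algorithm. The sample rate, window length and detection threshold are assumptions.

    import numpy as np

    # Standard DTMF row and column frequencies in Hz.
    DTMF_ROWS = [697, 770, 852, 941]
    DTMF_COLS = [1209, 1336, 1477, 1633]
    DTMF_KEYS = np.array([["1", "2", "3", "A"],
                          ["4", "5", "6", "B"],
                          ["7", "8", "9", "C"],
                          ["*", "0", "#", "D"]])

    def goertzel_power(x, freq, fs):
        """Signal power at a single frequency, computed with the Goertzel algorithm."""
        k = int(0.5 + len(x) * freq / fs)
        coeff = 2.0 * np.cos(2.0 * np.pi * k / len(x))
        s_prev = s_prev2 = 0.0
        for sample in x:
            s = sample + coeff * s_prev - s_prev2
            s_prev2, s_prev = s_prev, s
        return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

    def detect_dtmf(window, fs=8000, threshold=1e6):
        """Return the DTMF key dominating this audio window, or None if no tone is present.

        The threshold is an arbitrary placeholder; a real detector would calibrate
        it against the recording's noise level.
        """
        row_power = [goertzel_power(window, f, fs) for f in DTMF_ROWS]
        col_power = [goertzel_power(window, f, fs) for f in DTMF_COLS]
        if max(row_power) < threshold or max(col_power) < threshold:
            return None
        return DTMF_KEYS[int(np.argmax(row_power)), int(np.argmax(col_power))]

    # Hypothetical usage: slide a 40 ms window (320 samples at 8 kHz) over the
    # audio track and record the times at which a marker tone is heard.
    # for start in range(0, len(audio) - 320, 320):
    #     key = detect_dtmf(audio[start:start + 320])
    #     if key is not None:
    #         print(start / 8000.0, key)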

Information about the technical details of video based and audio based parsing can be found in [7]. The Dyslexia project is funded by Enterprise Ireland [RIF Programme] under two grants from 2002-2004 and is now subsumed into the LastActionReplay project, also funded by Enterprise Ireland 2004-05 [Proof Of Concept Programme].

References

  1. E. Y. Chang and Y.-F. Wang, "Video surveillance", in First ACM SIGMM International Workshop on Video Surveillance, November 2003.
  2. A. Hampapur, "S3-R1: The IBM Smart Surveillance System - Release 1", June 2005.
  3. Y.-L. Tian, M. Lu, and A. Hampapur, "Robust and efficient foreground analysis for real-time video surveillance", in IEEE International Conference on Computer Vision and Pattern Recognition, June 2005.
  4. J. Aggarwal and Q. Cai, "Human motion analysis: A review", Computer Vision and Image Understanding 73, pp. 428-440, March 1999.
  5. I. Kakadiaris and D. Metaxas, "Model based estimation of 3D human motion", IEEE Transactions on Pattern Analysis and Machine Intelligence 22, pp. 1453-1459, December 2000.
  6. M. McPhillips, P. Hepper, and G. Mulhern, "Effects of replicating primary-reflex movements on specific reading difficulties in children: a randomised, double-blind, controlled trial", Lancet 355, pp. 537-541, 2000.
  7. L. Joyeux, E. Doyle, H. Denman, A. C. Crawford, A. Bousseau, A. Kokaram, and R. Fuller, "Content based access for a massive database of human observation video", in Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, pp. 45-62, October 15-16, 2004.
  8. S. Goddard, A Teacher's Window into a Child's Mind: A non-invasive approach to solving learning and behaviour problems, Fern Ridge Press, Oregon, 1996.
  9. K. Holt, Child Development: Diagnosis and Assessment, Butterworth-Heinemann Ltd, 1991.
  10. P. Peer, J. Kovac, and F. Solina, "Human skin colour clustering for face detection", in EUROCON 2003 - International Conference on Computer as a Tool, 2003.
  11. A. C. Kokaram, Motion Picture Restoration: Digital Algorithms for Artefact Suppression in Degraded Motion Picture Film and Video, Springer Verlag, ISBN 3-540-76040-7, 1998.

Browsing and access

A software GUI was developed for use as a browser for this project. The GUI allows playback of the stored video clips in real time to allow human assessment of motion. The browser is linked to an index for each video recording that allows the Psychologist to start playback directly from the moment each experiment begins. The GUI also incorporates child identification and personal details, as well as scores for each Test. A typical processing chain begins with the recording, followed by automated audio parsing and simultaneous encoding on a PC. This step is done overnight after each recording is made. The index generated by the audio parser is automatically associated with the compressed video data and is available for access the next morning. Two compressed copies of the video recording are made: a high bit rate (1 Mbit/sec) MPEG2 copy for archival, and a lower bit rate (256 kbit/sec) copy for playback and assessment. The figure below shows an example of the GUI in operation.
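
As a rough sketch of such an overnight processing chain (the project's actual encoder and tooling are not named beyond MPEG2, so the ffmpeg invocations, command-line bit rates and file layout below are assumptions), the following script encodes the two copies and stores the audio-derived index alongside them. The detect_markers call is a placeholder for a DTMF detector such as the one sketched earlier.

    import json
    import subprocess
    from pathlib import Path

    def encode_copies(raw_video: Path, out_dir: Path) -> None:
        """Encode an archival MPEG2 copy (~1 Mbit/s) and a low bit rate browsing copy."""
        out_dir.mkdir(parents=True, exist_ok=True)
        # Archival copy: MPEG2 video at roughly 1 Mbit/s.
        subprocess.run(["ffmpeg", "-y", "-i", str(raw_video),
                        "-c:v", "mpeg2video", "-b:v", "1000k",
                        str(out_dir / (raw_video.stem + "_archive.mpg"))], check=True)
        # Browsing copy: roughly 256 kbit/s for quick playback in the GUI.
        subprocess.run(["ffmpeg", "-y", "-i", str(raw_video),
                        "-b:v", "256k",
                        str(out_dir / (raw_video.stem + "_browse.mpg"))], check=True)

    def write_index(raw_video: Path, marker_times: list, out_dir: Path) -> None:
        """Store the audio marker index (seconds from the start) next to the encodes."""
        index = {"source": raw_video.name, "markers_sec": marker_times}
        (out_dir / (raw_video.stem + "_index.json")).write_text(json.dumps(index, indent=2))

    # Hypothetical nightly batch over the day's recordings:
    # for rec in Path("/data/recordings/today").glob("*.avi"):
    #     markers = detect_markers(rec)   # placeholder for the DTMF detection step
    #     encode_copies(rec, Path("/data/encoded"))
    #     write_index(rec, markers, Path("/data/encoded"))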

browser.jpg

The GUI allows the user to watch three different exercises at once; the window is split horizontally or vertically when an exercise is added (to the right, top or bottom). Three sliders are used to navigate through the exercises, allowing the user to view the important sections of a session repeatedly. On the right of the window, a tree displays information on all exercises taken by a particular child, in addition to the current user's scores. Other users' scores can be displayed depending on access rights. The browser also allows computations to derive score statistics across individuals.
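
As an illustration of the kind of score statistics the browser could derive across individuals (the project's actual scoring scheme and database schema are not described here, so the column names and example values below are invented), a short sketch:

    import pandas as pd

    # Hypothetical layout: one row per (child, test, session) with a reflex score.
    scores = pd.DataFrame({
        "child_id": ["c001", "c001", "c002", "c002", "c003", "c003"],
        "test":     ["Test10"] * 6,
        "session":  [1, 2, 1, 2, 1, 2],
        "score":    [3, 2, 0, 0, 4, 3],
    })

    # Per-test statistics across individuals, computed separately for each session.
    print(scores.groupby(["test", "session"])["score"].agg(["mean", "std", "count"]))

    # Change in each child's score between sessions, e.g. to track development.
    by_session = scores.pivot(index="child_id", columns="session", values="score")
    print(by_session[2] - by_session[1])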

Personnel

The Dysvideo project is a collaboration between the Sigmedia group (www.sigmedia.tv) and the Dyslexia group at the Dept. of Psychology.

Personnel involved in the audio/video aspects of the Dysvideo project include Dr. Laurent Joyeux (now with Joanneum Research, Graz, Austria), Andrew Crawford (currently collaborating from CNR, Rome, Italy), Daire Lennon and Hugh Denman. Andrew Crawford, in collaboration with Erika Doyle (of the Psychology Department), was responsible for the creation of the Dysvideo Content Based Browsing System. Hugh Denman, in collaboration with Erika Doyle, is responsible for the database design and access. Dr. Anil Kokaram and Dr. Ray Fuller are responsible for the projects. Dr. Sid Ahmed Berrani (now with France Telecom R&D, Rennes, France) worked on aspects of statistical analysis for motion measures. Ms. Doyle, in collaboration with Dr. Kokaram, is now pursuing the statistical analysis of the data from this project.

Publications

L. Joyeux, E. Doyle, H. Denman, A. C. Crawford, A. Bousseau, A. Kokaram, and R. Fuller, "Content based access for a massive database of human observation video", in Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, pp. 45-62, October 15-16, 2004.

Conference presentations and posters

Nov 2005: Poster

  • Early Identification and Treatment of Dyslexia, Psychological Society of Ireland Annual Conference, Cork.

March 2004: Presentation

  • Dyslexia Early Years Project, 6th International Conference of the British Dyslexia Association

June 2003: Poster

  • DysVideo: Treating Dyslexia using video content and retrieval analysis technology, Models for Unified Multimedia Information Retrieval conference, Porto, Portugal

Access to supporting material for research paper submission to SPIE 2006?

Access to the Therapy Web Page
