Listening to the audio without reading along leads to confusion as soon as an unknown word comes up. It is better to read along with the audio so one can "see" the new word as it is spoken. I'm sure that causes at least some imprinting on the brain, so that when going through the text a second time (with or without the audio) the new word is recognized and, with the pop-up, the text makes more sense.
I think the instructions should state: Read along with the audio to the end, making a mental note of any new vocabulary. The second time through, use the pop-ups to see a clear explanation of the new words.