I think it has to do with the targeting of the window and the lack of a clear recording indicator. I am now more accustomed to seeing it up in my bar, but it still isn't that clear to me.
I think doing a little more work there will go a long way toward user education and trust. It also lets you "own" the voice-input experience, which makes everyone else feel inferior even if their results are comparable.