The 20mm is a great photo lens, but it is not a great (or fast) video lens. In video it is notably sluggish at continuous AF compared to other lenses, and its aperture stepping is rather noisy. If you only hear this "grinding" in video, then I suspect it's just those characteristics showing through: the lens is not optimized for video.
If the audio is your only concern, then just dub over the sound in editing.
If you want the audio of what you are filming, you could either record off-device and sync in post, or get the SEMA-1 adapter for your E-PL1 and attach a different microphone that sits farther away from the lens. (You may not even need a different one: there aren't many reports on the SEMA-1's included mic, and it might be far enough away already.) Either way, you could get much better sound quality than the camera itself will give you (stereo, surround, different audio source locations, etc.). It would just be a tiny bit more work in post.
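If you go the off-device route, most editors can line the two tracks up for you, but the idea behind it is simple enough to sketch. This is a rough illustration, not a finished tool: it assumes both recordings are already loaded as mono float arrays at the same sample rate, and the function name is just something I made up for the example. It cross-correlates the camera's scratch audio with the external recording and reports where they line up best.

```python
import numpy as np

def estimate_offset_seconds(camera, external, sample_rate):
    """Estimate how many seconds later a shared sound occurs in `external`
    than in `camera`. A negative result means it occurs earlier.

    Assumes both inputs are 1-D (mono) arrays at the same sample rate.
    """
    cam = np.asarray(camera, dtype=float)
    ext = np.asarray(external, dtype=float)
    # Remove any DC offset so it does not dominate the correlation.
    cam = cam - cam.mean()
    ext = ext - ext.mean()

    # Zero-pad to a power of two that is long enough for a linear
    # (non-wrapping) cross-correlation.
    n = len(cam) + len(ext) - 1
    size = 1 << (n - 1).bit_length()

    # Cross-correlation via FFT: corr[k] = sum over i of ext[i + k] * cam[i]
    spectrum = np.fft.rfft(ext, size) * np.conj(np.fft.rfft(cam, size))
    corr = np.fft.irfft(spectrum, size)

    lag = int(np.argmax(corr))
    # Indices past the last valid positive lag are wrapped negative lags.
    if lag >= len(ext):
        lag -= size
    return lag / sample_rate
```

A positive result tells you to slide the external track earlier by that many seconds (or trim that much off its start) before replacing the camera audio. A sharp sound at the start of the take, like a clap, makes the match much more reliable.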
If you're particularly handy, you could also try adjusting the levels of the audio track to lessen the lens noise, but that will affect the whole audio track if you aren't careful or can't separate out the waveforms.
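To show what "being careful" could look like, here is a minimal sketch that turns the level down only during the stretches where the lens is audible, instead of across the whole track. It assumes a mono float array and that you have already found the noisy time ranges by ear; the function name, the default -12 dB cut, and the 10 ms fade are all placeholder choices of mine. Note that this does not separate the noise from the wanted sound: everything in those ranges gets quieter.

```python
import numpy as np

def duck_segments(samples, sample_rate, segments, gain_db=-12.0, fade_ms=10.0):
    """Return a copy of `samples` with the level reduced inside each
    (start_seconds, end_seconds) pair in `segments`.

    Assumes `samples` is a 1-D (mono) array. Audio outside the listed
    segments is left at its original level.
    """
    samples = np.asarray(samples, dtype=float)
    gain = 10.0 ** (gain_db / 20.0)  # convert decibels to a linear factor

    # Build a per-sample gain envelope: 1.0 everywhere, `gain` in the segments.
    envelope = np.ones(len(samples))
    for start_s, end_s in segments:
        lo = max(0, int(round(start_s * sample_rate)))
        hi = min(len(samples), int(round(end_s * sample_rate)))
        envelope[lo:hi] = gain

    # Smooth the envelope with a short moving average so the level change
    # fades in and out instead of clicking. Edge padding keeps the very
    # start and end of the track from being faded by the filter itself.
    fade = max(1, int(fade_ms * sample_rate / 1000))
    kernel = np.ones(fade) / fade
    padded = np.pad(envelope, fade, mode="edge")
    envelope = np.convolve(padded, kernel, mode="same")[fade:-fade]

    return samples * envelope
```

For example, `duck_segments(track, 48000, [(3.2, 3.6), (7.0, 7.4)])` would lower two short stretches by about 12 dB and leave the rest untouched.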