[pulseaudio-discuss] Testing echo cancellation on an armhf OMAP phone

Neil Jerram

2012-12-17 21:49:48 UTC

Hi pulseaudio folk. I've been following the list for a while, but this
is my first post...

I'm working with PulseAudio on the GTA04 phone, specifically trying to
use it to route the audio during a call, with echo cancellation.

Without the echo cancellation, the picture would be:

+----------+                                +--------------------+
| GSM chip |------ module-loopback -------->|earpiece (sink)     |
| sound    |                                |                    |
| card     |<------- module-loopback -------|microphone (source) |
+----------+                                +--------------------+

The earpiece and microphone belong to a single sound card, which is
different from the GSM chip sound card.

The GSM source and sink are named
alsa_input.platform-soc-audio.1.analog-mono and
alsa_output.platform-soc-audio.1.analog-mono. The earpiece is
alsa_output.platform-soc-audio.0.analog-stereo and the microphone is
alsa_input.platform-soc-audio.0.analog-stereo.

To add in echo cancellation, I load module-echo-cancel, and then start
up the loopbacks like this:

exec pactl load-module module-loopback \
    source=alsa_input.platform-soc-audio.0.analog-stereo.echo-cancel \
    rate=8000 \
    sink=alsa_output.platform-soc-audio.1.analog-mono

exec pactl load-module module-loopback \
    source=alsa_input.platform-soc-audio.1.analog-mono \
    rate=8000 \
    sink=alsa_output.platform-soc-audio.0.analog-stereo.echo-cancel

Does that all sound correct in theory?

Now, I'm not actually at the point of doing all that yet. First I'm
trying to test the echo cancellation. To do that, I:

- load module-echo-cancel

- do "paplay -d
alsa_output.platform-soc-audio.0.analog-stereo.echo-cancel
/media/card/Documents/audio/ogg/Do\ They\ Know\ It\'s\ Christmas.ogg"
in one terminal

- do "parecord -d
alsa_input.platform-soc-audio.0.analog-stereo.echo-cancel
--file-format=wav > record1.wav" in another terminal

- speak into the microphone.

Then the idea is that I would play record1.wav back and see if it
contains an echo of the song.

However, I seem to be hitting various problems, which I suspect are all
to do with resampling.

- With the default resample method (speex-float-3), I don't get any
sound at the earpiece, except for intermittent crackling.

- I then tried speex-fixed-3. This gives recognisable song playback at
the earpiece, but with strange echo-like distortions - i.e. as though
short snatches of the song are being repeated.

- I then tried src-sinc-fastest, and found that PulseAudio exited as
soon as I loaded module-echo-cancel.

- I then tried src-linear. This gives good song playback, except for
occasional clicks and crackles.

The song is at 44.1 kHz, I think the sound card's default rate is 48
kHz, and it looks from the log as though module-echo-cancel causes the
song to be resampled to 32 kHz (and presumably then back to 48 kHz?).
Is that all expected, and is there any way of reducing this amount of
playback resampling?

Now - still with src-linear - if I try the parecord line at the same
time as the playback, the log goes crazy with umpteen rapid repeats of:

Dec 17 21:04:34 neo pulse.sh: I: [alsa-source] alsa-source.c: Trying resume...
Dec 17 21:04:34 neo pulse.sh: I: [alsa-source] alsa-util.c: Trying to disable ALSA period wakeups, using timers only
Dec 17 21:04:34 neo pulse.sh: I: [alsa-source] alsa-util.c: Device hw:0 doesn't support 44100 Hz, changed to 48000 Hz.
Dec 17 21:04:34 neo pulse.sh: I: [alsa-source] alsa-util.c: ALSA period wakeups disabled
Dec 17 21:04:34 neo pulse.sh: W: [alsa-source] alsa-source.c: Resume failed, couldn't restore original sample settings.

and I get no content (apart from the WAV header) in the file that I'm
trying to record.

On the other hand, if I try the parecord on its own when not also
playing back, it works fine.

I'd very much appreciate any input on whether what I'm doing looks right
(which I'm not yet confident at all about) and on the observations of
things not working as I'd expect.

Many thanks,
Neil

Tanu Kaskinen

2012-12-18 04:58:25 UTC

Post by Neil Jerram
Hi pulseaudio folk. I've been following the list for a while, but this
is my first post...
I'm working with PulseAudio on the GTA04 phone, specifically trying to
use it to route the audio during a call, with echo cancellation.
+----------+                                +--------------------+
| GSM chip |------ module-loopback -------->|earpiece (sink)     |
| sound    |                                |                    |
| card     |<------- module-loopback -------|microphone (source) |
+----------+                                +--------------------+
The earpiece and microphone belong to a single sound card, which is
different from the GSM chip sound card.
The GSM source and sink are named
alsa_input.platform-soc-audio.1.analog-mono and
alsa_output.platform-soc-audio.1.analog-mono. The earpiece is
alsa_output.platform-soc-audio.0.analog-stereo and the microphone is
alsa_input.platform-soc-audio.0.analog-stereo.
To add in echo cancellation, I load module-echo-cancel, and then start
up the loopbacks like this:
exec pactl load-module module-loopback \
source=alsa_input.platform-soc-audio.0.analog-stereo.echo-cancel \
rate=8000 \
sink=alsa_output.platform-soc-audio.1.analog-mono
exec pactl load-module module-loopback \
source=alsa_input.platform-soc-audio.1.analog-mono \
rate=8000 \
sink=alsa_output.platform-soc-audio.0.analog-stereo.echo-cancel
Does that all sound correct in theory?

Yes, I think so.

Post by Neil Jerram
Now, I'm not actually at the point of doing all that yet. First I'm
trying to test the echo cancellation. To do that, I:
- load module-echo-cancel
- do "paplay -d
alsa_output.platform-soc-audio.0.analog-stereo.echo-cancel
/media/card/Documents/audio/ogg/Do\ They\ Know\ It\'s\ Christmas.ogg"
in one terminal
- do "parecord -d
alsa_input.platform-soc-audio.0.analog-stereo.echo-cancel
--file-format=wav > record1.wav" in another terminal
- speak into the microphone.
Then the idea is that I would play record1.wav back and see if it
contains an echo of the song.
However, I seem to be hitting various problems, which I suspect are all
to do with resampling.
- With the default resample method (speex-float-3), I don't get any
sound at the earpiece, except for intermittent crackling.
- I then tried speex-fixed-3. This gives recognisable song playback at
the earpiece, but with strange echo-like distortions - i.e. as though
short snatches of the song are being repeated.
- I then tried src-sinc-fastest, and found that PulseAudio exited as
soon as I loaded module-echo-cancel.
- I then tried src-linear. This gives good song playback, except for
occasional clicks and crackles.
The song is at 44.1 kHz, I think the sound card's default rate is 48
kHz, and it looks from the log as though module-echo-cancel causes the
song to be resampled to 32 kHz (and presumably then back to 48 kHz?).
Is that all expected, and is there any way of reducing this amount of
playback resampling?

If you haven't configured the sample rate of module-echo-cancel, then it
will default to 32 kHz (I don't know why), which indeed will cause
unnecessary resampling just as you described. If the hardware runs at 48
kHz, then I think it's best to pass "rate=48000" to module-echo-cancel.

I think it would make sense to modify module-echo-cancel to use the rate
of the microphone by default...

Post by Neil Jerram
Now - still with src-linear - if I try the parecord line at the same
time as the playback, the log goes crazy with umpteen rapid repeats of:
Dec 17 21:04:34 neo pulse.sh: I: [alsa-source] alsa-source.c: Trying resume...
Dec 17 21:04:34 neo pulse.sh: I: [alsa-source] alsa-util.c: Trying to disable ALSA period wakeups, using timers only
Dec 17 21:04:34 neo pulse.sh: I: [alsa-source] alsa-util.c: Device hw:0 doesn't support 44100 Hz, changed to 48000 Hz.
Dec 17 21:04:34 neo pulse.sh: I: [alsa-source] alsa-util.c: ALSA period wakeups disabled
Dec 17 21:04:34 neo pulse.sh: W: [alsa-source] alsa-source.c: Resume failed, couldn't restore original sample settings.

Are only these five lines repeated? I don't understand why this would be
looping, maybe setting the log level to more verbose would reveal the
reason.

Anyway, looping or not, the reason why you can't get anything recorded
is that the source fails to resume from suspended state. If this happens
only when playback is happening at the same time, it suggests that
initially, when playback was not active, the source successfully opened
the device with 44100 sample rate, at which point the rate got locked in
pulseaudio (I think pulseaudio could be fixed to not do that). When
playback is active (presumably at 48 kHz), the hardware doesn't anymore
support capturing at 44.1 kHz, so when pulseaudio tries to open the
device with the old rate, it doesn't work anymore.

You can fix this by setting the default sample rate to 48000.

--
Tanu

Arun Raghavan

2012-12-18 05:26:19 UTC

That's quite interesting!

Post by Tanu Kaskinen

Post by Neil Jerram
+----------+                                +--------------------+
| GSM chip |------ module-loopback -------->|earpiece (sink)     |
| sound    |                                |                    |
| card     |<------- module-loopback -------|microphone (source) |
+----------+                                +--------------------+
The earpiece and microphone belong to a single sound card, which is
different from the GSM chip sound card.
The GSM source and sink are named
alsa_input.platform-soc-audio.1.analog-mono and
alsa_output.platform-soc-audio.1.analog-mono. The earpiece is
alsa_output.platform-soc-audio.0.analog-stereo and the microphone is
alsa_input.platform-soc-audio.0.analog-stereo.
To add in echo cancellation, I load module-echo-cancel, and then start
up the loopbacks like this:
exec pactl load-module module-loopback \
source=alsa_input.platform-soc-audio.0.analog-stereo.echo-cancel \
rate=8000 \
sink=alsa_output.platform-soc-audio.1.analog-mono
exec pactl load-module module-loopback \
source=alsa_input.platform-soc-audio.1.analog-mono \
rate=8000 \
sink=alsa_output.platform-soc-audio.0.analog-stereo.echo-cancel
Does that all sound correct in theory?

Yes, I think so.

As Tanu says, yes it does.

You could try setting the resampler to 'ffmpeg', which is really
light-weight. speex-fixed-0 might be useful to test as well.

Post by Tanu Kaskinen
If you haven't configured the sample rate of module-echo-cancel, then it
will default to 32 kHz (I don't know why), which indeed will cause
unnecessary resampling just as you described. If the hardware runs at 48
kHz, then I think it's best to pass "rate=48000" to module-echo-cancel.
I think it would make sense to modify module-echo-cancel to use the rate
of the microphone by default...

Different echo-cancellation algorithms work best at certain sample rates
(depending on the filters they embed). I've picked the highest viable
one for each canceller as the default, so setting something higher is
not a good idea.

What would make sense is to pick the sample rate that you're getting
from the GSM sound card, which it seems you're doing already
(rate=8000)?

Also, are you using the webrtc echo canceller or speex?
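
A minimal sketch of loading the canceller at the GSM card's rate, assuming
the master device names Neil gave earlier in the thread. rate, aec_method,
source_master and sink_master are module-echo-cancel arguments; the PACTL
variable and the load_echo_cancel function name are hypothetical, added only
so the command can be dry-run with PACTL=echo on a machine without
PulseAudio.

```shell
# PACTL is an assumed override hook, not part of PulseAudio.
PACTL="${PACTL:-pactl}"

# $1: canceller rate in Hz, $2: canceller implementation ("speex" or "webrtc")
load_echo_cancel() {
    "$PACTL" load-module module-echo-cancel \
        source_master=alsa_input.platform-soc-audio.0.analog-stereo \
        sink_master=alsa_output.platform-soc-audio.0.analog-stereo \
        rate="$1" \
        aec_method="$2"
}

# Example (needs a running PulseAudio): load_echo_cancel 8000 webrtc
```

Calling it once with speex and once with webrtc would make it easy to
compare the two cancellers under otherwise identical settings.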

Cheers,
Arun

Neil Jerram

2012-12-19 08:06:17 UTC

Post by Arun Raghavan

Post by Tanu Kaskinen

That's quite interesting!

Thanks! It's very educational for me, too!

Post by Arun Raghavan

Post by Tanu Kaskinen

Post by Neil Jerram
The song is at 44.1 kHz, I think the sound card's default rate is 48
kHz, and it looks from the log as though module-echo-cancel causes the
song to be resampled to 32 kHz (and presumably then back to 48 kHz?).
Is that all expected, and is there any way of reducing this amount of
playback resampling?

You could try setting the resampler to 'ffmpeg', which is really
light-weight. speex-fixed-0 might be useful to test as well.

Thanks, I'll remember to try those settings.

Post by Arun Raghavan

Different echo-cancellation algorithms work best at certain sample rates
(depending on the filters they embed). I've picked the highest viable
one for each canceller as the default, so setting something higher is
not a good idea.
What would make sense is to pick the sample rate that you're getting
from the GSM sound card, which it seems you're doing already
(rate=8000)?

Yes, I see that now, and have written/asked more about it in my other
replies.

Post by Arun Raghavan
Also, are you using the webrtc echo canceller or speex?

I've tried both. As far as I recall there was no significant difference
in the effect on the playback sound (through the ...echo-cancel sink)
that I heard. I think that makes sense, because distortions of the
playback sound are mostly due to resampling quality and load, not the
echo cancellation algorithm.

I haven't really reached looking at echo cancellation quality yet. What
would you recommend, for the best combination of quality and low CPU
use?

Thanks again,
Neil

Neil Jerram

2012-12-19 08:00:12 UTC

Thanks, I'll try that.

Post by Tanu Kaskinen

Are only these five lines repeated? I don't understand why this would be
looping, maybe setting the log level to more verbose would reveal the
reason.

Thanks; if I keep seeing this, despite the following help, I'll try to
get a better log.

Post by Tanu Kaskinen
Anyway, looping or not, the reason why you can't get anything recorded
is that the source fails to resume from suspended state. If this happens
only when playback is happening at the same time, it suggests that
initially, when playback was not active, the source successfully opened
the device with 44100 sample rate, at which point the rate got locked in
pulseaudio (I think pulseaudio could be fixed to not do that). When
playback is active (presumably at 48 kHz), the hardware doesn't anymore
support capturing at 44.1 kHz, so when pulseaudio tries to open the
device with the old rate, it doesn't work anymore.
You can fix this by setting the default sample rate to 48000.

I'm still a bit confused on the detail here, but I think I understand
the principle of what's happening now. Presumably there's something I
can find inside pacmd that will tell me what the current locked-in rate
is? I'll check for that, and also try changing default sample rate as
you suggest.
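
One way to look at the rate each source is currently running at, offered
as an assumption rather than as an answer given in the thread: "pactl list
sources" prints a "Name:" line and a "Sample Specification:" line for each
source (pacmd list-sources shows similar information in a different
layout). The helper name source_rates is hypothetical and it only filters
text on stdin. Whether the value shown is exactly the locked-in rate Tanu
refers to is not confirmed here.

```shell
# Reads `pactl list sources` output on stdin and prints
# "<source name> <rate>" for each source found.
source_rates() {
    awk '
        $1 == "Name:" { name = $2 }
        $1 == "Sample" && $2 == "Specification:" { print name, $NF }
    '
}

# Typical use on the phone (needs a running PulseAudio):
#   pactl list sources | source_rates
```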

Now, as I wrote in my reply just now to Arun, I realise that I really
want my in-call audio to run entirely at 8000. Does that mean that I
need to modify your advice above to:

- load-module module-echo-cancel rate=8000

- default-sample-rate = 8000

If I did that, should I then expect the microphone source to be detected
and used at 8000? (Currently it's initially detected at 44100.)

Many thanks,
Neil

Arun Raghavan

2012-12-18 05:30:00 UTC

On Mon, 2012-12-17 at 21:49 +0000, Neil Jerram wrote:
[...]

Post by Neil Jerram
- load module-echo-cancel
- do "paplay -d
alsa_output.platform-soc-audio.0.analog-stereo.echo-cancel
/media/card/Documents/audio/ogg/Do\ They\ Know\ It\'s\ Christmas.ogg"
in one terminal
- do "parecord -d
alsa_input.platform-soc-audio.0.analog-stereo.echo-cancel
--file-format=wav > record1.wav" in another terminal
- speak into the microphone.

In general, to start with, you should pick a recording of voice rather
than music since that's the sort of echo that is designed to be
cancelled. I've noticed varying degrees of success for music with speex
and much better success with the webrtc canceller, but starting with the
basics is better.

Also, if you're hitting trouble with double-resampling, you could
resample the file to what the canceller sink supports before doing your
test.
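
A sketch of that pre-resampling step using sox. sox is an assumption here,
not a tool named in the thread, and any other converter would do. The -r
option before the output file sets the output sample rate; 32000 is the
canceller default that Tanu mentions. The SOX variable, the function name
and the file names are hypothetical.

```shell
# SOX is an assumed override hook so the command can be dry-run with SOX=echo.
SOX="${SOX:-sox}"

# $1: input file, $2: target rate in Hz, $3: output file
resample_for_test() {
    "$SOX" "$1" -r "$2" "$3"
}

# Example: resample_for_test speech.wav 32000 speech-32k.wav
```

Playing the converted file through the echo-cancel sink should then leave
only the resampling between the canceller and the sound card.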

Cheers,
Arun

Neil Jerram

2012-12-19 07:41:21 UTC

Post by Arun Raghavan
[...]

Good point, thanks, I'll do that. Also I realise now that I really want
the entire process of in-call audio routing to be running at 8000 only -
because that's all I need for voice, and because I presume that should
take less power than involving higher rates.

Overall, for this phone, I have two audio scenarios.

- In-call audio, which can/should all be handled at 8000.

- Media playback outside calls, which I think should be at 44.1 kHz for
best quality.

Is it possible for a single instance of PulseAudio to switch between
those scenarios? If not, I think I can pretty easily stop and restart
PulseAudio when the scenario changes. (I'm guessing from your and
Tanu's other replies to me that I might need to restart with different
default-sample-rate settings, to get the best outcome and performance
for my two scenarios.)

Thanks,
Neil

Tanu Kaskinen

2012-12-20 07:25:42 UTC

Post by Neil Jerram

Post by Arun Raghavan
[...]

Good point, thanks, I'll do that. Also I realise now that I really want
the entire process of in-call audio routing to be running at 8000 only -
because that's all I need for voice, and because I presume that should
take less power than involving higher rates.
Overall, for this phone, I have two audio scenarios.
- In-call audio, which can/should all be handled at 8000.
- Media playback outside calls, which I think should be at 44.1 kHz for
best quality.
Is it possible for a single instance of PulseAudio to switch between
those scenarios? If not, I think I can pretty easily stop and restart
PulseAudio when the scenario changes. (I'm guessing from your and
Tanu's other replies to me that I might need to restart with different
default-sample-rate settings, to get the best outcome and performance
for my two scenarios.)

Restarting pulseaudio would be an atrocious hack. I really doubt that it
can work well.

Anyway, I recommend that you start by configuring the sound card for 48
kHz and module-echo-cancel for 8 kHz.

The sound card appears to support both 44.1 kHz and 48 kHz (but when
using both input and output at the same time, the rates must match).
There is then some room for optimization: normally 44.1 kHz would be
better, but during phone calls 48 kHz would probably be better
(resampling between 48 kHz and 8 kHz should be easier than between 44.1
kHz and 8 kHz, but I don't know if the resamplers in pulseaudio are able
to optimize the 48/8 kHz case in practice).
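
The arithmetic behind "easier" can be checked by reducing the two rate
ratios. 48000/8000 reduces to 6/1, a plain integer decimation, while
44100/8000 only reduces to 441/80. The function names below are
hypothetical, and the sketch says nothing about whether a particular
resampler actually exploits the simpler ratio.

```shell
# Greatest common divisor of two positive integers (Euclid's algorithm).
gcd() {
    a=$1
    b=$2
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$a"
}

# Print the reduced ratio of input rate $1 to output rate $2.
rate_ratio() {
    g=$(gcd "$1" "$2")
    echo "$(( $1 / g ))/$(( $2 / g ))"
}

rate_ratio 48000 8000   # prints 6/1
rate_ratio 44100 8000   # prints 441/80
```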

Switching between 44.1 kHz and 48 kHz would ideally be done by making
two different card profiles, which you would switch when the current
scenario changes. It's not currently possible to specify the sample rate
in the profile configuration, however, so this is not viable right now.

Pulseaudio supports automatic sample rate switching depending on the
connected streams (set default-sample-rate to 44100 and
alternate-sample-rate to 48000, like they are by default), and this
would be a great solution, if it weren't for the fact that the input and
output must always have matching rates. The sample rate switching logic
isn't able to take that into account, so if the output is active when
you change from the music playback scenario to the phone call scenario,
it probably doesn't work. It might be possible to make this work so that
you forcefully suspend both the sink and the source before changing the
scenario, then tear down the music stream and start the phone streams,
and then unsuspend the sink and the source.
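
A rough sketch of that suspend, switch, unsuspend sequence. It is untested
on the GTA04, and as said above it only might work. pactl suspend-sink and
pactl suspend-source are existing pactl commands; the device names are the
ones Neil gave earlier; switch_scenario, the callback argument and the
PACTL variable are hypothetical.

```shell
# PACTL is an assumed override hook so the sequence can be dry-run with PACTL=echo.
PACTL="${PACTL:-pactl}"
SINK=alsa_output.platform-soc-audio.0.analog-stereo
SOURCE=alsa_input.platform-soc-audio.0.analog-stereo

# $1: name of a caller-supplied function that tears down the old streams
#     and starts the new ones (for example the two module-loopback loads).
switch_scenario() {
    "$PACTL" suspend-sink "$SINK" 1
    "$PACTL" suspend-source "$SOURCE" 1
    "$1"
    "$PACTL" suspend-sink "$SINK" 0
    "$PACTL" suspend-source "$SOURCE" 0
}

# Example: switch_scenario start_call_streams
```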

If the sound card supports 8 kHz, then the above still applies, just
replace 48000 with 8000.

By the way, the failure to resume the source, which you reported
earlier, should be avoidable by recording using some sample rate that is
divisible by 4000 (i.e. 8 kHz or 48 kHz should work fine). You didn't
specify the sample rate in your parecord command, so it defaulted to
44100 Hz. That caused pulseaudio to try to resume the source at 44.1
kHz. If you used e.g. 8 kHz, then pulseaudio would have tried to resume
the source at 48 kHz, which would probably have worked.
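
The rule described above can be written down as a small function. This is
only a model of the explanation in this post, assuming default-sample-rate
is 44100 and alternate-sample-rate is 48000; it is not PulseAudio's actual
code, which may check more conditions, and hw_rate_for is a hypothetical
name.

```shell
# $1: sample rate requested by the stream, in Hz.
# Prints the hardware rate the device would be opened at under the rule
# "rates divisible by 4000 use 48000, everything else uses 44100".
hw_rate_for() {
    if [ $(( $1 % 4000 )) -eq 0 ]; then
        echo 48000
    else
        echo 44100
    fi
}

hw_rate_for 44100   # prints 44100
hw_rate_for 8000    # prints 48000
```

Recent parecord versions accept a --rate option, so recording with
--rate=8000 would be one way to request a rate from the 48 kHz family.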

--
Tanu