IR Modulation Processing Algorithm Development. Part XII

Posted 01 July 2017

I went down a bit of a rabbit-hole between my last post on this subject and today.  I was attempting to run down the problems with my IR demodulation code when I discovered that the basic rate at which the demodulator captured samples was off by a factor of 5 or so – yikes!!  Instead of 20 samples per cycle, I was seeing more like 100, as shown below.

Raw data capture using ‘sinceLastOutput = 0’ in the first demodulation step

Raw data capture using ‘sinceLastOutput -= USEC_PER_SAMPLE’ in the first demodulation step

As you might expect, this development threw me for a bit of a loop – as the change from ‘sinceLastOutput = 0’ to ‘sinceLastOutput -= USEC_PER_SAMPLE’ was instrumental (or at least so I thought) to getting the transmit and receive frequencies matched more accurately.  So, now I had to go chase down yet another tangential problem – what I refer to as ‘going down a rabbit hole’. The only saving grace in all this is that, as a twice-retired engineer, I have no deadlines! 😉

To resolve this problem, I wound up creating an entirely new test program to isolate the issue, with just the following lines in the loop() function:

With the ‘sinceLastOutput -= USEC_PER_SAMPLE’ line active, I got about 100 samples/cycle. With this line commented out and the ‘sinceLastOutput = 0’ line active, I got the normal 20 samples/cycle, with nothing else being changed.

Once I was sure the test program was consistently producing the anomalous results I had noticed in the complete program, I posted this issue to the PJRC Forum, so I could get some help from the experts.  I knew it was something I was doing wrong – I just didn’t know what!

Within a few hours I had received several responses, and the one that hit the bullseye correctly identified a subtlety in the ‘-=’ usage of an elapsedMicros variable.  When this form is used, the accompanying ‘elapsedMicros’ variable must be initialized in the setup() code; otherwise it will hold some arbitrary (and possibly quite large) value when the loop() function is first entered. This causes the ‘if’ statement to trigger repeatedly until the ‘-=’ line eventually reduces the variable to a value below USEC_PER_SAMPLE, at which point it starts behaving as expected.  This odd behavior never happens with the ‘=0’ usage, as the variable is reset on the first pass through the ‘if’ statement.  Sure enough, when I added a line at the bottom of my setup() function to set the ‘sinceLastOutput’ variable to zero, my little test program immediately stopped misbehaving.
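To see why the missing initialization produces a burst of extra samples, here is a minimal host-side model (plain C++, not Teensy code). It assumes that an elapsedMicros variable starts counting when it is constructed, before setup() runs, and that ‘-=’ simply moves its start point forward. `FakeElapsedMicros`, `samplesInWindow()` and the numbers used (a 50 ms setup(), one loop() pass every 8 μs, the 96 μs sample period) are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Teensy's elapsedMicros: it remembers a start
// timestamp and reports (now - start) on a simulated microsecond clock.
struct FakeElapsedMicros {
  uint32_t start = 0;                                   // "constructed" at t = 0
  uint32_t read(uint32_t now) const { return now - start; }
  void subtract(uint32_t us) { start += us; }           // models "x -= us"
  void zero(uint32_t now) { start = now; }              // models "x = 0"
};

// Counts how many times the sampling block fires during the first windowUs
// of loop(), using the '-=' form. setupUs is how long setup() ran after the
// counter was constructed; zeroInSetup adds "sinceLastOutput = 0;" at the
// end of setup().
int samplesInWindow(bool zeroInSetup, uint32_t setupUs, uint32_t periodUs,
                    uint32_t loopPassUs, uint32_t windowUs) {
  FakeElapsedMicros sinceLastOutput;
  if (zeroInSetup) sinceLastOutput.zero(setupUs);
  int fired = 0;
  for (uint32_t t = 0; t < windowUs; t += loopPassUs) {
    uint32_t now = setupUs + t;                         // loop() starts after setup()
    if (sinceLastOutput.read(now) >= periodUs) {
      sinceLastOutput.subtract(periodUs);
      ++fired;
    }
  }
  return fired;
}
```

With these numbers, `samplesInWindow(true, 50000, 96, 8, 9600)` returns 99 (one sample per 96 μs), while `samplesInWindow(false, 50000, 96, 8, 9600)` returns 620: the stale counter fires on every pass until the repeated subtractions catch up, which is the same "too many samples per cycle" symptom described above.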

Well, this little side-journey only cost me a couple of days, and a few more white hairs (oh, wait –  my hair is already completely white – no problem!)  Back to my regularly scheduled program…

Frequency Matching:

My friend and mentor John Jenkins, who has been looking over my shoulder (and occasionally whacking me on the head) during this project, was unsure that my frequency matching setup was actually 100% complete, as the video I took the last time didn’t run long enough to convince him (and there were some un-explained triggering glitches as well).  So, I thought I would re-do this part to make him happy.  To do this I modified my little test program from the above ‘rabbit-hole’ elapsedMicros issue to output a square wave from the demodulator board that could be compared to the transmitter square wave.

As shown in the above video, the transmit and demodulator frequencies are quite well matched, showing essentially zero relative drift even over the 30-40 second duration of the video.  Mission accomplished! ;-)

Sample Acquisition Step:

I modified my demodulator program to properly initialize the ‘elapsedMicros’ variable being used for sample timing, and verified proper operation by commenting out everything but the sample acquisition step.  I captured several hundred samples, and plotted the first hundred in Excel as follows:

100 samples using proper ‘elapsedMicros’ variable initialization

Sample Sum Step:

Next, each group of five samples (1/4 cycle) is summed, and the ‘In-phase’ and ‘Quadrature’ components are generated using the appropriate sign sequences.  As shown in the following, this appears to be happening correctly:

Cycle Sum Accumulation:

As each sample group is summed and the I/Q components generated, the 1/4-cycle I/Q sums are accumulated into a ‘Cycle Sum’.  As the following printout shows, this step also is being performed properly.
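For reference, the group-sum and cycle-sum steps for one 20-sample cycle can be sketched as follows. This is a minimal illustration, not the code running on the Teensy: it assumes 20 samples per cycle in four 5-sample groups and uses the sign sequences given in Part VIII, (+,+,-,-) for I and (-,+,+,-) for Q; `CycleSum` and `cycleSums()` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kSamplesPerGroup = 5;  // 1/4 cycle
constexpr int kGroupsPerCycle = 4;
constexpr int kSamplesPerCycle = kSamplesPerGroup * kGroupsPerCycle;  // 20

// Sign applied to each quarter-cycle group sum (sequences from Part VIII).
constexpr std::array<int, kGroupsPerCycle> kSignI = {+1, +1, -1, -1};
constexpr std::array<int, kGroupsPerCycle> kSignQ = {-1, +1, +1, -1};

struct CycleSum {
  int32_t i = 0;
  int32_t q = 0;
};

// Sums each 5-sample group, applies the I/Q sign for that quarter cycle,
// and accumulates the signed group sums into one cycle sum.
CycleSum cycleSums(const std::array<int32_t, kSamplesPerCycle>& samples) {
  CycleSum cs;
  for (int g = 0; g < kGroupsPerCycle; ++g) {
    int32_t groupSum = 0;
    for (int s = 0; s < kSamplesPerGroup; ++s)
      groupSum += samples[g * kSamplesPerGroup + s];
    cs.i += kSignI[g] * groupSum;
    cs.q += kSignQ[g] * groupSum;
  }
  return cs;
}
```

A constant (DC) input gives I = Q = 0, since each sign sequence sums to zero. A square wave that is high for samples 0-9 and low for 10-19 lands entirely in I, and shifting it by a quarter cycle (five samples) moves it entirely into Q.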


Running Sum Accumulation:

The last step in the algorithm is to compute the N-cycle running sum of all the cycle sums.  This is done by subtracting the oldest value from the circular buffer from the current running sum, adding the current cycle sum to the running sum, and then replacing the oldest cycle sum value in the circular buffer by the current cycle sum.

  1. RunningSum = RunningSum + CurrentCycleSum - OldestCycleSum
  2. OldestCycleSum = CurrentCycleSum
  3. Circular buffer index incremented by 1 (MOD N)
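The three steps above can be sketched as a small C++ class. This is a hypothetical host-side illustration, not the code running on the Teensy; the class name, the 32-bit integer type, and starting the buffer at all zeros are assumptions. The final-value helper uses the FV = abs(RunningSumI) + abs(RunningSumQ) definition from Part IX.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// N-cycle running sum kept with a circular buffer (N = 64 in this series).
template <int N>
class RunningSum {
 public:
  // Adds the newest cycle sum, drops the oldest, returns the running sum.
  int32_t update(int32_t currentCycleSum) {
    int32_t& oldest = buffer_[index_];
    sum_ += currentCycleSum - oldest;  // step 1
    oldest = currentCycleSum;          // step 2: overwrite the oldest slot
    index_ = (index_ + 1) % N;         // step 3: advance index mod N
    return sum_;
  }
  int32_t value() const { return sum_; }

 private:
  std::array<int32_t, N> buffer_{};  // all zeros, i.e. an empty history
  int32_t sum_ = 0;
  int index_ = 0;
};

// Final value: FV = abs(RunningSumI) + abs(RunningSumQ).
int32_t finalValue(const RunningSum<64>& i, const RunningSum<64>& q) {
  return std::abs(i.value()) + std::abs(q.value());
}
```

Feeding the cycle sums 1, 2, ..., 100 into a `RunningSum<64>` leaves 4384 in the running sum, which is exactly the sum of the last 64 inputs (37 through 100).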

This one took a while to instrument properly. I first tried just adding some more columns on to the current display setup, but that became too cumbersome, too fast.  Verifying the running sum calculation requires looking at not only the current running sum, but also its value from N cycles (or N*5*4 samples) previously.  So, I modified the code to only print one line per cycle, and this was much easier to manage.  Here’s a partial printout showing a little over 200 cycles, representing about 400mSec (200 cycles * 2mSec/cycle).


To analyze these results, I dropped them into Excel, and used its ‘Freeze Panes’ feature to juxtapose the results from cycles 0 through 4 with cycles 65, 66, and 67.  This allowed me to verify that the running sum expression was being calculated correctly and the circular buffer was being loaded and referenced correctly.  When I finished verifying these results, I plotted the final value column, as shown below:


New ‘Final Value’ results

As an experiment, I also momentarily blocked the IR path with my finger, and plotted the results, as shown below:

Filter response with finger blocking IR path for a few seconds, then removed

And again, just waving my finger into and out of the IR path several times over several seconds:

Filter response with finger moved in and out of IR path several times over 3-5 seconds

John’s comment on all this was “still not right”, as explained below:


“Think I misinterpreted earlier plot, as the one below makes it clearer what is happening.  The flat segments are when sample timing is proper.  So 3 good sets of cycle samples in a row is max before there is a relative phase slip of ~1/10 of 180 degs of Tx signal or ~96us = (time_for_180deg = (1/520)/2) / 10.
The slips are thus equal to 10 raw samples which are lost every ~35 Tx cycles = (51-16) for CSumQ and ~34 Tx cycles (68-34) for CSumI.  So 20 raw samples (1 full Tx cycle) are lost every ~69 Tx cycles, and this results in 64-cycle running I and Q sums being only 5/64 of 64 times the actual signal magnitude, because all but 5 of the 64 values in the running sums are canceled out by other values in the I and Q running sums.  Applying this to the running sum of abs(I)+abs(Q) gives ~35.8K, which is close to what you are seeing, that is: ~35,823 = (5/64)*(64*7164.667)
Seems almost certain that the calcs are correct and the problem is that the Teensy can’t keep up with the needed sample rate (probably due to coding approach or interrupts, but less likely “magic timing code”; Tx and Rx were very stable on scope).  Back to thinking ADC –> DMA may be the cure”

My take on the above was that the print statements were the cause of the problem, not anything to do with Teensy ADC speed.  To test this theory, I removed all but the ‘Final Value’ print statement, and put that one statement in a block that executed only once per 64 cycles.  With this change, I could see that the print output was keeping up with real time and not lagging as before.  The plot below shows several IR beam block/unblock cycles, and it appears that the ‘Final Value’ output is much more responsive to the block/unblock events.  Interestingly, there appears to be some sort of Gibbs phenomenon (overshoot) associated with the ‘fast’ cycles when the system switches between one equilibrium level and another.

The next step is to remove the print statements entirely and route the ‘Final Value’ output to one of the two DAC lines on the Teensy 3.5.  To test this idea, I adapted the HobbyTronics sine wave generator code to my project (pin A21 on the Teensy 3.5 vs A14 on the Teensy 3.2), and got the following display on my ‘scope:

Output from Teensy 3.5 DAC0 (A21) pin

So, now I know that the DAC output works – now I just need to scale the ‘Final Value’ numbers correctly and connect them to the DAC output.  Then I can watch the filter action in real time without worrying about the impact of print statements.

With an input square-wave amplitude of about 0.35V p-p, or about 0.35/3.3 = 0.106 of full scale (FS), the Final Value output is about 29,000.  Therefore a full-scale input should produce a Final Value of about 29,000/0.106 ≈ 273,584.  Scaling by this number, an input of 0.106 FS should produce a DAC output of (29,000/273,584) × 3.3V ≈ 0.106 × 3.3V ≈ 0.35V.

So, I modified my Sinewave test code to output the Final Value number, scaled by 273,584, and got the following display.  In this test, the input amplitude is about 0.35V p-p, and the output is about 0.35VDC.
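The scaling described above can be written as a small helper that maps a Final Value to a DAC code. This is a sketch, not the code used in the test: it assumes the Teensy 3.5 DAC is driven at 12-bit resolution (codes 0-4095 for 0-3.3V), uses the 273,584 full-scale figure from the calculation above, and clamps anything outside that range; `fvToDacCode()` and `dacCodeToVolts()` are hypothetical names. On the Teensy the resulting code would presumably be written to the A21 pin with analogWrite() after calling analogWriteResolution(12).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int32_t kFullScaleFV = 273584;  // FV expected for a full-scale (3.3V p-p) input
constexpr int32_t kDacMaxCode = 4095;     // assumes 12-bit DAC resolution

// Maps a Final Value onto a DAC code, clamped to 0..4095.
int32_t fvToDacCode(int32_t finalValue) {
  if (finalValue <= 0) return 0;
  if (finalValue >= kFullScaleFV) return kDacMaxCode;
  // Integer scaling (truncating): code = FV * 4095 / full-scale FV.
  return static_cast<int32_t>(static_cast<int64_t>(finalValue) * kDacMaxCode /
                              kFullScaleFV);
}

// Expected DAC output voltage for a given code, assuming a 3.3V reference.
double dacCodeToVolts(int32_t code) { return 3.3 * code / kDacMaxCode; }
```

For the 29,000 Final Value measured above, `fvToDacCode(29000)` returns 434, which `dacCodeToVolts()` converts to about 0.35V, agreeing with the hand calculation.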

Stay tuned,

Frank

IR Modulation Processing Algorithm Development. Part XI

Posted 26 June 2017

In my last post on this subject, I had just decided to replace the Trinket transmit module with a second Teensy 3.5 to see if this would produce a more stable transmit waveform.  Indeed, it did seem to be more stable, as shown by the plots at the bottom of yesterday’s post.

Today I decided to see if I could further improve the situation by getting a closer match between the transmit and receive  frequencies.  To do this I modified the demodulator program to output a square wave at what it thinks is the right frequency, and compare it to the waveform being used to drive the IR LED.  If I can get these to match exactly, then there will still be a phase offset, but it will be a (mostly) constant factor.

The starting point for this effort is shown in the following short video, taken several eons (OK only 10 days) ago  on June 16.


With this as the starting point, I started by tweaking the transmit frequency.  For this test, I triggered the scope on the receiver square wave signal, and tried to stop the drift of the transmit waveform.  After a while I got pretty close, but couldn’t get it stopped.  This was more than a little bit frustrating, as I thought the Teensy’s much higher clock speed should provide commensurately better resolution using the Teensy 3.x-specific ‘elapsedMicros’ type.

So, back to the inet for some more research.  After an hour or so I found this post, and buried in the middle of it was a reply by PaulStoffregen (creator of the Teensy line) that noted that

The proper way to get repeating on a 1000 us interval is this:


If you set “sinceLastRead” to zero, you could “lose” any increment it makes between the time your “if” condition runs and the time it’s written. An interrupt might occur, for example. By subtracting 1000, any increment it’s made will be preserved. Software latency can still cause jitter, but overall you’ll get the correct rate, even if interrupts delay the write to sinceLastRead.
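Paul's point can be illustrated with a small host-side model (plain C++, not Teensy code). It assumes that loop() can only notice the timer on a fixed grid of polls, 30 μs apart here as an arbitrary stand-in for software latency, and compares resetting the counter with ‘= 0’ against subtracting the interval with ‘-=’ over one simulated second. `ResetStyle` and `countTriggers()` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class ResetStyle { AssignZero, SubtractInterval };

// Models "if (sinceLastRead >= intervalUs) { ...; reset; }" polled every
// pollUs microseconds, with elapsed = now - start as in an elapsedMicros.
long countTriggers(ResetStyle style, uint32_t intervalUs, uint32_t pollUs,
                   uint32_t durationUs) {
  uint32_t start = 0;
  long count = 0;
  for (uint32_t now = 0; now < durationUs; now += pollUs) {
    if (now - start >= intervalUs) {
      ++count;
      if (style == ResetStyle::AssignZero)
        start = now;          // "= 0": time past the interval is discarded
      else
        start += intervalUs;  // "-=": time past the interval is kept
    }
  }
  return count;
}
```

With a 1000 μs interval and 30 μs polls, the ‘-=’ version fires 999 times in the simulated second (one per 1000 μs), while the ‘= 0’ version fires only 980 times, because every period stretches to the next poll (1020 μs). In both cases each individual trigger can still land up to one poll late; that is the jitter Paul mentions.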

I had been setting my elapsedMicros variable to zero, so I decided to try using Paul’s technique to see if the timing accuracy improved at all.  When I did, suddenly I was able to get the transmit and receive frequencies to match  very closely, as shown in the following short video clip


After seeing this miracle of timing, I decided to run another set of ‘final value’ plots in Excel, with the following results

These plots  seem a lot more consistent than before, and a lot less ‘spiky’, so maybe I’m making some progress.  However, friend and mentor John Jenkins’ comment was “still something wrong – magnitude is off” – rats! ;-).

Stay tuned,

Frank


IR Modulation Processing Algorithm Development. Part X

Posted 24 June 2017

Well, I may have spoken too soon about the perfection of my implementation of John’s ‘N-path’ band-pass filter (BPF) intended to make Wall-E2 impervious to IR interference.  After my last post on this subject, I re-ran some of the ‘Final Value’ plots for different received IR modulation amplitudes and the results were, to put it bluntly, crap 🙁 . Shown below is my original plot from yesterday, followed by the same plot for different input amplitudes

Computed final values vs complete input data cycles for sensor channel 1 (This is the original from yesterday)


So, clearly something is ‘fishy in Denmark’ here, when the ‘no-signal’ case with only high-frequency noise causes the output to increase without limit, and the ‘input grounded’ case is decidedly non-zero (although the values are  much lower than in the ‘signal present’ cases).

Time to go back through the entire algorithm (again!) looking for the problem(s).  

25 June 2017

My original implementation of the algorithm was set up to handle four sensor input channels, so each step of the process required an iteration step to go through all four, something like the code snippet below:

In order to simplify the debug problem, I decided to eliminate all these iteration steps and just focus on one channel.  To do this I ‘branched’ my project into a ‘SingleChannel’ branch using GIT and TortoiseGit’s wonderful GIT user interface (thanks TortoiseGit!).  This allows me to muck about in a new sandbox without completely erasing my previous work – yay!

Anyway, I eliminated all the 4-sensor iteration steps, and went back through each step to make sure each was operating properly.  When I was ‘finished’ (I never really finish with any program – I just tolerate whatever level of bugs or imperfections it has for some time), I ran some tests for proper operation using just one channel.  For these tests, the Teensy ADC channel being used was a) grounded, b) connected to 3.3VDC, or c) unterminated.  For each condition I captured the ‘Final Value’ output from the algorithm and plotted it in Excel, as shown below.

Single channel testing with grounded, unterminated, and +3.3VDC input

As can be seen from the above plot, things seem to be working now, at least for a single channel.  The ‘grounded’ and ‘3.3VDC’ cases are very nearly zero for all time, as expected, and the ‘unterminated’ case is also very low.

Next, I added a 0.5V p-p signal at ~520Hz to the sensor input, and re-ran the program.  After capturing the ‘Final Value’ data as before, I added it to the above plot, as shown below

Final Value vs Cycles for 0.5V p-p input

As can be seen in the above plot, the ‘Final Value’ looks much more reasonable than before. When plotted on the same scale as the ‘grounded’, ‘unterminated’, and ‘+3.3VDC’ conditions, it is clear that the 0.5V p-p case is a real signal.

Then I ran a much longer term (11,820 cycles, or about 22-23 sec) test with 0.5V p-p input, with the following results.

As can be seen from the above plot, the final value is a lot more ‘spiky’ than I expected.  The average value appears to be around 30,000, but the peaks are more like 60,000, an approximately 2:1 ratio.  With this sort of variation, I doubt that a simple thresholding operation for initial IR beam detection would have much chance of success.  Hopefully, these ‘spikes’ are an artifact of one or more remaining bugs in the algorithm, and they will go away once I find & fix them 😉

Update:  Noticed that there was a lot of time jitter on the received IR waveform – wonder if that is the cause of the spikes?

sensor waveform jitter. Note that this display is separately (Vert Mode) triggered.

Following up on this thread, I also looked at the IR LED (transmit) and photodetector (receive) waveforms together, and noted that there is quite a bit of time jitter on the Tx waveform as well, and this is received faithfully by the IR photodetector, as shown in the following short video clip


So, based on the above observations, I decided to replace the Trinket transmit waveform generator with another Teensy 3.5 to see if I could improve the stability of the transmit signal.  Since I never order just one of anything, I happened to have another Teensy 3.5 hanging around, and I soon had it up and running in the setup, as shown below

Replaced Trinket transmitter with Teensy 3.5

Transmit and receive waveforms

As the above short video and photos show, the Teensy implementation of the transmit waveform is  much more stable than the Trinket version.  Hopefully this will result in better demodulation performance.

The next step was to acquire some real data using a 0.5V p-p input signal through the IR beam path.  I took this in stages, first verifying that the raw samples were an accurate copy of the input signal, and then proceeding on to the group-sum, cycle-sum, and final value stages of the algorithm.

Sample capture using an input of 0.5V p-p through the IR path

GroupSum I/Q plots using 0.5V p-p Input Signal

I then used Excel to compute the cycle sums associated with each group of 4 group sums

Calculated Cycle Sums for a 0.5V p-p Input Signal

And then I used Excel again to calculate the ‘Final Value’ from the previously calculated cycle sum data

Calculated final value

Keep in mind that all the above plots are generated starting with real IR photodetector data, and not that large of an input at that (0.5V p-p out of a possible 3.3V p-p).

The next step was the real ‘proof of the pudding’.  I ran the algorithm again, but this time I simply printed out final values – no intermediate stages, and got the following plots

Final Value vs time, for 0.5V p-p Input Signal

Detail of previous plot

From the above plots, I think it is clear that the algorithm is working fine, and most of the previous crappy results were caused by poor transmit timing stability.  I’m not sure what causes the ripple in the above results, but I have a feeling my friend and mentor John Jenkins is about to tell me! 😉

Sleeeeeep, I need sleeeeeeeeeeep….

Frank

IR Modulation Processing Algorithm Development. Part IX

Posted 19 June, 2017

In my last post on this subject, I showed that my 4-sensor band-pass filter (BPF) algorithm was feasible when run on a Teensy 3.5 SBC.  However, what I haven’t done  yet is to verify that the algorithm is indeed producing valid results, when fed with real sensor input.

I should be able to verify proper algorithm operation with my single-sensor test bed (as shown in the following photo) by moving the single sensor input line to each sensor channel (ADC input) in turn, and monitoring the data at different stages in the processing chain.

Teensy 3.5 installed on my algorithm test bed, with the Uno shown for size comparison. The small processor in the foreground is an Adafruit ‘Trinket’

Since I now have plenty of RAM to play with, I should be able to save a representative sample of the input data and intermediate results in suitably sized arrays, run the algorithm long enough to fill those arrays, and then print them all out at the end.

  • I will probably want to run the process long enough to completely fill the 64-element I & Q ‘running sum’ arrays.  These arrays already exist for all 4 sensor channels, so this has no effect on available RAM
  • The next step backward in the chain is the set of I & Q ‘cycle group sum’ elements (one pair per sensor channel) used to generate one element in the running sum arrays.  To store all these cycle group sum elements will require two 256-element arrays per sensor channel.
  • And the first step in the process is the raw sensor input data.  To store all the data required to generate 64 elements in the running sum arrays will require a single 1280-element array per sensor channel.

In summary, to instrument one sensor channel from start to finish will require

  • 1ea 1280-element array to hold the raw data
  • 2ea 256-element arrays to hold the cycle group sums

for a total of 1280 + 512 = 1792 elements at 2 bytes/element = 3584 bytes.  If I wanted to do this for all 4 sensor channels at once, the total would be 14336 bytes, still well within the 192KB RAM availability on the Teensy – nice!
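The memory budget is simple enough to check at compile time. This sketch assumes 2-byte (int16_t) elements as in the text and the 192KB RAM figure quoted above; the constant names are hypothetical. int16_t is wide enough here because a 5-sample group sum of 12-bit readings is at most 5 × 4095 = 20475.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSamplesPerCycle = 20;
constexpr std::size_t kGroupsPerCycle = 4;
constexpr std::size_t kCycles = 64;  // enough to fill the 64-element running sums
constexpr std::size_t kChannels = 4;
constexpr std::size_t kTeensyRamBytes = 192 * 1024;

constexpr std::size_t kRawElements = kSamplesPerCycle * kCycles;       // raw ADC data
constexpr std::size_t kGroupElements = 2 * kGroupsPerCycle * kCycles;  // I and Q arrays
// Group sums fit in int16_t: 5 * 4095 = 20475 <= 32767.
constexpr std::size_t kBytesPerChannel =
    (kRawElements + kGroupElements) * sizeof(int16_t);
constexpr std::size_t kBytesAllChannels = kBytesPerChannel * kChannels;

static_assert(kBytesAllChannels < kTeensyRamBytes, "instrumentation must fit in RAM");
```

This gives (1280 + 512) × 2 = 3,584 bytes per channel and 14,336 bytes for all four channels.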

Results – Capture Stage:

The first step was to capture/display the raw ADC data to make sure that part was operating correctly.  The plots below show all 4 sensor channels.

Raw ADC data for all 4 sensor channels, 1280 elements (enough to fill the entire 64-element running sum array)

First 40 elements of the raw ADC capture

As can be seen in the above plots, channel 1 shows the 520Hz detected IR waveform, and the other three channels show just noise.

Results – Intermediate Stages:

The next step was to verify proper operation of the step that accumulates a 1/4 cycle group of samples and generates the I & Q ‘sample group sum’ components.  To verify this stage of the algorithm, I captured 5 cycles of data, as shown below:

Sensor channel 1 raw data and I/Q sample group sums

In the above plot, the dark blue line is the raw ADC data input, which varies from about the ADC maximum of 4096 to about 3890,  or about 161mV (3.3V ADC reference and IR detector supply).  The resulting ‘sample group sums’ are shown in orange (for the I component) and gray (for the Q component).  The significance of the plot is that the sample group sums and the I/Q component generation appears to be happening correctly.  The orange points follow a {+1, +1, -1, -1} sequence, while the gray ones follow a  {+1, -1, -1, +1} sequence, as expected.

Next, I printed out this same 5-cycle segment in text form, as shown below

The above table shows the raw data, the sample-group sums, and the corresponding cycle-group sums. For example, the first set of 5 data samples adds to  19673.  Since this is the first sample-group sum, it is multiplied by “+1” to form the I component, and “-1” to form the Q component, and these are shown adjacent to the last raw data in that sample group.  After 4 such sample-group sums, the cycle-group sum I/Q components are generated by adding the 4 sample-group I/Q components respectively; for the first cycle these are -1736 & -246 as shown adjacent to the 20th sample (sample #19).

Results – Final Stages:

The cycle-sum I & Q components generated above are saved in separate 64-element circular buffers, and the running sums of these buffers are then used to form the final demodulated value for the channel of interest.  The final value is computed as the sum of the absolute values of the I & Q component running sums, i.e. FV = abs(RunningSumI) + abs(RunningSumQ).  To demonstrate proper algorithm functioning, I printed out the computed final values for well over 1000 cycles of raw data, as shown in the Excel plot below

Computed final values vs complete input data cycles for sensor channel 1

As shown in the plot above, the final value rapidly rises from zero to around 2×10⁶ in the first 64 cycles of the run, after which it generally levels off for the rest of the run.  There is quite a bit of ripple on the signal, which my friend and mentor John Jenkins mentioned might happen as the non phase-locked input and sampling frequencies slowly slid by each other (at least I hope that is what is happening!).

So, it looks like the algorithm is doing what it should, and my ‘scope measurements to date indicate that the Teensy is doing it all without breaking a sweat, even with print statements thrown in.  It appears that I could probably double the number of samples/cycle and still have plenty of time to finish all the computations.

However, there are still a number of things to be accomplished before this new feature makes it into the field.

For starters, I’m not sure how to normalize the final value.  For a fairly weak (~160mV out of 3.3V) signal the final value is north of 2 million – what happens for stronger signals, and how do I normalize this down to a range that I can use to drive an analog output?  I suppose I could simply apply the IR modulation signal directly to the analog input (bypassing the IR path entirely) and see what happens, but I’d also like to understand the math.  Maybe John Jenkins can help with this (hint, hint, wink, wink!).

Also, I’d like to validate the idea that this algorithm will selectively reject other signals that aren’t close to the desired 520Hz modulation frequency.  I plan to test this by modifying the Trinket algorithm to make it a swept frequency generator (say from 470 – 570 Hz) and see how the output changes.
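Before building the swept generator, the expected behavior can be previewed in simulation. The sketch below is a model under stated assumptions, not a measurement: it feeds an ideal sine wave (rather than the Trinket's square wave) with an arbitrary DC offset and amplitude into the same 20-samples-per-cycle I/Q scheme, sampled at exactly 20 × 520 = 10,400 samples/sec, and reports FV = abs(RunningSumI) + abs(RunningSumQ) from a 64-cycle running sum after 200 cycles. `simulatedFinalValue()` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Simulated final value for a sine input at inputHz, demodulated with the
// 520 Hz, 20-samples-per-cycle I/Q scheme and a 64-cycle running sum.
double simulatedFinalValue(double inputHz, int cycles = 200) {
  const double kPi = 3.14159265358979323846;
  constexpr int kSamplesPerCycle = 20, kSamplesPerGroup = 5, kN = 64;
  constexpr double kSampleRate = 520.0 * kSamplesPerCycle;  // 10.4 kHz
  const int signI[4] = {+1, +1, -1, -1};
  const int signQ[4] = {-1, +1, +1, -1};

  std::array<double, kN> bufI{}, bufQ{};
  double runI = 0.0, runQ = 0.0;
  long n = 0;  // global sample index
  for (int c = 0; c < cycles; ++c) {
    double cycI = 0.0, cycQ = 0.0;
    for (int g = 0; g < 4; ++g) {
      double groupSum = 0.0;
      for (int s = 0; s < kSamplesPerGroup; ++s, ++n)
        groupSum += 2048.0 + 500.0 * std::sin(2.0 * kPi * inputHz * n / kSampleRate);
      cycI += signI[g] * groupSum;
      cycQ += signQ[g] * groupSum;
    }
    int slot = c % kN;  // circular-buffer index (oldest entry)
    runI += cycI - bufI[slot];
    runQ += cycQ - bufQ[slot];
    bufI[slot] = cycI;
    bufQ[slot] = cycQ;
  }
  return std::fabs(runI) + std::fabs(runQ);
}
```

In this model the 470 Hz and 570 Hz results come out far below the 520 Hz result, because the off-frequency cycle sums rotate in phase from one cycle to the next and largely cancel in the 64-cycle running sum. How closely the real hardware follows this is what the swept-frequency test would show.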

Stay tuned!

Frank

IR Modulation Processing Algorithm Development. Part VIII

Posted 18 June 2017

In my last post on this subject, I showed how I could speed up ADC cycles for the Teensy 3.5 SBC, ending up with a configuration that took only about 5μSec/analog read.  This in turn gave me some confidence that I could implement a full four-sensor digital BPF running at 20 samples/cycle at 520Hz without running out of time.

So, I decided to code this up in an Arduino sketch and see if my confidence was warranted.  The general algorithm for one sensor channel is as follows:

  1. Collect a 1/4 cycle group of samples, and add them all to form a ‘sample_group’
  2. For each sample_group, form I & Q components by multiplying the single sample_group by the appropriate sign for that position in the cycle.  The sign sequence for I is (+,+,-,-), and for Q it is (-,+,+,-) .
  3. Perform steps 1 & 2 above 4 times to collect an entire cycle’s worth of samples.  As each I/Q sample_group component is generated, add it to a ‘cycle_group_sum’ – one for the I and one for the Q component.
  4. When a new set of cycle_group_sums (one for I, one for Q) is completed, use it to update a set of two N-element running sums (one for I, one for Q).
  5. Add the absolute values of the I & Q running sums to form the final demodulated signal value for the sensor channel.

To generalize the above algorithm for K sensor channels, the ‘sample_group’ and ‘cycle_group_sum’ variables become K-element arrays, and each step becomes a K-step loop. The N-element running sum arrays (circular buffers) become [K][N] arrays, i.e. two N-element arrays for each sensor (one for I, one for Q).

All of the above sampling, summing, and circular buffer management must take place within the ~96μSec ‘window’ between samples, but not all steps have to be performed each time.  A new sample for each sensor channel is acquired at each point, but sample groups are converted to cycle group sums only once every 5 passes, and  the running sum and final values are only updated every 20 passes.
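The per-pass scheduling described above (a new sample on every pass, the sample-group step every 5 passes, and the running-sum and final-value update every 20) can be sketched as a per-sample update function for K channels. This is a hypothetical host-side illustration, not the sketch that was run on the Teensy; `BpfState`, `processSample()`, the int32_t types and K = 4, N = 64 are assumptions. On the Teensy, something like this would be called once per sample window from inside the elapsedMicros-timed block.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

constexpr int K = 4;                 // sensor channels
constexpr int N = 64;                // running-sum length, in cycles
constexpr int kSamplesPerGroup = 5;  // 1/4 cycle
constexpr int kGroupsPerCycle = 4;
constexpr int kSignI[kGroupsPerCycle] = {+1, +1, -1, -1};
constexpr int kSignQ[kGroupsPerCycle] = {-1, +1, +1, -1};

struct BpfState {
  int sampleInGroup = 0, groupInCycle = 0, slot = 0;  // shared counters
  std::array<int32_t, K> groupSum{}, cycleI{}, cycleQ{};
  std::array<int32_t, K> runI{}, runQ{}, finalValue{};
  std::array<std::array<int32_t, N>, K> bufI{}, bufQ{};  // [K][N] circular buffers
};

// Called once per sample window with one new ADC reading per channel.
void processSample(BpfState& st, const std::array<int32_t, K>& adc) {
  for (int k = 0; k < K; ++k) st.groupSum[k] += adc[k];  // every pass
  if (++st.sampleInGroup < kSamplesPerGroup) return;
  st.sampleInGroup = 0;

  const int g = st.groupInCycle;  // every 5th pass: signed group sums
  for (int k = 0; k < K; ++k) {
    st.cycleI[k] += kSignI[g] * st.groupSum[k];
    st.cycleQ[k] += kSignQ[g] * st.groupSum[k];
    st.groupSum[k] = 0;
  }
  if (++st.groupInCycle < kGroupsPerCycle) return;
  st.groupInCycle = 0;

  for (int k = 0; k < K; ++k) {  // every 20th pass: running sums and FV
    st.runI[k] += st.cycleI[k] - st.bufI[k][st.slot];
    st.runQ[k] += st.cycleQ[k] - st.bufQ[k][st.slot];
    st.bufI[k][st.slot] = st.cycleI[k];
    st.bufQ[k][st.slot] = st.cycleQ[k];
    st.finalValue[k] = std::abs(st.runI[k]) + std::abs(st.runQ[k]);
    st.cycleI[k] = st.cycleQ[k] = 0;
  }
  st.slot = (st.slot + 1) % N;
}
```

The early returns mirror the timing observation in the photo below: four passes out of five only add one sample per channel, so most windows finish quickly.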

I built up the algorithm in VS2017 and put in some print statements to show how the gears are turning.  In addition, I added code to set a digital output HIGH at the start of each sample window, and LOW when all processing for that pass was finished.  The idea is that if the HIGH portion of the pulse is less than the available window time, all is well. When I ran this code on my Teensy 3.5, I got the following print output (truncated for brevity)

And the digital output pulse on the scope is shown in the following photo

Timing pulse for BPF algorithm run, shown at 10uS/cm. Note the time between rising edges is almost exactly 96uSec, and there is well over 60uSec ‘free time’ between the end of processing and the start of the next acquisition window.

As can be seen in the above photo, there appears to be plenty of time (over 60μSec) remaining between the end of processing for one acquisition cycle, and the start of the next acquisition window.  Also, note the fainter ‘fill-in’ section over the LOW part of the digital output.  I believe this shows that not all acquisition cycles take the same amount of processing time.  Four acquisition cycles out of every 5 require much less processing, as all that happens is the individual samples are grouped into a ‘sample_group’.  So the faint ‘fill-in’ section probably shows the additional time required for the processing that occurs after collection/summation of each ‘sample_group’.

The code for these measurements is included below:

More to come,

Frank

IR Modulation Processing Algorithm Development. Part VII

Posted 17 June 2017

In my previous post on this subject, I discussed my decision to change from an Arduino Uno SBC to a Teensy 3.5 for implementing the  ‘degenerate N-path’ digital band-pass filter (BPF) originally introduced to me by my old friend and mentor John Jenkins.  After replacing the Uno with the Teensy and getting everything running  (which took some doing, mostly due to my own ignorance/inability), it was time to see if the change would pay off in actual operation.

In my initial perusal of the available documentation for the Teensy 3.x SBC (have I told you lately how much I love the widespread availability of information on the inet?), I ran across some new programming features that aren’t available in the rest of the Arduino world.  The Teensy 3.x supports two independent 32-bit timers, supported by two new libraries (TimerOne and TimerThree).  When I first looked at this new functionality, I thought – “wow – this is just what I need to implement the sampling front-end portion of the digital BPF – I can use it with an appropriate ISR to get accurate sample timing!”.  And then I ran across Paul’s ‘Delay and Timing’ page with its description of the new ‘elapsedMillis’ and ‘elapsedMicros’ functions; these functions allow for accurate periodic execution of code blocks inside the normal ‘loop()’ function, without having to deal with interrupts and ISRs – cool!  And then I ran across the ‘FrequencyTimer2’ library written by Jim Studt….

So now I found myself going from no real good options for accurate sample timing to a ‘veritable plethora’ of options, all of which looked pretty awesome – what’s a guy to do?  Since the ‘elapsedMicros’ option looked like the simplest one to implement, I decided to try it first.

elapsedMicros:

From previous work I have a Trinket SBC transmitting an IR beam modulated by  a square-wave at approximately 520Hz.  The plan is to sample this waveform 20 times per cycle, and to have the sampling frequency as close as possible to 20×520 = 10.4Ksamples/sec, or approximately 96μS/sample.

I created a small test program to explore the feasibility of using the ‘elapsedMicros()’ function for IR detector sensor sampling.


In the above program, I simply generate a 10μS pulse every 95.7μS.  The ‘95.7’ value was empirically determined by watching the transmitted  IR waveform and the  10μS pulses together on a scope, and adjusting the value until the difference between the two frequencies was as small as possible (i.e. when the movement of the transmit waveform compared to the pulse train was as slow as possible), as shown in the short video below:


In the above video, the lower trace is the generated pulse train, and the upper trace is the transmitted IR modulation waveform.  The scope trigger was set to the pulse train, with the modulation waveform free to slide left or right based on the ‘beat frequency’ between the two waveforms.

Next, I added code to save ADC samples to an array for later printout.  Now that I am no longer constrained by the minuscule amount of RAM available on the Uno, I opened up the array size to 2000 elements to allow more viewing time before the program was interrupted by the serial output delays.  The code for this and the resulting Excel plot are shown below:

The resulting 2000 element array was dropped into Excel and plotted, as shown below:

All 2000 samples from the test program

First 40 samples. Note that 40 samples covers exactly two cycles of the modulation waveform

So, it looks like the ‘elapsedMicros()’ function is doing exactly what I want it to do – sampling the input waveform at almost exactly 20 samples/cycle without me having to figure out the exact delay time needed.

The next step was to determine how much ‘free time’ is left over for other processing steps like sampling multiple sensor channels, doing the ‘sample’ and ‘cycle’ sums, etc. For this step, I removed the array loading section and replaced it with a call to ‘delayMicroseconds()’. Then I manually adjusted the delay value until the period of the pulse train started expanding away from the desired 95.7μS value. A delay value of 85μS did not change the pulse period, but a value of 90μS did (slightly). So, I have between 85 and 90μS of ‘free time’ available (out of a total of 96!!!) for other processing chores.

Adding a single call to ‘analogRead(IRDET_PIN)’ reduced the available ‘free time’ by about 15μS, from between 85 & 90 to between 70 & 75μS. This shows that a single analog read takes about 15μS, which may be due to the same pre-scaling issue I saw on the Uno (to be determined). In any case, even if I use 4 sensor channels, I should have about 25μS left over for the summation and array load operations.
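The budget arithmetic here amounts to subtracting one ADC read per channel from the measured free time. A minimal sketch, assuming read costs simply add up with no overlap between conversions and other work; `remainingFreeUs` is a hypothetical name.

```cpp
#include <cassert>

// Rough per-sample time budget: the measured free time (us) left in the
// pacing loop, minus one ADC read per monitored channel.
// A negative result means the sample rate cannot be held.
double remainingFreeUs(double freeUs, int channels, double usPerRead) {
    assert(channels >= 0 && usPerRead >= 0.0);
    return freeUs - channels * usPerRead;
}
```

`remainingFreeUs(85.0, 4, 15.0)` gives 25.0 and `remainingFreeUs(90.0, 4, 15.0)` gives 30.0, bracketing the ‘about 25μS’ estimate.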

To investigate the analogRead() timing issues, I set up a small program to measure the time required to read a pin 1000 times.  Here’s the code:
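On the Teensy the measurement is just a `micros()` call before and after a loop of 1000 `analogRead()` calls, divided by 1000. The same pattern is sketched below as portable desktop C++: `averageCallMicros` is a hypothetical helper, `std::chrono::steady_clock` stands in for `micros()`, and the code being timed is passed in as a function. In a real measurement, the result of each read should feed a variable that is used afterward so the compiler cannot optimize the loop away.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Desktop analogue of timing 'iterations' calls with micros() and dividing
// by the count. Returns the average time per call in microseconds.
double averageCallMicros(const std::function<void()>& fn, int iterations) {
    assert(iterations > 0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::micro> elapsed = end - start;
    return elapsed.count() / iterations;
}
```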

With the above code, and with all default settings, the time required for 1000 reads was 17mSec, or about 17μS per read, which tracks well with the above measurements.

After changing the conversion speed to ADC_CONVERSION_SPEED::HIGH_SPEED, the time required for 1000 measurements was reduced to 11mS, so about 11μS per read.

I ran a whole series of tests with the different Teensy ADC library settings, with the following results. All times are in microseconds, and are the average of 1000 iterations:

  • conversion and sampling speed set to “HIGH”: 10.997
  • all adjustments commented out: 17.281
  • just conversion speed set to “HIGH”: 11.014
  • just sampling speed set to “HIGH”: 15.190
  • just resolution changed to 12 bits: 17.276
  • just resolution changed to 8 bits: 17.242
  • HIGH conversion and sampling speeds, and with 8-bit res: 8.931
  • HIGH conversion and sampling speeds, and with 12-bit res: 10.998
  • All of the above, plus averaging set to 1: 4.758
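Collecting these numbers into a small table makes the comparison easier. The struct and helper names are hypothetical, the labels paraphrase the list items, and the values are copied directly from the list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Measured average analogRead() times (us) for the Teensy ADC settings tried.
struct AdcTiming {
    std::string settings;
    double usPerRead;
};

const std::vector<AdcTiming> kTimings = {
    {"defaults (all adjustments commented out)", 17.281},
    {"conversion and sampling speed HIGH", 10.997},
    {"conversion speed HIGH only", 11.014},
    {"sampling speed HIGH only", 15.190},
    {"resolution 12 bits only", 17.276},
    {"resolution 8 bits only", 17.242},
    {"HIGH conversion + sampling, 8-bit", 8.931},
    {"HIGH conversion + sampling, 12-bit", 10.998},
    {"as above, plus averaging set to 1", 4.758},
};

// Speedup of a setting relative to the default (first) entry.
double speedupVsDefault(const AdcTiming& t) {
    return kTimings.front().usPerRead / t.usPerRead;
}

// Entry with the lowest time per read.
const AdcTiming& fastest() {
    const AdcTiming* best = &kTimings.front();
    for (const auto& t : kTimings) {
        if (t.usPerRead < best->usPerRead) best = &t;
    }
    return *best;
}
```

The fastest entry comes out roughly 3.6 times faster than the defaults (17.281 / 4.758), and four reads at that setting take about 19μS.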

So, I can get the ADC time down to about 5μS/sensor, which means that even with four sensor channels being monitored, I will have over 70μSec for ‘other stuff’, which should be more than enough to get everything done.

Frank

 

IR Modulation Processing Algorithm Development. Part VI

Posted 14 June 2017

In my previous posts on this subject, I have been working with an Arduino Uno as the demodulator processor, but I have been plagued by its 2KB RAM limit. This has severely limited timing debug, as I can’t make debug arrays long enough for decent time averaging, and I can’t do more than one sensor channel at a time.

So, I finally took the plunge and acquired some of Paul J Stoffregen’s Teensy 3.5 processors from their store.  From their site: “Version 3.5 features a 32 bit 120 MHz ARM Cortex-M4 processor with floating point unit. All digital pins are 5 volt tolerant.” The tech specs are shown on this page, but the main features I was interested in are:

  • 120MHz processor speed vs 16MHz for the Uno
  • 192KB RAM vs 2KB for the Uno
  • Analog input has 13 bit resolution vs 10 for the Uno
  • As an added bonus, the Cortex-M4 has an FPU, so integer-only math may be unnecessary.
  • Much smaller physical footprint – the Teensy 3.5 is about 1/4 the area of the Uno
  • Lower power consumption – The Teensy 3.5 at 120MHz consumes about 30mA at 5V vs about 45mA at 5V for the Uno.

Here are some photos of the Teensy 3.5 as installed on my algorithm test bed, and also on my Wall-E2 robot where it might be installed:

Teensy 3.5 installed on my algorithm test bed, with the Uno shown for size comparison. The small processor in the foreground is an Adafruit ‘Trinket’

Side-by-side comparison of the Uno and Teensy 3.5 SBC’s

Closeup of the Teensy 3.5 shown atop the ‘sunshade’ surrounding the IR sensors. This is a possible installation location

Wider view of a Teensy 3.5 placed atop the ‘sunshade’ surrounding the IR sensors

In addition to all these goodies, the folks at Visual Micro added the Teensy line to their Microsoft Visual Studio add-on, so programming a Teensy 3.5 is just as easy as programming a Uno – YAY!

Of course, I’ll need to re-run all the timing tests I did before, but being able to create and load (almost) arbitrary-length sample capture arrays for debugging purposes will be a great help, not to mention the ability to use floating-point calculations for better accuracy.

Stay tuned,

Frank

 

 

IR Modulation Processing Algorithm Development. Part V

Posted 09 June 2017

In getting the Arduino code working on my Uno/Trinket test setup (shown below), I have been having some trouble getting the delays right. It finally occurred to me that I should run some basic timing experiments, so here goes:

Sample Group Acquisition Loop:

This is the loop that acquires analog samples from the IR detector and sums 1/4 cycle’s worth into a single ‘sample group’. To measure its timing, I ran the following code:
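unsigned long startusec = micros();   // micros() returns unsigned long; a 16-bit int would overflow
long sum = 0;                         // 1000 readings of up to 1023 would overflow a 16-bit int
for (int i = 0; i < 1000; i++)
{
  int samp = analogRead(SQWAVE_INPUT_PIN1);
  sum += samp;
}
unsigned long endusec = micros();
Serial.print("time required for 1000 analog read/sum cycles = "); Serial.println(endusec - startusec);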

The time required for 1000 cycles was 15064 uSec, meaning that one pass through the loop takes an average of just over 15 uSec. Adding an 85 uSec delay to the loop should result in a loop time of almost exactly 100 uSec, and a 1000-pass loop time of 100,000 uSec or 0.1 sec. The actual result was 99504 uSec, or about 99.5 uSec/cycle – pretty close!

Next, I replaced the summation with a write to a 500-element array (I couldn’t do 1000 and still fit within the Uno’s 2K memory limit), and verified that this did not materially change the loop timing. The time required for 500 loops was 49788 uSec; twice that would be 99576 uSec, almost exactly the same as the 99504 uSec for the summation version.

Then I tweaked the delay to achieve as close to 25 complete cycles as possible, as shown in the Excel plot below. With an 82 uSec loop delay, the total time for 500 loop iterations was 48272 uSec, or about 96.544 uSec per loop iteration.

96.544 uSec per loop iteration, and 20 loop iterations per cycle gives 20*96.544 = 1930.88 uSec per cycle or 518 Hz.  This is very close to the 525Hz value I got from my O’scope frequency readout when I first fabricated my little test setup.
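The frequency estimate can be written as a small function. This is a sketch; `impliedModulationHz` is a hypothetical name, and it assumes the capture really contains 20 samples per modulation cycle, as the 25-cycles-in-500-samples plot indicates.

```cpp
#include <cassert>

// Modulation frequency (Hz) implied by a timed capture loop: totalUs for
// 'iterations' loop passes, with samplesPerCycle passes per modulation cycle.
double impliedModulationHz(double totalUs, int iterations, int samplesPerCycle) {
    assert(totalUs > 0.0 && iterations > 0 && samplesPerCycle > 0);
    double usPerIteration = totalUs / iterations;          // 48272 / 500 = 96.544
    double usPerCycle = usPerIteration * samplesPerCycle;  // 96.544 * 20 = 1930.88
    return 1.0e6 / usPerCycle;                              // about 517.9 Hz
}
```

`impliedModulationHz(48272.0, 500, 20)` returns about 517.9Hz, the ‘518 Hz’ figure.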

Next, I coded 500 iterations of a two-detector capture/sum operation and got: “time required for 2-detector 500 analog read/store cycles = 15520”. So, about 31 uSec/iteration, or almost exactly twice the one-detector time. A four-detector setup yielded a time of 30352 uSec for 500 iterations, or about 60.7 uSec/iteration. So, a 4-detector setup is possible, assuming the Uno’s 2KB memory constraint can be addressed successfully.

In summary:

  • It takes about 15 uSec to read each sensor’s A/D value and either sum it or store it in an array
  • A four-sensor setup can probably be accommodated, but only if the required summing arrays fit into available memory (not possible for the Uno, but maybe for others).
  • A loop delay value of 82 uSec results in almost exactly 20 samples/cycle.

Stay tuned

Frank

 

 

 

IR Modulation Processing Algorithm Development. Part III

Posted 27 May 2017

In my previous post I demonstrated an algorithm for processing a modulated IR signal to extract an intensity value, but the algorithm takes too long (at least on an Arduino Uno) to allow for 20 samples/cycle (admittedly way over the required Nyquist rate, but…). So I decided to explore ways of speeding up the algorithm.

First, the baseline: the starting point is the 17,384 μSec required to process 100 samples with the current algorithm, or 174 μSec/sample. At an input frequency of 520Hz, 20 samples/cycle is about 96 μSec/sample, so I’m off by a factor of 2 or so. And this is only for one channel, so I’m really off by a factor of 4 (for a 2-channel setup) or 8 (for my current 4-channel arrangement).
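The factors of 2, 4 and 8 are rounded. Assuming processing cost scales linearly with the number of channels, the ratio of needed time to available time works out as below; `shortfallFactor` is a hypothetical helper.

```cpp
#include <cassert>

// Ratio of processing time per sample to the time available per sample.
// Values above 1.0 mean the algorithm cannot keep up in real time.
double shortfallFactor(double processUsPer100, int channels,
                       double modHz, int samplesPerCycle) {
    assert(channels > 0 && modHz > 0.0 && samplesPerCycle > 0);
    double availableUs = 1.0e6 / (modHz * samplesPerCycle);  // ~96 us at 520 Hz, 20/cycle
    double neededUs = channels * processUsPer100 / 100.0;     // assumes linear scaling
    return neededUs / availableUs;
}
```

With 17,384 μSec per 100 samples, this gives about 1.8 for one channel, 3.6 for two and 7.2 for four.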

As an experiment, I reduced the running average length from 5 to 1 cycles, or from 100 to 20 samples. This reduces the shifting operation load by a factor of 5, and resulted in a total processing time of 1876 μSec for all 100 samples – wow!

Then I discovered I had failed to uncomment the line that loads the new running average value into the front of the running average array, so I put that back in and re-ran the measurement. This time the number came up as 10748 μSec! This is just not possible! It is impossible that 10,000 (100 iterations/sample, 100 samples) iterations of a copy operation from one location in the array to another take 1/10 the time of 100 iterations (1/sample) of a copy operation from a variable into the array – not possible!!!

But, since it was happening anyway – whether possible or not, I decided I was going to have to figure it out :-(.  So, I changed the line

RunningAvg1[0] = (int)chan1Avg;

to

RunningAvg1[0] = 0;

and re-ran the measurement. This time the total for processing 100 samples was 1896 μSec – much more believable! So, what’s the difference between these two operations? The only thing I could think of is that it must take a lot of time to convert a double to an int.

So, I ran a test where I executed the ‘RunningAvg1[0] = (int)chan1Avg;’ line 10 times, all by itself, and measured the elapsed time. I got 72 μSec – a much more believable number, but not what I was expecting. Increasing the number of iterations to 100 resulted in an elapsed time of 672 μSec – consistent with 72 μSec for 10 iterations. That’s nice, but I’m still not any closer to figuring out what’s going on.

Well, after a bunch more experiments, I think I have the problem narrowed down to the use of floating point math in a few operations. I have seen some posts to the effect that floating point math is much slower than integer math on Arduino processors, and these experiments tend to bear that out. I should be OK with integer math everywhere, I hope ;-).

After completely re-writing the algorithm to eliminate floating point math (and correcting several logic errors – oops!), I re-ran the 100-element process for 1 channel, with the following results:

All components – original captured samples, running average, AC component, and full-wave rectified component. Note elapsed time of 3008 uSec

From the above Excel plot, it is clear that the algorithm successfully extracted the full-wave rectified value for the incoming modulated IR signal, and did so in only 3008 uSec for 100 samples. This should mean that I can easily handle up to three simultaneous channels, and maybe even four – YAY!

Another run with two simultaneous channels was made. The following Excel plot shows the Channel 2 results, along with the elapsed time for both channels.

Channel 2 all components – original captured samples, running average, AC component, and full-wave rectified component. Note elapsed time of 4268 uSec

The above results for two channels strongly suggest that all four channels in the current hardware implementation can be processed simultaneously while still maintaining a 20 sample/cycle sample rate. This is extremely good news, as it implies that I can ‘simply’ insert an Arduino Uno or equivalent between the detector array and the robot controller. The robot controller will continue to see left/right analog values as before (but inverted – more positive is more signal), but background IR interference will be averaged out by the intermediate processor – cool!
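Going from two measured channels to ‘all four’ is an extrapolation. Assuming each additional channel costs what the second one did (a simple linear model, not a measurement), the per-sample time can be estimated as below; `extrapolatedUsPerSample` is a hypothetical name.

```cpp
#include <cassert>

// Linear extrapolation of per-sample processing time to N channels from two
// timed runs (1 and 2 channels, each over 100 samples). Assumes every added
// channel costs the same as the second one did.
double extrapolatedUsPerSample(double oneChanUsPer100, double twoChanUsPer100,
                               int channels) {
    assert(channels >= 1);
    double base = oneChanUsPer100 / 100.0;                          // 3008 -> 30.08 us
    double perExtra = (twoChanUsPer100 - oneChanUsPer100) / 100.0;  // 1260 -> 12.60 us
    return base + (channels - 1) * perExtra;
}
```

With the 3008 and 4268 uSec figures, four channels come out at about 67.9 uSec per sample, under the roughly 96 uSec available at 20 samples/cycle.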

Rather than use a Uno, which is physically very large, I hope to be able to use something like an Adafruit Arduino Pro Micro, as shown below:

Adafruit’s Arduino Pro Micro. 16MHz, 9 Analog 12 Digital I/O

This should fit just about anywhere (probably on top of the sunshade), and be very easy to integrate into the system – we’ll see.

Stay tuned!

Frank

 

 

 

IR Modulation Processing Algorithm Development. Part II

Posted 25 May 2017

One of the things I didn’t understand about the analog sample runs from my previous post was why there were so many cycles of the IR modulation signal in the capture record; I had set the algorithm up to capture only 5 cycles, and there were more than 10 in the record – what gives?

Well, after a bit of on-line sleuthing, I discovered the reason was that the A/D conversion process associated with the analogRead() function takes a LOT longer than a digitalRead() operation. This put a severe dent in my aspirations for real-time processing of the modulated IR signal, as I would have to do this for at least two, and maybe four independent signal streams, in real time – oops!

One thing I have discovered for sure in the modern internet era: if you are having a problem with something, it is a certainty (i.e. Prob = 100%) that many others in the universe have had the same problem, and most likely someone has come up with (and posted about) one or more solutions. So, I googled ‘Arduino Uno faster analogRead()’, and got the following hits:

The very first link above took me to this forum post, and thanks to jmknapp and oracle, I found the Arduino code to reset the ADC clock prescale factor from 128 to 16, thereby decreasing the conversion time by a factor of 8, with no reduction in ADC resolution – neat!
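For reference, on the ATmega328P the prescaler is selected by the ADPS2..ADPS0 bits of the ADCSRA register, and the Arduino core sets it to 128 at 16MHz (a 125kHz ADC clock). The sketch below models that register as a plain byte so the bit manipulation and the ‘factor of 8’ arithmetic can be checked on a desktop; the function names are hypothetical. On an actual Uno, a one-line equivalent is `ADCSRA = (ADCSRA & ~0x07) | 0x04;`. The factor of 8 applies to the raw 13-clock conversion (104μS down to 13μS); measured analogRead() time also includes call overhead, and the datasheet rates full 10-bit accuracy for ADC clocks of 50 to 200kHz, so some accuracy loss at 1MHz is possible.

```cpp
#include <cassert>
#include <cstdint>

// Desktop model of the ATmega328P ADCSRA register: a plain byte here, the
// real register on the Uno. ADPS2..ADPS0 are bits 2..0; ADEN is bit 7.
constexpr uint8_t ADPS_MASK = 0x07;

// Division factor selected by the ADPS2..0 bits (000 and 001 both give 2).
int prescaleFromBits(uint8_t adcsra) {
    static const int kFactor[8] = {2, 2, 4, 8, 16, 32, 64, 128};
    return kFactor[adcsra & ADPS_MASK];
}

// Equivalent of setting ADPS2 and clearing ADPS1/ADPS0: prescale 16.
uint8_t withPrescale16(uint8_t adcsra) {
    return static_cast<uint8_t>((adcsra & ~ADPS_MASK) | 0x04);
}

// Time (us) for one normal conversion (13 ADC clocks) with a 16 MHz CPU clock.
double conversionUs(int prescale) {
    const double cpuHz = 16.0e6;
    return 13.0 * prescale / cpuHz * 1.0e6;
}
```

Starting from the Arduino default of 0x87 (ADEN plus prescale 128), `withPrescale16` yields 0x84, and the conversion time drops from about 104μS to 13μS.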

To test the effect of the prescaler adjustment, I measured the time it took for 100 ADC measurements with no delay between measurements.  As shown below, there is a dramatic difference in the ‘before’ and ‘after’ plots:

 

100 ADC cycles with no delay, prescale = 128

100 ADC cycles with no delay, prescale = 16

Next, I adjusted the delay between ADC cycles to collect approximately 5 cycles at the 520Hz input rate, as shown below:

Delay adjusted so that 100 samples ~ 5 cycles at 520Hz.

With the prescaler set to 16, the ADC is much faster. With a 5-cycle collection window at 520Hz, I have about 80 uSec per ADC cycle to play with for other purposes, so it seems reasonable that I can handle multiple input streams with relative ease – YAY!!

The next step was to simulate a 4-channel capture operation by capturing 400 samples, 100 each from four different channels. In this simulation, all the data comes from the same IR link, but the processing load and timing are the same. All the samples from the same time slot are taken within a few microseconds of each other, and the loop (inter-sample) delay was adjusted such that approximately five cycles were captured from each ‘channel’, as shown in the following Excel plot:

Simulated 4-channel capture

As can be seen in the above plot, the channel plots overlap almost exactly.  What this shows is that the Arduino Uno can capture all four IR detector channels at sufficient time resolution (about 20 samples/cycle) for effective IR signal detection/evaluation, and with sufficient time left over (about 30 uSec) for some additional processing.

If the design is changed from four channels to just two, then the processing load goes down significantly, as shown in the following plot:

Simulated 2-channel capture

To complete the simulation, I added the code to perform the following operations on a sample-by-sample basis:

  • Update the running average of the sample array
  • Subtract the running average from the sample, and take the absolute value of the remainder (full-wave rectification)
  • Store the result in another array so it can be plotted. This last step isn’t necessary except for debugging/evaluation purposes
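The three per-sample steps could look something like the following. This is a sketch, not the code used for these runs: it uses integer math throughout and swaps in a circular buffer with a running sum (a common technique) in place of shifting the whole history array on every sample. The class name, the window length and the warm-up behavior (averaging over however many samples have arrived so far) are assumptions.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Integer-only per-sample demodulator: running average over the last
// windowLen samples, then |sample - average| as the full-wave rectified
// AC component. The circular buffer plus running sum makes each update
// O(1) instead of shifting the whole history array.
class RunningRectifier {
public:
    explicit RunningRectifier(int windowLen)
        : buf_(windowLen, 0), sum_(0), idx_(0), count_(0) {
        assert(windowLen > 0);
    }

    // Processes one ADC sample and returns its rectified value.
    int process(int sample) {
        sum_ += sample - buf_[idx_];  // add newest, drop oldest (0 during warm-up)
        buf_[idx_] = sample;
        idx_ = (idx_ + 1) % static_cast<int>(buf_.size());
        if (count_ < static_cast<int>(buf_.size())) ++count_;
        int avg = static_cast<int>(sum_ / count_);  // integer running average
        return std::abs(sample - avg);              // full-wave rectification
    }

    int average() const { return count_ ? static_cast<int>(sum_ / count_) : 0; }

private:
    std::vector<int> buf_;
    long sum_;
    int idx_;
    int count_;
};
```

With a 20-sample window (one cycle at 20 samples/cycle) and a square wave alternating between 300 and 100, the average settles at 200 and every rectified output is 100 once the window is full.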

Initial results as shown below are very promising. The following Excel plots show the results of processing 100 ADC samples in real time. First, 100 samples were loaded into an array to represent the last 100 samples in a real-time scenario, and the running average value was initialized to the average of all these samples. Then each subsequent real-time sample was processed using the above algorithm and the results were placed in holding arrays for later printout, with the following results:

All components – original captured samples, running average, AC component, and full-wave rectified component

Detail view of original captured samples and the running average component

Detail view of the AC component of the original captured samples and the computed full-wave rectified component

The above plots confirm that the ADC samples can indeed be processed to yield the full-wave rectified intensity of a modulated IR beam. However, there is a fly in the ointment – it takes too long: it took 17,384 μSec to process 100 samples, but 100 samples at 20 samples/cycle only takes approximately 9600 μSec – and this is only for one channel :-(. I will need to find some serious speedup tricks, or reduce the number of samples/cycle, or both, in order to fit the processing steps into the time available.

Stay tuned,

Frank