Wall tracking: finding the heading parallel to the nearest wall

Posted 14 February 2020

Happy (American) Valentine’s Day! In my last post, I described my plan to use Wall-E2’s new relative heading super power to find the relative heading parallel to the nearest wall. I ended that post with “…and not all that hard to program, either”. Well, this turned out to be a bit of an exaggeration, as things weren’t quite as easy as I first thought; the interaction of the physics of the robot and the time scales associated with ping measurements complicated things a bit.

Background:

For some time now I have been working on ways to enhance Wall-E2’s autonomous wall-tracking ability.  Wall-E2 can track walls fairly well, but lacks the ability to track a wall at a specified stand-off distance.  Currently, tracking occurs at whatever distance Wall-E2 first detects the nearest wall. While this isn’t terrible, I wanted to do better.

Unfortunately, the way in which the measured ‘ping’ distance to the nearest wall interacts with the relative orientation of the robot with respect to that wall makes it almost impossible to determine the actual offset distance, and therefore how to determine what to do to maintain a constant offset distance.  As shown in the following diagram, when the robot makes a turn, the measured distance to the wall will change just due to the orientation change, without the robot’s actual offset distance changing at all.

Without having some idea of the angle theta in the above diagram, making a judgement of where the robot is relative to the target offset distance is difficult, if not impossible.   This situation was the impetus for adding the MPU6050 Inertial Measurement Unit (IMU) to Wall-E2’s list of super powers. The general idea was that knowledge of relative headings would allow Wall-E2 to make accurate heading-controlled turns without relying solely on timing.  After a lot of work to eliminate RFI/EMI problems associated with the Pololu metal-geared motors on Wall-E2, I’m happy to say that the MPU6050 is now quite stable, and making turns of just a few degrees is quite possible.
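To put a number on the effect described above, here is the idealized relationship, assuming a flat wall, a narrow-beam side sensor mounted at right angles to the robot's direction of travel, and ignoring the sensor's distance from the robot's turning center (all simplifying assumptions on my part; a real ultrasonic sensor has a wide beam, which softens the effect). With theta being the angle between the robot's heading and the parallel heading:

```latex
d_{\text{meas}} = \frac{d_{\text{offset}}}{\cos\theta}
\qquad\Longleftrightarrow\qquad
d_{\text{offset}} = d_{\text{meas}}\cos\theta

% Worked example for a true offset of 30 cm:
% \theta = 20^\circ:\; d_{\text{meas}} = 30 / 0.940 \approx 31.9\ \text{cm}
% \theta = 45^\circ:\; d_{\text{meas}} = 30 / 0.707 \approx 42.4\ \text{cm}
```

So under these assumptions a 45 deg heading error inflates the reading by about 12 cm with no change in actual offset, which is why the measured distance alone can't be trusted without knowing theta.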

However, acquiring and then maintaining a particular offset distance from the nearest wall is still not straightforward.  Back in early December last year I demonstrated the ability to acquire and then maintain a constant offset distance, but only if the robot started out reasonably parallel to the wall.  If the robot was oriented toward or away from the wall by more than a few degrees, it would not work.  So I needed to find a way to first orient the robot parallel to the nearest wall at any distance, so that my current acquisition & tracking algorithm would work successfully.

The basic idea behind finding the parallel heading is that when the robot is turned through a forward arc and the measured ‘ping’ distance decreases and then starts increasing, the robot’s relative heading at this inflection point (the distance minimum) is the desired parallel heading. If the distance instead increases from the start of the turn, then the robot started out either parallel to or facing away from the wall. In either case, reversing the turn back toward the wall will cause the measured distance to decrease to a minimum and then increase again. As in the first case, the heading at the point at which the measured distance starts to increase is the desired parallel heading.
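The basic idea can be sketched offline as a search for the distance minimum over a recorded sweep. This is only an illustration, not the code running on the robot: the `PingSample` struct and `estimateParallelHeading()` are hypothetical names, the sweep is assumed to be complete before the search starts, and heading wrap-around at +/-180 deg is ignored. Because many adjacent readings are identical, the sketch returns the heading at the center of the first run of minimum readings.

```cpp
#include <cstddef>
#include <vector>

// One (heading, distance) pair recorded during a slow turn. Hypothetical type.
struct PingSample {
    double headingDeg;   // relative heading from the IMU
    int distanceCm;      // side 'ping' distance at that heading
};

// Finds the first contiguous run of minimum-distance samples in the sweep and
// writes the heading at the centre of that run to headingOut.
// Returns false if the sweep is empty.
bool estimateParallelHeading(const std::vector<PingSample>& sweep,
                             double& headingOut) {
    if (sweep.empty()) return false;
    std::size_t first = 0;  // start of the current minimum run
    std::size_t last = 0;   // end of the current minimum run
    for (std::size_t i = 1; i < sweep.size(); ++i) {
        if (sweep[i].distanceCm < sweep[first].distanceCm) {
            first = last = i;                       // new, lower minimum
        } else if (sweep[i].distanceCm == sweep[first].distanceCm &&
                   i == last + 1) {
            last = i;                               // extend the plateau
        }
    }
    headingOut = 0.5 * (sweep[first].headingDeg + sweep[last].headingDeg);
    return true;
}
```

For a sweep from -30 to +30 deg with readings 35, 32, 30, 30, 30, 32, 35 cm, the plateau of 30 cm readings spans -10 to +10 deg, so the estimate is 0 deg. On the robot the decision has to be made on the fly during the turn, which is where the complications below come from.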

Although the basic idea as described above is very straightforward, as usual there are some ‘gotchas’ in the actual implementation:

In order to minimize heading overshoot due to the robot’s mass and angular momentum, the parallel heading search turns must be performed at lower-than-normal speeds. After some experimentation I settled on a turn rate of about 60 deg/sec. With the robot starting with an angle-in orientation of about 30 deg, this means that it takes about 1 second to sweep through to an angle-out orientation of about 30 deg. With the Arduino UNO setup I’m using for the tests, I was getting distance measurements about every 30-50 mSec, so about 20 to 33 measurements/sec, or roughly one measurement every 2-3 degrees of rotation.

The lower turn rate significantly reduces the rate at which the ‘ping’ distance changes per unit time, making it much harder to detect the distance inflection point. In effect, the lower turn rate flattens the ‘distance/degree’ slope, making inflection point detection more difficult. At 20-30 measurements/sec and only a few cm change from max on one side to max on the other, there are a lot of identical measurements returned.

My initial cut at addressing the above issue was to space the ping measurements further apart in time, thereby increasing the ‘distance/degree’ slope. After trying this (using an ‘elapsedMillisec’ variable) I realized that an equivalent method would be to simply increase the size of the inflection detection window (the number of times the ping measurement must be on the ‘other side’ of the inflection point in order to qualify as a valid inflection). After some experimentation, I arrived at a value of 20.

For some reason, it was much easier to find a good parallel heading value if the robot started out pointed toward the near wall. If it started out pointed away from the wall, the robot often stopped well short of or well after the actual parallel heading.  Eventually I developed a 4-turn process for this case to really nail down the parallel heading.  Here are some short videos demonstrating the algorithm.

Now that I can reliably determine the relative heading that orients the robot parallel to the nearest wall, I should be able to marry this capability with my already-developed algorithm for acquiring and maintaining a specific offset distance.

20 February 2020 Update:

So I combined the ‘find parallel heading’ feature with my already-existing angle-based tracking algorithm, and this worked fairly well.  Here’s a short video demonstrating the technique:

In the above video, the blue painter’s tape strips are marked every 10 cm, with a double-width mark at 30 cm (the desired offset distance). As the video shows, the robot first determines an approximate parallel heading, and from there starts the normal angle-based tracking algorithm.

Next, I tried an ‘enhancement’ to the above by having the robot move toward the wall on a 30-45 deg ‘cut’ from the parallel heading, and then turning back to parallel at the desired offset distance.  As the following video shows, this didn’t turn out so well.  If the robot doesn’t start out exactly parallel, then the ‘cut’ is either too steep or too shallow, resulting in a too-early or too-late turn back to the parallel heading.

So it looks like the ‘find parallel then start tracking’ approach works pretty well, but the ‘find parallel then drive to offset on a cut then back to parallel’ approach hasn’t been very successful.

27 February 2020 Update

After thinking about the difficulties I was encountering with my ‘FindParallel’ algorithm, I realized that the reason the robot was often overshooting the parallel orientation was due to small aberrations in ‘ping’ distance measurements that caused the ‘hit counter’ to reset to zero in the middle of an otherwise perfect arc of distance values. The ‘hit counter’ is incremented each time the newest ‘ping’ distance measurement trends along the same line, and is reset to zero whenever the newest ping measurement breaks the trend. When the hit counter exceeds a preset level, the parallel condition is considered to be detected. I thought I might be able to improve performance by making the algorithm a little more tolerant of such aberrations. So, rather than resetting the ‘hit counter’ to zero on a trend break, I changed the algorithm to decrement it by a set amount. This markedly improved performance, as shown in the following videos.
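Here is a minimal sketch of the 'decrement instead of reset' hit counter, written as plain C++ so it can be tried off the robot. The names (`TrendHitCounter`, `detectionIndex`), the rule that a 'hit' is any reading larger than the previous one, and the numbers in the example are all assumptions for illustration; the actual 'FindParallel' code may define the trend differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Counts readings that continue a rising-distance trend. A reading that breaks
// the trend costs missPenalty hits instead of wiping the count; equal readings
// leave the count unchanged. Hypothetical helper, not the robot's actual code.
class TrendHitCounter {
public:
    TrendHitCounter(int threshold, int missPenalty)
        : threshold_(threshold), missPenalty_(missPenalty) {}

    // Feed one ping distance; returns true once enough hits have accumulated.
    bool update(int distanceCm) {
        if (havePrev_) {
            if (distanceCm > prevCm_) {
                ++hits_;                                    // trend continues
            } else if (distanceCm < prevCm_) {
                hits_ = std::max(0, hits_ - missPenalty_);  // tolerate a glitch
            }
        }
        prevCm_ = distanceCm;
        havePrev_ = true;
        return hits_ >= threshold_;
    }

private:
    int threshold_;
    int missPenalty_;
    int hits_ = 0;
    int prevCm_ = 0;
    bool havePrev_ = false;
};

// Index of the sample at which detection fires, or -1 if it never does.
int detectionIndex(const std::vector<int>& distancesCm,
                   int threshold, int missPenalty) {
    TrendHitCounter counter(threshold, missPenalty);
    for (std::size_t i = 0; i < distancesCm.size(); ++i) {
        if (counter.update(distancesCm[i])) return static_cast<int>(i);
    }
    return -1;
}
```

With the readings 40, 39, 38, 39, 40, 41, 40, 41, 42, 43, 44, 45 (one glitch at the second '40'), a threshold of 5 and a penalty of 2, detection fires at index 10. A very large penalty behaves like the old reset-to-zero rule and pushes detection out to index 11, so the robot keeps turning for one extra measurement per glitch. Over a real sweep with several glitches that delay adds up to an overshoot.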

There are four sections in the above video. In the first clip, the robot starts out pointed away from the wall and outside the desired 30 cm offset. The ‘FindParallel’ algorithm executes, and the robot then approaches and tracks the wall at the desired 30 cm offset. The next three clips show the same sequence from different starting conditions: outside the 30 cm offset and pointed toward the wall, and then inside the 30 cm offset, pointed first away from and then toward the wall. In each case, the robot successfully acquires a reasonably parallel heading and then acquires and tracks the 30 cm offset distance.

Stay tuned!

Frank

 

Back to the future with Wall-E2. Wall-following Part VIII

Posted 25 January 2020

About 6 weeks ago I posted that I had finally killed the “intermittent MPU6050 failure” dragon, by belatedly following Pololu’s recommendations for installing bypass capacitors on their metal-geared motors.  Unfortunately it turned out that my celebration was cut short by more annoying intermittent MPU6050 failures, so I was once again forced back to the drawing board.

This time I decided that the only way to figure out what was going on was to actively examine the I2C traffic in real time, to determine who exactly was doing what to whom. So, over the course of the six-week period between my last declaration of victory and this one, I created a Teensy based I2C bus ‘sniffer’ and used it to figure out what was going on. I was able to determine that the ‘master’ micro-controller continued to operate normally through a failure, but the MPU6050 didn’t. I was also able to determine that just resetting the IMU would not allow the system to recover, but resetting the micro-controller often did. Moreover, I was able to definitively show that the problem was caused by ‘contact bounce’ on one or more of the four 6″ male-male jumper wires connecting the micro-controller to the IMU. Eliminating these jumpers also (I hope) eliminated the last piece of the “I2C Intermittent failure” puzzle.

Looking back over the entire I2C failure saga, I now realize that this was the classic case of multiple failure modes complicating the troubleshooting effort.  The RFI/EMI problem caused by the Pololu metal-geared motors completely overshadowed the issue of non-secure jumper connections. Then, after finally coming to my senses and installing the recommended bypass capacitors on the motors, the ‘contact bounce’ problem was unmasked.  I do love interesting problems, but this one went past ‘interesting’ and was well into ‘agonizing’ by the time I got it solved ;-).

After getting everything set up, I ran some wall tracking tests in my entry hall ‘test range’ with pretty good results, as shown in the short video clip below.

Stay tuned,

Frank

30 January 2020 Update:

Still having trouble with the initial approach to a wall from outside the target distance. The robot still has a tendency to dive into the wall, unable to cope with the problem of the measured distance increasing instead of decreasing when the robot turns into the wall.  This inverse relationship makes it almost impossible to use a simple ‘turn toward the wall and wait for the distance to count down’ technique.

After thinking about this for a while, I realized that this would all be so much simpler if I cheated and started with the robot placed parallel to the target wall. Then the robot could simply turn 45 deg toward the wall and proceed until the measured wall distance was appropriate (Dtgt / 0.707), and then turn parallel again. Then I realized that I could easily determine the parallel condition by turning the robot toward and/or away from the wall while continuously measuring the distance; when the measurement goes through a minimum, then the robot is parallel to the wall. Simple in concept, and not all that hard to program, either.
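The 'Dtgt / 0.707' figure is just Dtgt divided by cos(45 deg), and the same rule works for other cut angles. A small sketch of the turn-back test follows; the function names are hypothetical, and it assumes an approach from outside the target offset with an idealized side sensor at right angles to the direction of travel.

```cpp
#include <cmath>

// Measured side distance at which the robot, driving toward the wall on a
// 'cut' of cutAngleDeg from parallel, is actually at targetOffsetCm.
double cutTurnBackDistanceCm(double targetOffsetCm, double cutAngleDeg) {
    const double kPi = 3.14159265358979323846;
    return targetOffsetCm / std::cos(cutAngleDeg * kPi / 180.0);
}

// True once the measured distance has counted down to the turn-back value.
bool shouldTurnParallel(double measuredCm, double targetOffsetCm,
                        double cutAngleDeg) {
    return measuredCm <= cutTurnBackDistanceCm(targetOffsetCm, cutAngleDeg);
}
```

For a 30 cm target and a 45 deg cut, the turn back to parallel happens when the measured distance drops to about 42.4 cm. The catch is that the result is only as good as the assumed cut angle, so any error in the starting 'parallel' heading shifts the turn-back point.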

 

IMU Motor Noise Troubleshooting, Part III

Posted 19 January 2020

In Part II of this saga, I described my continuing efforts to track down and fix the problem of intermittent failures associated with the MPU6050 IMU on my robot.  The MPU6050 IMU is required for the ability to make precise heading-based turns, which is in turn required to track walls at a designated stand-off distance.

This post summarizes the work to date and suggests new avenues of investigation for fully addressing the motor noise issue.

Summary of work to date:

  • July 2019: First started working with heading-based turns, and first noticed the motor noise problem.  Basically the problem presented itself as frequent, abrupt, and wildly divergent heading readings when the motors were running, but perfectly stable readings when the motors were not running.  See this post for the details.
  • October 2019: Successfully demonstrated polling-based (vs interrupt-based) MPU6050 IMU management. This development meant that I could acquire yaw (heading) values on an as-needed basis rather than at a 20 or 200Hz rate, throwing away 99% of the results.  This was demonstrated in this post.
  • November 2019: Made another run at solving the motor noise problem using a home-brew optical isolator and  a 2-stage power filter.  After a LOT of work, I wound up discovering that most (but not all!) of the problem could be addressed with proper RF bypassing at the terminals of the metal-geared Pololu motors I was using.  See this post for the details.
  • Early December 2019:  Demonstrated heading-based wall offset tracking using my 2-motor robot, with RF bypassing installed on both Pololu metal gear motors.  No IMU failures were noticed during these runs.  See this post for details.
  • Early December 2019:  Reprised some of the motor driver testing performed back in May of 2019 (see this post), and again noticed MPU6050 IMU communication failures when the motors were running, but none when the motors weren’t running. This test was performed on the 2-motor robot using the Pololu motors with the RF bypassing in place. So clearly the bypassing was not in and of itself sufficient to solve the problem; something else had to be going on.  See this post for the details.
  • Late December 2019 to mid-January 2020:  I decided I needed a tool to monitor the I2C bus traffic between the robot’s controller and the MPU6050 IMU – an I2C ‘sniffer’.  After some research, I found that the cheapest commercial sniffer cost about $330, and DIY sniffers were few and far between. I did, however, find a Teensy-based sniffer program by Kito, so I had a starting place.  After three major development stages, I had a Teensy 3.2 program that would reliably monitor I2C communications between an Arduino (Mega or Uno) master and a MPU6050 slave, using the polling approach developed earlier.  See this post, this post, and this post for the development details.

Current Effort:

With the above history in mind, I applied my new I2C sniffer tool to the Motor Noise Problem.  As usual, I started this using the simplest possible setup; an Arduino Mega acting as the I2C master running my polling based ‘MPU6050_MotorNoiseTest1’ program, and a Teensy 3.2 and a MPU6050 IMU module both mounted on a small plugboard, as shown below.

Arduino Mega I2C master, with Teensy I2C sniffer and MPU6050 module on a separate plugboard

I played around with this setup for a while, and captured at least one IMU communications failure with the sniffer active. The failure occurred when I was moving the plugboard around a bit to verify that the MPU6050 IMU heading values changed appropriately.  At some point I noticed the I2C monitor output had changed its character significantly, so I quickly stopped the sniffer program and opened the log file (see attached file below).

From the log I can see that things proceeded normally until 6012443 mSec (1.67 hours) and then changed to report that nothing was being received from register 0x72 (the FIFO count register). This continued until 6022224 mSec (9.8 seconds later), where it returned to what looks like normal operation.

So, my preliminary guess at what happened is the connection from the Mega to the Teensy/MPU6050 got dropped momentarily, and it took the Teensy a while to find another START sequence in the I2C data stream from the Mega, as the ‘2048’ number in “6017280: processed = 2048 elements in 3 mSec” means that the capture buffer overflowed before a START sequence was detected.  “At 6022240: processed = 1224 elements in 2 mSec” means that a normal Mega ‘burst’ was captured and operation returned to normal.

Since the Teensy I2C monitor is on the MPU6050 end of the male-male jumpers, it begins to look like the Mega was still doing fine, but the jumper connection burped on one end or the other.  More testing to follow.


 

Next, I moved the plugboard containing the Teensy I2C Sniffer and the MPU6050 module to my 2-motor robot, and used the existing Arduino Uno on the robot as the master, as shown below.

I loaded my MotorNoiseTest1 program on the Uno, and allowed it to run both motors at a steady rate, while monitoring the I2C traffic with the Teensy, and also monitoring the heading values being printed out by the Uno.  I started the program just before 1 PM, and it was still running fine with no IMU errors at 10 PM, more than 9 hours later!  The I2C sniffer log shows regular communication with the MPU6050, and the calculated yaw value based on the packet bytes received by the sniffer program matches the yaw value calculated in the Uno program. This is clear verification that the sniffer program will run ‘forever’, and that at least in this case, the two-motor robot will also run ‘forever’ with no yaw errors.

Based on my earlier experience with the captured I2C communications failure, I’m more inclined now to believe that motor vibration or other mechanical perturbation is causing a momentary I2C bus or power/ground lead disconnect.  More tests to follow.

21 January 2020 Update:

After the 10-hour run described above, I tried to induce some failures by fiddling with the I2C and power/ground jumper wires, and found that I could easily and reliably cause a failure by ‘flicking’ the wires with my finger or a pen.  After each failure, the built-in recovery routine of clearing the FIFO and resetting the DMP failed to restore communications.  However, manually resetting the UNO did allow the system to recover.

From the above, I believe it’s safe to say that the current male-male jumper connections between the UNO and the Teensy/IMU are unreliable, and are hopefully the only remaining failure mode.  I haven’t quite figured out how to replace the connections with something more reliable, but I’m working on it.  I moved the IMU module from the plugboard and plugged its I2C pins directly into the I2C sockets on the UNO.  Then I replaced the power & ground leads with a permanent twisted pair connection to the Wixel shield, as shown in the following photo.

MPU6050 plugged directly into Uno board, with pwr/gnd jumpers replaced with permanent twisted pair

Then I fired up the system and ran it for a while, but was unable to make it fail.  This is encouraging news, to say the least.

Stay tuned,

Frank

Teensy I2C Sniffer for MPU6050 Part II

Posted 13 January 2020

In my last post on this subject, I described my efforts to build an I2C bus sniffer using a Teensy 3.2 micro-controller.  This post describes my efforts to move from a fixed array containing a 928-byte snapshot of an I2C bus conversation between an Arduino Mega 2560 and a MPU6050 IMU to a live, repeated-burst setup.

As the source for I2C traffic for the MPU6050 IMU I am using my MPU6050_MotorNoiseTest1 Arduino project with no motors or sensors connected.  All the code does is ask the MPU6050 for a yaw value every 200 mSec (the value of NAV_UPDATE_INTERVAL_MSEC), as shown below:

The Teensy code to monitor the I2C bus traffic is shown below.  When I first started working with this project, I copied Kito’s I2C sniffer code, which used Teensy’s Timer1 interval timer set to produce interrupts every 1 uSec, and an ISR to capture the data.  This turned out to be hard to deal with, as I couldn’t add instrumentation code to the ISR without overrunning the 1 uSec interrupt period, leading to confusing results.  So, for this part of the project I disabled the Timer1 interrupt, and called the ISR directly from the loop() function.  As others have pointed out, the Arduino loop() function does a lot of housekeeping in the background, so for top performance it is best to never let loop() execute, by placing another infinite loop inside loop() or inside setup().  This is what I did with the code designed to investigate whether or not the Teensy could keep up with an I2C bus running at 100Kbs.

The ‘capture_data()’ function (no longer used as an ISR) captures SCL & SDA states with a single port operation as shown

and then everything from a START pair (0xC followed by 0x4) to a STOP pair (0x4 followed by 0xC) inclusive is captured in the raw_data array.
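The START/STOP pair test can be sketched in plain C++ as follows. This is not the actual 'capture_data()' routine: the names are hypothetical, the samples come from a vector rather than a port read, and the sketch captures only the first START-to-STOP sentence. It also assumes that bit 2 of each sample is the SCL level and bit 3 is the SDA level, which is my inference from the 0xC/0x4 values quoted above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class BusEvent { None, Start, Stop };

constexpr std::uint8_t kBothHigh = 0xC;       // SCL high, SDA high (assumed layout)
constexpr std::uint8_t kSclHighSdaLow = 0x4;  // SCL high, SDA low (assumed layout)

// Classifies two consecutive samples as a START pair, a STOP pair, or neither.
BusEvent classifyPair(std::uint8_t prev, std::uint8_t curr) {
    if (prev == kBothHigh && curr == kSclHighSdaLow) return BusEvent::Start;
    if (prev == kSclHighSdaLow && curr == kBothHigh) return BusEvent::Stop;
    return BusEvent::None;
}

// Copies everything from the first START pair through the next STOP pair,
// inclusive, into a new buffer. Returns an empty buffer if no START is found.
std::vector<std::uint8_t> captureFirstSentence(
        const std::vector<std::uint8_t>& samples) {
    std::vector<std::uint8_t> raw;
    bool capturing = false;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const BusEvent ev = classifyPair(samples[i - 1], samples[i]);
        if (!capturing) {
            if (ev == BusEvent::Start) {
                capturing = true;
                raw.push_back(samples[i - 1]);   // the 0xC half of the START pair
                raw.push_back(samples[i]);       // the 0x4 half
            }
        } else {
            raw.push_back(samples[i]);
            if (ev == BusEvent::Stop) break;     // STOP pair is included
        }
    }
    return raw;
}
```

Feeding it the samples 0xC, 0xC, 0x4, 0x0, 0x4, 0xC, 0xC returns the five values 0xC, 0x4, 0x0, 0x4, 0xC; the leading and trailing idle samples are dropped.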

Any I2C Sniffer project like this one assumes that I2C activity occurs in short bursts with fairly long pauses in between.  This is certainly the case with my robot project, as yaw data is only acquired every 200 mSec.  However, there is still the problem of determining when an I2C ‘burst’ has finished so the sniffer program can decode and print the results from the last burst.  In my investigation, it became clear that at the end of the burst both the SCL and SDA lines go HIGH and stay that way until the next START condition (a 0xC followed by a 0x4).  So then the question becomes “how many 0xC/0xC pairs do I have to wait before determining that the last burst is over?”

In order to answer this question I decided to use my trusty Tektronix 2236 O’Scope and Teensy’s ‘digitalReadFast’ and ‘digitalWriteFast’ functions to implement a hardware-based timing capability using Teensy pins 0, 1, and 2 (MONITOR_OUT1, 2 & 3 respectively).  Among other things, this allowed me to definitively determine that an ‘idle’ (0xC/0xC) count of 1000 was too small, but an idle count of 2500 was plenty, without consuming too much of the available processing time.  It also turned out that ‘idle’ counts all the way up to 30,000 work too, but leave less time for processing.
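The idle test itself boils down to counting consecutive 'both lines high' samples. A minimal sketch, assuming each sample is the 0xC/0x4-style value described above and counting individual idle samples rather than pairs (`IdleDetector` is a hypothetical name, not taken from the sniffer code):

```cpp
#include <cstdint>

// Declares the bus idle after `threshold` consecutive samples with both
// SCL and SDA high (0xC). Any other sample restarts the count.
class IdleDetector {
public:
    explicit IdleDetector(std::uint32_t threshold) : threshold_(threshold) {}

    // Feed one sample; returns true while the idle condition is met.
    bool update(std::uint8_t sample) {
        if (sample == 0xC) {
            if (count_ < threshold_) ++count_;  // saturate, don't overflow
        } else {
            count_ = 0;                         // bus activity: start over
        }
        return count_ >= threshold_;
    }

private:
    std::uint32_t threshold_;
    std::uint32_t count_ = 0;
};
```

On the Teensy this would be constructed with the 2500 value found above, e.g. `IdleDetector idle(2500);`, and fed one sample per capture pass. The right threshold depends on how fast the sampling loop runs, so it would need re-checking on the scope after any change to the capture code.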

O’Scope shot showing I2C traffic on the bottom trace, and the point at which 2500 0xC/0xC (Idle) pairs is reached on the top trace (the high-to-low transition)

As can be seen in the above photo, the I2C ‘sentence’ lasts about 15 mSec, and the ‘idle’ condition is detected about 5 mSec later for a total of about 20 mSec out of the nominal 200 mSec cycle time for my robot application. This leaves about 180 mSec for I2C sentence processing and display.

18 January 2020 Update:

Success!!  I now have a working Teensy 3.2 I2C Sniffer program that can continuously monitor the I2C traffic between my Arduino Mega test program acting as an I2C master and a MPU6050 IMU I2C slave.  The Teensy code is available on my GitHub account here.

A major challenge in creating the sniffer program was the requirement to sample the I2C SCL & SDA lines quickly enough to accurately detect the line transitions denoting all the different I2C signals.  With the I2C bus running at 100Kbs, SCL (clock) transitions occur every 5 uSec. Good sampling requires at least 2 and preferably more samples per SCL state.  As noted above, I started off by copying the ISR routine from Kito’s I2C sniffer, but discovered I needed to add some logic to zero in on the desired I2C bus states (IDLE, START, DATA & STOP), and the additional code made the ISR take more than the desired 1 uSec window.  After posting about this problem to Paul Stoffregen’s Teensy forum, I got some good pointers for speedup, including a post that mentioned the Teensy FASTRUN macro that runs functions from RAM rather than FLASH. As it turned out, adding this macro to the program allowed me to reduce the ISR cycle time from about 1.4 uSec to about 0.89 uSec – yay!  The final ISR routine is shown below:

Note the use of digitalWriteFast() calls to output timing pulses on Teensy hardware pins so I could use my trusty Tek 2236 100 MHz O’scope to verify proper timing.

Once I got the ISR running properly, I focused on getting the data parsing algorithm integrated into the program.  I had previously shown that I could correctly parse simulated I2C traffic, so the current challenge was to integrate the algorithm in a way that allowed continuous capture-decode-print cycles at a rate that could keep up with the desired 5 measurements/sec rate.  So, I instrumented the sniffer program to display the decoded IMU traffic, along with the calculated yaw value and the time required to perform the decode.

Here’s a short section of the printout from the test program, showing the time (in minutes), the yaw (relative heading) value retrieved from the IMU, and left/right ping distances (unused in this application).

And here is the corresponding output from the I2C sniffer program

In the above printout, each section shows the individual transmit & receive ‘sentences’ to/from the IMU, and the 28-byte packet received from the IMU containing, among other things, the values required to calculate a yaw (relative heading) value.  As can be seen, the yaw value calculated from the received bytes closely matches the yaw values retrieved using the test program.  In addition, the last line of each section of the readout shows the time tag for the start of the decode process, and the total time taken to decode all the bytes in that particular burst.  From the data, it is clear that only 1-2 mSec is required to decode and display a full burst.

The complete I2C Sniffer program is available on my GitHub site here.  The complete test program that obtains a yaw value from the IMU every 200 mSec is shown below:

The above program was intended to help me troubleshoot the intermittent MPU6050 connection failures I have been experiencing for some time now.  The purpose of the new I2C sniffer project is to create a tool to log the actual I2C traffic between this  program and the IMU. The idea is that when a failure occurs, I can look back through the sniffer log to see what happened; did the Arduino Mega stop transmitting requests, or did the IMU simply stop responding, or something else entirely.

 

 

Teensy I2C Sniffer for MPU6050

Posted 02 January 2020

In my last post on the I2C subject, I described an Excel VBA program to parse through a 928-byte array containing the captured I2C conversation between an Arduino Mega and a MPU6050 IMU module. The Arduino was running a very simple test program that repeatedly asked the MPU6050 to report the number of bytes available in its FIFO.  I used Kito’s I2C sniffer code to capture the SDA/SCL transitions, which I then copy/pasted into Excel.

This post describes the next step, which was to port the Excel VBA code into a Teensy sketch using the Arduino version of C++, moving toward the ultimate goal of a Teensy based, fast I2C sniffer that can be used to monitor and log long-term (hours to days) I2C bus conversations to determine what is causing the intermittent hangups I’m seeing with my Arduino Mega/MPU-6050 robot project.

The code port took a while, but mostly because of my own lack of understanding about the details of the I2C protocol and the specifics of the communication between the Arduino test program and my MPU6050 IMU module.  After working through these problems, the end result was surprisingly compact – less than 500 lines, including 50 lines of test data, LOTS of comments and debugging printouts, as shown below:

When I ran this program against the captured data, I got the following output:

which, when compared against the debug printout from Jeff Rowberg’s I2CDev program,

shows that my Teensy program correctly decoded the entire test dataset.

The next step in the process is to modify the above program to allow long-term real-time monitoring and logging of a live I2C bus. By ‘long-term’, I mean hours if not days, as the object of the exercise is to figure out why the I2C bus connection to the MPU6050 on my two-motor robot intermittently fails when the motors are running.  A failure can occur within minutes, or only after several hours, and there doesn’t seem to be any rhyme or reason, except that the motors have to be running.

In normal operation, my two-motor robot obtains a heading value from the MPU6050 once every 200 mSec or so.  This I2C bus activity might comprise only 100 SCL/SDA transitions or so, and no other I2C bus activity takes place in the times between heading value requests.  So, there will be a few mSec of burst activity, followed by a 150-190 mSec idle period.  To monitor and log in real time, I need some sort of FIFO arrangement, where the I2C transition data can be saved into the FIFO during the burst, and then processed and saved into a log file during the idle period.
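To make the 'fill during the burst, drain during the idle period' idea concrete, here is a minimal fixed-size FIFO sketch in plain C++. It is only an illustration of the arrangement: `RingBuffer` is a hypothetical class, it is not the API of any particular circular buffer library, and it makes no attempt to be safe when one side runs in an interrupt.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Fixed-capacity first-in/first-out buffer. push() is called for each captured
// bus sample during a burst; pop() is called during the idle period to drain
// and decode. Not ISR-safe as written.
template <typename T, std::size_t N>
class RingBuffer {
public:
    // Adds an item; returns false (and drops the item) if the buffer is full.
    bool push(const T& value) {
        if (count_ == N) return false;
        data_[(head_ + count_) % N] = value;
        ++count_;
        return true;
    }

    // Removes the oldest item into `out`; returns false if the buffer is empty.
    bool pop(T& out) {
        if (count_ == 0) return false;
        out = data_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::array<T, N> data_{};
    std::size_t head_ = 0;   // index of the oldest item
    std::size_t count_ = 0;  // number of items currently stored
};
```

A buffer declared as `RingBuffer<std::uint8_t, 2048>` would hold a couple of thousand samples. Whether that is enough depends on how many samples a real burst produces, which has to be measured rather than guessed.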

While I was searching the web for I2C sniffer code, I also ran across this thread by tonton81 describing his template based circular buffer library for the Teensy line.  The thread started a little over two years ago, but has been quite active during that period, and tonton81 has made several bugfixes, updates, and enhancements to his code.  This might be just the thing for my project.

06 January 2020 Update:

After integrating tonton81’s circular buffer library into the project (thanks tonton81 for being so responsive with bugfixes!), I was able to demonstrate that the circular buffer version, when run against the same 928-byte simulated dataset, produced the same output as before, as shown below:

From the output above, it is clear that the Teensy can parse and print a typical 1000-byte burst in just a few mSec (3 in the above run), so it should have no problem keeping up with a 200 mSec data burst interval, and should be able to keep up with burst intervals down to around 10 mSec (100 bursts/sec!).  I suspect that the Teensy’s major problem will be not dying of boredom waiting for the next burst to process!

Here’s the full code (not including circular_buffer.h):

The next step in the project will be to modify the code (hopefully for the last time) to capture and process live I2C traffic bursts in real time.

I modified the interval processing code in loop() to stop/restart Timer1 and clear the FIFO each time the interval block is executed, and then reduced the processing interval from 200 to 50 mSec, to produce the following output:

The modified code:

and a short segment of the output:

Note that the processing block is indeed called every 50 mSec, and takes only a few mSec to complete.  The following is an O’scope image showing multiple 50 mSec periods.  As can be seen on the image, there is still a LOT of dead time between 928-byte bursts.

Top trace toggles at 50 mSec intervals to simulate periodic IMU data retrieval. Bottom trace shows IMU/MCU I2C communication bursts.

Stay tuned!

Frank

 

 

I2C Bus Sniffing with Excel VBA

In my never-ending quest to figure out why my I2C connection to an MPU6050 dies intermittently, I decided to try and record the I2C bus conversation to see if I can determine whether it is the MPU6050 or the microcontroller that goes tits-up on me.

Of course, this adventure turned out to be a LOT more complicated than I thought – I mean, how hard could it be to find and run one of the many (or so I thought) I2C sniffer setups out there in the i-verse?  Well, after a fair bit of Googling and forum searches, I found that there just aren’t any good I2C sniffer programs out there, or at least nothing that I could find.

I did run across one promising program; it’s a Teensy 3.2 sniffer program written by ‘Kito’ and posted on the PJRC Teensy forum in this post.  I also found this program written for the Arduino Mega.  So, I created a small Arduino Mega test program connected to a MPU6050 using Jeff Rowberg’s I2CDev library.

This program sets up the connection to the MPU6050 and then once every 200 mSec tests the I2C connection, resets the FIFO, and then repeatedly checks the FIFO count to verify that the MPU6050 is actually doing something.

When I ran Kito’s I2C sniffer program on a Teensy 3.2 (taking care to switch the SCL & SDA lines, as Kito’s code has them backwards), I got the following output

which isn’t very useful, when compared to the debug output from Jeff Rowberg’s I2CDev program, as follows:

As can be seen from Jeff’s output, there is a LOT of data being missed by Kito’s program. It gets the initial sequence right (S,Addr=0x68,W,N,P), but skips the 8-bit data sequence after the ‘W’, and mis-detects the following RESTART as a STOP.  The next sequence (S,Addr=0x68,R,N,P) is correct as far as the initial address is concerned, but again omits the 8-bit data value after the ‘R’ direction modifier.

Notwithstanding its problems, Kito’s program, along with this I2C bus specifications document  did teach me a LOT about the I2C protocol and how to parse it effectively. In addition, Kito’s program showed me how to use direct port bus reads to bypass the overhead associated with ‘digitalRead()’ calls – nice!

I got lost pretty quickly trying to understand Kito’s programming logic, so I decided I would do what any good researcher does when trying to understand a complex situation – CHEAT!!  I modified Kito’s program to simply capture the I2C bus transitions associated with my little test program into a 1024 byte buffer, then stop and print the contents of the buffer out to the serial port.  Then I copy/pasted this output into an Excel spreadsheet and wrote a VBA script to parse through the output, line-by-line. By doing it this way, I could easily examine the result of each script change, and step through the script one line at a time, watching the parsing machinery run.

Here’s a partial output from the data capture program:

So then I copy/pasted this into Excel and wrote the following VBA script to parse the data:

The above script assumes the data is in column A, starting at A1. A partial output from the program is shown below, showing the first few sequences

The above output corresponds to this line in the debug output from Jeff Rowberg’s I2Cdev code:

So, the VBA program is parsing OK-ish, but is missing big chunks, and there are some weird 1 and 2 bit sequences floating around too.

After some more research, I finally figured out that part of the problem is that the I2C protocol allows a slave device to pull the SCL line low unilaterally to temporarily suspend transmissions until the slave device catches up (this is known as clock stretching).  This causes ‘NOP’ sequences to appear more or less randomly in the data stream.  So, I again modified Kito’s program to first capture a 1024 byte data sample, and then parse through the sample, eliminating any NOP sequences. The result is a ‘clean’ data sample.  Here’s the modified Kito program

and a partial output from the run:

After processing all 1024 transition codes, 96 invalid transitions were removed, resulting in 928 valid I2C transitions.

When this data was copy/pasted into my Excel VBA program, it was able to parse the entire sample correctly, as shown below:

This corresponds to the following lines from Jeff’s program:

Although the VBA code correctly parsed all the data and missed nothing, there is still a small ‘fly in the ointment’; there is still an extra ‘0’ bit after every transmission sequence.  Instead of

we  have

with an extra ‘0’ between the ACK/NAK and the RESTART.  This appears in every transmission sequence, so it must be a real part of the I2C protocol, but I haven’t yet found an explanation for it.

In any case, it is clear that the Excel VBA program is correctly parsing the captured sequence, so I should now be able to port it into C++ code for my Teensy I2C sniffer.
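As a rough idea of what the C++ side of such a parser could look like, here is a minimal sketch that turns a list of bus samples into START/STOP/byte/ACK tokens. It is not a port of my VBA script: the sample encoding (bit 1 = SCL, bit 0 = SDA) and the names decodeI2C() and appendByte() are assumptions made for this example. It relies only on the standard I2C rules: SDA falling while SCL is high is a START, SDA rising while SCL is high is a STOP, and data bits are read on the rising edge of SCL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Assumed sample encoding for this sketch: bit 1 = SCL, bit 0 = SDA.
const uint8_t SCL_MASK = 0x02;
const uint8_t SDA_MASK = 0x01;

// Format a byte as "0xNN".
static std::string hexByte(uint8_t b)
{
    char buf[8];
    std::snprintf(buf, sizeof buf, "0x%02X", static_cast<unsigned>(b));
    return std::string(buf);
}

// Decode bus samples into tokens: "S" (START or repeated START),
// "P" (STOP), "0xNN" (data byte), "A" (ACK) or "N" (NAK).
std::vector<std::string> decodeI2C(const std::vector<uint8_t>& samples)
{
    std::vector<std::string> tokens;
    if (samples.empty()) return tokens;

    bool inFrame = false;   // true between a START and the next STOP
    int bitCount = 0;       // data bits collected for the current byte
    uint8_t current = 0;    // byte being assembled, MSB first
    uint8_t prev = samples[0];

    for (std::size_t i = 1; i < samples.size(); ++i) {
        const uint8_t cur = samples[i];
        const bool sclPrev = (prev & SCL_MASK) != 0;
        const bool sclCur  = (cur  & SCL_MASK) != 0;
        const bool sdaPrev = (prev & SDA_MASK) != 0;
        const bool sdaCur  = (cur  & SDA_MASK) != 0;

        if (sclPrev && sclCur && sdaPrev && !sdaCur) {
            // SDA falls while SCL stays high: START. Any partial bit
            // clocked in just before it is discarded here.
            tokens.push_back("S");
            inFrame = true;
            bitCount = 0;
            current = 0;
        } else if (sclPrev && sclCur && !sdaPrev && sdaCur) {
            // SDA rises while SCL stays high: STOP.
            tokens.push_back("P");
            inFrame = false;
        } else if (inFrame && !sclPrev && sclCur) {
            // Rising SCL edge: SDA carries a data bit or the ACK/NAK bit.
            if (bitCount < 8) {
                current = static_cast<uint8_t>((current << 1) | (sdaCur ? 1 : 0));
                ++bitCount;
            } else {
                tokens.push_back(hexByte(current));
                tokens.push_back(sdaCur ? "N" : "A");   // ACK is SDA low
                bitCount = 0;
                current = 0;
            }
        }
        prev = cur;   // samples with no relevant change fall through
    }
    return tokens;
}

// Helper for building synthetic captures: appends the samples for one
// byte plus its ACK/NAK bit (two samples per bit: SCL low, then SCL high).
void appendByte(std::vector<uint8_t>& samples, uint8_t value, bool ack)
{
    assert(!samples.empty());   // a START should already be in the capture
    for (int bit = 7; bit >= 0; --bit) {
        const uint8_t sda = (value >> bit) & 1;
        samples.push_back(sda);              // SCL low, SDA set up
        samples.push_back(SCL_MASK | sda);   // SCL high, bit is valid
    }
    const uint8_t ackLevel = ack ? 0 : 1;
    samples.push_back(ackLevel);
    samples.push_back(SCL_MASK | ackLevel);
}
```

One thing this sketch makes visible: SCL has to go high once before every STOP or repeated START, so a parser that counts every rising SCL edge as a bit will see one extra bit there. The sketch simply throws that partial bit away when the START or STOP is detected.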

Stay tuned!

Frank

Wall-E2 Motor Controller Study Part II

Posted 07 December 2019 (Pearl Harbor Day)

Back in May of this year I posted about different motor driver possibilities for my 2-motor and 4-motor robots, and showed some results for two drivers (an Adafruit Featherwing and an Adafruit DRV8871).  However, I got kind of sidetracked after this post when I discovered the RFI/EMI problem with the MPU6050 IMU.  At the time, I blamed the RFI/EMI problem on the non-linear nature of the newer drivers, and went back to using the L298N linear driver.

After quite a bit of experimentation and work, it finally turned out that most (if not all) of the RFI/EMI problem was the Pololu 20D metal-geared motors themselves, and properly suppressing the noise at the motor terminals with bypass capacitors solved the problem.  So I have decided to repeat some of my initial motor driver testing.

Adafruit DRV8871 Single Channel Motor Driver Testing:

I removed the L298N driver I have been using up until now on my 2-motor robot, and replaced it with two DRV8871 Single Channel Motor Drivers. The hookup with the DRV8871’s is actually significantly simpler than the L298N, requiring only two control lines per channel instead of three.  After the normal number of errors, I got it running and started some long-term tests with the motors running continuously (with varying speeds and directions) while polling the MPU6050 every 200 mSec for yaw data.

This test ran for over 5 hours without problems, and then the GY-521 (generic MPU6050) module stopped responding to requests for data.  This is the second MPU6050 module that  has behaved this way – running for an indefinite amount of time and then refusing to respond.

The good news is, the DRV8871 motor drivers worked flawlessly the entire time, and are efficient enough so the T0-3 size chip container was just barely warm to the touch.

I have run several long-term tests now with the DRV8871 drivers and two different MPU6050 modules, and the longest error-free run has been about 5 hours as shown above.  However, I have also had the test terminate in less than 5 minutes, so this is clearly not reliable enough for prime time. Also, adding my 2-stage power supply filter back into the system did not seem to effectively suppress errors, so now I have no clue what is going on.

11 December 2019 Update:

I ran the system overnight with the same test program, except I commented out the motor run commands so the motors themselves did not run. The test ran for over 11 hours and was still running fine when I terminated it.  So, the motors definitely have to be running for the problem to occur.

15 December 2019 Update:

Hoping to maybe eliminate a variable, I changed from the UNO controller to a Teensy 3.2. As usual, this ‘small change’ took a LOT more time than I thought it would.

There aren’t any good example programs for interfacing a Teensy 3.x with the MPU6050, especially using the I2CDEV/MPU6050/DMP libraries. I had a huge problem trying to track down compile issues with the I2CDev libraries, but in the end it came down to figuring out a way to get I2CDev to use i2c_t3.h instead of Wire.h. I solved this by copying I2CDev.h/cpp from my Libraries folder to my local project/solution folder, and editing the code to define the I2C implementation as I2CDEV_ARDUINO_WIRE and then replacing “#include <Wire.h>” with “#include <i2c_t3.h>” in the appropriate section as shown below. I’m sure there are more elegant ways of doing this (maybe adding a ‘I2CDEV_TEENSY_3.X’ section at the top?).

After making the above change, the project started compiling OK, and I was able to connect to the MPU6050 and pull off yaw values using the DMP.

The next problem occurred when I tried to run a test program with the Adafruit DRV8871 drivers.  The two control lines alternate between being used as digital outputs (for direction control) and analog/PWM outputs (for speed control). Unfortunately, once a Teensy 3.2 line has been written to with ‘analogWrite()’, it can no longer be used as a digital output without first running ‘pinMode([pin],OUTPUT)’ on it. This particular little ‘gotcha’ appears to have been there since around 2016, but got lost in the shuffle as it is still there in the latest TeensyDuino libraries.
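Here is a minimal sketch of how the two-pin DRV8871 scheme and the pinMode() workaround fit together. The function name driveMotor() and the signed speed convention are my own choices for this example, and which sign means 'forward' depends on how the motor is wired. The block under '#ifndef ARDUINO' is only a set of stand-ins so the logic can be compiled and checked on a PC; on the Teensy the real pinMode(), digitalWrite() and analogWrite() are used instead.

```cpp
#include <cassert>

#ifndef ARDUINO
// Host-side stand-ins (not the real Arduino core) so the logic can be
// compiled and checked on a PC.
#include <map>
const int OUTPUT = 1;
const int LOW = 0;
static std::map<int, int> g_duty;          // last duty written per pin (0..255)
static std::map<int, int> g_pinModeCalls;  // number of pinMode() calls per pin
void pinMode(int pin, int /*mode*/) { ++g_pinModeCalls[pin]; }
void digitalWrite(int pin, int level) { g_duty[pin] = level ? 255 : 0; }
void analogWrite(int pin, int duty) { g_duty[pin] = duty; }
#endif

// Drive one DRV8871 channel. speed is -255..255; the sign selects which
// input gets the PWM signal while the other input is held LOW.
void driveMotor(int in1Pin, int in2Pin, int speed)
{
    assert(speed >= -255 && speed <= 255);

    const int pwmPin = (speed >= 0) ? in1Pin : in2Pin;
    const int lowPin = (speed >= 0) ? in2Pin : in1Pin;
    const int duty   = (speed >= 0) ? speed : -speed;

    // The pin held LOW now may have been the PWM pin on the previous call,
    // so put it back into plain digital OUTPUT mode before digitalWrite().
    pinMode(lowPin, OUTPUT);
    digitalWrite(lowPin, LOW);

    analogWrite(pwmPin, duty);
}
```

Calling pinMode() on every update is cheap insurance; it could be limited to direction changes, but doing it unconditionally keeps the function stateless.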

After fixing that problem, I was successful in both getting yaw values from the MPU6050 using Jeff’s I2CDev libraries, and in driving my Pololu D20 metal-geared motors via the Adafruit DRV8871 motor driver modules.

After getting everything working, I took the time to fork Jeff Rowberg’s I2CDev library, edit I2CDev.h/cpp appropriately and create a pull request for Jeff.  The changes were merged into the Github master distro pretty quickly, so in theory at least, all a new Teensy 3.2 user has to do is grab Jeff’s I2CDevLib stuff and go.

Here’s my Teensy 3.2 program for testing the MPU6050/DMP and the Adafruit DRV8871 motor drivers:

Back to the future with Wall-E2. Wall-following Part VII

Posted 06 December 2019

Back in August of this year (4 months ago!) I demonstrated successful wall tracking at an arbitrary offset distance using heading information from a MPU6050 IMU using my little 2-motor robot.   At the time (silly me) I thought the next step was to integrate this new capability back into my larger 4-motor robot and let it loose into the ‘wild’ (my house).  Unfortunately the EMI/RFI problem that I thought I had solved reared its ugly head again, and sent all my beautiful plans right into the crapper :-(.

So, I have spent the last four months once again tracking down and eliminating the issue (really for sure this time – honest!).  As it turned out, the solution was pretty simple, and one that Pololu, the supplier of the metal-geared motors I am using, had already described in some detail in this post.  Along the way I learned a lot, and had a lot of fun, but to quote Abraham Lincoln, “I feel like the man who was tarred and feathered and ridden out of town on a rail. To the man who asked him how he liked it, he said: ‘If it wasn’t for the honor of the thing, I’d rather walk.’”

In any case, the MPU6050 IMU on my little 2-motor robot with metal-geared motors is now happily churning out yaw values on a polled basis (no interrupt required, thank you!), and I’m back in business with angle-enhanced wall tracking.  Once I got everything working again, I ran some tests on my ‘local field test site’ (AKA my office) to verify that the algorithm still worked and I was still getting good tracking behavior.  I have included some Excel plots and a short video showing the results.

 

Stay tuned,

Frank

 

IMU Motor Noise Troubleshooting, Part II

Posted 13 November 2019,

After a month-long diversion to help Homer Creutz investigate some of the many issues associated with the Invensense MPU6050 IMU chip, I’m now back to figuring out how to keep my robots’ motor & driver noise from corrupting the yaw values from my MPU6050 IMU module.  In my last post on this issue, I worked through a number of theories for what might be causing the problem, and eventually decided the issue was more associated with the motor drivers than with the motors themselves.  At the conclusion of this prior study, I was able to demonstrate effective turn control using MPU6050 yaw values, even with the motors running.

However, I have since determined that I basically had gotten lucky; the problem still persists, even after changing back from switch-mode to linear motor drivers.  After a lot more work, I am now convinced that the basic problem is very high frequency voltage spikes on the order of 2-3 V p-p being conducted onto the Mega 2560 microcontroller board via the motor control connections from the Mega to the motor driver.  The result is intermittent and erratic behavior from the Mega and/or the MPU6050 module.

As a potential solution to the problem, I spent some time investigating whether or not I could use an ESP32 module to replace the Mega and its accompanying wireless module (HC-05 BT module on the 2-motor robot, and a Wixel wireless serial extender on the 4-motor model). The idea was that since the ESP32 module is much smaller and has an integrated/shielded wireless module, it might be more immune to the conducted noise issue.  As it turned out though, getting the ESP32 to work the way I needed it to was next to impossible, and so I abandoned ship after a few weeks.

Anyway, now I’m back to working the motor noise problem again.  When I left the problem the last time, I thought that one possible solution to the noise conduction problem would be to use an optical isolator module such as this one, to isolate the motor power circuits from the Mega circuits.

So, I set up a 6-channel optical isolation bank between the Mega 2560 controller and the 2-motor robot’s motor controller, as shown below:

6-channel opto-isolator setup with 2-motor robot and Mega 2560 controller

However, when I tried this trick, I still got lots of high-frequency transients on the control and power lines, and the computed yaw values from the MPU6050 soon became invalid.  After poking around a bit with a scope, I realized that while this cheap 4-channel optical isolator is great for voltage difference isolation and low frequency isolation, the high-frequency stuff I was seeing was going around the optical isolation and capacitively coupling from the output side back to the controller side – bummer!

After getting back on the project, the first thing I did was to refresh my memory on a prior project to test a Sparkfun MPU9250 (basically the same as the MPU6050 but with a magnetometer included) interfaced to a Teensy 3.2 microcontroller.  The Teensy is much faster and physically much smaller than the Mega controller currently on my robots, so I thought it might be less susceptible to motor noise – worth a shot anyway.  So, I got the Teensy/MPU9250 configuration working again, and then did the absolute minimum to get the Teensy to drive just one motor on my 2-motor robot.  Surprisingly, the Teensy/MPU9250 combination went Tango Uniform as soon as the motor started up – wow!

So, now I wasn’t any closer to solving the problem, but I did have some additional information.  Now I knew that the problem wasn’t unique to the Mega 2560/MPU6050 combination – it also happened in the same manner with the Teensy 3.2/MPU9250. This indicated to me that the issue really was high-frequency noise spikes being conducted down the motor control wires and back into the microcontroller circuits.

So, I needed a way to effectively block these transients from reaching the microcontroller.  As noted above, I tried the cheap eBay optical isolator module, but although it clearly isolated the DC circuits, the high-frequency transients still made it through to the controller board. I needed an optical isolator setup with an ‘air gap’ big enough so that there would be no chance for the transient energy to jump the gap.  And, because to a man with a 3D printer, everything looks like a 3D fabrication problem, I figured I could whip something up using IR LED/IR phototransistor pairs, something like the following:

This model is just a small solid piece of plastic with 4 holes, sized so that a 3-mm LED / phototransistor will slide in, with a step to stop it from sliding all the way through.  There is no metal at all, other than the LED/phototransistor leads, so (hopefully) no conductive or capacitive path.

After the required three or four quick and dirty model iterations, I had a model that I could use for a very basic experiment.  In fact, I realized my little 4-hole model was way overkill, as all I really needed was one channel for the motor speed PWM signal – the other two inputs to the motor controller could be tied HIGH or LOW as necessary on the motor controller side for purposes of this experiment.

The setup is shown below:

Teensy 3.2 controlling one robot motor via homebrew optical isolator

Closeup comparing the 3D-printed optical isolator with the commercial 4-channel module

After getting the 1-channel optical isolator working, I used it to control one of the two motors on the 2-motor robot, and found that I could run this setup indefinitely with no data corruption – yay!  Here’s an Excel plot of a 35+ minute run with the robot motor running.  As can be seen from the plot, the MPU9250 responded properly the entire time. The five ‘spikes’ are caused by occasional manual sensor rotation to confirm proper operation

35-minute run showing reliable yaw data acquisition. The three ‘spikes’ are caused by manual sensor rotation to confirm proper operation

For the next step, I plan to expand my homebrew opto-isolator to 2 channels so I can control both robot motors.  If that is successful, I’ll add 4 more channels so I can control both motors completely.  If this all works, then I’ll need two complete 6-channel opto-isolators to control all 4 motors on the 4-motor robot.

15 November 2019 Update:

I added two more optical isolator channels so I could control both the speed and the direction of one motor on my two-motor robot.  Then I modified my Teensy 3.2 test program to run the motor at increasing and then decreasing speeds, both forward and backward, forever.  The motor speed increments and direction changes occur in the same section of code that acquires a new yaw value, every 200 mSec or so.  Here’s the setup

This setup puts about 30 mm separation between the two circuits, with the only connection being via my homebrew optoisolator.
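The speed sweep used in that test program can be sketched as a simple triangle wave over a signed speed value, where the sign stands for direction. The class name SpeedSweep and the step size are placeholders for this example and not the values from my actual program.

```cpp
#include <cassert>

// Hypothetical sweep generator: ramps a signed speed up to +maxSpeed,
// back down through zero to -maxSpeed, and repeats forever.
class SpeedSweep {
public:
    SpeedSweep(int maxSpeed, int step)
        : max_(maxSpeed), step_(step), speed_(0)
    {
        assert(maxSpeed > 0 && step > 0);
    }

    // Advance one step and return the new speed, clamped to +/-maxSpeed.
    // The ramp direction reverses whenever a limit is reached.
    int next()
    {
        speed_ += step_;
        if (speed_ >= max_) {
            speed_ = max_;
            step_ = -step_;
        } else if (speed_ <= -max_) {
            speed_ = -max_;
            step_ = -step_;
        }
        return speed_;
    }

private:
    int max_;
    int step_;
    int speed_;
};
```

Calling next() once per 200 mSec yaw update gives the motor a new speed (and, at the zero crossing, a new direction) on the same schedule as the IMU poll.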

I ran the system for over two hours with the motor running in both directions and with the speed varied across the entire range.  The motor, the program, and the yaw retrieval process worked perfectly the entire time, as shown in the following Excel plot and short video clip.

 

This clearly demonstrates that my homebrew optical isolator works as it should, and effectively isolates motor/driver related transients from the Teensy board.  What it doesn’t do is demonstrate the same capability with the Teensy connected to the same power supply; right now the Teensy is powered from my laptop via USB, and the motor is powered from the robot battery supply.  The next step will be to figure out how to suppress these transients to the point where both the Teensy and the motor/driver can live on the same circuit.

16 November 2019 Update

Success!  I now have the Teensy controller and the Robot running from the same power supply – yay!!  I accomplished this by constructing a two-stage RC filter between the 8.5V battery supply voltage and the 4-6V Teensy power input, as shown below, and modifying the USB cable to cut the red power wire.

Two-stage power supply filter for Teensy

2-stage power supply filter

With this setup, I ran for almost 30 minutes with no problems, as shown in the following Excel plot (the ‘glitches’ at the start and finish are manual sensor rotations to confirm valid data retrieval).

 

Stay Tuned!

17 November 2019 Update

According to this link, Thomas Huxley once said “The great tragedy of science—the slaying of a beautiful hypothesis by an ugly fact.”  Well, my ‘beautiful hypothesis’ has been well and truly slain by an ugly fact! As a last experiment to verify my hypothesis about the need for both an optical isolator and a power supply filter to adequately address the two-wheel robot’s motor noise transients, I bypassed the optical isolator as shown in the following photo, and ran the setup for several minutes.  Unfortunately (or fortunately depending on one’s point of view), the setup did not cooperate with my hypothesis – the setup ran fine, with no missed yaw value acquisitions, as shown in the following Excel plot (the ‘glitches’ in the plot are due to me manually rotating the sensor to verify proper operation).  So, apparently the power supply filter is required, but the optical isolator is not – Yay! (I think).

 

 

 

So then I removed the Female-Female jumpers and wired the motor control lines directly to the Teensy – no problem.  Then I added the three control lines for the second motor, and changed the Teensy program to control both motors.  This ran fine, so it is now clear that all that is required for motor transient suppression (at least for the Teensy setup) is a good power supply filter.  It’s even possible that my 2-stage power supply filter circuit is overkill for the application, and one stage would do fine.  To test this theory, I eliminated the first stage of the filter entirely, and tried again.  This works as well, so it looks like only a single-stage power supply filter is required for reliable operation, at least with the two motor robot and the Teensy microcontroller.  However, one fly in the ointment is that eliminating the first stage causes the power dissipation of the 5V Zener diode to increase to well over its nominal max Pd of 1W.  With a 10-ohm series resistor dropping 3.5V, the current through the zener is 3.5/10 = 0.35A (minus the negligible Teensy current), so Pd = 0.35 * 5 = 1.75W – oops!  I need to increase the series resistor by a factor of at least 1.75 to get Pd down to 1W.  So, I increased the series resistor to 20 ohms, and with the actual power supply voltage of 8V the zener current drops to (8 - 5)/20 = 0.15A, so Pd falls to about 0.75W, comfortably below the 1W limit.
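As a sanity check on the zener numbers, here is a small sketch of the Ohm's law arithmetic. The function names are made up for this example, and the Teensy load current is ignored, which makes the result a slightly pessimistic worst case. With 8.5V in, a 5V zener and a 10-ohm resistor it works out to 0.35A and 1.75W; with 8V in and 20 ohms it works out to 0.15A and 0.75W.

```cpp
#include <cassert>

// Current through the series resistor, in amps. The load current is
// ignored, so all of it is assumed to flow through the zener.
double zenerCurrent(double supplyV, double zenerV, double seriesOhms)
{
    assert(seriesOhms > 0.0);
    const double drop = supplyV - zenerV;
    return (drop > 0.0) ? drop / seriesOhms : 0.0;
}

// Worst-case power dissipated in the zener, in watts.
double zenerPower(double supplyV, double zenerV, double seriesOhms)
{
    return zenerCurrent(supplyV, zenerV, seriesOhms) * zenerV;
}

// Smallest series resistance that keeps the zener at or below maxPowerW.
double minSeriesOhms(double supplyV, double zenerV, double maxPowerW)
{
    assert(maxPowerW > 0.0);
    const double drop = supplyV - zenerV;
    return (drop > 0.0) ? drop * zenerV / maxPowerW : 0.0;
}
```

For the 8.5V supply and a 1W zener, minSeriesOhms() gives 17.5 ohms, so a 20-ohm resistor leaves a bit of margin.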

For a final confirmation test, I eliminated the power supply filter entirely, bringing the setup back to the original baseline, with the Teensy controlling both robot motors with no power supply filtering or optical isolator.  As expected, this caused the Teensy to lose synch with the IMU within a few minutes.

19 November 2019 Update:

After testing with the Teensy/MPU9250 combination, I decided to try and go back to the Arduino series of microcontrollers.  The reasons for doing this are:

  • The Teensy doesn’t have sufficient I/O channels to control all the required peripherals on the 4-wheel robot.  It does have enough I/O to control the 2-wheel robot, but that’s not what I’m after.
  • Remote programming of the Teensy is not possible without the use of another controller, like a Raspberry Pi Zero W or something like that.  However, this is easy to accomplish with the Arduino UNO or Mega 2560.

So, I started with an Arduino UNO that has been modified by removing T1 so board power is isolated from the USB cable power lead, and hooked up the motor control lines directly to the 2-wheel robot.  This arrangement failed after a few minutes, with or without power supply filtering.

So then I re-introduced my homebrew optical isolator, and found that the UNO will run indefinitely while running one of the two motors via the optical isolator and the power supply filter.  So at this point a reasonable hypothesis is that the Teensy is a bit more robust with respect to EMI/RFI effects than the UNO/Mega controllers, but the combination of the two-stage power filter and my homebrew optical isolator effectively suppresses motor and motor driver EMI/RFI (at least for one motor).

30 November 2019 Update:

I now have my two-motor robot running reliably with an Arduino UNO running my DF Robots MPU6050 module, with both motors running. As far as I have been able to determine, this requires both a power supply filter between Vbatt and the UNO Vin, and a 6-channel optoisolator between the UNO and the motor driver module.  Some photos and schematics are shown below:

6-channel homebrew optoisolator and power filter mounted on an Adafruit ‘Perma-Proto’ half-size breadboard

Optoisolator mounted on two-motor robot. Note DF Robots MPU6050 module in top left of photo

6-channel optoisolator schematic

 

03 December 2019 Update:  The Final Chapter (I hope)

After going through the entire process described above, proving to my satisfaction that the ‘final cure’ to the motor noise problem was to opto-isolate the motor driver control signals and thoroughly filter the power to the UNO, I was once again rudely smacked in the face by reality when my finely tuned setup refused to cooperate. As I prepared to start ‘field’ testing again, the yaw value corruption problem once again reared its ugly head.  This was beyond depressing – I have now spent upwards of 5 months trying to kill this particular alligator, and it keeps coming back.

Based on my philosophy that if I’m having an apparently insoluble problem, there’s someone (possibly many someones) out there in the i-verse that has had (and solved!) that particular problem already, I started over on internet research, googling for ‘motor noise problem’ and similar terms.  One of the hits I got was for an application note on the Pololu site dealing with just this problem (don’t know why I didn’t see it before, but…). In any case, Pololu’s solution was to install one or three bypass caps on the motor body itself.  Since I had already tried everything else, I thought ‘what the heck -it can’t get any worse than it is already’, and installed the three-capacitor arrangement, using non-polar 1 uF caps from each terminal to the motor body, and a 0.01 uF cap across the terminals.

Pololu D20 metal-geared motor with bypass caps installed. Blue caps are 1uF, orange cap is 0.01uF

Lo and Behold! It worked!  I was able to run for 90 minutes without a problem with all three elements in place; the motor bypass capacitors, the power supply filter, and the 6-channel optoisolator.  So, the next step was to bypass the optoisolator and see if that was a necessary component – and it continued to work without problems – yay!  Next, I bypassed the power supply filter circuit, and everything STILL continued to work great – double yay!

Two-motor robot with power supply filter and optoisolator bypassed

So, I wound up back at the beginning of my five-month circular journey, having learned a lot and had a lot of fun, but having wasted half a year.  So, if you are using metal-geared motors like the Pololu D20/D25 models, for Pete’s sake install bypass capacitors before doing anything else!

Stay tuned,

Frank

 

MPU6050 FIFO Buffer Management Study

Posted 13 October 2019

I have been attempting to use the Invensense MPU6050 6-axis IMU for some time now in both my two and four-wheel robots for improved wall-following ability. By measuring the relative heading change during turns, I could get the robot to accurately acquire and maintain a specified distance from the currently tracked wall.  I say ‘attempting’, as I have experienced somewhat mixed results in getting reliable results from the IMU.  A large part of the problem, as I described in this post, wasn’t the IMU at all, but rather the sensitivity of the Arduino I2C bus to RFI/EMI caused by the motor drivers.  However, even after solving this problem, my programs would still occasionally ‘lose synch’ with the IMU’s FIFO buffer and start returning garbage for heading values – not good!

In addition to the I2C issue, there are several factors that make the MPU6050 harder to work with:

  • Invensense, the company that makes the MPU6050 chip, appears to have been purchased by TDK, and their technical support forums don’t appear to be supported any longer.
  • Apparently there are significant pieces of the MPU6050 firmware that aren’t freely available as human-readable code, so much of the MPU6050 magic is just that – magic.  In particular, the details of how the MPU6050 handles its internal FIFO are (at least to me) completely unknown, except by reverse-engineering.
  • While there is a lot of MPU6050-related information and code floating around out there in the i-verse, that is as much a trial as a blessing; it has been very difficult for me to wade through everything and to try to sort out the wheat from the chaff.
  • Jeff Rowberg’s wonderful I2CDevLib contains support for the MPU6050, with examples.  While it is fairly easy to get started using Jeff’s examples, it was difficult for me to understand how the examples work so I would know how to modify them for my application without running off into the bushes.
  • Almost all the example code out there assumes an interrupt based IMU management scheme.  For my wall-following robot application, the interrupt scheme was overkill and then some, so I wanted to use a polling arrangement, which is very poorly documented. Eventually I developed a working polling algorithm (described here) , which I now use in my robots.
  • Invensense (now TDK) has released several updates to the MPU6050 firmware, and it is difficult (at least for me) to figure out what the differences are and whether or not those differences are worthwhile for my application.  There is some information in the header files provided by Jeff Rowberg, but if there is any sort of formal change history, I haven’t found it.

Despite all this, the MPU6050 is a wonderful device, and it’s EVERYWHERE – you can get MPU6050 modules from Adafruit, Sparkfun, DFRobots, and GY-521 Chinese knockoffs from eBay.  The GY-521 modules have some reputed issues with quality control, but at about $1-2 per module, it’s hard to go wrong.

In my continued attempts to understand how the MPU6050 works, and the details of some of the latest example code provided on Jeff Rowberg’s Github site, I posted an issue related to a particular part of the example code that defied my ability to understand, as shown below:

This code obviously works, but I get a headache every time I try to figure out how it makes sense.  One of the replies I got from this post was from Homer Creutz, who I knew to be one of the very best experts on all things MPU6050.  The gist of Homer’s reply was “yeah, it looks kinda clumsy, but it does work”. But then Homer went on to suggest an alternative using the modulus (%) operator that piqued my interest, as I had used this technique in my four-wheel robot code.  In subsequent email conversations Homer went WAY out of his way to thoroughly answer my stupid questions about the MPU6050, especially about the issue with FIFO overflow.  He sent me a link to this video explaining how circular buffers work, and the following graphic illustration (slightly edited for clarity) of the problem of MPU6050 overflow and multiple-byte packets

The combination of the video and the above simple graphic finally allowed me to understand why properly managing FIFO overflow is so critical for successful MPU6050 implementations. Ironically, FIFO overflow management is more of an issue in my low-speed, high mass, low-dynamics environment than at the other end of the scale.  In high-dynamics applications, FIFO overflow is almost never an issue because the application sucks data out of the FIFO as fast as it is put in, in order to provide the best possible stabilization and control.  However, in low-dynamics applications like my wall-following robots, there is no need for IMU information more than a few times per second, meaning that the FIFO will almost certainly overflow if it isn’t proactively drained even if the information isn’t really needed.
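The 'proactively drain it' idea can be sketched as a loop that reads every complete packet out of the FIFO and keeps only the newest one. The FakeFifo struct below is just a PC-side stand-in so the loop can be compiled and checked, and readLatestPacket() is a made-up name. On the robot the count and read steps would be the library's mpu.getFIFOCount() and mpu.getFIFOBytes() calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// PC-side stand-in for the IMU FIFO (an assumption for this sketch).
struct FakeFifo {
    std::deque<uint8_t> bytes;

    std::size_t count() const { return bytes.size(); }

    // Remove n bytes from the front of the FIFO and copy them to dst.
    void read(uint8_t* dst, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = bytes.front();
            bytes.pop_front();
        }
    }
};

// Read every complete packet currently in the FIFO, keeping only the most
// recent one in 'packet'. Returns false if no complete packet was available.
// A trailing partial packet (one still being written) is left in place.
bool readLatestPacket(FakeFifo& fifo, std::size_t packetSize,
                      std::vector<uint8_t>& packet)
{
    assert(packetSize > 0);
    bool gotPacket = false;
    packet.resize(packetSize);
    while (fifo.count() >= packetSize) {
        fifo.read(packet.data(), packetSize);   // older packets get overwritten
        gotPacket = true;
    }
    return gotPacket;
}
```

Called a few times per second, this empties the FIFO on every pass, so a slow consumer still gets the freshest yaw value and the FIFO has no chance to fill up between calls.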

Homer also sent me a couple of untested alternatives for FIFO management to replace the ‘while() within an if()’ algorithm, so I decided to test them and report the results back to Homer. After all, Homer was going WAY out of his way to answer my ignorant questions, so the least I could do was to be his lab tech.  So, I started with Jeff Rowberg’s MPU6050_DMP6_using_DMP_V6.12 example (the one with the ‘while() within an if()’ snippet) and modified it to deliberately cause FIFO overflows. After a bit of debugging and some very slight changes to Homer’s code, I got the following output from a run using a 100 mSec delay at the start of loop():

As can be seen from the above, about 20 interrupt cycles are skipped in each loop() iteration, causing the 1024-byte FIFO contents to expand by either 280 or 252 bytes each time, until it overflowed and was reset.  The example code handled FIFO overflows properly, resetting the FIFO each time so that the retrieved yaw values continued to make sense.

The next step was to replace the example code with Homer’s proposed setup using the modulus operator for FIFO management. The first section below shows the example code loop() function before Homer’s modifications:

And the following shows the same loop() function after implementing Homer’s suggested code:

This resulted in the following output:

Showing that FIFO overflow was indeed handled properly.  The FIFO overflowed after the 3rd time through the loop, (the returned count was capped at the 1024-byte physical length of the FIFO), and Homer’s code correctly removed the 16 corrupted bytes at the beginning of the FIFO, plus one more 28-byte packet.  The subsequent mpu.getFIFOBytes() call retrieved an entire valid packet, which was then processed to produce a valid yaw value.  Of course, since only one extra packet was removed, the FIFO overflowed again when the next 336 bytes were loaded during the 100 mSec delay at the start of the next loop() iteration.
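The byte counts in that description fall straight out of the modulus arithmetic: 1024 % 28 = 16 leftover bytes, plus one 28-byte packet, is 44 bytes discarded, leaving 980 bytes or exactly 35 whole packets. Here is a small sketch of just that arithmetic. It is not Homer's code, and the names planAfterOverflow() and isPacketAligned() are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Result of the realignment arithmetic after an overflow.
struct RealignPlan {
    std::size_t discard;            // bytes to throw away
    std::size_t wholePacketsLeft;   // complete packets remaining afterwards
};

// After an overflow the FIFO is full (fifoCapacity bytes) and the leading
// fifoCapacity % packetSize bytes are a packet fragment. Following the
// description above, the fragment plus one further whole packet is dropped.
RealignPlan planAfterOverflow(std::size_t fifoCapacity, std::size_t packetSize)
{
    assert(packetSize > 0 && fifoCapacity >= packetSize);
    RealignPlan plan;
    plan.discard = fifoCapacity % packetSize + packetSize;
    plan.wholePacketsLeft = (fifoCapacity - plan.discard) / packetSize;
    return plan;
}

// True when the reported count is a whole number of packets.
bool isPacketAligned(std::size_t fifoCount, std::size_t packetSize)
{
    assert(packetSize > 0);
    return fifoCount % packetSize == 0;
}
```

Note that this only applies to the overflow case, where the fragment is at the front of the FIFO. A count that is not a multiple of the packet size without an overflow is a different situation, since any partial packet would then be at the back.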

When the code snippet to retrieve all available packets

was uncommented from the above program, I got an almost perfect output as shown below:

There were 4 bad values starting at 15250 mSec, as shown below

I’m not really sure what happened here.  A FIFO count of 308 is nowhere near the overflow condition, and it is an exact multiple of 28 (the packet size), so everything should have gone swimmingly.  However, the displayed value of 2.04 degrees at time 8859 mSec is clearly incorrect, as I was manually (and slowly) rotating the MPU at the time.

Another issue with all of this is the FIFO count associated with the number of interrupts shown in the output.  22 interrupts should produce 22*28 = 616 bytes, but mpu.getFIFOCount() only returns 308 – exactly half the expected value.  So, either the packet size is actually 14 bytes (not very likely, as mpu.dmpGetFIFOPacketSize() returns 28) or the IMU is only loading half a packet on each interrupt.  I added digitalWrite() statements to the ISR so I could directly monitor interrupt occurrences with my O’scope, and the interrupts happened exactly as expected, at approximately 4.58 mSec intervals (about 220 Hz).  The 100 mSec delay at the top of loop() should produce approximately 100/4.58 = ~22 interrupts each iteration, and that is what is reported in the output.  So, why the x2 error in the reported FIFO count?

I ran another test, and this one responded perfectly for as long as I let it run (about 22 seconds). During the run I manually rotated the robot (and its attached IMU) back and forth, as shown in the following Excel plot

14 October 2019 Update:

There is clearly something not-quite-right with the way the MPU6050 reports the current length of the FIFO contents.  Here are the results from a recent run:

Aside from the fact that 22 x 28 = 616 not the reported 308, there is also the problem that after one more interrupt (23 vs 22), the reported FIFO content length didn’t go up by 14 (half the expected 28, but…) but instead by 12 bytes – what the heck?  This clearly implies that MPU interrupts aren’t entirely synchronous with actually filling the FIFO – like some sort of race condition?  In other words, the number that is reported by mpu.getFIFOCount() isn’t always an integer multiple of the packet size!  This is contrary to Homer’s assumption that the only way for mpu.getFIFOCount() to retrieve a non-integral multiple was for the FIFO to overflow. This clearly is not happening here, but I’m still getting non-integer multiple results.  Here’s another snippet from the same run:

In the above snippet, it can be seen that a 22 interrupt interval can sometimes result in 316 bytes being reported rather than the expected 308 (ignoring for the moment the issue of a 2x error), and the ‘success’ of removing 36 bytes still resulted in a ‘bad’ yaw value computation (-179.17).  In the very next loop iteration, 22 interrupts resulted in 328 bytes being reported. This time removing the excess allowed a valid yaw computation (-30.46).  So, a 22 interrupt loop interval can result in 308 (the ‘normal’ result), 316, or 328.

15 October 2019 Update:

I changed the loop() delay time from 100 mSec to 200 mSec, and (with no other changes) re-ran the test, with the following output:

The above results showed the expected 42-43 interrupt count between loop() iterations, and the expected (ignoring for the moment the 2x error factor) FIFO contents byte count of 588-616.  However, there were a couple of anomalous occurrences on two consecutive loop() iterations at 19094 and 19312 mSec.  The first one reported a FIFO contents count of 600 instead of 616 and (even after error correction) a clearly erroneous yaw value of -179.20, and the second one reported 604 bytes and (after error correction) an apparently valid yaw value of 61.54.

After an email exchange with Homer, I tried replacing this line

with this one:

in the following section of Homer’s algorithm:

After this change, I re-ran the test at the 100 mSec loop() iteration delay setting with and without the above code change.  In both cases, I still got trapped errors, as shown below (the test conditions for each run are noted at the top of the text file):

So then, also at Homer’s suggestion, I instrumented the code to get the FIFO count rapidly several times after an error detection to see if the first mpu.getFIFOCount() occurred while data was actually being loaded into the FIFO and therefore got an erroneous count. So, I changed Homer’s code correction section to the following:

and re-ran the 100 mSec loop() iteration delay test.  What I got was this:

Wow!  The FIFO count changed!  The first mpu.getFIFOCount() at the top of the detection section got 334, and the next 3 calls all got 336!  So the first mpu.getFIFOCount() call must have hit the MPU 26/28 of the way through the packet load!

So, the MPU6050 packet load scheme isn’t atomic and there is, in fact, some sort of a race condition.  I think we have all been assuming that the MPU6050 loads the FIFO with a complete packet and then triggers an interrupt, while it appears that it is actually happening the other way around; the interrupt is triggered and then the packet is loaded into the FIFO. Most of the time this doesn’t cause a problem, but if you ‘get lucky’, the register associated with the mpu.getFIFOCount() call is read before the entire packet is loaded.

16 October 2019 Update:

Homer asked me to change the code to determine exactly how long it takes to “clear the error”, which I take to mean “how long would a program have to wait for the MPU6050 to finish loading the rest of the packet into the FIFO”.  Homer sent me some sample code, which I modified slightly to produce the output Homer was looking for, as shown below:

When I ran this code, I got the following output:

 

21 October 2019 Update:

After several more email exchanges with Homer, he believes that he has now come up with a pretty ‘bullet-proof’ way of handling MPU6050 errors, with the following subroutine:

The idea behind this subroutine is to ensure that any overflow condition is detected and managed properly. The routine is completely independent of interrupts, so it can be used in a program using interrupts or polling.

Homer also sent me some test results using the program, with a variable loop delay time designed to be just below and then just above the delay required to overflow the buffer. This demonstrated that his subroutine properly handles FIFO overflow conditions, and returns valid packet data whenever possible.

In the above output, the first column is the loop delay in mSec, then the ‘Flag’ value returned by the subroutine, and then the ypr (yaw,pitch, roll) values extracted from the buffer filled by the subroutine.  As is shown, loop delay values above about 177 mSec start returning ‘2’ Flag values, indicating the routine detected (and recovered from) an overflow condition.

I replicated this experiment on my end, but discovered that for my installation, I had to use a loop delay almost exactly twice the value used by Homer (360-370 vs 177-178 mSec). The implication is that either my MPU6050 is loading the FIFO at one-half the rate of Homer’s unit, or my IMU has a buffer size twice the one being used by Homer.  Curiouser and curiouser ;-).

Here’s my code and results:

Summary of results to date:

 

  • Homer has clearly created an effective algorithm for detecting and recovering from FIFO overflow events, and the subroutine that implements his algorithm can be used in an interrupt-driven or polling configuration.  I personally like the polling arrangement because it requires one less connecting wire, and removes the need for an ISR.
  • Both Homer and I have demonstrated that the algorithm works as designed, but the loop delay required to just trigger FIFO overflows in my configuration is almost exactly twice the delay needed for Homer’s. This needs to be explained.
  • There is still the problem of a factor of 2 error between the expected return from mpu.getFIFOCount() and the number calculated by multiplying the number of interrupts times the expected packet length. In my configuration using an interrupt-driven arrangement, a 22-interrupt loop delay resulted in a FIFO count of 308.  But, 22 x the expected packet size of 28 yields 616, not 308!  This also needs to be explained.

23 October 2019 Update:

To further investigate the ‘factor of 2 error’ problem, I went back and re-ran the initial experiment that produced the problem, just to verify that it was still there.  Here’s the entire program to recreate the results, and a partial printout of the results:

Significant points to note from the above output:

  • The time between adjacent output lines is about 111 mSec on average (110.5762 mSec according to Excel), and the reported number of interrupts is either 20 or 21 in almost every case (average is 20.0833333 according to Excel).  O’Scope observations confirm this is the case, as the output from the interrupt monitoring pin shows almost exactly 5 mSec between interrupts and an almost exactly 200 Hz interrupt frequency.  So, an interrupt count of 20 or 21 is reasonable, and cannot be the reason for the ‘factor of 2’ error in the FIFO buffer count.  However, there is an apparent ‘off by 2’ problem with the interrupt count, as the reported FIFO counts are consistent with interrupt counts of 22 & 23 rather than 20 or 21 as shown.
  • Every so often a 22 interrupt span produces a FIFO count of 336 instead of 308 – a 28 byte difference.  In fact, over a run time of about 6.5 minutes (388,671 mSec), this phenomenon occurred 264 times, or in roughly 7.5% of the approximately 3,500 loop() iterations.

The inference I draw from the above two points is that the MPU6050 chip isn’t actually loading an entire 28-byte packet during each interrupt cycle, but is in fact loading only 14 bytes each time.  With an artificially imposed 100 mSec (about 22 interrupt cycles) loop delay, the MPU loads 22 * 14 = 308 bytes.

An alternative explanation is that the MPU6050 loads complete packets into the FIFO every other interrupt. Under this scenario, it takes 11 cycles (22 interrupts) to load 11*28 = 308 bytes, and 12 cycles (24 interrupts) to load 12 * 28 = 336 bytes.

Another (and maybe even more reasonable) possibility is that the MPU6050 loads packet data into the FIFO one byte at a time, at an average rate of 14 bytes per 5 mSec, or 2.8 kB/sec. To the outside world, this might not be noticeable unless the application was trying to retrieve packets at the full 200 Hz.  At rates of 100 Hz or less, the MPU would still have loaded at least one complete packet every time one was requested.

29 October 2019 Update:

As someone once said as both a benediction and a curse – “May you live in interesting times”.  In our ongoing investigation into the depths of MPU6050 behavior, we have now solved one mystery, only to encounter another one.

As I have noted several times above, there appears to be a factor of 2 mismatch between the number of interrupts counted from ISR activations and the number of bytes reported by mpu.getFIFOCount().  Either the interrupt count is off, or the FIFO count is off, and there doesn’t seem to be any other explanation.  Well, now I have determined that there is apparently a third option – that the MPU6050 only loads a packet into the FIFO on every other interrupt! This doesn’t make a whole lot of sense to me, but I now have what I believe to be irrefutable proof that this is exactly what is happening.

I set up a very simple program to get the FIFO byte count as rapidly as possible.  In order to avoid slowing the system down, I stored the results in a 1000-entry array during the retrieval process, and then printed out all 1000 entries at the end. Then I plotted the results in Excel as shown in the following figure:

The stairsteps in the above plot are (almost) uniformly the expected packet size of 28 bytes;  the MPU6050 is clearly loading entire 28-byte packets into the FIFO each time, contrary to one of the possibilities I had considered to explain the factor of 2 inconsistency between the interrupt count and the FIFO count.

However, when I started looking at the time interval between FIFO loads, I got the following plot:

As the above plot clearly shows, the MPU6050 loads a new packet into the FIFO every 10 mSec or so (I think it is exactly 10 mSec, with the differences explained by the lack of resolution in the time axis).  But wait – the MPU6050 produces an interrupt on the Arduino interrupt pin every 5 mSec – not every 10 mSec!  So, the mystery of the ‘factor of 2’ error is now solved.  The MPU6050 loads a new packet into the FIFO every other interrupt – not every interrupt as expected. So, the interrupt count is too high by a factor of 2 when compared to the actual FIFO count.
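A quick way to sanity-check the ‘every other interrupt’ picture is to redo the byte arithmetic with one packet per two interrupts. The sketch below is plain C++ with made-up helper names (`expectedFifoBytes`, `msToFillFifo`); it assumes a 28-byte packet, a 1024-byte FIFO, and a perfectly regular load period, none of which the real chip guarantees.

```cpp
#include <cstdint>

// Expected FIFO byte count if the MPU loads one complete packet every
// `interruptsPerPacket` interrupts (integer division drops a partial cycle).
uint32_t expectedFifoBytes(uint32_t interrupts,
                           uint32_t interruptsPerPacket,
                           uint32_t packetSize) {
    if (interruptsPerPacket == 0) return 0;
    return (interrupts / interruptsPerPacket) * packetSize;
}

// Approximate time (in mSec) for an initially empty FIFO to fill up if one
// packet of `packetSize` bytes arrives every `loadPeriodMs` milliseconds.
double msToFillFifo(uint32_t fifoSize, uint32_t packetSize, double loadPeriodMs) {
    if (packetSize == 0) return 0.0;
    return (static_cast<double>(fifoSize) / packetSize) * loadPeriodMs;
}
```

With these assumptions, `expectedFifoBytes(22, 2, 28)` gives the 308 bytes seen earlier and `expectedFifoBytes(24, 2, 28)` gives 336. Likewise, `msToFillFifo(1024, 28, 10.0)` comes out at roughly 366 mSec, which is in the same range as the 360-370 mSec loop delay I needed to trigger overflows; whether that fully explains the difference from Homer's numbers is still an open question.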

Unfortunately (or fortunately depending on one’s point of view), that simply raises the question – why is the MPU6050 producing interrupts on a 5 mSec schedule when it only changes the FIFO count on a 10 mSec schedule?  Who knows?

Changing the subject slightly, I got another ‘overflow proof’ version of GetCurrentFIFOPacket() from Homer Creutz to try.  I set up a small program to test it.   Since the idea was that GetCurrentFIFOPacket() would return a valid packet no matter how long it had been since the last time it was called, I set up a program to iterate through a sequence of delay times from 100 mSec to 500 mSec. For each loop delay setting I called GetCurrentFIFOPacket() multiple times and printed out the extracted yaw value and the return status from the function.

As the plot below shows, everything works swimmingly up to a loop delay of 350 mSec. Unfortunately, the wheels came off with a 400 mSec loop delay, and the packet values were all invalid after that – bummer!

 

06 November 2019 Update:

Holy cow!  Homer and I started this marathon project back in mid October, and we don’t seem to be any closer to nirvana than we were before.  What I can say though, is that I have learned a lot more about the MPU6050 and Jeff Rowberg’s driver software ;-).

One of the major issues we encountered with the polling method (vs interrupt-driven) is that, without the synchronization with MPU6050’s internal processes provided by the interrupt model, we can’t count on (no pun intended) the FIFO count returned by mpu.getFIFOCount() being accurate.  Depending on the timing of the call, the return value could be wildly inaccurate.  However, we discovered that when two back-to-back calls to mpu.getFIFOCount() return the same value, that value is an accurate count; the first two calls almost always agree, although there is still a very small probability that a 3rd call will be required.  So, I created a small routine called ‘getPollingFIFOCount()’ to wrap this construct, as follows:

This function simply loops until two adjacent calls to mpu.getFIFOCount() return the same value. As always, however, there is a backup counter to force the function to exit if it gets hung up for any reason.  In the calling program I have a line like:

to set the backup loop counter value.
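For readers without the hardware, the ‘read until two adjacent counts agree’ idea can be tried out on the desktop. The following is a minimal sketch, not the routine from my sketch files: the name `stableFifoCount` and its parameters are invented for this example, and the count source is passed in as a callable so that a scripted sequence can stand in for mpu.getFIFOCount().

```cpp
#include <cstdint>

// Keep reading the FIFO count until two adjacent reads agree, or until the
// backup limit `maxTries` is used up. `readCount` is any callable returning
// the current count (a stand-in for mpu.getFIFOCount() in this sketch).
template <typename ReadCount>
uint16_t stableFifoCount(ReadCount readCount, int maxTries) {
    uint16_t previous = readCount();
    for (int attempt = 0; attempt < maxTries; ++attempt) {
        uint16_t current = readCount();
        if (current == previous) {
            return current;          // two adjacent reads agree
        }
        previous = current;          // count changed mid-load; try again
    }
    return previous;                 // backup exit: return the latest read
}
```

Feeding it the sequence seen in the earlier race-condition test (334, then 336, 336, ...) returns 336 after the third read. On the robot, the callable would simply wrap mpu.getFIFOCount().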

Another major issue with the MPU6050 is that it can overflow its packet FIFO buffer, and there doesn’t appear to be any way to prevent this, other than removing packets from the FIFO at least as fast as they are loaded.  This may not be a problem for ‘high dynamics’ applications like quadcopters or balancing robots that need continuous attitude information, but for ‘slow dynamics’ applications like my wall-following robot where yaw information is only needed a few times per second, overflow becomes a practical certainty.  In an interrupt-driven environment, it might be reasonable to simply retrieve and then discard DMP packets as fast as they arrive, and then only process the latest packet when the application needs an update.  However, for a polling strategy, doing this may or may not work depending on what else is going on. So, for polling we need a way of ensuring we can retrieve a valid packet from the DMP FIFO, whether or not the FIFO has overflowed. Doing so would be trivial if the FIFO length was an integral multiple of packetSize, but it isn’t – yuk!!  So, now when the FIFO overflows, the first packet in the FIFO is guaranteed to be corrupted. The good news is, the last complete packet (i.e. the most recent information) in the FIFO is always valid, but getting to that last good packet is non-trivial.  To summarize:

  • FIFO overflow is almost certain to happen in a ‘low dynamics’ polling-based program
  • When FIFO overflow occurs, the first packet is always corrupted, but the last one is still valid
  • The last (valid) packet isn’t the last [packetSize] bytes, due to the non-modular size of the FIFO
  • The MPU6050 DMP may start loading another packet at any time, but there will always be 10 mSec or so between packet loads
  • Any last-valid-packet retrieval algorithm must work for any loop delay time.

So, the ‘last-valid-packet-retrieval’ algorithm is something like this:

  1. Get the current FIFO byte count, using the ‘getPollingFIFOCount()’ routine above
  2. Extract [packetSize] chunks until there are fewer than 2* [packetSize] bytes left. This ensures there is exactly one complete packet remaining in the FIFO.
  3. Extract one more [packetSize] chunk and return it as the result.

Note that implementation of this algorithm will require several ‘while’ loops, so there must also be provision for forcibly terminating all such loops in case some edge condition causes the normal exit condition to never be reached.  Homer and I have been creating and testing versions of this function for the last couple of weeks, without quite getting there yet.  Either they don’t handle all the edge conditions, or they are too slow as the loop delays get larger.
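To make the three steps and the ‘forcible termination’ requirement concrete, here is a small desktop simulation. It is only a sketch under stated assumptions: the FIFO is modeled as a `std::deque`, the names `SimFifo`, `popBytes` and `readLastPacket` are hypothetical, and the contents are assumed to be packet-aligned already (any stray bytes left at the front by an overflow have been discarded beforehand).

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Stand-in for the device FIFO: bytes are read from the front.
using SimFifo = std::deque<uint8_t>;

// Remove up to `n` bytes from the front of the simulated FIFO.
std::vector<uint8_t> popBytes(SimFifo& fifo, size_t n) {
    std::vector<uint8_t> out;
    for (size_t i = 0; i < n && !fifo.empty(); ++i) {
        out.push_back(fifo.front());
        fifo.pop_front();
    }
    return out;
}

// Discard whole packets until fewer than 2 * packetSize bytes remain, then
// return the next packet. `maxLoops` is the backup counter that forces an
// exit if the normal loop condition is never reached.
bool readLastPacket(SimFifo& fifo, size_t packetSize, int maxLoops,
                    std::vector<uint8_t>& packet) {
    if (packetSize == 0 || fifo.size() < packetSize) return false;
    int loops = 0;
    while (fifo.size() >= 2 * packetSize) {
        if (++loops > maxLoops) return false;   // backup exit
        popBytes(fifo, packetSize);             // throw away an older packet
    }
    packet = popBytes(fifo, packetSize);        // the most recent full packet
    return packet.size() == packetSize;
}
```

The ‘fewer than 2 x packetSize’ test (rather than ‘exactly packetSize’) matters because the MPU may already have started loading the next packet; in that case a few trailing bytes are simply left in the FIFO for the next call.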

While I was testing my most recent concoction, I ran into a third major problem with the MPU6050 API.  I wanted to be able to remove up to 35 [packetSize] chunks (980 bytes) in one go, and I expected the mpu.getFIFOBytes() API call to manage the required chunking for me.  When I tried this trick, the getFIFOBytes() function crashed repeatedly.  Eventually I figured out that the reason it was crashing was that its ‘length’ parameter is declared as a ‘uint8_t’ object and of course it couldn’t handle a value greater than 255 without choking.  I thought that was a little odd, but that maybe changing the declaration from ‘uint8_t’ to ‘uint16_t’ in MPU6050.h/cpp would solve the problem.  Nope – it turns out that, due to the way that the Arduino I2C bus operates, there is a limitation on how many bytes can be transferred across the bus in one operation, and this limit is currently set at 32 bytes.  As a result of this fundamental limitation, all the I2CDev functions that use the I2C bus also have the same underlying limitation and all of them have their length parameters declared as ‘uint8_t’. This reminds me of the old pre-scientific myth about the earth resting on the back of a turtle.  When the myth was challenged, the defender says “it’s no use – it’s turtles all the way down”.  In our case “it’s no use – it’s uint8_t all the way down!”.

To get around this problem, I decided to create yet another wrapper function, this one cleverly called ‘getManyFIFOBytes()’. The idea for this would be to pull one [packetSize] chunk off the FIFO at a time using the normal ‘mpu.getFIFOBytes()’ call and place the result in a destination buffer large enough to hold the entire result.  Since it has been a (long, long) while since I last played with pointer gymnastics, I decided to write a short test program to figure out a reasonable technique.  Here is the program, and some results:

As the output shows, the ‘getManyFIFOBytes(uint16_t* buffer, uint16_t len)’ function can take an arbitrary ‘uint16_t’ length parameter and fill the destination buffer with [packetSize] (28 bytes in this case) chunks, followed by the non-modular remainder.  Although this test was done with a simulated receive buffer and simulated packet contents, I believe it will work using the actual contents of the MPU6050 FIFO and repeated calls to ‘mpu.getFIFOBytes()’ to retrieve the ‘chunks’ and any non-modular remainder bytes.
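The chunking idea itself is independent of the MPU6050, so it can be shown with a stand-in reader. The sketch below is not the ‘getManyFIFOBytes()’ function from my test program; `readInChunks` and its parameters are hypothetical names, and the byte source is any callable taking a destination pointer and a ‘uint8_t’ length (the same shape as mpu.getFIFOBytes()).

```cpp
#include <cstdint>

// Fill `dest` with `len` bytes by calling `readBytes(ptr, n)` repeatedly,
// never asking for more than `chunkSize` bytes in one call. The final call
// picks up whatever remainder is left over.
template <typename ReadFn>
void readInChunks(ReadFn readBytes, uint8_t* dest, uint16_t len, uint8_t chunkSize) {
    if (chunkSize == 0) return;              // nothing sensible to do
    uint16_t done = 0;
    while (done < len) {
        uint16_t remaining = static_cast<uint16_t>(len - done);
        uint8_t n = (remaining < chunkSize) ? static_cast<uint8_t>(remaining)
                                            : chunkSize;
        readBytes(dest + done, n);           // pointer arithmetic picks the slot
        done = static_cast<uint16_t>(done + n);
    }
}
```

Asking for 70 bytes with a 28-byte chunk size results in three calls of 28, 28 and 14 bytes. Keeping the chunk size at or below [packetSize] also keeps each transfer under the 32-byte I2C limit described above.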

Having convinced myself that my two helper functions actually did what I wanted, I revised my latest MPU6050 test program (MPU6050_Overflow8.ino) to test the whole thing out. The test program steps through a series of loop delays from 50 to 550 mSec, and takes 25 yaw measurements at each loop delay setting.  The Excel plot and a snippet of the output log from the program are shown below:

  • There were no invalid packets in the entire run, so the attempt to avoid invalid packet retrieval was a success.
  • The actual loop delay per measurement pass varied quite a bit from the desired loop delay setting.  For instance the average measured loop delay for the 25 yaw measurement passes at the 200 mSec loop delay setting was actually 274.08 mSec, about 37% higher than desired.  At the 150 mSec loop delay setting, the average loop delay was 223.88 mSec, and at the 100 mSec setting it was 133.24 mSec.  So, if the application needs 5 measurements/sec, the loop delay setting needs to be between 100 and 150 mSec.

12 November 2019 Update:

Based on some comments and data from Homer’s experiments, it appears that a FIFO reset can be done in less than 10 mSec.  This means that a packet retrieval algorithm based on a mpu.resetFIFO() call will miss at most one 10 mSec FIFO load interval, which is insignificant compared to the typical polling interval (200 mSec in my case).  So I decided to try a ‘brute force’ approach to ‘GetCurrentFIFOPacket()’ as follows:

When I first tested this algorithm, I ran into an occasional glitch where the FIFO would somehow fail to reset, resulting in a FIFO count > 28. So I added an outer loop to allow multiple shots at getting a clean FIFO reset.
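The structure of this reset-and-retry approach can be sketched against a fake device. This is an illustration only, not the code from my sketch: `readFreshPacket` and `FakeMpu` are invented for the example. The three member names it calls (resetFIFO(), getFIFOCount() and getFIFOBytes()) match the MPU6050 class, but the timing behavior of the fake is purely an assumption made so the code can run on a desktop.

```cpp
#include <cstdint>

// Reset the FIFO, wait for exactly one packet to arrive, and read it.
// Returns the number of outer-loop passes used (1 = first reset worked),
// or 0 if no clean packet was obtained within `maxResets` attempts.
template <typename Device>
int readFreshPacket(Device& dev, uint8_t* packet, uint8_t packetSize,
                    int maxResets, int maxPolls) {
    for (int attempt = 0; attempt < maxResets; ++attempt) {
        dev.resetFIFO();
        uint16_t count = dev.getFIFOCount();
        // Inner loop: poll until a full packet shows up (bounded by maxPolls).
        for (int poll = 0; poll < maxPolls && count < packetSize; ++poll) {
            count = dev.getFIFOCount();
        }
        if (count == packetSize) {
            dev.getFIFOBytes(packet, packetSize);
            return attempt + 1;
        }
        // count > packetSize means the reset did not take; go around again.
    }
    return 0;
}

// Fake device for desktop testing (assumption: a packet "arrives" after
// every `pollsPerPacket` calls to getFIFOCount()).
struct FakeMpu {
    uint16_t count = 1024;      // start out overflowed
    int failedResets = 0;       // how many resets will be silently ignored
    int pollsPerPacket = 3;
    int polls = 0;
    void resetFIFO() {
        if (failedResets > 0) { --failedResets; return; }
        count = 0;
        polls = 0;
    }
    uint16_t getFIFOCount() {
        if (++polls >= pollsPerPacket) {
            polls = 0;
            if (count + 28 <= 1024) count += 28;
        }
        return count;
    }
    void getFIFOBytes(uint8_t* data, uint8_t length) {
        for (uint8_t i = 0; i < length; ++i) data[i] = i;
        count -= length;
    }
};
```

The return value plays the same role as the ‘outer loop’ count: it is 1 when the first reset works and 2 or more when a reset had to be repeated.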

Shown below are some Excel plots from some long runs

 

The first plot above shows a long run (over 17 minutes) of valid yaw data (the perturbations in the yaw plot are due to occasional manual rotations of the sensor to verify that the sensor was still responding).  The interesting thing about this plot is the yellow curve, showing the ‘outer loop’ count. The only way this value can be greater than 1 is if the first mpu.resetFIFO() call fails to actually clear the FIFO, which appears to happen an average of about once per minute, or about once every 6,000 10 mSec MPU6050 FIFO cycles.

The second plot is a closeup of the first 171.42 seconds of the overall plot, showing the detail of the FIFO clear failures, occurring about once per minute.

So, it is clear that the MPU6050 has some significant behavioral quirks, especially when used in a non-interrupt-driven environment.  That being said, I believe the ‘brute force’ algorithm shown above is a reliable way of interfacing with the MPU6050 in a polling environment, and obviates the need for a separate interrupt line, and the associated ISR software.

This will probably be the last update on this subject, as I now think Homer and I have pretty much beaten the MPU6050 FIFO issue to death.

Stay tuned!

Frank