Science and Bicycles: Designing Experiments
Designing scientific experiments requires a lot of thought if you want to arrive at useful results. Most studies of bicycles fall into two categories, which I’ll call observation-based and measurement-based.
Observation-based studies start with an observation. In our case, we ride a lot of bikes. Then it goes like this:
- We begin to identify trends. For example, the bikes that perform best for us have flexible tubing.
- We form a hypothesis: Flexible tubing makes the bike “plane” and thus perform better for us.
- We design a test for the hypothesis. We have three bikes made, two that are flexible, one that is slightly stiffer. We ride them in a double-blind test. If we can tell the stiffer bike reliably from the others based on its inferior performance alone, then we have proven the hypothesis. Why three bikes and not two? Because we must make sure that there isn’t another factor that influences the results – frame alignment or bearing tightness. If the two flexible frames feel the same, and the stiffer one is different, it’s unlikely that these other factors are the cause.
- Repeat measurements: We repeat the test until we are certain that the riders did not identify the bikes by pure chance.
- Now we have proven that the flexible bikes perform better and thus “planing” exists. Two riders can experience it, so it’s a real phenomenon. (In fact, a single rider would suffice to prove the existence of a phenomenon.) The fact that one rider could not tell the differences between the slightly different frames indicates that different riders have different thresholds for the differences in frame stiffness they can detect.
- The next question is how many riders prefer a more flexible frame, how many a stiffer frame, and for how many does it not matter at all? To determine this, we would repeat the double-blind tests with more riders. Unfortunately, such a study would require hundreds of participants, which is beyond our budget.
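How many repeats are enough to rule out pure chance? Here is a minimal sketch of that calculation. It assumes that a guessing rider picks the stiffer bike one time in three (since there are three bikes), and the numbers of trials and correct answers are made up for illustration, not taken from our tests.

```python
from math import comb

def chance_probability(trials: int, correct: int, p_guess: float) -> float:
    """Probability of getting at least `correct` right answers
    in `trials` attempts by guessing alone (binomial tail)."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical example: 10 blind rides, stiffer bike identified 8 times.
# Assumption: with three bikes, a pure guess is right 1 time in 3.
p = chance_probability(trials=10, correct=8, p_guess=1 / 3)
print(f"Probability of this result by pure chance: {p:.4f}")
```

In this made-up example, the probability works out to roughly 0.3%, so chance would be a very unlikely explanation. With fewer rides or fewer correct answers, it rises quickly, which is why the repeats matter.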
Measurement-based studies start with a measurement instead, for example, of tire performance. A useful measurement has to meet three requirements:
- Model validation: We need a measurement that replicates real-world conditions. We could easily weigh each tire, and rank them by weight. However, we would have to prove that weight is the determining factor of tire performance on the road. We’d do that by riding the tires on the road and measuring their speed. Does speed correlate with weight? If yes, then we can use weight as a test for speed. Unfortunately, weight and speed are not always related, so we need to find a better test.
- Repeatability: We need to make sure that we are measuring tire resistance, and not something else. We do this by running repeat experiments. If our test is good, then the same tires will always produce similar results.
- Accuracy: The repeat experiments tell us how accurate our measurements are. If the measurements for the same tires always fall within a range of 2%, then we know our measurements have a “margin of error” of about +/- 1%.
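The arithmetic behind the accuracy step can be sketched in a few lines. This is only an illustration: the function names and the sample values are hypothetical, and it takes the margin of error as half the full spread of the repeat runs, as described above.

```python
from typing import Sequence

def spread_percent(measurements: Sequence[float]) -> float:
    """Full spread (max minus min) as a percentage of the mean."""
    mean = sum(measurements) / len(measurements)
    return (max(measurements) - min(measurements)) / mean * 100

def margin_of_error_percent(measurements: Sequence[float]) -> float:
    """Half the full spread: the '+/-' figure for a single measurement."""
    return spread_percent(measurements) / 2

# Hypothetical repeat runs of the same tire (units do not matter here).
runs = [99.0, 100.0, 101.0]
print(f"Spread: {spread_percent(runs):.1f}%")
print(f"Margin of error: +/- {margin_of_error_percent(runs):.1f}%")
```

Differences between two tires that are smaller than this margin cannot be told apart from measurement scatter.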
It’s interesting to compare two common methods of testing tires:
Drum tests use a large steel drum. The tire is pushed onto the drum with a force replicating the weight of the rider and bike. By measuring the additional power required to spin the drum, you can measure the resistance of the tire on the drum.
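As a rough sketch of how the power measurement turns into a resistance figure: power is force times speed, so the extra power divided by the surface speed of the drum gives the resisting force. The function names and all numbers below are assumptions for illustration only, and the sketch ignores any corrections a real test rig might apply.

```python
G = 9.81  # gravitational acceleration in m/s^2

def resistance_force_n(power_loaded_w: float, power_unloaded_w: float,
                       surface_speed_m_s: float) -> float:
    """Resisting force in newtons from the additional power needed
    to spin the drum with the tire pressed against it."""
    return (power_loaded_w - power_unloaded_w) / surface_speed_m_s

def resistance_coefficient(force_n: float, load_kg: float) -> float:
    """Resisting force divided by the force pressing the tire onto the drum."""
    return force_n / (load_kg * G)

# Hypothetical values: 120 W to spin the bare drum, 150 W with the tire
# loaded at 45 kg, drum surface moving at 8 m/s.
force = resistance_force_n(150.0, 120.0, 8.0)
print(f"Resisting force: {force:.2f} N")
print(f"Coefficient: {resistance_coefficient(force, 45.0):.4f}")
```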
Drum tests are performed under carefully controlled conditions, so their repeatability is excellent. However, the underlying model has not been validated under real-world conditions. If we were to assign grades, we’d say:
Model Validation: F
Roll-down tests use a bike and rider on a short hill. The rider coasts down the hill. Measuring how quickly the bike slows down on the flat “rollout” section allows you to compare the resistance of different tires.
Roll-down tests on actual pavement, with a rider on board, occur in real riding conditions, so validation is not a problem. However, many other factors can influence the results: wind, rider position, temperature. We have to show that we were able to keep these other factors constant. We do this by running repeat measurements. If the same tires always score the same, but different tires score differently, then we know that we are measuring tires and not the speed of crosswinds. Even so, we’ll probably never match the accuracy of a lab experiment. Thus, the grades are:
Model Validation: A
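The repeat-run check for roll-down tests can be sketched as follows: the scatter among runs of the same tire has to be clearly smaller than the gaps between different tires. The data and the threshold factor of 2 are assumptions chosen for illustration, not values from our testing.

```python
from statistics import mean, stdev

def separates_tires(runs_by_tire: dict, factor: float = 2.0) -> bool:
    """True if the smallest gap between tire averages exceeds `factor`
    times the largest run-to-run scatter of any single tire.
    Needs at least two tires and at least two runs per tire."""
    noise = max(stdev(runs) for runs in runs_by_tire.values())
    means = sorted(mean(runs) for runs in runs_by_tire.values())
    smallest_gap = min(b - a for a, b in zip(means, means[1:]))
    return smallest_gap > factor * noise

# Hypothetical rollout times in seconds for two tires.
calm_day = {"tire A": [25.1, 25.3, 25.2], "tire B": [26.4, 26.6, 26.5]}
windy_day = {"tire A": [25.0, 27.0, 26.0], "tire B": [26.0, 25.5, 27.5]}

print(separates_tires(calm_day))   # scatter is small compared to the gap
print(separates_tires(windy_day))  # scatter swamps the gap
```

When the check fails, the runs are telling us more about wind or rider position than about the tires, and the test conditions need to be tightened before the results mean anything.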
Without real-road validation, drum tests are nearly useless. Roll-down tests can provide the validation that drum tests need. If the same tires perform well on the drum and on the road, then the drum tests could be used to obtain greater accuracy in the measurements. Unfortunately, the real-road tests show that drum tests overlook a crucial component: suspension losses that occur in the rider’s body. So we use roll-down tests instead. They may require very careful testing and multiple runs, but at least they provide useful data.
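One simple way to check whether a lab test agrees with the road is to compare how the two tests rank the same tires. Below is a minimal sketch using Spearman's rank correlation. It assumes no tied values, and the function names are ours. A result near 1 means both tests put the tires in the same order; a result near 0 means the lab ranking says little about the road. The same check works for the weight-versus-speed question above.

```python
from typing import List, Sequence

def ranks(values: Sequence[float]) -> List[int]:
    """Rank of each value, 1 = smallest. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation between two equally long series
    of results for the same tires (no ties, at least two tires)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

# Hypothetical results for four tires, listed in the same order.
drum = [1.0, 2.0, 3.0, 4.0]
road_same_order = [10.0, 20.0, 30.0, 40.0]
road_reversed = [40.0, 30.0, 20.0, 10.0]

print(spearman(drum, road_same_order))
print(spearman(drum, road_reversed))
```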
When looking at studies of bicycle performance, I am often surprised that many don’t take the extra step to make their results truly useful:
- All tests need repeat measurements. An insider told me once that at the Texas A&M wind tunnel, which was used for much bicycle research, changes in air temperature result in very poor repeatability. The time of day has almost as much influence on your results as the actual aerodynamic performance.
- Other studies test bicycles without riders. To make those results useful, the model first must be validated by proving that the rider has no influence on the results.
- Rims and other components are designed to improve laminar airflow, yet there is a lot of evidence that the vibrations of real-road riding make it impossible to achieve laminar flow. Again, the model must be validated, for example, by putting a vibrating wheel in the wind tunnel.
That makes it all the more exciting when we see tests that are done well (and there are plenty of them). After all, most of us do not have superfluous time and money to spend on “improved” bicycles and components based on testing that may be well-intentioned, but is too flawed to produce reliable results.