Cam Design

The shape of the cam is determined by the sample of audio the user wants to translate into xylophone notes. For this project, "We Will Rock You" by Queen was chosen because its well-known, easily recognizable beat can be reduced to two notes without making the tune unrecognizable.


Audio Processing

Although most people know the stomp-stomp-clap beat that this song is famous for, the vocals and instrumentals make it hard for a computer to pick out the most distinguishable features of the audio directly. Figure 1 shows the entire audio clip before any processing occurs. The waveform tells a human viewer very little, but to a listener the beat is unmistakable. That clarity is what the software needs to emulate. Before that can happen, a short section of the song must be selected to keep the cam to a manageable size. The chosen section, shown in Figure 2, is the 6 seconds containing the clearest and most recognizable beat of the song.

From here, audio analysis can begin. The process of converting the audio input into a cam shape is complicated, but it can be approximated as three main steps: sound filtering, peak detection, and angular positioning.


Figure 1: Full song audio clip

Figure 2: Desired section audio clip



Sound Filtering

The purpose of sound filtering is to emulate the way different regions of the cochlea in the inner ear respond to different frequency ranges. To do this, the audio clip is divided into frequency bands based on the maximum frequency of the song. Tuning this parameter improves the clarity of the peaks in each band.
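As a rough illustration of dividing a clip's frequency range into bands, the sketch below computes band edges from a maximum frequency. The octave spacing, the 4096 Hz maximum, the six bands, and the function name band_edges are all assumptions made for this example; the report does not state the values or spacing actually used.

```python
def band_edges(max_freq_hz, n_bands):
    """Return (low, high) edges in Hz for n_bands octave-spaced bands.

    Hypothetical helper: octave spacing is an assumption, not the
    project's documented band layout.
    """
    if n_bands < 1 or max_freq_hz <= 0:
        raise ValueError("need at least one band and a positive max frequency")
    edges = [0.0]
    # Each band below the top one is half as wide as the band above it;
    # the lowest band starts at 0 Hz.
    for k in range(n_bands - 1, 0, -1):
        edges.append(max_freq_hz / 2 ** k)
    edges.append(float(max_freq_hz))
    return list(zip(edges[:-1], edges[1:]))


# Assumed example values: 6 bands up to 4096 Hz.
bands = band_edges(4096, 6)
for number, (low, high) in enumerate(bands, start=1):
    print(f"band {number}: {low:.0f} Hz to {high:.0f} Hz")
```

Raising or lowering the maximum frequency shifts every band edge at once, which is one way a single tuning parameter can change how cleanly the peaks separate across bands.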

Because humans perceive the tune and beat of a song from multiple instruments, the band division needs to be coarse enough that the highest-energy notes stay in the same band, but fine enough that the bands remain clearly separated. Figure 3 shows that most of the stomps from the stomp-stomp-clap fall in data bands 1 and 2, while the clap remains almost exclusively in data band 2. This is especially visible after the beat of near silence, because the stomp-stomp-clap becomes the only sound in the song and its division across bands is most prominent. The notes, and therefore the energies, of the vocals change rapidly and continuously, so determining their main band is difficult, if not impossible.

Figure 3: Separation into specific audio channels

Because the stomp-stomp-clap is the most prominent and well-known portion of the song, data band 2 was chosen for analysis.


  • All code leading to this point, sans tuning, was found on Rice.edu and modified for this project's purposes. (It was originally a beat-matching program, so all code for the second song was removed, as well as everything after audio filtering and smoothing; the filtering and smoothing code itself was left unchanged.)


Peak Detection

The analysis of a data band consists of locating its peaks. These peaks are found using the 'findpeaks' function from MATLAB's Signal Processing Toolbox in two ways: by minimum peak prominence and by minimum distance between peaks. The two methods find different sets of peaks, so using both allows the desired peaks to be isolated.
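The two selection criteria can be sketched in plain Python. This is a simplified stand-in for illustration, not MATLAB's 'findpeaks' implementation; the function names, the sample signal, and the parameter values (0.9 and 3) are assumptions chosen for the example.

```python
def local_maxima(signal):
    """Indices strictly greater than both neighbors (flat tops are ignored)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1]]


def prominence(signal, i):
    """Height of peak i above the higher of the lowest points on either side.

    Each side is scanned until a sample taller than the peak, or the end
    of the signal, is reached.
    """
    peak = signal[i]
    left_min = peak
    j = i - 1
    while j >= 0 and signal[j] <= peak:
        left_min = min(left_min, signal[j])
        j -= 1
    right_min = peak
    j = i + 1
    while j < len(signal) and signal[j] <= peak:
        right_min = min(right_min, signal[j])
        j += 1
    return peak - max(left_min, right_min)


def peaks_by_prominence(signal, min_prominence):
    """Keep local maxima whose prominence is at least min_prominence."""
    return [i for i in local_maxima(signal)
            if prominence(signal, i) >= min_prominence]


def peaks_by_distance(signal, min_distance):
    """Keep the tallest peaks first, dropping any closer than min_distance
    samples to a peak that has already been kept."""
    candidates = sorted(local_maxima(signal),
                        key=lambda i: signal[i], reverse=True)
    kept = []
    for i in candidates:
        if all(abs(i - k) >= min_distance for k in kept):
            kept.append(i)
    return sorted(kept)


# Made-up band energy, not data from the song.
signal = [0, 4, 3.5, 4.2, 0, 1, 0, 5, 0, 0.8, 0, 6, 0]
print(peaks_by_prominence(signal, 0.9))  # [3, 5, 7, 11]
print(peaks_by_distance(signal, 3))      # [3, 7, 11]
```

In this toy signal the prominence criterion keeps the small isolated bump at index 5, while the distance criterion rejects it for being too close to a taller peak, which shows how the two methods can disagree.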

After both sets of peak locations are found, the clip is split at its obvious division points so that each segment of the peak vectors can be filtered by its own threshold value. In this audio clip, the obvious division points fall between every set of three peaks. The threshold values are determined from the minimum valley energy. The peak vectors are then recombined and compared for intersecting points. The points that remain are the "true" peaks and are therefore the ones used for note positioning on the cam. To get the desired tune, some of these notes may need to be removed from the vector, as points 2 and 5 were in this song.
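A minimal sketch of the segment-threshold, intersection, and manual-removal steps follows. The function names, the boundary index, the threshold values, and the sample data are hypothetical; in particular, the thresholds are simply given here rather than derived from the minimum valley energy, since the report does not detail that calculation.

```python
from bisect import bisect_right


def filter_by_segment(signal, peaks, boundaries, thresholds):
    """Keep peaks taller than the threshold of the segment they fall in.

    boundaries: sorted sample indices that split the clip into
                len(boundaries) + 1 segments.
    thresholds: one value per segment (assumed to be supplied by the user).
    """
    kept = []
    for p in peaks:
        segment = bisect_right(boundaries, p)
        if signal[p] > thresholds[segment]:
            kept.append(p)
    return kept


def true_peaks(peaks_a, peaks_b, drop_positions=()):
    """Intersect two peak lists, then drop peaks by 1-based position."""
    common = sorted(set(peaks_a) & set(peaks_b))
    return [p for position, p in enumerate(common, start=1)
            if position not in drop_positions]


# Made-up band energy and peak lists, standing in for the two findpeaks runs.
signal = [0, 4, 3.5, 4.2, 0, 1, 0, 5, 0, 0.8, 0, 6, 0]
prominence_peaks = [3, 5, 7, 11]
distance_peaks = [3, 7, 11]

boundaries = [6]         # one division point, so two segments
thresholds = [2.0, 3.0]  # assumed per-segment thresholds

a = filter_by_segment(signal, prominence_peaks, boundaries, thresholds)
b = filter_by_segment(signal, distance_peaks, boundaries, thresholds)
print(true_peaks(a, b))                      # [3, 7, 11]
print(true_peaks(a, b, drop_positions={2}))  # [3, 11]
```

Dropping by position rather than by sample index mirrors the way the text refers to removed notes ("points 2 and 5").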

Figure 4: Peak localization of desired audio channel


Angular Positioning

Using the vector of peak data and the desired cam radius, the locations of the hills and valleys on the cam can be determined. To get the desired output sound from the mechanism, the peak data needs to be evaluated manually. Because the data band only approximates the audio, peak energy is not necessarily consistent with what would be captured by extracting each instrument's data before audio editing occurred in the recording studio. As a result, peak height does not always match the note the listener expects to hear.

To manually evaluate the peak points, the user decides which notes should be high and which should be low. The energy value at each high-note peak is then set to -1 (because high notes require reducing the cam radius at that location), and the value at each low-note peak is set to 1. Every other data point in the data band vector is set to 0. This leaves a linearized and exaggerated view of the cam circumference. To convert this linear layout into an angular one, the indices of the non-zero cam points are collected, and each point's linear distance from 0 is translated into an angular distance. The angular distance gives the (x,y) coordinates of each point and its angle in degrees from 0.

The (x,y) coordinates of each point are then plotted on the circle, with the point color indicating whether the note is high or low.
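The labelling and linear-to-angular conversion can be sketched as follows, assuming the full length of the data vector maps onto one revolution of the cam. The function names, the 12-sample vector, the peak indices, the choice of which note is high, and the 30 mm radius are illustrative assumptions only.

```python
import math


def label_notes(n_samples, peaks, high_positions):
    """Return a vector with -1 at high-note peaks, 1 at low-note peaks,
    and 0 everywhere else. high_positions holds 1-based peak positions."""
    labels = [0] * n_samples
    for position, index in enumerate(peaks, start=1):
        labels[index] = -1 if position in high_positions else 1
    return labels


def cam_points(labels, radius):
    """Map each non-zero sample to (angle_deg, x, y, label) on a circle.

    Assumes the whole vector spans exactly one revolution, so sample
    index i of n lies at 360 * i / n degrees from 0.
    """
    n = len(labels)
    points = []
    for index, label in enumerate(labels):
        if label == 0:
            continue
        angle_deg = 360.0 * index / n
        theta = math.radians(angle_deg)
        x = radius * math.cos(theta)
        y = radius * math.sin(theta)
        points.append((angle_deg, x, y, label))
    return points


# Hypothetical example: three peaks, the third chosen as the high note.
labels = label_notes(12, [0, 3, 6], high_positions={3})
points = cam_points(labels, radius=30.0)
for angle_deg, x, y, label in points:
    kind = "high" if label == -1 else "low"
    print(f"{angle_deg:6.1f} deg  x={x:7.2f}  y={y:7.2f}  {kind}")
```

The label carried with each point is what would drive the plot color, and later the choice between a valley (-1) and a hill (1) at that angle.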

Figure 5: Peak localization around cam diameter


SolidWorks Cam Design

Using the angles and point colors found in the previous step, the cam can be created. Circles are drawn at each of the point locations so that the cam radius changes by 6 mm at the highest point of a hill or the lowest point of a valley. The edges of the hills and valleys were then filleted with radii that allow the follower to smoothly track the edge of the cam. The radii were determined through trial and error, selecting values for which the motor torque was sufficient to keep the cam in continuous motion.

Figure 6: SolidWorks cam drawing

Figure 6 is not the final version of the cam; the final version was altered to have a square hole in the center to prevent slippage between the cam and the motor.