This the second part in a four part series about how to use Python for heart rate analysis. In this part you will learn about more complex information embedded in the heart rate signal, and how to extract it using Python.
Important: The code in this tutorial is licensed under the GNU 3.0 open source license and you are free to modify and redistribute the code, given that you give others you share the code with the same right, and cite my name (use citation format below). You are not free to redistribute or modify the tutorial itself in any way. By reading on you agree to these terms. If you disagree, please navigate away from this page.
van Gent, P. (2016). Analyzing a Discrete Heart Rate Signal Using Python. A tech blog about fun things with Python and embedded electronics. Retrieved from: http://www.paulvangent.com/2016/03/21/analyzing-a-discrete-heart-rate-signal-using-python-part-2/
Github rep: Github repository located here
Find the other parts at these pages:
- Part 1: Opening the data, detecting the first peaks and calculating the BPM;
- Part 2: Extracting complex measures from the heart rate signal;
- Part 3: Automatic error detection, outlier rejection and signal filtering;
- Part 4: Detecting and rejecting noisy signal parts.
The following tutorial assumes intermediate knowledge of the Python programming language, FIR-filters and fast fourier transform methods. Highschool mathematics and basic statistics will help in understanding the theory and background, but are not required for the coding.
The tutorial should be suitable for those with intermediate levels of Python skill. It is divided into separate parts so that you can easily skip over those parts you understand. The data was collected with a simple data logger based on an ATTiny45, as described in another post.
Some Theory and Background
The heart rate signal contains a lot of information, not just about your heart but also about your breathing, short-term blood pressure regulation, body temperature regulation and hormonal blood pressure regulation (long term). It has also (though not always consistently) been linked to mental effort, and not surprisingly so since your brain is a very hungry organ, using up to 25% of your total glucose and 20% of your oxygen consumption. If its activity increases your heart needs to work harder to keep it supplied.
The measures we are interested in can be split into time-series data en frequency domain data. If you are familiar with fourier transforms the frequency part will make sense. If not, the wikipedia page has a good explanantion, and also a very nice visualisation of the process. The basic idea is that you take a signal that repeats over time (such as the heart rate signal), and determine what frequencies make up the signal. You ‘transform the signal from the time domain to the frequency domain’. Here is another visualisation I particularly love, which clearly shows how a repeating signal can be approximated as the sum of different sine waves repeating over time.
Time Series Data
For the time-series portion of the heart rate signal we are mostly concerned with the intervals between the heart beats and how they vary over time. We want the position of all R-complexes (R1, R2,…Rn), the intervals between them (RR1, RR2,…RRn, defined as ) and the differences between adjacent intervals (RRdiff-1,…RRdiff-n, defined as ).
The time series measures often found in the scientific literature are:
- BPM, the amount of heart beats per minute, we calculated this in the previous part;
- IBI (inter-beat interval), the mean distance of intervals between heartbeats, we implicitly calculated this in the previous part as part of the BPM calculation;
- SDNN, the standard deviation of intervals between heartbeats:
- SDSD, the standard deviation of successive differences between adjacent R-R intervals:
- RMSSD, the root mean square of successive differences between adjacent R-R intervals:
- pNN50/pNN20, the proportion of differences greater than 50ms / 20ms.
The IBI, SDNN, SDSD, RMSSD en pNNx (and also frequency domain measures) are often grouped under “Heart Rate Variability” (HRV) measures, because they give information about how the heart rate varies over time.
Frequency Domain Data
On the frequency side of the heart rate signal the most often found measures are called the HF (High Frequency), MF (Mid Frequency) and LF (Low Frequency) bands, an eternal testament to the level of creative naming found in science. The MF en HF bands are often taken together and just labeled ‘HF’. LF and HF roughly correspond to 0.04-0.15Hz for the LF band and 0.16-0.5Hz for the HF band. The LF band seems related to short-term blood pressure variation, the HF band to breathing rate.
The frequency spectrum is calculated performing a Fast Fourier Transform over the R-R interval dataseries. This method is, as the name implies, fast compared to the Discrete Fourier Transform method. The larger the dataset, the larger the speed difference between the methods.
We will calculate the measure for HF and LF by first to re-sampling the signal so we can estimate the spectrum, then transforming the re-sampled signal to the frequency domain, then integrating the area under the curve at the given intervals.
Time Domain Measures – Getting Started
Looking at the above time-series measures, we need a few ingredients to easily calculate all of them:
- A list of the positions of all R-peaks;
- A list of the intervals between all subsequent R-R pairs (RR1, RR2,.. RR-n);
- A list of the differences between all subsequent intervals between R-R pairs (RRdiff1,… RRdiffn);
- A list of the squared differences between all subsequent differences between R-R pairs.
We already have the list of all R-peak positions from the detect_peaks() function from the first part, which is contained in dict[‘peaklist’]. We also have a list of R-R pair differences from the calc_RR() function, which is in dict[‘RR_list’]. Great! No additional code written, already 50% of the way there.
To get the final two ingredients we use dict[‘RR_list’] and calculate both the differences and the squared differences between adjacent values:
RR_diff =  RR_sqdiff =  RR_list = measures['RR_list'] cnt = 1 #Use counter to iterate over RR_list while (cnt < (len(RR_list)-1)): #Keep going as long as there are R-R intervals RR_diff.append(abs(RR_list[cnt] - RR_list[cnt+1])) #Calculate absolute difference between successive R-R interval RR_sqdiff.append(math.pow(RR_list[cnt] - RR_list[cnt+1], 2)) #Calculate squared difference cnt += 1 print RR_diff, RR_sqdiff
Calculating Time Domain Measures
Now that the ingredients are there, it’s easy to calculate all measures:
ibi = np.mean(RR_list) #Take the mean of RR_list to get the mean Inter Beat Interval print "IBI:", ibi sdnn = np.std(RR_list) #Take standard deviation of all R-R intervals print "SDNN:", sdnn sdsd = np.std(RR_diff) #Take standard deviation of the differences between all subsequent R-R intervals print "SDSD:", sdsd rmssd = np.sqrt(np.mean(RR_sqdiff)) #Take root of the mean of the list of squared differences print "RMSSD:", rmssd nn20 = [x for x in RR_diff if (x>20)] #First create a list of all values over 20, 50 nn50 = [x for x in RR_diff if (x>50)] pnn20 = float(len(NN20)) / float(len(RR_diff)) #Calculate the proportion of NN20, NN50 intervals to all intervals pnn50 = float(len(NN50)) / float(len(RR_diff)) #Note the use of float(), because we don't want Python to think we want an int() and round the proportion to 0 or 1 print "pNN20, pNN50:", pnn20, pnn50
That’s it for the time series measures. Let’s wrap them up in callable functions. We expand the calc_RR() function to calculate our extra ingredients and append them to the dictionary object, and also merge calc_bpm() with the other time series measurements into a new function calc_ts_measures() and append them to the dictionary:
def calc_RR(dataset, fs): peaklist = measures['peaklist'] RR_list =  cnt = 0 while (cnt < (len(peaklist)-1)): RR_interval = (peaklist[cnt+1] - peaklist[cnt]) ms_dist = ((RR_interval / fs) * 1000.0) RR_list.append(ms_dist) cnt += 1 RR_diff =  RR_sqdiff =  cnt = 0 while (cnt < (len(RR_list)-1)): RR_diff.append(abs(RR_list[cnt] - RR_list[cnt+1])) RR_sqdiff.append(math.pow(RR_list[cnt] - RR_list[cnt+1], 2)) cnt += 1 measures['RR_list'] = RR_list measures['RR_diff'] = RR_diff measures['RR_sqdiff'] = RR_sqdiff def calc_ts_measures(): RR_list = measures['RR_list'] RR_diff = measures['RR_diff'] RR_sqdiff = measures['RR_sqdiff'] measures['bpm'] = 60000 / np.mean(RR_list) measures['ibi'] = np.mean(RR_list) measures['sdnn'] = np.std(RR_list) measures['sdsd'] = np.std(RR_diff) measures['rmssd'] = np.sqrt(np.mean(RR_sqdiff)) NN20 = [x for x in RR_diff if (x>20)] NN50 = [x for x in RR_diff if (x>50)] measures['nn20'] = NN20 measures['nn50'] = NN50 measures['pnn20'] = float(len(NN20)) / float(len(RR_diff)) measures['pnn50'] = float(len(NN50)) / float(len(RR_diff)) #Don't forget to update our process() wrapper to include the new function def process(dataset, hrw, fs): rolmean(dataset, hrw, fs) detect_peaks(dataset) calc_RR(dataset, fs) calc_ts_measures()
Call it like this:
import heartbeat as hb #Assuming we named the file 'heartbeat.py' dataset = hb.get_data('data.csv') hb.process(dataset, 0.75, 100) #The module dict now contains all the variables computed over our signal: hb.measures['bpm'] hb.measures['ibi'] hb.measures['sdnn'] #etcetera #Remember that you can get a list of all dictionary entries with "keys()": print hb.measures.keys()
Analysing the signal and calculating all measures in a single thread takes about 25ms on my computer. That’s pretty quick!
Frequency Domain Measures – Getting Started
The calculation of the frequency domain measures is a bit more tricky. The main reason is that we do not want to transform the heart rate signal to the frequency domain (doing so would only return a strong frequency equal to BPM/60, the heart beat expressed in Hz). Rather, we want to transform the R-R intervals to the frequency domain. Difficult to understand? Think about it like this: your heart rate varies over time as your heart speeds up and slows down in response to the changing demands of your body. This variation is expressed in the changing distances between heart beats over time (the R-R intervals we calculated earlier). The distances between R-R peaks vary over time with their own frequency. To visualize, plot the R-R intervals we calculated earlier:
From the plot you can clearly see that the R-R intervals do not change abrubtly per heart beat, but vary over time in a sine-wave like pattern (more precisely: a combination of different sine waves). We want to find the frequencies that make up this pattern.
However, any fourier transform method depends on evenly spaced data, and our R-R intervals are most certainly not evenly spaced in time. This is because the position in time of the intervals is dependent on their length, which is different for each interval. I hope this makes sense.
To find our measures, we need to:
- Create an evenly spaced timeline with the R-R intervals on it;
- Interpolate the signal, which serves to both create an evenly spaced time-series and increase the resolution;
- This interpolation step is also called re-sampling in some studies.
- Transform the signal to the frequency domain;
- Integrate the area under the LF and HF portion of the spectrum.
Calculating Frequency Domain Measures
First we create an evenly spaced timeline for the R-R intervals. To do this we take the sample position of all R-peaks, which is housed in the list dict[‘peaklist’] that we calculated in part 1. We then assign the y-values to these sample positions from the list dict[‘RR_list’], which contains the duration of all R-R intervals. Finally we interpolate the signal.
from scipy.interpolate import interp1d #Import the interpolate function from SciPy peaklist = measures['peaklist'] #First retrieve the lists we need RR_list = measures['RR_list'] RR_x = peaklist[1:] #Remove the first entry, because first interval is assigned to the second beat. RR_y = RR_list #Y-values are equal to interval lengths RR_x_new = np.linspace(RR_x,RR_x[-1],RR_x[-1]) #Create evenly spaced timeline starting at the second peak, its endpoint and length equal to position of last peak f = interp1d(RR_x, RR_y, kind='cubic') #Interpolate the signal with cubic spline interpolation
Note that the time series does not start at the first peak, but at the sample position of the second R-peak. Because we work with the intervals, the first interval is available at the second peak.
We can now use the created function f() to find the y value of any x-position within our signal range:
print f(250) #Returns 997.619845418, the Y value at x=250
Simlarly, we can pass our entire timeseries RR_x_new to the function and plot it:
plt.title("Original and Interpolated Signal") plt.plot(RR_x, RR_y, label="Original", color='blue') plt.plot(RR_x_new, f(RR_x_new), label="Interpolated", color='red') plt.legend() plt.show()
Now to find the frequencies that make up the interpolated signal, use numpy’s fast fourier transform np.fft.fft() method, calculate sample spacing, convert sample bins to Hz and plot:
#Set variables n = len(dataset.hart) #Length of the signal frq = np.fft.fftfreq(len(dataset.hart), d=((1/fs))) #divide the bins into frequency categories frq = frq[range(n/2)] #Get single side of the frequency range #Do FFT Y = np.fft.fft(f(RR_x_new))/n #Calculate FFT Y = Y[range(n/2)] #Return one side of the FFT #Plot plt.title("Frequency Spectrum of Heart Rate Variability") plt.xlim(0,0.6) #Limit X axis to frequencies of interest (0-0.6Hz for visibility, we are interested in 0.04-0.5) plt.ylim(0, 50) #Limit Y axis for visibility plt.plot(frq, abs(Y)) #Plot it plt.xlabel("Frequencies in Hz") plt.show()
Nice! You can clearly see the LF and HF frequency peaks in the signal. The high peak left in the graph consists of the VLF (Very Low Frequency) and ULF (Ultra Low Frequency) band in the HRV. In the literature these are often connected to physical activity, thermal regulation and hormonal blood pressure regulation.
The last thing remaining is to integrate the area under curve at the LF (0.04 – 0.15Hz) and HF (0.16 – 0.5Hz) frequency bands. We need to find the data points corresponding to the frequency range we’re interested in. During the FFT we calculated the one-sided frequency range frq, so we can search this for the required data point positions.
lf = np.trapz(abs(Y[(frq>=0.04) & (frq<=0.15)])) #Slice frequency spectrum where x is between 0.04 and 0.15Hz (LF), and use NumPy's trapezoidal integration function to find the area print "LF:", lf hf = np.trapz(abs(Y[(frq>=0.16) & (frq<=0.5)])) #Do the same for 0.16-0.5Hz (HF) print "HF:", hf
LF: 38.8900414093 HF: 47.8933830871
These are the areas under the frequency plot at the frequency spectra of interest. Remember from the theory that HF is related to breathing and LF to short-term blood pressure regulation. The measures have also been implicated in increased mental activity.
Wrapping It All Up
We’ve come a long way and can extract many meaningful measures from a heart rate signal now. However, chances are the module will fail if you input other heart rate data, because it may be more noisy than our ideal data, or may contain artefacts. The third part of this series will deal with signal filtering, error detection and outlier rejection.