Analyzing a Discrete Heart Rate Signal Using Python – Part 2

HB

This the second part in a four part series about how to use Python for heart rate analysis. In this part you will learn about more complex information embedded in the heart rate signal, and how to extract it using Python.


Important: The code in this tutorial is licensed under the GNU 3.0 open source license and you are free to modify and redistribute the code, given that you give others you share the code with the same right, and cite my name (use citation format below). You are not free to redistribute or modify the tutorial itself in any way. By reading on you agree to these terms. If you disagree, please navigate away from this page.

Citation format
van Gent, P. (2016). Analyzing a Discrete Heart Rate Signal Using Python. A tech blog about fun things with Python and embedded electronics. Retrieved from: http://www.paulvangent.com/2016/03/21/analyzing-a-discrete-heart-rate-signal-using-python-part-2/

Github rep: Github repository located here


Find the other parts at these pages:

The following tutorial assumes intermediate knowledge of the Python programming language, FIR-filters and fast fourier transform methods. Highschool mathematics and basic statistics will help in understanding the theory and background, but are not required for the coding.
The tutorial should be suitable for those with intermediate levels of Python skill. It is divided into separate parts so that you can easily skip over those parts you understand. The data was collected with a simple data logger based on an ATTiny45, as described in another post.


Some Theory and Background
The heart rate signal contains a lot of information, not just about your heart but also about your breathing, short-term blood pressure regulation, body temperature regulation and hormonal blood pressure regulation (long term). It has also (though not always consistently) been linked to mental effort, and not surprisingly so since your brain is a very hungry organ, using up to 25% of your total glucose and 20% of your oxygen consumption. If its activity increases your heart needs to work harder to keep it supplied.

The measures we are interested in can be split into time-series data en frequency domain data. If you are familiar with fourier transforms the frequency part will make sense. If not, the wikipedia page has a good explanantion, and also a very nice visualisation of the process. The basic idea is that you take a signal that repeats over time (such as the heart rate signal), and determine what frequencies make up the signal. You ‘transform the signal from the time domain to the frequency domain’. Here is another visualisation I particularly love, which clearly shows how a repeating signal can be approximated as the sum of different sine waves repeating over time.

Time Series Data
For the time-series portion of the heart rate signal we are mostly concerned with the intervals between the heart beats and how they vary over time. We want the position of all R-complexes (R1, R2,…Rn), the intervals between them (RR1, RR2,…RRn, defined as Pt1_EQ1 ) and the differences between adjacent intervals (RRdiff-1,…RRdiff-n, defined as ).

To visualize:

Pt2_intervals

The time series measures often found in the scientific literature are:

 

The IBI, SDNN, SDSD, RMSSD en pNNx (and also frequency domain measures) are often grouped under “Heart Rate Variability” (HRV) measures, because they give information about how the heart rate varies over time.

Frequency Domain Data
On the frequency side of the heart rate signal the most often found measures are called the HF (High Frequency), MF (Mid Frequency) and LF (Low Frequency) bands, an eternal testament to the level of creative naming found in science. The MF en HF bands are often taken together and just labeled ‘HF’. LF and HF roughly correspond to 0.04-0.15Hz for the LF band and 0.16-0.5Hz for the HF band. The LF band seems related to short-term blood pressure variation, the HF band to breathing rate.

The frequency spectrum is calculated performing a Fast Fourier Transform over the R-R interval dataseries. This method is, as the name implies, fast compared to the Discrete Fourier Transform method. The larger the dataset, the larger the speed difference between the methods.

We will calculate the measure for HF and LF by first to re-sampling the signal so we can estimate the spectrum, then transforming the re-sampled signal to the frequency domain, then integrating the area under the curve at the given intervals.


Time Domain Measures – Getting Started
Looking at the above time-series measures, we need a few ingredients to easily calculate all of them:

  • A list of the positions of all R-peaks;
  • A list of the intervals between all subsequent R-R pairs (RR1, RR2,.. RR-n);
  • A list of the differences between all subsequent intervals between R-R pairs (RRdiff1,… RRdiffn);
  • A list of the squared differences between all subsequent differences between R-R pairs.

We already have the list of all R-peak positions from the detect_peaks() function from the first part, which is contained in dict[‘peaklist’]. We also have a list of R-R pair differences from the calc_RR() function, which is in dict[‘RR_list’]. Great! No additional code written, already 50% of the way there.

To get the final two ingredients we use dict[‘RR_list’] and calculate both the differences and the squared differences between adjacent values:

~

RR_diff = []
RR_sqdiff = []
RR_list = measures['RR_list']

cnt = 1 #Use counter to iterate over RR_list
while (cnt < (len(RR_list)-1)): #Keep going as long as there are R-R intervals 
    RR_diff.append(abs(RR_list[cnt] - RR_list[cnt+1])) #Calculate absolute difference between successive R-R interval
    RR_sqdiff.append(math.pow(RR_list[cnt] - RR_list[cnt+1], 2)) #Calculate squared difference
    cnt += 1

print RR_diff, RR_sqdiff

Calculating Time Domain Measures
Now that the ingredients are there, it’s easy to calculate all measures:

~

ibi = np.mean(RR_list) #Take the mean of RR_list to get the mean Inter Beat Interval
print "IBI:", ibi

sdnn = np.std(RR_list) #Take standard deviation of all R-R intervals
print "SDNN:", sdnn

sdsd = np.std(RR_diff) #Take standard deviation of the differences between all subsequent R-R intervals
print "SDSD:", sdsd

rmssd = np.sqrt(np.mean(RR_sqdiff)) #Take root of the mean of the list of squared differences
print "RMSSD:", rmssd

nn20 = [x for x in RR_diff if (x>20)] #First create a list of all values over 20, 50
nn50 = [x for x in RR_diff if (x>50)]
pnn20 = float(len(NN20)) / float(len(RR_diff)) #Calculate the proportion of NN20, NN50 intervals to all intervals
pnn50 = float(len(NN50)) / float(len(RR_diff)) #Note the use of float(), because we don't want Python to think we want an int() and round the proportion to 0 or 1
print "pNN20, pNN50:", pnn20, pnn50

That’s it for the time series measures. Let’s wrap them up in callable functions. We expand the calc_RR() function to calculate our extra ingredients and append them to the dictionary object, and also merge calc_bpm() with the other time series measurements into a new function calc_ts_measures() and append them to the dictionary:

~

def calc_RR(dataset, fs):
    peaklist = measures['peaklist']
    RR_list = []
    cnt = 0
    while (cnt < (len(peaklist)-1)):
        RR_interval = (peaklist[cnt+1] - peaklist[cnt])
        ms_dist = ((RR_interval / fs) * 1000.0)
        RR_list.append(ms_dist)
        cnt += 1

    RR_diff = []
    RR_sqdiff = []
    cnt = 0 
    while (cnt < (len(RR_list)-1)): 
        RR_diff.append(abs(RR_list[cnt] - RR_list[cnt+1])) 
        RR_sqdiff.append(math.pow(RR_list[cnt] - RR_list[cnt+1], 2))
        cnt += 1
    measures['RR_list'] = RR_list
    measures['RR_diff'] = RR_diff
    measures['RR_sqdiff'] = RR_sqdiff
        

def calc_ts_measures():
    RR_list = measures['RR_list']
    RR_diff = measures['RR_diff']
    RR_sqdiff = measures['RR_sqdiff']
    measures['bpm'] = 60000 / np.mean(RR_list)
    measures['ibi'] = np.mean(RR_list)
    measures['sdnn'] = np.std(RR_list)
    measures['sdsd'] = np.std(RR_diff)
    measures['rmssd'] = np.sqrt(np.mean(RR_sqdiff))
    NN20 = [x for x in RR_diff if (x>20)]
    NN50 = [x for x in RR_diff if (x>50)]
    measures['nn20'] = NN20
    measures['nn50'] = NN50
    measures['pnn20'] = float(len(NN20)) / float(len(RR_diff))
    measures['pnn50'] = float(len(NN50)) / float(len(RR_diff))

#Don't forget to update our process() wrapper to include the new function
def process(dataset, hrw, fs):
    rolmean(dataset, hrw, fs)
    detect_peaks(dataset)
    calc_RR(dataset, fs)
    calc_ts_measures()

Call it like this:

~

import heartbeat as hb #Assuming we named the file 'heartbeat.py'

dataset = hb.get_data('data.csv')
hb.process(dataset, 0.75, 100)

#The module dict now contains all the variables computed over our signal:
hb.measures['bpm']
hb.measures['ibi']
hb.measures['sdnn']
#etcetera

#Remember that you can get a list of all dictionary entries with "keys()":
print hb.measures.keys()

Analysing the signal and calculating all measures in a single thread takes about 25ms on my computer. That’s pretty quick!


Frequency Domain Measures – Getting Started
The calculation of the frequency domain measures is a bit more tricky. The main reason is that we do not want to transform the heart rate signal to the frequency domain (doing so would only return a strong frequency equal to BPM/60, the heart beat expressed in Hz). Rather, we want to transform the R-R intervals to the frequency domain. Difficult to understand? Think about it like this: your heart rate varies over time as your heart speeds up and slows down in response to the changing demands of your body. This variation is expressed in the changing distances between heart beats over time (the R-R intervals we calculated earlier). The distances between R-R peaks vary over time with their own frequency. To visualize, plot the R-R intervals we calculated earlier:

Pt2_RR_List

From the plot you can clearly see that the R-R intervals do not change abrubtly per heart beat, but vary over time in a sine-wave like pattern (more precisely: a combination of different sine waves). We want to find the frequencies that make up this pattern.

However, any fourier transform method depends on evenly spaced data, and our R-R intervals are most certainly not evenly spaced in time. This is because the position in time of the intervals is dependent on their length, which is different for each interval. I hope this makes sense.

To find our measures, we need to:

  • Create an evenly spaced timeline with the R-R intervals on it;
  • Interpolate the signal, which serves to both create an evenly spaced time-series and increase the resolution;
    • This interpolation step is also called re-sampling in some studies.
  • Transform the signal to the frequency domain;
  • Integrate the area under the LF and HF portion of the spectrum.

Calculating Frequency Domain Measures
First we create an evenly spaced timeline for the R-R intervals. To do this we take the sample position of all R-peaks, which is housed in the list dict[‘peaklist’] that we calculated in part 1. We then assign the y-values to these sample positions from the list dict[‘RR_list’], which contains the duration of all R-R intervals. Finally we interpolate the signal.

~

from scipy.interpolate import interp1d #Import the interpolate function from SciPy

peaklist = measures['peaklist'] #First retrieve the lists we need
RR_list = measures['RR_list']

RR_x = peaklist[1:] #Remove the first entry, because first interval is assigned to the second beat.
RR_y = RR_list #Y-values are equal to interval lengths

RR_x_new = np.linspace(RR_x[0],RR_x[-1],RR_x[-1]) #Create evenly spaced timeline starting at the second peak, its endpoint and length equal to position of last peak

f = interp1d(RR_x, RR_y, kind='cubic') #Interpolate the signal with cubic spline interpolation

Note that the time series does not start at the first peak, but at the sample position of the second R-peak. Because we work with the intervals, the first interval is available at the second peak.

We can now use the created function f() to find the y value of any x-position within our signal range:

~

print f(250)
#Returns 997.619845418, the Y value at x=250

Simlarly, we can pass our entire timeseries RR_x_new to the function and plot it:

~

plt.title("Original and Interpolated Signal")
plt.plot(RR_x, RR_y, label="Original", color='blue')
plt.plot(RR_x_new, f(RR_x_new), label="Interpolated", color='red')
plt.legend()
plt.show()

Pt2_Interpol

Now to find the frequencies that make up the interpolated signal, use numpy’s fast fourier transform np.fft.fft() method, calculate sample spacing, convert sample bins to Hz and plot:

~

#Set variables
n = len(dataset.hart) #Length of the signal
frq = np.fft.fftfreq(len(dataset.hart), d=((1/fs))) #divide the bins into frequency categories
frq = frq[range(n/2)] #Get single side of the frequency range

#Do FFT
Y = np.fft.fft(f(RR_x_new))/n #Calculate FFT
Y = Y[range(n/2)] #Return one side of the FFT

#Plot
plt.title("Frequency Spectrum of Heart Rate Variability")
plt.xlim(0,0.6) #Limit X axis to frequencies of interest (0-0.6Hz for visibility, we are interested in 0.04-0.5)
plt.ylim(0, 50) #Limit Y axis for visibility
plt.plot(frq, abs(Y)) #Plot it
plt.xlabel("Frequencies in Hz")
plt.show()

Pt2_HRV_FFT

Nice! You can clearly see the LF and HF frequency peaks in the signal. The high peak left in the graph consists of the VLF (Very Low Frequency) and ULF (Ultra Low Frequency) band in the HRV. In the literature these are often connected to physical activity, thermal regulation and hormonal blood pressure regulation.

The last thing remaining is to integrate the area under curve at the LF (0.04 – 0.15Hz) and HF (0.16 – 0.5Hz) frequency bands. We need to find the data points corresponding to the frequency range we’re interested in. During the FFT we calculated the one-sided frequency range frq, so we can search this for the required data point positions.

~

lf = np.trapz(abs(Y[(frq>=0.04) & (frq<=0.15)])) #Slice frequency spectrum where x is between 0.04 and 0.15Hz (LF), and use NumPy's trapezoidal integration function to find the area
print "LF:", lf

hf = np.trapz(abs(Y[(frq>=0.16) & (frq<=0.5)])) #Do the same for 0.16-0.5Hz (HF)
print "HF:", hf

This returns:

~

LF: 38.8900414093
HF: 47.8933830871

These are the areas under the frequency plot at the frequency spectra of interest. Remember from the theory that HF is related to breathing and LF to short-term blood pressure regulation. The measures have also been implicated in increased mental activity.


Wrapping It All Up
We’ve come a long way and can extract many meaningful measures from a heart rate signal now. However, chances are the module will fail if you input other heart rate data, because it may be more noisy than our ideal data, or may contain artefacts. The third part of this series will deal with signal filtering, error detection and outlier rejection.

 

 


26 Comments

  • santosh pandure

    29th August 2016

    respected sir,
    can you give me your email id for communication. i need your help.

    Reply
    • Paul van Gent

      29th August 2016

      What can I help you with? Please ask the questions in the comments as they may help others as well.

      Reply
  • Anupam

    29th November 2016

    Dataset values are ranging from 400 to 700 but the arduino output is ranging from 40 to 70. Should I just multiply it by 10? Do you happen to have an optimized code of Pulse sensor for the Arduino?

    Reply
    • Paul van Gent

      29th November 2016

      That depends on how you’re reading from the Arduino. Check the code you are you using there. If you use an analog read it will have a range of 0-1023. Never change values in a dataset because they are not what you expect!

      Reply
  • Alex

    11th January 2017

    Thank you for the nice tutorial. I have a question, is it possible to use heart rate directly to calculate these features? or are there any other features that can be calculated directly from heart rate and not from IBI?

    Reply
    • Paul van Gent

      12th January 2017

      I’m not completely sure what you mean. Do you mean you have a BPM value and want to calculate the other measures? You can use the BPM value to get the average IBI over a section, but for most measures you need each peak position.

      Reply
      • Alex

        12th January 2017

        Yes exactly! Is it possible for frequency domain features: If I first convert BPM to RR intervals, detect peaks in that signal and then calculate frequency domain features? or this approach is wrong?

        Reply
        • Paul van Gent

          17th January 2017

          Hi Alex,

          Unfortunately I would not recommend this in most cases, but it depends on the resolution of the BPM data.
          The BPM value is usually an aggregate measure over multiple beats. Converting to RR-intervals means guessing what RR-intervals make up a segment of several beats. This state is unknown and has many possible permutations, estimating it will likely lead to unreliable results.

          You can quite easily create an arduino heart rate measurement system. The easiest is to use a Pulse Sensor or similar optical HR sensor. You could also create an electrical heart rate sensor with two contacts, a few op-amps and an optocoupler (always electrically de-couple circuitry attached to your body!)

          Reply
  • Alex

    18th January 2017

    Hi Paul,

    Thank you for your detailed response. Can you tell me, what steps I can follow from this tutorial to calculate HRV features if I have inter-beat-interval data?

    Reply
    • Paul van Gent

      18th January 2017

      If you have all inter-beat-intervals it is quite easy. You can do two things:

      – You can take a look at the formulas for the varying HRV measures in this tutorial or on other places on the web
      – If you want to use the code from this tutorial series, you can synthesize an R-R list from the IBI intervals. Example, if you have an IBI list of [500, 450, 500], this would mean you have data for 4 beats. These are at 0, 500, 950 and 1450ms respectively. Note that you need a per-beat-IBI, not an aggregate!

      Good luck 🙂

      Reply
  • Ken

    23rd January 2017

    May I ask a question?
    I’ve been analysing ECG data using python(so this page is really helpful for me, thank you). I found that other research articles or web pages about HRV always use PSD(Power Spectral Density) to calculate LF and HF(In this page, you’re using amplitude spectrum, aren’t you?). What I want to know is can we use amplitude spectrum to calculate LF(and HF)?

    Reply
    • Paul van Gent

      23rd January 2017

      Hi Ken,

      That is correct and I applaud you for noticing. Indeed I use the Amplitude Spectrum in this part. Doing a PSD can get quite complicated. In practice you can approximate the PSD with the absolute Fourier Transform magnitude squared. So if your Fourier function is F(x), then an approximation of the PSD is |F(x)|^2.
      Another approach is to use a periodogram, see this link.

      Both are quick and dirty estimators of the PSD, and both suffer from similar drawbacks. To convolute matters people often use the term PSD for the above or more complex methods interchangably.

      I’m stil writing part 4 in this series (have been for a while, I seem to be stuck in a time-vortex), where I actually go into this among other things. Before that, however, I need to increase my grasp of the methods as well :).

      Reply
      • Ken

        24th January 2017

        Thank you for the early reply, Paul!
        So in this case, squared Y means the rough values of PSD, right?
        example code:
        PSD = (np.fft.fft(f(RR_x_new))/n) ** 2 # Is this correct?

        I actually tried to use periodogram, but the output looked different from what I expected. I couldn’t change any optional parameters except fs at that time. So do we need to change any other options to calculate PSD?(Sorry for asking many questions. I am not familiar with wave analysis…)

        Reply
        • Paul van Gent

          24th January 2017

          You need to square every bin in the |FFT| spectrum. I’m not in the position to test your code right now. Plot them both and see if it seems right.

          However, note that the |F(x)|^2 is a very dirty way of estimating the PSD. One of the main problems is that adding more data does not add more variance, while a PSD refers to the spectrum energy per time unit. You can now see why it’s a dirty approximation. Doing a PSD the correct way is quite complex. However many papers that mention PSD seem to use the “absolute FFT squared” method, depending on what software they used to analyse the HRV data.

          Reply
          • Ken

            25th January 2017

            Oh I forgot to add the absolute function,
            PSD = np.abs((np.fft.fft(f(RR_x_new))/n)) ** 2 # This is the right one, I think.
            The output using this code seemed right(values of lf and hf are reasonable). But there is still difficulty using periodogram one(both lf and hf are nearly 0). So I am going to use fft this time. Thank you for your advice.

          • Paul van Gent

            26th January 2017

            You’re welcome. Glad it was of help.

  • Marcus Holzberger

    7th July 2017

    Hello Paul,
    first: very nice tutorial.
    I am not sure for which reason you calculate the rolling mean. You detect the Peak indexes from this. Is there any disadvantage to detect the peaks from the real signal? I do this with peakutils: https://pypi.python.org/pypi/PeakUtils.
    with this peakutil.indexes it is very easy to make own thresholds, so that you can also detect the peaks in the T-waves for example.

    Reply
    • Paul van Gent

      7th July 2017

      Hi Marcus,
      A standard way of detecting peaks mathematically is to find intersections with some threshold, then solve for the maximum between two intersections. The rolling mean functions as this threshold in the code.

      Peak detection with other libraries is of course also a possibility, but in a tutorial series that is mostly about signal analysis I prefer to be more explicit in how things are calculated, and not rely on external libraries.

      Be sure to keep an eye on the site, I’ve almost finished part 4 about optimising and improving the whole thing.

      Reply
      • Marcus Holzberge

        8th July 2017

        Ah Ok, in part 3 it becomes reasonable for me. I should have read this before. I’m not sure if my code with peakutils can handle this.

        Reply
        • Paul van Gent

          8th July 2017

          Good to hear part 3 helped.

          As a general tip I would always recommend you to develop your own solution first before jumping onto modules (given you have the time). If anything, this will give you a good grasp of the problem and the possible solutions. This will help prevent things breaking later on.

          Once you finish the series I recommend you take a look at the repo, there’s some optimisations and generalisations there: https://github.com/paulvangentcom/heartrate_analysis_python

          Reply
  • Dean

    7th October 2017

    Hi Marcus, may I ask a question?
    In code, Is NN50 = nn50 ??
    Thanks for your tutorial!!
    code:
    nn20 = [x for x in RR_diff if (x>20)]
    nn50 = [x for x in RR_diff if (x>50)]
    pnn20 = float(len(NN20)) / float(len(RR_diff))
    pnn50 = float(len(NN50)) / float(len(RR_diff))

    Reply
    • Paul van Gent

      10th October 2017

      Hi Dean, yes I think that should indeed refer to nn50

      Reply
  • ria

    16th November 2017

    hi
    i completed part 1 and it was amazing and in part 2 peaklist gives me a key error should i add this code with heartbeat.py or seperately?
    .

    Reply
  • John

    15th December 2017

    Hi Paul, thank you for the tutorial and the hard work to put it together.

    I’d appreciate if you’d allow me to suggest a more “pythonian” way to calculate pNNx

    def pNN(x,t=50):
    “”” The proportion of differences greater than t ms. t defaults to 50ms; x is list of RR differences. “””
    pnnt=[i for i in x if (i>t)]
    return float(len(pnnt))/float(len(RR_diff))

    Then call pNN20(RR_diff, 20), pNN50(RR_diff), pNN80(RR_diff, 80) etc. This way other percentages/ratios are also covered without being explicitly calculated.

    Thank you for this excellent piece of work!

    Reply
    • Paul van Gent

      16th December 2017

      Hi John, thanks for the suggestion. That is indeed more pythonian!
      – Paul

      Reply

Leave a Reply