Analyzing a Discrete Heart Rate Signal Using Python – Part 1

HB
This the first part in a four part series about how to analyse a continuous heart rate signal, extract those measures often used in scientific publications, and filter the signal and detect error.

Important: The code in this tutorial is licensed under the GNU 3.0 open source license and you are free to modify and redistribute the code, given that you give others you share the code with the same right, and cite my name (use citation format below). You are not free to redistribute or modify the tutorial itself in any way. By reading on you agree to these terms. If you disagree, please navigate away from this page.
Citation format
van Gent, P. (2016). Analyzing a Discrete Heart Rate Signal Using Python. A tech blog about fun things with Python and embedded electronics. Retrieved from: http://www.paulvangent.com/2016/03/15/analyzing-a-discrete-heart-rate-signal-using-python-part-1/

Github rep: Github repository located here


The following tutorial assumes intermediate knowledge of the Python programming language, FIR-filters and fast fourier transform methods. The tutorial should be suitable for those with intermediate levels of Python skill. It is divided into separate parts so that you can easily skip over those parts you understand. The data was collected with a simple data logger based on an ATTiny45, as described in another post.


Some theory and background
The heart rate contains a lot of information about you. Apart from the obvious number of beats per minute there is information about your health, physical fitness and even mental activity in there. If you got yourself a heart rate sensor and some data, you could of course buy an analysis package or try some of the open source things available, but not all of them might do the things you want or need. This was the case for me as well. So, why not try to make one yourself? If you’re reading this, chances are you want to try this. This tutorial series is for those interested in learning more about heart rate analysis and how to write a simple but effective analysis algorithm in Python using a few basic modules.
For the first part of this series there isn’t a lot of relevant theory. Just know that when talking about the shape of the heart rate signal, each beat is characterized by a QRS-complex as shown below in a.), I-III (Q-R-S). There is also a P-wave (IV), and a T-wave (V). The R component is the large peak (II). This is the one of interest to us:


In photoplethysmogram signals (PPG) the signal is slightly different. The PPG signal is shown in b.) of the image. Components here are the Diastolic peak (I), which is the point of highest blood pressure, and the Diastolic peak (III). The two waves are separated by what is called the Dicrotic Notch (II). In this case the Systolic Peak (I) is used for heart rate extraction.
For the development of a peak-detection algorithm this is a nice coincidence: for both signals we need to label the highest peak in each complex.


First things first
First let’s download the dataset and plot the signal, just to get a feel for the data and start finding ways of meaningfully analysing it. I use pandas for most of my data tasks, and matplotlib for most plotting needs.

import pandas as pd
import matplotlib.pyplot as plt
dataset = pd.read_csv("data.csv") #Read data from CSV datafile
plt.title("Heart Rate Signal") #The title of our plot
plt.plot(dataset.hart) #Draw the plot object
plt.show() #Display the plot

step1_output
The signal looks nice and clean. Keep in mind that this will not always be the case, even with good sensors and especially not when measuring outside the lab! More on dealing with noise and automatically determining signal quality in part 3.


Detecting the first peaks
The first step is to find the position of all the R-peaks. To do this we need to determine Regions of Interest (ROI’s), namely for each R-peak in the signal. After we have these, we need to determine their maxima. There are a few ways to go about doing this:

  • Fit a curve on the ROI datapoints, solve for the x-position of the maximum;
  • Determine the slope between each set of points in ROI, find the set where the slope reverses;
  • Mark datapoints within ROI, find the position of the highest point.

The first provides an exact mathematical solution (more accurately: as exact as you want), but is also the most expensive computationally. With our example dataset the difference does not matter, but at the time I was developing this algorithm I had to go through almost half a billion datapoints quickly, and future datasets seemed only to get larger. Most of this code will also be re-written to C for use on an embedded ARM chip, so we need to keep it simple and efficient.
The latter two methods are computationally much less expensive, but also less precise. High precision with these methods relies much more strongly on a high sampling rate than it does with the curve fitting method. After all, the actual maximum of the curve will probably lie somewhere between two data points rather than on an actual data point, with a higher sampling rate this error margin will decrease. In Part 4 (currently in writing) we will also work with the curve fitting method for a more accurate approximation of the R-peaks, for now we’ll use the position of the highest point in the ROI as the position of the beat.
Now to work: first separate the different peaks from one another. For this we draw a moving average, mark ROI’s where the heart rate signal lies above the moving average, and finally find the highest point in each ROI as such:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math
dataset = pd.read_csv("data.csv")
#Calculate moving average with 0.75s in both directions, then append do dataset
hrw = 0.75 #One-sided window size, as proportion of the sampling frequency
fs = 100 #The example dataset was recorded at 100Hz
mov_avg = dataset['hart'].rolling(int(hrw*fs)).mean() #Calculate moving average
#Impute where moving average function returns NaN, which is the beginning of the signal where x hrw
avg_hr = (np.mean(dataset.hart))
mov_avg = [avg_hr if math.isnan(x) else x for x in mov_avg]
mov_avg = [x*1.2 for x in mov_avg] #For now we raise the average by 20% to prevent the secondary heart contraction from interfering, in part 2 we will do this dynamically
dataset['hart_rollingmean'] = mov_avg #Append the moving average to the dataframe
#Mark regions of interest
window = []
peaklist = []
listpos = 0 #We use a counter to move over the different data columns
for datapoint in dataset.hart:
    rollingmean = dataset.hart_rollingmean[listpos] #Get local mean
    if (datapoint < rollingmean) and (len(window) < 1): #If no detectable R-complex activity -> do nothing
        listpos += 1
    elif (datapoint > rollingmean): #If signal comes above local mean, mark ROI
        window.append(datapoint)
        listpos += 1
    else: #If signal drops below local mean -> determine highest point
        maximum = max(window)
        beatposition = listpos - len(window) + (window.index(max(window))) #Notate the position of the point on the X-axis
        peaklist.append(beatposition) #Add detected peak to list
        window = [] #Clear marked ROI
        listpos += 1
ybeat = [dataset.hart[x] for x in peaklist] #Get the y-value of all peaks for plotting purposes
plt.title("Detected peaks in signal")
plt.xlim(0,2500)
plt.plot(dataset.hart, alpha=0.5, color='blue') #Plot semi-transparent HR
plt.plot(mov_avg, color ='green') #Plot moving average
plt.scatter(peaklist, ybeat, color='red') #Plot detected peaks
plt.show()

Firstpeaks
We have marked the highest point in each R-complex in our signal, not bad! However, this is an idealized signal. In part 3 of this series we will see how to deal with noise, measurement error, detection error and prevent our module from throwing errors at poor quality signal sections.
Some may ask: why the moving average, why not just use a horizontal line at around 700 as a threshold? This is a valid question and would work fine with this signal. Why we use a moving average has to do with less idealized signals. The amplitude of the R-peaks can change over time, especially when the sensor moves a bit. The amplitude of the smaller secondary peak can also change independently of the amplitude of the R-peak, sometimes having almost the same amplitude. One approach to reducing incorrectly detected peaks discussed in part 3 is by fitting the moving average at different heights and determining which fit seems best. More on this later.


Calculating heart rate
We know the position of each peak in time, so calculating the average ‘beats per minute’ (BPM) measure over this signal is straightforward. Just calculate the distance between the peaks, take the average and convert to a per minute value, like so:

RR_list = []
cnt = 0
while (cnt < (len(peaklist)-1)):
    RR_interval = (peaklist[cnt+1] - peaklist[cnt]) #Calculate distance between beats in # of samples
    ms_dist = ((RR_interval / fs) * 1000.0) #Convert sample distances to ms distances
    RR_list.append(ms_dist) #Append to list
    cnt += 1
bpm = 60000 / np.mean(RR_list) #60000 ms (1 minute) / average R-R interval of signal
print "Average Heart Beat is: %.01f" %bpm #Round off to 1 decimal and print

Also update the plot method to show the BPM in the legend:
plt.title("Detected peaks in signal")
plt.xlim(0,2500)
plt.plot(dataset.hart, alpha=0.5, color='blue', label="raw signal") #Plot semi-transparent HR
plt.plot(mov_avg, color ='green', label="moving average") #Plot moving average
plt.scatter(peaklist, ybeat, color='red', label="average: %.1f BPM" %bpm) #Plot detected peaks
plt.legend(loc=4, framealpha=0.6)
plt.show()

BPM2
Pretty cool! You have taken the first step in analyzing the heart rate signal. The number of beats per minute is a useful measure that is very often used in scientific studies and has many non-scientific uses as well, but the signal contains much, much more information. Part 2 of this series will deal with extracting more complex information from the heart rate signal.


Rounding up
Finally let’s tidy up our code and put it in callable functions. This will make our life much easier in the next part, and our code much more organized and re-usable. Note that probably the tidy thing to do is to make the functions part of a Class, but to keep the tutorial accessible also to those less experienced in Python (and perhaps not familiar or confident with classes), I’ve chosen to omit this from all code in this tutorial series.
Let’s put the BPM value and the lists we calculate in a dictionary that we can call, and can append with the measures we will calculate in part 2. Also let’s write a wrapper function process() so that we can call our analysis with as little code as possible:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math
measures = {}
def get_data(filename):
    dataset = pd.read_csv(filename)
    return dataset
def rolmean(dataset, hrw, fs):
    mov_avg = dataset['hart'].rolling(int(hrw*fs)).mean()
    avg_hr = (np.mean(dataset.hart))
    mov_avg = [avg_hr if math.isnan(x) else x for x in mov_avg]
    mov_avg = [x*1.2 for x in mov_avg]
    dataset['hart_rollingmean'] = mov_avg
def detect_peaks(dataset):
    window = []
    peaklist = []
    listpos = 0
    for datapoint in dataset.hart:
        rollingmean = dataset.hart_rollingmean[listpos]
        if (datapoint < rollingmean) and (len(window) < 1):
            listpos += 1
        elif (datapoint > rollingmean):
            window.append(datapoint)
            listpos += 1
        else:
            maximum = max(window)
            beatposition = listpos - len(window) + (window.index(max(window)))
            peaklist.append(beatposition)
            window = []
            listpos += 1
    measures['peaklist'] = peaklist
    measures['ybeat'] = [dataset.hart[x] for x in peaklist]
def calc_RR(dataset, fs):
    RR_list = []
    peaklist = measures['peaklist']
    cnt = 0
    while (cnt < (len(peaklist)-1)):
        RR_interval = (peaklist[cnt+1] - peaklist[cnt])
        ms_dist = ((RR_interval / fs) * 1000.0)
        RR_list.append(ms_dist)
        cnt += 1
    measures['RR_list'] = RR_list
def calc_bpm():
    RR_list = measures['RR_list']
    measures['bpm'] = 60000 / np.mean(RR_list)
def plotter(dataset, title):
    peaklist = measures['peaklist']
    ybeat = measures['ybeat']
    plt.title(title)
    plt.plot(dataset.hart, alpha=0.5, color='blue', label="raw signal")
    plt.plot(dataset.hart_rollingmean, color ='green', label="moving average")
    plt.scatter(peaklist, ybeat, color='red', label="average: %.1f BPM" %measures['bpm'])
    plt.legend(loc=4, framealpha=0.6)
    plt.show()
def process(dataset, hrw, fs): #Remember; hrw was the one-sided window size (we used 0.75) and fs was the sample rate (file is recorded at 100Hz)
    rolmean(dataset, hrw, fs)
    detect_peaks(dataset)
    calc_RR(dataset, fs)
    calc_bpm()
    plotter(dataset, "My Heartbeat Plot")

Call it like this:
import heartbeat as hb #Assuming we named the file 'heartbeat.py'
dataset = hb.get_data("data.csv")
hb.process(dataset, 0.75, 100)
#We have imported our Python module as an object called 'hb'
#This object contains the dictionary 'measures' with all values in it
#Now we can also retrieve the BPM value (and later other values) like this:
bpm = hb.measures['bpm']
#To view all objects in the dictionary, use "keys()" like so:
print hb.measures.keys()

Note that we have kept the get_data() function separate from our wrapper. This way our module can also accept dataframe objects already stored in memory, useful for example when these are generated by another program. This keeps our module flexible.
Pt1_finalplot2


78 Comments

  • Tu

    January 17, 2017

    Is the data.csv file have inter beat interval values in millisecond ?

    Reply
    • Paul van Gent

      January 17, 2017

      No, it is the raw heart rate signal.

      Reply
      • Tu

        January 17, 2017

        You mean the heart beats ? can you please tell the unit?

        Reply
        • Paul van Gent

          January 17, 2017

          What you see is the change in sensor conductance in response to the discoloration of the skin as the heart beats. There is an arbitrary time scale, but that is not relevant to the tutorial. Follow the series and it should become clear!

          Reply
  • Manujaya

    April 21, 2017

    sir when i compile this code there is an error given

    raise ValueError(“window must be an integer”)
    ValueError: window must be an integer”
    can you please help me to solve this problem. This is really need for my project
    thank you

    Reply
    • Paul van Gent

      April 21, 2017

      Hi Manujaya. The error should tell you all you need to fix it. I suggest you start with a few more basic Python​ tutorials. This one assumes at least an intermediate familiarity.

      Reply
      • edge-python

        April 2, 2018

        use in Calculate moving average: window=int(hrw*fs)

        Reply
  • Pingback: Start coding for Bobbi – the open source ECG monitor! | Pavlov | A dash of code, a pinch of science

  • Kamil

    May 25, 2017

    Could you tell me how to process the ECG signal, which I have stored as a ndarray? It’s a standard numpy ndarray with one row (but there isn’t any key like ‘hart’). I wanted to pass it to the processing functions.

    Reply
  • Eina

    June 14, 2017

    Hai Mr. van Gent,
    I’m doing this project for my thesis. I have some question that I want to ask.
    1. Is the data in csv raw (not in Hz)? If yes, why it’s so high? is it like the frequency of the pulse or the pulse it self? Because I’m trying to implement your code in my data that I get.
    2. If you get the raw pulse and you change it to frequency (in Hz), how did you change it? Because I haven’t found it yet.
    Thanks

    Reply
    • Paul van Gent

      June 14, 2017

      Hi Eina.
      1. Yes it’s raw sensordata. I’m not sure what you mean here by ‘why is it so high’. I would suggest you follow the tutorial and plot the data, then you see why it’s sensordata. What sensor are you using and what data comes out of it?
      2. What do you mean with ‘raw pulse’? If you mean the raw signal, that’s what the tutorial series is about. If you mean you already have BPM but want Hz, well, BPM=beats per minute, think of Hz as beats-per-second.
      Let me know.
      -Paul

      Reply
  • Marcus Holzberger

    July 7, 2017

    Very nice . Thank you

    Reply
  • Santosh

    July 8, 2017

    Hai Mr. van Gent,
    i am research student i am working on real time heart rate monitoring and analysis signal and calculate RR interval ,can i do real time data analysis code using in python
    for access real time data using pulse sensor please help me how can access? .

    Reply
    • Paul van Gent

      July 8, 2017

      Santosh, I’m not sure what you mean exactly. Do you want to read the sensor? Do you want help with the algorithm? Let me know
      -Paul

      Reply
      • Santosh

        November 17, 2017

        Hai Mr. van Gent
        i am working disease prediction using HRV , so my question is how to predict diseases using real time HRv and which algorithm is best for prediction

        Reply
    • Wadaane

      July 21, 2017

      Hello,
      Me too, I also wanted to do real time analyse and graph, but matplotlib animations are too slow.
      I gave up the idea to draw all datapoint, I think to reduce the number of samples to draw.
      You get me ?
      Like 10 beats take almost 4000 samples at 500Hz, I’ll use the 4000 samples for analysing, but only draw half, or quarter, like draw every fourth datapoint.
      I didn’t try it yet, but I think that’ll speed things up.

      Reply
      • Paul van Gent

        July 21, 2017

        That probably won’t work, the library isn’t fast enough for anything real-time. You could look at some c or c++ libs. An alternative is to buffer 5 sec, analyse&plot, and repeat

        Reply
  • Anam

    July 8, 2017

    Hello Paul, could you explain why you took 0.75s as your window size? What proportion are you talking about?

    Reply
    • Paul van Gent

      July 8, 2017

      Hi Anam. I see I’ve made a mistake in the code. It’s not 0.75s (unless your sampling rate is 60Hz), but rather 0.75*sampling rate in both directions. The reason for this is that the larger this threshold, the less like the HR signal the rolling average will be, the smaller, the more similar. I would recommend you plot the signal + rolling average with different window sizes. See what happens, and why in this case 0.75 (or similar sizes) work.
      -Paul

      Reply
      • Anam

        July 8, 2017

        Yep my signal is actually quite noisy and with 0.75, it actually considered the T-peaks to be R as well! I experimented with the value a little and 0.45-0.5 helped. Another problem I’m having is that my ECG sensor gives me the signal shape in a negative range of voltages sometimes (I think I need to read up on lead configurations a little) so I actually have to multiply the moving average by a value lesser than 1 instead of 1.2 or 1.1 like you’ve used. So I guess I’m just stuck with good ol’ trial and error!

        Reply
        • Paul van Gent

          July 8, 2017

          Before tweaking I recommend you pass through part 2 and part 3 as well. There are some bits about signal filtering there, which is (always) a good idea, especially if you are having troubles with noise.
          Alternatively, if you don’t care about how it works, look at my repo: https://github.com/paulvangentcom/heartrate_analysis_python

          Reply
        • Paul van Gent

          July 8, 2017

          Regarding the negative range: you can consider shifting it all up by some amount so it’s all positive, or cut off the negative ranges (if these don’t contain peak data, which I suspect based on your explanation).

          Reply
  • Becky

    August 31, 2017

    Hi,
    I tried the final version on github for my data-set but was receiving an error and just decided to walk through the tutorial and make sure I was following. I know there is more to come but wanted to point out a couple of adjustments I had to make at this point for things to work:
    1. In the peak detection loop, I had to add in a check for ‘nan’. I also added in the option to start the peak detection at a later sample because a had a lot of nan values at the start that were throwing things off.
    if ((datapoint < rollingmean) and (len(window) < 1)) or (math.isnan(datapoint)) or (listpos<3000):
    2. Negative values seemed to be an issue so I subtracted off the min if it was negative:
    ECGmin=np.nanmin(dataset.ECG)
    if ECGmin<1: dataset.ECG=dataset.ECG-ECGmin
    I'm just curious if there was something I missed in adding these work arounds or thought I would share if they are helpful to others.
    This library is great, thanks for sharing!

    Reply
    • Paul van Gent

      August 31, 2017

      Hi Becky. Thanks for your suggestions!
      1. That is correct, NaN’s are not handled. I Recommend interpolating your signal to fill in the NaN values, rather than handling them separately in the detection. I’ll put an interpolation preprocessing handler on the to-do list for github
      2. Correct, I recently ran into this when using a different ECG device as well, as well as a device where the signal needed to be flipped in its entirety. The fixes are there but not merged to github yet, on the to-do list.

      Reply
  • Meera

    September 6, 2017

    Hi,
    Your project is amazing. Thank you. Is it also possible to apply the moving average filter in real-time? As in when I’m acquiring the signal at 20 Hz i would like to see my raw data, filtered data and beats per minute.

    Reply
    • Paul van Gent

      September 11, 2017

      Hi Meera. 20Hz is much too low to get any meaningful results, I would look at 100Hz minimum. Python is not fast enough for this kind of real-time thing. You could, however, generate one plot each second..

      Reply
  • Amelia

    September 15, 2017

    Hello Paul, your work is superb… I came across it while I was working on a project of wireless ecg transmission, and thought of using the same to verify whether the ecg has been received at the receiver side correctly- using the total number of beats and bpm…..Unfortunately I had some trouble with the python language and sorry to ask this but the .hart parameter – what exactly is that – when I run the code I often get errors saying the hart parameter is not available for the dataset etc.
    <the portions I am refering to are:

    plt.plot(dataset.hart)

    ….
    mov_avg = pd.rolling_mean(dataset.hart, window=(hrw*fs))
    ….
    etc…. Is it something that is available only to .csv files with both time and hr columns?
    (Sorry if the question is inapproprriate w.r.t to the topic…but it would be really helpful if you could answer this)

    Reply
    • Paul van Gent

      September 19, 2017

      Hi Amelia. Thanks for the kind words!
      the .hart parameter refers to a specific column in the dataframe, so if you name your dataframe “df”, then you can access the column “hart” by calling df.hart. Does that make sense?
      As a reference, take a look at the github version, this drops the Pandas dependency and adds some optimizations. You might find using this one + documentation easier than following the tutorial if you’re not that familiar with Python.

      Reply
  • Amelia

    September 15, 2017

    (also my sampling frequency is 244 Hz….will that sampling frequency have any negative effects on the calculation?
    -also how large the sample dataset can be (i.e. the minimum maximum allowed dataset size) for the code to work well..?

    Reply
    • Paul van Gent

      September 19, 2017

      That should be fine. Generally a higher sampling rate is better as the signal will contain more data (resulting in more precision) to base measures on.

      Reply
  • amulya

    October 11, 2017

    hi
    at step 2 plotting peaks….yea……im getting an error like
    validate
    raise ValueError(“window must be an integer”)
    ValueError: window must be an integer
    what do i do

    Reply
    • Paul van Gent

      October 12, 2017

      What’s happening is that the Pandas function ‘rolling_mean’ expects the window size to be an integer. You can ensure an integer either by setting the window size parameters appropriately (lines 9,10), or force it like so:
      mov_avg = pd.rolling_mean(dataset.hart, int(window=(hrw*fs)))
      -Paul

      Reply
      • emily

        October 24, 2017

        i seem to have a issue : window in an undefined function.

        Reply
        • Paul van Gent

          October 25, 2017

          Emily, please share the details. With this I have no idea what’s happening. You can either post it here or mail me your code and a screenshot of the full error message to info@paulvangent.com
          -Paul

          Reply
    • M. Chowdhury

      October 30, 2017

      mov_avg = pd.rolling_mean(dataset.hart, window=int(hrw*fs))
      worked for me, even though I got some warning. You can try and update us.

      Reply
  • amulya

    October 24, 2017

    ohhh okay thats a lot of help thank you so much and for this blog.

    Reply
  • General

    November 1, 2017

    Amazing tutorial THANK YOU SO MUCH!

    Reply
  • meredith

    November 3, 2017

    the code
    ***
    dataset = pd.read_csv(“data.csv”)
    #Calculate moving average with 0.75s in both directions, then append do dataset
    hrw = 0.75 #One-sided window size, as proportion of the sampling frequency
    fs = 100 #The example dataset was recorded at 100Hz
    mov_avg = pd.rolling_mean(dataset.hart, int(window=(hrw*fs)))
    #Impute where moving average function returns NaN, which is the beginning of the signal where x hrw
    avg_hr = (np.mean(dataset.hart))
    mov_avg = [avg_hr if math.isnan(x) else x for x in mov_avg]
    mov_avg = [x*1.2 for x in mov_avg] #For now we raise the average by 20% to prevent the secondary heart contraction from interfering, in part 2 we will do this dynamically
    dataset[‘hart_rollingmean’] = mov_avg #Append the moving average to the dataframe
    #Mark regions of interest
    window = []
    peaklist = []
    listpos = 0 #We use a counter to move over the different data columns
    for datapoint in dataset.hart:
    rollingmean = dataset.hart_rollingmean[listpos] #Get local mean
    if (datapoint < rollingmean) and (len(window) do nothing
    listpos += 1
    elif (datapoint > rollingmean): #If signal comes above local mean, mark ROI
    window.append(datapoint)
    listpos += 1
    else: #If signal drops below local mean -> determine highest point
    maximum = max(window)
    beatposition = listpos – len(window) + (window.index(max(window))) #Notate the position of the point on the X-axis
    peaklist.append(beatposition) #Add detected peak to list
    window = [] #Clear marked ROI
    listpos += 1
    ybeat = [dataset.hart[x] for x in peaklist] #Get the y-value of all peaks for plotting purposes
    plt.title(“Detected peaks in signal”)
    plt.xlim(0,2500)
    plt.plot(dataset.hart, alpha=0.5, color=’blue’) #Plot semi-transparent HR
    plt.plot(mov_avg, color =’green’) #Plot moving average
    plt.scatter(peaklist, ybeat, color=’red’) #Plot detected peaks
    plt.show()
    ***
    i did this it says window in mov_avg = pd.rolling_mean(dataset.hart, int(window=(hrw*fs)))
    TypeError: ‘window’ is an invalid keyword argument for this function
    please help im new to python but i need to code for learning purposes i tried my best.

    Reply
    • Paul van Gent

      November 8, 2017

      Hi Meredith, what version of pandas are you using? I believe the “rolling_mean()” function was planned to change to “rolling()” in newer versions.

      Reply
      • Paul van Gent

        November 11, 2017

        If you’re using the VS command prompt, make sure you use the correct instruction set. Use the 64-bit VS prompt if you’re trying to build a 64-bit library, etc.

        Reply
  • meredith

    November 10, 2017

    hi paul thanks for replying my pandas version is ‘0.18.1’

    Reply
    • Paul van Gent

      November 11, 2017

      Try calling rolling() in stead of rolling_mean.
      – Paul

      Reply
  • meredith

    November 13, 2017

    it says module pandas has no attribute rolling…..so it does not contain the function?
    thanks paul

    Reply
    • amrita

      June 28, 2018

      hey did you solve the rolling problem ?

      Reply
      • palkab

        June 28, 2018

        I’ve updated to code to reflect a possible solution using pandas. Also take a look at the repo since I dropped the dependency on pandas completely during development.

        – Paul

        Reply
  • ann

    November 15, 2017

    hi paul,
    in 2nd code i just changed hrw= any integer without fractions and i get the output otherwise no, why does this happen?

    Reply
    • Paul van Gent

      December 4, 2017

      Hi Ann. This is an issue with the current implementation. As soon as my paper about the algorithm is published, I’ll update everything (and put part 4 online). For now use floor division (“//” in stead of “/”), or wrap the division in int(). It doesn’t work because the rolling window function expects an integer.

      Reply
  • Sree

    November 20, 2017

    Hi,
    Great Tutorial!
    How would I modify this code to read in input from audio source (say I have a heart sound recording).
    Could I use:
    from scipy.io import wavfile as wav
    rate, data = wav.read(‘heart_sound.wav’)
    In this case would my data be equivalent to dataset.hart?
    Thanks again for the great tutorial!

    Reply
    • Paul van Gent

      December 4, 2017

      Hi Sree, We talked on Github. I will try to implement a quick fix for you asap.

      Reply
  • anu

    November 21, 2017

    can i please have your mail paul cause i could send screenshots of the signals i have cause there’s a question i have

    Reply
  • Samer

    December 3, 2017

    Hi Paul!
    I am loading an already recorded raw ECG signal and filtering it:
    x = []
    with open(‘C:/Users/PC/Downloads/SamirECG(Win).txt’,’r’) as csvfile:
    dat = csv.reader(open(‘C:/Users/PC/Downloads/SamirECG(Win).txt’,’r’))
    for row in dat:
    x.append(float(row[0]))
    x = np.transpose(x)
    dataset = x[50:2000]
    f2 = np.convolve(dataset, np.ones((7,)) / 7, mode = ‘valid’)
    Things are working well regarding this issue, but when I try to run the first part of your code, which does the moving average calculation:
    hrw = 0.75 #One-sided window size, as proportion of the sampling frequency
    fs =500 #The example dataset was recorded at 100Hz
    mov_avg = pd.rolling_mean(f2.hart, window=(hrw*fs))
    I keep getting the following error:
    AttributeError: ‘numpy.ndarray’ object has no attribute ‘hart’
    Could you please help me fix this problem?
    Your help is appreciated,
    thanks!

    Reply
    • Paul van Gent

      December 4, 2017

      Hi Samer. The implementation on this site uses Pandas DataFrame objects as data handlers. These, unlike numpy arrays that you’re using, can use named data retrieval. In the case of pandas: f2.hart, or similarly, f2[‘hart’].values, would return the values in the column with the header ‘hart’.
      The quickest way for you to get moving is to clone the numpy-based code from my github (https://github.com/paulvangentcom/heartrate_analysis_python). This code natively handles numpy arrays. Alternatively, you can open your csv using pandas and put the ECG data in a column named ‘hart’.
      Note that if you’re using raw ECG values, you need to transpose the entire signal up so that there are no negative values. This is addressed in the final part of the tutorial which will go online early 2018.
      – Paul

      Reply
  • John

    December 15, 2017

    Hi Paul:
    (Please, see my thanks for your work on the comments section of part 2!)
    Many reading your code have been complaining about an integer error on parameter ‘window’ in pandas.rolling_mean. There seems to be a glitch of some sort. I was getting the error too.
    Here is what I’ve noticed:
    1. forcing window=int(hrw*fs) gives the same error
    2. typing “mov_avg = pd.rolling_mean(dataset.hart, window=int(hrw*fs))” on ipython works. Same goes for “mov_avg=dataset[‘hart’].rolling(window=75,center=False).mean()”
    3. calling the functions from the main body of heartbeat.py also works fine when window=int(hrw*fs).
    My recommendation to ones having this problem is calling the functions like this:
    >>import heartbeat
    >>dataset=get_data(‘data.csv’)
    >>process(dataset,0.75,100)
    >>bpm=measures[‘bpm’]
    >>print measures.keys()
    I’m using Python2.7 and tested the above on Spyder 3.1.4 under Anaconda2 distribution.
    I’m not really sure where the heart of the problem is. Just seem to have found a workaround.
    Thank you (and any reading),
    John

    Reply
    • Paul van Gent

      December 16, 2017

      Thanks. I hop others will benefit from this as well. Pretty soon I hope my software paper about this is published and I can release part 4, which contains many changes including dropping pandas. You can sneak peak in my github for few changes.
      – Paul

      Reply
  • Nat

    January 22, 2018

    Hi, Paul. loving this tutorial. It’s really good. Can you eyeball the timeframe until possible pip install? I’m also wondering why is “dropping dependencies on pandas ” on your to do? I’m curious about the reasoning behind it 🙂 Keep up the good work!

    Reply
    • Paul van Gent

      January 25, 2018

      Hi Nat. Thanks for the comment. I’m dropping pandas by default because I want to reduce dependencies as much as possible and make the package more versatile. Not everyone might have pandas, want to use it, or need to use it. Right now the github version defaults to Numpy, but of course you can also pass it data from a dataframe.
      The pip install timeframe will likely be mid-2018. As you may have read in some of my other comments here I’m working on a paper and project integration of the software, after which I’ll have more time for these kinds of things (and then I can finally release part 4). Sometimes developing these open source things within a scientific project can take a lot more time than you’d like, but then again I’m thankful the opportunity is here to keep the package open source.
      p.s. a full version in embedded C (Arduino!) is now in the final testing phase and will come online soon as well, in case you’re planning wearable projects.
      – Paul

      Reply
  • Namrata

    January 31, 2018

    Hi Paul..
    could you please provide the information about the device from which you took the readings.. and did you convert the input audio files into csv format?

    Reply
    • Paul van Gent

      January 31, 2018

      Hi Namrata. I’ve used a PPG sensor (optical), there’s plenty out there you can use. I’ve not used audio recordings. There’s multiple heart sounds per cardiac cycle, analysing heart reate by audio is less trivial because of this. There is some experimental code on my github for this, but use it sparingly: it needs to be tuned much further.

      Reply
  • Thibault

    February 25, 2018

    Hi,
    I study the sensor max30101. I found your tutorial, and he really help me to understand how exploit raw data to get Bpm.
    1- In my case, I receive 1 data every 1ms. I took arround 1k5 data to be precise as much as possible. I found that, the distance betweens two peaks is arround 30ms and do this calcul to get bpm : 1(min) / 30e-3(ms convert to min). Is that right ?
    2- I can’t understand this line : ms_dist = ((RR_interval / fs) * 1000.0)
    why you multiple all by 1000?
    Thank you for your tutorial, and your time !

    Reply
    • Paul van Gent

      February 26, 2018

      Hi Thibault.
      1 – distance between two peaks of 30 ms seems very low. This implies a heart rate of 2000 bpm. Your calculation is correct, it is “time units in minute / average R-R interval in same time units”. I use “60000 / mean(R-R) in ms” to get it in ms.
      2 – What I do is rescale the R-R intervals to be in ms-units. I divide the RR_interval by the sample rate (in Hz) to get R-R in seconds, then multiply by 1000.0 to get it in ms.
      – Paul

      Reply
      • Thibault

        February 26, 2018

        Hi Paul.
        1- It’s clearly too slow… I don’t know if it’s my measures which are false or my units when I determine distance between each peaks.
        2- Great ! it’s much clear now !
        Thank you for your explanations !

        Reply
        • Paul van Gent

          February 26, 2018

          Can you send me a sample recording of 1 or 2 minutes from your data set? You can send it to info@paulvangent.com
          – Paul

          Reply
  • Priyanka

    March 28, 2018

    Hey
    I need to convert my raw ECG waveform into a data set or a text file. How do I do it?
    I am using Python on Debian.

    Reply
    • Paul van Gent

      March 28, 2018

      What is a ‘raw ECG waveform’? What data format is it, how is it collected, etc?

      Reply
      • Priyanka

        March 29, 2018

        It is obtained directly from ECG cable connected to patient’s body. I need to use the waveform on Raspberry PI hence I want to convert it to .txt or .csv file

        Reply
        • Paul van Gent

          March 29, 2018

          That depends on how you read the sensor values. How is it hooked up to the Raspberry? What type of sensor is it?

          Reply
          • Priyanka

            April 6, 2018

            We have designed a breakout board for ECG using analog front end modules. It is an ECG sensor from Analog Devices. I need to display it on raspberry pi through spi port.

          • Paul van Gent

            April 11, 2018

            This should get you started with SPI communication. Important to keep in mind is to pull your chip select pin high when you want to transfer data. I’m abroad on holiday these weeks so I cannot make and test something for you as I don’t have any equipment with me.
            You can also look at ‘WiringPi’ to talk over SPI with a sensor.
            Plotting the data should be straightforward with matplotlib. Real-time plotting is difficult in python on raspberry, I wouldn’t put too much effort into that. You could make something that updates once or twice a second, for example.
            – Paul

  • Preyanka

    April 18, 2018

    Thank you for your code…it helped me a lot in doing my project and i hope could you please help me to find the P and T wave from the above dataset

    Reply
    • Paul van Gent

      April 23, 2018

      Hi Preyanka. I’ll add some details to the first part. There are no P and T waves in the PPG signal (technically there are no Q-R-S waves either). The signal consists of a systolic and diasystolic peak, separated by a dicrotic notch.
      If you’re using ECG data, take a look at some other algorithms out there that are for QRS (Pan-Tompkins) and P-T detection.
      – Paul

      Reply
  • ccnalu

    June 15, 2018

    for Basic analysis example@GitHub heartrate_analysis_python, run the validate pipeline code as following,
    import heartbeat as hb
    data = hb.get_data(‘data.csv’)
    fs = 100.0 #example file ‘data.csv’ is sampled at 100.0 Hz
    measures = hb.process(data, fs)
    print(measures[‘bpm’]) #returns BPM value
    print(measures[‘lf/hf’] # returns LF:HF ratio
    #Alternatively, use dictionary stored in module:
    print(hb.measures[‘bpm’]) #returns BPM value
    print(hb.measures[‘lf/hf’] # returns LF:HF ratio
    #You can also use Pandas if you so desire
    import pandas as pd
    df = pd.read_csv(“data.csv”)
    measures = hb.process(df[‘hr’].values, fs)
    print(“measures[‘bpm’])
    print(“measures[‘lf/hf’])
    then get the result as
    KeyError: ‘lf’
    please help me how to fix it, thanks.

    Reply
    • Paul van Gent

      June 20, 2018

      Hi ccnalu. You need to run the process() function with calc_fft=True if you want to be able to access the frequency-domain measures. Note that at this stage they are not super reliable yet.
      – Paul

      Reply
  • Arjun

    June 20, 2018

    Hi Paul,
    Much thanks for the code.
    I’m trying to detect Blood Pressure from PPG Signal as part of my project.
    Is there a way I could extract features from the PPG signal (Time domain…15fps) using Python libraries?

    Reply
    • Paul van Gent

      June 22, 2018

      Hi Arjun. There is not an easy way to extract blood pressure from a PPG without calibrated sensors or a standardised measurement site. The relationship depends heavily on vascular properties which vary with the measurement site. The best thing you can do is buy a commercially available finger cuff for BP.
      – Paul

      Reply
  • BruceLee

    July 2, 2018

    http://www.paulvangent.com/wp-content/uploads/2016/03/data.csv

    I cant find the heart sounds data by this website(404).

    Reply
    • palkab

      July 4, 2018

      Should be fixed now, BruceLee!

      Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.