This the second part in a four part series about how to use Python for heart rate analysis. In this part you will learn about more complex information embedded in the heart rate signal, and how to extract it using Python.

**Important**: The code in this tutorial is licensed under the GNU 3.0 open source license and you are free to modify and redistribute the code, given that you give others you share the code with the same right, and cite my name (use citation format below). You are **not** free to redistribute or modify the tutorial itself in any way. By reading on you agree to these terms. If you disagree, please navigate away from this page.

**Citation format
**van Gent, P. (2016). Analyzing a Discrete Heart Rate Signal Using Python.

*A tech blog about fun things with Python and embedded electronics.*Retrieved from: http://www.paulvangent.com/2016/03/21/analyzing-a-discrete-heart-rate-signal-using-python-part-2/

**Github rep:**Github repository located here

Find the other parts at these pages:

- Part 1: Opening the data, detecting the first peaks and calculating the BPM;
- Part 2: Extracting complex measures from the heart rate signal;
- Part 3: Automatic error detection, outlier rejection and signal filtering;
- Part 4: Detecting and rejecting noisy signal parts.

The following tutorial assumes intermediate knowledge of the Python programming language, FIR-filters and fast fourier transform methods. Highschool mathematics and basic statistics will help in understanding the theory and background, but are not required for the coding.

The tutorial should be suitable for those with intermediate levels of Python skill. It is divided into separate parts so that you can easily skip over those parts you understand. The data was collected with a simple data logger based on an ATTiny45, as described in another post.

**Some Theory and Background
**The heart rate signal contains a lot of information, not just about your heart but also about your breathing, short-term blood pressure regulation, body temperature regulation and hormonal blood pressure regulation (long term). It has also (though not always consistently) been linked to mental effort, and not surprisingly so since your brain is a very hungry organ, using up to 25% of your total glucose and 20% of your oxygen consumption. If its activity increases your heart needs to work harder to keep it supplied.

The measures we are interested in can be split into

*time-series data*en

*frequency domain data*. If you are familiar with fourier transforms the frequency part will make sense. If not, the wikipedia page has a good explanantion, and also a very nice visualisation of the process. The basic idea is that you take a signal that repeats over time (such as the heart rate signal), and determine what frequencies make up the signal. You ‘transform the signal from the time domain to the frequency domain’. Here is another visualisation I particularly love, which clearly shows how a repeating signal can be approximated as the sum of different sine waves repeating over time.

*Time Series Data*

For the time-series portion of the heart rate signal we are mostly concerned with the intervals between the heart beats and how they vary over time. We want the position of all R-complexes (R1, R2,…Rn), the intervals between them (RR1, RR2,…RRn, defined as ) and the differences between adjacent intervals (RRdiff-1,…RRdiff-n, defined as ).

To visualize:

The time series measures often found in the scientific literature are:

- BPM, the amount of heart beats per minute, we calculated this in the previous part;
- IBI (inter-beat interval), the mean distance of intervals between heartbeats, we implicitly calculated this in the previous part as part of the BPM calculation;
- SDNN, the standard deviation of intervals between heartbeats:

- SDSD, the standard deviation of successive differences between adjacent R-R intervals:

- RMSSD, the root mean square of successive differences between adjacent R-R intervals:

- pNN50/pNN20, the proportion of differences greater than 50ms / 20ms.

The IBI, SDNN, SDSD, RMSSD en pNNx (and also frequency domain measures) are often grouped under “Heart Rate Variability” (HRV) measures, because they give information about how the heart rate varies over time.

*Frequency Domain Data
*On the frequency side of the heart rate signal the most often found measures are called the HF (High Frequency), MF (Mid Frequency) and LF (Low Frequency) bands, an eternal testament to the level of creative naming found in science. The MF en HF bands are often taken together and just labeled ‘HF’. LF and HF roughly correspond to 0.04-0.15Hz for the LF band and 0.16-0.5Hz for the HF band. The LF band seems related to short-term blood pressure variation, the HF band to breathing rate.

The frequency spectrum is calculated performing a Fast Fourier Transform over the R-R interval dataseries. This method is, as the name implies, fast compared to the Discrete Fourier Transform method. The larger the dataset, the larger the speed difference between the methods.

We will calculate the measure for HF and LF by first to re-sampling the signal so we can estimate the spectrum, then transforming the re-sampled signal to the frequency domain, then integrating the area under the curve at the given intervals.

**Time Domain Measures – Getting Started
**Looking at the above time-series measures, we need a few ingredients to easily calculate all of them:

- A list of the positions of all R-peaks;
- A list of the intervals between all subsequent R-R pairs (RR1, RR2,.. RR-n);
- A list of the differences between all subsequent intervals between R-R pairs (RRdiff1,… RRdiffn);
- A list of the
*squared differences*between all subsequent differences between R-R pairs.

We already have the list of all R-peak positions from the **detect_peaks()** function from the first part, which is contained in **dict[‘peaklist’]**. We also have a list of R-R pair differences from the **calc_RR()** function, which is in **dict[‘RR_list’]**. Great! No additional code written, already 50% of the way there.

To get the final two ingredients we use **dict[‘RR_list’]** and calculate both the differences and the squared differences between adjacent values:

```
RR_diff = []
RR_sqdiff = []
RR_list = measures['RR_list']
cnt = 1 #Use counter to iterate over RR_list
while (cnt < (len(RR_list)-1)): #Keep going as long as there are R-R intervals
RR_diff.append(abs(RR_list[cnt] - RR_list[cnt+1])) #Calculate absolute difference between successive R-R interval
RR_sqdiff.append(math.pow(RR_list[cnt] - RR_list[cnt+1], 2)) #Calculate squared difference
cnt += 1
print RR_diff, RR_sqdiff
```

**Calculating Time Domain Measures**

Now that the ingredients are there, it’s easy to calculate all measures:

```
ibi = np.mean(RR_list) #Take the mean of RR_list to get the mean Inter Beat Interval
print "IBI:", ibi
sdnn = np.std(RR_list) #Take standard deviation of all R-R intervals
print "SDNN:", sdnn
sdsd = np.std(RR_diff) #Take standard deviation of the differences between all subsequent R-R intervals
print "SDSD:", sdsd
rmssd = np.sqrt(np.mean(RR_sqdiff)) #Take root of the mean of the list of squared differences
print "RMSSD:", rmssd
nn20 = [x for x in RR_diff if (x>20)] #First create a list of all values over 20, 50
nn50 = [x for x in RR_diff if (x>50)]
pnn20 = float(len(NN20)) / float(len(RR_diff)) #Calculate the proportion of NN20, NN50 intervals to all intervals
pnn50 = float(len(NN50)) / float(len(RR_diff)) #Note the use of float(), because we don't want Python to think we want an int() and round the proportion to 0 or 1
print "pNN20, pNN50:", pnn20, pnn50
```

That’s it for the time series measures. Let’s wrap them up in callable functions. We expand the

**calc_RR()**function to calculate our extra ingredients and append them to the dictionary object, and also merge

**calc_bpm()**with the other time series measurements into a new function

**calc_ts_measures()**and append them to the dictionary:

```
def calc_RR(dataset, fs):
peaklist = measures['peaklist']
RR_list = []
cnt = 0
while (cnt < (len(peaklist)-1)):
RR_interval = (peaklist[cnt+1] - peaklist[cnt])
ms_dist = ((RR_interval / fs) * 1000.0)
RR_list.append(ms_dist)
cnt += 1
RR_diff = []
RR_sqdiff = []
cnt = 0
while (cnt < (len(RR_list)-1)):
RR_diff.append(abs(RR_list[cnt] - RR_list[cnt+1]))
RR_sqdiff.append(math.pow(RR_list[cnt] - RR_list[cnt+1], 2))
cnt += 1
measures['RR_list'] = RR_list
measures['RR_diff'] = RR_diff
measures['RR_sqdiff'] = RR_sqdiff
def calc_ts_measures():
RR_list = measures['RR_list']
RR_diff = measures['RR_diff']
RR_sqdiff = measures['RR_sqdiff']
measures['bpm'] = 60000 / np.mean(RR_list)
measures['ibi'] = np.mean(RR_list)
measures['sdnn'] = np.std(RR_list)
measures['sdsd'] = np.std(RR_diff)
measures['rmssd'] = np.sqrt(np.mean(RR_sqdiff))
NN20 = [x for x in RR_diff if (x>20)]
NN50 = [x for x in RR_diff if (x>50)]
measures['nn20'] = NN20
measures['nn50'] = NN50
measures['pnn20'] = float(len(NN20)) / float(len(RR_diff))
measures['pnn50'] = float(len(NN50)) / float(len(RR_diff))
#Don't forget to update our process() wrapper to include the new function
def process(dataset, hrw, fs):
rolmean(dataset, hrw, fs)
detect_peaks(dataset)
calc_RR(dataset, fs)
calc_ts_measures()
```

Call it like this:

```
import heartbeat as hb #Assuming we named the file 'heartbeat.py'
dataset = hb.get_data('data.csv')
hb.process(dataset, 0.75, 100)
#The module dict now contains all the variables computed over our signal:
hb.measures['bpm']
hb.measures['ibi']
hb.measures['sdnn']
#etcetera
#Remember that you can get a list of all dictionary entries with "keys()":
print hb.measures.keys()
```

Analysing the signal and calculating all measures in a single thread takes about 25ms on my computer. That’s pretty quick!

**Frequency Domain Measures – Getting Started
**The calculation of the frequency domain measures is a bit more tricky. The main reason is that we

*do not*want to transform the heart rate signal to the frequency domain (doing so would only return a strong frequency equal to BPM/60, the heart beat expressed in Hz). Rather, we want to transform the R-R intervals to the frequency domain. Difficult to understand? Think about it like this: your heart rate varies over time as your heart speeds up and slows down in response to the changing demands of your body. This variation is expressed in the changing distances between heart beats over time (the R-R intervals we calculated earlier). The distances between R-R peaks vary over time with their own frequency. To visualize, plot the R-R intervals we calculated earlier:

From the plot you can clearly see that the R-R intervals do not change abrubtly per heart beat, but vary over time in a sine-wave like pattern (more precisely: a combination of different sine waves). We want to find the frequencies that make up this pattern.

However, any fourier transform method depends on evenly spaced data, and our R-R intervals are most certainly not evenly spaced in time. This is because the position in time of the intervals is dependent on their length, which is different for each interval. I hope this makes sense.

To find our measures, we need to:

- Create an evenly spaced timeline with the R-R intervals on it;
- Interpolate the signal, which serves to both create an evenly spaced time-series and increase the resolution;
- This interpolation step is also called re-sampling in some studies.

- Transform the signal to the frequency domain;
- Integrate the area under the LF and HF portion of the spectrum.

**Calculating Frequency Domain Measures
**First we create an evenly spaced timeline for the R-R intervals. To do this we take the sample position of all R-peaks, which is housed in the list

**dict[‘peaklist’]**that we calculated in part 1. We then assign the y-values to these sample positions from the list

**dict[‘RR_list’]**, which contains the duration of all R-R intervals. Finally we interpolate the signal.

```
from scipy.interpolate import interp1d #Import the interpolate function from SciPy
peaklist = measures['peaklist'] #First retrieve the lists we need
RR_list = measures['RR_list']
RR_x = peaklist[1:] #Remove the first entry, because first interval is assigned to the second beat.
RR_y = RR_list #Y-values are equal to interval lengths
RR_x_new = np.linspace(RR_x[0],RR_x[-1],RR_x[-1]) #Create evenly spaced timeline starting at the second peak, its endpoint and length equal to position of last peak
f = interp1d(RR_x, RR_y, kind='cubic') #Interpolate the signal with cubic spline interpolation
```

Note that the time series does not start at the first peak, but at the sample position of the second R-peak. Because we work with the intervals, the first interval is available at the second peak.

We can now use the created function

**f()**to find the y value of any x-position within our signal range:

```
print f(250)
#Returns 997.619845418, the Y value at x=250
```

Simlarly, we can pass our entire timeseries

**RR_x_new**to the function and plot it:

```
plt.title("Original and Interpolated Signal")
plt.plot(RR_x, RR_y, label="Original", color='blue')
plt.plot(RR_x_new, f(RR_x_new), label="Interpolated", color='red')
plt.legend()
plt.show()
```

Now to find the frequencies that make up the interpolated signal, use numpy’s fast fourier transform

**np.fft.fft()**method, calculate sample spacing, convert sample bins to Hz and plot:

```
#Set variables
n = len(dataset.hart) #Length of the signal
frq = np.fft.fftfreq(len(dataset.hart), d=((1/fs))) #divide the bins into frequency categories
frq = frq[range(n/2)] #Get single side of the frequency range
#Do FFT
Y = np.fft.fft(f(RR_x_new))/n #Calculate FFT
Y = Y[range(n/2)] #Return one side of the FFT
#Plot
plt.title("Frequency Spectrum of Heart Rate Variability")
plt.xlim(0,0.6) #Limit X axis to frequencies of interest (0-0.6Hz for visibility, we are interested in 0.04-0.5)
plt.ylim(0, 50) #Limit Y axis for visibility
plt.plot(frq, abs(Y)) #Plot it
plt.xlabel("Frequencies in Hz")
plt.show()
```

Nice! You can clearly see the LF and HF frequency peaks in the signal. The high peak left in the graph consists of the VLF (Very Low Frequency) and ULF (Ultra Low Frequency) band in the HRV. In the literature these are often connected to physical activity, thermal regulation and hormonal blood pressure regulation.

The last thing remaining is to integrate the area under curve at the LF (0.04 – 0.15Hz) and HF (0.16 – 0.5Hz) frequency bands. We need to find the data points corresponding to the frequency range we’re interested in. During the FFT we calculated the one-sided frequency range

**frq**, so we can search this for the required data point positions.

```
lf = np.trapz(abs(Y[(frq>=0.04) & (frq<=0.15)])) #Slice frequency spectrum where x is between 0.04 and 0.15Hz (LF), and use NumPy's trapezoidal integration function to find the area
print "LF:", lf
hf = np.trapz(abs(Y[(frq>=0.16) & (frq<=0.5)])) #Do the same for 0.16-0.5Hz (HF)
print "HF:", hf
```

This returns:

```
LF: 38.8900414093
HF: 47.8933830871
```

These are the areas under the frequency plot at the frequency spectra of interest. Remember from the theory that HF is related to breathing and LF to short-term blood pressure regulation. The measures have also been implicated in increased mental activity.

**Wrapping It All Up
**We’ve come a long way and can extract many meaningful measures from a heart rate signal now. However, chances are the module will fail if you input other heart rate data, because it may be more noisy than our ideal data, or may contain artefacts. The third part of this series will deal with signal filtering, error detection and outlier rejection.

## santosh pandure

August 29, 2016respected sir,

can you give me your email id for communication. i need your help.

## Paul van Gent

August 29, 2016What can I help you with? Please ask the questions in the comments as they may help others as well.

## Anupam

November 29, 2016Dataset values are ranging from 400 to 700 but the arduino output is ranging from 40 to 70. Should I just multiply it by 10? Do you happen to have an optimized code of Pulse sensor for the Arduino?

## Paul van Gent

November 29, 2016That depends on how you’re reading from the Arduino. Check the code you are you using there. If you use an analog read it will have a range of 0-1023. Never change values in a dataset because they are not what you expect!

## Alex

January 11, 2017Thank you for the nice tutorial. I have a question, is it possible to use heart rate directly to calculate these features? or are there any other features that can be calculated directly from heart rate and not from IBI?

## Paul van Gent

January 12, 2017I’m not completely sure what you mean. Do you mean you have a BPM value and want to calculate the other measures? You can use the BPM value to get the average IBI over a section, but for most measures you need each peak position.

## Alex

January 12, 2017Yes exactly! Is it possible for frequency domain features: If I first convert BPM to RR intervals, detect peaks in that signal and then calculate frequency domain features? or this approach is wrong?

## Paul van Gent

January 17, 2017Hi Alex,

Unfortunately I would not recommend this in most cases, but it depends on the resolution of the BPM data.

The BPM value is usually an aggregate measure over multiple beats. Converting to RR-intervals means guessing what RR-intervals make up a segment of several beats. This state is unknown and has many possible permutations, estimating it will likely lead to unreliable results.

You can quite easily create an arduino heart rate measurement system. The easiest is to use a Pulse Sensor or similar optical HR sensor. You could also create an electrical heart rate sensor with two contacts, a few op-amps and an optocoupler (always electrically de-couple circuitry attached to your body!)

## Alex

January 18, 2017Hi Paul,

Thank you for your detailed response. Can you tell me, what steps I can follow from this tutorial to calculate HRV features if I have inter-beat-interval data?

## Paul van Gent

January 18, 2017If you have all inter-beat-intervals it is quite easy. You can do two things:

– You can take a look at the formulas for the varying HRV measures in this tutorial or on other places on the web

– If you want to use the code from this tutorial series, you can synthesize an R-R list from the IBI intervals. Example, if you have an IBI list of [500, 450, 500], this would mean you have data for 4 beats. These are at 0, 500, 950 and 1450ms respectively. Note that you need a per-beat-IBI, not an aggregate!

Good luck 🙂

## Ken

January 23, 2017May I ask a question?

I’ve been analysing ECG data using python(so this page is really helpful for me, thank you). I found that other research articles or web pages about HRV always use PSD(Power Spectral Density) to calculate LF and HF(In this page, you’re using amplitude spectrum, aren’t you?). What I want to know is can we use amplitude spectrum to calculate LF(and HF)?

## Paul van Gent

January 23, 2017Hi Ken,

That is correct and I applaud you for noticing. Indeed I use the Amplitude Spectrum in this part. Doing a PSD can get quite complicated. In practice you can approximate the PSD with the absolute Fourier Transform magnitude squared. So if your Fourier function is F(x), then an approximation of the PSD is |F(x)|^2.

Another approach is to use a periodogram, see this link.

Both are quick and dirty estimators of the PSD, and both suffer from similar drawbacks. To convolute matters people often use the term PSD for the above or more complex methods interchangably.

I’m stil writing part 4 in this series (have been for a while, I seem to be stuck in a time-vortex), where I actually go into this among other things. Before that, however, I need to increase my grasp of the methods as well :).

## Ken

January 24, 2017Thank you for the early reply, Paul!

So in this case, squared Y means the rough values of PSD, right?

example code:

PSD = (np.fft.fft(f(RR_x_new))/n) ** 2 # Is this correct?

I actually tried to use periodogram, but the output looked different from what I expected. I couldn’t change any optional parameters except fs at that time. So do we need to change any other options to calculate PSD?(Sorry for asking many questions. I am not familiar with wave analysis…)

## Paul van Gent

January 24, 2017You need to square every bin in the |FFT| spectrum. I’m not in the position to test your code right now. Plot them both and see if it seems right.

However, note that the |F(x)|^2 is a very dirty way of estimating the PSD. One of the main problems is that adding more data does not add more variance, while a PSD refers to the spectrum energy per time unit. You can now see why it’s a dirty approximation. Doing a PSD the correct way is quite complex. However many papers that mention PSD seem to use the “absolute FFT squared” method, depending on what software they used to analyse the HRV data.

## Ken

January 25, 2017Oh I forgot to add the absolute function,

PSD = np.abs((np.fft.fft(f(RR_x_new))/n)) ** 2 # This is the right one, I think.

The output using this code seemed right(values of lf and hf are reasonable). But there is still difficulty using periodogram one(both lf and hf are nearly 0). So I am going to use fft this time. Thank you for your advice.

## Paul van Gent

January 26, 2017You’re welcome. Glad it was of help.

## Marcus Holzberger

July 7, 2017Hello Paul,

first: very nice tutorial.

I am not sure for which reason you calculate the rolling mean. You detect the Peak indexes from this. Is there any disadvantage to detect the peaks from the real signal? I do this with peakutils: https://pypi.python.org/pypi/PeakUtils.

with this peakutil.indexes it is very easy to make own thresholds, so that you can also detect the peaks in the T-waves for example.

## Paul van Gent

July 7, 2017Hi Marcus,

A standard way of detecting peaks mathematically is to find intersections with some threshold, then solve for the maximum between two intersections. The rolling mean functions as this threshold in the code.

Peak detection with other libraries is of course also a possibility, but in a tutorial series that is mostly about signal analysis I prefer to be more explicit in how things are calculated, and not rely on external libraries.

Be sure to keep an eye on the site, I’ve almost finished part 4 about optimising and improving the whole thing.

## Marcus Holzberge

July 8, 2017Ah Ok, in part 3 it becomes reasonable for me. I should have read this before. I’m not sure if my code with peakutils can handle this.

## Paul van Gent

July 8, 2017Good to hear part 3 helped.

As a general tip I would always recommend you to develop your own solution first before jumping onto modules (given you have the time). If anything, this will give you a good grasp of the problem and the possible solutions. This will help prevent things breaking later on.

Once you finish the series I recommend you take a look at the repo, there’s some optimisations and generalisations there: https://github.com/paulvangentcom/heartrate_analysis_python

## Dean

October 7, 2017Hi Marcus, may I ask a question?

In code, Is NN50 = nn50 ??

Thanks for your tutorial!!

code:

nn20 = [x for x in RR_diff if (x>20)]

nn50 = [x for x in RR_diff if (x>50)]

pnn20 = float(len(NN20)) / float(len(RR_diff))

pnn50 = float(len(NN50)) / float(len(RR_diff))

## Paul van Gent

October 10, 2017Hi Dean, yes I think that should indeed refer to nn50

## ria

November 16, 2017hi

i completed part 1 and it was amazing and in part 2 peaklist gives me a key error should i add this code with heartbeat.py or seperately?

.

## Paul van Gent

December 4, 2017Hi Ria, could you share your code with me? info@paulvangent.com

## John

December 15, 2017Hi Paul, thank you for the tutorial and the hard work to put it together.

I’d appreciate if you’d allow me to suggest a more “pythonian” way to calculate pNNx

def pNN(x,t=50):

“”” The proportion of differences greater than t ms. t defaults to 50ms; x is list of RR differences. “””

pnnt=[i for i in x if (i>t)]

return float(len(pnnt))/float(len(RR_diff))

Then call pNN20(RR_diff, 20), pNN50(RR_diff), pNN80(RR_diff, 80) etc. This way other percentages/ratios are also covered without being explicitly calculated.

Thank you for this excellent piece of work!

## Paul van Gent

December 16, 2017Hi John, thanks for the suggestion. That is indeed more pythonian!

– Paul

## Nuria

March 9, 2018#Set variables

Hello Paul,

the post is marvelous. I have a doubt in the part where you transform the signal to the frequency domain. Can you explain more deeply what the code do?

n = len(dataset.hart) #Length of the signal

frq = np.fft.fftfreq(len(dataset.hart), d=((1/fs))) #divide the bins into frequency categories

frq = frq[range(n/2)] #Get single side of the frequency range

#Do FFT

Y = np.fft.fft(f(RR_x_new))/n #Calculate FFT

Y = Y[range(n/2)] #Return one side of the FFT

## Paul van Gent

March 21, 2018Hi Nuria. This is where the fast fourier transform is performed. I create the frequency bins (which frequencies belong to each datapoint-datapoint interval), then take a one-sided part of this (fft creates a signal mirrored across x=0). The fft itself is a standard numpy function.

## Eduardo

April 19, 2018Hi Paul, one question about normal range for LF/HF values: I have applied your method to my data and I have obtain a completely different value range, the thing is that I don’t know if Im doing something wrong or it is due a bad adaptation of your code, the thing is that I have a bvp signal from a plethismograph, adapting your code I obtain the same RRI_DF array and then I apply:

#Set variables

n = len(BVP_DF[“spData”]) #Length of the original bvp signal

frq = np.fft.fftfreq(len(BVP_DF[“spData”]), d=((1/64)))

frq = frq[range(int(n/2))] #Get single side of the frequency range

#Do FFT

Y = np.fft.fft(f(RR_x_new))/n #Calculate FFT

Y = Y[range(int(n/2))] #Return one side of the FFT

The thing is that my results are much lower than the you are showing above, my numbers for HF y LF are: LF: 1.07319855346

HF: 1.84067854321

And I don’t know if these could be correct or if they are to low for a real LF/HF value.

Thanks for your article, really interesting!

## Paul van Gent

April 23, 2018Hi Eduardo. That seems correct. The thing with LFHF is that you use the ratio, so you need to divide: LF/HF. This ratio is what is used in the literature. I’ll make some adjustments to the tutorial to reflect this. I do use a simplistic method of approximating the power spectral density here (squared FFT). There are better ways.

Stay tuned for a long-ago-announced part four. I keep saying it’s coming, and it is, but I need to await a publication procedure before I can put it online.

– Paul

## Raphael

May 10, 2018Hi Paul

You you interpolate the RR intervals to convert to continuous RR interval before extracting the frequency domain features. You are doing this using a third order (i.e. cubic) spline. I assume that the way you calculate the frequency features needs a continuous signal. Would it not also be possible to compute frequency features without interpolation?

However, why are you using this cubic interpolation instead of a linear interpolation? I think with a cubic interpolation the signal can be overestimated or underestimated, especially around the RR values.

## Paul van Gent

May 14, 2018Hi Raphael,

Apologies for not replying sooner, somehow I missed your comment. I upsample because of the analysis: when using a FFT to extract frequency domain features, the width of the frequency bins depends on the sampling rate of the underlying signal. Here upsampling means we can extract a bit more information.

I use a cubic spline because I want to approximate the natural variation in the heart rate as you breathe. This variation happens gradually in a sinusoidal fashion, so using a cubic spline will approximate the underlying signal much better than a linear interpolation. The latter would leave the shape described by the sample datapoints intact and just upsample the number of points, in other words it would not add any information.

Your question does make sense though. Always be careful when interpolating to make sure it makes sense. Adding information that is useless will only create problems.

– Paul

## Heedong

June 29, 2018Hello Paul Your code has been a great help to the project we are doing right now.

I would like to run the code of the whole process up to the contents of Part 2 by deep running, but is there a decent model? We are thinking of RNN-LSTM now, but there is no specific plan.

## palkab

June 29, 2018Hi Heedong. Why? Using deep learning on such a problem is likely not the way to go.

An exact solution like what is done by the algorithm on the blog and on the github page is faster, requires only a fraction of an MB of ram and likely preciser.

Something like an LSTM will already require several hundred MB of ram, larger models will require even more, and likely a GPU.

Deep learning is powerful for sure, but not the answer to everything!

## Yang

June 29, 2018Hello Paul Your code has been a great help to the project we are doing right now.

I would like to run the code of the whole process up to the contents of Part 2 by deep running, but is there a decent model? We are thinking of RNN-LSTM now, but there is no specific plan.