How fast is C++ compared to Python?

anjay Fevaha uhuy
Dec 17, 2020

There are millions of reasons to love Python (especially for data scientists). But how is Python different from lower-level, compiled languages like C or C++? Many data scientists and Python users have asked themselves this question, or will one day. Python and languages like C++ differ in many ways. In this article, I am going to show you how much faster C++ is than Python, using a super simple example.
(Image: photo by author.)
To show the difference, I chose a simple, practical task rather than a contrived one: generating all possible DNA k-mers for a fixed value of "k". If you are not familiar with DNA k-mers, I explain them in plain language in the next section. I chose this example because many genomic data processing and analysis tasks (e.g., k-mer generation) are computationally intensive. That is one reason why many data scientists in bioinformatics are interested in C++ in addition to Python.
(Image: photo by author.)
A Short Introduction to DNA K-mers
DNA is a long chain of units called nucleotides. There are 4 types of nucleotides in DNA, represented by the letters A, C, G, and T. The human genome (more precisely, that of Homo sapiens) contains about 3 billion nucleotide pairs. For example, a small portion of human DNA could look like this: ACTAGGGATCATGAAGATAATGTTGGTGTTTGTATGGTTTTCAGACAATT
In this example, any 4 consecutive nucleotides (i.e., letters) taken from this string form a k-mer of length 4, which we call a 4-mer. Here are some of the 4-mers derived from the example:
ACTA, CTAG, TAGG, AGGG, GGGA, etc.
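The sliding-window idea above can be sketched in a few lines of Python. This is a minimal illustration, not code from the article, and the function name `kmers` is hypothetical. A string of length n yields n - k + 1 overlapping k-mers.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Return every k-mer (substring of length k) of `sequence`, in order.

    Hypothetical helper for illustration; not from the original article.
    """
    # A sequence of length n has n - k + 1 overlapping windows of length k.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


dna = "ACTAGGGATCATGAAGATAATGTTGGTGTTTGTATGGTTTTCAGACAATT"
four_mers = kmers(dna, 4)
print(four_mers[:5])  # ['ACTA', 'CTAG', 'TAGG', 'AGGG', 'GGGA']
```

Note that this extracts the k-mers that actually occur in one sequence; the challenge below is different, since it enumerates every possible k-mer.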
The Challenge
For this article, let's generate all possible 13-mers. Mathematically, this is a permutation-with-repetition problem: each of the 13 positions can hold any of the 4 nucleotides, so there are 4¹³ (= 67,108,864) possible 13-mers. I use a simple algorithm to generate them in both C++ and Python. Let's take a look at the solutions and compare them.
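The count follows from the multiplication principle: each of the k positions independently takes one of 4 nucleotides, so the number of possible k-mers is 4 raised to the power k. For k = 13:

```latex
N(k) = 4^{k}, \qquad
N(13) = 4^{13} = \left(2^{2}\right)^{13} = 2^{26} = 67{,}108{,}864
```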
Comparing Solutions
To keep the comparison simple and fair, I implemented exactly the same algorithm in both languages. Both programs are intentionally simple and similar. I avoided complex data structures and third-party packages or libraries. The first program is written in Python.
If you run the Python code, it takes 61.23 seconds to generate all 67 million 13-mers. For a fair comparison, I commented out the lines that display the k-mers (lines 25 and 37). If you would like to display the k-mers as they are generated, uncomment those two lines. Note: displaying all of them takes a long time, so press Ctrl+C to stop the program if needed.
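The Python listing referred to above is not shown here. As a rough stand-in, the sketch below enumerates all 4^k k-mers with an odometer-style counter (each position cycles through A, C, G, T and carries into the position to its left) and times the loop with `time.perf_counter`. It is an assumption, not necessarily the author's algorithm; `iter_kmers` and `time_generation` are hypothetical names, and printing is off by default, mirroring the commented-out display lines.

```python
import time

NUCLEOTIDES = "ACGT"


def iter_kmers(k: int):
    """Yield all 4**k DNA k-mers in lexicographic (A < C < G < T) order.

    Hypothetical sketch, not the article's original listing: `digits` holds
    one index (0-3) per position and is incremented like a base-4 odometer.
    """
    digits = [0] * k
    while True:
        yield "".join(NUCLEOTIDES[d] for d in digits)
        # Increment the rightmost position, carrying leftwards past any 3s.
        pos = k - 1
        while pos >= 0 and digits[pos] == 3:
            digits[pos] = 0
            pos -= 1
        if pos < 0:  # every position wrapped around: enumeration is complete
            return
        digits[pos] += 1


def time_generation(k: int, show: bool = False) -> tuple[int, float]:
    """Generate every k-mer once; return (count, elapsed seconds)."""
    start = time.perf_counter()
    count = 0
    for kmer in iter_kmers(k):
        count += 1
        if show:  # printing dominates the runtime, so it is off by default
            print(kmer)
    return count, time.perf_counter() - start
```

For example, `count, seconds = time_generation(13)` should report 67,108,864 k-mers; the elapsed time depends on your machine and will not necessarily match the figure above. The same enumeration could be written as `itertools.product("ACGT", repeat=k)`; the explicit counter is used here because it maps almost line for line onto a plain C++ loop.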
Now, let’s take a look at the same algorithm in C++.
Once compiled, the C++ program takes about 2.42 seconds to generate all 67 million 13-mers. In other words, Python takes about 25 times as long as C++ to run the same algorithm. I repeated the experiment for 14-mers and 15-mers (to do so, change line 12 in the Python code and line 22 in the C++ code). Table 1 summarizes the results.
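The speed-up quoted above is the ratio of the two 13-mer runtimes. For the longer k-mers, each extra position multiplies the number of k-mers by 4, so 15-mers are 16 times as numerous as 13-mers; assuming the cost per k-mer stays roughly constant, each language should need on the order of 16 times as long:

```latex
\frac{T_{\text{Python}}}{T_{\text{C++}}} = \frac{61.23\ \text{s}}{2.42\ \text{s}} \approx 25.3,
\qquad
\frac{4^{15}}{4^{13}} = 4^{2} = 16
```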
Table 1. Comparing Python and C++ runtimes for generating 13-, 14-, and 15-mers.
Clearly, C++ is much faster than Python. This is no surprise to most programmers and data scientists, but the example shows how large the difference is. Remember that this example did not use CPU or GPU parallelization, which should be applied to this type of problem (an embarrassingly parallel problem). The example also did not use much memory: the k-mers were generated but not stored. If we stored the results, memory management could make the difference between C++ and Python runtimes even more significant.
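Because every k-mer can be generated independently, the work splits cleanly by prefix: all k-mers starting with AA, all starting with AC, and so on. The sketch below is one way to do this in Python with the standard library's `concurrent.futures.ProcessPoolExecutor`. It assumes a prefix length of 2 (16 chunks) and 4 worker processes, and `count_with_prefix` and `count_kmers_parallel` are hypothetical names. It uses processes rather than threads because, in the standard CPython build, the global interpreter lock prevents threads from running pure-Python code on several cores at once.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

NUCLEOTIDES = "ACGT"


def count_with_prefix(prefix: str, k: int) -> int:
    """Generate every k-mer that starts with `prefix` and return how many.

    Hypothetical worker for illustration; requires len(prefix) <= k.
    """
    count = 0
    for suffix in product(NUCLEOTIDES, repeat=k - len(prefix)):
        kmer = prefix + "".join(suffix)  # a real task would process kmer here
        count += 1
    return count


def count_kmers_parallel(k: int, prefix_len: int = 2, workers: int = 4) -> int:
    """Split the 4**k k-mers into 4**prefix_len independent chunks by prefix
    and generate each chunk in a separate worker process."""
    prefixes = ["".join(p) for p in product(NUCLEOTIDES, repeat=prefix_len)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Executor.map accepts several iterables: (prefix, k) pairs here.
        return sum(pool.map(count_with_prefix, prefixes, [k] * len(prefixes)))
```

Call `count_kmers_parallel(13)` from inside an `if __name__ == "__main__":` block: on Windows and macOS, worker processes are started with `spawn` and re-import the main module, and the guard keeps them from re-running the code that creates the pool. Any speed-up depends on the number of available cores and is not measured here.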
This example, and many similar challenges, suggest that even data scientists should know a language like C++ if they work with large amounts of data or with processes that grow exponentially.
