You should pick the right tool for the job.
The specific predictive modeling problem that you are working on should dictate the specific programming language, libraries and even machine learning algorithms to use.
But, what if you are just getting started and looking for a platform to learn and practice machine learning?
In this post, you will discover that Python is the growing platform for applied machine learning, likely to outpace and topple R in terms of adoption and perhaps capability.
After reading this post you will know:
- That search volume for Python machine learning is growing fast and has already outpaced R.
- That the percentage of Python machine learning jobs is growing and has already outpaced R.
- That Python is used by nearly 50% of polled practitioners and growing.
Kick-start your project with my new book Machine Learning Mastery With Python, including step-by-step tutorials and the Python source code files for all examples.
Let’s get started.
Python for Machine Learning is Growing
Let’s look at 3 areas where we can see Python for machine learning growing:
- Search Volume.
- Job Ads.
- Professional Tool Usage.
Python Machine Learning Search Volume is Growing
Search volume is probably indicative of students, engineers and other practitioners searching for information to get started or go deeper into the topic.
Google provides a tool called Google Trends that gives insight into the search volume of keywords over time.
We can investigate the growth of “Python machine learning” from 2004 to 2016 (the last 12 years). Below is a graph of the change in search volume for this period:
We can see that the upward trend started in perhaps 2012, with a steeper rise starting in 2015, likely boosted by Python deep learning tools like TensorFlow.
We can also contrast this with the search volume for R machine learning and see that, from about the middle of 2015, Python machine learning has been beating out R.
Blue denotes “Python Machine Learning” and red denotes “R Machine Learning”.
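If you export two such trend series yourself, the "beating out" point can be found programmatically. The sketch below is only an illustration: the function name `first_sustained_lead` and the sample values are hypothetical placeholders, not real Google Trends data. It returns the first period from which one series stays above the other through to the end of the data.

```python
from typing import Optional, Sequence


def first_sustained_lead(
    periods: Sequence[str], a: Sequence[float], b: Sequence[float]
) -> Optional[str]:
    """Return the first period from which series `a` stays above series `b`
    until the end of the data, or None if `a` does not finish in the lead."""
    if not (len(periods) == len(a) == len(b)):
        raise ValueError("periods, a and b must have the same length")
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x > y:
            if start is None:
                start = i  # candidate start of a sustained lead
        else:
            start = None  # lead was broken, so reset the candidate
    return None if start is None else periods[start]


# Hypothetical placeholder values, NOT real search-volume data.
periods = ["t1", "t2", "t3", "t4", "t5", "t6"]
python_interest = [40, 44, 47, 52, 58, 66]
r_interest = [45, 46, 49, 51, 53, 55]

print(first_sustained_lead(periods, python_interest, r_interest))
```

Requiring the lead to hold until the end of the series avoids reporting a brief, noisy crossover as the turning point.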
Python Machine Learning Jobs are Growing
Indeed is a job search website and, like Google Trends, it shows the volume of job ads that match keywords over time.
We can investigate the demand for “python machine learning jobs” for the last 4 years.
The graph shows time along the x-axis and the percentage of job postings that match the keyword along the y-axis. Growth is almost linear from 2012 to 2015, with a hockey-stick-like increase in 2016.
We can also compare the job ads for Python and R.
Blue shows “Python machine learning” and orange shows “R machine learning”.
We see a more pronounced story compared to Google search volume. The percentage of job ads on indeed.com shows that demand for Python machine learning skills has dominated demand for R machine learning skills since at least 2012, with the gap only widening in recent years.
KDnuggets Survey Results: More People Using Python for Machine Learning
We can get some insight into the tools used by machine learning practitioners by reviewing the results of the KDnuggets Software Poll.
Here’s a quote from the 2016 results:
R remains the leading tool, with 49% share, but Python grows faster and almost catches up to R.
— Gregory Piatetsky
The poll tracks the tools used by machine learning and data science professionals. A participant can select more than one tool, which I would expect to be the norm.
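Because each participant can select several tools, the reported shares are percentages of respondents, not slices of a pie, so they can add up to more than 100%. Here is a minimal sketch of that calculation. The function `tool_shares` and the four responses are made up for illustration and are not taken from the KDnuggets data.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def tool_shares(responses: List[Iterable[str]]) -> Dict[str, float]:
    """Percentage of respondents who selected each tool in a multi-select poll."""
    if not responses:
        return {}
    # Count each tool at most once per respondent.
    counts = Counter(tool for selected in responses for tool in set(selected))
    total = len(responses)
    return {tool: 100.0 * count / total for tool, count in counts.items()}


# Hypothetical responses for illustration only, NOT real poll data.
responses: List[Set[str]] = [
    {"R", "Python"},
    {"Python", "scikit-learn"},
    {"R"},
    {"R", "Python", "SQL"},
]

shares = tool_shares(responses)
for tool, share in sorted(shares.items()):
    print(f"{tool}: {share:.1f}%")
print(f"Sum of shares: {sum(shares.values()):.1f}%")
```

This is why R and Python can both sit near 50% in the same poll: many respondents use both.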
Here is the growth of Python for machine learning over the last 4 years:
Below is a plot of this growth.
We can see a near-linear growth trend, with Python used by just under 50% of professionals in 2016.
It is important to note that the number of participants in the poll has also grown from many hundreds to thousands in recent years and participants are self-selected.
What is interesting is that scikit-learn also appears separately on the poll, accounting for 17.2%.
For more information see: KDnuggets 2016 Software Poll Results.
O’Reilly Survey Results: More People Using Python for Machine Learning
O’Reilly performs an annual Data Science Salary Survey.
They collect a lot of data from professional data scientists and machine learning practitioners and present the results in very nice reports. For example, here is the 2016 Data Science Salary Survey report [View the PDF Report].
As with the KDnuggets poll, the survey tracks the tools that practitioners use.
Quoting from the key findings from the 2016 report, we can see that Python plays an important role in data science salary.
Python and Spark are among the tools that contribute most to salary.
— Page 1, 2016 Data Science Salary Survey report.
Reviewing the survey results, we can see a similar growth trend in the use of the Python ecosystem for machine learning over the last 4 years.
2014 42% (interpreted from graph)
Again, we can plot this growth.
It’s interesting that the 2016 results are very similar to those from the KDnuggets poll.
You can find quotes to support any position on the Internet.
Take quotes with a grain of salt. Nevertheless, quotes can be insightful, raising and supporting points.
Let’s first take a look at some cherry-picked quotes from news sites and blogs about the growth of Python for machine learning.
Python has emerged over the past few years as a leader in data science programming. While there are still plenty of folks using R, SPSS, Julia or several other popular languages, Python’s growing popularity in the field is evident in the growth of its data science libraries.
— Katharine Jarmul, Introduction To Data Science: How To “big Data” With Python, Dataconomy
Our research shows that Python is one of the most popular languages for data science analyses, in use by more than one-third (36%) of organizations.
— Dave Menninger, Big Data Grows Up at Strata+Hadoop World 2016, SmartDataCollective
… the last few years have seen a proliferation of cutting-edge, commercially usable machine learning frameworks, including the highly successful scikit-learn Python library and well-publicized releases of libraries like Tensorflow by Google and CNTK by Microsoft Research.
— Josh Schwartz, Machine Learning Is No Longer Just for Experts, Harvard Business Review
Note that scikit-learn, TensorFlow and CNTK are all machine learning libraries that can be used from Python.
Python is versatile, simple, easier to learn, and powerful because of its usefulness in a variety of contexts, some of which have nothing to do with data science. R is a specialized environment that looks to optimize for data analysis, but which is harder to learn. You’ll get paid more if you stick it out with R rather than working with Python
— Roger Huang, Data science sexiness: Your guide to Python and R, and which one is best, TheNextWeb
Below are some cherry-picked quotes regarding the use of Python for machine learning taken from Quora questions.
Python is a popular scientific language and a rising star for machine learning. I’d be surprised if it can take the data analysis mantle from R, but matrix handling in NumPy may challenge MATLAB and communication tools like IPython are very attractive and a step into the future of reproducibility. I think the SciPy stack for machine learning and data analysis can be used for one-off projects (like papers), and frameworks like scikit-learn may be mature enough to be used in production systems.
— Aswath Muralidharan, Production Engineer. In response to the Quora question “What are the top 5 programming languages for Machine Learning?”
I’d also recommend Python as it is a fantastic all-round programming language that is incredibly useful for drafting code fragments and exploring data (with the IPython shell), great for documenting steps and results in the analytical process chain (IPython Notebook), has a huge selection of libraries for almost any machine learning objective and can even be optimized for production system implementation. In my opinion there are languages that are superior to Python in any of these categories – but none of them offers this versatility.
— Benedikt Koehler, Founder & CEO DataLion. In response to the Quora question “What is the best language to use while learning machine learning for the first time?”
[…] It is because the language can make a productive environment for people that just want to get something done quickly. It is fairly easy to wrap C libraries, and C++ is doable. This gives Python access to a wide range of existing code. Also the language doesn’t get in the way when it comes time to implement things. In many ways it makes coding “fun again” for a wide range of tasks.
— Shawn Masters, VP of Engineering. In response to the Quora question “Will Python become as popular as Java, given that Python is used in Machine Learning?”
In my opinion, Python truly dominates this category. A quick search of almost any artificial intelligence, machine learning, NLP, or data analytics topic, plus ‘Python’, will return examples of useful, actively maintained libraries.
— Ryan Hill, programmer. In response to the Quora question “Which programming language has the best repository of machine learning libraries?”
In this post, you discovered that Python is the growing platform for applied machine learning.
Specifically, you learned that:
- The number of people interested in Python for machine learning is larger than for R and is growing.
- The number of jobs posted for Python machine learning skills is larger than for R and is growing.
- The number of polled data science professionals that use Python is growing year over year.
Has this influenced your decision to get started with the Python ecosystem for machine learning?
Share your thoughts in the comments below.