Python Virtual Environments for Data Scientists in 3 Steps

Python virtual environments: you need to keep your projects separate

Python is one of the most popular choices for beginners who want to learn data science, thanks to its simple syntax, which reads close to natural language. Moreover, strong demand across many industries for professionals who do data science in Python, compared with other available languages, has made it even more popular in recent years.

When you start your coding journey, the learning curve can be intimidating, especially if you come from a non-computer-science background. Because of this, most introductory courses on Python for data science do not stress the importance of virtual environments.

As you move further into data science with Python, you will install different libraries to help with your data science tasks. Each project will have its own requirements and dependencies, so you will need to isolate your projects from one another with Python virtual environments.

For example, suppose you are using TensorFlow 1 for one project and now want to use TensorFlow 2 for another. An environment can hold only one version of a package at a time, so installing one version replaces the other. This can break the environment and leave one or both projects unable to run.

This article introduces Python virtual environments, one of the best practices for managing packages and their versions in data science projects. A virtual environment also lets you keep track of your packages in the same folder as your project, instead of installing every library into one common "dump" environment that later causes problems for your projects.

Step 1 (Windows): Install pip. Python installations on Windows generally come with pip included. If pip is not installed, you will need to install it first. The command pip install virtualenv installs virtualenv, a third-party library for creating Python virtual environments. Note that the venv module used in the next step ships with Python 3.3 and later, so virtualenv is optional for these steps.
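
As a rough sketch, checking for pip and installing a package can be scripted with the standard library. It assumes pip is called through the interpreter itself (python -m pip), which avoids accidentally using a pip that belongs to a different Python. The helper names pip_version and pip_install are hypothetical, made up for this example.

```python
import subprocess
import sys
from typing import Optional


def pip_version(python: str = sys.executable) -> Optional[str]:
    """Return the output of `python -m pip --version`, or None if pip is missing."""
    result = subprocess.run(
        [python, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # pip is not installed for this interpreter
    return result.stdout.strip()


def pip_install(package: str, python: str = sys.executable) -> None:
    """Equivalent to running `python -m pip install <package>`; raises if pip fails."""
    subprocess.run([python, "-m", "pip", "install", package], check=True)
```

If pip_version() returns None, the standard-library command python -m ensurepip --upgrade can bootstrap pip. After that, pip_install("virtualenv") does the same thing as the pip install virtualenv command above.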

Step 2: Create a virtual environment. As discussed earlier, we want to create the virtual environment in the folder where our data science project lives. For this, we will create a demo directory, ‘demo_dir’, and navigate into it. The command python3 -m venv demo_env creates a virtual environment named ‘demo_env’ (on Windows, the interpreter is often invoked as python or py rather than python3).

Create a directory and virtual environment
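
The same directory-plus-environment step can also be scripted with the standard-library venv module, whose EnvBuilder class is what python -m venv uses under the hood. This is a minimal sketch: create_project_env is a hypothetical helper name, and ‘demo_dir’ and ‘demo_env’ are simply the example names from this step.

```python
import venv
from pathlib import Path


def create_project_env(project_dir: str, env_name: str = "demo_env",
                       with_pip: bool = True) -> Path:
    """Create `project_dir` if needed, then a virtual environment inside it.

    Roughly equivalent to: mkdir demo_dir, cd demo_dir, python3 -m venv demo_env
    """
    project = Path(project_dir)
    project.mkdir(parents=True, exist_ok=True)  # like `mkdir demo_dir`
    env_dir = project / env_name
    # with_pip=True also installs pip into the new environment, as the CLI does
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return env_dir
```

For example, create_project_env("demo_dir") leaves a demo_dir/demo_env folder containing the environment's own interpreter and a pyvenv.cfg file.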

Step 3: Activate the virtual environment. With the environment created, we need to activate it so that project-specific libraries are installed in isolation from the main Python installation and from other projects. In the Windows Command Prompt, the command demo_env\Scripts\activate.bat activates the environment (in PowerShell, use demo_env\Scripts\Activate.ps1). From now on, any library we install goes into this environment only.

Activate the virtual environment
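
Activation mainly puts the environment's own interpreter first on the PATH. The sketch below shows where venv places that interpreter and the activation script, assuming the standard layout (Scripts on Windows, bin on Linux and Mac); env_executables and install_into_env are hypothetical names. Calling the environment's interpreter directly installs a library into that environment even without activating it.

```python
import os
import subprocess
from pathlib import Path


def env_executables(env_dir: str) -> dict:
    """Return the paths of the interpreter and activation script inside a venv."""
    env = Path(env_dir)
    if os.name == "nt":
        # Windows layout: demo_env\Scripts\python.exe and demo_env\Scripts\activate.bat
        scripts = env / "Scripts"
        return {"python": scripts / "python.exe", "activate": scripts / "activate.bat"}
    # Linux/Mac layout: demo_env/bin/python and demo_env/bin/activate
    bin_dir = env / "bin"
    return {"python": bin_dir / "python", "activate": bin_dir / "activate"}


def install_into_env(env_dir: str, package: str) -> None:
    """Install a package into the given environment without activating it."""
    python = env_executables(env_dir)["python"]
    subprocess.run([str(python), "-m", "pip", "install", package], check=True)
```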

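Step 1 (Linux and Mac): Install the virtual environment package. On Debian and Ubuntu, the command sudo apt-get install python3-venv installs it. apt-get is not available on macOS; there, the Python 3 installers from python.org and Homebrew already include venv, so no extra package is needed.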

Install virtual environment library

Step 2: Create a virtual environment in the folder where your data science project is located. For this, let’s create a ‘demo_dir’ folder and navigate into it. The command python3 -m venv demo_env creates the virtual environment.

Create a directory and virtual environment

Step 3: Activate the virtual environment. With the environment created, we need to activate it so that project-specific libraries are installed in isolation from the main Python installation and from other projects. On Linux and Mac, the command source demo_env/bin/activate activates the environment. Now we can install any library into this environment in isolation.

Activate the virtual environment
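
To confirm that activation worked, Python itself can report whether it is running inside a venv: in a virtual environment, sys.prefix points at the environment folder while sys.base_prefix still points at the base installation. This is a minimal sketch; in_virtualenv and env_reports_isolated are hypothetical helper names.

```python
import subprocess
import sys


def in_virtualenv() -> bool:
    """True when the current interpreter runs from a venv."""
    return sys.prefix != sys.base_prefix


def env_reports_isolated(env_python: str) -> bool:
    """Ask another interpreter (e.g. demo_env/bin/python) whether it runs in a venv."""
    code = "import sys; print(sys.prefix != sys.base_prefix)"
    out = subprocess.run([env_python, "-c", code],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip() == "True"
```

After source demo_env/bin/activate, running python -c "import sys; print(sys.prefix)" in the same shell should print the path of the demo_env folder.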

Finally, when we are done working on the project, we can deactivate the virtual environment on all of the above operating systems with the command ‘deactivate’. This returns the shell to the system-wide Python installation.

Deactivate on all operating systems

To summarize, virtual environments are a best practice that can save a data scientist a lot of time, especially when juggling data science projects with different package dependencies. With Python’s built-in ‘venv’ module, the environment can be created in the same folder as the project, which keeps it isolated and makes it easy to create, maintain, and dispose of when the project is over.
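
Disposing of an environment amounts to deactivating it and deleting its folder. As a cautious sketch (remove_env is a hypothetical helper), the deletion can first check for pyvenv.cfg, the configuration file venv writes at the root of every environment, so that a mistyped path does not wipe out the wrong folder.

```python
import shutil
from pathlib import Path


def remove_env(env_dir: str) -> None:
    """Delete a virtual environment folder once the project is over."""
    env = Path(env_dir)
    # Refuse to delete anything that venv did not create
    if not (env / "pyvenv.cfg").is_file():
        raise ValueError(f"{env} does not look like a virtual environment")
    shutil.rmtree(env)
```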

References

virtualenv package on PyPI: https://pypi.org/project/virtualenv/

YouTube video summarizing this blog: https://www.youtube.com/watch?v=Kg1Yvry_Ydk

Installing pip on Ubuntu 20.04: https://linuxize.com/post/how-to-install-pip-on-ubuntu-20.04/