Set up your computer

Anaconda

We recommend installing Python via the Anaconda Distribution. The course uses Conda, the package management system included with Anaconda. From the Conda documentation:

Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer.

After installing Anaconda, run python --version in a terminal (if you're on Windows, use the "Anaconda Prompt"). If the output contains "Python 3.8", you're ready for the next step.
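The same check can be done from inside Python, which is handy if you are unsure which interpreter your terminal picks up. The following is a minimal sketch; the function name is_expected_python is hypothetical, and the expected version (3.8) is taken from the text above.

```python
import sys


def is_expected_python(version_info=sys.version_info, expected=(3, 8)):
    """Return True if the major.minor version equals `expected`.

    `is_expected_python` is an illustrative helper, not part of the course code.
    """
    return tuple(version_info[:2]) == tuple(expected)


# Report the version of the interpreter that runs this script.
found = "{}.{}.{}".format(*sys.version_info[:3])
if is_expected_python():
    print(f"Python {found}: OK")
else:
    print(f"Python {found}: expected 3.8.x")
```

Save it as a file and run it with python, or paste it into the interactive interpreter.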

GitHub

The course material is hosted on the code-sharing platform GitHub (where you're currently reading this). If you're not already registered at GitHub, create a user account now: https://github.com/join. We recommend using GitHub for your own projects during the course. As a student, you can apply for the GitHub Student Developer Pack, which includes offers and benefits from GitHub partners: https://education.github.com/students.

Kaggle

Kaggle is an online community of "data scientists" that arranges data science competitions and hosts a large number of data sets. We'll make use of Kaggle in DAT158, both for course projects and as a source of data. Create an account here: https://www.kaggle.com.

Install and test the course environment

After installing Anaconda, run through the following steps (on Windows, use the "Anaconda Prompt").

Install Git

Check if Git is already installed:

git --version

If Git is not installed, you will receive an error message similar to the following:

-bash: git: command not found
'git' is not recognized as an internal or external command, operable program or batch file.

In this case, run the following command:

conda install git
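If you prefer a check that behaves the same on Windows, macOS and Linux, Python's standard library can look up Git on your PATH. This is a minimal sketch; tool_status is a hypothetical helper name, and the which parameter exists only so the lookup can be replaced when testing.

```python
import shutil


def tool_status(name, which=shutil.which):
    """Describe whether the executable `name` can be found on PATH.

    `tool_status` is an illustrative helper; `which` defaults to shutil.which.
    """
    path = which(name)
    if path is None:
        return f"{name}: not found on PATH"
    return f"{name}: found at {path}"


print(tool_status("git"))
```

If the output says Git was not found, install it with the conda command above and run the check again.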

Download the course repository:

git clone https://github.com/skaliy/dat158-ml-course21.git
cd dat158-ml-course21

Configure the Python environment

conda env update

Activate the environment

conda activate dat158
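To confirm that the activation worked, you can inspect the CONDA_DEFAULT_ENV environment variable, which Conda sets when an environment is activated. The sketch below assumes the environment was activated by name (if it is activated by path, the variable may hold the path instead); the function names are hypothetical.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the value of CONDA_DEFAULT_ENV, or None if no environment is active."""
    return environ.get("CONDA_DEFAULT_ENV")


def is_course_env_active(environ=os.environ, expected="dat158"):
    """Return True if the active Conda environment is named `expected`."""
    return active_conda_env(environ) == expected


if is_course_env_active():
    print("The dat158 environment is active.")
else:
    print("Active environment:", active_conda_env())
```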

Install a Jupyter kernel

python -m ipykernel install --user --name dat158 --display-name "dat158"
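To verify that the kernel was registered, you can list the kernel specs Jupyter knows about. The sketch below assumes the jupyter_client package is available in the active environment (it is installed alongside Jupyter); has_kernel and installed_kernel_specs are hypothetical helper names.

```python
def has_kernel(specs, name="dat158"):
    """Return True if a kernel called `name` appears in the `specs` mapping."""
    return name in specs


def installed_kernel_specs():
    """Return a dict mapping kernel names to their resource directories."""
    # Imported here so the helper above works even without jupyter_client.
    from jupyter_client.kernelspec import KernelSpecManager

    return KernelSpecManager().find_kernel_specs()


try:
    specs = installed_kernel_specs()
except ImportError:
    print("jupyter_client is not installed in this environment.")
else:
    if has_kernel(specs):
        print("Kernel dat158 is installed.")
    else:
        print("Kernel dat158 was not found.")
```

From a terminal, the command jupyter kernelspec list shows the same information.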

Test your installation

Go through the notebook notebooks/0.0-test.ipynb:

jupyter notebook

You can alternatively use JupyterLab:

jupyter lab

The following video gives an introduction to Jupyter Notebook:

Updating

The code and environment will be updated throughout the course. Run the following commands regularly:

  • Update code: git pull
  • Update environment:
    conda activate dat158
    conda env update
    

Troubleshooting

  • If you're using GNU/Linux or macOS and the conda activate dat158 command fails, run source ~/.bash_profile and try again.
  • If you're on a Mac and the conda env update command fails with a gcc error, install Xcode through the App Store and use it to install the command line tools.

DataCamp for Classrooms

This class is supported by DataCamp. There you will find several short courses with expert videos and hands-on-the-keyboard exercises that can be used as a supplement to DAT158. You get free access to all DataCamp content throughout the semester if you register through this link with your student email address (@stud.hvl.no).

Tip: The DataCamp course Introduction to Data Science in Python provides a beginner-friendly introduction to basic Python for data science.