Google Colab Notebook Tutorial | How to import a dataset

Introduction

In this article, we will look at Google Colaboratory (Colab) notebooks in detail. The tutorial covers the topics listed in the table of contents below.

  1. What Google colab offers
  2. How to set up Google Colab
  3. Enabling GPU and some basic functions
  4. Importing files/datasets to Google colab notebook

If you have already used Google Colab, feel free to skip the parts you are familiar with and jump straight to the part you are interested in.

What Google Colab Offers

Google Colab has become popular mainly because it offers a free GPU for training, for up to 12 hours of continuous use, and after that you can reconnect for a new session. This matters because most people do not have a GPU in their workstation, and those who do often have one that is not made by Nvidia (most deep learning frameworks rely on Nvidia's CUDA for GPU support). The GPU that Colab currently offers is the Nvidia Tesla T4.

nvidia_t4_specs

How to set up Google Colab

  1. First of all, you need a Google (Gmail) account.
  2. Go to your Google Drive.
  3. Once you are there, right-click, click “More”, then click “Connect more apps”.
Click-on-more
Go-to-connect-more-apps

4. Type “colaboratory” in the search bar and then click Install.

search-bar-picture
install-guide

5. After installation close this tab.

acknowledge-popup

6. After that, right-click, go to “More” and open Colaboratory. It will create a new Colab notebook for you.

setup-colab-notebook

7. To rename your notebook:

rename

8. Buttons to add more code cells, to add more text cells and to delete cells are as follows. It’s as simple as that.

add-new-blocks

Enabling GPU and some basic functions

To check which devices are connected to your notebook currently, run the commands mentioned below:

from tensorflow.python.client import device_lib

device_lib.list_local_devices()

Currently, only a CPU device is listed, along with its memory limit.

current-cpu

To use Google’s free GPU, click “Edit”, then “Notebook settings”, and select GPU as the hardware accelerator. Save the setting. It will take a few seconds (5-6) to connect.

edit-cpu1
add-gpu

Run the above code again. It will now show that a GPU device is connected as well. Its name is Tesla T4.

cureent-cpu-gpu
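As a cross-check that does not depend on TensorFlow, the sketch below asks the nvidia-smi tool which GPUs are visible. It assumes nvidia-smi is on the PATH whenever an NVIDIA driver is present; the function name list_nvidia_gpus is only illustrative and is not part of Colab.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return one text line per visible NVIDIA GPU, or [] if none can be queried."""
    # nvidia-smi is normally only available when an NVIDIA driver is installed.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
print(gpus if gpus else "No NVIDIA GPU visible; this looks like a CPU-only runtime.")
```

On a CPU-only runtime the list is empty. After enabling the GPU, it should contain one entry per GPU with its name and total memory.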

To install a library or framework, write the command !pip install followed by its name. Let’s try TensorFlow.

command

This command might not always work. If it doesn’t, search Google for “how to install ‘library name’ on Google Colaboratory notebooks”. The first few results will probably give you a single-line command that you can run in your notebook to install that library.
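Before installing anything, it can help to check whether the library is already available, since Colab ships with many packages preinstalled. The following is a minimal sketch of that idea; the helper names is_installed and ensure_installed are made up for this example, and the check only looks at top-level module names.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


def ensure_installed(package_name, module_name=None):
    """Install package_name with pip only if its module cannot be found yet."""
    module_name = module_name or package_name
    if is_installed(module_name):
        return False  # already available, nothing was installed
    # Same effect as running "!pip install <package_name>" in a notebook cell.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
    return True


# json is part of the standard library, so no installation is triggered here.
print(is_installed("json"), ensure_installed("json"))
```

The module_name argument is there because the pip package name and the import name sometimes differ, as with scikit-learn, which is imported as sklearn.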

To clone a GitHub repository into your Colab session, go to that repository and copy its link, then come back to your notebook and run this command with your link: !git clone "your repo link". The repository will be cloned into your current folder.

github-link
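The same clone step can be sketched in Python, which is handy if you want the name of the folder that git creates. The URL below is a hypothetical placeholder, and the helper names are assumptions made for this example. Nothing is cloned when the block runs, because clone_repo is only defined and never called.

```python
import subprocess
from pathlib import PurePosixPath
from urllib.parse import urlparse


def repo_folder_name(repo_url):
    """Folder that git clone creates by default: last URL segment minus '.git'."""
    name = PurePosixPath(urlparse(repo_url).path).name
    return name[:-4] if name.endswith(".git") else name


def clone_repo(repo_url):
    """Run git clone in the current directory and return the new folder's name."""
    subprocess.run(["git", "clone", repo_url], check=True)
    return repo_folder_name(repo_url)


# Hypothetical URL, used only to show which folder name to expect.
print(repo_folder_name("https://github.com/example-user/example-repo.git"))
```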

To check the contents of your current folder, use the command “!ls”. As you can see, the repository has been cloned into our current directory.

ls-command

Let’s now learn how to navigate the folders. The ls command shows you the contents of your current directory. To move up one directory, run the command cd .. as shown here.

cd-command

To see the contents of your current directory, use the same command, “!ls”. This is what you will get.

ls-command

Let’s go to the content folder with the command cd content and see what is inside it. It contains drive, sample_data and the repository you cloned from GitHub just now.

ls-command

Go to the sample data folder with the command cd sample_data; this is where we will put some datasets. First go into the folder and check its contents. If you want to remove something, type rm followed by the name of the file. To check that the file has been removed, run the ls command again.

into-sample-data

To see the contents of a file, use the command !cat "name of file". It displays what is inside that file.

display-contents
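The shell commands above (ls, cd, rm and cat) all have Python equivalents, which is useful when you want to work with the results in code. The sketch below runs them inside a temporary folder so it does not touch your real files; the folder and file names are invented for the example.

```python
import os
import tempfile
from pathlib import Path

start_dir = os.getcwd()

with tempfile.TemporaryDirectory() as workspace:
    # Build a tiny folder tree to explore (hypothetical names).
    data_dir = Path(workspace) / "demo_data"
    data_dir.mkdir()
    (data_dir / "notes.txt").write_text("hello from colab\n")
    (data_dir / "old.csv").write_text("a,b\n1,2\n")

    os.chdir(data_dir)                         # like: cd demo_data
    before = sorted(os.listdir("."))           # like: !ls
    os.remove("old.csv")                       # like: !rm old.csv
    after = sorted(os.listdir("."))            # !ls again to confirm the removal
    contents = Path("notes.txt").read_text()   # like: !cat notes.txt
    os.chdir("..")                             # like: cd ..
    parent_listing = sorted(os.listdir("."))

    os.chdir(start_dir)  # leave the temporary folder before it is deleted

print(before, after, contents.strip(), parent_listing)
```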

Importing files or datasets into Google colab notebook

Let us now learn how to import a dataset into our Google Colaboratory notebook. For that, go to the Google Apps website with this link.

Copy this line of code and also import the pandas library.

importing-files

Run this cell.

importing-pandas

Go back, copy this line of code, paste it in one cell and run it.

uploaded = files.upload() 

Choose the files and upload them. In this case it is an Expenses.csv file. Then try to read that file with the command shown.

uploaded-file-contents
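In Colab, files.upload() (imported with from google.colab import files) returns a dictionary that maps each uploaded file name to its raw bytes. The sketch below imitates that dictionary so it can run outside Colab. The rows in it are invented, since the real contents of Expenses.csv are not shown here, and the standard csv module stands in for pandas.

```python
import csv
import io

# Stand-in for "uploaded = files.upload()"; the rows below are invented.
uploaded = {"Expenses.csv": b"item,amount\nrent,1200\nfood,300\n"}

for filename, raw_bytes in uploaded.items():
    print(f"{filename}: {len(raw_bytes)} bytes")

# Decode the bytes and parse them as CSV, one dictionary per row.
text = uploaded["Expenses.csv"].decode("utf-8")
rows = list(csv.DictReader(io.StringIO(text)))
total = sum(int(row["amount"]) for row in rows)
print(rows[0], total)
```

With pandas imported as pd, the equivalent in the notebook would be pd.read_csv(io.BytesIO(uploaded["Expenses.csv"])), which reads the uploaded bytes straight into a DataFrame.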

The easiest way to import an online dataset into your Google Colaboratory notebook is to write this command and add the link to it.

!wget "the link to import dataset" 

Let’s add this link, for example, and run the cell. The name of the dataset, as you can see, is Titanic.csv.

online-dataset

Now try to read it with the following command.

read-online-dataset

The dataset has been successfully imported into our Google Colaboratory notebook.
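If you prefer to stay in Python rather than call wget, the standard library can perform the same download. This is a minimal sketch under a few assumptions: the helper name download is made up, a local file served through a file:// URL stands in for the online dataset so the example runs offline, and the sample row is invented.

```python
import os
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download(url, dest_dir="."):
    """Save the file at url into dest_dir under its own name, as wget does."""
    filename = os.path.basename(urlparse(url).path)
    destination = os.path.join(dest_dir, filename)
    urllib.request.urlretrieve(url, destination)
    return destination


with tempfile.TemporaryDirectory() as tmp:
    # A local file behind a file:// URL stands in for the online dataset.
    source = Path(tmp) / "source" / "Titanic.csv"
    source.parent.mkdir()
    source.write_text("name,survived\nExample Passenger,1\n")

    saved_path = download(source.as_uri(), dest_dir=tmp)
    saved_name = os.path.basename(saved_path)
    saved_text = Path(saved_path).read_text()

print(saved_name)
```

In a notebook you would pass the real http or https link to download and then read the saved file with pandas as before.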

Before we continue, I would like to discuss a common error that arises when you’re using colab.

common-error

It means that all the GPUs that Google offers are currently busy. You can solve this problem simply by trying again a few minutes later.

To see the parameters that a function from a library such as TensorFlow, Keras or scikit-learn takes, this is how you can do it.

see-parameters

The documentation for that function will be shown. It includes the parameters that the function takes and what each one means, and in some cases an example of how the function can be used.

function-parameters
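The same information is also available from plain Python through the built-in help() function and the inspect module. The sketch below uses json.dumps as a stand-in for whichever library function you are curious about; it is not necessarily the method shown in the screenshots.

```python
import inspect
import json

# json.dumps stands in for any library function you want to inspect.
signature = inspect.signature(json.dumps)
parameter_names = list(signature.parameters)
print(parameter_names)

# First line of the docstring; help(json.dumps) prints the full text.
summary = inspect.getdoc(json.dumps).splitlines()[0]
print(summary)
```

In a Colab or Jupyter cell, typing the function name followed by a question mark (for example json.dumps?) opens the same documentation in a side panel.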

That’s all folks. Have a good time coding with Google Colab.

