1  Introduction

Before starting your bilingual journey through data science, I want to highlight some useful resources that can improve your overall learning experience.

1.1 Choose an IDE

Doing data science from the command line is fine, but you can save time and greatly improve your workflow by using an integrated development environment (IDE). Within a single application you can create scripts, run code interactively, make plots, debug code, and more.

In my opinion, the RStudio IDE is currently the best option out there for R.

For Python, good choices are Spyder, PyCharm and Jupyter.

1.2 Learn from others and improve yourself

1.2.1 Kaggle

Kaggle is an online platform hosting datasets and competitions aimed at solving real-life problems. This is one of the best places to get your hands dirty in data science!

Throughout the book, we will use several Kaggle datasets. You can install the Kaggle API command-line interface (CLI) and run the following command in bash to download a dataset:

kaggle datasets download -d kamilpytlak/personal-key-indicators-of-heart-disease -p ./data --unzip

1.2.2 TidyTuesday

As described on its main page, TidyTuesday is a weekly social data project aimed at applying your R skills, getting feedback, exploring others' work and connecting with the greater R community. Every week (yes, on Tuesday), a new dataset is posted on the GitHub page and people are encouraged to produce useful insights from it, usually through figures. Once you're done with your visualization, you can post it on Twitter using the hashtags #TidyTuesday and #RStats. It is also recommended to share your code and to add alt text to your visualizations.

1.2.3 Big Books

There will be lots of links to external resources throughout the book. Still, I suggest taking a look at these two meta-books, which can help point you in the right direction in your learning path:

1.3 Integrate R and Python

If you want to integrate Python code into your R projects, I suggest taking a look at the R package reticulate.

For example, it is possible to import Python libraries and use their functions to create R objects. In the following code chunk, you can see how we import NumPy and pandas (using R syntax), create two NumPy arrays (from R vectors) and use them to build a pandas DataFrame (the Python analogue of an R data.frame):

library(reticulate)

np <- import("numpy")
pd <- import("pandas")

a1 <- np$array(c(1, 2, 3, 4))
a2 <- np$array(5:8)
pd$DataFrame(list(a1 = a1, a2 = a2), index = letters[1:4])
  a1 a2
a  1  5
b  2  6
c  3  7
d  4  8
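For comparison, here is a minimal sketch of the same DataFrame built natively in Python, so you can see both sides of the bilingual picture (the column names, index and values match the reticulate chunk above):

```python
import numpy as np
import pandas as pd

# Same data as the reticulate example, built directly in Python
a1 = np.array([1, 2, 3, 4])
a2 = np.arange(5, 9)  # the equivalent of R's 5:8

df = pd.DataFrame({"a1": a1, "a2": a2}, index=list("abcd"))
print(df)
```

Note how R's named list becomes a Python dict and `letters[1:4]` becomes `list("abcd")`; the printed result is identical to the reticulate output above.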

You can also use Python interactively through the R console (via the reticulate::repl_python() function), source Python scripts (reticulate::source_python()), and more.