**Introduction**

Stata is a popular data analytics tool used by researchers for statistical analysis. Nowadays, numerous tools are available for data analysis, including popular open-source programming languages such as R and Python. Even though R and Python are open source and easy to get started with, in my experience they are still not fully mature for statistical work. What I felt after learning both is that R has numerous libraries, but the syntax is not consistent across them, which sometimes makes research work hard. In the case of Python, the syntax is consistent, but many statistical methods and modelling approaches are either not yet available or still under development. Thus, I often have to depend on paid software products that are feature-rich and mature. Stata is one that I often use for research-related analysis. Stata 17 now offers integration with Python, which makes the analysis process super easy and fun.

Let's say you are doing some analysis in Python and want to fit a statistical model. You search the internet and realize that the model is not implemented in Python, or not implemented in exactly the way you want. You then have to turn to paid software to perform the analysis.

Stata's new Jupyter notebook support makes this super easy. You can now send data from Python to Stata and vice versa. For example, you can send part of the data from Python to Stata, conduct the analysis, and return the output to Python for further analysis. All of this can be done entirely from a Jupyter notebook.

**Aim of the Article**

The aim of the article is to illustrate how we could utilise Python and Stata together to perform statistical analysis directly from Jupyter notebook.

**Article Outline**

- Stata in IPython Notebook
- Loading a Dataset Into Python and Transferring it to Stata for Analysis
- Transferring Predictions from Stata to Python

## 1. Stata in IPython Notebook

**Loading Stata**

To use Stata in an IPython notebook, you first need to set up Python. Here, I'm using the Anaconda distribution and Python 3.7. You also need Stata 17, which provides the integration between Stata and Python in IPython/Jupyter notebooks.

To get started with the IPython notebook, install the **stata-setup** package using pip.

*pip install stata-setup*

Next, open an IPython notebook and import the **stata_setup** module. Then call *stata_setup.config( )*, supplying the directory where Stata is installed on your local machine and the edition of Stata. In my case I'm using the Basic Edition, so "be". Once you run it, you will see the Stata startup page, indicating that you are now connected to Stata desktop.

```
import stata_setup
# Raw string so the backslashes in the Windows path are not read as escapes
stata_setup.config(r"D:\Application Installation\STATA", "be")
```
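
Two things commonly go wrong at this step: a mistyped edition code and a path that does not exist. The sketch below wraps the call in a small helper that checks both before handing over to `stata_setup.config`. The helper name `configure_stata` is hypothetical and not part of `stata_setup`; the edition codes `mp`, `se` and `be` are the ones listed in the package documentation, and the path is the one from my machine, so adjust it to yours.

```python
from pathlib import Path

# Edition codes accepted by stata_setup.config: MP, SE and BE.
VALID_EDITIONS = {"mp", "se", "be"}


def configure_stata(stata_dir, edition="be"):
    """Hypothetical helper: validate the arguments, then call stata_setup.config."""
    edition = edition.lower()
    if edition not in VALID_EDITIONS:
        raise ValueError(
            f"edition must be one of {sorted(VALID_EDITIONS)}, got {edition!r}"
        )
    path = Path(stata_dir)
    if not path.is_dir():
        raise FileNotFoundError(f"Stata directory not found: {path}")
    # Imported here so the checks above run even where stata-setup is missing.
    import stata_setup
    stata_setup.config(str(path), edition)


# A raw string (r"...") keeps the backslashes of a Windows path literal.
STATA_DIR = r"D:\Application Installation\STATA"
# configure_stata(STATA_DIR, "be")  # uncomment on a machine with Stata 17
```

The import sits inside the function on purpose, so a typo in the edition is reported before Python even tries to load Stata.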

**Load auto Data in Stata**

First, I'm going to set the white tableau scheme permanently; it is a wonderful plot scheme.

You can enable it by installing the `schemepack` package developed by Asjad Naqvi. Follow the link for installation instructions: Link.

In a Jupyter notebook, any instruction sent to Stata must begin with the **%%stata** magic command.

```
%%stata
set scheme white_tableau, perm
```

Once the tableau scheme is set, we can start analysing data. Let's load the auto data.

Here, we load Stata's built-in **auto** data and summarize it.

```
%%stata
sysuse auto, clear
summarize
```

**Generating a Scatter Plot**

Let's generate a scatter plot between **mpg** and *weight* for domestic and foreign cars separately using the *twoway* command.

```
%%stata
twoway (scatter mpg weight, msize(vlarge)), by(foreign)
```

## 2. Loading a Dataset Into Python and Transferring it to Stata for Analysis

Let's load the built-in **tips** data from Python's Seaborn library.

```
import seaborn as sns
tips = sns.load_dataset("tips")
tips.head()
```

We can also check the value counts for categorical data.

`tips["time"].value_counts()`

Before we send this data to Stata, we need to ensure that there is no other data in Stata's memory. Thus, it is good practice to clear the memory using the **clear** command.

```
%%stata
clear
```

### Transferring Data from Python to Stata

To transfer the tips data to Stata, we add the *-d datasetname* option to the %%stata magic.

We can then use **list in 1/5** to print the first five observations.

```
%%stata -d tips
list in 1/5
```

Let's summarize the data using the **summarize** command. It only produced a summary for the continuous variables, i.e., *total_bill, tip* and *size*.

```
%%stata
summarize
```

Let's see the data format/type using the **describe** command.

```
%%stata
describe
```

You can observe that **sex, smoker, day** and *time* are in string format. The next step is to encode the labels and transform them into categorical variables (*sex, smoker, day* and *time*).

**Label sex**

We label sex as 0: Male and 1: Female and save it in another variable called **sex_enc**.

```
%%stata
label define sex_lab 0 "Male" 1 "Female"
encode sex, gen(sex_enc) label(sex_lab)
tab sex_enc
```
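
To make the mapping explicit, here is a plain-Python sketch of what `encode` does to a string column. Without a predefined label, Stata numbers the distinct strings alphabetically starting at 1; with `label(sex_lab)` it follows the codes we defined ourselves. The sample values and the helper names `alphabetical_codes` and `apply_label` are made up for illustration.

```python
def alphabetical_codes(values):
    """Mimic encode's default: distinct strings sorted, numbered from 1."""
    return {text: code for code, text in enumerate(sorted(set(values)), start=1)}


def apply_label(values, label):
    """Mimic encode with a predefined value label: look each string up."""
    return [label[text] for text in values]


sex = ["Female", "Male", "Male", "Female"]   # illustrative values only
sex_lab = {"Male": 0, "Female": 1}           # same mapping as `label define sex_lab`

default_map = alphabetical_codes(sex)        # {'Female': 1, 'Male': 2}
sex_enc = apply_label(sex, sex_lab)          # [1, 0, 0, 1]
print(default_map, sex_enc)
```

Defining the label first is what lets us start the codes at 0 and pick which category gets which number.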

**Label smoker**

We label smoker status as 0: No and 1: Yes and save it in another variable called **smoker_enc**.

```
%%stata
label define smoker 0 "No" 1 "Yes"
encode smoker, gen(smoker_enc) label(smoker)
tab smoker_enc
```

**Label time**

We label time as 0: Lunch and 1: Dinner and save it in another variable called **time_enc**.

```
%%stata
label define time_lab 0 "Lunch" 1 "Dinner"
encode time, gen(time_enc) label(time_lab)
tab time_enc
```

**Label Day**

We label day as 0: Sat, 1: Sun, 2: Thur and 3: Fri and save it in another variable called **day_enc**.

```
%%stata
label define day_lab 0 "Sat" 1 "Sun" 2 "Thur" 3 "Fri"
encode day, gen(day_enc) label(day_lab)
tab day_enc
```

**Chi-square Test of Independence**

Once all categorical variables are labelled, let's check whether they behave as categorical variables should in Stata. Let's conduct a Chi-square test of independence and check whether sex and smoker are related. The test result (p > 0.05) gives no evidence against independence, so we treat sex and smoker as unrelated.

```
%%stata
tab sex_enc smoker_enc, chi2
```
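
For readers who want to see what sits behind the `chi2` option, here is a minimal sketch of the Pearson chi-square computation for a 2x2 table using only the standard library. The function name `pearson_chi2_2x2` is my own, and the counts are hypothetical, not the tips data; the p-value shortcut with `math.erfc` is valid only for 1 degree of freedom, which is the 2x2 case.

```python
import math


def pearson_chi2_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


counts = [[20, 10], [10, 20]]  # hypothetical counts, not the tips data
chi2, p_value = pearson_chi2_2x2(counts)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

A p-value above 0.05, as in the Stata output for sex and smoker, means the observed counts sit close to the expected ones.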

**Fit a Linear Regression Model**

Let's fit a linear regression using Stata's **reg** command. It worked as expected.

```
%%stata
reg tip total_bill ib(0).smoker_enc ib(0).sex_enc ib(0).time_enc ib(0).day_enc
```
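
As I read it, the base-level prefix on each encoded variable in the command above tells Stata to expand the variable into indicator (dummy) columns and to leave out level 0 as the reference category. A rough Python sketch of that expansion, with a hypothetical helper `indicator_columns` and made-up codes, looks like this:

```python
def indicator_columns(codes, base=0):
    """Return one 0/1 column per non-base level, keyed by level."""
    levels = sorted(set(codes) - {base})
    return {level: [1 if c == level else 0 for c in codes] for level in levels}


day_enc = [0, 1, 2, 3, 1, 0]            # illustrative codes, not the real data
dummies = indicator_columns(day_enc)    # columns for levels 1, 2 and 3 only
for level, column in dummies.items():
    print(level, column)
```

Each coefficient reported for such a column is then read relative to the omitted level 0.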

**Compute margins**

Let's generate a margins plot by evaluating the total bill from 3 to 50 at intervals of 5, while the other variables are left at their observed values and averaged over.

```
%%stata
quietly margins, at(total_bill=(3(5)50))
marginsplot
```
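
The numlist `3(5)50` reads as "start at 3, step by 5, stop at or before 50". The same grid in Python, handy for checking which values `margins` will evaluate:

```python
# Stata's numlist a(b)c counts from a in steps of b without exceeding c.
start, step, stop = 3, 5, 50
grid = list(range(start, stop + 1, step))
print(grid)  # [3, 8, 13, 18, 23, 28, 33, 38, 43, 48]
```

Note that the last value is 48, not 50, because the next step would overshoot the upper limit.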

## 3. Transferring Predictions from Stata to Python

Sometimes we may need to transfer some estimates from Stata to Python to perform further computation on them. Say we want to transfer the margin estimates computed previously to Python. We can use the **-doutd** option and save the data to *preddata*. We will use this *preddata* in the next step.

For now, let's calculate the margins again and save them in Stata as **predictions**. Now, if we print the predictions, we can see that the total_bill column is named *_at1* and the margins column is named *_margin*. Let's rename the columns as *total_bill* and *pr_tip*.

```
%%stata -doutd preddata
quietly margins, at(total_bill=(3(5)50)) saving(predictions, replace)
use predictions, clear
list _at1 _margin in 1/5
rename _at1 total_bill
rename _margin pr_tip
```

If we now access the two columns from **preddata** and print the first five observations in the IPython notebook, the data is displayed as a pandas DataFrame.

`preddata[['total_bill', 'pr_tip']].head()`
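
The same round trip can also be written with function calls instead of magic flags, which helps when the code lives in a plain `.py` script. This is a minimal sketch, assuming `stata_setup.config` has already been run in the session. `pdataframe_to_data`, `run` and `pdataframe_from_data` belong to the `pystata.stata` module shipped with Stata 17, while `round_trip` is a name I made up.

```python
def round_trip(df, commands):
    """Send a pandas DataFrame to Stata, run commands, return the result.

    Assumes stata_setup.config(...) was called earlier in the session.
    """
    # Imported lazily: pystata is only importable after stata_setup.config.
    from pystata import stata

    stata.pdataframe_to_data(df, force=True)   # like %%stata -d df -force
    stata.run(commands)                        # like the body of a %%stata cell
    return stata.pdataframe_from_data()        # like %%stata -doutd name


# Example call (needs a configured Stata 17 session), left commented out:
# result = round_trip(tips, "summarize")
```

Whatever dataset is in Stata's memory when the commands finish is what comes back as the DataFrame.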

Stata is wonderful software for statistical analysis. Similarly, Python is a wonderful general-purpose programming language. We can use the two together to harness the power of both for statistical and machine-learning problems.

*Click here for the data and code.*

*I hope you've found this article useful!*