Performance of Freq of Categorical Data — SAS vs Python



1. Normal Proc Freq

SAS

FILENAME Gov_C '/folders/myfolders/Assignments/governors_county.csv';
PROC IMPORT DATAFILE=Gov_C DBMS=CSV OUT=WORK.Gov_C_SAS;
GETNAMES=YES;
RUN;
proc freq data=Gov_C_SAS;
tables state;
run;

Python

import pandas as pd

data = pd.read_csv("../input/us-election-2020/governors_county.csv")
datax = data['state'].value_counts().sort_index()
datay = pd.DataFrame({
'state': datax.index,
'Frequency': datax.values,
'Percent': ((datax.values/datax.values.sum())*100).round(2),
'Cumulative Frequency': datax.values.cumsum(),
'Cumulative Percent': ((datax.values.cumsum()/datax.values.sum())*100).round(2)
})
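As a side note, the table-building logic above can be wrapped into a small reusable helper. This is just a sketch: `proc_freq` is a hypothetical function name, and the inline sample Series stands in for the election CSV, which isn't reproduced here.

```python
import pandas as pd

def proc_freq(series):
    """Build a PROC FREQ-style table for one categorical column."""
    counts = series.value_counts().sort_index()
    total = counts.values.sum()
    return pd.DataFrame({
        series.name: counts.index,
        'Frequency': counts.values,
        'Percent': (counts.values / total * 100).round(2),
        'Cumulative Frequency': counts.values.cumsum(),
        'Cumulative Percent': (counts.values.cumsum() / total * 100).round(2),
    })

# inline sample data standing in for the election CSV
sample = pd.Series(['Texas', 'Ohio', 'Texas', 'Utah'], name='state')
print(proc_freq(sample))
```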

2. Sort the rows from most frequent to least frequent

proc freq data=Gov_C_SAS order=freq;
tables state;
run;
datax = data['state'].value_counts()
datay = pd.DataFrame({
'state': datax.index,
'Frequency': datax.values,
'Percent': ((datax.values/datax.values.sum())*100).round(2),
'Cumulative Frequency': datax.values.cumsum(),
'Cumulative Percent': ((datax.values.cumsum()/datax.values.sum())*100).round(2)
})
datay
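Note that dropping `.sort_index()` is all it takes on the pandas side: `value_counts` already sorts from most to least frequent by default, which matches SAS's `order=freq`. A tiny sketch with inline sample data (not from the election CSV):

```python
import pandas as pd

# value_counts sorts descending by count out of the box,
# mirroring proc freq's order=freq option
s = pd.Series(['OH', 'TX', 'TX', 'TX', 'UT', 'UT'])
print(s.value_counts())
# TX appears first (3), then UT (2), then OH (1)
```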

3. Additional options NOCUM, NOPERCENT

proc freq data = Gov_C_SAS;
tables state /nopercent nocum;
run;
datax = data['state'].value_counts().sort_index()
datay = pd.DataFrame({
'state': datax.index,
'Frequency': datax.values
})
datay
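For this frequency-only case, a single pandas chain can also produce the two-column table without building the DataFrame by hand. A sketch, again with inline sample data in place of the election CSV:

```python
import pandas as pd

# inline sample data standing in for the election CSV
s = pd.Series(['Ohio', 'Texas', 'Texas'], name='state')

# rename_axis/reset_index turns the counts Series into a tidy
# two-column frame: one 'state' column, one 'Frequency' column
freq_only = s.value_counts().sort_index().rename_axis('state').reset_index(name='Frequency')
print(freq_only)
```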

4. Creating a Cross Tab

proc freq data=Gov_C_SAS;
tables county*state / norow nocol nopercent;
run;
datab = pd.crosstab(data.county, data.state, margins=True, margins_name="Total")
datab
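If the percent output of PROC FREQ is wanted rather than raw counts (i.e. without the `/nopercent` option), `pd.crosstab` has a `normalize` parameter. A small sketch with made-up county/state data, not the election CSV:

```python
import pandas as pd

# inline sample data standing in for the election CSV
df = pd.DataFrame({
    'county': ['A', 'A', 'B', 'B'],
    'state': ['TX', 'UT', 'TX', 'TX'],
})

# normalize='all' converts cell counts to proportions of the grand
# total; multiplying by 100 gives PROC FREQ-style cell percentages
pct = pd.crosstab(df.county, df.state, normalize='all') * 100
print(pct.round(2))
```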

That's it for the comparison. We saw that pandas runs quite fast compared to SAS (University Edition) when computing various frequencies on categorical columns. Do let me know your thoughts below.
