DA with SAS – Practice Questions

Q1 Import a CSV file students.csv into SAS and create a dataset named students_data. Display the first 10 observations. ▶

SAS Code

/* Step 1: Import CSV file into SAS dataset */
PROC IMPORT DATAFILE="students.csv"
    OUT=students_data
    DBMS=CSV
    REPLACE;
    GETNAMES=YES;
RUN;

/* Step 2: Display first 10 observations */
PROC PRINT DATA=students_data (OBS=10);
RUN;

Explanation

PROC IMPORT — Reads the CSV file and creates dataset students_data
DBMS=CSV — Specifies the file type as CSV
GETNAMES=YES — Uses the first row as variable names
PROC PRINT (OBS=10) — Displays only the first 10 rows

Q2 Create a SAS dataset employee_data using DATALINES with variables Emp_ID, Name, Department, Salary. Enter at least 5 records and display the dataset. ▶

SAS Code

/* Step 1: Create dataset using DATALINES */
DATA employee_data;
    INPUT Emp_ID Name $ Department $ Salary;
    DATALINES;
101 Amit HR 30000
102 Neha IT 45000
103 Raj Sales 35000
104 Priya Finance 40000
105 Karan IT 50000
;
RUN;

/* Step 2: Display the dataset */
PROC PRINT DATA=employee_data;
RUN;

Explanation

DATA employee_data; — Creates a new dataset
INPUT — Defines variables ($ for character variables like Name & Department)
DATALINES — Used to manually enter data inline
PROC PRINT — Displays the dataset

Q3 Display the contents and structure of dataset sales_data using appropriate SAS procedure. ▶

SAS Code

/* Display structure and metadata of dataset */
PROC CONTENTS DATA=sales_data;
RUN;

Explanation

PROC CONTENTS — Shows full structure of the dataset
Displays variable names, data types (numeric/character), length, labels, formats, number of observations and variables

Tip: For a shorter output showing only variables, use PROC CONTENTS DATA=sales_data SHORT;

Q4 From dataset student_marks, display only students whose Marks are greater than 60. ▶

SAS Code

/* Display students with Marks > 60 */
PROC PRINT DATA=student_marks;
    WHERE Marks > 60;
RUN;

Explanation

WHERE Marks > 60; — Filters dataset to show only students scoring above 60
PROC PRINT — Displays the filtered records

Alternative (DATA step)

DATA filtered_students;
    SET student_marks;
    IF Marks > 60;
RUN;
PROC PRINT DATA=filtered_students;
RUN;

Q5 Sort dataset customer_data by Customer_ID and display the sorted dataset. ▶

SAS Code

/* Step 1: Sort dataset */
PROC SORT DATA=customer_data
    OUT=sorted_customer_data;
    BY Customer_ID;
RUN;

/* Step 2: Display sorted dataset */
PROC PRINT DATA=sorted_customer_data;
RUN;

Explanation

PROC SORT — Sorts the dataset
BY Customer_ID; — Sorts in ascending order by Customer_ID
OUT=sorted_customer_data — Stores result in a new dataset

Tip: For descending order use BY DESCENDING Customer_ID;

Q6 Remove duplicate records from dataset customer_data based on Customer_ID. ▶

SAS Code

/* Remove duplicate records */
PROC SORT DATA=customer_data
    OUT=unique_customer_data
    NODUPKEY;
    BY Customer_ID;
RUN;

PROC PRINT DATA=unique_customer_data;
RUN;

Explanation

NODUPKEY — Removes duplicate records based on the BY variable
BY Customer_ID; — Keeps only the first occurrence of each Customer_ID
OUT=unique_customer_data — Stores the deduplicated result

Q7 Merge two datasets student_info and student_marks using Student_ID and display the merged dataset. ▶

SAS Code

/* Step 1: Sort both datasets by Student_ID */
PROC SORT DATA=student_info;
    BY Student_ID;
RUN;

PROC SORT DATA=student_marks;
    BY Student_ID;
RUN;

/* Step 2: Merge datasets */
DATA merged_data;
    MERGE student_info student_marks;
    BY Student_ID;
RUN;

/* Step 3: Display merged dataset */
PROC PRINT DATA=merged_data;
RUN;

Explanation

Both datasets must be sorted by the common variable (Student_ID) before merging
MERGE — Combines datasets horizontally
BY Student_ID; — Matches records based on Student_ID

Q8 Combine datasets classA and classB into a single dataset using SET. ▶

SAS Code

/* Combine datasets (vertical stack) */
DATA combined_class;
    SET classA classB;
RUN;

PROC PRINT DATA=combined_class;
RUN;

Explanation

SET classA classB; — Appends (stacks) classB below classA vertically
DATA combined_class; — Creates a new combined dataset

Q9 From dataset employee_data, keep only variables Emp_ID, Name, Salary and display the dataset. ▶

SAS Code

/* Keep only selected variables */
DATA filtered_employee;
    SET employee_data;
    KEEP Emp_ID Name Salary;
RUN;

PROC PRINT DATA=filtered_employee;
RUN;

Explanation

KEEP Emp_ID Name Salary; — Retains only these columns
All other variables are dropped from the output dataset

Q10 Rename variable Salary to Monthly_Salary in dataset employee_data. ▶

SAS Code

/* Rename variable */
DATA employee_data_renamed;
    SET employee_data;
    RENAME Salary = Monthly_Salary;
RUN;

PROC PRINT DATA=employee_data_renamed;
RUN;

Explanation

RENAME Salary = Monthly_Salary; — Changes the variable name
A new dataset is created with the updated variable name

Q11 Identify missing values in variable Marks from dataset exam_data and replace them with 0. ▶

SAS Code

/* Replace missing values with 0 */
DATA updated_exam_data;
    SET exam_data;
    IF Marks = . THEN Marks = 0;
RUN;

PROC PRINT DATA=updated_exam_data;
RUN;

Explanation

Marks = . — Represents missing numeric values in SAS (dot notation)
IF Marks = . THEN Marks = 0; — Replaces missing values with 0

Q12 From dataset customer_details containing variable Full_Name, extract the first name using SAS functions. ▶

SAS Code

DATA customer_firstname;
    SET customer_details;
    First_Name = SCAN(Full_Name, 1);
RUN;

PROC PRINT DATA=customer_firstname;
RUN;

Explanation

SCAN(Full_Name, 1) — Extracts the first word (first name) from the Full_Name string

Q13 Extract the first three characters from a text variable using SUBSTR. ▶

SAS Code

/* Extract first three characters */
DATA new_data;
    SET your_dataset;
    First_Three = SUBSTR(text_variable, 1, 3);
RUN;

PROC PRINT DATA=new_data;
RUN;

Explanation

SUBSTR(text_variable, 1, 3) — Extracts 3 characters starting from position 1
First_Three — New variable storing the extracted result

Q14 Remove spaces from a character variable using COMPRESS. ▶

SAS Code

/* Remove spaces from character variable */
DATA new_data;
    SET your_dataset;
    Clean_Text = COMPRESS(text_variable, ' ');
RUN;

PROC PRINT DATA=new_data;
RUN;

Explanation

COMPRESS(text_variable, ' ') — Removes all spaces from the variable
Clean_Text — New variable without spaces

Q15 Convert dataset monthly_sales from long format to wide format using PROC TRANSPOSE. ▶

SAS Code

/* Sort first */
PROC SORT DATA=monthly_sales;
    BY ID;
RUN;

/* Convert long format to wide format */
PROC TRANSPOSE DATA=monthly_sales
    OUT=wide_sales;
    BY ID;
    ID Month;
    VAR Sales;
RUN;

PROC PRINT DATA=wide_sales;
RUN;

Explanation

BY ID; — Groups data by identifier (Customer/Product)
ID Month; — Creates new columns based on month values
VAR Sales; — Values to populate the new columns
OUT=wide_sales — Output dataset in wide format

Note: Always sort the dataset by ID before using PROC TRANSPOSE.

Q1 Using dataset employee_salary, calculate the mean, minimum, and maximum of Salary. ▶

SAS Code

PROC MEANS DATA=employee_salary MEAN MIN MAX;
    VAR Salary;
RUN;

Explanation

PROC MEANS — Calculates descriptive statistics
MEAN MIN MAX — Specifies which statistics to display
VAR Salary; — Specifies the variable to analyze

Q2 Generate a frequency table for Department variable using PROC FREQ. ▶

SAS Code

PROC FREQ DATA=employee_salary;
    TABLES Department;
RUN;

Explanation

PROC FREQ — Produces frequency tables
TABLES Department; — Shows count and percentage for each unique department

Q3 Using dataset student_marks, generate frequency distribution for Subject. ▶

SAS Code

PROC FREQ DATA=student_marks;
    TABLES Subject;
RUN;

Q4 Using dataset sales_data, calculate mean and sum of Quantity and Price. ▶

SAS Code

PROC MEANS DATA=sales_data MEAN SUM;
    VAR Quantity Price;
RUN;

Q5 Using dataset employee_salary, calculate average salary for each department. ▶

SAS Code

PROC MEANS DATA=employee_salary MEAN;
    CLASS Department;
    VAR Salary;
RUN;

Explanation

CLASS Department; — Groups results by department
Outputs average salary per department without needing to sort first

Q6 Using dataset student_data, generate frequency distribution for Course variable. ▶

SAS Code

PROC FREQ DATA=student_data;
    TABLES Course;
RUN;

Q7 Using dataset student_scores, perform a t-test to compare marks of male and female students. ▶

SAS Code

PROC TTEST DATA=student_scores;
    CLASS Gender;
    VAR Marks;
RUN;

Explanation

PROC TTEST — Performs an independent samples t-test
CLASS Gender; — Defines the two groups to compare
VAR Marks; — The continuous variable being compared

Q8 Using dataset exam_result, perform Chi-square test between Gender and Pass/Fail. ▶

SAS Code

PROC FREQ DATA=exam_result;
    TABLES Gender*Result / CHISQ;
RUN;

Explanation

Gender*Result — Creates a cross-tabulation (contingency table)
/ CHISQ — Requests the Chi-square test statistic and p-value

Q9 Create a new variable Total_Sales = Quantity × Price in dataset sales_data. ▶

SAS Code

DATA sales_updated;
    SET sales_data;
    Total_Sales = Quantity * Price;
RUN;

PROC PRINT DATA=sales_updated;
RUN;

Q10 Create variable Discount_Price = Price × 0.9 and display the dataset. ▶

SAS Code

DATA sales_updated;
    SET sales_data;
    Discount_Price = Price * 0.9;
RUN;

PROC PRINT DATA=sales_updated;
RUN;

Q11 Using dataset employee_salary, calculate mean and standard deviation of salary. ▶

SAS Code

PROC MEANS DATA=employee_salary MEAN STD;
    VAR Salary;
RUN;

Q12 Perform simple linear regression using dataset advertising_data — Sales as dependent, TV as independent variable. ▶

SAS Code

PROC REG DATA=advertising_data;
    MODEL Sales = TV;
RUN;
QUIT;

Explanation

PROC REG — Performs OLS regression
MODEL Sales = TV; — Sales is the dependent variable, TV is the predictor
QUIT; — Required to exit interactive PROC REG

Q13 Perform multiple regression using Sales as dependent variable and TV and Radio as independent variables. ▶

SAS Code

PROC REG DATA=advertising_data;
    MODEL Sales = TV Radio;
RUN;
QUIT;

Explanation

Adding multiple predictors after = performs multiple linear regression
Output includes coefficients, p-values, and R² for the full model

Q14 Export SAS output results to a PDF file using ODS. ▶

SAS Code

ODS PDF FILE="output.pdf";

PROC PRINT DATA=employee_salary;
RUN;

ODS PDF CLOSE;

Explanation

ODS PDF FILE= — Opens a PDF output destination
All procedures between ODS PDF and ODS PDF CLOSE are written to the file
ODS PDF CLOSE; — Closes and saves the PDF

Q15 Generate a summary report showing average salary by department. ▶

SAS Code

PROC MEANS DATA=employee_salary MEAN;
    CLASS Department;
    VAR Salary;
RUN;

Explanation

CLASS Department; — Groups results by each department
Output is a clean summary table — average salary per department