Module 1 Data Handling, Preparation and Transformation
Q1 Import a CSV file students.csv into SAS and create a dataset named students_data. Display the first 10 observations.
SAS Code
/* Step 1: Import CSV file into SAS dataset */
PROC IMPORT DATAFILE="students.csv"
    OUT=students_data
    DBMS=CSV
    REPLACE;
    GETNAMES=YES;
RUN;

/* Step 2: Display first 10 observations */
PROC PRINT DATA=students_data (OBS=10);
RUN;

Explanation

  • PROC IMPORT — Reads the CSV file and creates dataset students_data
  • DBMS=CSV — Specifies the file type as CSV
  • GETNAMES=YES — Uses the first row as variable names
  • PROC PRINT (OBS=10) — Displays only the first 10 rows
Q2 Create a SAS dataset employee_data using DATALINES with variables Emp_ID, Name, Department, Salary. Enter at least 5 records and display the dataset.
SAS Code
/* Step 1: Create dataset using DATALINES */
DATA employee_data;
    INPUT Emp_ID Name $ Department $ Salary;
    DATALINES;
101 Amit HR 30000
102 Neha IT 45000
103 Raj Sales 35000
104 Priya Finance 40000
105 Karan IT 50000
;
RUN;

/* Step 2: Display the dataset */
PROC PRINT DATA=employee_data;
RUN;

Explanation

  • DATA employee_data; — Creates a new dataset
  • INPUT — Defines variables ($ for character variables like Name & Department)
  • DATALINES — Used to manually enter data inline
  • PROC PRINT — Displays the dataset
Q3 Display the contents and structure of dataset sales_data using appropriate SAS procedure.
SAS Code
/* Display structure and metadata of dataset */
PROC CONTENTS DATA=sales_data;
RUN;

Explanation

  • PROC CONTENTS — Shows full structure of the dataset
  • Displays variable names, data types (numeric/character), length, labels, formats, number of observations and variables
Tip: For a shorter output showing only variables, use PROC CONTENTS DATA=sales_data SHORT;
Q4 From dataset student_marks, display only students whose Marks are greater than 60.
SAS Code
/* Display students with Marks > 60 */
PROC PRINT DATA=student_marks;
    WHERE Marks > 60;
RUN;

Explanation

  • WHERE Marks > 60; — Filters dataset to show only students scoring above 60
  • PROC PRINT — Displays the filtered records
Alternative (DATA step)
DATA filtered_students;
    SET student_marks;
    IF Marks > 60;
RUN;
PROC PRINT DATA=filtered_students;
RUN;
Q5 Sort dataset customer_data by Customer_ID and display the sorted dataset.
SAS Code
/* Step 1: Sort dataset */
PROC SORT DATA=customer_data
    OUT=sorted_customer_data;
    BY Customer_ID;
RUN;

/* Step 2: Display sorted dataset */
PROC PRINT DATA=sorted_customer_data;
RUN;

Explanation

  • PROC SORT — Sorts the dataset
  • BY Customer_ID; — Sorts in ascending order by Customer_ID
  • OUT=sorted_customer_data — Stores result in a new dataset
Tip: For descending order use BY DESCENDING Customer_ID;
Q6 Remove duplicate records from dataset customer_data based on Customer_ID.
SAS Code
/* Remove duplicate records */
PROC SORT DATA=customer_data
    OUT=unique_customer_data
    NODUPKEY;
    BY Customer_ID;
RUN;

PROC PRINT DATA=unique_customer_data;
RUN;

Explanation

  • NODUPKEY — Removes duplicate records based on the BY variable
  • BY Customer_ID; — Keeps only the first occurrence of each Customer_ID
  • OUT=unique_customer_data — Stores the deduplicated result
Q7 Merge two datasets student_info and student_marks using Student_ID and display the merged dataset.
SAS Code
/* Step 1: Sort both datasets by Student_ID */
PROC SORT DATA=student_info;
    BY Student_ID;
RUN;

PROC SORT DATA=student_marks;
    BY Student_ID;
RUN;

/* Step 2: Merge datasets */
DATA merged_data;
    MERGE student_info student_marks;
    BY Student_ID;
RUN;

/* Step 3: Display merged dataset */
PROC PRINT DATA=merged_data;
RUN;

Explanation

  • Both datasets must be sorted by the common variable (Student_ID) before merging
  • MERGE — Combines datasets horizontally
  • BY Student_ID; — Matches records based on Student_ID
Q8 Combine datasets classA and classB into a single dataset using SET.
SAS Code
/* Combine datasets (vertical stack) */
DATA combined_class;
    SET classA classB;
RUN;

PROC PRINT DATA=combined_class;
RUN;

Explanation

  • SET classA classB; — Appends (stacks) classB below classA vertically
  • DATA combined_class; — Creates a new combined dataset
Q9 From dataset employee_data, keep only variables Emp_ID, Name, Salary and display the dataset.
SAS Code
/* Keep only selected variables */
DATA filtered_employee;
    SET employee_data;
    KEEP Emp_ID Name Salary;
RUN;

PROC PRINT DATA=filtered_employee;
RUN;

Explanation

  • KEEP Emp_ID Name Salary; — Retains only these columns
  • All other variables are dropped from the output dataset
Q10 Rename variable Salary to Monthly_Salary in dataset employee_data.
SAS Code
/* Rename variable */
DATA employee_data_renamed;
    SET employee_data;
    RENAME Salary = Monthly_Salary;
RUN;

PROC PRINT DATA=employee_data_renamed;
RUN;

Explanation

  • RENAME Salary = Monthly_Salary; — Changes the variable name
  • A new dataset is created with the updated variable name
Q11 Identify missing values in variable Marks from dataset exam_data and replace them with 0.
SAS Code
/* Replace missing values with 0 */
DATA updated_exam_data;
    SET exam_data;
    IF Marks = . THEN Marks = 0;
RUN;

PROC PRINT DATA=updated_exam_data;
RUN;

Explanation

  • Marks = . — Represents missing numeric values in SAS (dot notation)
  • IF Marks = . THEN Marks = 0; — Replaces missing values with 0
Q12 From dataset customer_details containing variable Full_Name, extract the first name using SAS functions.
SAS Code
DATA customer_firstname;
    SET customer_details;
    First_Name = SCAN(Full_Name, 1);
RUN;

PROC PRINT DATA=customer_firstname;
RUN;

Explanation

  • SCAN(Full_Name, 1) — Extracts the first word (first name) from the Full_Name string
Q13 Extract the first three characters from a text variable using SUBSTR.
SAS Code
/* Extract first three characters */
DATA new_data;
    SET your_dataset;
    First_Three = SUBSTR(text_variable, 1, 3);
RUN;

PROC PRINT DATA=new_data;
RUN;

Explanation

  • SUBSTR(text_variable, 1, 3) — Extracts 3 characters starting from position 1
  • First_Three — New variable storing the extracted result
Q14 Remove spaces from a character variable using COMPRESS.
SAS Code
/* Remove spaces from character variable */
DATA new_data;
    SET your_dataset;
    Clean_Text = COMPRESS(text_variable, ' ');
RUN;

PROC PRINT DATA=new_data;
RUN;

Explanation

  • COMPRESS(text_variable, ' ') — Removes all spaces from the variable
  • Clean_Text — New variable without spaces
Q15 Convert dataset monthly_sales from long format to wide format using PROC TRANSPOSE.
SAS Code
/* Sort first */
PROC SORT DATA=monthly_sales;
    BY ID;
RUN;

/* Convert long format to wide format */
PROC TRANSPOSE DATA=monthly_sales
    OUT=wide_sales;
    BY ID;
    ID Month;
    VAR Sales;
RUN;

PROC PRINT DATA=wide_sales;
RUN;

Explanation

  • BY ID; — Groups data by identifier (Customer/Product)
  • ID Month; — Creates new columns based on month values
  • VAR Sales; — Values to populate the new columns
  • OUT=wide_sales — Output dataset in wide format
Note: Always sort the dataset by ID before using PROC TRANSPOSE.
Module 2 Statistical Analysis and Reporting
Q1 Using dataset employee_salary, calculate the mean, minimum, and maximum of Salary.
SAS Code
PROC MEANS DATA=employee_salary MEAN MIN MAX;
    VAR Salary;
RUN;

Explanation

  • PROC MEANS — Calculates descriptive statistics
  • MEAN MIN MAX — Specifies which statistics to display
  • VAR Salary; — Specifies the variable to analyze
Q2 Generate a frequency table for Department variable using PROC FREQ.
SAS Code
PROC FREQ DATA=employee_salary;
    TABLES Department;
RUN;

Explanation

  • PROC FREQ — Produces frequency tables
  • TABLES Department; — Shows count and percentage for each unique department
Q3 Using dataset student_marks, generate frequency distribution for Subject.
SAS Code
PROC FREQ DATA=student_marks;
    TABLES Subject;
RUN;
Q4 Using dataset sales_data, calculate mean and sum of Quantity and Price.
SAS Code
PROC MEANS DATA=sales_data MEAN SUM;
    VAR Quantity Price;
RUN;
Q5 Using dataset employee_salary, calculate average salary for each department.
SAS Code
PROC MEANS DATA=employee_salary MEAN;
    CLASS Department;
    VAR Salary;
RUN;

Explanation

  • CLASS Department; — Groups results by department
  • Outputs average salary per department without needing to sort first
Q6 Using dataset student_data, generate frequency distribution for Course variable.
SAS Code
PROC FREQ DATA=student_data;
    TABLES Course;
RUN;
Q7 Using dataset student_scores, perform a t-test to compare marks of male and female students.
SAS Code
PROC TTEST DATA=student_scores;
    CLASS Gender;
    VAR Marks;
RUN;

Explanation

  • PROC TTEST — Performs an independent samples t-test
  • CLASS Gender; — Defines the two groups to compare
  • VAR Marks; — The continuous variable being compared
Q8 Using dataset exam_result, perform Chi-square test between Gender and Pass/Fail.
SAS Code
PROC FREQ DATA=exam_result;
    TABLES Gender*Result / CHISQ;
RUN;

Explanation

  • Gender*Result — Creates a cross-tabulation (contingency table)
  • / CHISQ — Requests the Chi-square test statistic and p-value
Q9 Create a new variable Total_Sales = Quantity × Price in dataset sales_data.
SAS Code
DATA sales_updated;
    SET sales_data;
    Total_Sales = Quantity * Price;
RUN;

PROC PRINT DATA=sales_updated;
RUN;
Q10 Create variable Discount_Price = Price × 0.9 and display the dataset.
SAS Code
DATA sales_updated;
    SET sales_data;
    Discount_Price = Price * 0.9;
RUN;

PROC PRINT DATA=sales_updated;
RUN;
Q11 Using dataset employee_salary, calculate mean and standard deviation of salary.
SAS Code
PROC MEANS DATA=employee_salary MEAN STD;
    VAR Salary;
RUN;
Q12 Perform simple linear regression using dataset advertising_data — Sales as dependent, TV as independent variable.
SAS Code
PROC REG DATA=advertising_data;
    MODEL Sales = TV;
RUN;
QUIT;

Explanation

  • PROC REG — Performs OLS regression
  • MODEL Sales = TV; — Sales is the dependent variable, TV is the predictor
  • QUIT; — Required to exit interactive PROC REG
Q13 Perform multiple regression using Sales as dependent variable and TV and Radio as independent variables.
SAS Code
PROC REG DATA=advertising_data;
    MODEL Sales = TV Radio;
RUN;
QUIT;

Explanation

  • Adding multiple predictors after = performs multiple linear regression
  • Output includes coefficients, p-values, and R² for the full model
Q14 Export SAS output results to a PDF file using ODS.
SAS Code
ODS PDF FILE="output.pdf";

PROC PRINT DATA=employee_salary;
RUN;

ODS PDF CLOSE;

Explanation

  • ODS PDF FILE= — Opens a PDF output destination
  • All procedures between ODS PDF and ODS PDF CLOSE are written to the file
  • ODS PDF CLOSE; — Closes and saves the PDF
Q15 Generate a summary report showing average salary by department.
SAS Code
PROC MEANS DATA=employee_salary MEAN;
    CLASS Department;
    VAR Salary;
RUN;

Explanation

  • CLASS Department; — Groups results by each department
  • Output is a clean summary table — average salary per department