Note: The requirements for Project 2 are almost identical to those for Project 1.  In Project 2, our data set is considerably larger than for Project 1 in both width (number of fields) and depth (number of rows).

In SAS Viya for Learners, go to 'Build Models'.  This will take you to Model Studio.

Start a new project for use with this data set, if you do not already have one created.

On the 'New Project' screen,

-Give it a name
-Leave Type as 'Data Mining and Machine Learning'
-Leave Template as 'Blank Template'

Then import the .csv file telecom_customer_data into Viya.  The data is located in Canvas under 'Modules'-->'Period 3'-->'Major Project Data'.  Also make sure to download the data dictionary.

Once the data is imported, it will be available like any of the other Viya data sets, though our churn data is in your personal partition.  Under 'Data' (upper left), press the 'Browse' button.  You will be taken to a new window.  Select the data set called 'telecom_customer_data' and press 'OK'.  The data table will then open for examination.

Go down to the 'Advanced' button and click it.  You will be taken to another page for more options.  Click on 'Partition Data' and set the following:

-Method = 'Stratify'
-Training = 50
-Validation = 50
-Test = 0

Click 'Save' and you will be returned to the New Project screen.  Click 'Save' again to create the new project.
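For intuition about what a stratified 50/50 partition does, here is a minimal Python sketch run outside Viya.  It assumes stratification is on the target variable; the function name stratified_split and the toy rows are made up for illustration, and this is not how Viya implements the partition internally.

```python
import random
from collections import defaultdict

def stratified_split(rows, target, train_frac=0.5, seed=42):
    """Split rows into (train, validation), keeping the target mix similar in both."""
    # Group the rows by their target level so each level is split separately.
    by_level = defaultdict(list)
    for row in rows:
        by_level[row[target]].append(row)

    rng = random.Random(seed)
    train, valid = [], []
    for level_rows in by_level.values():
        shuffled = level_rows[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_frac)
        train.extend(shuffled[:cut])
        valid.extend(shuffled[cut:])
    return train, valid

# Toy data (illustrative only): 20 churners and 80 non-churners.
rows = [{"id": i, "churn": 1 if i < 20 else 0} for i in range(100)]
train, valid = stratified_split(rows, "churn")

print(len(train), sum(r["churn"] for r in train))
print(len(valid), sum(r["churn"] for r in valid))
```

Because each target level is split on its own, both partitions end up with the same share of churners, which a simple random split does not guarantee.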

The data will load, and the Model Studio project will open on the Data tab by default.

Scroll down to the variable 'churn'.  Click the check box next to the variable name, or anywhere on that row.  The box will be checked.  On the right panel, verify that 'Role' is set to 'Target' and that 'Level' is set to 'Binary'.

Click on ‘Pipelines’ in the upper left.  You should see a node representing your data in your workspace.  Right-click on the Data node and select ‘Run’.  The appearance of the green check mark indicates that Viya has finished processing.

Add a Data Exploration node.  Execute it with the default settings.  When it has finished, you will have access to summaries of each of the input variables for your project.  Using this data, please prepare the first part of the assignment for Project 1.

Deliverables:

For the categorical variables (‘Class Variables’ in Viya), please prepare a table (or more than one table, if needed) listing them.  For each categorical variable, note the number of levels, the percentage of values that are missing (NULL in database parlance), the mode, and the percentage of records that take the mode value.  Also, were missing values (NULLs) replaced with some sort of text value such as 'missing', 'unknown', etc.?  Technically, these replaced values are not really 'missing', but they can present problems of their own if we find that the variable has predictive ability in our model(s).

Discuss your findings.  Is missing data a problem?  Do any variables have too many missing values?  Do enough variables have enough missing values, in aggregate, that you are concerned about losing too much data for the model-building process?  Do you believe that any categorical variables are too close to having a single level (effectively unary) to be of use?
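If you want to cross-check the figures for a categorical variable outside Viya, the table entries described above can be computed along these lines.  This is a minimal sketch: the function name profile_categorical, the list of placeholder spellings, and the sample values are assumptions for illustration, and percentages here are taken over all rows, which may differ from how Viya reports them.

```python
from collections import Counter

# Assumed spellings of text values that stand in for a missing value.
MISSING_TEXT = {"missing", "unknown", "n/a", "na", "none"}

def profile_categorical(values):
    """Summarize one non-empty categorical column (None or "" counts as missing)."""
    n = len(values)
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    mode, mode_n = counts.most_common(1)[0] if counts else (None, 0)
    placeholders = sum(c for v, c in counts.items()
                       if v.strip().lower() in MISSING_TEXT)
    return {
        "levels": len(counts),                          # distinct non-missing values
        "pct_missing": 100 * (n - len(present)) / n,    # true missing values
        "mode": mode,
        "pct_mode": 100 * mode_n / n,
        "pct_placeholder": 100 * placeholders / n,      # 'unknown'-style stand-ins
    }

# Hypothetical column, illustrative values only.
area = ["urban", "urban", "", "rural", "unknown",
        "urban", None, "urban", "rural", "urban"]
summary = profile_categorical(area)
print(summary)
```

Note that the placeholder value 'unknown' is counted as a level of its own, which is exactly why such replaced values deserve a separate column in your table.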

For the numeric variables (‘Interval Variables’ in Viya), prepare another table listing those.  For each numeric variable, note the percentage of values that are missing (NULL).  Do you think that missing values are a problem?  Also, if you combine the proportions of missing values across both the categorical and numeric variables, is there more cause for concern about the missing data?  Missing data is a huge problem in data mining, and it needs to be addressed over the entire data set.
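The reason to look at missing values across the entire data set, and not one variable at a time, can be seen in a small Python sketch.  The column names and rows below are hypothetical, and the sketch assumes rows with any missing value would be dropped (complete-case analysis), which is only one of several ways a model may handle them.

```python
def missing_report(rows, columns):
    """Return (percent missing per column, percent of rows with any missing value)."""
    n = len(rows)
    per_col = {c: 100 * sum(r[c] is None for r in rows) / n for c in columns}
    incomplete = sum(any(r[c] is None for c in columns) for r in rows)
    return per_col, 100 * incomplete / n

# Hypothetical rows: each column is missing in only one row out of four.
rows = [
    {"tenure": 12,   "income": None,  "region": "east"},
    {"tenure": None, "income": 52000, "region": "west"},
    {"tenure": 30,   "income": 61000, "region": None},
    {"tenure": 7,    "income": 48000, "region": "east"},
]
per_col, pct_rows_incomplete = missing_report(rows, ["tenure", "income", "region"])
print(per_col)
print(pct_rows_incomplete)
```

Each column looks acceptable on its own, yet most rows contain at least one missing value, so the loss compounds when the variables are used together.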

Does it appear that missing values in any of the numeric variables have been replaced with a number of some sort?  Replacing missing values with a zero (0) or -1 is common practice.  Be careful with variables that count or measure something: they can be heavily populated with zeroes but are still good as inputs.
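One rough way to spot a numeric stand-in for missing values is to check how often the usual candidates occur.  The sketch below is only a screening aid under stated assumptions: the function name sentinel_shares and the variable months_since_upgrade are hypothetical, and a high share of zeroes in a genuine count variable is not evidence of replacement.

```python
from collections import Counter

def sentinel_shares(values, candidates=(0, -1)):
    """Percent of values equal to each candidate stand-in for 'missing'."""
    n = len(values)
    counts = Counter(values)
    return {c: 100 * counts[c] / n for c in candidates}

# Hypothetical variable: a duration cannot be negative, so -1 is suspicious.
months_since_upgrade = [-1, 14, 3, -1, 27, -1, 9, -1, 5, 11]
shares = sentinel_shares(months_since_upgrade)
print(shares)
```

A value such as -1 in a variable that cannot logically be negative is a stronger signal than a pile of zeroes, so check the data dictionary before deciding.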

Also, in that table, note the shapes of the histograms produced by Viya for each of the variables: are they left-skewed, right-skewed, normal, quasi-normal, uniform, etc.?  Also, suggest a transformation for each variable that needs one.
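To back up a visual judgment of a histogram, a skewness statistic can be computed before and after a candidate transformation.  The sketch below uses the population skewness (the mean of cubed z-scores) and a log transform via math.log1p; the values in charges are made up, and the log is only one possible choice for a right-skewed, non-negative variable.

```python
import math
import statistics

def skewness(values):
    """Population skewness: positive means a long right tail, negative a long left tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical right-skewed variable with one large value.
charges = [1, 2, 2, 3, 3, 3, 4, 5, 8, 40]

raw = skewness(charges)
logged = skewness([math.log1p(v) for v in charges])  # log(1 + v) tolerates zeroes
print(round(raw, 2), round(logged, 2))
```

In this toy case the log transform reduces the skewness without removing it entirely, which is a reminder to re-inspect the histogram after transforming.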

End Part A – data exploration
