ITEC 320 Assignment 2

Please submit your assignment via Canvas, as a single Word file. 

Import the InternetUsers dataset into RapidMiner and save it in the data folder of your Local Repository. (This data is the Chapter 3 dataset from the textbook.) Do not change any of the formatting of the dataset when importing.

1. The analysts working with this data do not need all of these variables, and are mainly interested in experienced users who regularly use Google.  Build a process in RapidMiner that does the following:

- Selects only the attributes in columns A through I (Gender, Race, Birth_Year, Marital_Status, Years_on_Internet, Hours_Per_Day, Preferred_Browser, Preferred_Search_Engine, Preferred_Email)
- Removes all rows with Years_on_Internet less than 5
- Keeps only rows with Preferred_Search_Engine = Google

Show a screenshot of the Process panel.  (You do not need to include the Parameters panel.)
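For reference, the logic of the three steps in question 1 can be sketched outside RapidMiner. The Python below is only an illustration, not the RapidMiner process itself: the sample rows, the Extra_Column attribute, and the select_and_filter helper are hypothetical and are not taken from the InternetUsers dataset. It assumes "less than 5" is strict, so a row with exactly 5 years is kept.

```python
# The nine attributes named in question 1 (columns A through I).
KEEP = [
    "Gender", "Race", "Birth_Year", "Marital_Status", "Years_on_Internet",
    "Hours_Per_Day", "Preferred_Browser", "Preferred_Search_Engine",
    "Preferred_Email",
]

# Toy rows invented for illustration only; NOT real InternetUsers data.
# "Extra_Column" is a hypothetical stand-in for the attributes beyond column I.
COLUMNS = KEEP + ["Extra_Column"]
raw = [
    ("F", "White", 1975, "Married", 8, 2.5, "Firefox", "Google", "Gmail", "x"),
    ("M", "Asian", 1988, "Single", 3, 4.0, "Chrome", "Google", "Yahoo", "x"),
    ("F", "Black", 1969, "Married", 10, 1.0, "Safari", "Bing", "Gmail", "x"),
    ("M", "White", 1990, "Single", 5, 6.0, "Chrome", "Google", "Gmail", "x"),
]
rows = [dict(zip(COLUMNS, values)) for values in raw]


def select_and_filter(rows, keep=KEEP, min_years=5, engine="Google"):
    """Apply the two row filters, then keep only the listed attributes."""
    result = []
    for row in rows:
        # Remove rows with Years_on_Internet less than min_years.
        if row["Years_on_Internet"] < min_years:
            continue
        # Keep only rows whose preferred search engine matches.
        if row["Preferred_Search_Engine"] != engine:
            continue
        # Attribute selection: drop everything not listed in `keep`.
        result.append({name: row[name] for name in keep})
    return result


filtered = select_and_filter(rows)
```

Because the two filters are independent conditions on each row, the order in which they are applied does not change which rows remain.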

2. Run your process from question 1.  Show a screenshot of the Statistics output in the Results view, with Hours_Per_Day expanded (that is, with the histogram and deviation visible for the Hours_Per_Day attribute).

3. Using all of the rows (“examples”) in the dataset, generate a correlation matrix for the attributes: Birth_Year, Years_on_Internet, Hours_Per_Day.  Show a screenshot of the results.
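As background for question 3, the sketch below shows how a correlation matrix can be computed by hand, assuming the matrix reports the Pearson correlation coefficient for each pair of attributes. The pearson and correlation_matrix helpers are hypothetical names, and the numbers are invented toy values, not the InternetUsers data, so the coefficients they produce say nothing about the real results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of a with b."""
    names = list(columns)
    return {
        a: {b: pearson(columns[a], columns[b]) for b in names}
        for a in names
    }


# Toy values invented for illustration only; NOT real InternetUsers data.
columns = {
    "Birth_Year": [1960, 1970, 1980, 1990],
    "Years_on_Internet": [12, 10, 8, 5],
    "Hours_Per_Day": [2, 3, 5, 6],
}
matrix = correlation_matrix(columns)
```

Each coefficient lies between -1 and 1. The diagonal is always 1 because every attribute is perfectly correlated with itself, and the matrix is symmetric, so only the three off-diagonal pairs carry information.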

4. Briefly explain the results from the previous question.  What does this matrix tell us about the relationships between these three attributes?  (Note that a higher Birth_Year means the person is younger.)

 

