2024.04.22

Midterm Exam: Web Scraping and FinTech Data Analysis

NOTE: Use all available resources. Solutions to most of the coding problems can be found on the Internet, so search for them if you get stuck.

NOTE 2: Do your best to complete the task.

PART I. (10 points)

Complete the task below. Copy your code onto the answer sheet.

Visit www.freelancer.com and choose any category. Extract the titles, descriptions, tags, and prices from the first page of search results. Create four corresponding vectors named ‘titles’, ‘descriptions’, ‘tags’, and ‘prices’, with ‘prices’ formatted as a numeric vector. Finally, combine these vectors into a single data frame named ‘freelancer’.

Please note that you may need to clean the data and fill in NA values so that the vectors are of equal length before merging them into a single data frame.
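The cleaning step can be sketched in base R as follows. This is only an illustration under stated assumptions: the raw strings are invented stand-ins for scraped text (the scraping itself, for example with a package such as rvest, is not shown), and `extract_first_number` and `pad_to` are hypothetical helper names, not part of any library.

```r
# Hypothetical raw values standing in for text scraped from a results page
titles       <- c("Build a website", "Design a logo", "Write an R script")
descriptions <- c("Need a small shop site", "Logo for a cafe")  # one is missing
tags         <- c("PHP, HTML", "Illustrator", "R, Statistics")
raw_prices   <- c("$250 ", "$30 - $250", "N/A")

# Keep the first number found in each string; strings without digits become NA
extract_first_number <- function(x) {
  m <- regexpr("[0-9][0-9,]*(\\.[0-9]+)?", x)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)
  as.numeric(gsub(",", "", out))
}
prices <- extract_first_number(raw_prices)

# Extend a vector to length n; the added positions are filled with NA
pad_to <- function(x, n) {
  length(x) <- n
  x
}

n <- max(length(titles), length(descriptions), length(tags), length(prices))
freelancer <- data.frame(
  titles       = pad_to(titles, n),
  descriptions = pad_to(descriptions, n),
  tags         = pad_to(tags, n),
  prices       = pad_to(prices, n),
  stringsAsFactors = FALSE
)
```

Padding at the end only lines the rows up correctly if the missing items really are the last ones. Extracting all four fields listing by listing, and recording NA where a field is absent, is usually the safer design.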

PART II. (15 points)

Analyze the dataset named ‘fintech’, obtained from a prominent FinTech company based in Hong Kong. This company specializes in lending money to loan applicants. Upon receiving an application, the company's decision engine automatically classifies each loan application into one of three categories: ‘approve’, ‘reject’, or ‘manual review’. For applications marked as ‘manual review’, reviewers make a subjective assessment and reclassify each case as either ‘approve’ or ‘reject’.

Field description:

Field	Description
id	Loan application id
loan_amount	Requested amount in HKD
tenor	Requested repayment periods in months
age
month_of_service	The employment period of current job
residential_status	Own, Rent, Others
monthly_repayment	Monthly repayment for other existing loans
monthly_income	Average monthly income for the last three months
self-employed
bankrupted	Whether the applicant has a record of bankruptcy
housewife
currently_employed	Whether the applicant is employed in a full-time job
channel	Loan application channel
language	tc (traditional Chinese), EN (English)
manual_review	t = manually reviewed, f = otherwise
approved	t = approved, f = rejected
manual_approved	t = manually approved, f = otherwise
credit_score	Higher is better
friends_facebook	No. of Facebook friends (0 indicates either no friend or the account was not provided)
time_application	Time of the day when the application was submitted
location	The location where the application was submitted
default	Whether the repayment was overdue as of June 2017

Q1. What is the average monthly income of the whole sample? What is the average monthly income of the currently employed? (1 point)

Q2. Generate the histogram of “loan_amount.” Can you find any interesting patterns? Can you guess the reason why the graph has such a shape? (1 point)

Q3. Replace the value of “friends_facebook” with NA if the value is 0. What is the average number of Facebook friends of those who have provided their Facebook accounts? (2 points)
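A minimal sketch of the recoding described in Q3, using an invented five-row data frame in place of the real dataset (the values are assumptions for illustration only):

```r
# Toy stand-in for the exam dataset; the values are invented
fintech.df <- data.frame(friends_facebook = c(0, 120, 0, 300, 45))

# Recode 0 as NA; which() keeps the assignment safe if NAs are already present
zero_rows <- which(fintech.df$friends_facebook == 0)
fintech.df$friends_facebook[zero_rows] <- NA

# Average over applicants with a non-missing friend count
mean_friends <- mean(fintech.df$friends_facebook, na.rm = TRUE)
```

Without `na.rm = TRUE`, `mean()` returns NA as soon as the column contains a missing value.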

Q4. Generate the scatterplot of “month_of_service” and “credit_score”. Can you find any relationship between them? What about “monthly_income” and “credit_score”? Confirm each relationship with a correlation test. (2 points)
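One possible way to pair a scatterplot with a correlation test is sketched below on invented values. The Pearson test (the default of `cor.test`) is an assumption here; a rank-based alternative such as `method = "spearman"` may suit skewed variables better.

```r
# Toy stand-in for the exam dataset; the values are invented
fintech.df <- data.frame(
  month_of_service = c(3, 12, 24, 36, 60, 84, 120, 6),
  credit_score     = c(520, 560, 600, 610, 680, 700, 720, 540)
)

# Draw the scatterplot only in an interactive session
if (interactive()) {
  plot(fintech.df$month_of_service, fintech.df$credit_score,
       xlab = "month_of_service", ylab = "credit_score")
}

# Pearson correlation test: estimate, confidence interval, and p-value
res <- cor.test(fintech.df$month_of_service, fintech.df$credit_score)
r_estimate <- unname(res$estimate)
p_value    <- res$p.value
```

The same two calls can be repeated with “monthly_income” in place of “month_of_service”.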

Q5. Make a new variable, named “automatic_approved,” which has the value “t” if approved by the decision engine, “f” if rejected by the decision engine, and “NA” if reviewed manually. How many cases are approved or rejected by the decision engine? How many are classified as “manual review”? (2 points)

Hint:

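# Create the column first so that manually reviewed cases stay NA
fintech.df$automatic_approved <- NA

# A missing manual_approved means the decision engine settled the case
fintech.df$automatic_approved <- ifelse(fintech.df$approved == "t" & is.na(fintech.df$manual_approved), "t", fintech.df$automatic_approved)
fintech.df$automatic_approved <- ifelse(fintech.df$approved == "f" & is.na(fintech.df$manual_approved), "f", fintech.df$automatic_approved)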
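Once the variable exists, the counts asked for in Q5 can be read off a frequency table. The sketch below uses an invented five-row data frame and assumes, as the hint implies, that “manual_approved” is NA for cases the decision engine settled on its own.

```r
# Toy stand-in for the exam dataset; the values are invented
fintech.df <- data.frame(
  approved        = c("t", "f", "t", "f", "t"),
  manual_approved = c(NA, NA, "t", "f", NA),
  stringsAsFactors = FALSE
)

# Engine-decided cases keep their outcome; manually reviewed cases become NA
engine_decided <- is.na(fintech.df$manual_approved)
fintech.df$automatic_approved <- ifelse(engine_decided, fintech.df$approved, NA)

# useNA = "ifany" adds the NA (manual review) group to the table
counts <- table(fintech.df$automatic_approved, useNA = "ifany")

n_auto_approved <- sum(fintech.df$automatic_approved == "t", na.rm = TRUE)
n_auto_rejected <- sum(fintech.df$automatic_approved == "f", na.rm = TRUE)
n_manual_review <- sum(is.na(fintech.df$automatic_approved))
```

By default `table()` silently drops NA values, which is why the `useNA` argument matters for the manual review count.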

Q6. Compare the automatically approved cases and the automatically rejected cases. Conduct statistical tests on variables available in the dataset to answer the following subquestions. (5 points)

1) Are they different in “loan_amount”?

2) Are they different in “tenor”?

3) Are they different in “age”?

4) Are they different in “month_of_service”?

5) Are they different in “residential_status”?

6) Are they different in “monthly_income”?

7) Are they different in “bankrupted”?

8) Are they different in “currently_employed”?

9) Are they different in “channel”?

10) Are they different in “language”?

11) Are they different in “credit_score”?

12) Are they different in “friends_facebook”?

13) Are they different in “location_application”?
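The comparisons in Q6 can be sketched as below, on simulated data. The choice of tests is an assumption, not a requirement of the exam: a Welch two-sample t-test for a numeric variable and a chi-squared test of independence for a categorical one. The data frame `toy` and all of its values are invented.

```r
set.seed(1)  # make the simulated values reproducible

# Simulated stand-in: 30 automatically approved and 30 automatically rejected cases
toy <- data.frame(
  automatic_approved = factor(rep(c("t", "f"), each = 30)),
  loan_amount = c(rnorm(30, mean = 50000, sd = 5000),
                  rnorm(30, mean = 80000, sd = 5000)),
  residential_status = c(rep(c("Own", "Rent", "Others"), times = c(15, 10, 5)),
                         rep(c("Own", "Rent", "Others"), times = c(5, 15, 10)))
)

# Numeric variable: Welch two-sample t-test of the group means
num_test <- t.test(loan_amount ~ automatic_approved, data = toy)

# Categorical variable: chi-squared test on the 2 x 3 contingency table
status_table <- table(toy$automatic_approved, toy$residential_status)
cat_test <- chisq.test(status_table)

p_loan_amount <- num_test$p.value
p_residential <- cat_test$p.value
```

On the real data, rows with NA in “automatic_approved” (the manually reviewed cases) would need to be excluded first. For strongly skewed numeric variables, `wilcox.test` is a common alternative to the t-test.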

Q7. Based on the analysis results above, describe the logic the decision engine appears to use when it classifies an application as “approve”. (2 points)

Guideline

Submit 1) your answer sheet, 2) the R code used for the analysis, and 3) the signed declaration as a single zip file to the LearnUs system. Please include your student number and name in the header of the answer sheet. Format your answer sheet as follows: Times New Roman, 12-point font, double-spaced (not 1.5), with 1-inch margins all around on 8.5 x 11-inch (or A4) paper.