# PSYM201 Assignment: R Programming

PSYM201 Advanced Statistics

Large Data Set Analysis

Purpose

This assignment represents 20% of your overall mark for the module. The aim is to develop your skills in

analysing the kind of big, messy data sets that you are likely to encounter in the course of your own

research, and in reporting the results of your analyses in a clear, concise format so that others can

understand what you did. These are essential skills that you will need for your research dissertation.

Guidelines

You have a choice of three data sets to analyse, introduced below. Note that you should only report your

analysis of one of these data sets. The three data sets are roughly aligned with the three MSc research

programmes, though you are free to analyse whichever one you prefer. For each data set we have

provided five research questions for you to answer (using appropriate statistical analyses), but you are

also expected to come up with at least two additional analyses of your own.

The results of your analyses, including an explanation of the statistical methods you used, should be

written up in the format of a ‘Results’ section from a scientific paper. Two real examples of this, from

different scientific journals, are given at the end of this document. You are advised to consult the practical

answers and the literature for other examples of how to report the particular analyses you have chosen.

The text of your report must not exceed two sides of A4; note that any text beyond this page limit will

be ignored by the marker. You may find it helpful to include a ‘Statistical analysis’ paragraph at the start

outlining your general methods of analysis (see examples); otherwise you should incorporate these

details in the rest of the text. At the end of the text you are encouraged to present figures and tables to

illustrate the main results of interest, up to a maximum of 5 figures and 5 tables (note that figures and

tables and their associated captions are not included in the page count). Finally, you must copy and paste

your R code (unlimited length) at the end of the document, so that your analyses can be reproduced.

The general guidance is to be clear and concise. Provide the minimum amount of detail for the reader to

understand how you analysed the data, so that they could reproduce your analysis using the same data

set. Here are some more detailed recommendations, adapted from the journal Animal Behaviour:

The text should complement material given in tables or figures but should not directly repeat it. Give full

details of statistical analysis either in the text or in tables or figure captions. Include the type of test, the

precise data to which it was applied, the value of the relevant statistic, the sample size and/or degrees of

freedom, and the probability level.

Descriptive statistics can be given as means and standard errors/standard deviations or as medians and

interquartile ranges/confidence limits, with their associated sample sizes. For significance tests, give the

name of the test, the test statistic and its value, the degrees of freedom or sample size (whichever is the

convention for the test) and the P value. The exact format for presenting statistics (e.g. APA style, Animal

Behaviour style) is up to you, but you must be consistent.

P values should be quoted as an exact probability value wherever possible, rather than relative to a

threshold significance value (e.g. P = 0.468 rather than P > 0.05). Where data have been transformed for

parametric significance tests, the nature of the transformation (e.g. log, square-root, logit) and the

reason for its selection should be stated.
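As a minimal sketch of these reporting recommendations, the R code below log-transforms a right-skewed variable and prints the test statistic, degrees of freedom and an exact P value. The data are simulated and the names `dat`, `group`, `y` and `format_p` are illustrative assumptions, not part of any of the assignment data sets.

```r
# Simulated, right-skewed data (hypothetical; not one of the assignment data sets)
set.seed(1)
dat <- data.frame(
  group = rep(c("A", "B"), each = 30),
  y     = c(rlnorm(30, meanlog = 3.0, sdlog = 0.5),
            rlnorm(30, meanlog = 3.3, sdlog = 0.5))
)

# Log transformation, chosen here because the raw values are right-skewed
dat$log_y <- log(dat$y)

# Two-sample t test on the transformed values (equal variances assumed)
fit <- t.test(log_y ~ group, data = dat, var.equal = TRUE)

# Quote an exact P value; fall back to a threshold only when it is very small
format_p <- function(p) {
  if (p < 0.001) "P < 0.001" else sprintf("P = %.3f", p)
}

report <- sprintf("t%d = %.3f, %s",
                  as.integer(fit$parameter), fit$statistic,
                  format_p(fit$p.value))
print(report)
```

In a written report the same sentence would also name the test, state the transformation and say why it was chosen.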

Ensure that all tables and figures are referred to in the text and that they are numbered consecutively, in

the order that they are first mentioned. Each figure and each table should have a caption, comprising a

brief title and a description of what is shown. Explain all symbols and abbreviations used.

Remember to pay attention to the golden rules of statistics (Lecture 2):

- plot your data
- use an appropriate analysis for the type of data (continuous? count? binary?)
- don’t overcomplicate your analyses
- think about psychological significance, not just statistical significance
- beware of non-independence

and the golden rules of graphs (Lecture 4):

- try to capture the patterns in your data as simply as possible
- plot the data, not just a fitted model
- show not just average values, but also some indication of variation (e.g. error bars)
- indicate connections between measurements that are paired/repeated
- ensure that axis ranges are not misleading
- make it visually appealing, but avoid 3D
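One possible way to follow the golden rules of graphs in base R is sketched below: the raw data points are plotted, with group means and standard-error bars overlaid. The data are simulated, and the names `d`, `group` and `score` are assumptions made purely for illustration.

```r
# Simulated scores for two groups (hypothetical data for illustration only)
set.seed(3)
d <- data.frame(
  group = rep(c("control", "treated"), each = 20),
  score = c(rnorm(20, mean = 50, sd = 8), rnorm(20, mean = 56, sd = 8))
)

# Group means and standard errors of the mean
means <- tapply(d$score, d$group, mean)
ses   <- tapply(d$score, d$group, function(x) sd(x) / sqrt(length(x)))

# Plot the raw data (jittered), not just a summary
stripchart(score ~ group, data = d, vertical = TRUE, method = "jitter",
           pch = 16, col = "grey60", xlab = "Group", ylab = "Score")

# Overlay the means with error bars showing +/- 1 standard error
points(1:2, means, pch = 18, cex = 2)
arrows(1:2, means - ses, 1:2, means + ses,
       angle = 90, code = 3, length = 0.05)
```

For paired or repeated measurements, lines joining each individual's points (for example with `segments()`) would show the connection between them.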

Marking criteria

There are two components of your mark, with equal weighting: (1) the technical correctness of your

analyses; (2) the clarity with which you communicate your statistical approach and the results.

Introduction to the data sets

A brief description of each data set is given on the next three pages, along with five questions to get you

started. These questions, along with a few (at least two) additional questions of your own, should be

addressed (using statistical analysis) in your report. Please note that there may be more than one way

of answering each question, so don’t worry if you discover that some of your colleagues are analysing

the data in a different way. Remember to check the assumptions of your analyses and transform the

data if necessary.
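Checking assumptions could look something like the sketch below, which compares the residuals of a linear model fitted to raw and to log-transformed values. The data are simulated and the variable names are hypothetical; which diagnostics are appropriate depends on the analysis you choose.

```r
# Simulated response that is right-skewed on the raw scale (hypothetical data)
set.seed(11)
x <- runif(100, min = 0, max = 10)
y <- exp(1 + 0.2 * x + rnorm(100, mean = 0, sd = 0.8))

fit_raw <- lm(y ~ x)        # model on the raw scale
fit_log <- lm(log(y) ~ x)   # model after log transformation

# Shapiro-Wilk test of residual normality for each model
shapiro_raw <- shapiro.test(residuals(fit_raw))
shapiro_log <- shapiro.test(residuals(fit_log))
print(c(raw = unname(shapiro_raw$statistic),
        log = unname(shapiro_log$statistic)))

# Visual checks: residuals vs fitted values (homogeneity of variance)
# and a normal Q-Q plot (normality)
op <- par(mfrow = c(1, 2))
plot(fitted(fit_log), residuals(fit_log),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(residuals(fit_log))
qqline(residuals(fit_log))
par(op)
```

Plots are usually more informative than a formal normality test alone, particularly with large samples.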

Data set 1: MATHS

big-MATHS-data.csv

Variable descriptions in maths-variables.xlsx

These data are from a questionnaire study investigating the relationships between mathematics anxiety and mathematics performance. A group of 152 adults were asked to recall their experiences of learning mathematics at school, indicating to what extent they agreed with each of 15 statements on a 7-point Likert scale (from 1 = strongly disagree to 7 = strongly agree). The items measured the respondents’ self-perceived ability in maths (Q1–Q3) and whether doing well, doing badly or not trying their best in maths led to feelings of pride, stupidity or disappointment from the perspective of themselves (Q4–Q6), their friends (Q7–Q9), their parents/guardians (Q10–Q12) or their teachers (Q13–Q15). The data also include a measure of how much the respondents engaged with mathematics at school (mathsEngagement), the gender of their most memorable maths teacher (TeacherGender), plus various demographic variables: their age, gender, highest educational qualification, ethnicity, nationality, school type (state/private) and religion.

Starting questions:

1. Were respondents’ experiences of learning mathematics at school related to their gender, ethnicity or religious affiliation?

2. Did their experiences of learning mathematics predict their level of engagement with mathematics?

3. Highest educational qualification is significantly predicted by Q13 (“When I did well in maths at school my teacher was proud of me”). Is this relationship mediated by engagement with mathematics?

4. Did teacher behaviour (Q13–Q15) differ depending on their own gender or the pupil’s gender?

5. Pupils with a maths engagement score of 24 or less are considered unmotivated. Which (if any) of the demographic variables predict whether someone falls into this category?
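For a binary outcome such as the one in question 5, one reasonable approach (among several) is a logistic regression. The sketch below uses simulated data; `mathsEngagement` is named in the data description above, but the other column names (`age`, `gender`, `unmotivated`) are assumptions and should be checked against maths-variables.xlsx.

```r
# Simulated stand-in for the MATHS data (hypothetical values and column names)
set.seed(42)
n <- 152
maths <- data.frame(
  mathsEngagement = round(runif(n, min = 10, max = 50)),
  age             = round(runif(n, min = 18, max = 65)),
  gender          = factor(sample(c("female", "male"), n, replace = TRUE))
)

# Binary outcome: 1 if the engagement score is 24 or less, otherwise 0
maths$unmotivated <- as.integer(maths$mathsEngagement <= 24)

# Logistic regression (binomial GLM) with demographic predictors
fit <- glm(unmotivated ~ age + gender, family = binomial, data = maths)

# Estimates, standard errors, z values and exact P values
print(summary(fit)$coefficients)
```

With the real data, the predictors would be the demographic variables listed in the description, and categorical variables with sparse levels may need to be recoded before fitting.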

Data set 2: PHEASANTS

big-PHEASANT-data.csv

Variable descriptions in pheasant-variables.xlsx

These data are from an ongoing study of pheasant ecology and cognition by Dr Joah Madden and his research team (http://pecexeter.weebly.com/). Several hundred pheasants (Phasianus colchicus) were sexed at 1 day old and split into 30 groups of about 30 pheasants each, with each group housed in a separate pen. The ratio of males to females in each group was experimentally manipulated to create the following three rearing treatments:

- 10 ‘female-biased’ groups in which there were, on average, two females per male;
- 10 ‘equal’ groups with a balanced sex ratio (equal numbers of males and females);
- 10 ‘male-biased’ groups in which there were, on average, two males per female.

(In practice the intended rearing ratios were not entirely accurate, due to unpredictable bird deaths and errors in sex identification on Day 1; but for your analysis please treat the sex ratios as 33.3%, 50% and 66.7% males respectively.) All of the birds were raised under identical housing and feeding conditions for the first 8 weeks of life. At 8 weeks, a variety of measures were taken including the bird’s mass (in grams) and the length (in mm) of its wing, tail and tarsus. For males, the colour of the wattle was also scored on a scale from 1 (pale red) to 3 (deep red). The birds were then tagged and released into woodland in mid-Devon. After 6 months, the research team recorded whether each bird was still alive (from direct observation or camera traps) or had not been sighted. Their expectation was that the type of rearing the bird received would have a direct influence on its general condition and, specifically, on certain sex- and testosterone-linked measures such as weight and tail length.

Starting questions:

1. Is tail length at 8 weeks affected by the rearing treatment?

2. Does the effect of rearing treatment on tail length differ between males and females?

3. Do differences in weight between treatments explain the observed variation in tail length?

4. Is wattle colour affected by rearing treatment in males?

5. Does survival differ between males and females?
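Because birds reared in the same pen are not independent of one another, one simple way to respect the design is to analyse pen means rather than individual birds. The sketch below illustrates this on simulated data; the column names `pen`, `treatment` and `tail` are assumptions, so check pheasant-variables.xlsx for the real ones. A mixed-effects model with pen as a random effect (for example using the lme4 package) would be an alternative.

```r
# Simulated stand-in for the PHEASANT data: 30 pens of 30 birds each
# (hypothetical values and column names)
set.seed(7)
pens <- data.frame(
  pen        = factor(1:30),
  treatment  = factor(rep(c("female_biased", "equal", "male_biased"),
                          each = 10)),
  pen_effect = rnorm(30, mean = 0, sd = 5)   # shared effect within each pen
)
birds <- pens[rep(1:30, each = 30), ]
birds$tail <- 200 + birds$pen_effect + rnorm(nrow(birds), mean = 0, sd = 15)

# Collapse to one mean tail length per pen, so each pen contributes one value
pen_means <- aggregate(tail ~ pen + treatment, data = birds, FUN = mean)

# One-way ANOVA on the pen means (30 pens, 3 treatments: df = 2 and 27)
fit <- lm(tail ~ treatment, data = pen_means)
print(anova(fit))
```

Analysing pen means is conservative but easy to explain; it discards within-pen information that a mixed model would use.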

Data set 3: OUTGROUP

big-OUTGROUP-data.csv

Variable descriptions in outgroup-variables.xlsx

These data are from a study of the relationships between out-group discrimination and self-esteem. The self-esteem hypothesis for discrimination proposes that by establishing positive distinctness for the in-group, in-group members are establishing positive self-esteem for themselves. To investigate this, 205 participants from the UK (n = 69), the USA (n = 68) and Australia (n = 68) were asked to act as employers recruiting for a position. They were each given 12 résumés: 4 résumés from each of the same three countries, with a mixture of male and female applicants. The résumés were presented in the same random order to each participant. Participants were asked to choose which applicant they would prefer to employ (first_choice). They were also asked to decide what salary they would award to all 12 applicants (standardised between 10 and 120) and whether they would impose a probationary period of 6 months for each applicant, which would make it easier to sack the applicant if they turned out not to be good at the job. Each participant also answered 3 questions relating to their self-esteem on a scale from 1 (lowest) to 5 (highest), both before (selfestT1) and after (selfestT2) reading the résumés and making their decisions.

Starting questions:

1. Did self-esteem before the task differ between genders and nationalities?

2. Were participants biased towards their in-group(s) in their first choice of applicant?

3. Did discriminating in favour of in-groups lead to an increase in self-esteem ratings?

4. Did the effect of discriminating on self-esteem differ between genders or nationalities?

5. Were participants more likely to impose a probationary period on out-group members?
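For question 2, if the in-group is defined by nationality, 4 of the 12 résumés come from each participant's own country, so an unbiased participant would pick an in-group applicant with probability 1/3. One possible analysis is an exact binomial test against that value, sketched below on simulated choices. The vector `chose_in_group` is hypothetical and would have to be derived from first_choice and the participant's nationality in the real data.

```r
# Simulated indicator of whether each participant's first choice was from
# their own country (hypothetical data; 1 = in-group, 0 = out-group)
set.seed(5)
n_participants <- 205
chose_in_group <- rbinom(n_participants, size = 1, prob = 0.45)

# Exact binomial test against the chance level of 4/12 = 1/3
bias_test <- binom.test(sum(chose_in_group), n_participants, p = 1/3)
print(bias_test)
```

If the in-group is instead defined by gender, or by both gender and nationality, the null probability would need to be recalculated from the composition of the 12 résumés.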

Example reports

Example 1

Taken from: Liu et al. (2017). Gallery and acoustic traits related to female body size mediate male mate

choice in a bark beetle. Animal Behaviour. doi: 10.1016/j.anbehav.2017.01.002.

Statistical Analysis

Data were analysed using SPSS. Residuals were examined for assumptions of normality and homogeneity of

variance. Comparisons of body weight between trapped and excavated insects for each sex and the data from

sound playback were analysed by one-way ANOVA. Time spent in the tunnel during the dual-choice

experiment was analysed with a two-way ANOVA, with dust and tunnel size as fixed effects. Body weights

of excavated females and males, fecundity of large and small females and time spent in large and small tunnels

for males in no-choice assays were analysed using paired t tests. Data from mate selection tests were analysed

using a chi-square test.

RESULTS

Male Mate Selection and Fitness Assay

Field observation

In the field, the females trapped in flight weighed on average 27.7 ± 0.2 mg (range 10.3–61.2 mg). Females were approximately 2 mg heavier than the trapped males (25.9 ± 0.2 mg, range 9.3–57.9 mg; ANOVA: $F_{1,2076}$ = 28.105, P < 0.001; Fig. 1). In the 37 pairs excavated from galleries, only 11 males were paired with females that were smaller than themselves ($\chi^2_1$ = 6.081, P = 0.014). The excavated females (mean ± SE: 34.2 ± 2.1 mg, range 14.4–63.0 mg) were approximately 8 mg heavier than the males with which they were paired (26.3 ± 1.5 mg, range 13.8–49.2 mg; $t_{36}$ = 3.913, P < 0.0001; Fig. 1), and body weight of the paired males was significantly correlated with that of the paired females ($F_{1,35}$ = 6.616, P = 0.015; Fig. 2). Furthermore, paired excavated females were significantly heavier than trapped females ($F_{1,1058}$ = 23.497, P < 0.0001). In contrast, body weights of the excavated and trapped males did not differ ($F_{1,1082}$ = 0.155, P = 0.694; Fig. 1).

Mate selection behaviour

Dual-choice experiments in the laboratory confirmed the phenomenon observed in the field: 26 of 32 (81%) and 19 of 23 (82%) males from large and small male treatments, respectively, selected the gallery with large females more often than the gallery with smaller females (large male: $\chi^2_1$ = 12.5, P < 0.0001; small male: $\chi^2_1$ = 9.783, P = 0.002).

Fecundity

The chosen large females were more fecund than the unchosen small females ($t_{21}$ = 4.620, P < 0.0001; Fig. 3).

Factors Mediating Mate Selection

Female size

When two females were placed at opposite ends of tunnels of the same size, males entered the tunnel of the larger female more often than that of the smaller female ($\chi^2_1$ = 11.025, P = 0.001; Fig. 4a).

Tunnel size

The binary choice experiment showed that males entered large tunnels more frequently than small tunnels, regardless of whether dust made by beetles was present or absent (dust present: $\chi^2_1$ = 43.548, P < 0.0001; dust absent: $\chi^2_1$ = 11.250, P = 0.001; Fig. 4b). Time spent in the tunnel was affected by dust and tunnel size (dust: $F_{1,256}$ = 6.008, P = 0.015; tunnel size: $F_{1,256}$ = 11.830, P = 0.001; dust*tunnel size: $F_{1,256}$ = 0.004, P = 0.951). Males walked through large tunnels more quickly, based on the amount of time they spent in each treatment (dust present: $F_{1,131}$ = 5.476, P = 0.021; dust absent: $F_{1,78}$ = 5.326, P = 0.024), and males spent much more time walking through the large tunnel when dust made by beetles was present (large tunnel: $F_{1,153}$ = 5.522, P = 0.020; small tunnel: $F_{1,56}$ = 1.056, P = 0.308; Fig. 4c). Similarly, in the no-choice experiment, males walked more rapidly through the larger tunnel (19 s) than through the small tunnel (167 s; $t_{39}$ = −7.271, P < 0.0001; Fig. 4d).

Figure 1. Body size comparison of each sex trapped in the field and excavated from galleries. Bars indicate

mean ± standard errors. *P ≤ 0.05 (ANOVA).

Figure 2. Regression analysis between body weights of paired males and females.

Figure 3. Fecundity of females chosen and not chosen by males. Bars indicate mean ± standard errors. *P ≤ 0.05 (paired t test).

Figure 4. Influence of traits on mate selection by males. (a) Female size; (b) tunnel size; (c) time spent walking throughout the tunnel in dual-choice experiments; (d) time spent walking throughout the tunnel in no-choice experiments. Bars indicate mean ± standard errors. *P ≤ 0.05 (ANOVA).

Example 2

Taken from: Heisz et al. (2013). Females scan more than males: a potential mechanism for sex differences

in recognition memory. Psychological Science 24, 1157–1163. doi: 10.1177/0956797612468281.

Statistical Analysis

Omnibus analysis of variance (ANOVA) tests were conducted separately for the 4-day and 1-day experiments. We assessed sex differences in visual processing at encoding using ANOVAs conducted on the total number of fixations at encoding and the proportion of fixations to each of the inner features. Each analysis had a between-subjects factor of participants’ sex (female or male); proportion analyses had an additional within-subjects factor of feature (eyes, nose, or mouth). We also assessed sex differences in both memory performance and visual processing during the recognition test using ANOVAs conducted on d′ values and the number of fixations; each analysis had a between-subjects factor of participants’ sex (female or male) and a within-subjects factor of number of exposures (4-day experiment: two, four, or six; 1-day experiment: two, three, or four). All analyses for the 1-day experiment included an additional between-subjects factor of face sex (female or male), which did not significantly contribute to any of the effects.

RESULTS

Typical sex differences in memory are modulated by learning and test conditions

As a starting point, we examined whether the typical female advantage in recognition memory was affected by repeated exposure to the faces or by the context under which the new material was learned (4-day experiment vs. 1-day experiment). We observed the typical female recognition advantage over males in the 1-day experiment (Fig. 1), which was revealed by a significant main effect of participants’ sex, F(1, 56) = 4.90, p < .05, $\eta_p^2$ = .08. Repeated exposures increased memory performance in both experiments, as evidenced by a significant linear contrast of exposure in recognition (4-day experiment: F(1, 18) = 90.40, p < .001, $\eta_p^2$ = .83; 1-day experiment: F(1, 56) = 23.71, p < .001, $\eta_p^2$ = .29). For the 4-day experiment, in which faces were repeated across multiple days, the typical female advantage was observed only for faces with the least amount of prior exposure (i.e., two prior exposures), t(18) = 1.81 (one-tailed), p < .05, d = 0.85.

Females make more fixations at encoding, and these increased fixations produce the memory difference

During initial encoding, females made more fixations than males (see Fig. 2; 4-day experiment: F(1, 18) = 15.99, p < .01, $\eta_p^2$ = .47; 1-day experiment: F(1, 56) = 4.14, p < .05, $\eta_p^2$ = .07). There were no sex differences in the distribution of fixations across the inner features of the faces during initial encoding (Table 1), all Fs < 1. However, across repeated exposures, females directed a greater proportion of fixations to the eyes of female faces compared with the eyes of male faces; this pattern was not observed for male or female faces among male participants.

To examine the relation between the number of fixations made at encoding and subsequent recognition performance, we first conducted an ANOVA on mean recognition performance with a between-subjects factor of participants’ sex (collapsed across exposure and experiment) and observed a main effect, which demonstrated the typical female advantage, F(1, 78) = 5.67, p < .05, $\eta_p^2$ = .07. We then included the number of fixations made at encoding as a covariate. Removing the influence of number of fixations at encoding eliminated the female advantage in recognition memory, F(1, 77) = 2.28, n.s.

The relation between fixations at encoding and memory is a general individual difference

Ignoring the factor of sex, we tested whether the observed relation between scanning during encoding and subsequent recognition memory reflected a general individual difference. Pearson product–moment correlation tests were conducted on mean number of fixations at encoding and subsequent recognition performance at test, collapsing across the factor of exposure (see Fig. 3). We observed a significant correlation between fixations at encoding and recognition-memory performance at test for the 4-day experiment, r(18) = .49, p < .05, $r^2$ = .23, and the 1-day experiment, r(58) = .38, p < .01, $r^2$ = .14.

Eye movements did not reveal sex differences in the effect of multiple exposures

During the recognition test, the number of previous exposures affected the number of fixations equally for females and males. Both females and males showed a progressive decrease in fixations to the faces across repeated exposures (see Table 2), as evidenced by the significant linear contrast of exposure (4-day experiment: F(1, 18) = 19.05, p < .001, $\eta_p^2$ = .51; 1-day experiment: F(1, 56) = 6.43, p < .05, $\eta_p^2$ = .10).

Fig. 1. Recognition-test performance for repeatedly presented faces as a function of the number of prior exposures and participants’ sex. Results are shown separately for the (a) 4-day experiment and (b) 1-day experiment. Error bars represent standard errors of the mean. Asterisks indicate significant differences between males and females (*p < .05).

Fig. 2. Mean number of fixations during the initial encoding of previously unfamiliar faces as a function of experiment and participants’ sex. Error bars represent standard errors of the mean. Asterisks indicate significant differences between males and females (*p < .05, **p < .001).

Fig. 3. Scatter plots (with best-fitting regression lines) showing recognition memory at test as a function of mean number of fixations during encoding. Results are shown separately for the (a) 4-day experiment and (b) 1-day experiment.