Perception

I wish I had started off with Cognitive Psychology: A Student’s Handbook by Eysenck & Keane, which I recommend for its readability and its evaluation of theories.

 

Instead I used Cognitive Psychology by Braisby & Gellatly. I was attracted by its looks, but it is difficult in places and ultimately frustrated me.

 


 

Perception lecture in a sentence:

Theories of perception seem contradictory, but they could be describing different parts of the perceptual process, which achieve different goals and are dealt with by different parts of the brain.

 

DEFINITIONS

Perception: analysis of sensory information – the basic cognitive processes that analyse info from the senses

Sensation: detection of energy by the senses (vs. perception, which makes sense of that info)

Bottom-up processing: sensory info is the starting point – info flows from the sensory receptors up to the brain

Top-down processing: existing knowledge in the brain is the starting point, guiding the interpretation of sensory info

 

VISION

[Diagram: the eye, vision and perception]

– the lens & cornea focus light onto retina

– retina has receptor cells sensitive to light

– info flows from retina to primary visual cortex

– then ventral stream to inferotemporal cortex and dorsal stream to parietal cortex

 

2 distinct (but interconnected) streams of info from retina to brain:

Ventral stream: goes to brain areas involved in pattern & object recognition – perception for recognition – ‘what is it?’ Knowledge-based, using stored representations

Dorsal stream: goes to brain areas that analyse position & movement of objects – perception for action – ‘where is it?’ Short-term memory available to it.

 


 

THEORIES OF PERCEPTION

GESTALT (1920s)

– studied the principles by which individual elements tend to be organised together

– examples of organising laws: closure, good continuation, proximity, similarity

– these determine which individual components of an image should be grouped together

– Law of Prägnanz: of the possible organisations, the one with the best, simplest and most stable shape will occur (Koffka 1935)
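
A toy illustration of one of these laws may help. The sketch below groups points by the proximity law alone: points nearer than a threshold are treated as one perceptual unit. The threshold and points are invented for illustration; this is not a Gestalt algorithm, just a way to see the law operate.

  import math

  def group_by_proximity(points, threshold=1.5):
      # greedy single-link grouping: a point joins the first group
      # that already contains a near neighbour
      groups = []
      for p in points:
          for g in groups:
              if any(math.dist(p, q) <= threshold for q in g):
                  g.append(p)
                  break
          else:
              groups.append([p])
      return groups

  # two spatial clusters of dots are 'seen' as two groups
  print(group_by_proximity([(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]))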

 

BUT:

– realism problem: the images used were simple 2D geometric patterns

 

ALTHOUGH:

can manipulate these images to give insights into perception

 

GIBSON – DIRECT PERCEPTION (1950s)

– perception can’t be studied using graphics in a lab

– bottom up approach; perception for action

– perception doesn’t need complex cognition: objects give info – affordances – about actions/ use directly

– no role for memory: don’t need memory to interact with world

– example: features of a tree suggest that it can be climbed

– emphasis on how real objects structure light: ambient optic array

invariants: features that give us concrete information about the nature of environment

– invariants (e.g. texture gradient) are ‘picked up’ from the optic array giving us cues about position, orientation and shape

– invariants can also be picked up from motion, which produces changing flow patterns in the optic array
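
As a worked example of an invariant, texture gradient can be read as a direct depth cue: texture elements of constant physical size project ever-smaller images with distance. The numbers below are invented for illustration.

  # image sizes (in pixels) of same-sized texture elements, near -> far
  image_element_sizes = [40.0, 20.0, 10.0, 5.0]
  nearest = image_element_sizes[0]
  relative_depths = [nearest / s for s in image_element_sizes]
  print(relative_depths)  # [1.0, 2.0, 4.0, 8.0] - smaller image = farther away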

 

BUT:

– if we don’t use experience, how can we learn from mistakes?

– doesn’t tell us about the cognitive processes involved in perception

– general criticism of bottom-up approaches: we can only perceive something as, e.g., a bus by using stored info about bus characteristics

 

ALTHOUGH:

– usefully emphasizes shortcomings of virtual world vs real world in research

 

MARR’S THEORY OF PERCEPTION (1980s)

– bottom up: retinal image is starting point

– information from senses is enough to allow perception to happen but perceptual analysis is involved in object recognition

– the end point of perception is object recognition, not action

– perception is handled by individual modules e.g. one handling colour, another shape

– analysis happens in 4 stages with each creating a more detailed description

grey level description: greyscale description based on intensity of light at points on the retinal image

primal sketch: edges and texture identified and outline generated

2.5D sketch: description of how surfaces relate to each other and to the observer

3D object-centered description: object descriptions allowing object to be recognised from any angle

 

SUPPORT:

– a computer algorithm created on the basis of the theory was successful in finding the edges of objects (a sketch of the approach is below)
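
For a feel of that stage, here is a minimal sketch in the spirit of the Marr–Hildreth zero-crossing edge detector, assuming a grayscale numpy image; the sigma value and test image are invented, and this illustrates the idea rather than reproducing Marr’s actual program.

  import numpy as np
  from scipy.ndimage import gaussian_laplace

  def marr_hildreth_edges(image, sigma=2.0):
      # smooth + second derivative in one step (Laplacian of Gaussian)
      log = gaussian_laplace(image.astype(float), sigma=sigma)
      edges = np.zeros(log.shape, dtype=bool)
      # an edge passes wherever the LoG changes sign between neighbours
      edges[:-1, :] |= np.signbit(log[:-1, :]) != np.signbit(log[1:, :])
      edges[:, :-1] |= np.signbit(log[:, :-1]) != np.signbit(log[:, 1:])
      return edges

  img = np.zeros((64, 64))
  img[16:48, 16:48] = 1.0  # a bright square on a dark background
  print(marr_hildreth_edges(img).sum(), "edge pixels found")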

 

BUT:

– this doesn’t mean this is how human perception works

– didn’t predict the later finding that visual pathways separate into streams for action and object recognition

 

ALTHOUGH:

– the approach (not the details) has stood the test of time

 

 

CONSTRUCTIVIST APPROACH (e.g. GREGORY 1980 on)

– top down: perception uses stored knowledge + info from the senses

– what we see an object as depends on what we know

– sensory info is incomplete, so we generate perceptual hypotheses about what an object is & pick the one best supported by the sensory info (like a scientist)

– we can pick a wrong hypothesis because we’re swayed by our knowledge: hence visual illusions

 

SUPPORT:

– makes sense and is an attractive explanation of perception

 

BUT:

– how do we generate hypotheses, and how do we then decide on the right one?

– if we ‘know’ an illusion is wrong, why do we still perceive it incorrectly?

 

RECONCILIATION OF ACTION/RECOGNITION APPROACHES BASED ON STREAMS

Both perception for recognition & perception for action are needed to recognise objects and to act on the environment:

 

Gibson’s perception for action theory matches characteristics of dorsal stream

  • dorsal can provide the info for affordances (actions connected to objects)
  • is fast, which it needs to be if it’s to drive action
  • has only short-term storage available, consistent with Gibson seeing no role for memory

 

Marr & the constructivists’ perception for recognition matches characteristics of the ventral stream

  • ventral is specialised in analysing the fine detail Marr described as needed for discriminating objects
  • is slower than dorsal stream
  • draws on memory to identify objects (top down)

 

 


 

ARTICLE: GOODALE & MILNER (2006) – “ONE BRAIN – TWO VISUAL SYSTEMS” (Seminar reading)

 

DEFINITIONS

Optic ataxia: impaired visual control of arm reaching, hand & grip orientation [dorsal stream damaged]

Visual agnosia: inability to recognise or interpret visual info [ventral stream damaged]

– was previously thought there was a single visual system

– Ungerleider & Mishkin proposed ‘what’ (ventral) vs ‘where’ (dorsal)

– G&M suggest ‘what’ vs ‘how’

Ventral stream: gives a real-time representation of the world, & this is stored for future reference

Dorsal stream: acts entirely in real-time, guiding our actions as we make them

– subject ‘DF’, with visual form agnosia, couldn’t report the size of objects but could orient her hand & fingers to grasp them

 

NEUROPHYSIOLOGICAL

Mountcastle et al (1975) & Sakata (1995?), building on Mountcastle

  • found particular neurons in monkeys that activated when picking up objects and that were not ‘what’ or ‘where’ neurons
  • Sakata found neurons were related to the shape and size of objects grasped

– fMRI shows that dorsal and ventral streams really do exist

 

ATAXIA & AGNOSIA

– optic ataxia subjects can’t rotate the hand or orient the fingers when grasping different objects, but can locate objects

– ‘DF’ (agnosia) couldn’t orient her fingers to pretend to pick up an object that had been taken away, because she relies on the dorsal stream, which works in real time and has no visual memory

– optic ataxia subjects are the opposite: they fail in real time but succeed moments later, when the ventral stream takes over

 

OTHER EVIDENCE

– with the hollow-face illusion, normal subjects can flick a target on the face in the correct place, even while being fooled into seeing the hollow face as convex

 


 

ARTICLE: MILNER & GOODALE (2008) – “TWO VISUAL SYSTEMS – REVIEWED” (Seminar reading)

DEFINITIONS

– consider the 2 streams in terms of their output (not input)

– both streams process info about shape & location of objects

– ventral transforms inputs into perceptual representations of object characteristics and their spatial relations

– dorsal mediates real-time control of skilled actions e.g. reaching & grasping objects

– ventral: vision for perception; dorsal vision for action

– the link between perception and action is indirect, going via e.g. memory & planning

Ventral: identify objects and select high-level [abstract] course of action inc. hand posture

Dorsal: real-time bottom-up control of action; implementation of action

– dorsal also involved in pre-specification of movement parameters

– visual info used by dorsal stream is not perceptual & can’t be accessed/ experienced consciously

 

EVIDENCE

– normal subjects are fooled by the Müller-Lyer illusion v|^ in judging length and in choosing a grip size to pick up a rod

– but correct in programming actual grip size

– this implies a difference between the visual processes for action selection and for motor programming

– grip size in flight is not affected by Ebbinghaus illusion  oOo

– if the object/ illusion is removed, actions are guided by a memory of the percept created by the ventral stream

 

BUT

– danger of trying to fit the theories into what we know about the streams

– vagueness in the theories is not a good basis for trying to fit them to the streams

– theory reconciliation causes us to see the streams as separate but they are linked

– so Norman (2002): perception is both for recognition and action

 

COMBINING TOP DOWN WITH BOTTOM UP

Re-entrant pathways: 2-way communication between 2 different regions of the brain

Hupé et al (1998) – perceptual hypotheses might be checked via re-entrant pathways (see the toy sketch after the list):

  1. bottom up processing gives a low-level description
  2. perceptual hypothesis created at higher brain level
  3. hypotheses checked by comparing them back against the low-level description
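
A toy sketch of that checking loop, with invented feature names and an invented overlap score – purely illustrative of the idea, not of anything in Hupé et al:

  def perceive(low_level_features, hypotheses, threshold=0.5):
      # fit = overlap between a hypothesis's predicted features (top-down)
      # and the low-level description (bottom-up)
      def fit(predicted):
          return len(predicted & low_level_features) / len(predicted | low_level_features)
      best = max(hypotheses, key=lambda h: fit(hypotheses[h]))
      # re-entrant check: accept the winner only if it fits well enough
      return best if fit(hypotheses[best]) >= threshold else None

  print(perceive({"edges", "red", "round"},
                 {"apple": {"red", "round"}, "bus": {"red", "rectangular"}}))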

 

Di Lollo (2000) – errors on a backward masking test (image A then image B) provide support

 


 

OBJECT RECOGNITION

INTRO

Objects can be recognised from many angles, so recognition may be based on a 3D object-centred description

 

EARLY/ SIMPLE 2D THEORIES

Template matching: many templates in long-term memory that we compare target objects to

Feature recognition: describing an object in terms of key features
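
A minimal sketch of template matching, assuming patterns are small equal-sized binary arrays and that a dict of stored arrays stands in for long-term memory – invented toy data, not a serious recognition model:

  import numpy as np

  def recognise(target, templates):
      # similarity = proportion of cells where target and template agree
      def similarity(a, b):
          return (a == b).mean()
      return max(templates, key=lambda name: similarity(target, templates[name]))

  T = np.array([[1, 1, 1], [0, 1, 0], [0, 1, 0]])
  L = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]])
  target = np.array([[1, 1, 1], [0, 1, 0], [0, 1, 1]])  # a slightly distorted T
  print(recognise(target, {"T": T, "L": L}))            # -> T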

 

BUT:

– can’t cope with variation even in simple patterns e.g. letters

– focused on recognising 2D patterns, but we see in 3D, so of limited use

 

OBJECT CENTERED VS VIEWER CENTERED DESCRIPTIONS

– theories based on simple 2D patterns fail when applied to 3D objects

– 3D objects can be seen from any viewpoint so theories need to cope with this

– Marr: viewer-centered description needs to be turned into 3D object-centered description

– the description needs to be centered on the object, not on our 2D retinal image

 

OBJECT CENTERED THEORIES

MARR & NISHIHARA’S THEORY (1978)

– based on generating 3D models with components represented by a generalised cone

– concavities (parts bending inwards) are the basis for dividing objects into components

– we have a catalogue of 3D models made up of 3D descriptions of objects we’ve seen

– the catalogue is a hierarchy with more detail at lower levels

– the 3D target object is compared to the catalogue from the highest level down >> match

 

SUPPORT:

allows for recognition of object from many angles

hierarchy allows recognition of entire object and has detailed info about parts

 

Warrington & Taylor (1978)

– people with damage to a specific area of the right hemisphere could recognise objects in a typical view but not in an unusual view

– indicates they can’t transform unusual view to 3D model

 

BIEDERMAN’S THEORY (1987)

– built on Marr & Nishihara’s theory, using 36 3D geons instead of generalised cones

geons: basic shapes e.g. cylinders, cubes as the building blocks for describing objects

– concept of non-accidental properties: features of the 2D image that reliably signal 3D structure, used to recover geons from it
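
A toy sketch of what a geon-style structural description might look like as a data structure – components plus spatial relations. The objects, geon labels and relations are invented for illustration; Biederman’s actual theory specifies 36 geons and a richer set of relations.

  # an object = its geon components + the spatial relations between them
  mug = {
      "components": {"body": "cylinder", "handle": "curved_cylinder"},
      "relations": [("handle", "attached_to_side_of", "body")],
  }
  cup = {
      "components": {"body": "cylinder"},
      "relations": [],
  }
  # recognition = matching the description recovered from the image
  # against stored descriptions, independent of viewpoint
  print(mug == cup)  # False: the handle geon tells them apart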

 

SUPPORT:

concept of concavities is supported because deleting them from images makes recognition difficult

 

BUT:

Geons lose a lot of info – can’t account for how we discriminate between e.g. 2 pitbull dogs, even though they’d have the same 3D models

Bülthoff & Edelman (1992) – people couldn’t recognise complex objects shown from a novel viewpoint

 



 

FACE RECOGNITION

 

INTRODUCTION

Object-centered theories don’t work for face recognition – a within-category recognition problem

– good recognition of familiar faces across different angles, contexts & after long periods of time (e.g. Bahrick et al 1975)

– poor recognition of unfamiliar faces, esp. across different viewpoints, expressions, contexts (e.g. Bruce 1982)

 

BRUCE AND YOUNG (1986) – FUNCTIONAL MODEL FOR FACE RECOGNITION


– involves sequence of stages based on face recognition units (FRUs)

– FRUs: contain stored info about familiar faces

  1. We encode a person’s face when we meet them (the target)
  2. Encoded info might activate a FRU
  3. If matched, FRU allows access to person identity nodes (PIN)

– PIN: info about the person’s identity e.g. job

– name can only be recalled when PIN for face has been activated

– a ‘cognitive system’ checks the info received from the recognition system against e.g. general knowledge

– this cognitive system might override the recognition system (e.g. that can’t be Obama – he doesn’t live in the UK)
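
A toy sketch of the sequence, with invented example data; the one point it illustrates is that a name can only be reached via an activated PIN, never directly from the face:

  FRUS = {"encoded_face_A": "fru_1"}               # stored familiar faces
  PINS = {"fru_1": "politician, lives in the US"}  # identity info
  NAMES = {"fru_1": "Barack Obama"}                # reachable only via a PIN

  def recognise_face(encoded_face):
      fru = FRUS.get(encoded_face)       # encoding may activate an FRU
      if fru is None:
          return "face seems unfamiliar"
      identity = PINS[fru]               # the FRU gives access to the PIN...
      return NAMES[fru] + " (" + identity + ")"  # ...and only then to the name

  print(recognise_face("encoded_face_A"))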

 

SUPPORT:

studies show different types of info are accessed sequentially

Hay et al (1991) – people couldn’t name celebrities without knowing their job (i.e. PIN before name)

Bruce & Johnston (1990) – identifying a face as familiar is faster than recalling the job, and the job is faster than the name

 


 

BURTON & BRUCE (1993) – IAC/ CONNECTIONIST MODEL OF FACE RECOGNITION


– IAC: interactive activation and competition network

– extends and supersedes Bruce & Young model; also sequential

– has units, organised into pools

  1. seeing a face activates a FRU – there is 1 FRU for each familiar person
  2. FRU activation increases activation in a PIN – there is 1 PIN per known person
  3. PIN activation activates relevant SIU (semantic info unit e.g. job categories)

– links between units are bidirectional: excitatory between pools, inhibitory (competitive) within pools

– e.g. seeing a politician will excite the PIN for that politician, but also the SIU ‘politician’ and thus the PINs of other politicians – the most strongly activated unit wins (see the sketch below)
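
A minimal IAC-style simulation sketch: four units, excitatory links between pools, an inhibitory link within the PIN pool, and activation spreading over a few update cycles. The weights, decay rate and unit names are invented for illustration; this is not Burton & Bruce’s actual model.

  import numpy as np

  units = ["FRU_obama", "PIN_obama", "PIN_other_politician", "SIU_politician"]
  W = np.zeros((4, 4))
  W[0, 1] = W[1, 0] = 1.0   # FRU <-> PIN: excitatory, between pools
  W[1, 3] = W[3, 1] = 1.0   # PIN <-> shared SIU 'politician'
  W[2, 3] = W[3, 2] = 1.0   # another politician's PIN <-> the same SIU
  W[1, 2] = W[2, 1] = -1.0  # the two PINs inhibit each other (within pool)

  a = np.zeros(4)
  a[0] = 1.0                # seeing the face activates its FRU
  for _ in range(10):       # spread activation with decay, clamped to [0, 1]
      a = np.clip(a + 0.1 * (W @ a) - 0.05 * a, 0.0, 1.0)

  print(dict(zip(units, a.round(2))))  # the seen politician's PIN wins its pool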

 


 

IS FACIAL RECOGNITION SPECIAL?

 

INTRODUCTION

Debate about whether face recognition is an innate ability or something we learn & become expert in from birth

 

NEUROLOGICAL EVIDENCE

– evidence that face processing involves specific areas of the brain

– faces activate an area in the fusiform gyrus in the posterior temporal lobes (esp. right hemisphere)

– non-faces activate a different area

BUT: this doesn’t mean the processing for face recognition is different from that for other objects

 

NEWBORNS

– show innate ability to attend to faces: a special mechanism for processing faces?

BUT:

Johnson & Morton (1991)

– there’s a kickstart mechanism making babies attend to faces, which ensures lots of face input

– maybe this is where expertise in recognising faces starts: it may or may not be unique to faces

 

INVERSION EFFECT

Yin (1969)

– compared memory for upright and inverted faces and objects

– memory was best for upright faces, but worse than for objects when faces were inverted

– suggests something different about face processing

 

Diamond & Carey (1986)

– we are experts at seeing faces upright, but that expertise is lost when faces are inverted

– dog experts recognised upright dog bodies better than inverted ones: the inversion effect is not face-specific

 

McKone & Robbins (2007)

– couldn’t replicate Diamond & Carey: no difference bet. experts & amateurs

– so there is no generic expertise in stimulus recognition that works the same way for faces & objects