Chapter 3 - lm_head_output shape differs from textbook #14

Hi! Loving the textbook so far :) I've run into a minor issue, though, in the Chapter 3 section _Choosing a single token from the probability distribution (sampling / decoding)_.

When I run `lm_head_output.shape` I get an output shape of `[1, 5, 32064]`, whereas the source code and textbook both state that it should be `[1, 6, 32064]`. I'm not sure why there's a difference, since I've kept all the preceding code the same.

Interestingly, running the next line of code returns the expected output ("Paris"): 

```python
token_id = lm_head_output[0,-1].argmax(-1)
tokenizer.decode(token_id)
```
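
My guess at why the prediction is unaffected: assuming the middle dimension is just the number of prompt tokens, `[0,-1]` always picks the last position, however many positions come before it. Here is a small standalone sketch of that idea using made-up logits in plain Python lists rather than the real model. The helper names (`fake_lm_head_output`, `argmax`, `shape`), the vocabulary size of 50 and the token id 7 are all hypothetical and not from the book.

```python
import random

def fake_lm_head_output(seq_len, vocab_size, best_last_token, seed=0):
    """Build stand-in logits shaped [1, seq_len, vocab_size] as nested lists."""
    rng = random.Random(seed)
    logits = [[[rng.uniform(-1.0, 1.0) for _ in range(vocab_size)]
               for _ in range(seq_len)]]
    # Make one (hypothetical) token id the clear winner at the last position.
    logits[0][-1][best_last_token] = 10.0
    return logits

def argmax(row):
    """Index of the largest value in a flat list."""
    return max(range(len(row)), key=row.__getitem__)

def shape(logits):
    """Shape of a 3-level nested list, like tensor.shape."""
    return [len(logits), len(logits[0]), len(logits[0][0])]

five = fake_lm_head_output(seq_len=5, vocab_size=50, best_last_token=7)
six = fake_lm_head_output(seq_len=6, vocab_size=50, best_last_token=7)

# Same indexing pattern as lm_head_output[0, -1].argmax(-1)
print(shape(five), argmax(five[0][-1]))  # [1, 5, 50] 7
print(shape(six), argmax(six[0][-1]))    # [1, 6, 50] 7
```

If that assumption holds, the shape difference would only mean my prompt was split into 5 tokens instead of 6, which might be worth checking by printing the tokenized input IDs.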
