Lack of Softmax in this code? #30

@zachary-jablons-okcupid

Description

Hey Geoff,

I know this is 5-year-old research code, but I'm a bit confused about something. In the accompanying paper, it seems like the temperature-scaled logits are meant to go through a softmax before being used.

[image: excerpt from the paper]

However, as far as I can tell, this implementation never applies a softmax as part of the temperature scaling operation. I'd expect to see it in the forward step, or possibly where the output of that step is passed into the cross entropy loss here. Instead, the cross entropy seems to be given the scaled logits without any softmax applied.
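One thing worth checking here: many cross entropy implementations take raw logits and apply the log-softmax internally (PyTorch's `nn.CrossEntropyLoss` is one that does), in which case passing scaled logits straight to the loss would already include the softmax. Whether that is what this repository's loss does is an assumption, not something confirmed above. The sketch below uses only the Python standard library; the helper names, logits, and temperature are all hypothetical and chosen for illustration.

```python
import math


def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, target):
    # Log-sum-exp form of cross entropy: log(sum_k exp(z_k)) - z_target.
    # The softmax is implicit, so the caller passes raw logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def temperature_scale(logits, temperature):
    # Temperature scaling divides every logit by a single scalar T.
    return [z / temperature for z in logits]


logits = [2.0, 0.5, -1.0]  # hypothetical logits for one example
temperature = 1.5          # hypothetical temperature T
target = 0                 # hypothetical true class index

scaled = temperature_scale(logits, temperature)

# Loss computed directly on the scaled logits (no explicit softmax call).
loss_implicit = cross_entropy_from_logits(scaled, target)

# Loss computed by applying softmax explicitly, then taking -log(p_target).
loss_explicit = -math.log(softmax(scaled)[target])

print(loss_implicit, loss_explicit)
```

If the two printed values agree, the absence of an explicit softmax call does not by itself mean the softmax from the paper is missing; it may simply live inside the loss.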

I might just be missing something obvious here of course, but I want to make sure my understanding of how temperature scaling is supposed to work is correct.

Thanks in advance for helping me clarify anything I'm missing here.
