Gonzo ML • 126 implied HN points • 06 Nov 24
- Softmax is widely used in machine learning, especially in transformers, to turn raw scores into probabilities. However, it struggles with out-of-distribution inputs, meaning kinds of data the model did not see during training.
- Softmax loses sharpness as the number of inputs grows: the probability mass spreads over more items, so on larger inputs it sometimes cannot clearly single out the best option.
- To improve softmax, researchers suggest an 'adaptive temperature'. The idea is to sharpen the predictions according to the data being processed, which leads to better performance on some tasks.
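
The fading sharpness and the effect of temperature can be illustrated with a minimal sketch. It assumes a toy setup that is not taken from the article: one logit leads all the others by a fixed `gap`, and the temperature is lowered by hand to a fixed value. The researchers' adaptive scheme chooses the temperature from the data, which this sketch does not attempt to reproduce. The names `softmax`, `max_prob`, and `gap` are hypothetical and used for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def max_prob(n, gap=2.0, temperature=1.0):
    """Largest softmax probability when one logit leads n - 1 others by `gap`.

    Assumption for illustration: all competing logits are exactly 0.
    """
    logits = [gap] + [0.0] * (n - 1)
    return max(softmax(logits, temperature))


# Compare the default temperature with a hand-picked lower one (0.25 is an
# arbitrary illustrative value, not a recommended setting).
for n in (4, 64, 1024):
    print(n, round(max_prob(n), 4), round(max_prob(n, temperature=0.25), 4))
```

In this toy setup the winning probability shrinks as `n` grows even though the logit gap stays the same, and a lower temperature restores part of the sharpness. A fixed temperature is only a stand-in here; an adaptive variant would pick it per input.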