Inspired by Scott Alexander’s monthly links posts (always an interesting mix of random topics), I decided to curate the interesting blogs and links I find into a monthly post of my own. The cited resources might be old; I just happened to discover them this month.
This January it is a mix of technical topics, both related to LLMs (my daily work) and unrelated, plus some non-tech links.
DRoPE: Extending the Context of Pretrained LLMs by Dropping their Positional Embeddings
https://pub.sakana.ai/DroPE/
Position embeddings (PE) encode relative token positions, since attention itself is position-independent. The most popular PE is RoPE (Rotary Position Embeddings). NoPE transformers try to remove position embeddings entirely, but perform worse than RoPE transformers.
The blog first shows that PE/RoPE actually matters: RoPE transformers have higher gradient norms than NoPE ones. Low gradient norms lead to more uniform token embeddings and therefore more uniform attention maps, so RoPE is needed to break this “attention uniformity”.
So PE is needed for good performance, but it breaks down at long context. To fix this, the blog introduces yet another technique, DRoPE (Dropping Positional Embeddings): train with positional embeddings, then drop them and continue training briefly without them (14k tokens with PE and 2k tokens without). They show the final model matches the fully trained model’s performance, but that alone isn’t the interesting part.
The magic is the performance on long-context tasks, e.g. needle-in-a-haystack (NIAH), where DRoPE significantly outperforms RoPE transformers, even those with context scaling.
(Aside: I don’t know much about the RoPE scaling techniques mentioned in the paper (PI, NTK-aware scaling, YaRN), so that will be a topic for a later post.)
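For intuition, here is a toy NumPy sketch (my own, not code from the blog) of what RoPE does to query/key vectors, and what DRoPE’s position-free phase simply skips:

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply Rotary Position Embeddings to x of shape (seq_len, dim).

    Each pair of dimensions (2i, 2i+1) is rotated by the angle
    position * base**(-2i/dim). "Dropping" PE, as in DRoPE's
    position-free training phase, just means skipping this rotation.
    """
    seq_len, dim = x.shape
    freqs = base ** (-2.0 * np.arange(dim // 2) / dim)  # per-pair frequencies
    angles = positions[:, None] * freqs[None, :]        # (seq_len, dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                  # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# The key property: query/key dot products (attention scores) depend
# only on the *relative* distance between positions, not absolute ones.
q, k = np.random.randn(1, 64), np.random.randn(1, 64)
near = rope(q, np.array([3])) @ rope(k, np.array([7])).T
far = rope(q, np.array([1003])) @ rope(k, np.array([1007])).T
assert np.allclose(near, far)  # same relative offset -> same score
```

Since each step is a pure rotation, vector norms are untouched; only the pairwise angles (and hence attention scores) carry the position information.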
Negative Temperature in LLMs
https://cavendishlabs.org/blog/negative-temperature/
When applying softmax over the logits, we know that setting the temperature to 0 deterministically gives us the most likely token (greedy decoding). Higher temperatures smooth out the probability distribution, making it more likely to pick random words; a temperature of infinity results in completely random output.
With a negative temperature, the model samples the token least likely to occur next, so we get deterministic garbage built from the lowest-probability tokens. Interestingly, the author points out that those tokens are such low probability that the model will not output them even if you ask it to!
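A toy demo (my own sketch, not code from the post) of how the sign of the temperature flips which token greedy sampling picks:

```python
import numpy as np

def pick_token(logits, temperature):
    """Greedily pick the argmax of softmax(logits / temperature).

    For temperature > 0 this is standard greedy decoding; a negative
    temperature reverses the ranking, so the argmax lands on the token
    the model considers LEAST likely.
    """
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.argmax(probs))

logits = np.array([2.0, 1.0, -3.0])  # toy 3-token vocabulary
print(pick_token(logits, 0.7))       # 0: the most likely token
print(pick_token(logits, -0.7))      # 2: the least likely token
```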
Tinfoil - Privacy preserving inference and training
Apparently TEEs are still a thing: https://tinfoil.sh/technology
Starting with the H100, NVIDIA has introduced Confidential Computing, which brings both the CPU and the GPU into the trust boundary. A secure CPU environment is a solved problem, but including the GPU in the trust boundary is the hard part (especially since PCIe is not trusted). Unsurprisingly, NVIDIA has made a lot of hardware changes to support this.
Tinfoil uses this to make inference and training privacy-preserving. It seems they host inference models in the cloud and wrap them in TEEs.
As interesting as it sounds, I don’t fully understand this yet. There is a paper that explains the internals of GPU Confidential Computing in more detail, which I might read if I find the time.
ASCII rendering
Link: https://alexharri.com/blog/ascii-rendering
Amazing post on rendering an image in ASCII. Makes me feel like developing a roguelike game again.
Starting from simple black-and-white, to averaging neighbors, to defining and using the shapes of characters, to introducing contrast: the post has everything.
I will definitely implement at least the first 50% of the article in Rust next month, perhaps along with some of the changes proposed in the corresponding HN thread!
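As a warm-up, here is the very first step sketched in Python (my own toy version; the character ramp is a common choice, not the article’s):

```python
# Darkest-to-lightest character ramp (assumes a dark terminal background).
RAMP = " .:-=+*#%@"

def to_ascii(gray):
    """Render a 2D grid of brightness values in [0, 1] as ASCII art."""
    return "\n".join(
        "".join(RAMP[int(v * (len(RAMP) - 1))] for v in row) for row in gray
    )

# A horizontal gradient: dark on the left, bright on the right.
gradient = [[x / 15 for x in range(16)] for _ in range(3)]
print(to_ascii(gradient))
```

Everything after this in the article (neighbor averaging, character shapes, contrast) refines how that brightness-to-character mapping is chosen.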
The Deep Sea
Link: https://neal.fun/deep-sea/
An amazing visual representation of the deep sea, showing which creatures live and what can be found at different depths.
Some interesting titbits:
- The emperor penguin dives to a depth of 345 meters, deeper than many fish and even the great white shark.
- Creepy alert! The Japanese Spider Crab (at a depth of 650 meters) is the largest known crab with a maximum leg span of 3.8m.
- Coelacanths were thought to be extinct until found alive in 1938!
- The midnight zone is where sunlight doesn’t reach at all.
- The elephant seal dives up to 2400 meters.
- The Titanic wreckage is at a depth of 3800 meters.
- 10000 meters is about the deepest anyone has ever gone, on expeditions to the Challenger Deep.
On Anger - Some food for thought
Saw a message in this open thread on understanding anger which resonated with me.
Contents below are copied verbatim:
"""
When is a response in anger ever justified?
My life experience and all the advice at least I’ve ever received is never. But then what the hell is the point of this emotion? Just as a signal that boundaries have been crossed? Feels like a useless emotion.
"""
"""
Anger was useful in evolutionary settings as a threat display. It signals boundary-crossing and probably was very useful for negotiating boundaries without resorting to actual violence.
Well it may feel useless but it is a fact, so the only thing one can do is channel it and manage it.
"""
Do we get angry when we’re scared? I think I do sometimes. Anger does cloud your judgement and makes you do things you would not do with more self-control.
Anger does always need to be controlled; don’t we always regret the things done in anger?