Jaieshh

A Lifelong student

Welcome!


This is Jaieshh; I hope you are doing well. Welcome to my rough note-taking page.
Here I am just going to jot down things that I am curious about, and sometimes my own reflections on things.
I fear that most of us are losing our curious selves and are just living for the sake of living.

I have made this to make sure I don't lose my curious self and jot something down every day.

12th Tue Mar 24

Not having time to write every day, but I will try to write as much as possible.

Since I have been working on fine-tuning LLMs (Llama-7b-chat mostly), I have encountered new concepts in this domain that I think are really helpful to understand. I didn't know where to jot these down, and thought this would be the right place to keep them for later reference.

Let's start off with

PEFT (Parameter-Efficient Fine-Tuning)

Fine-tuning large pretrained models is often prohibitively costly due to their scale. Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of large pretrained models to various downstream applications by only fine-tuning a small number of (extra) model parameters instead of all the model's parameters. This significantly decreases the computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to fully fine-tuned models.
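To make the idea concrete for myself, here is a minimal sketch in plain PyTorch (a toy model with made-up sizes, not my actual Llama setup): freeze every pretrained weight and train only a small extra block of parameters, which is the core move behind every PEFT method.

```python
# Toy PEFT sketch: the pretrained part is frozen, only the small adapter trains.
import torch.nn as nn

pretrained = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768))
for p in pretrained.parameters():
    p.requires_grad = False                      # the big model stays frozen

adapter = nn.Sequential(nn.Linear(768, 16), nn.ReLU(), nn.Linear(16, 768))

model = nn.Sequential(pretrained, adapter)       # only the adapter's weights update

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total     = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / total: {total:,}")  # a tiny fraction is trainable
```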

LoRA: Low-Rank Adaptation (a type of PEFT)

We know that the weight matrices of a pretrained neural network are typically full rank, meaning no row or column can be written as a combination of the others, so the matrix can't be compressed without losing information. But in the LoRA paper the authors showed that when pretrained language models are adapted to a new task, the change in the weights has a low "intrinsic dimension": the weight update can be represented by a much smaller matrix, i.e. it has a low rank. This means that during backpropagation the weight update matrix can be constrained to a low rank, as most of the necessary information has already been captured by the pre-training process and only task-specific adjustments are made during fine-tuning.

The above explanations are from here. There are more techniques, but for now I am focusing on understanding LoRA first.
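A rough sketch of the idea with toy numbers (illustrative sizes, not from my runs): the pretrained weight W stays frozen, and the update is learned as the product of two small matrices, so it has rank at most r.

```python
# Toy LoRA illustration (made-up sizes): W is frozen, the update is B @ A.
import torch

d, k, r = 4096, 4096, 8              # typical attention-projection size, small rank

W = torch.randn(d, k)                # pretrained weight, frozen during fine-tuning
A = torch.randn(r, k) * 0.01         # trainable low-rank factor (r x k)
B = torch.zeros(d, r)                # trainable low-rank factor (d x r), starts at zero

delta_W   = B @ A                    # weight update, rank at most r
W_adapted = W + delta_W              # effective weight used at inference

print("full update params:", d * k)          # 16,777,216
print("LoRA update params:", d * r + r * k)  # 65,536 (~0.4% of the full update)
```

In practice the Hugging Face peft library handles this for you (LoraConfig + get_peft_model), injecting the A and B matrices into the chosen attention projections instead of you writing them by hand.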

Also, I have started off with a 100DayHardWarechallenge, where I am going to learn the first principles of computer architecture and hardware. I am updating my progress on Twitter and not here.

26th Thurs Feb 24

Haven't really documented anything for the past 10 days, but the building and studying are going strong. I realized a few things over these days:
- Need to learn about hardware used to make computers and how they work.
- When I started this initiative of self-learning and building regularly, I did not have a definite plan and was just doing things as they came to mind. This won't work; I just found a link on Twitter to a GitHub repo with a complete plan to learn software with integrated hardware:
https://github.com/geohot/fromthetransistor
- Midjourney is better than DALL-E.
- Found a book on AI statistics; reading it regularly to understand ML and AI from first principles.
- Dune is the best sci-fi movie series out there.


16th Thurs Feb 24

Fascinated by semiconductor chips; looking into the manufacturing process for them... Also started working with these LLMs for the first time.

15th Wednesday Feb 24

Didn't do much today, but I started to look into the semiconductor market in India. Also started watching these videos from Andrej Karpathy on LLMs and how they actually work under the hood. The major thing I noted was the Vedanta company in India setting up its semiconductor branch in Gujarat and aiming to manufacture chips from 2025, led by David Reeds.

14th Tuesday Feb 24

There is just so much to learn and implement and run on the open-source platforms, MY GOD!!! I don't even know where to begin.


I recently listened to an interview with Jonathan Ross (yes, the guy who made the TPU for Google). He talked about AI hardware and how speed is the thing that actually matters when deploying these large AI models to production.

As a developer experimenting with the OpenAI API, I realized how good these models were on localhost, but when it comes to deploying them they do take time.

I need to do more experiments with the recent open-source models from Hugging Face and Meta (Llama).

Considering buying some hardware to deploy a large AI model on, but first I need to figure out these open-source models.

Also, I completely forgot: Andrej K. just left OpenAI. This is huge for students like me because this guy knows a lot and his videos are just the best way to learn about ML and AI. He tweeted that he has already started working on his videos. Apart from the videos, I see a lot of people are excited about his contributions to open-source models. I mean, he is very knowledgeable, but I think there are a lot of anonymous brilliant people out there who can make similar or even better contributions (just saying, it might not make a huge difference).

TODO
§ Spin up a basic GPT using Llama, a Hugging Face model, and Groq.
§ Look into AI hardware on the internet and see what people are doing in this space.

13th Tuesday feb 24

Recently, I have been reading about a lot of AI stuff on Twitter (X). Getting FOMO, so I have a few projects in mind to work on.

I think there need to be more goofy websites on the internet like this one; my eyes are tired of looking at beautiful stuff.

Taking a course on ML at uni as well, to understand the basics; need to catch up on that.