Do computers really learn? Or are they just pretending?
3 minute read
You probably know that algorithms are like recipes or instructions that tell a computer what to do. But sometimes, algorithms need to learn from data or feedback to do their job better. For example, an algorithm that recommends movies to you might learn from your ratings and preferences to suggest movies that you’ll like more.
But what does it mean for an algorithm to learn? How does it change its behavior or understanding based on data or feedback?
Let’s look at how we learn. When you were a baby, you didn’t know much about the world, but you were curious and eager to learn. You started by observing things around you and noticing patterns. For example, you noticed that when you cried, someone came to feed you or change your diaper. You also saw that when you smiled, people smiled back at you and made funny noises. You began to associate these actions with certain outcomes and learned how to get what you wanted.
Humans and animals learn from their environment and interactions. This learning process involves three main components:
- Stimuli: These are the things that we perceive or experience in our environment, such as sounds, sights, smells, etc.
- Responses: These are the actions or reactions that we perform or produce in response to stimuli, such as moving, speaking, feeling, etc.
- Reinforcement: This is the feedback or consequence that we receive after our responses, such as rewards, punishments, satisfaction, etc.
Learning happens when we adjust our responses based on the reinforcement we get. For example, if you touch a hot stove and feel pain, you learn not to touch it again. If you watch a funny movie and laugh, you learn to watch more movies like that. This is called reinforcement learning.
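To make that loop concrete, here’s a tiny Python sketch (the two responses and their reward numbers are invented for illustration): an agent that starts indifferent between two responses and gradually comes to prefer the one that earns more reinforcement.

```python
import random

# Two made-up responses and the reinforcement each tends to earn.
REWARDS = {"cry": 1.0, "smile": 2.0}

preferences = {"cry": 0.0, "smile": 0.0}  # how much the agent likes each response
learning_rate = 0.1

for _ in range(1000):
    # Mostly pick the preferred response, but explore 10% of the time.
    if random.random() < 0.1:
        response = random.choice(list(preferences))
    else:
        response = max(preferences, key=preferences.get)

    reward = REWARDS[response]  # reinforcement from the environment
    # Adjust the response based on the reinforcement received.
    preferences[response] += learning_rate * (reward - preferences[response])

print(preferences)  # "smile" ends up preferred: it earns more reinforcement
```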
Algorithms learn from data or feedback in a similar spirit. From a computational perspective, learning is a mathematical and logical process that helps solve problems and optimize outcomes. It involves two main components:
- Data: These are the inputs or outputs that an algorithm receives or produces for a given task or goal, such as images, texts, numbers, etc.
- Model: This is the set of rules or parameters that an algorithm follows or adjusts to process the data and perform the task or goal, such as equations, functions, weights, etc.
The computational perspective says that learning happens when an algorithm adjusts its model based on the data it gets. For example, if an algorithm gets some images of cats and dogs and tries to classify them into two categories, it learns by changing its model to make more accurate predictions. If an algorithm gets the text of some movie reviews and tries to generate summaries of them, it learns by changing its model to make more concise and relevant summaries. This is called machine learning.
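Here’s what that adjust-the-model loop can look like at its very simplest. This toy Python sketch (the data points and learning rate are made up) fits a straight line to a handful of numbers by repeatedly nudging two parameters to shrink the prediction error:

```python
# Toy machine learning: adjust model parameters (w, b) to fit made-up data.
# The "model" is y = w*x + b; learning means nudging w and b to reduce error.
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]  # (input, target) pairs

w, b = 0.0, 0.0          # initial model: knows nothing
learning_rate = 0.01

for _ in range(2000):
    for x, y in data:
        prediction = w * x + b
        error = prediction - y
        # Gradient descent step: move the parameters to shrink the error.
        w -= learning_rate * error * x
        b -= learning_rate * error

print(f"learned model: y = {w:.2f}*x + {b:.2f}")  # close to y = 2x + 1
```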
Remember how we said that reinforcement learning is when you learn from your own actions and the feedback you get from them? Well, there are different ways to do that, depending on how you choose your actions and how you update your knowledge.
Policy
A policy is like a rule or a strategy that tells you what to do in each situation. For example, if you’re playing a game of tic-tac-toe, your policy might be to always put your mark in the center if it’s empty, or to block your opponent’s winning move if possible. A policy can be deterministic, meaning it always gives you the same action for each situation, or stochastic, meaning it gives you a probability of choosing each action for each situation.
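As a rough sketch, here’s what those two flavors might look like in Python for tic-tac-toe (the board encoding below is an assumption made purely for illustration):

```python
import random

def deterministic_policy(board):
    """Always gives the same move for the same board: take the center if empty."""
    if board[4] == " ":              # index 4 is the center square of a 3x3 board
        return 4
    return board.index(" ")          # otherwise the first free square: a fixed rule

def stochastic_policy(board):
    """Gives a move drawn at random from the free squares."""
    free = [i for i, cell in enumerate(board) if cell == " "]
    return random.choice(free)

board = list(" " * 9)                # an empty board, as a list of 9 cells
print(deterministic_policy(board))   # always 4
print(stochastic_policy(board))      # varies from call to call
```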
Value Function
A value function is like a rating or a score that tells you how good each action or situation is for you. For example, if you’re playing a game of chess, your value function might give you a high score for capturing your opponent’s queen, or a low score for losing your own king. A value function can be state-based, meaning it only depends on the current situation, or state-action-based, meaning it depends on both the current situation and the action you take.
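In code, the two flavors are often just lookup tables. The chess-flavored states and scores below are invented for illustration:

```python
# State-based: V(s) scores a situation on its own.
V = {
    "ahead by a queen": 9.0,
    "even material":    0.0,
    "king in danger": -50.0,
}

# State-action-based: Q(s, a) scores taking an action in a situation.
Q = {
    ("even material", "capture queen"):   9.0,
    ("even material", "lose king"):    -100.0,
}

def greedy_action(state, actions):
    """Pick the action the value function rates highest in this state."""
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

print(greedy_action("even material", ["capture queen", "lose king"]))
# -> "capture queen"
```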
So, how do you learn a policy or a value function? Well, there are different methods for that too, depending on how much you know about the environment and the feedback you get.
Model-Based Learning
This is when you have a model of the environment, meaning you know how it works and how it changes based on your actions. For example, if you’re playing a game of checkers, your model might tell you how the board looks after each move, and what the rules are for jumping and capturing pieces. With a model, you can plan ahead and simulate different scenarios before choosing an action. You can also update your model based on the feedback you get from the environment.
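Here’s a minimal sketch of that idea in Python, assuming a made-up model with two checkers-flavored moves: because we can query the model, we can simulate every move before committing to one.

```python
def model(state, action):
    """A known model of the environment: predicts (next_state, reward) for a move.
    Invented rules: 'jump' captures a piece (+1 reward), 'slide' does not."""
    if action == "jump":
        return state + ["captured piece"], 1.0
    return state, 0.0

def plan(state, actions):
    """Simulate every action with the model and pick the best outcome,
    all before actually moving."""
    best_action, best_reward = None, float("-inf")
    for action in actions:
        _, reward = model(state, action)   # pure simulation: nothing really happens
        if reward > best_reward:
            best_action, best_reward = action, reward
    return best_action

print(plan([], ["slide", "jump"]))  # "jump": the lookahead found the capture
```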
Model-Free Learning
This is when you don’t have a model of the environment, meaning you don’t know how it works or how it changes based on your actions. For example, if you’re playing a game of poker, you don’t know what cards your opponents hold or what they will do next. Without a model, you can’t plan ahead and simulate different scenarios before choosing an action. You can only learn from trial and error, updating your policy or value function based on the feedback you get from the environment.
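Here’s a bare-bones sketch of that trial-and-error style in Python: Q-learning on an invented five-cell corridor. The agent never sees the environment’s rules; it only observes what happens after each move.

```python
import random

# Q-learning on an invented 5-cell corridor: cell 0 is the start, cell 4 the goal.
N_STATES, GOAL = 5, 4
ACTIONS = ["left", "right"]
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate

def step(state, action):
    """The environment. The agent can't see this code: it only observes outcomes."""
    nxt = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0)

for _ in range(500):                     # 500 episodes of trial and error
    s = 0
    while s != GOAL:
        if random.random() < epsilon:    # explore sometimes...
            a = random.choice(ACTIONS)
        else:                            # ...otherwise exploit current estimates
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Update the value estimate from this one observed transition alone.
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

print(max(ACTIONS, key=lambda act: Q[(0, act)]))  # "right", learned with no model
```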
There are also different types of feedback that you can get from the environment. One type is called reward-based feedback. This is when you get a numerical reward or penalty for each action or situation. For example, if you’re playing a game of Pac-Man, your reward might be the number of points you get for eating dots and ghosts, or the penalty you get for being eaten by ghosts. With reward-based feedback, you can learn to maximize your total reward over time.
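The reward signal itself is nothing more than a running total of numbers. The Pac-Man-style values below are invented for illustration:

```python
# Reward-based feedback: each event carries a number (values made up here).
REWARD = {"eat dot": 10, "eat ghost": 200, "get eaten": -500}

total = 0
for event in ["eat dot", "eat dot", "eat ghost", "get eaten"]:
    total += REWARD[event]

print(total)  # -280: the running total a reward-based learner tries to maximize
```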
Another type of feedback is called imitation-based feedback. This is when you get an example or demonstration of what to do in each situation from someone else. For example, if you’re learning to play tennis, your feedback might be watching a video of a professional player hitting the ball with different strokes and angles. With imitation-based feedback, you can learn to mimic or improve upon someone else’s behavior.
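And here’s the simplest possible imitation sketch, assuming a made-up set of tennis demonstrations: the learner just copies whatever the demonstrator most often did in each situation.

```python
from collections import Counter, defaultdict

# Made-up demonstrations: (situation, what the expert did).
demonstrations = [
    ("ball high", "overhead smash"),
    ("ball low",  "slice"),
    ("ball wide", "forehand"),
    ("ball high", "overhead smash"),
]

# "Training" is just counting the expert's choices per situation,
# the simplest possible form of learning by imitation.
counts = defaultdict(Counter)
for situation, action in demonstrations:
    counts[situation][action] += 1

def imitate(situation):
    """Do whatever the demonstrator most often did in this situation."""
    return counts[situation].most_common(1)[0][0]

print(imitate("ball high"))  # "overhead smash", copied straight from the expert
```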