Why maximum entropy

The evolution toward ever more likely macrostates, until the most likely macrostate (the equilibrium state) is reached, is called the second law of thermodynamics. The decrease of potential energy is a consequence of the first law of thermodynamics (energy conservation) together with the second law (evolution toward more likely macrostates). Since macrostates with a lot of energy stored in heat, i.e. random thermal motion, contain many more microstates and are therefore much more likely, energy tends to get transferred from potential energy to thermal energy.
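To make the counting concrete, here is a minimal sketch in Python using the standard Einstein-solid toy model (the model and the numbers are my own illustration, not part of the answer above):

```python
from math import comb

# Toy model: two Einstein solids A and B share q_total quanta of energy.
# The multiplicity (number of microstates) of a solid with N oscillators
# holding q quanta is Omega(N, q) = C(q + N - 1, q).
def omega(N, q):
    return comb(q + N - 1, q)

N_A, N_B, q_total = 30, 30, 40   # small made-up sizes

for q_A in range(0, q_total + 1, 10):
    micro = omega(N_A, q_A) * omega(N_B, q_total - q_A)
    print(f"q_A = {q_A:2d}: {micro:.3e} compatible microstates")

# The count peaks sharply at q_A = 20 (energy shared evenly): the
# equilibrium macrostate is simply the one with the most microstates.
```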

This is observed as a decrease in potential energy. Firstly, I would like to point out that the total energy is conserved and can become neither greater nor smaller; what we are talking about here is the potential energy of the system. Secondly, I think this is a really fundamental question and, although very interesting, such questions tend to end up in philosophy.

Thirdly, I will try my best to give some answers. The minimum potential energy principle is usually viewed, at least the way I see it, as a basic principle of the world. You can compare the minimum potential energy principle, or electrostatic forces, with a force you may know a little better: why does gravity make masses attract? I believe in energy minimization as well. However, if you want an explanation, it can also be derived from other laws.

In this case it's the maximum entropy principle you mentioned, also known as the second law of thermodynamics. This law says that a system will try to maximize its entropy. The consequence is that energy tends to convert into thermal energy (heat), thus increasing the entropy. Now you run into the next question: why maximum entropy? Well, this is once again usually viewed as a basic principle of nature's mechanisms, but it can actually be derived from statistical physics.
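For reference, the usual statistical-physics route is short. A hedged sketch of the textbook derivation:

```latex
% Boltzmann: a macrostate compatible with \Omega microstates has entropy
S = k_B \ln \Omega .
% Maximizing S over the microstate probabilities p_i, subject to
% \sum_i p_i = 1 and fixed mean energy \sum_i p_i E_i = U
% (Lagrange multipliers), gives the Boltzmann distribution:
p_i = \frac{e^{-E_i / k_B T}}{Z}, \qquad Z = \sum_i e^{-E_i / k_B T}.
```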

However, by doing this you might once again run into further principles of physics and mathematics, ad infinitum, so in the end you'll have to believe in some principles, or at least think in the form "if this is correct, then this is what follows." Even if a principle is not the one on which nature actually bases its processes, it surely gives the same results. I'd also mention that entropy is significantly more counterintuitive than some may think.

In particular, since all microstates have equal probability, or in other words are equivalent, if you were to cut your finger off, the state where the chunk comes back into place all by itself is a perfectly valid one.

There is nothing in this respect that ties this state to a specific energy level. Now, to answer the question: why has this never been seen? (And not: why is this impossible?) Since our body consists of billions of atoms, you'd need to have them all jump back at once to where they came from.

As opposed to simply hopping around in uncorrelated moves. In short, that's the core of the entropy principle: macrostates are the results of billions of different microstates, and we only see an average value. The most probable macrostate is simply the one that has the largest number of compatible microstates.
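A rough order-of-magnitude estimate (my own back-of-the-envelope, assuming each atom independently returns to its old position with some probability p at a given instant):

```latex
P(\text{all } N \text{ atoms return at once}) \approx p^{N},
\qquad N \sim 10^{23},
% so even for p close to 1 this is unimaginably small:
% not forbidden, just never observed.
```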

Hence the chunk is very unlikely to go back to its place on its own. What would be the odds, when assembling dumb atoms, of producing an intelligent life form? Or in other words, how large would the sampling experiment have to be to witness something other than a dead rock? Now let's say you spray a gas out, and the system of particles spreads all around in a random manner.
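A minimal sketch of that spreading, assuming an idealized box where each particle independently ends up in either half (the particle count and trial count are toy values I chose):

```python
import random

# N labelled particles, each independently in the left or right half of a
# box. Every arrangement (microstate) is equally likely, but the macrostate
# "roughly half on each side" has vastly more microstates than "all on one
# side", so the latter is essentially never sampled.
N = 50            # number of particles (toy value)
trials = 100_000  # number of random microstates to sample

all_left = 0
for _ in range(trials):
    lefts = sum(random.random() < 0.5 for _ in range(N))
    if lefts == N:          # every particle back on the left
        all_left += 1

print(f"all-on-one-side seen {all_left} times out of {trials}")
print(f"theoretical probability: {0.5**N:.3e}")  # ~8.9e-16 for N=50
```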

Well, now let's see these particles as waves, matter waves, just like electron clouds. I admit I don't have any deep understanding of quantum mechanics, but what I have learnt from my intermediate chemistry is that the more the electron clouds spread, the lower their energy. So when the particles spread uniformly everywhere with maximum entropy, the electron cloud spreads all throughout, and as such they attain a stable, lower energy state.
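This matches the textbook particle-in-a-box result (a standard illustration, not something from the original answer): the energy levels scale inversely with the square of the box size, so a more spread-out wavefunction has lower kinetic energy.

```latex
% Particle in a 1D box of width L: the allowed energies are
E_n = \frac{n^2 h^2}{8 m L^2}, \qquad n = 1, 2, 3, \dots
% Increasing L (a more spread-out wavefunction) lowers every E_n,
% so delocalization lowers the kinetic energy.
```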

Now let's come to your next question, about lower energy. It isn't that nature, or systems in nature, prefer lower energy states; rather, they can't proceed further once they have the lowest possible energy. Consider a ball thrown up from the ground: when it comes back down, it lacks any more energy, or any external force, to get back into the air. It still remains a mystery to me how systems attain this stable state in a finite amount of time, and whether there can exist systems which have a stable state but overshoot it.

A simple Naive Bayes classifier would assume the prior weights are proportional to the number of times a word appears in the document. However, this ignores correlations between words. I can't think of a simple example to illustrate this, but I can think of some correlations.

For example, "the missing " in English should give higher weights to nouns but a Naive Bayes classifier might give equal weight to a verb if its relative frequency were the same as a given noun. A MaxEnt classifier considering missing would give more weight to nouns because they would be more likely in context. Specifically, take a look at chapter 6. There are also explanation what is exactly MaxEnt with math behind. This is formulated as:. Stack Overflow for Teams — Collaborate and share knowledge with a private group.


The so-called MaxEnt classifier takes these correlations into account. Good explanation. I would add that the MaxEnt principle also has to do with the resulting classifier: it should be the least informative classifier available (alternatively, the one with "maximal entropy") that still classifies the data well.

Implicitly, this assures you that your classifier is not overfitting your training data. Why do you need P(D) in the Bayes formula? It forces the result, P(H|E), to be in the interval [0, 1]. I suppose you could write P(E) instead, but that would be confusing since E is already used for the evidence.

I would change P(D) to something like Z and just explain that it is a sum as well as a normalizing factor. When you introduce a new variable, you make it more confusing for someone who is trying to learn.
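To see the point concretely, a minimal sketch (the numbers are invented) of the denominator acting as a normalizing sum Z:

```python
# Two hypotheses about the word class; probabilities are made up.
priors = {"noun": 0.5, "verb": 0.5}        # P(H)
likelihood = {"noun": 0.08, "verb": 0.02}  # P(E | H)

# The Bayes denominator is just the sum over hypotheses: Z = P(E).
Z = sum(priors[h] * likelihood[h] for h in priors)

posterior = {h: priors[h] * likelihood[h] / Z for h in priors}
print(posterior)                 # {'noun': 0.8, 'verb': 0.2}
print(sum(posterior.values()))   # 1.0 -- forced into [0, 1] by Z
```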

I skipped that bit on the first skimming because I know exactly what Bayes' rule means. Reading it again feels confusing.


