I know what normal differential is: dy/dx is the normal differential, where dy is the change in y for an infinitesimal change dx. Here, obviously, y is dependent on x. It just happens that dy/dx is also the slope of a curve at a point. And I understand this well enough to solve problems using it. Now I don’t understand what total differential is and why it is necessary. I am supposed to know this but I don’t. I don’t ever remember being briefed on this during my education, I didn’t get the memo. Can you guys please explain anew normal differential and total differential like I don’t know anything.
For sanity I will use **a** to denote a function depending on x and y. Also, the following is more of an “explain like I know single variable calculus” than ELI5.
What you call “normal” differential d**a**/d**x** is how fast **a** changes when **x** does. The usual name is **_partial differential_**. Similarly, we can measure how fast **a** changes with **y**. This leads to two partial differentials d**a**/d**x** and d**a**/d**y**, one for each axis/variable we allow in **a**.
For example, if you were to (locally) maximize/minimize **a**(x,y), you would search for a point where **a** does not further increase when x changes, hence we need d**a**/dx = 0. The same applies for each variable, giving us another equation d**a**/dy = 0.
However, this is not sufficient. We could have a function such as **a**(x,y) = x²-y² with partial differentials 2x and -2y. Then solving for all of them becoming 0 gives us x = y = 0, but that is neither a maximum nor a minimum: starting from **a**(0,0) = 0, changing x makes the value bigger, while changing y makes it smaller.
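To see this saddle behaviour with actual numbers, here is a minimal sketch. The helper names `partial_x` and `partial_y` and the step sizes are my own choices for illustration; they approximate the partial differentials with small finite differences rather than computing them exactly.

```python
def a(x, y):
    """The example function a(x, y) = x^2 - y^2."""
    return x**2 - y**2


def partial_x(f, x, y, h=1e-6):
    """Central-difference estimate of how fast f changes with x (y held fixed)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def partial_y(f, x, y, h=1e-6):
    """Central-difference estimate of how fast f changes with y (x held fixed)."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


# Both partial differentials vanish at the origin ...
grad_at_origin = (partial_x(a, 0.0, 0.0), partial_y(a, 0.0, 0.0))

# ... yet stepping away along x raises the value, and along y lowers it.
step = 0.1
change_along_x = a(step, 0.0) - a(0.0, 0.0)  # positive
change_along_y = a(0.0, step) - a(0.0, 0.0)  # negative

print(grad_at_origin, change_along_x, change_along_y)
```

So "all partial differentials are zero" only tells you the point is a candidate; it does not tell you which kind of point it is.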
The **_total differential_** is the combined vector
D**a** = ( d**a**/dx , d**a**/dy ).
It combines all the above into one single thing. For every coordinate (x,y) plugged into it, we get 2 individual numbers, one per variable. Hence it can at least do all of the above by looking at the two entries individually. If we want maxima or minima, we set the entire vector equal to **0**, that is, to (0,0).
But it is more than just the sum of its parts: it describes what the plane tangent to the graph of **a** looks like.
In a simple function **b**(x) only depending on one variable, the derivative **b’** = d**b**/dx is the _rate of change_. This can be interpreted as b'(x) being the _slope_ of a line touching the graph of **b** at this specific x. For example with **b**(x) = x² we have **b’**(x) = 2x, so at x = 1 we find that the _tangent_ at (1,**b**(1)) = (1,1) has slope **b’**(1) = 2. Indeed, the line is functionally given as 2x-1.
Now with **a** depending on two variables, the graph is a 3D image plotting the points (x,y,**a**(x,y)). With x²-y² we get [this image](https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Saddle_point.svg/1200px-Saddle_point.svg.png). Let's ask ourselves what 2D plane _touches_ this image at the point (1,2,**a**(1,2)) = (1,2,-3). It must be some _linear_ function such as 3x-7y+5.
It turns out that you can still see this coordinate-wise: the correct function for that _tangent plane_ is 2x-4y+3, where 2 is the partial derivative d**a**/dx = 2x at x=1 (and y=2), and -4 is the corresponding d**a**/dy = -2y at y=2 (and x=1). The additional +3 at the end is just there so the value at x=1, y=2 matches that of **a**(1,2) = -3 = 2·1-4·2+3.
In tighter terms, we can write this plane as D**a**(1,2)·(x,y) + c, where “·” is the dot product of vectors; because by definition we have D**a**(1,2) = (2,-4) being made from the two partial derivatives, hence this expression becomes (2,-4)·(x,y) + c = 2x-4y+c; with c again quickly determined by the value of **a** at (1,2).
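The recipe D**a**(1,2)·(x,y) + c can be written out as a short sketch. The function names `grad_a` and `tangent_plane` are made up for this illustration, and the partial derivatives 2x and -2y are hard-coded for this particular **a**.

```python
def a(x, y):
    """The example function a(x, y) = x^2 - y^2."""
    return x**2 - y**2


def grad_a(x, y):
    """Da(x, y) = (da/dx, da/dy) = (2x, -2y), worked out by hand for this a."""
    return (2 * x, -2 * y)


def tangent_plane(x0, y0):
    """Return the tangent plane at (x0, y0) as a function, plus its gradient and constant c."""
    gx, gy = grad_a(x0, y0)
    # c is fixed by requiring the plane to pass through (x0, y0, a(x0, y0)).
    c = a(x0, y0) - (gx * x0 + gy * y0)

    def plane(x, y):
        return gx * x + gy * y + c  # dot product Da(x0, y0)·(x, y), plus c

    return plane, (gx, gy), c


plane, grad, c = tangent_plane(1.0, 2.0)
print(grad, c)  # (2.0, -4.0) 3.0

# Close to (1, 2) the plane and the function nearly agree.
error_nearby = a(1.01, 2.0) - plane(1.01, 2.0)
print(error_nearby)
```

The leftover error shrinks like the square of the distance to (1,2), which is what "tangent" means in practice.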
Okay, so far we have not done anything that does not immediately boil down to coordinates and partial derivatives. But you might have seen the above way to describe a plane as [constant vector]·[variables] + c before, for example slightly differently worded as [Hesse normal form](https://en.wikipedia.org/wiki/Hesse_normal_form). Without going into all the details, the gist is that the constant vector is _orthogonal_ to the set where the expression keeps the same value. In our case, that vector is (2,-4): it is perpendicular to the lines 2x-4y = constant in the (x,y) plane, along which the height of the tangent plane does not change. (Viewed as a surface in 3D, the tangent plane z = 2x-4y+3 has normal vector (2,-4,-1), whose first two entries are exactly D**a**(1,2).)
Put differently, this means that the vector D**a**(1,2) = (2,-4) tells us in which direction the value of **a** changes the most when we start at (1,2). Indeed, we have (infinitesimally speaking) almost no change if we move along one of those lines of constant height; hence our best chance for changing the value of **a** is to move perpendicular to them!
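The claim that D**a**(1,2) = (2,-4) points in the direction of biggest change can be checked by brute force. This is only a sketch under my own assumptions: take one small step of fixed length in each of 3600 evenly spaced directions and see which one raises **a** the most.

```python
import math


def a(x, y):
    """The example function a(x, y) = x^2 - y^2."""
    return x**2 - y**2


x0, y0 = 1.0, 2.0
step = 1e-3  # length of the small step taken in every direction


def value_after_step(angle):
    """Value of a after moving `step` away from (x0, y0) at the given angle."""
    return a(x0 + step * math.cos(angle), y0 + step * math.sin(angle))


# Try 3600 evenly spaced directions and keep the one with the largest value.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_angle = max(angles, key=value_after_step)
best_dir = (math.cos(best_angle), math.sin(best_angle))

# The gradient (2, -4), scaled to length 1 for comparison.
norm = math.hypot(2.0, -4.0)
grad_dir = (2.0 / norm, -4.0 / norm)

# A dot product close to 1 means the two unit vectors point the same way.
alignment = best_dir[0] * grad_dir[0] + best_dir[1] * grad_dir[1]

# Moving perpendicular to the gradient, e.g. along (4, 2), barely changes a.
perp_dir = (4.0 / norm, 2.0 / norm)
change_perp = a(x0 + step * perp_dir[0], y0 + step * perp_dir[1]) - a(x0, y0)

print(alignment, change_perp)
```

The best direction found by the search lines up with (2,-4), while the step perpendicular to it changes **a** only by an amount on the order of step².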
There are more applications, but the above is probably the first one where one needs the entire vector, not just the parts on their own. If you go deeper into it, you find that the _chain rule_ holds true again for such multi-variable functions, but only in terms of the total differentials, and you now have to multiply _matrices_ whose entries are the partial derivatives. There are also some important functions such as _curl_ that are defined in terms of the total derivative and which give you further information about the function(s).
By normal differential, I assume you mean partial differential (which is the usual alternative to a total differential).
If y only depends on x, they’re the same thing.
If y depends on x and z AND z depends on x, well, you could change x and leave z fixed = partial differential (∂y/∂x). Or you could change x, let z follow along, and add how x also changes y via other stuff, i.e. z, on top of the direct effect: ∂y/∂x + (∂y/∂z • dz/dx) = total differential (dy/dx).
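A small numeric sketch makes the difference visible. The concrete functions here, y = x·z with z = x², are an assumption picked purely for illustration; they are not from the question.

```python
def y(x, z):
    """Hypothetical example: y depends on both x and z."""
    return x * z


def z_of_x(x):
    """Hypothetical example: z itself depends on x."""
    return x**2


def partial_dy_dx(x, z, h=1e-6):
    """Change x a little while holding z fixed."""
    return (y(x + h, z) - y(x - h, z)) / (2 * h)


def total_dy_dx(x, h=1e-6):
    """Change x a little and let z follow along through z_of_x."""
    return (y(x + h, z_of_x(x + h)) - y(x - h, z_of_x(x - h))) / (2 * h)


x0 = 2.0
z0 = z_of_x(x0)

partial = partial_dy_dx(x0, z0)  # direct effect only, roughly z0 = 4
total = total_dy_dx(x0)          # direct effect plus the effect via z, roughly 12

# Chain-rule bookkeeping by hand: dy/dz = x and dz/dx = 2x for this example.
via_z = x0 * (2 * x0)
print(partial, total, partial + via_z)
```

The total value matches the partial one plus the extra (∂y/∂z • dz/dx) term, and if z did not depend on x that extra term would vanish and the two would coincide.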