eli5: WHY do derivatives and integrals work?

I’m embarrassed to admit I’m getting my master’s degree in a math-related subject and I still don’t get this!

I know how to do them, but the way you compute them is almost suspiciously simple. What’s the logic behind converting the exponent to a constant? How does that determine the slope?

8 Answers

Anonymous 0 Comments

That’s literally how the derivative is defined: it’s lim(deltax->0) (f(x+deltax)-f(x))/deltax. Most calc 1 courses go over it, so for supplemental reading, look up “deriving the derivative” or “definition of the derivative”.

The fact that it simplifies to an easily remembered rule is incidental. All we do is find the slopes over smaller and smaller intervals until the interval shrinks to a single point.
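A minimal sketch of that shrinking-interval idea, in Python. The function names `difference_quotient` and `cube` are made up for illustration, and x^3 at x = 2 is just an arbitrary test case:

```python
def difference_quotient(f, x, dx):
    """Slope of the line through (x, f(x)) and (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx


def cube(x):
    return x ** 3


# Shrink the interval and watch the slope settle down.
# The power rule predicts 3 * 2**2 = 12 at x = 2.
for dx in (1.0, 0.1, 0.01, 0.001):
    print(dx, difference_quotient(cube, 2.0, dx))
```

The printed slopes should drift toward 12 as dx gets smaller, which is all the limit in the definition is saying.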

Anonymous 0 Comments

One thing to recognize is that having a nice derivative is the exception. In general, the property of even having a derivative is quite rare (much rarer than having an integral), and having a *simple* derivative is rarer still. The only reason we get the impression that derivatives are easy and simple is that we mostly focus on functions with easy derivatives, because they are easy. It’s sampling bias.

The reason the simple ones *are* simple is that they are constructed from the basic functions: powers, trig functions, and exponentials. Each of *these* has a simple derivative because it has a very nice and simple addition formula. For instance, the binomial theorem tells us what (x+y)^(n), the power of a sum, is in terms of other powers. The formula e^(x+y)=e^(x)e^(y) tells us what the exponential of a sum is in terms of other exponentials. The angle addition formula tells us what sine/cosine are of a sum of angles in terms of other sines/cosines. These formulas make the computation of the difference quotient manageable. The derivatives of functions which cannot be constructed from these nice-sum functions are much more difficult to work with. Luckily, we have power series and Fourier series which expand what we can construct with these nice functions, so even then it becomes computationally manageable. Which, honestly, is a miracle.
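To make that concrete, here is a worked sketch of how the addition formulas do the heavy lifting for the exponential and the sine. It assumes (without proof) the standard limits (e^h - 1)/h -> 1, sin(h)/h -> 1 and (cos(h) - 1)/h -> 0 as h -> 0:

```latex
\begin{align*}
\frac{e^{x+h} - e^{x}}{h}
  &= e^{x}\,\frac{e^{h} - 1}{h}
  \;\longrightarrow\; e^{x}, \\
\frac{\sin(x+h) - \sin x}{h}
  &= \sin x\,\frac{\cos h - 1}{h} + \cos x\,\frac{\sin h}{h}
  \;\longrightarrow\; \cos x .
\end{align*}
```

In both cases the addition formula is what lets the x-dependence factor out of the difference quotient, leaving a limit that only involves h.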

Anonymous 0 Comments

Like others have said, this specific rule is just one that’s notoriously simple. Why it happens is a matter of combinatorics more than anything else. Look at Pascal’s triangle and then a definition of the derivative and you’ll pretty much see why it works out that way.

However, it’s not always that simple, especially for derivatives. Could you, off the bat, tell me what the derivative of arcsin(x) is? That’s still a relatively “simple” derivative but it is almost unrelated to the simple power rule you talked about. There’s really no rhyme or reason that a derivative has a particular form or pattern other than the math just checking out that way.

Integrals are even worse. Integrals are by and large far less simple to compute. There are some seemingly simple integrals that are literally impossible to express in terms of elementary functions (polynomials, trig functions, exponentials, logarithms, and their combinations).

Integral of x^2 ? Easy, (1/3)x^3 + c

Integral of cos(x^2 )? Good luck, go throw it into wolfram alpha and let me know what you get. It’s not pretty. But that function is so simple, right?
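Not having a tidy formula doesn’t stop you from getting a number, though. A rough sketch in Python using composite Simpson’s rule (my choice of method, not something from the thread; the name `simpson` and the interval [0, 1] are just for illustration):

```python
import math


def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule.

    n is the number of subintervals and must be even.
    """
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd i) and 2 (even i).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Has a closed form: the exact answer is 1/3.
print(simpson(lambda x: x ** 2, 0.0, 1.0))

# No elementary antiderivative, but the definite integral is still easy to estimate.
print(simpson(lambda x: math.cos(x ** 2), 0.0, 1.0))
```

The same few lines handle both integrands; the difference between “easy” and “hard” only shows up when you insist on a closed-form antiderivative.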

In short, certain rules are well known simply because they form neat patterns. There are no truly deep mathematical reasons that any derivative/integral rules work the way they do.

Anonymous 0 Comments

The source of the “2” in the derivative formula (x^(2))’ = 2x is the middle term in the algebraic identity

(a+b)^2 = a^2 + 2ab + b^2

because in the definition of the derivative of x^(2), we compute (for nonzero h)

((x + h)^2 – x^(2))/h = (x^2 + 2xh + h^2 – x^(2))/h = 2x + h

and the limit of that as h tends to 0 is 2x. So the “2” in the derivative formula 2x comes from the second term in the expansion of (x+h)^2 that you saw in algebra. Similarly, the 3 in the formula (x^(3))’ = 3x^2 comes from the second term in the cubic expansion

(x+h)^3 = x^3 + 3x^(2)h + 3xh^2 + h^3

after feeding the right side into the limit definition of the derivative of x^(3).
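Written out, that step for x^(3) looks like this (same nonzero h as before):

```latex
\[
\frac{(x+h)^3 - x^3}{h}
  = \frac{3x^2 h + 3x h^2 + h^3}{h}
  = 3x^2 + 3xh + h^2
  \;\longrightarrow\; 3x^2 \quad \text{as } h \to 0 .
\]
```

Only the term that was linear in h survives the division and the limit, and its coefficient is the 3.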

It’s pretty important that these exponents, 2 in x^2 and 3 in x^(3), are *constant* while the base is the variable x. If you swapped those roles and made the base constant and the exponent x, then everything is totally different: you’re now dealing with exponential functions 2^x and 3^x rather than polynomials x^2 and x^(3). Exponential functions have different properties and different graphs compared to polynomials.

It might be natural to guess that 2^x has derivative x2^(x-1), but that’s wrong. Think about what’s happening at x = 0: the graph of y = 2^x when it passes through the y-axis x = 0 is going up, so (2^(x))’ at x = 0 is positive, in fact (2^(x))’ at all x is positive, but x2^(x-1) vanishes when x = 0 and is negative when x < 0, so the formula x2^(x-1) is not at all like (2^(x))’.

The correct formula for (2^(x))’ is (2^(x))ln(2), and (3^(x))’ = (3^(x))ln(3): we need logarithms to describe derivatives of exponential functions, which is far from obvious when you first see these things. A reason that 2^(x) has such a different derivative formula than x^2 is the different mathematical properties of 2^x compared to x^(2). For instance, these functions have completely different effects on sums: (a+b)^2 = a^2 + 2ab + b^2 while 2^(a+b) = 2^(a)2^(b).
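If you’d rather see numbers than take the formula on faith, here is a small numerical check. It is only a sketch: `slope_of_two_to_the_x` is a made-up helper, and the step size h = 1e-6 is an arbitrary small value, not a limit.

```python
import math


def slope_of_two_to_the_x(x, h=1e-6):
    """Difference quotient for 2**x at x (a numerical estimate, not the exact limit)."""
    return (2 ** (x + h) - 2 ** x) / h


for x in (-1.0, 0.0, 1.0, 2.0):
    estimate = slope_of_two_to_the_x(x)
    correct = 2 ** x * math.log(2)    # (2^x) ln(2)
    wrong_guess = x * 2 ** (x - 1)    # power rule misapplied to an exponential
    print(x, estimate, correct, wrong_guess)
```

The estimate should track the ln(2) formula in every row, while the misapplied power rule comes out negative at x = -1 and zero at x = 0.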

Anonymous 0 Comments

It is a simple idea. For derivatives, you estimate the slope with two points, then take the limit as those two points get closer together. For why the power rule is the way it is:

let f(x) = x^n
f'(x) = lim[d->0] (f(x + d) – f(x)) / d
= lim[d->0] ((x+d)^n – x^(n)) / d
= lim[d->0] (x^n + nx^(n-1)d^1 + [a bunch of terms with d^2 or higher] – x^(n)) / d
= lim[d->0] (nx^(n-1)d^1 + [a bunch of terms with d^2 or higher]) / d
= lim[d->0] (nx^(n-1) + [a bunch of terms with d^1 or higher])
= nx^(n-1)
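The “bunch of terms” can be made explicit with binomial coefficients. A small sketch for whole-number n (the helper name `difference_quotient_terms` and the sample values n = 5, x = 2 are just for illustration):

```python
from math import comb


def difference_quotient_terms(n, x):
    """Coefficients of d**0, d**1, ... in ((x + d)**n - x**n) / d.

    By the binomial theorem, (x + d)**n is the sum of comb(n, k) * x**(n-k) * d**k.
    Dropping the k = 0 term (it cancels with -x**n) and dividing by d leaves
    comb(n, k) * x**(n-k) as the coefficient of d**(k-1).
    """
    return [comb(n, k) * x ** (n - k) for k in range(1, n + 1)]


terms = difference_quotient_terms(5, 2)
print(terms)        # first entry has no d attached; the rest are multiplied by d, d**2, ...
print(5 * 2 ** 4)   # power rule: n * x**(n-1)
```

The first coefficient is the only one not multiplied by a power of d, so it is the only one left when d goes to 0, and it is comb(n, 1) * x^(n-1) = n x^(n-1).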

For (Riemann) integrals, you estimate the area with a bunch of rectangles, then take the limit as the width of the rectangles goes to 0. Usually a little hairier, but the same idea.
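The rectangle picture translates almost word for word into code. A minimal sketch using left-edge rectangles (one of several possible choices; `left_riemann_sum` is a made-up name):

```python
def left_riemann_sum(f, a, b, n):
    """Total area of n equal-width rectangles, each as tall as f at its left edge."""
    width = (b - a) / n
    return sum(f(a + i * width) for i in range(n)) * width


# Area under x**2 from 0 to 1; the exact value is 1/3.
for n in (10, 100, 1000, 10000):
    print(n, left_riemann_sum(lambda x: x ** 2, 0.0, 1.0, n))
```

As the rectangles get thinner the sums should creep up toward 1/3, matching the (1/3)x^3 antiderivative.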

Calculus is just geometry + limits.

Anonymous 0 Comments

Consider the function f(x) = x^2. You probably know the derivative of it is f'(x)=2x. You’re asking how this comes about? Someone else mentioned the difference quotient formula already, which is often expressed as:

the limit as h goes to 0 of (f(x+h)-f(x))/h.

Now using f(x)=x^2 and working out the algebra you can quickly get that:

the difference quotient becomes the limit as h goes to 0 of 2x+h, which is just 2x.

However, you might be wondering how the difference quotient even arises in the first place, which is really where the intuition of derivatives is held. Here’s one way to think about it.

Consider again f(x)=x^2. You might want to know how far up the function goes vs how far over to the right it goes. With a linear function y=mx +b this is always the slope m, and you can calculate it by doing the change in y over the change in x between any two points.

For f(x)=x^2 the change in y over the change in x will not always be the same; it depends on the two points you pick. This change in y over change in x can then be thought of as the “average” rate of change of f(x)=x^2 from the first point to the second point. By looking at the graph of f(x)=x^2 it should be clear that it’s “growing” more slowly around (1,1) than it is around (4,16), so an average rate of change taken near (1,1) should come out smaller than one taken near (4,16).
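A quick way to see this with numbers, as a sketch (the helper `average_rate_of_change`, the sample line y = 2x + 1, and the half-unit steps are my own picks for illustration):

```python
def average_rate_of_change(f, x1, x2):
    """Change in y over change in x between (x1, f(x1)) and (x2, f(x2))."""
    return (f(x2) - f(x1)) / (x2 - x1)


def line(x):
    return 2 * x + 1


def square(x):
    return x ** 2


# A line gives the same answer no matter which two points you pick.
print(average_rate_of_change(line, 0, 5))      # 2.0
print(average_rate_of_change(line, -3, 7))     # 2.0

# x**2 does not: the answer depends on where you look.
print(average_rate_of_change(square, 1, 1.5))  # 2.5, near (1,1)
print(average_rate_of_change(square, 4, 4.5))  # 8.5, near (4,16)
```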

The ingenious question to ask is, well sure I can find the average rate of change between two points, like between (1,1) and (3,9) by doing the change in y over the change in x. But what about finding the exact rate of change at (3,9) instead? You can try using (3,9) for both points and calculate the average, but you’ll end up dividing by 0. The sneaky trick is to use limits.

So, we find the average rate of change from (3,9) to (3+h, (3+h)^2 ) instead. The idea is that we find the average rate of change from the x-value 3 to a teeny bit more than 3 by adding h, then later let h shrink toward 0 so that 3+h approaches 3.

Finding the average rate of change then gives us:

((3+h)^2 – 9)/(3+h-3)

which becomes with some algebra:

6+h.

Taking the limit as h goes to 0 gives us 6. Then we realize there was nothing special about the point (3,9) and we could instead just use any old point (x,x^2) and do the exact same thing:

((x+h)^2 – x^2 )/(x+h-x)

which again with some algebra becomes:

2x+h

Taking the limit as h goes to 0 gives us 2x. Then we realize there was nothing special about the function f(x)=x^2 and we could do the same process for any other “nice” function:

(f(x+h)-f(x))/(x+h-x)

then take the limit as h goes to 0 to end up with the derivative of f(x), which is another function telling you the rate of change of the original function f(x).

Anonymous 0 Comments

> the way you compute them is almost suspiciously simple

They are designed that way. If you consider differentiation or integration as operators, OP say, they are both linear (f and g being some functions):

OP(a.f + b.g) = a.OP(f) + b.OP(g)

i.e. “transparent” to sums and multiplication by a scalar/constant. It can’t get much simpler than that.
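Here is a rough numerical illustration of that linearity for differentiation. It is a sketch only: `derivative` is a central-difference estimate standing in for OP, and the choices f = sin, g = exp, a = 3, b = -2, x = 0.7 are arbitrary.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x); a numerical stand-in for OP."""
    return (f(x + h) - f(x - h)) / (2 * h)


a, b, x = 3.0, -2.0, 0.7


def combo(t):
    """The combined function a.f + b.g with f = sin and g = exp."""
    return a * math.sin(t) + b * math.exp(t)


left = derivative(combo, x)                                          # OP(a.f + b.g)
right = a * derivative(math.sin, x) + b * derivative(math.exp, x)    # a.OP(f) + b.OP(g)
print(left, right)
```

The two numbers should agree up to floating-point noise, and both should be close to 3cos(0.7) - 2e^(0.7).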

The derivative, if it exists, embodies the idea that you can get information about a function *locally* (meaning near some point) if you approximate it there by something very simple, a linear function.

The integral, on the other hand, tries to synthesize information about a function’s behavior over a chunk of numbers (e.g. an interval). Up to some technical details (and dividing by the length of the chunk) it basically computes the *mean* of the function over the chunk. So it gets you *global* information.
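A small sketch of the “mean” reading, assuming a plain midpoint-rule estimate for the integral (the names `integral` and `mean_value` and the example x^2 on [0, 3] are made up for illustration):

```python
def integral(f, a, b, n=100000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def mean_value(f, a, b):
    """Integral divided by the length of the interval."""
    return integral(f, a, b) / (b - a)


# x**2 on [0, 3]: the integral is 9, so the mean height is 9 / 3 = 3.
print(integral(lambda x: x ** 2, 0.0, 3.0))
print(mean_value(lambda x: x ** 2, 0.0, 3.0))
```

Put differently, the integral is the mean height times the width of the chunk, which is why it summarizes the whole interval rather than one point.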