Hello, I’m trying to understand what a file format is, what it means, and what the surrounding context is.
When people talk about a file format, are they really just referring to any type of file?
MP4 is a file format?
MP3 is a file format?
PDF is a file format?
TXT is a file format?
DOC is a file format?
Is that what people are talking about when they say “file formats”? Just different types of files that can or should be opened by different programs?
Thank you
Yes, your examples are file formats. All files are just sequences of 1’s and 0’s, but if a file is of a certain format, that means that this sequence follows certain rules and has a particular structure according to the format, and that allows a program to interpret it correctly. If you use a program to open a file of a format that it doesn’t support, or if the file doesn’t adhere to the right format, the program will most likely just fail. For example, a PDF file should start with the version number of the PDF specification it was written for. If that version number is invalid (that is, there is actually no such version), then the program opening the file will immediately have a problem with that.
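To make the PDF example a bit more concrete, here is a rough Python sketch of that first check. The function name `pdf_version` and the strictness of the check are my own assumptions for illustration; real PDF readers do a lot more and are often more forgiving.

```python
import re
from typing import Optional

# Versions of the PDF specification that actually exist.
KNOWN_VERSIONS = {"1.0", "1.1", "1.2", "1.3", "1.4", "1.5", "1.6", "1.7", "2.0"}

def pdf_version(data: bytes) -> Optional[str]:
    """Return the version from a header like b'%PDF-1.7', or None if it is missing or unknown."""
    match = re.match(rb"%PDF-(\d\.\d)", data)
    if match is None:
        return None  # does not start like a PDF at all
    version = match.group(1).decode("ascii")
    return version if version in KNOWN_VERSIONS else None

print(pdf_version(b"%PDF-1.7\nrest of the file"))  # 1.7
print(pdf_version(b"%PDF-9.9\nrest of the file"))  # None, no such version
print(pdf_version(b"Just some plain text"))        # None, not a PDF header
```

A program that gets `None` here would typically stop and report that the file is not a valid PDF.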
File formats are just public instructions or standards about how a file is structured. The extensions at the end of the file name are really just names and can be changed.
A txt file is mostly plain text, which means the file is a long list of 8-bit values that represent letters.
A png file is, at its core, a compressed list of pixel values, plus a header with things like the width and height.
A Word file (docx) is a ZIP archive with other files in there.
So you can open a picture with a text editor (if you get around the warnings) and see gibberish, because the text editor tries to parse the pixel values as letters.
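If you want to see both effects yourself, here is a small Python sketch. It builds a tiny ZIP archive in memory as a stand-in for a docx (a real docx has more files inside), then reads the same bytes once as an archive and once the way a text editor would. Decoding as `latin-1` is just my choice to mimic "one byte, one character".

```python
import io
import zipfile

# Build a tiny ZIP archive in memory as a stand-in for a .docx file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<document>Hello</document>")
raw = buffer.getvalue()

# A program that knows the format can list the files inside.
with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    names = archive.namelist()
print(names)

# A text editor just maps every byte to a character, which mostly gives gibberish.
as_text = raw.decode("latin-1")
print(repr(as_text[:40]))
```

Same bytes, two very different results, depending only on which rules you read them with.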
Data is just a bunch of bits, `0`s and `1`s. When I make a file, the only real rule for those bits is that they come in groups of 8 called bytes. As far as the file is concerned, nothing else really matters.
So what makes a “format”? What makes a picture a picture vs what makes text just text? We have to give those bits some structure. They have to mean something. The computer must understand that meaning, and we call that the “format” of a file. Just some examples:
* Plain old text (TXT) traditionally adheres to the ASCII standard (or, these days, an ASCII-compatible encoding like UTF-8). You can check out an [ASCII table](https://www.asciitable.com/) to see how to turn bytes into English text, and how things like byte #10 (LF) represent pressing Enter.
* DOC is Microsoft’s document format, adding more information about the text like formatting (**bold**, *italics*) and information about the page it’s going to be printed on.
* MP4 contains both audio and video, so it needs to specify how to separate them and provide other data like a framerate. Arguably the audio and video are also their own formats with sample sizes, resolutions, etc.
* A PDF is meant to be a digital version of a printed page, so the page has a size and things get drawn on it, sometimes freehand and sometimes text with a font.
* An MP3 is just audio. In fact you could put it into an MP4 file as the audio track under the rules of MP4 formatting.
And so on and so forth.
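For the plain text case, the ASCII lookup is simple enough to try in a couple of lines of Python. The byte values below are just ones I picked from the table:

```python
# Eight bytes, written as plain numbers. Byte 10 is LF, the "Enter" character.
data = bytes([72, 105, 10, 116, 104, 101, 114, 101])

# Reading them under the ASCII rules turns each number into a character.
text = data.decode("ascii")
print(repr(text))  # 'Hi\nthere'

# Going the other way: text becomes numbers again.
print(list("Hi".encode("ascii")))  # [72, 105]
```

The other formats in the list work on the same principle, only with far more elaborate rules than "one byte is one letter".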
Some formats are well documented, available as a manual telling you how to read and understand them. You could write your own program to use them if you have the skill. Others are just used internally by the software and not meant to be understood, like the data files of a video game containing the maps; only the game and its developers need to understand that.
When you hear the file’s “format” is “MP4”, you usually think the file’s name ends with `.mp4`, and that’s a convention so that humans and software have expectations. When your video player sees a file with such a name, it goes straight to the MP4 file format reader which will try to understand what the file actually contains.
Guessing formats can work, but it is prone to mistakes. Many formats intentionally start a file with a few special bytes just to identify “I am a ZIP file” or “I am a PNG”, so that if guesswork is done there is a very obvious hint right at the start of the file, making a snap judgement of “I am probably right” or “I obviously guessed wrong” easier.
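Here is roughly what that check on the first few bytes looks like in Python. The three signatures are the real starting bytes of PNG, ZIP and PDF files; the function name `sniff_format` and the tiny table are my own, and real tools know hundreds of these.

```python
# Starting bytes ("magic numbers") for a few well-known formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"PK\x03\x04": "ZIP",
    b"%PDF-": "PDF",
}

def sniff_format(data: bytes) -> str:
    """Guess a format from the first bytes of a file, ignoring the file name."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"

# Pretend this came from a file called "holiday.mp4": the name says video,
# but the first bytes say PNG.
print(sniff_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))  # PNG
print(sniff_format(b"hello world"))                       # unknown
```

A player that runs a check like this can tell right away that the extension was wrong, instead of choking halfway through the file.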
*geek flex*
Strictly speaking, no: those are file extensions. The format is what is actually inside the file.
BUT we use the terms interchangeably in everyday talk.
Most of the time the extension and the format match, but sometimes they do not.
Just because something is named bla.mp3, it does not have to be an audio file.
Another example: Google “how to open a .dat file”:
“Most DAT files contain text, so you can open them with text editors, like Notepad, Notepad++, VS Code, and so on. If you are sure the information contained in the DAT file is a video or audio, then your media player can open it. If it’s a PDF, then Adobe Reader can open it, and so on.”
In my field of work, a .dat file contains raw data, and you’d need another file to describe what’s in there. So none of the tools above would be able to read anything meaningful out of it.
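A small Python sketch of what "raw data plus a separate description" means. The record layout here (a 32-bit id followed by a 32-bit float) is completely made up for the example; in practice it would come from that other file or from documentation.

```python
import struct

# Hypothetical description of the data: each record is a little-endian
# unsigned 32-bit id followed by a 32-bit float measurement.
RECORD = struct.Struct("<If")

# Stand-in for the contents of a .dat file: two records, 16 raw bytes.
raw = RECORD.pack(1, 20.5) + RECORD.pack(2, 21.25)

# With the right description, the bytes turn back into meaningful values.
records = [RECORD.unpack_from(raw, offset) for offset in range(0, len(raw), RECORD.size)]
print(records)  # [(1, 20.5), (2, 21.25)]

# With a wrong description (four 16-bit integers), the same bytes
# decode without any error, but the numbers mean nothing.
wrong = struct.unpack("<4H", raw[:8])
print(wrong)
```

Nothing in the bytes themselves says which reading is correct, which is why a generic tool cannot do much with such a file.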
Consider how you can acquire your favorite artist/album/song: will it be on vinyl, 8-track, cassette tape, CD, or digital download?
It’s the same ‘song’, in different formats. An argument can then be made about which format preserves the most ‘data’ of the original (aka, what sounds best).
Digital data is often similar: it can be encoded in a variety of different file formats, but just as you wouldn’t have any success cramming an 8-track into a VCR, the specifics of the file format dictate how you interact with that data.