[ art / civ / cult / cyb / diy / drg / feels / layer / lit / λ / q / r / sci / sec / tech / w / zzz ] archive provided by lainchan.jp

lainchan archive - /λ/ - 14642



File: 1457852627523.png (189.6 KB, 300x297, monitor_by_endling-d7wrd5u.jpg)

No.14642

What's up with filesystems?

I mean, I get that block storage is good for speed, and for drives with high random read latency. But do they still make sense now that SSDs and shit-tons of RAM exist?

Binary formats make it so much harder to deal with file formats. Hell, even ascii formats like STL are shit.

I'd like to see something that works like a giant object.

Do selections like

for line in filesystem['home']['anon']['.bashrc']:
    do_some_soykaf(line)

Alright, bad example, since bashrc is just a nice pipe-able ascii file.
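The shape of it can be sketched today in userspace with nothing but the stdlib (FsNode is an invented name for illustration, not a real API):

```python
import os
import tempfile
from pathlib import Path

class FsNode:
    """Treat a directory tree as nested dict-style lookups (toy sketch)."""
    def __init__(self, path):
        self.path = Path(path)

    def __getitem__(self, name):
        child = self.path / name
        if child.is_dir():
            return FsNode(child)            # directories nest like dicts
        return child.read_text().splitlines()   # leaf: a file's lines

# demo on a throwaway tree
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'home', 'anon'))
Path(root, 'home', 'anon', '.bashrc').write_text('alias ll="ls -l"\nexport EDITOR=vi')

filesystem = FsNode(root)
lines = list(filesystem['home']['anon']['.bashrc'])
```

Of course this only sugars the path lookup; the interesting part (shared live data structures) needs more than a wrapper.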

You could implement something like python's duck-typing. Store an image compressed, but have a daemon take care of presenting it as if it were a pixel array. Do some clever things with shared memory (capnproto?) and you could be editing the same image in different programs, at the same time.
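A toy, in-process version of that duck-typing idea, with zlib standing in for the image codec (the daemon and shared-memory parts are omitted; LazyPixels is an invented name):

```python
import zlib

class LazyPixels:
    """Keep the blob compressed; decompress transparently on first access."""
    def __init__(self, raw: bytes):
        self._blob = zlib.compress(raw)
        self._cache = None

    def _pixels(self):
        if self._cache is None:
            self._cache = zlib.decompress(self._blob)
        return self._cache

    def __getitem__(self, i):
        return self._pixels()[i]    # indexes like a plain pixel array

    def __len__(self):
        return len(self._pixels())

img = LazyPixels(bytes(range(256)) * 64)   # 16 KiB of fake pixel data
```

The caller never sees the compression; that's the whole trick.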

I suppose we could all just save to JSON, but saving to a block is slooow, along with all that serializing.

I'd much rather someone implement a good k-d tree server. Because then we can let more than one program access the data structure at once, and implement things like callbacks if the data changes.

Binary formats and GUI programs sort of broke the unix way. But we can move back towards letting one program "do one thing well" by letting more than one program access a data structure at once. Something that block files don't allow.

Throw in some network transparency and you've got a pretty great distributed computing platform.

  No.14644

File: 1457854411600.png (734.63 KB, 177x200, 1396647652659.png)

>>14642
i honestly don't know that much about filesystems per se, but i do know that spinning rust is still relevant in the enterprise (where things like reliability and 24 hour operation actually matter). ssd reliability is also still questionable as they haven't been a thing long enough to really compare to good ole hard drives. i still have a few spinning rust drives that work fine after 10+ years (one is even 15 years old) of 24 hour operation but i know a few friends who are on their second or third ssd already. it only makes sense to me that file systems and file structures would still be rust-focused until ssds truly become a proven reliable technology.

i do see what you're getting at though, being able to write to/from multiple things at once would be great for imageboards for example (part of 8chan's scaling problem actually). memcached exists to solve some of the problems you wish to solve (using the shitload of ram modern machines have nowadays), but it's not perfect.

  No.14645

>>14642
Barely 5% of people even consider buying SSDs, mostly because the price/storage ratio on SSDs is still terrible.

  No.14648

>>14645

How do cellphone eMMCs compare to SSDs?

  No.14960

You may want to consider that the solution to your file system issue is to avoid file systems altogether.

Forth systems write directly to fixed-size storage blocks; this unit is generally called a screen.

BeOS and PilOS use a database for long-term storage.

Some systems make file operations transparent, such as PhantomOS. These systems do this by having a base language and allowing you to manipulate information like you would other data. You are not manipulating a file, but a persistent object.

There are inevitably other systems I am forgetting or am not aware of, but you surely understand what I mean by all of this.

Be wary of UNIX notions and current hardware. You limit your thoughts by thinking in terms of these systems and only these systems.

I believe you unfairly criticize binary formats. If you have issues with them, it is because UNIX makes them more difficult to properly handle. That is the fault of UNIX.

  No.14961

File: 1457949505756.png (477.63 KB, 200x179, rough11___reformation_by_endling-d4vst91.jpg)

>>14960

How do you handle binary formats well? If it's got good universal abstractions, it's not _exactly_ a binary format anymore. You wouldn't say an SQL server is a binary format, even though they mostly store their data in binary.

I see it as more of a political thing. It would be very nice to be able to use BeOS or PhantomOS (or Inferno OS), but it's entirely impractical.

Whereas something like this? Maybe easier to get adopted. You could sell it as a tool for virtual reality if that happens, or point out that phones already don't use traditional filesystems.

Whatever it is, it needs to run in userspace.

  No.14962

>>14642
> But do they still make sense now that ssd's and shit-tons of ram exist?

How are you going to put data into your SSD? You need to have some conventions about data layout, simply so you can access it again later: and that's exactly what a filesystem is.

  No.14963

>>14962

I'd argue that those conventions are bad. They made a lot of sense when drives were slower, but nowadays people are dealing with data structures more than streamable data.

A symptom of developers using data structures more than streamable text is JSON (and also SQL, but that addresses a different case).

Anyway, you import that JSON file and treat it like a native object. Well, in most interpreted languages anyway.

But this has a whole bunch of problems. JSON files don't have space between the bits, so you have to recreate the file every time you make a change. You can't just append to a list without recreating the entire file.

JSON files need to be loaded entirely into memory in order to be parsed. There's no indexing; you can't jump directly to a chunk of data. You can't even append on to the end of a JSON file without hacks that break the spec.

It's entirely un-streamable.

Both of these painfully limit the size you can make a JSON file.

More importantly, only one program can access the data structures in a JSON file at once.

This is a problem because blah blah blah unix way.

I want something like inotify that works on data, not on files. RethinkDB does something similar, but RethinkDB is shitty in general.

  No.14969

I have been thinking about a new kind of OS lately, something along the lines of TempleOS....

The core idea is that the user or programmer shouldn't worry about where his data is, or even what format it is. This means that the kernel maps the HDD/SSD, the RAM, and your mSD card as one huge address space. The kernel does the swapping, storing and caching. For example, a frequently accessed file can be mapped to RAM just fine, and programs opening the file will transparently be given the RAM address of the data, which is then sync'd with the storage whenever the kernel wants.

This futuristic filesystem also gives me the idea to handle everything as ADS (alternate data streams) and transparently make every program a callable function.

The ADS approach means that you have:
namespace.thing.helloworld
Which actually contains both the binary data and the source code, just like in Windows.

The program is compiled as a simple function, the details of which are logged somewhere, maybe one part of the ADS. This allows you to write normal code like this:
namespace.thing.helloworld(arg)
It would have an ABI of sorts that, again, transparently passes the arguments to the function you want. This could be implemented by dynamically loading the function addresses at run time (easier) or via a link table.

  No.14970

>>14969
Now the caching thing could be tricky, but not impossible.

The problem is the imports: if you change function X (its address, return value or arguments), how can you be sure that all the other functions got the memo? Now, if you always used dynamic imports, it wouldn't be a problem, it would just be slow.

The faster way would be to use hard imports (kinda like static linking) and fix up the imports whenever something is changed. This could also be slow if you change printf and 99% of the functions depend on it.

So I propose a timestamp, which would be easy to store in the ADS. When you load up a function, compare its timestamp to the timestamps of its imports. If they are ok, run the function. If they are not ok, traverse the imports and fix them up recursively, until the timestamps match again.

if namespace.thing.helloworld is not ok, fix it
helloworld calls printf and scanf
printf is ok, no need to traverse it because we can be sure its children are also fixed
scanf is not ok, fix it up and again, traverse its children
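That traversal could look something like this (the function-table layout is invented for illustration):

```python
def fix_up(name, funcs):
    """funcs: name -> {'stamp': int, 'imports': {dep: stamp_seen_when_linked}}"""
    f = funcs[name]
    for dep, seen in f['imports'].items():
        if funcs[dep]['stamp'] == seen:
            continue                    # dep unchanged: its children must be fine too
        fix_up(dep, funcs)              # dep changed: traverse its children first
        f['imports'][dep] = funcs[dep]['stamp']   # "relink" against the new dep
        f['stamp'] += 1                 # our callers now need fixing up too

funcs = {
    'printf':     {'stamp': 2, 'imports': {}},   # printf was changed (stamp bumped)
    'scanf':      {'stamp': 0, 'imports': {}},
    'helloworld': {'stamp': 0, 'imports': {'printf': 1, 'scanf': 0}},
}
fix_up('helloworld', funcs)
# helloworld is relinked against the new printf; scanf is skipped entirely
```

Bumping the caller's own stamp is what makes the fix propagate lazily up the call graph.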

  No.15066

>>14642
> I mean, I get that block storage is good for speed, and for drives with high random read latency. But do they still make sense now that ssd's and shit-tons of ram exist?
What?

> I suppose we could all just save to JSON, but saving to a block is slooow, along with all that serializing.

What?

> But we can move back towards letting one program "do one thing well" by letting more then one program access a datastructue at once. Something that block files don't allow.

What?

This entire post is vague-as-fuck. What are you actually trying to say?

>>14960
This.

>>14963
> But this has a whole bunch of problems. JSON files don't have space between the bits, so you have to recreate the file every time you make a change. You can't just append to a list without recreating the entire file.
>
> JSON files need to be loaded entirely into memory in order to be parsed. There's no indexing; you can't jump directly to a chunk of data. You can't even append on to the end of a JSON file without hacks that break the spec.
>
> It's entirely un-streamable.
If these properties are a problem for your use case, then you should not be using JSON.

> More importantly, only one program can access the data structures in a JSON file at once.
>
> This is a problem because blah blah blah unix way.
No, this is a problem with JSON.

  No.15067

1. SSDs are still block oriented. This is the way hardware works, not some conspiracy by anti-unix bigots. A character SSD or whatever it is you're suggesting would be expensive, slow and have poor data density.
2. SSDs are not as fast as you seem to think. They are still many orders of magnitude slower than memory, which is orders of magnitude slower than cache or registers. You can't just write to an SSD like it's memory (although with mmap you can pretend...)
3. Binary file formats are not the same as block filesystems, or even remotely related.
The idea of a more structured filesystem is interesting, but that is an increase in abstraction, not a decrease. You can't just throw out everything and say it should work by magic, it has to be built on something and I see no actual complaints about the existing structure other than "I don't like it."

  No.16177

File: 1462516591107.png (29.57 KB, 200x174, aNK1gWv_460s_v3.jpg)

>>14645
I would use an SSD, but I care about storage space more than speed per se. Also, the idea of a limited number of read/writes scares me somewhat.

  No.16179

>>16177

spinning platter harddrives have unlimited read/writes now? when did that happen?

  No.17328

It seems like the consensus is that a filesystem is wholly insufficient for many things it's commonly used for. UNIX has a history of abusing its few abstractions for purposes they're not meant or good for, so this is not surprising.

I'm the fellow mentioning better systems earlier. I'm happy to use alternative systems, despite their supposed lack of practicality.

One very important detail most people forget in conversations such as these is their lack of a real need for what they discuss. There's no need to bemoan something as being inefficient when dealing with large quantities of data when you will, in fact, probably never manipulate such large quantities of data. It's getting entirely ahead of oneself.

I wonder how this conversation will move from here.

  No.17336

I don't get it. You want nothing to be in binary representation? Why?

Or are you complaining about block devices? Because those can't really be made into RAM unless you abstract them with RAM caches, which are already a thing.

Or do you want it to be possible for multiple programs to edit one file at the same time? Generally, that's already possible.

  No.19736

>>14642
>But we can move back towards letting one program "do one thing well" by letting more then one program access a datastructue at once.
>Something that block files don't allow.

Well, more accurately, something that the hardware doesn't allow.

The filesystems you're talking about reflect how the underlying hardware works
(i.e. by organizing clusters of bytes into sectors).

Sure, you could have a more sophisticated abstraction (e.g. what you've described), but
so long as people can access their files, there isn't much motivation (especially considering
what you're talking about would mean scrapping quite literally hundreds of thousands of lines
of functioning code).

Of course, if you were to come up with some compelling, killer feature(s) that couldn't possibly be
implemented with current filesystems, that would be a slightly different story (although this would
almost certainly be implemented as a new layer on top of the current stack, kind of like how
the X Window System tacked graphics onto an operating system written for line printers).

Also, this:
>>14963
>But this has a whole bunch of problems. JSON files don't have space between the bits, so you have to recreate the file every time you make a change. You can't just append to a list without recreating the entire file.

Is not nearly as bad as you make it sound. Kernels in general are very, very, very smart about handling I/O, and, of course, nobody is
going to try to commit every change to disk as soon as it happens; Ancient Programmer Wisdom tells us that storage (including SSDs)
is slow, and memory is fast.

  No.19740

>>14963
This isn't an issue with the idea of structured file formats, it's an issue with JSON itself.

What if there was a special JSON mode where the file itself implied an array surrounding all elements? Then you'd be able to read line-by-line, parsing each line as a single JSON object. This way, we have a JSON object stream rather than a simpler character or octet stream. We can read objects off of it as well as append objects at the end. The only difference is we're appending whole lines of data, like a "line device" as opposed to a "character device" or "block device".
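That's essentially newline-delimited JSON; a minimal sketch with Python's stdlib:

```python
import json
import os
import tempfile

def append_record(path, obj):
    # appending one record is a plain O(1) write; no rewriting the file
    with open(path, 'a') as f:
        f.write(json.dumps(obj) + '\n')

def read_records(path):
    # streams one object at a time; never loads the whole file
    with open(path) as f:
        for line in f:
            yield json.loads(line)

fd, log = tempfile.mkstemp()
os.close(fd)
append_record(log, {'op': 'post', 'id': 14642})
append_record(log, {'op': 'reply', 'id': 19740})
records = list(read_records(log))
os.unlink(log)
```

You lose nested top-level structure, but gain appendability and streaming.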

  No.19741

>>17328

It cannot be asserted from your position that no one here, or only a few in a population this large, manipulates large datasets.

At least in my own case, I am tasked with optimizing large data sinks/sources for my job, from data analysis to load-balancing methodology. Even from the standpoint of more consumer-oriented projects there's plenty of need to push the envelope.

But instead of citing the numerous places where a layman or near-layman would have use for such optimization, I'd rather point out that assuming one has no need for, or is incapable of understanding, these particulars (through infrequent or non-existent use cases in the OP's position), when the thread was already an explicit attempt to understand them, is not arguing wholly in good faith.

Good day sir.

  No.19752

>>19740
Great minds invented this long before I thought about it:

https://en.wikipedia.org/wiki/JSON_Streaming

  No.19755

>>14642
If you have any development experience with Windows, PowerShell lets you treat the filesystem as a series of objects. It's actually quite nice and it seems to be exactly what you're describing.

A text file is an object, which contains various metadata about that object as well as its contents. Likewise with a directory. PowerShell is also dynamically typed. Since files and directories are objects, there's no need to manually parse with sed/awk. For example, if you want to get a list of processes running above/under a certain amount of memory, you use get-process to get the objects representing running processes, pipe that through where-object to filter the objects (as in, where this object uses less than so-and-so much memory) by just examining their metadata, then use sort-object to sort the objects by an arbitrary property they have, like memory size, then use select-object to select the first five objects with the highest/lowest memory.
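The same pipeline shape translates to any object-centric language; a rough Python analogue (the process list here is made up, where PowerShell would hand you live objects):

```python
from dataclasses import dataclass

@dataclass
class Proc:                      # stand-in for PowerShell's process objects
    name: str
    memory_mb: int

procs = [Proc('editor', 850), Proc('shell', 12),
         Proc('browser', 2100), Proc('daemon', 40)]

# where-object | sort-object | select-object, expressed in plain Python:
hogs = sorted((p for p in procs if p.memory_mb > 100),   # filter on a property
              key=lambda p: p.memory_mb, reverse=True)[:2]  # sort, take top 2

# hogs still carries structured objects, not text that needs re-parsing
```

Every stage reads and writes objects with named fields, which is exactly the point being made about parsing becoming a formal property of the system.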

The really nice part of "everything in the filesystem is an object" vs. "everything in the filesystem is a stream of bytes" paradigm is that it turns parsing into a formal property of the system, instead of something you have to keep in your head and do yourself.

  No.19756

>>19755
The other nice bit is that it's easier to go from objects to a bytestream or plaintext than vice versa, PowerShell lets you easily convert any object into JSON or text (like with toString).

  No.19758

>>19740
>>19752
I think i3bar uses this

  No.19972

I felt weird reading some lainons' comments; as another lainon pointed out, they seem to misunderstand the filesystem's responsibility.

a filesystem is nothing special: it provides vnodes (which represent files) and associated vnode operations to manipulate them.

to compensate for the innate slowness of secondary storage, operating systems maintain buffers for vnodes once they are read.

not sure how other operating systems implement this, but at least openbsd has large buffer caches with a limited amount of kernel memory mappings. buffers can be read like any other memory-mapped area and are flipped to a DMA-accessible memory region when needed for writing.

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/kern/vfs_bio.c

http://undeadly.org/cgi?action=article&sid=20140908113732

On the other hand, if you do not like the vnode api and want to handle some object providing the methods you want, you should not rely on the unix filesystem. Your application should monopolize (hard-allocate) segments of virtual memory and utilize them as you wish. This is why unix provides shared memory and semaphores.

If any lainons are interested in this subject you might want to check out how postgresql is implemented, especially:

https://github.com/postgres/postgres/tree/master/src/backend/storage/ipc

https://github.com/postgres/postgres/tree/master/src/backend/storage/large_object

https://github.com/postgres/postgres/tree/master/src/backend/storage/page

https://github.com/postgres/postgres/tree/master/src/backend/storage/buffer

  No.19983

>>19972
So one can mmap some memory and have it automatically transferred to disk by the kernel? Given some kind of transactional memory mechanism, it sounds like one could implement orthogonal/transparent persistence quite easily.

If there's no file system, how would one identify files and open them later? Does the kernel do this somehow?

  No.19991

>>19983

your question is somewhat vague but I'll try my best to interpret it.

http://pubs.opengroup.org/onlinepubs/009695399/functions/mmap.html

http://man.openbsd.org/OpenBSD-current/man2/mmap.2

>mmap some memory and have it automatically transferred to disk


if a process mmaps a "file" from a file descriptor into its virtual memory space with the MAP_SHARED flag, changes in the mapped memory will be reflected in the backing file in the filesystem.

if you did not specify a file descriptor, you just created an anonymous object which won't be backed by any file nor have any side effect on the filesystem.
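on a unix system you can watch that happen with Python's mmap wrapper (a throwaway temp file stands in for the backing file):

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b'hello world')

# map the whole file MAP_SHARED and mutate it like ordinary memory
with mmap.mmap(fd, 0, flags=mmap.MAP_SHARED) as m:
    m[0:5] = b'HELLO'       # write through the mapping
    m.flush()               # ask the kernel to sync the backing file

# the change is visible through the ordinary file api
with open(path, 'rb') as f:
    data = f.read()

os.close(fd)
os.unlink(path)
```

with MAP_PRIVATE (or no fd at all) the write would stay invisible to the file.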

>transactional memory mechanism


not sure what language runtime or threading library you are talking about. mmap does not specify any sort of inherent locking mechanism. it's up to the process that mapped the virtual memory to manage proper locking between threads.

> if there's no file system | does the kernel do this somehow


I have absolutely no idea what you are trying to imply here. "file" is implied by "filesystem"; it's not meant to be an intrinsic self-hosting entity.

  No.19992

>>19983
>If there's no file system, how would one identify files and open them later? Does the kernel do this somehow?
If there is no filesystem there are no files.

  No.20004

>>19992
s/files/objects/
it's a valid question.

  No.20005

>>20004

I'd appreciate a bit of clarification of >>19983 then

  No.20010

>>19991

Can I have a file descriptor that points to the whole disk as a block device? Can I somehow open a 4 TB disk, get an fd and mmap that to get a 4 TB address space of virtual memory that automatically gets swapped between volatile and non-volatile memory?

I should be able to use that space for memory allocation too, since several malloc implementations use the mmap syscall rather than [s]brk. The kernel gets to decide when to actually write that stuff out to disk.

>not sure what language runtime or threading library you are talking about.


I'm talking about a memory model where changing data creates a new copy-on-write reference to it, exclusive to the mutator code at first. The reference gradually becomes available to the rest of the system as the program determines it is safe for them to access the new version of the data.

Kind of like how git works.

>>20005
The point is to persist the language's data structures directly, not "files" which are lists of potentially non-contiguous blocks of bytes. Files require you to serialize data structures.

This notion isn't so strange by the way. I know of several database systems which can operate in a mode where they take control of the full disk directly and just write whatever they want to write however they want.

  No.20014

>>20010
>file descriptor that points to the whole disk

no you can't.

why do you think we have filesystems in the first place?

>gets swapped btw vol, non vol memory


no, that's the kernel's job that happens as a side effect of caching vnodes, and it should not be visible to the process. all a process can do is give hints with flags.

> creates new cow ref, provides access control


> database systems


as I mentioned in >>19972 word you are looking for is not file.

that being said, the whole purpose of unix having files is to provide an absolutely basic interface, a good-enough abstraction that "works". and the problem begins when programmers start prematurely optimizing, forgetting what the point was from the beginning.

That's why the postgres devs did not start their own os and just developed a backend for dealing with storage.

  No.20015

>>20014
>no you can't.

doesn't unix have /dev/sdN block device files? those are used to access disks before you even install an FS on them; that's how you run the programs that create the FS, actually.

  No.20024

>>20015
>block device file

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sbin/newfs/mkfs.c

not sure which implementation you are talking about, but most of them don't mmap the entire disk while populating a hard drive with a filesystem, and for good reasons.

I was quoting the >automatically gets swapped part in >>20014, not the file descriptor part. that's my bad for not clarifying.

  No.20025

>>20010
> Files require you to serialize data structures.

Storing things on the disk requires serialization, not just "files."

Some things can be represented on-disk exactly the same way they are in memory (e.g. an integer, as long as you don't plan to share that integer with any other system).

Databases use serialization too, the difference being a database's serialization format isn't necessarily human-readable.

> Can I have a file descriptor that points to the whole disk as a block device?

Yes, but unless you choose a solid serialization format, you will have a hell of a difficult time trying to reload it.

In particular, any pointers you dumped to the disk would be invalid; you'd have to come up with some kind of scheme to either:

a.) Guarantee that your runtime objects keep their addresses indefinitely (particularly across reboots). You could, for example, dump the entire system memory to a swap file, then reload this image when the computer starts (this is roughly how hibernation works).

or

b.) Serialize everything so that it could be reloaded without worrying about addresses
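Option (b) is what pickle-style serializers do: in-memory pointers become structural references that survive being reloaded at new addresses (a minimal sketch):

```python
import pickle

# a tiny pointer-ful structure: head references node by address in memory
node = {'value': 1, 'next': None}
head = {'value': 0, 'next': node}

blob = pickle.dumps(head)      # addresses flattened into structural references
clone = pickle.loads(blob)     # rebuilt at whatever addresses are free now
```

The clone is structurally identical but lives at entirely different addresses, which is exactly why dumping raw pointers to disk doesn't work.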

  No.20027

File: 1478473882662.png (247.49 KB, 200x200, death_of_block_devices.pdf)

>>20015
>>20024
>>20025

also, it's called a "block device" for a reason. if one wishes to write said program by directly talking to the disk controller's interface:

(pages: 512 to 4096 bytes per page
blocks: 64 to 256 pages per block)

1. reads and writes are performed at the granularity of a page

2. a block must be erased before any of the pages it contains can be overwritten

3. writes must be sequential within a block.

which is not an issue while creating a filesystem or using a created one, but a severe limitation to
>persist the language's data structures.

again, check out how real db implementations deal with them (for block devices); they are not exactly human-friendly structures.

BUT

if you can throw this "hardware"-side limitation out of the picture, you get the locality of RAM; there's no functional difference between ram and secondary storage in terms of access.

in theory, there's no reason for an ssd to act like a block device, and there's lots of talk about what the next abstraction for them should be, or whether we need abstractions at all.

the attached paper is something we'll all have to consider at some point.

  No.20047

>>14969

I've also had this idea. It's particularly attractive now that NVDIMM is a thing, since your disk cache can be non-volatile.

>>14970

It'd be simpler to just have groups of functions (i.e. programs), and a jump table for each program that you update whenever you alter a function. The hard part isn't fixing up imports, it's figuring out how to update your data structures on the fly. If you rewrite your program to use a linked list instead of a growable array, how do you cleanly manage the two data structure versions? What if you add a 'creation time' field, or something similar that can't be back-calculated from existing data? You can handle it, just not transparently.

On a semi-related note, since 64-bit memory is so damn big I've been thinking about a microkernel design where all software/hardware is mapped to a single address space, with the MMU being used to provide security. Message passing is then just address passing, and RPC is just a matter of jumping to the right address. The NX bit of the MMU can be used to map privileges to regions of memory, so that program-local memory can only be read by the appropriate code.