OneSwarm Forum » Bugs

Some Issues I've Identified

(9 posts)
  1. Tristor
    Member

    I started using OneSwarm shortly after its release yesterday and have been evangelizing it and trying to build a sizable mesh. Along the way, I've noticed a few issues that you should be aware of. Some of them may qualify as bugs; others may be known issues without ready solutions.

    1. Hashing performance

    The shares I added to my watch list on installation total 1959 GB. While I didn't expect them to hash quickly, hashing still had not completed after 24 hours on a system with a quad-core 3.6 GHz processor and 8 GB of RAM. It appears to me that hashing is CPU-bound and could be improved significantly by multithreading the hashing operation so that multi-core CPUs are properly utilized.

    2. Hashing stalls

    As a corollary to the first issue, hashing randomly stops after some time. Exiting and restarting OneSwarm makes hashing resume without problems, but the stall recurs several times. I believe this may be related to the hashing performance issues, and perhaps to the performance of the internal data structures that store information about the swarms.

    3. Remote Interface

    Requests to the remote interface don't take precedence over outgoing data transfers within the application. This is a flaw: if you are not throttling outgoing bandwidth (and you shouldn't have to), you can't reach the application remotely to change bandwidth settings or anything else. The request simply times out, even when the host has reasonable latency (I could ping it from an outside line at an average of 30 ms but could not connect to the remote interface). For now, my workaround has been to implement QoS on my internal network that gives precedence to HTTPS traffic destined for the IP of the box running OneSwarm, which seems to have alleviated the issue.

    4. Scalability

    As I add more nodes/friends to my mesh and more data to my shares, performance problems and other issues become more frequent and more likely to be catastrophic, forcing me to restart OneSwarm before everything works properly again. I've received tracker failure errors once I had more than 3000 swarms loaded, in addition to the severe hashing performance problems mentioned above. As more nodes are added, it also can't seem to keep track of connection states, sometimes showing LAN nodes as disconnected even though there is plenty of bandwidth between them (2GbE on both nodes). Performing a Force Connect on LAN nodes that wrongly show as disconnected does fix this, but it is a nuisance.

    5. Friend adding

    I've seen multiple instances in which friend adding via either Gtalk or LAN doesn't work properly, or in which, even after adding a friend (sometimes more than once), OneSwarm doesn't register that the person accepted and keeps them grayed out. In all instances, however, it can be worked around by doing a manual key exchange and adding the friend that way. These problems seem to occur more frequently the more friends you have added. In addition, if a single user has multiple Internet-accessible nodes running OneSwarm on their network, the Gtalk bot service appears to get badly confused if you connect to your Gtalk account on more than one of them. The workaround for now is to use a separate Gtalk account for each internal node. Luckily this doesn't affect LAN adding, and manual key exchange of course still works fine.
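    Item 3 asks for remote-interface requests to take precedence over outgoing data inside the application itself, rather than relying on network-level QoS. A minimal sketch of that idea, assuming a single queue of outgoing work: control requests get a lower priority number and are always dequeued before bulk swarm data, with a sequence counter keeping FIFO order within each class. `SendScheduler`, `CONTROL`, and `BULK` are hypothetical names, not part of OneSwarm.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical traffic classes: a lower number is served first.
CONTROL = 0  # remote-interface (web UI) requests and responses
BULK = 1     # outgoing swarm data


@dataclass(order=True)
class _Job:
    priority: int
    seq: int                                # tie-breaker: FIFO within a class
    payload: object = field(compare=False)  # never compared, only carried


class SendScheduler:
    """Hypothetical outgoing-work queue that always serves CONTROL before BULK."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def submit(self, priority, payload):
        heapq.heappush(self._heap, _Job(priority, next(self._seq), payload))

    def next_job(self):
        """Return the next payload to send, or None when nothing is queued."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).payload
```

    Submitting two bulk pieces and then a settings request yields the settings request first. Strict priority is reasonable here only because control traffic is small and bursty; if control traffic could be heavy, a weighted scheme would be needed so bulk transfers are not starved.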

    All in all, I'm quite optimistic about the future of OneSwarm and find it very interesting. It does seem to perform well under smaller loads, and I understand that the amount of data I'm sharing from these nodes and the type of setup I'm using are probably not what was envisioned as the norm during development. Hopefully, as the project matures, some of the above items will be addressed, especially through better multithreading, which I believe may solve many of the scalability issues.

    Posted 1 year ago
  2. piatek
    Administrator

    Thanks for the suggestions. You're right that we're just starting to test things at the scale you describe. We're working on most of the improvements you suggest, but with respect to multicore hashing, it's a bit of a tricky problem. Hashing often creates a bottleneck at the disk (especially on laptop disks), and doing multiple concurrent hashes can make a machine basically unusable. We'll probably add a configuration option for concurrent hashing in the future, but that's the rationale for why it's not there right now.
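    The trade-off described above (CPU parallelism versus disk contention) can be expressed as a concurrency setting that defaults to one hash at a time. The sketch below is hypothetical: `hash_files` and `max_concurrent_hashes` are illustrative names rather than an existing OneSwarm option, and it hashes whole files with SHA-1 as a simplification, whereas BitTorrent-style content hashing works on fixed-size pieces.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time to keep memory use bounded


def sha1_file(path):
    """Hash one file sequentially, chunk by chunk."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_files(paths, max_concurrent_hashes=1):
    """Hash files using a user-configurable number of concurrent workers.

    The default of 1 hashes one file at a time, which is gentle on a single
    laptop disk. CPython's hashlib typically releases the GIL while hashing
    large buffers, so with more workers, threads can overlap disk reads
    and hashing.
    """
    if max_concurrent_hashes < 1:
        raise ValueError("max_concurrent_hashes must be >= 1")
    paths = list(paths)  # materialize so it can be iterated twice
    with ThreadPoolExecutor(max_workers=max_concurrent_hashes) as pool:
        digests = list(pool.map(sha1_file, paths))
    return dict(zip(paths, digests))
```

    A RAID-backed node might raise `max_concurrent_hashes` toward its core count, while a laptop would keep the default of 1; the results are identical either way, only the scheduling changes.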

    Posted 1 year ago
  3. Tristor
    Member

    I appreciate your reply. Your reasoning is sound. I think making that configurable as an advanced option is the way to go: each of my nodes has a RAID storage subsystem and so shouldn't see disk I/O bottlenecks, but that could certainly be an issue on an average desktop computer.

    I have noticed that now that I have over 5000 swarms and all 2 TB of data has finished hashing on this node, I even run into problems with page rendering and with the application managing that many torrents. I do believe that multithreading even the core functions of the application is essential for good scalability going forward, especially considering how prevalent dual-core processors have become, and quad cores soon will be.

    Posted 1 year ago
  4. piatek
    Administrator

    Yes, I think that's right. I have to say, 5000 swarms with 2 TB of data is way more than we've tried to share in our testing so far 8).

    Posted 1 year ago
  5. Tristor
    Member

    Well, I've succeeded in making OneSwarm totally shit bricks. I've got 2.23 TB of data with over 12000 swarms now on this node, and multiple aspects of the application on this node no longer work properly. Manual-key friend adds, for instance, are now impossible: my own public key is shown as blank and cannot be modified, and the add friend button stays grayed out until both boxes contain a public key. It's also no longer possible to use preview mode, even when limiting the number of items shown per page, as it grinds to a halt and can't generate the page, etc.

    Based on my experience so far, because of the scalability issues (especially around threading, and maybe the internal data structures as the number of swarms grows), OneSwarm seems to work best under load when limited to about 2500 to 3000 swarms per node, with those swarms containing less than 500 GB of data in total. With the current amount of data and swarms on my network, I would really need about 12 nodes for good performance, even though those nodes wouldn't need hardware nearly as high-end as my current 3 nodes have, since that hardware isn't being fully utilized.
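    The rule of thumb above amounts to a simple capacity estimate: pick a node count large enough that neither the swarm limit nor the data limit is exceeded on any node. A minimal sketch, using the post's observed figures as assumed defaults (`nodes_needed` is a hypothetical helper, and the limits are one user's observations, not documented OneSwarm limits):

```python
import math


def nodes_needed(total_swarms, total_gb, max_swarms_per_node=2500, max_gb_per_node=500):
    """Estimate how many nodes keep every node at or under both limits.

    The swarm-count limit and the data-size limit are checked separately,
    and whichever demands more nodes wins. At least one node is returned.
    """
    by_swarms = math.ceil(total_swarms / max_swarms_per_node)
    by_data = math.ceil(total_gb / max_gb_per_node)
    return max(by_swarms, by_data, 1)
```

    For this node alone (12000 swarms, taking 2.23 TB as 2230 GB), both limits give 5 nodes; the 12-node figure in the post covers the whole network, whose totals are not stated.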

    Hopefully my experiences will help further development; based on what I've read on the forums, I seem to be one of the few people who have really pushed the limits.

    Posted 1 year ago
  6. piatek
    Administrator

    You might simply be running out of memory. (Do you see out-of-memory exceptions in the error.log file?) If so, let me know what platform you're on; the memory limit can be increased without too much trouble.

    Posted 1 year ago
  7. Tristor
    Member

    Hmm, I don't see any out-of-memory exceptions in error.log. I've yet to see memory usage exceed 800 MB, and this box runs 64-bit Windows 7 with 8 GB of RAM. While I realize OneSwarm is a 32-bit app, it has 4 GB to play with and shouldn't be running out of memory. The other node, with 4 GB on 32-bit XP Pro, has similar performance issues, and its memory usage has never exceeded 600 MB.

    Posted 1 year ago
  8. Tristor
    Member

    Wow, I can see why it's an issue by looking in the Classic UI. The problem seems to be that although the f2f swarms are trackerless, as torrents they specify their tracker as tracker.invalid/announce. For every single swarm, the application still tries to connect to that announce URL at startup and then at semi-regular intervals. Each attempt triggers a name resolution request and then opens a socket. It's basically opening at least one socket per swarm at startup and again at regular intervals, which adds up to a very large number as the number of swarms grows.

    The sheer overhead of handling that many open sockets/sessions, both on the node and at my firewall, is causing huge lag internally. It is also, of course, flooding the error log with failed-connection errors.

    Is there any way to implement trackerless torrents for f2f swarms so that the client never queries the announce URL at all (or doesn't even require one)? Or is there some value you could put in that field that the scrape code could treat as a flag not to create a socket? Trying to query a tracker that doesn't exist is a huge waste of system resources.
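    The behavior requested here can be sketched as a guard checked before any DNS lookup or socket is opened: if the announce URL's host falls under `.invalid` (a top-level domain reserved by RFC 2606 precisely so that it never resolves), the announce and scrape are skipped. `should_contact_tracker` and `swarms_to_announce` are hypothetical names; this is not how OneSwarm or its underlying BitTorrent client actually handles announcing.

```python
from urllib.parse import urlsplit


def should_contact_tracker(announce_url):
    """Return False for announce URLs that can never reach a real tracker."""
    if not announce_url:
        return False  # no tracker at all: a purely trackerless swarm
    # Accept both "http://tracker.invalid/announce" and a bare
    # "tracker.invalid/announce"; without "//", urlsplit finds no host.
    if "//" not in announce_url:
        announce_url = "//" + announce_url
    host = (urlsplit(announce_url).hostname or "").lower()
    if not host:
        return False  # nothing to resolve
    # ".invalid" is a reserved TLD, so any lookup is guaranteed to fail.
    return not (host == "invalid" or host.endswith(".invalid"))


def swarms_to_announce(announce_urls):
    """Filter a {swarm_id: announce_url} map down to swarms worth announcing."""
    return [sid for sid, url in announce_urls.items() if should_contact_tracker(url)]
```

    Because the check is a pure string test, it costs almost nothing per swarm compared with the failed name lookup and socket described in the post, so it scales to tens of thousands of swarms.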

    Posted 1 year ago
  9. piatek
    Administrator

    You're right, these are unnecessary connections, and we'll skip them in a future version.

    Posted 1 year ago
