radmon.org down 19/04/20

2 years 3 months ago #5637 by mw0uzo
Hi all
Sorry, radmon.org was down for maybe a day today. I am away dealing with some family stuff so don't have physical access to the server and could not connect to it over the internet.
Luckily, my wife managed to help me with it so I didn't have to drive all the way back. I am amazed at the things we had to do to get it all working again:
- Power cycle radmon server

break

- Power cycle network router
- Power cycle radmon server.

break

- Check all cables plugged into ethernet switch. Lights, unplug, check, plug back in.
- Steal a USB keyboard and mouse from another PC.
- Grab a 32" TV off the desk and move it over by the Pi. Find the HDMI cable and power and plug in. No video.
- Find the actual HDMI lead tucked away down the side.
- Get video, discover ethernet disconnected on network monitor. Cables all plugged in though.
- Muck about with wicd network manager. Can't get wired connection back.
- Get a nano wifi out of pc on desk and put it in Pi USB hub
- Muck about with wicd to try to get that working. It didn't.
- Open up a terminal and type some commands
- Reboot again.
- This time network connected, but no download.
- Muck about with wicd more
- Reboot again via terminal
- Network connection established! Yay can finally log in to control server.

.. but it's still not working

break

- Figure out no connection to USB hard disk.
- Power cycle USB drive
- Reboot server.

it works!

Crazy isn't it.
The following user(s) said Thank You: FSM19, MrBRad

2 years 3 months ago #5638 by Bert490
Replied by Bert490 on topic radmon.org down 19/04/20
Many thanks to you and your wife for that tortuous troubleshooting journey!

I dream of a day when our personal AI sensory enhancement modules will be able to point out that single LED indicator or whirring sound that is OFF among a roomful of other indicators.

2 years 3 months ago - 2 years 3 months ago #5640 by mw0uzo
Replied by mw0uzo on topic radmon.org down 19/04/20
I've pretty much worked out why it failed in such an annoying way. Well, I think I have anyway.

There is a memory leak in mate-panel, part of the MATE desktop I use on the Pi. It was fixed ages ago, but the fixed version has not made it into Raspbian yet.
mate-panel slowly leaks over the course of a few weeks, consuming a few GB of memory.
The backup scripts kick in over a few days each week. These push the server over the edge overnight while I'm asleep. Pi goes unresponsive as all memory is exhausted and everything grinds to a halt.
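
A minimal sketch of how this kind of slow memory exhaustion could be spotted before the Pi locks up, assuming a Linux kernel recent enough to report MemAvailable in /proc/meminfo. The function names and the 10% threshold are illustrative assumptions, not part of the actual radmon setup.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_is_low(meminfo_text, min_available_fraction=0.10):
    """Return True when MemAvailable is below the given fraction of MemTotal.

    The 10% default is an arbitrary illustrative threshold.
    """
    info = parse_meminfo(meminfo_text)
    return info["MemAvailable"] < info["MemTotal"] * min_available_fraction


# Typical use on the server itself (hypothetical):
#   with open("/proc/meminfo") as f:
#       if memory_is_low(f.read()):
#           ...send an alert, or skip tonight's backup run...
```

Run from cron shortly before the backup scripts start, a check like this could postpone the backup or raise an alert instead of letting the backup push the server over the edge.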

USB drive goes, hey, no activity. Goes into deep sleep, power off mode. I've seen this happen.

Pi is rebooted. USB drive stays asleep as bug in its firmware doesn't notice the USB interface come back up.

MariaDB doesn't start because the database is on the USB drive

radmon.org displays Error because no db.
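
The last links in that chain (drive asleep, so no database) can be told apart from other failures by checking whether the drive is actually mounted. A rough sketch, assuming the drive's mount point is known; `/mnt/radmon-db` below is a made-up path, not the real one.

```python
def is_mounted(mounts_text, mount_point):
    """Return True if mount_point is a mount target in /proc/mounts-style text.

    Each line of /proc/mounts has the form:
        device mount_point fstype options dump pass
    Note: the kernel escapes spaces in paths as \\040, so this simple
    comparison assumes a mount point without spaces.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


# Typical use (hypothetical mount point):
#   with open("/proc/mounts") as f:
#       if not is_mounted(f.read(), "/mnt/radmon-db"):
#           print("USB drive not mounted, MariaDB will not find its data")
```

A check like this makes "the USB drive never woke up" show up as its own message rather than as a generic database error on the website.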

Why are you running a desktop on a server, you might ask?

Well, I use its automount feature to mount the USB drive rather than specifying it in fstab, because of the risk that the server won't boot far enough to be reachable over the network if the USB drive is inaccessible.
And also, I use the Pi desktop for web browsing and checking email if I don't want to turn the main PC on.

So what do I do? Got a few things to work out

Sort out fstab properly so USB drive is mounted at boot, but also doesn't fall over if the drive is not present. Then I don't have to run a desktop if it's not in use.
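
One common way to get that behaviour is the `nofail` mount option, which stops a missing device from being treated as a boot failure. The sketch below just builds such an fstab line. It assumes a systemd-based Raspbian (for the `x-systemd.device-timeout=` option); the UUID, mount point, and 10 second timeout are placeholders, not the real values.

```python
def fstab_entry(uuid, mount_point, fstype="ext4", device_timeout_s=10):
    """Build an /etc/fstab line that tolerates a missing drive at boot.

    nofail: boot carries on if the device is absent.
    x-systemd.device-timeout: how long systemd waits for the device
    (assumes a systemd-based system).
    """
    options = [
        "defaults",
        "nofail",
        f"x-systemd.device-timeout={device_timeout_s}s",
    ]
    # Fields: device, mount point, type, options, dump, fsck pass.
    # "0 0" disables dump and the boot-time filesystem check for this entry.
    return f"UUID={uuid}  {mount_point}  {fstype}  {','.join(options)}  0  0"


# Placeholder values for illustration only:
print(fstab_entry("1234-ABCD", "/mnt/usb"))
```

The real UUID can be read with `blkid`. With an entry like this the desktop automounter is no longer needed just to get the drive mounted.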

Pull in a new version of mate desktop. This risks some breakage.

Bodge it with crontab entry to restart mate panel every 12 hours.
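
For the fixed schedule, the cron time field for "every 12 hours" would be `0 */12 * * *`. A slightly gentler variant of the same bodge is to restart the panel only once it has actually grown too big. The sketch below covers just the decision, reading VmRSS from /proc/<pid>/status; the 512 MB limit is an arbitrary assumption, and the restart command is left out because it depends on how the desktop session is started.

```python
def rss_kib(status_text):
    """Extract VmRSS (resident memory, in kB) from /proc/<pid>/status-style text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def should_restart(status_text, limit_kib=512 * 1024):
    """Return True when the process's resident memory exceeds limit_kib.

    The 512 MB default is an illustrative threshold, not a measured one.
    """
    rss = rss_kib(status_text)
    return rss is not None and rss > limit_kib


# Typical use (pid lookup not shown):
#   with open(f"/proc/{pid}/status") as f:
#       if should_restart(f.read()):
#           ...restart mate-panel here...
```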

I am running into a lot of bugs due to no fixes in Raspbian. There are serious issues with Apache falling over, fixed ages ago but not yet in Raspbian. Maybe I need to run something more up to date, or use something that makes Raspbian a bit better. Some research needed here.
Last edit: 2 years 3 months ago by mw0uzo.

2 years 3 months ago #5641 by mw0uzo
Replied by mw0uzo on topic radmon.org down 19/04/20
There is also a big question as to why the networking fell over this time. I don't have an explanation for this yet. But I know it needs to be 100% rock solid.

2 years 3 months ago #5642 by mw0uzo
Replied by mw0uzo on topic radmon.org down 19/04/20
I am thinking of beefing things up here in terms of server capability and reliability.
I need to run off solar, so small efficient computers are the way forward.

Perhaps I could run a few Pis, with full 64-bit, up-to-date distros. There are a few available now, Arch etc.

Run NAS and DB activities from a dedicated Pi attached to the USB disks, with a better USB HD controller that doesn't go to sleep and then fail to wake up. Two disks, with backup between the two.

Run 2 Pis for serving radmon.org, load balanced. Also I can use it to bring forward serious upgrades to radmon.org by bringing one offline. I could also double the detailed graph update rates with 2 Pis.
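
To illustrate the idea only: a toy round-robin picker that alternates between two backends and skips one that has been taken offline for an upgrade. The hostnames are made up, and in practice an existing load balancer or reverse proxy would do this job rather than hand-written code.

```python
from itertools import cycle


class RoundRobin:
    """Hand out backends in turn, skipping any marked offline."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._online = set(self._backends)
        self._order = cycle(self._backends)

    def set_online(self, backend, online):
        """Mark a backend as available (True) or out of rotation (False)."""
        if online:
            self._online.add(backend)
        else:
            self._online.discard(backend)

    def pick(self):
        """Return the next online backend, or None if all are offline."""
        for _ in range(len(self._backends)):
            candidate = next(self._order)
            if candidate in self._online:
                return candidate
        return None


# Hypothetical hostnames for the two web-serving Pis.
lb = RoundRobin(["pi-a", "pi-b"])
lb.set_online("pi-a", False)  # take one Pi out for an upgrade
print(lb.pick())              # all requests now go to the other Pi
```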

This could all be installed into a rack case with a network switch and fan, which can go in the 19in rack unit.

Thoughts, please - I may have to appeal for some donations to do it.

It may also be unnecessary, as I could add another HD and 2 better USB HD controllers and work on this server for reliability, as it seems quite capable so far at running radmon.org, aside from the niggles.

A first step might be to get another Pi, SD card and USB HD controller, make a radmon server based on 64-bit Arch, and swap it in when it's ready. Then convert the old Pi to a dedicated storage and DB unit and join the two up.

2 years 3 months ago #5645 by mw0uzo
Replied by mw0uzo on topic radmon.org down 19/04/20
Down again 23-24/04/20
Will be working on this as soon as I get back
The following user(s) said Thank You: FSM19

Solar powered Raspberry Pi 4 server stats: CPU 52% Memory 16% Swap 17% CPU temp=74.0'C Uptime 65 Days