Re: [ts-7000] ts-7250 network failing

  • Walter Marvin
    Message 1 of 8 , Mar 31, 2012
      Try rebooting; sockets can go stale after some time.


      From: Jason Stahls <jason@...>
      To: ts-7000@yahoogroups.com
      Sent: Saturday, March 31, 2012 2:10 PM
      Subject: Re: [ts-7000] ts-7250 network failing

       
      On 3/31/2012 4:31 AM, j.chitte wrote:
      > Hi,
      >
      > I have a 7250 with 2.6.32.11 kernel and basic system build from scratch. A minimal linux-from-scratch.
      >
      > It has been working reliably for about 18 months but recently, about every 10 days, the network (wired ethernet) fails to respond.
      >
      > I login via serial link and enter /etc/init.d/network restart and all comes back to order.
      >
      > The rest of the system seems to be running fine through all this and I don't see any other dysfunction.
      >
      > Is this the first signs of flash burn out or other hardware problems?
      >
      > Can anyone suggest how to pin it down?

      I've never had the issue with TS boards but I have with older e100
      drivers (Intel Pro/100) failing out after a while. I ended up setting
      up a cron job that rmmod/insmod'd the module every couple hours, not
      exactly ideal but it worked.

      Before thinking hardware failure tho I'd try a different kernel, it's
      probably the driver crapping out. Check dmesg, see if the module tossed
      any errors. Also, are you running anything CPU intensive that might
      cause an ISR to not be serviced and cause a buffer overflow in the MAC?
      --
      Jason Stahls


    • j.chitte
      Message 2 of 8 , Apr 1 7:57 AM
        Thanks for the replies, all.

        What seems odd is that this installation has been working perfectly for 18 months. Same kernel, no updates. Exactly the same setup.

        That is what made me think it was hardware.

        The network exposure is minimal. It is connected to one desktop PC on a different subnet to the wired PC-WAN link.

        Setting up a cron job is a good workaround, but I'd like to know at least where the problem lies before sweeping it under the carpet.
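A possible middle ground, sketched here as an assumption rather than anything tested on a TS-7250: a watchdog that records interface state before it restarts the network, so each failure leaves some evidence behind. `PEER`, `LOG`, and all function names are hypothetical; `/etc/init.d/network restart` is the command from the original post.

```shell
#!/bin/sh
# Hypothetical link watchdog: log diagnostics first, then apply the workaround.
IFACE="${IFACE:-eth0}"
PEER="${PEER:-192.168.1.1}"            # assumption: address of the desktop PC on the link
LOG="${LOG:-/tmp/eth-watchdog.log}"    # assumption: a writable path that is not on flash

# Succeeds if the peer answers at least one of three pings.
check_link() {
    ping -c 3 "$PEER" > /dev/null 2>&1
}

# Append the current interface and interrupt state to the log.
snapshot() {
    {
        echo "=== $(date): no reply from $PEER via $IFACE ==="
        ifconfig "$IFACE"
        grep "$IFACE" /proc/net/dev
        cat /proc/interrupts
        dmesg | tail -n 20
    } >> "$LOG" 2>&1
}

# The workaround from the original post.
restart_network() {
    /etc/init.d/network restart
}

# One pass: returns 0 if the link is fine, 1 if a restart was needed.
watchdog_once() {
    if check_link; then
        return 0
    fi
    snapshot
    restart_network
    return 1
}
```

With a final `watchdog_once` line appended, the script could be run from cron every few minutes. That keeps the board reachable while the log collects the interrupt counts and interface counters seen at each failure.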


        Nothing notable in dmesg:


        yaffs: dev is 32505857 name is "mtdblock1"
        yaffs: passed flags ""
        yaffs: Attempting MTD mount on 31.1, "mtdblock1"
        yaffs: auto selecting yaffs2
        yaffs: restored from checkpoint
        hub 1-0:1.0: state 7 ports 3 chg 0000 evt 0000
        yaffs_read_super: isCheckpointed 1
        VFS: Mounted root (yaffs filesystem) on device 31:1.
        Freeing init memory: 104K
        cfg80211: Calling CRDA to update world regulatory domain
        usbcore: registered new interface driver rt73usb
        PHY: 0:01 - Link is Up - 100/Full
        PHY: 0:01 - Link is Down
        PHY: 0:01 - Link is Up - 100/Full
        PHY: 0:01 - Link is Down
        PHY: 0:01 - Link is Up - 100/Full
        PHY: 0:01 - Link is Down
        PHY: 0:01 - Link is Up - 100/Full
        PHY: 0:01 - Link is Down
        PHY: 0:01 - Link is Up - 100/Full
        PHY: 0:01 - Link is Down
        PHY: 0:01 - Link is Up - 100/Full
        PHY: 0:01 - Link is Down
        PHY: 0:01 - Link is Up - 100/Full
        PHY: 0:01 - Link is Down
        PHY: 0:01 - Link is Up - 100/Full
        PHY: 0:01 - Link is Down
        PHY: 0:01 - Link is Up - 100/Full
        PHY: 0:01 - Link is Down
        PHY: 0:01 - Link is Up - 100/Full
        # uptime
        14:59:34 up 82 days, 5:19, load average: 1.13, 1.14, 1.10


        Link is Up/Down would be the handful of times I've had to restart the network since the reboot 82 days ago. The rest looks clean.
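To put a number on those restarts, the link transitions in the kernel log can be counted. A minimal sketch, assuming the PHY messages keep the format shown above; `count_link_events` is a hypothetical helper name.

```shell
# Count PHY link transitions in kernel log text read from stdin.
count_link_events() {
    awk '/PHY: .* - Link is Up/   { up++ }
         /PHY: .* - Link is Down/ { down++ }
         END { printf "up=%d down=%d\n", up, down }'
}
```

Usage would be `dmesg | count_link_events`. Fed the log excerpt above, it should report up=10 down=9: one link-up at boot plus nine restarts in 82 days, which is in line with a failure roughly every 9 to 10 days.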

        I do have one "heavy" job calling gnuplot to produce an SVG. This can take about 20-30s near the end of the day with a lot of data.

        Again, it's been doing that for 18 months as well. No problem.

        IIRC the reboot was my first attempt at fixing broken link. I did not realise it was just the network down at first.

        # uname -a
        Linux arm26 2.6.32.11-m4 #19 PREEMPT Mon Jun 7 00:12:11 CEST 2010 armv4tl GNU/Linux

        The software has not been touched since the full rebuild on 7th June 2010.
      • al
        Message 3 of 8 , Apr 2 6:49 AM
          You should contact TS support to see if the Errata section on the 75xx/45xx series applies to the 7250. I see no errata listed for the 7250, but some of the HW is shared on the boards. There is a known HW problem in the 75xx series boards that causes Ethernet connectivity to be dropped for up to 30 seconds. I have no idea if this applies to the 7250 board, but you may want to contact TS support to be certain. Here is the URL to the errata:

          http://www.embeddedarm.com/wiki/index.php/TS-7500#Errata

          I hope this helps.


        • Jason Stahls
          Message 4 of 8 , Apr 2 7:34 AM
            On 4/2/2012 9:49 AM, al wrote:
            > You should contact TS support to see if the Errata section on the 75xx/45xx series applies to the 7250. I see no errata listed for the 7250, but some of the HW is shared on the boards. There is a known HW problem in the 75xx series boards that causes Ethernet connectivity to be dropped for up to 30 seconds. I have no idea if this applies to the 7250 board, but you may want to contact TS support to be certain. Here is the URL to the errata:
            >
            > http://www.embeddedarm.com/wiki/index.php/TS-7500#Errata

            Just as an FYI (thanks for the info tho :) ) the 72xx and 75xx/45xx
            hardware is completely different, and a different design philosophy. TS
            support is definitely worth calling to see if they've had the issue tho.

            --
            Jason Stahls
          • j.chitte
            Message 5 of 8 , Apr 18 3:39 AM
              Not much joy from TS on this. Not a known problem apparently. They just suggested I do an RMA on my ARM.


              I have found that ping localhost works on the board but ping eth0 fails. ifconfig eth0 down; ifconfig eth0 up fixes it.

              Also, pinging the ARM from the LAN blinks the LED of the 7250 Ethernet socket, though I think that's electrical and does not even need the Linux driver to be loaded.

              Any ideas how I can dig deeper on this?

              Thx
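One way to dig deeper, offered as a sketch under assumptions rather than a known fix: compare the interface counters in `/proc/net/dev` while the link is hung. If the RX packet count still rises during incoming pings, frames are reaching the driver and the fault is more likely on the transmit side; if it stays flat while the LED blinks, receive is stuck. `iface_counters` is a hypothetical helper, and the field positions assume the standard `/proc/net/dev` column layout.

```shell
# Print selected RX/TX counters for one interface.
#   $1 = interface name, $2 = stats file (defaults to /proc/net/dev)
iface_counters() {
    # Replace the first colon with a space so "eth0:12345" splits cleanly.
    sed 's/:/ /' "${2:-/proc/net/dev}" |
        awk -v ifc="$1" '$1 == ifc {
            print "rx_packets=" $3, "rx_errs=" $4, "rx_drop=" $5, \
                  "rx_fifo=" $6, "tx_packets=" $11, "tx_errs=" $12
        }'
}
```

Running `iface_counters eth0` twice, a few seconds apart, while pinging the board from the LAN shows which counters move. A growing `rx_fifo` count would fit the MAC buffer overflow theory raised earlier in the thread.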