  1. Dec 21, 2017
  2. Dec 12, 2017
  3. Dec 06, 2017
  4. Dec 01, 2017
  5. Nov 23, 2017
  6. Nov 10, 2017
    • cpu: Support processor-less (memory-only) NUMA nodes (#734) · a8d7d110
      Karsten Weiss authored
      * cpu: Support processor-less (memory-only) NUMA nodes
      
      Processor-less (memory-only) NUMA nodes exist e.g. in systems that use
      Intel Optane drives for RAM expansion using Intel Memory Drive
      Technology (IMDT).
      
      IMDT RAM expansion supports two modes:
      
      * "Unify Remote Memory domains": present a processor-less (memory-only)
        NUMA domain, which is the default
      * "Expand local memory domains": to expand each processor’s memory domain
        with a portion of the memory made available by Optane and IMDT
      
      This commit fixes a crash in the first case (when "cpulist" is empty).
      
      Here's an example of such a system:
      
      $ numastat -m|head -n5
      
      Per-node system memory usage (in MBs):
                                Node 0          Node 1          Node 2           Total
                       --------------- --------------- --------------- ---------------
      MemTotal               118239.56       130816.00       464384.00       713439.56
      
      $ for i in {0..2}; do echo -n "$i: " ; cat /sys/bus/node/devices/node$i/cpulist ; done
      0: 0-7,16-23
      1: 8-15,24-31
      2:
      
      $ /opt/vsmp/bin/vsmpversion -vvv
      Memory Drive Technology: 8.2.1455.74 (Sep 28 2017 13:09:59)
      System configuration:
          Boards:      3
             1 x Proc. + I/O + Memory
             2 x NVM devices (Intel SSDPED1K375GAQ)
          Processors:  2, Cores: 16, Threads: 32
              Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz Stepping 01
          Memory (MB): 713472 (of 977450), Cache: 251416, Private: 12562
             1 x 249088MB   [262036/   678/12270]
             1 x 232192MB   [357707/125369/  146]  82:00.0#1
             1 x 232192MB   [357707/125369/  146]  83:00.0#1
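
The empty `cpulist` of node 2 above is the case that has to be handled. A minimal sketch of a tolerant parser follows; `parseCPUList` is a hypothetical helper written for illustration, not the function used in the exporter.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList expands a sysfs cpulist string such as "0-7,16-23" into
// individual CPU numbers. An empty string (a memory-only NUMA node) yields
// an empty result instead of an error.
func parseCPUList(s string) ([]int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return nil, nil
	}
	var cpus []int
	for _, part := range strings.Split(s, ",") {
		bounds := strings.SplitN(part, "-", 2)
		first, err := strconv.Atoi(bounds[0])
		if err != nil {
			return nil, fmt.Errorf("invalid cpulist entry %q: %v", part, err)
		}
		last := first
		if len(bounds) == 2 {
			if last, err = strconv.Atoi(bounds[1]); err != nil {
				return nil, fmt.Errorf("invalid cpulist entry %q: %v", part, err)
			}
		}
		for cpu := first; cpu <= last; cpu++ {
			cpus = append(cpus, cpu)
		}
	}
	return cpus, nil
}

func main() {
	// The three cpulist values from the example system above.
	for _, list := range []string{"0-7,16-23", "8-15,24-31", ""} {
		cpus, err := parseCPUList(list)
		fmt.Printf("%q -> %d CPUs, err=%v\n", list, len(cpus), err)
	}
}
```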
      
      * cpu: rename some variables (pkg => node)
      
      * cpu: Use %v not %q in log.Debugf() format strings
      a8d7d110
  7. Nov 07, 2017
  8. Nov 02, 2017
  9. Oct 31, 2017
  10. Oct 25, 2017
    • Correct buffer_bytes > INT_MAX on BSD/amd64. (#712) · 0eecaa95
      Derek Marcotte authored
      * Correct buffer_bytes > INT_MAX on BSD/amd64.
      
      The sysctl vfs.bufspace returns either an int or a long, depending on
      the value.  Large values of vfs.bufspace will result in error messages
      like:
      
        couldn't get meminfo: cannot allocate memory
      
      This will detect the returned data type, and cast appropriately.
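
A rough sketch of the idea, assuming the raw sysctl bytes are little-endian (as on amd64) and non-negative; `decodeSysctlInt` is a hypothetical name, not the function added by this commit.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeSysctlInt picks the integer width from the length of the raw
// value: 4 bytes is treated as an int, 8 bytes as a long.
// Assumption: little-endian byte order and a non-negative value.
func decodeSysctlInt(raw []byte) (uint64, error) {
	switch len(raw) {
	case 4:
		return uint64(binary.LittleEndian.Uint32(raw)), nil
	case 8:
		return binary.LittleEndian.Uint64(raw), nil
	default:
		return 0, fmt.Errorf("unexpected sysctl value length %d", len(raw))
	}
}

func main() {
	// 3 GiB does not fit in a 32-bit signed int, so it arrives as 8 bytes.
	big := make([]byte, 8)
	binary.LittleEndian.PutUint64(big, 3<<30)
	v, err := decodeSysctlInt(big)
	fmt.Println(v, err)
}
```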
      
      * Added explicit length checks per feedback.
      
      * Flatten Value() to make it easier to read.
      
      * Simplify per feedback.
      
      * Fix style.
      
      * Doc updates.
      0eecaa95
  11. Oct 21, 2017
  12. Oct 20, 2017
  13. Oct 19, 2017
  14. Oct 18, 2017
    • Remove unnecessary select statement (#692) · 0b676388
      Pontus Leitzler authored
      * Remove unnecessary select statement
      
      * Remove unnecessary if-statement
      0b676388
    • Fix smartmon.sh textfile script (#700) · 1824ac3b
      Ben Kochie authored
      When there are no SMART compatible devices (Raspberry Pi for example) an
      error is returned, but the return code is still 0.
      
      `# scan_smart_devices: glob(3) aborted matching pattern /dev/discs/disc*`
      
      * Remove unused `disks` variable.
      * Filter for only valid `/dev` devices.
      1824ac3b
  15. Oct 14, 2017
    • Add `collect[]` parameter (#699) · f3a70226
      Siavash Safi authored
      * Add `collect[]` parameter
      
      * Add TODO comment about staticcheck ignored
      
      * Restore promhttp.HandlerOpts
      
      * Log a warning and return HTTP error instead of failing
      
      * Check collector existence and status, cleanups
      
      * Fix warnings and error messages
      
      * Don't panic, return error if collector registration failed
      
      * Update README
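
The behaviour listed above (filter by `collect[]`, reject unknown or disabled collectors with an error rather than panicking) might look roughly like this. `selectCollectors` and the `enabled` map are hypothetical names used only for this sketch; the real handler is not shown in this log.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// selectCollectors returns the collectors named in repeated collect[]
// query parameters. With no collect[] parameter, every enabled collector
// is returned. Unknown or disabled names produce an error, which a handler
// could turn into an HTTP error response.
func selectCollectors(rawQuery string, enabled map[string]bool) ([]string, error) {
	query, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, fmt.Errorf("invalid query: %v", err)
	}
	requested := query["collect[]"]
	if len(requested) == 0 {
		var all []string
		for name, on := range enabled {
			if on {
				all = append(all, name)
			}
		}
		sort.Strings(all)
		return all, nil
	}
	for _, name := range requested {
		on, known := enabled[name]
		if !known {
			return nil, fmt.Errorf("unknown collector: %s", name)
		}
		if !on {
			return nil, fmt.Errorf("disabled collector: %s", name)
		}
	}
	return requested, nil
}

func main() {
	enabled := map[string]bool{"cpu": true, "meminfo": true, "ntp": false}
	for _, q := range []string{"", "collect[]=cpu&collect[]=meminfo", "collect[]=ntp"} {
		names, err := selectCollectors(q, enabled)
		fmt.Printf("%q -> %v, err=%v\n", q, names, err)
	}
}
```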
      f3a70226
  16. Oct 11, 2017
  17. Oct 06, 2017
  18. Oct 05, 2017
    • Update vendoring (#685) · deadfef4
      Ben Kochie authored
      * Update vendor github.com/coreos/go-systemd/dbus@v15
      
      * Update vendor github.com/ema/qdisc
      
      * Update vendor github.com/godbus/dbus
      
      * Update vendor github.com/golang/protobuf/proto
      
      * Update vendor github.com/lufia/iostat
      
      * Update vendor github.com/matttproud/golang_protobuf_extensions/pbutil@v1.0.0
      
      * Update vendor github.com/prometheus/client_golang/...
      
      * Update vendor github.com/prometheus/common/...
      
      * Update vendor github.com/prometheus/procfs/...
      
      * Update vendor github.com/sirupsen/logrus@v1.0.3
      
      Adds vendor golang.org/x/crypto
      
      * Update vendor golang.org/x/net/...
      
      * Update vendor golang.org/x/sys/...
      
      * Update end to end output.
      deadfef4
    • Merge pull request #682 from derekmarcotte/dm-386-native · ba96b656
      Tobias Schmidt authored
      Only enable race detector when GOHOSTARCH is amd64.
      ba96b656
  19. Oct 04, 2017
  20. Oct 03, 2017
  21. Sep 28, 2017
    • Replace --collectors.enabled with per-collector flags (#640) · 859a825b
      Calle Pettersson authored
      * Move NodeCollector into package collector
      
      * Refactor collector enabling
      
      * Update README with new collector enabled flags
      
      * Fix out-of-date inline flag reference syntax
      
      * Use new flags in end-to-end tests
      
      * Add flag to disable all default collectors
      
      * Track if a flag has been set explicitly
      
      * Add --collectors.disable-defaults to README
      
      * Revert disable-defaults flag
      
      * Shorten flags
      
      * Fixup timex collector registration
      
      * Fix end-to-end tests
      
      * Change procfs and sysfs path flags
      
      * Fix review comments
      859a825b
  22. Sep 19, 2017
    • Add timex collector (#664) · 3762191e
      Sami Kerola authored
      This collector is based on the adjtimex(2) system call.  The collector
      returns three values: whether time is synchronised, the offset to the
      remote reference, and the local clock frequency adjustment.
      
      Values are taken from the kernel's timekeeping data structures to avoid
      depending on how the synchronisation is implemented.  By that I mean one
      should not care whether time is updated using ntpd, systemd.timesyncd,
      ptpd, and so on.  Since every time sync implementation ends up telling
      the kernel the status of the time, one can skip the software in between
      and look at the results of the syncing.  As a positive side effect, this
      makes the collector very quick and conceptually specific: it does not
      monitor the availability of the NTP server, the network in between, DNS
      resolution, or other unrelated but necessary things.
      
      The minimum set of values to keep an eye on are the following three:
      
          The node_timex_sync_status tells whether the local clock is in sync
          with a remote clock.  The value is set to zero when synchronisation
          to a reliable server is lost, or when the time sync software is
          misconfigured.
      
          The node_timex_offset_seconds tells how far off the local clock is
          compared to the reference.  With multiple time references, this value
          is the outcome of the RFC 5905 adjustment algorithm.  Ideally the
          offset should be close to zero, and how large a value is acceptable
          depends on the use case.  For example, a typical web server is
          probably fine if the offset is about 0.1 or less, but that would not
          be good enough for a mobile phone base station operator.
      
          The node_timex_freq tells the amount of adjustment applied to the
          local clock tick frequency.  For example, if the offset is one second
          and growing, the local clock needs to be instructed to tick quicker.
          The number itself is not very important, and occasional small
          adjustments are fine.  When the frequency is unusually unstable, one
          can assume the time stamps will not be accurate very far into the
          sub-second range.  Explaining why the local clock frequency behaves
          like a passenger on a roller coaster is a different matter.
          Explanations can vary from system load to environmental issues such
          as a machine being physically too hot.
      
      The rest of the measurements can help when debugging.  If you run a clock
      server, you probably want to collect and keep track of everything.
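
As a hedged illustration of the three values described above, the sketch below converts raw adjtimex(2) fields into a sync flag, an offset in seconds, and a frequency adjustment in ppm. The function names are hypothetical and the unit conversions follow the adjtimex(2) man page; how the collector itself scales `node_timex_freq` is not stated in this message.

```go
package main

import "fmt"

const (
	timeError = 5      // TIME_ERROR: adjtimex(2) return value when the clock is not synchronised
	staNano   = 0x2000 // STA_NANO: status bit meaning the offset field is in nanoseconds
)

// syncStatus maps the adjtimex(2) return value to 1 (in sync) or 0.
func syncStatus(state int) float64 {
	if state == timeError {
		return 0
	}
	return 1
}

// offsetSeconds converts the kernel offset field to seconds. The field is
// in microseconds unless STA_NANO is set in the status word.
func offsetSeconds(offset int64, status int) float64 {
	if status&staNano != 0 {
		return float64(offset) / 1e9
	}
	return float64(offset) / 1e6
}

// freqPPM converts the kernel freq field, which is ppm with a 16-bit
// fractional part, to plain parts per million.
func freqPPM(freq int64) float64 {
	return float64(freq) / 65536
}

func main() {
	// Made-up sample values, standing in for a real adjtimex(2) result.
	state, status := 0, staNano
	var offset, freq int64 = 250000, 131072
	fmt.Println(syncStatus(state), offsetSeconds(offset, status), freqPPM(freq))
}
```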
      
      Pull-request: https://github.com/prometheus/node_exporter/pull/664
      3762191e
    • Add metrics from SNTPv4 packet to ntp collector & add ntpd sanity check (#655) · c169b4b1
      Leonid Evdokimov authored
      * Add metrics from SNTPv4 packet to ntp collector & add ntpd sanity check
      
      1. Checking the local clock against a remote NTP daemon is a bad idea:
      a local ntpd acting as a client should do it better and avoid excessive
      load on the remote NTP server, so the collector is refactored to query
      the local NTP server.
      
      2. Checking the local clock against a remote one does not check the
      local ntpd itself. The local ntpd may be down or out of sync due to
      network issues, but the clock will be OK.
      
      3. Checking an NTP server using the sanity of its response is tricky and
      depends on the ntpd implementation, which is why a common
      `node_ntp_sanity` variable is exported.
      
      * `govendor add golang.org/x/net/ipv4`, it is dependency of github.com/beevik/ntp
      
      * Update github.com/beevik/ntp to include boring SNTP fix
      
      * Use variable name from RFC5905
      
      * ntp: move code to make export of raw metrics more explicit
      
      * Move NTP math to `github.com/beevik/ntp`
      
      * Make `golint` happy
      
      * Add some brief docs explaining `ntp` #655 and `timex` #664 modules
      
      * ntp: drop XXX comment that got its decision
      
      * ntp: add `_seconds` suffix to relevant metrics
      
      * Better `node_ntp_leap` comment
      
      * s/node_ntp_reftime/node_ntp_reference_timestamp_seconds/ as requested by @discordianfish
      
      * Extract subsystem name to const as suggested by @superq
      c169b4b1
  23. Sep 07, 2017
    • cpu: Metric 'package_throttles_total' is per package. (#657) · b0d5c008
      Karsten Weiss authored
      * cpu: Metric 'package_throttles_total' is per package.
      
      'package_throttles_total' is per package, not per CPU. This also greatly
      reduces the total number of cpu time series (especially for multi-core
      CPUs).
      
      * cpu: Better handling of a cpulist edge-case.
      
      * cpu: Extract the package number from the directory name.
      
      Do not rely on the range index.
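
One way to read this change, sketched under the assumption that the directories are named like `node0`, `node2`, and so on; `nodeNumber` is a hypothetical helper, not the function in the collector.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// nodeNumber extracts N from a directory path whose last element is "nodeN".
func nodeNumber(dir string) (int, error) {
	base := filepath.Base(dir)
	digits := strings.TrimPrefix(base, "node")
	if digits == base {
		return 0, fmt.Errorf("%q is not a node directory", dir)
	}
	n, err := strconv.Atoi(digits)
	if err != nil {
		return 0, fmt.Errorf("cannot parse node number from %q: %v", dir, err)
	}
	return n, nil
}

func main() {
	// node1 is deliberately missing: node2 sits at slice index 1, so the
	// range index would give the wrong number.
	dirs := []string{"/sys/bus/node/devices/node0", "/sys/bus/node/devices/node2"}
	for i, dir := range dirs {
		n, err := nodeNumber(dir)
		fmt.Printf("index=%d node=%d err=%v\n", i, n, err)
	}
}
```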
      
      * cpu: Add package_throttle_count for node0 cpu1
      
      This file must be ignored by the cpu collector.
      b0d5c008
  24. Aug 31, 2017
    • Test with Go 1.9.x (#667) · abb58a31
      Alexey Palazhchenko authored
      abb58a31
    • Always try to return smartmon_device_info metric (#663) · 89a2f21f
      Matt Bostock authored
      * Always try to return smartmon_device_info metric
      
      Sometimes the 'model family' field is not returned by `smartctl` because
      a disk is not in the disk database for the version of smartmontools
      installed on the system.
      
      In those cases, the device model and serial number are still returned
      (at least as far as I have observed).
      
      Re-work the logic to prefer the 'vendor' field first, and if not
      present, always output a `smartmon_device_info` metric even if some
      labels have empty values.
      
      On the box I'm testing this on, where previously no metric was returned,
      it now returns:
      
          # HELP smartmon_device_info SMART metric device_info
          # TYPE smartmon_device_info gauge
          smartmon_device_info{disk="/dev/sda",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
          smartmon_device_info{disk="/dev/sdb",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
          smartmon_device_info{disk="/dev/sdc",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
          smartmon_device_info{disk="/dev/sdd",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
          smartmon_device_info{disk="/dev/sde",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
          smartmon_device_info{disk="/dev/sdf",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
      
      * Add trailing newline
      
      Because POSIX:
      https://stackoverflow.com/a/729795
      89a2f21f
  25. Aug 24, 2017