Naraeon SSD Tools internals – 7. Get diagnostic information (SMART) from SATA devices

Necessity

I don't think this part needs any further explanation of why it is necessary, because it is the very reason disk management tools exist. With this information we can monitor a device's status and catch subtle failures that users would easily overlook, before they turn into a big failure.

S.M.A.R.T: Self-Monitoring, Analysis, and Reporting Technology

As the name says, this is a technology that lets a storage device monitor itself and check whether it has a problem. It doesn't involve anything advanced like deep learning. It is simply a function that lets the device check itself using information gathered from its sensors.

You may remember the problems old hard disks had. Without a head-parking program, you could not turn off the PC right away. Because hard disks kept failing without any notice, vendors started to develop drive diagnosis technologies: PFA (Predictive Failure Analysis) was developed by IBM, and IntelliSafe was developed by Compaq together with Seagate, Quantum, and others. But these were vendor-specific technologies, so you needed the vendor's software to get the information. Finally, IntelliSafe was adopted into the ATA standard, and from then on it became the S.M.A.R.T technology.

However, there was a problem. As you can see in the IntelliSafe link above, it doesn't contain any table describing the result format. As I said, the technology only shows you 'information gathered from sensor inputs'. So the ATA standard turned out to be a catastrophe for tool developers.

‘[0..361] Vendor specific’ is the most important part of this table (From: ACS-3 standard)

Vendor specific means 'implement it as you like', and all of the information lives in that field. So the standard says nothing more than 'report any information in any format'. If you think SMART information must contain an ID, current value, worst value, and RAW value, you are wrong: the standard defines none of that. In practice, some controllers use a 6-byte RAW value while others use a 4-byte RAW value. There are even controllers with fields that are shared by two items. Cases like these drive tool developers crazy, and even CrystalDiskInfo cannot read such information correctly.
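
To make the RAW width problem concrete, here is a minimal C sketch. It is not code from Naraeon SSD Tools or CrystalDiskInfo: the function name `smart_raw_to_u64` is made up for this example, and it assumes the RAW bytes are stored least significant byte first, which is only the common convention and not something the standard guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combines the first `width` bytes of a RAW field,
 * assuming the least significant byte comes first. */
static uint64_t smart_raw_to_u64(const uint8_t *raw, size_t width)
{
    uint64_t value = 0;
    size_t i;

    assert(raw != NULL);
    if (width > 8)
        width = 8;  /* a uint64_t cannot hold more than 8 bytes */
    for (i = 0; i < width; i++)
        value |= (uint64_t)raw[i] << (8 * i);
    return value;
}
```

If a controller keeps a second item in the upper two bytes, reading the same field with a width of 6 instead of 4 produces a completely different number, which is how a tool ends up showing a wrong value.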

In fact, there are some de facto rules. Vendors that build complete PCs (Dell, HP, etc.) need a unified diagnostic system, and fragmentation makes that hard. So I think the minimum guidelines for fields like ID, current value, worst value, and RAW value came from them.

Diagnostic system of Dell (From: Dell)

OEM devices may be different, but retail devices have nothing in common beyond the byte format. What does each ID mean? How do we convert RAW values into human-readable units? Whoever develops the tool has to answer questions like these. Because it is somewhat sensitive data, vendors usually don't publish anything about it. If you are lucky, as with Samsung or Intel, you can get a datasheet that explains the meaning of each ID. And since even those documents say nothing about what conclusion to draw from the values, that has to come from functions written by the tool developer.

Input

In this chaotic situation, the input is the only thing that is fully defined. Zero out PreviousTaskFile and the data buffer, and set the remaining fields as in the following list. (N/A also means 0.)

  • AtaFlags of Passthrough structure: ATA_FLAGS_DATA_IN
  • CurrentTaskFile[0]: 0xD0 (Threshold: 0xD1)
  • CurrentTaskFile[1]: N/A
  • CurrentTaskFile[2]: N/A
  • CurrentTaskFile[3]: 0x4F
  • CurrentTaskFile[4]: 0xC2
  • CurrentTaskFile[5]: N/A
  • CurrentTaskFile[6]: 0xB0
  • CurrentTaskFile[7]: N/A
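
The field assignments above can be sketched in C as follows. This is only an illustration, not code from Naraeon SSD Tools: the `smart_fill_task_file` helper and the constant names are made up for this example, and the plain 8-byte array merely stands in for the CurrentTaskFile member of the pass-through structure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the values in the list above. */
enum {
    SMART_FEATURE_READ_DATA       = 0xD0,
    SMART_FEATURE_READ_THRESHOLDS = 0xD1,
    SMART_LBA_MID                 = 0x4F,
    SMART_LBA_HIGH                = 0xC2,
    ATA_COMMAND_SMART             = 0xB0
};

/* Fills an 8-byte task file using the same indexes as CurrentTaskFile. */
static void smart_fill_task_file(uint8_t task_file[8], uint8_t feature)
{
    assert(task_file != NULL);
    memset(task_file, 0, 8);          /* every "N/A" entry stays 0 */
    task_file[0] = feature;           /* Feature: 0xD0 or 0xD1 */
    task_file[3] = SMART_LBA_MID;
    task_file[4] = SMART_LBA_HIGH;
    task_file[6] = ATA_COMMAND_SMART; /* Command */
}
```

Only the feature byte differs between the two requests, so one helper with a `feature` parameter is enough for both.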

There's one odd thing. In CurrentTaskFile[0] (Feature), what is D0 and what is D1? D0 reads the attribute data, and D1 reads the threshold values. If an attribute's current value is below its threshold value, that is a bad sign. The standard has a table like the following picture.

feature field values (From: ACS-3)

It shows that D1h (=0xD1) is Obsolete, so in principle devices shouldn't have to implement it, because it is no longer in the standard. But, maybe because of OEM requests or because the CrystalDiskInfo implementation became a de facto standard, even the latest SSDs implement it. So we should issue both commands to get both sets of values.

Once you issue the commands through ioctl, the real problem starts.

Output

The standard only defines '[0..361] Vendor specific', so you must implement the interpretation yourself. Fortunately, many products follow the structure below, which is taken from the CrystalDiskInfo source.

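    // Each entry occupies 12 bytes in the buffer, so SMART_ATTRIBUTE
    // must be byte-packed (for example with #pragma pack(1)).
    typedef    struct _SMART_ATTRIBUTE
    {
        BYTE    Id;
        WORD    StatusFlags;
        BYTE    CurrentValue;
        BYTE    WorstValue;
        BYTE    RawValue[6];
        BYTE    Reserved;
    } SMART_ATTRIBUTE;

    typedef    struct _SMART_THRESHOLD
    {
        BYTE    Id;
        BYTE    ThresholdValue;
        BYTE    Reserved[10];
    } SMART_THRESHOLD;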

But the most important point is that nobody is obliged to follow this layout, and there are many devices that don't. So be cautious when you implement this. I don't recommend an implementation that simply casts the buffer into an array of that structure. You must test with each product and check that this common interpretation is valid for it. Once you get past this part, you still need to find out what each ID means, and you have to analyze that yourself.
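
As one way to avoid the plain cast, the following C sketch copies each field from an explicit byte offset. It is an illustration under assumptions, not the Naraeon SSD Tools or CrystalDiskInfo code: the names are made up, the layout is the common one shown above, StatusFlags is assumed to be stored low byte first, and an ID of 0 is treated as an unused slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMART_TABLE_OFFSET 2   /* entries start after a 2-byte header */
#define SMART_ENTRY_SIZE   12  /* one entry in the common layout */
#define SMART_MAX_ENTRIES  30  /* (362 - 2) / 12 entries fit in [0..361] */

typedef struct {
    uint8_t  id;
    uint16_t status_flags;
    uint8_t  current;
    uint8_t  worst;
    uint8_t  raw[6];
} smart_entry;

/* Copies entries with a non-zero ID out of the buffer; returns the count. */
static size_t smart_parse_data(const uint8_t *buf, size_t buf_len,
                               smart_entry *out, size_t out_cap)
{
    size_t count = 0, row, i;

    assert(buf != NULL && out != NULL);
    for (row = 0; row < SMART_MAX_ENTRIES && count < out_cap; row++) {
        size_t off = SMART_TABLE_OFFSET + row * SMART_ENTRY_SIZE;

        if (off + SMART_ENTRY_SIZE > buf_len)
            break;     /* never read past the end of the buffer */
        if (buf[off] == 0)
            continue;  /* assumption: ID 0 marks an unused slot */
        out[count].id = buf[off];
        out[count].status_flags =
            (uint16_t)(buf[off + 1] | (buf[off + 2] << 8));
        out[count].current = buf[off + 3];
        out[count].worst = buf[off + 4];
        for (i = 0; i < 6; i++)
            out[count].raw[i] = buf[off + 5 + i];
        count++;
    }
    return count;
}
```

Reading by offset also sidesteps compiler padding: without byte packing, a C compiler may insert a padding byte between Id and StatusFlags, and a cast would then read every later field from the wrong position.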

Implementation

Naraeon SSD Tools issues the two commands in SetBufferAndSMARTReadData and SetBufferAndSMARTReadThreshold of TATACommandSet. The details are explained in the Input section.

    procedure TATACommandSet.SetBufferAndSMARTReadData;
    begin
      SetInnerBufferToSMARTReadData;
      SetOSBufferByInnerBuffer;
      IoControl(TIoControlCode.ATAPassThroughDirect, IoOSBuffer);
    end;

    procedure TATACommandSet.SetBufferAndSMARTReadThreshold;
    begin
      SetInnerBufferToSMARTReadThreshold;
      SetOSBufferByInnerBuffer;
      IoControl(TIoControlCode.ATAPassThroughDirect, IoOSBuffer);
    end;

It interprets the results in BufferToSMARTValueList and BufferToSMARTThresholdValueList of TATABufferInterpreter. The details are explained in the Output section.

    function TATABufferInterpreter.BufferToSMARTValueList(
      const Buffer: TSmallBuffer): TSMARTValueList;
    const
      SMARTStartPadding = 2;  // entries start after a 2-byte header
      SMARTValueLength = 12;  // size of one entry in bytes
      function CalculateRow(const CurrentRow: Integer): Integer;
      begin
        result := (CurrentRow * SMARTValueLength) + SMARTStartPadding;
      end;
    var
      CurrentRow: Integer;
      MaxRow: Integer;
    begin
      SMARTValueList := TSMARTValueList.Create;
      BufferInterpreting := Buffer;
      // Index of the last whole entry; rows are zero-based, hence the - 1.
      // Without it, the last row would start too close to the buffer end.
      MaxRow :=
        (Length(BufferInterpreting) - SMARTStartPadding) div SMARTValueLength - 1;
      for CurrentRow := 0 to MaxRow do
        if not IfValidSMARTAddToList(CalculateRow(CurrentRow)) then
          break;
      result := SMARTValueList;
    end;

    function TATABufferInterpreter.BufferToSMARTThresholdValueList(
      const Buffer: TSmallBuffer): TSMARTValueList;
    const
      SMARTStartPadding = 2;  // entries start after a 2-byte header
      SMARTValueLength = 12;  // size of one entry in bytes
    var
      CurrentRow: Integer;
    begin
      SMARTValueList := TSMARTValueList.Create;
      BufferInterpreting := Buffer;
      // Rows are zero-based, so stop at the last whole entry (count - 1)
      for CurrentRow := 0 to
        (Length(BufferInterpreting) - SMARTStartPadding) div SMARTValueLength - 1 do
        IfValidSMARTThresholdAddToList(
          (CurrentRow * SMARTValueLength) + SMARTStartPadding);
      result := SMARTValueList;
    end;
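
Once both lists exist, they can be matched by ID to apply the 'below threshold' check mentioned in the Input section. The following C sketch shows one possible way to do that. It is not how Naraeon SSD Tools does it: the type and function names are hypothetical, and whether a value exactly equal to its threshold should also count as failing is left as a choice for the tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced views of the two parsed lists. */
typedef struct { uint8_t id; uint8_t current; } smart_value;
typedef struct { uint8_t id; uint8_t threshold; } smart_limit;

/* Counts the values that have a matching threshold entry and sit below it. */
static size_t smart_count_below_threshold(const smart_value *values,
                                          size_t n_values,
                                          const smart_limit *limits,
                                          size_t n_limits)
{
    size_t failing = 0, v, t;

    assert(values != NULL && limits != NULL);
    for (v = 0; v < n_values; v++) {
        for (t = 0; t < n_limits; t++) {
            if (limits[t].id != values[v].id)
                continue;
            if (values[v].current < limits[t].threshold)
                failing++;
            break;  /* ID matched, no need to search further */
        }
    }
    return failing;
}
```

Values without a matching threshold entry are simply skipped, because nothing guarantees that a device reports the same IDs in both responses.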

This time we talked about SMART values, and next time we will talk about NVMe. You may have questions about how Naraeon SSD Tools interprets each ID. But that can't be explained easily, so all I can say is 'each device has its own interpretation'.
