01 June 2016

Login Forensics: Login History plus for auditing user logins

The Salesforce App Cloud platform has important auditing capabilities built in to ensure that you can focus on what's most important: your business. One of these foundational audit tools is Login History. The Login History audit trail enables administrators to download the last six months of logins to the Force.com platform, either via a CSV download link in the setup user interface or via the API. With Login History, you can track login successes and failures by user, IP, application, API, or browser, to name a few key attributes. In addition, Event Monitoring provides access to the Login log lines as well. As you can tell, we consider Login an important event to keep track of!

We're proud to announce the general availability of Login Forensics, a premium add-on service on top of our Event Monitoring product line that goes beyond both the Login log line and Login History by tracking login information for ten years!

Here's a breakdown of how the three compare:

| Attribute | Login History | Login Forensics | Login Log Line |
|---|---|---|---|
| Data Duration until Deleted | 6 months | 10 years | 30 days |
| Storage | Oracle | HBase | Oracle |
| Access | Setup UI, API | API only | API Download only |
| Permissions | Manage Users | View Platform Events | View Event Log Files |
| Extensibility | No | Yes, via Additional Information | No |
| Packaging | Included with every org | Included with Event Monitoring add-on | Included with Event Monitoring add-on |
| Name of sObject or File | LoginHistory | LoginEvent | Login Event Type |

How is it possible to store this critical data for so long? Salesforce recently adopted an open-source NoSQL database called HBase. HBase is the same database that we use to store up to 10 years of Field Audit Trail data.



Who cares? Well, I do. As does anyone who wants to maintain an audit trail of login information either for regulatory reasons or to track down anomalous login activity. For instance, imagine that a user always logs in from the same IP address, or during the same login hours, or using the same Chrome browser on Windows. Well, wouldn't it be strange if all of a sudden those behaviors changed over the course of a day, a week, a month, a year, or even a decade? 

All of this is possible with SOQL because of the HBase rowkeys we’ve defined. An HBase rowkey defines how we index these objects for fast queries. Imagine if you had to query a billion rows of LoginEvent records from the past decade in less than 120 seconds! That’s fast and furious query performance. 

The LoginEvent object, which stores the raw login data, has a rowkey consisting only of EventDate (in a descending sort) and the unique record Id. The PlatformEventMetric object, which stores the hourly roll-up metrics, has a rowkey consisting of EventType and then EventTime (in a descending sort). These simple rowkeys enable fast responses using standard SOQL. You just need to know the time frame you want to query and, in the case of metrics, which metric you want.

SELECT Application, Browser, EventDate, Id, LoginUrl, UserId
FROM LoginEvent
WHERE EventDate > YESTERDAY
LIMIT 10

This works because EventDate is the first field in the rowkey, and the most recent events come back first because the rows are stored in descending order. That makes it a powerful way to query the last ten Login Events in near real time.

It’s also powerful for integrating. You can create a polling app that queries every minute in the case of the raw events, and every hour in the case of the metrics, in order to easily integrate the last set of login data since the last query.
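One pass of such a polling app might look like the sketch below. It is only an illustration: `run_query` is a hypothetical callable that you supply to execute SOQL and return rows as dicts, the `limit` default is arbitrary, and the timestamp format is assumed to be the one the REST API typically returns (for example `2016-06-01T12:00:05.000+0000`).

```python
from datetime import datetime, timezone

# Fields taken from the LoginEvent query shown earlier in this post.
FIELDS = "Application, Browser, EventDate, Id, LoginUrl, UserId"


def build_query(since, limit=200):
    """Build a SOQL query for LoginEvent rows newer than `since`.

    `since` must be a timezone-aware datetime.
    """
    # SOQL datetime literals are unquoted ISO 8601 values.
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"SELECT {FIELDS} FROM LoginEvent "
        f"WHERE EventDate > {stamp} LIMIT {limit}"
    )


def poll_once(run_query, since):
    """Run one polling pass and return (rows, new high-water mark).

    `run_query` is a hypothetical callable: it takes a SOQL string and
    returns a list of dicts, one per LoginEvent row.
    """
    rows = run_query(build_query(since))
    newest = since
    for row in rows:
        # Assumed timestamp format: 2016-06-01T12:00:05.000+0000
        event_time = datetime.strptime(row["EventDate"], "%Y-%m-%dT%H:%M:%S.%f%z")
        newest = max(newest, event_time)
    return rows, newest
```

A scheduler would call `poll_once` every minute and pass the returned high-water mark into the next call. If more than `limit` logins can arrive between passes, the limit needs to be raised or the results paged.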

Alternatively, you can use the Asynchronous SOQL solution outlined in my previous blog post: Using Asynchronous SOQL with Event Monitoring.

Events are captured in near real-time. What does that mean? Well, there can be a minor delay between when the event occurred and when you can query it. If you want, you can self-monitor the near real-time nature of our events. If you take the average difference between the EventDate and the CreatedDate fields, you’ll see how close to real time your events have been captured.

Near Real Time Example
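The averaging described above can be sketched in a few lines. This assumes the rows have already been queried and are available as dicts with `EventDate` and `CreatedDate` strings in the format the REST API typically returns; the function names are made up for this example.

```python
from datetime import datetime
from statistics import mean


def parse_timestamp(value):
    """Parse a timestamp such as 2016-06-01T12:00:05.000+0000 (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def average_capture_lag_seconds(rows):
    """Average gap, in seconds, between EventDate and CreatedDate."""
    lags = [
        (parse_timestamp(r["CreatedDate"]) - parse_timestamp(r["EventDate"])).total_seconds()
        for r in rows
    ]
    # Return 0.0 for an empty result set rather than raising.
    return mean(lags) if lags else 0.0
```

A small average suggests events are becoming queryable soon after they occur; a growing average is worth watching.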

There's even the ability to introduce your own metadata into the login flow to further fingerprint a user’s login profile and identify anomalies in the login process. We call it Additional Info. It's the ability to introduce your own data through an HTTP header. This can be done via the browser, a proxy, or API authentication. For instance, you might want to register a header name (e.g. "x-sfdc-addinfo-correlationid") and value (e.g. "d18c5a3f-4fba-47bd-bbf8-6bb9a1786624"). Then when you look at your login events, you just need to look for any logins that do not have this identifier and investigate them further.
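Finding the logins that lack the identifier could be sketched as follows. This is an illustration under assumptions: it presumes each login event is available as a dict whose `AdditionalInfo` value is a JSON string mapping header names to values, and the function name is invented for this example. Check how Additional Info is exposed in your own org before relying on it.

```python
import json

# Header name taken from the example in this post.
HEADER_NAME = "x-sfdc-addinfo-correlationid"


def logins_missing_header(events, header_name=HEADER_NAME):
    """Return the login events that do not carry the expected header.

    Assumption: event["AdditionalInfo"] is a JSON object serialized as a
    string, or is missing/None when no extra headers were captured.
    """
    missing = []
    for event in events:
        raw = event.get("AdditionalInfo") or "{}"
        try:
            info = json.loads(raw)
        except json.JSONDecodeError:
            info = {}  # Treat unparseable values as "no identifier present".
        if not isinstance(info, dict) or not info.get(header_name):
            missing.append(event)
    return missing
```

Every event this returns is a login that arrived without your correlation identifier and is a candidate for a closer look.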



Finally, there's a transaction dye that's important to the process. Every Login Event can be traced back to a single Login History Id. This is useful for a couple of reasons. First, Login History connects to Login Geo, which captures geographical information such as the latitude and longitude of your users. As a result, you can use the composite API to orchestrate unrelated queries and plot the location of every user on a mapping service like Google Maps. Second, with each subsequent activity where the user interacts with data, like looking at accounts, you'll be able to track each interaction back to a single login on both the Login Event and Login History objects. One example is tracking down which records were viewed from an API query (explained in the Data Leakage blog post). And after six months, when Login History is deleted, you'll continue to be able to track every interaction back to a single login for another nine and a half years. So even if you log in via your iPhone, your Nexus tablet, your Chrome browser on your Mac, and Salesforce for Outlook, we'll be able to separate each set of transactions and link them back to a single login for the next ten years.
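Separating transactions by login could be sketched like this. It assumes you have already exported the activity records as dicts and that each one carries the originating login's identifier under a `LoginHistoryId` key; the key name in your data and the function name here are assumptions for illustration.

```python
from collections import defaultdict


def group_by_login(records, key="LoginHistoryId"):
    """Group activity records by the login they trace back to.

    Records without the key are collected under None so they are not lost.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record.get(key)].append(record)
    return dict(grouped)
```

Each resulting group holds the transactions from one login, so the iPhone session and the Chrome session end up in separate buckets.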



All of the screenshots in this post can be recreated using the sample code found in my GitHub repo.

Login Forensics ushers in a new age of storing near real-time, system-generated user activity on the Salesforce platform.

2 comments:

  1. Hi Adam,
    Thanks for providing this useful information.
    I have a few questions:
    1) In https://help.salesforce.com/articleView?id=event_monitoring_faq.htm&type=0 it states that user access data includes statistics and abnormal behavioral indicators. Where is this more advanced information kept?
    2) Is the LoginHistoryId used as a session ID?
    3) Other than the extended retention period and the more reasonable privilege required, what do you consider as the added value of LoginEvent over the good old LoginHistory?
    Appreciate your help,
    Uri Ben-Dor
    SkyFormation

    1. Hi Uri - great questions. Let me see if I can help:

      1) In https://help.salesforce.com/articleView?id=event_monitoring_faq.htm&type=0 it states that user access data includes statistics and abnormal behavioral indicators. Where is this more advanced information kept? [AT - Not sure I understand your question. Is it a storage question or an attribute question?]
      2) Is the LoginHistoryId used as a session ID? [AT - For all intents and purposes, yes. It's not your actual sID nor is it your actual session access token. However, it's a session ID in the sense that it correlates related rows or events, so you can determine how a login relates to other events and objects.]
      3) Other than the extended retention period and the more reasonable privilege required, what do you consider as the added value of LoginEvent over the good old LoginHistory? [AT - That's really the main value proposition. However, there are a number of other features in the works that will use the information here to determine what a user did after the login. Those are in pilot right now.]
