02 November 2015

These are a few of my favorite things


Event Monitoring provides easily downloadable CSV(Comma Separated Value) log files. The value of this data is really in the insights that you gain from rolling up and aggregating the data based on questions like:
  1. how many accounts were accessed in the last week?
  2. how many reports were downloaded by a specific user in the last 6 months?
  3. how many total executions did I have in the past year?
  4. how big are the files that I need to store on an ongoing basis for regulatory reasons?
Normally you would spend time to integrate, enrich, and analyze log data using an analytics platform. We have a great app ecosystem of ISV (Independent Software Vendors) that provide pre-built insights into this data.

But sometimes you just want a quick and easy way to answer this data without first creating an integration or enriching the data. In particular, you may want to analyze the data when it ranges over extremely long periods of time and significant volumes of data that you may not already have integrated into your analytics platform of choice. This actually came up in conversation with a customer administrator who told me that he was downloading Event Log Files and using a command line tool called grep to ask simple questions about the data; in effect he was trying to find a needle in the log file haystack without using an analytics platform.

Event Log Files generate CSV log file content. As a result, it's easy to report on it using piping in a Unix based CLI (Command Line Interface or Terminal).  This is definitely not for the button click crowd but it's incredibly fast and efficient for processing large amounts of data quickly. And since it's a good idea to keep a source system of files for a long period of time, you can always write quick piping commands to answer ad-hoc questions that may come up where you don't want to first stage or munge the data into an analytics platform.

If you haven't ever worked with a CLI (Command Line Interface or Terminal), I recommend first reading this great blog post from Lifehacker or this DataVu blog post. Also, if you're a windows only user, I wouldn't really continue with this post unless your files are on a unix based system like Mac or Linux that you can SSH into.

Here a few of my favorite things when it comes to commands to CLI prompts:

Use case: login to the file server where you're keeping the source log files
Example Prompt: ssh username@boxname
Notes: you'll need a password which means if you don't own that server, you'll need to request access

Use case: navigate to the directory where the log files are for a specific date (assuming you're storing them by date rather than by type or something else)
Example Prompt: cd Documents/logs/2015-09-30
Notes: if you don't manage this server, you'll need to find out where the files are first

Use case: list the files based on last modified date
Example Prompt: ls -tr
Example Output:
ApexCallout-2015-09-30.csv
ApexExecution-2015-09-30.csv
ApexSoap-2015-09-30.csv
Notes: I love this command. It's super simple and assumes you're dealing with the latest change first.

Use case: list the files based on size sorted from largest to smallest
Example Prompt: ls -less | sort -n -r
Example Output:
4116672 -rw-r--r--  1 auser  SFDC\Domain Users  2107733241 Oct  1 21:43 ApexTrigger-2015-09-30.csv
3512792 -rw-r--r--  1 auser  SFDC\Domain Users  1798546132 Oct  1 21:53 UITracking-2015-09-30.csv
3437816 -rw-r--r--  1 auser  SFDC\Domain Users  1760159289 Oct  1 21:58 URI-2015-09-30.csv
Notes: This is really helpful for finding the size of files, for instance if you just want to view the smallest file as a sample data set.

Use case: view the entire file in the terminal
Example Prompt: cat PackageInstall-2015-09-30.csv
Example Output:
"EVENT_TYPE","TIMESTAMP","REQUEST_ID","ORGANIZATION_ID","USER_ID","RUN_TIME","CPU_TIME","CLIENT_IP","URI","OPERATION_TYPE","IS_SUCCESSFUL","IS_PUSH","IS_MANAGED","IS_RELEASED","PACKAGE_NAME","FAILURE_TYPE","TIMESTAMP_DERIVED","USER_ID_DERIVED"
"PackageInstall","20150930175506.083","4-YHvN7eC7w3t6H5TippD-","00D000000000062","0053000000BqwWz","175","","","","INSTALL","0","0","0","0","SOS Quick Setup & Dashboard","OTHER","2015-09-30T17:55:06.083Z","0053000000BqwWzAAJ"
"PackageInstall","20150930175953.848","4-YIEBVUUMqV2bH5Tiluk-","00D000000000062","0053000000BqwWz","96","","","","INSTALL","0","0","0","0","SOS Quick Setup
Dashboard","OTHER","2015-09-30T17:59:53.848Z","0053000000BqwWzAAJ"
Notes: This is really helpful for small files but can be overwhelming when you have a larger one unless you first reduce the output using a command like grep.

Use case: only view the first 10 lines of a file
Example Prompt: head -10 ApexTrigger-2015-09-30.csv
Example Output:
"EVENT_TYPE","TIMESTAMP","REQUEST_ID","ORGANIZATION_ID","USER_ID","RUN_TIME","CPU_TIME","CLIENT_IP","URI","REQUEST_STATUS","DB_TOTAL_TIME","TRIGGER_ID","TRIGGER_NAME","ENTITY_NAME","TRIGGER_TYPE","TIMESTAMP_DERIVED","USER_ID_DERIVED"
"ApexTrigger","20151101000000.414","408evS044zRUwrH5Tipt5-","00D000000000062","00530000003ffnX","","","","","","","01q3000000008Xn","OrderTrigger","Order","BeforeUpdate","2015-11-01T00:00:00.414Z","00530000003ffnXAAQ"
"ApexTrigger","20151101000000.496","408evS044zRUwrH5Tipt5-","00D000000000062","00530000003ffnX","","","","","","","01q3000000008Xn","OrderTrigger","Order","AfterUpdate","2015-11-01T00:00:00.496Z","00530000003ffnXAAQ"
Notes: This is really helpful for really large files when you want to quickly validate the file headers with some sample data without having to load all of the data to the terminal or first reduce the data using a command like grep.

Use case: only view the last 10 lines of a file
Example Prompt: tail -10 ApexTrigger-2015-09-30.csv
Example Output:
"ApexTrigger","20151101084720.661","4096gnLuAlCzC6H5Tilrk-","00D000000000062","00530000001fAyR","","","","","","","01q300000000014","eimContractTrigger","Contract","AfterUpdate","2015-11-01T08:47:20.661Z","00530000001fAyRAAU"
"ApexTrigger","20151101084720.661","4096gnLuAlCzC6H5Tilrk-","00D000000000062","00530000001fAyR","","","","","","","01q3000000007mq","RenewalsContractTrigger","Contract","AfterUpdate","2015-11-01T08:47:20.661Z","00530000001fAyRAAU"
Notes: This is really helpful for really large files to find when and where the file ends without having to load all of the data to there terminal. Several times, I had questions where the file ended and when the last events were processed for the day. Tail helps me to find out where the file and day ends.

Use case: find out how many total transactions you had for a day, sorted from highest to lowest
Example Prompt: wc -cl /Users/atorman/Documents/logs/2015-09-30/*.csv | sort -n -r 
Example Output:
31129864 9577494169 total
8904594 2107733241 /Users/auser/Documents/logs/2015-09-30/ApexTrigger-2015-09-30.csv
5655267 1760159289 /Users/auser/Documents/logs/2015-09-30/URI-2015-09-30.csv
Notes: This is really helpful for capacity planning or for quick questions around total transactions by file type or date. I'm often asked how many log lines were generated or how big the files are that we're collecting. This is also really easy to output to a CSV so you can provide the information to another person. Just change the prompt to wc -cl /Users/atorman/Documents/logs/ | sort -n -r >> lineCount.csv.

Use case: simple report on number of lines by file type (*.csv)
Example Prompt: du -a *.csv | sort -n -r 
Example Output:
1676752 UITracking-2015-11-01.csv
1094160 ApexExecution-2015-11-01.csv
554040 ApexSoap-2015-11-01.csv
443408 RestApi-2015-11-01.csv
276112 URI-2015-11-01.csv
Notes: Similar to wc -l, du helps with capacity planning and enables you to answer quick questions around total transactions for a file or event type.

Use case: simple report on size by file type (*.csv)
Example Prompt: du -hs *.csv | sort -n
Example Output:
1.4M ApexCallout-2015-11-01.csv
1.7M TimeBasedWorkflow-2015-11-01.csv
3.2M QueuedExecution-2015-11-01.csv
3.6M Logout-2015-11-01.csv
5.7M Report-2015-11-01.csv
Notes: Similar to wc -c, helps with capacity planning and to answer quick questions around total transactions for a file or event type.

Use case: merge multiple CSV files together into a new CSV
Example Prompt: Merge cat *.csv > new.csv
Notes: This is really helpful when merging multiple files of the same type (e.g. API) that span multiple days into a single CSV prior to loading to an analytics platform.

Use case: get all of the report export log lines for a specific user (e.g. '00530000000h51Z')
Example Prompt: grep -r '00530000000h51Z' 2015*/ReportExport*.csv
Example Output:
2015-10-13/ReportExport-2015-10-13.csv:"ReportExport","20151013205624.313","4-nPScfLnMsalbH5Tipnt-","00D000000000062","00530000000h51Z","","","102.14.229.01","/00O30000008ZXB4","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
Notes: I <3 grep! Grep is basically a simple search, and as a result, is a good tool for finding a needle in the haystack of log lines. Grep is really helpful for providing a quick audit report for your auditors on everything a specific user did. It's also really powerful to combine this with other commands by using a pipe (i.e. '|'). For instance, by adding '| wc -l' to the end of the above command, you can get the total number of reports the user exported instead of the specific report log lines. This is similar to performing a count() in SOQL and filtering by a specific user Id. Now you're using the command line for reporting purposes!

Use case: get all of the events where an account was accessed
Example Prompt: grep 'EVENT_TYPE\|/001' URI*.csv | head -10
Example Output:
"URI","20151101000017.742","408ewGnZuAVzcrH5Tipu4-","00D000000000062","00530000003jh6y","3597","695","102.14.229.01","/0013000001J4uxa","S","1801149015","8408","430","https-//na2-salesforce-com/00630000010Pxtw?srPos=0-srKp=006","2015-11-01T00:00:17.742Z","00530000003jh6yAAA"
Notes: This is really helpful for providing a quick audit report for your auditors on account access. This is also another good prompt to add '| wc -l' to in order to find out how many times a specific account was accessed.

Use case: convert timestamp from a number to a string prior to importing to a reporting application
Example Prompt: awk -F ','  '{ if(NR==1) printf("%s\n",$0); else{ for(i=1;i<=NF;i++) { if(i>1&& i<=NF) printf("%s",","); if(i == 2) printf "\"%s-%s-%sT%s:%s:%sZ\"", substr($2,2,4),substr($2,6,2),substr($2,8,2),substr($2,10,2),substr($2,12,2),substr($2,14,2); else printf ("%s",$i);  if(i==NF) printf("\n")}}}' "${eventTypes[$i]}-raw/${eventTypes[$i]}-${logDates[$i]}.csv" > "${eventTypes[$i]}/${eventTypes[$i]}-${logDates[$i]}.csv"
Notes: this one takes a lot more work but is really helpful for transforming data before loading it into a system that has specific formatting requirements.

There are many more great utilities available on the command line, which when coupled with pipes (|) and shell scripts makes for an easy way to automate many simple tasks or perform ad-hoc queries against the raw log files.

And those are a few of my favorite things!

2 comments:

  1. Hi Adam,
    Thanks for all the useful information you provide.
    I have a high level question - seems like more and more log information is becoming accessible via the API and that's great. It becomes, however, unclear, which endpoint should be used to get each of the many events. To extract security related information we already extract data from SetupAuditTrail and LoginHistory. This however doesn't cover application data CRUD. If I understand correctly, the data leakage pilot should provide all read operations events (currently via API, in the future also via the GUI). Field history partially cover the create update and delete operations (partially since it is not applicable over all SObjects, limited to 20 fields per SObject, fields sized 255 ch and more and not tracked by value etc). Event monitoring - covers no create/update/delete operations, partially cover read operations, provide Apex callout information (outgoing communication) and upload/download/delete of files.
    Is that a proper responsibilities distribution (security perspective only) between the different endpoints? Is the data leakage also provide upload/download/delete file events?
    Highly appreciate your point of view here,
    Uri

    ReplyDelete
    Replies
    1. Hi Uri,

      I agree - it's a bit confusing. There are a lot of different options, many of which were created over a very long period of time. Setup Audit Trail and Login History have been around forever. However, they didn't meet every use case, so we introduced event monitoring logs which helped; however, that didn't meet all of the use cases. So we introduced data leakage to fill the gaps. Some of these audit capabilities are built into the price of the platform, others are add-ons due to the non-trivial cost of them. All of this is adds to the idea that Salesforce isn't a turnkey app but a solution to meet a variety of different business cases.

      Delete