This reads in a small-ish Wireshark packet dump that was exported to CSV:
shark <- read.csv("https://gist.githubusercontent.com/hrbrmstr/42480de78a3ee1ed39bb/raw/ffe5c62ce91a8b40b77a7b12c28254e4b3c11325/smallshark.csv")
Here’s how to turn PCAPs into CSV: https://ask.wireshark.org/questions/2935/creating-a-csv-file-with-tshark
There are tons of other fields you can export besides the eight in this dump.
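As a rough sketch of what such an export could look like, the snippet below builds (but does not run) a tshark command line for the eight columns in this CSV. The file names `capture.pcap` and `capture.csv` are placeholders, not anything from this post, and the exact options used to make the original file are not known.

```r
# Field names taken from the columns of the CSV read above
fields <- c("frame.number", "frame.time", "eth.src", "eth.dst",
            "ip.src", "frame.len", "ip.dst", "ip.proto")

# "capture.pcap" is a placeholder input file (assumption)
args <- c("-r", "capture.pcap",
          "-T", "fields",
          as.vector(rbind("-e", fields)),  # interleaves: -e f1 -e f2 ...
          "-E", "header=y",                # write a header row
          "-E", "separator=,",             # comma-separated
          "-E", "quote=d")                 # double-quote values

cmd <- paste("tshark", paste(args, collapse = " "))
cat(cmd, "> capture.csv\n")

# To actually run it (needs tshark on the PATH):
# system2("tshark", args, stdout = "capture.csv")
```

Adding another field is just a matter of appending its name to `fields`.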
Take a look at it:
str(shark)
#> 'data.frame': 347 obs. of 8 variables:
#> $ frame.number: int 1 2 3 4 5 6 7 8 9 10 ...
#> $ frame.time : Factor w/ 347 levels "Feb 9, 1970 02:04:19.354593000 EST",..: 1 2 3 4 5 6 7 8 9 10 ...
#> $ eth.src : Factor w/ 3 levels "52:54:00:12:34:56",..: 1 2 1 2 1 1 1 1 1 1 ...
#> $ eth.dst : Factor w/ 9 levels "01:00:5e:00:00:16",..: 9 6 7 9 3 1 1 3 3 1 ...
#> $ ip.src : Factor w/ 19 levels "","10.0.2.15",..: 1 1 2 3 1 2 2 1 1 2 ...
#> $ frame.len : int 42 64 342 590 90 54 54 90 90 54 ...
#> $ ip.dst : Factor w/ 23 levels "","10.0.2.15",..: 1 1 3 21 1 16 16 1 1 16 ...
#> $ ip.proto : int NA NA 17 17 NA 2 2 NA NA 2 ...
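The `Factor` columns above come from `read.csv()` converting strings to factors, which was the default before R 4.0.0. On newer versions the same call gives `chr` columns unless you ask for factors. A tiny sketch with a made-up three-row CSV (a stand-in, not the real data):

```r
# Made-up stand-in for the real CSV (assumption: same column style)
csv <- "ip.src,frame.len\n10.0.2.15,342\n10.0.2.3,590\n10.0.2.15,54\n"

as_chr <- read.csv(text = csv, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv, stringsAsFactors = TRUE)

class(as_chr$ip.src)   # character vector
class(as_fct$ip.src)   # factor
levels(as_fct$ip.src)  # the distinct addresses, sorted as strings
```

Either form works with the plots in this post; factors just make `summary()` print counts per value.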
summary(shark)
#>   frame.number                                 frame.time
#>  Min.   :  1.0   Feb 9, 1970 02:04:19.354593000 EST:  1
#>  1st Qu.: 87.5   Feb 9, 1970 02:04:19.354615000 EST:  1
#>  Median :174.0   Feb 9, 1970 02:04:19.360721000 EST:  1
#>  Mean   :174.0   Feb 9, 1970 02:04:19.360738000 EST:  1
#>  3rd Qu.:260.5   Feb 9, 1970 02:04:19.617335000 EST:  1
#>  Max.   :347.0   Feb 9, 1970 02:04:19.623168000 EST:  1
#>                  (Other)                           :341
#>               eth.src                 eth.dst               ip.src
#>  52:54:00:12:34:56:207   52:54:00:12:34:56:139   10.0.2.15      :176
#>  52:55:0a:00:02:02:138   52:55:0a:00:02:02:126                  : 36
#>  52:55:0a:00:02:03:  2   ff:ff:ff:ff:ff:ff: 19   10.0.2.3       : 17
#>                          52:55:0a:00:02:03: 18   131.253.34.240 : 14
#>                          01:00:5e:00:00:fc: 12   134.170.184.137: 14
#>                          33:33:00:01:00:03: 12   23.76.195.70   : 13
#>                          (Other)          : 21   (Other)        : 77
#>    frame.len                 ip.dst       ip.proto
#>  Min.   :  42.0   10.0.2.15      :134   Min.   : 2.000
#>  1st Qu.:  54.0                  : 36   1st Qu.: 6.000
#>  Median :  66.0   10.0.2.3       : 17   Median : 6.000
#>  Mean   : 253.5   134.170.184.137: 17   Mean   : 8.103
#>  3rd Qu.: 257.5   131.253.34.240 : 15   3rd Qu.: 6.000
#>  Max.   :1514.0   93.184.215.200 : 15   Max.   :17.000
#>                   (Other)        :113   NA's   :36
It’s a data.frame: a table with one row per packet (347 here) and one column per field (8 here), so basically an Excel spreadsheet.
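The `ip.proto` column holds IANA protocol numbers (2 is IGMP, 6 is TCP, 17 is UDP), and the `NA` rows are frames where tshark reported no `ip.proto` value. Here is a small sketch of turning the numbers into labels, using only the first ten values shown in the `str()` output; the "no IP header" label is my own wording, not something in the data.

```r
# First ten ip.proto values, copied from the str() output above
ip_proto <- c(NA, NA, 17, 17, NA, 2, 2, NA, NA, 2)

# Lookup table keyed by protocol number (IANA assignments)
proto_names <- c("2" = "IGMP", "6" = "TCP", "17" = "UDP")

# Index the lookup by the number as a string; NA stays NA
labelled <- unname(proto_names[as.character(ip_proto)])
labelled[is.na(labelled)] <- "no IP header"  # label chosen for illustration

table(labelled)
```

On the real data you could store the result as a new column, e.g. `shark$proto`, and count or plot it like the addresses.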
“Cool” plots are easier with ggplot2. You have no idea what ggplot2 is but perhaps this code can help a bit.
library(ggplot2)
library(dplyr) # you have no idea what this is but google is ur friend
dest <- count(shark, ip.dst)
str(dest)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 23 obs. of 2 variables:
#> $ ip.dst: Factor w/ 23 levels "","10.0.2.15",..: 1 2 3 4 5 6 7 8 9 10 ...
#> $ n : int 36 134 1 12 17 5 6 6 6 15 ...
ggplot(dest) +
  geom_bar(aes(x=reorder(ip.dst, n), y=n), stat="identity") +
  coord_flip() +
  labs(title="Count of packets by destination address")
src <- count(shark, ip.src)
str(src)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 19 obs. of 2 variables:
#> $ ip.src: Factor w/ 19 levels "","10.0.2.15",..: 1 2 3 4 5 6 7 8 9 10 ...
#> $ n : int 36 176 1 17 3 6 6 6 14 14 ...
ggplot(src) +
  geom_bar(aes(x=reorder(ip.src, n), y=n), stat="identity") +
  coord_flip() +
  labs(title="Count of packets by source address")
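If `count()` feels like magic: it groups rows by a column and tallies each group into `n`. The same idea can be sketched in base R with `table()`. The addresses below are a made-up five-packet example, not the real capture.

```r
# Made-up destination addresses ("" stands for a frame with no IP address)
ip_dst <- c("10.0.2.15", "10.0.2.3", "10.0.2.15", "", "10.0.2.15")

# table() tallies each distinct value; as.data.frame() gives one row per value
dest <- as.data.frame(table(ip.dst = ip_dst),
                      responseName = "n",
                      stringsAsFactors = FALSE)

# Busiest destination first
dest <- dest[order(-dest$n), ]
dest
```

The result has the same two columns (`ip.dst` and `n`) that the plots above use.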
Histogram & density plots of packet lengths:
ggplot(shark, aes(x=frame.len)) +
  geom_histogram(alpha=0.5, fill="steelblue")
ggplot(shark, aes(x=frame.len)) +
  geom_density(alpha=0.5, fill="steelblue")
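By default `geom_histogram()` uses 30 bins and prints a message suggesting you choose your own. Here is a sketch with an explicit bin width, using the first ten `frame.len` values from the `str()` output as stand-in data; the 50-byte width is an arbitrary choice for illustration.

```r
library(ggplot2)

# First ten frame lengths, copied from the str() output above
frame_len <- data.frame(frame.len = c(42, 64, 342, 590, 90, 54, 54, 90, 90, 54))

p <- ggplot(frame_len, aes(x = frame.len)) +
  # 50-byte bins starting at 0 (arbitrary width, pick what suits your data)
  geom_histogram(binwidth = 50, boundary = 0, alpha = 0.5, fill = "steelblue") +
  labs(x = "Frame length (bytes)", y = "Packets")

print(p)
```

With the full capture, swap `frame_len` for `shark` and try a few widths to see which one shows the small-packet and near-1514-byte clusters best.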
Packets per second:
shark$sec <- format(as.POSIXct(as.character(shark$frame.time),
                               format="%b %d, %Y %H:%M:%OS EST"), "%H%M%S")
by_sec <- count(shark, sec)
by_sec$ts <- as.POSIXct(sprintf("1970-02-09 %s", by_sec$sec), format="%Y-%m-%d %H%M%S")
ggplot(by_sec, aes(x=ts, y=n)) +
  geom_line(size=0.25) +
  geom_point(size=0.75)
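The per-second counts above go from timestamp to an `HHMMSS` string and back, with the date typed in by hand. A possible alternative is to keep real date-times the whole way and just truncate them to the second. This is only a sketch: the first two timestamps are copied from the output above, the third is made up, `tz = "UTC"` is there so the result does not depend on your machine (the `EST` in the text is matched as a literal, so clock times are kept as written), and `%b` assumes an English locale.

```r
x <- c("Feb 9, 1970 02:04:19.354593000 EST",
       "Feb 9, 1970 02:04:19.623168000 EST",
       "Feb 9, 1970 02:04:20.100000000 EST")  # third value is made up

# %OS reads seconds including the fractional part
ts <- as.POSIXct(x, format = "%b %d, %Y %H:%M:%OS EST", tz = "UTC")

# Drop the fraction so packets in the same second share one value
sec <- as.POSIXct(trunc(ts, units = "secs"))

# Tally packets per second, then turn the labels back into date-times
by_sec <- as.data.frame(table(ts = format(sec, "%Y-%m-%d %H:%M:%S")),
                        responseName = "n", stringsAsFactors = FALSE)
by_sec$ts <- as.POSIXct(by_sec$ts, tz = "UTC")
by_sec
```

`by_sec` then has the same `ts` and `n` columns the line plot expects.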