diff --git a/docs/src/features.md b/docs/src/features.md
index 4f98bd2..1a36674 100644
--- a/docs/src/features.md
+++ b/docs/src/features.md
@@ -117,7 +117,7 @@ julia> using AbstractTrees; children(ts[1])
 ```
 
 ## Setting a Custom Objective Function
-Xgboost uses a second order approximation, so to provide a custom objective functoin first and
+XGBoost uses a second order approximation, so to provide a custom objective function, first and
 second order derivatives must be provided, see the docstring of [`updateone!`](@ref) for more
 details.
 
@@ -148,7 +148,7 @@ bst = xgboost((X, y), ℓ′, ℓ″, max_depth=8)
 ```
 
 ## Caching Data From External Memory
-Xgboost can be used to cache memory from external memory on disk, see
+XGBoost can be used to cache data from external memory on disk, see
 [here](https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html). In the Julia
 wrapper this is facilitated by allowing a `DMatrix` to be constructed from any Julia iterator with
 [`fromiterator`](@ref). The resulting `DMatrix` holds references to cache files which will have
diff --git a/docs/src/index.md b/docs/src/index.md
index 04d4038..a540b62 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -22,10 +22,13 @@ ŷ = predict(bst, X)
 
 using DataFrames
 
-df = DataFrame(randn(100,3), [:a, :b, :y])
+df = DataFrame(randn(200,3), [:a, :b, :y])
+
+train = DMatrix(df[1:150, [:a, :b]], df[1:150, :y])
+test = DMatrix(df[151:end, [:a, :b]], df[151:end, :y])
 
 # can accept tabular data, will keep feature names
-bst = xgboost((df[!, [:a, :b]], df.y))
+bst = xgboost(train, watchlist=Dict("train"=>train, "test"=>test), num_round=10, max_depth=3, η=0.1, objective="reg:squarederror")
 
 # display importance statistics retaining feature names
 importancereport(bst)
@@ -57,15 +60,13 @@ X = [0 missing 1
 
 isequal(DMatrix(X), x) # nullity is preserved
 ```
 
-!!! note
-
-    `DMatrix` must allocate new arrays when fetching values from it. One therefore should avoid
-    using `DMatrix` directly except with `XGBoost`; retrieving values from this object should be
-    considered useful mostly only for verification.
+`DMatrix` must allocate new arrays when fetching values from it. One therefore should avoid
+using `DMatrix` directly except with `XGBoost`; retrieving values from this object should be
+considered useful mostly for verification.
 
 ### Feature Naming and Tabular Data
-Xgboost supports the naming of features (i.e. columns of the feature matrix). This can be useful
+XGBoost supports feature naming (i.e. names of columns of the feature matrix). This can be useful
 for inspecting trained models.
 ```julia
 X = randn(10,3)
@@ -80,14 +81,9 @@ XGBoost.setfeaturenames!(dm, ["a", "b", "c"]) # can also set after construction
 
 `AbstractVector`s or a `DataFrame`).
 ```julia
 using DataFrames
-df = DataFrame(randn(10,3), [:a, :b, :c])
-
-y = randn(10)
-
-DMatrix(df, y)
-df[!, :y] = y
-DMatrix(df, :y) # equivalent to DMatrix(df, y)
+df = DataFrame(randn(10,4), [:a, :b, :c, :y])
+dm = DMatrix(df, :y) # equivalent to DMatrix(df[!, Not(:y)], df[!, :y])
 ```
 
 When constructing a `DMatrix` from a table the feature names will automatically be set to the names
@@ -134,7 +130,7 @@ this is always a `DMatrix` but arguments will be automatically converted.
 ### [Parameters](https://xgboost.readthedocs.io/en/stable/parameter.html)
 Keyword arguments to `Booster` are xgboost model parameters.
 These are described in detail [here](https://xgboost.readthedocs.io/en/stable/parameter.html) and should all be passed exactly as
-they are described in the main xgbosot documentation (in a few cases such as Greek letters we also
+they are described in the main xgboost documentation (in a few cases such as Greek letters we also
 allow unicode equivalents).
 
 ### Training
@@ -156,7 +152,7 @@ using Statistics
 
 mean(ŷ - y)/std(y)
 ```
 
-Xgboost expects `Booster`s to be initialized with training data, therefore there is usually no need
+XGBoost expects `Booster`s to be initialized with training data, therefore there is usually no need
 to define `Booster` separate from training. A shorthand for the above, provided by [`xgboost`](@ref) is
 ```julia
diff --git a/src/booster.jl b/src/booster.jl
index eaa263b..36d57ba 100644
--- a/src/booster.jl
+++ b/src/booster.jl
@@ -396,14 +396,14 @@ end
     xgboost(data; num_round=10, watchlist=Dict(), kw...)
     xgboost(data, ℓ′, ℓ″; kw...)
 
-Creates an xgboost gradient booster object on training data `data` and runs `nrounds` of training.
+Creates an xgboost gradient booster object on training data `data` and runs `num_round` rounds of training.
 This is essentially an alias for constructing a [`Booster`](@ref) with `data` and keyword arguments
-followed by [`update!`](@ref) for `nrounds`.
+followed by [`update!`](@ref) for `num_round` rounds.
 
-`watchlist` is a dict the keys of which are strings giving the name of the data to watch
-and the values of which are [`DMatrix`](@ref) objects containing the data.
+`watchlist` is a `Dict` mapping `String` names to [`DMatrix`](@ref) data on which the model is evaluated during training.
+If omitted, `watchlist` will be initialized with the training data.
 
-All other keyword arguments are passed to [`Booster`](@ref). With few exceptions these are model
+All other keyword arguments are passed to [`Booster`](@ref). With few exceptions these are model
 training hyper-parameters, see [here](https://xgboost.readthedocs.io/en/stable/parameter.html) for a
 comprehensive list.
 
@@ -412,9 +412,10 @@ See [`updateone!`](@ref) for more details.
 
 ## Examples
 ```julia
-(X, y) = (randn(100,3), randn(100))
+train = DMatrix(randn(100,3), randn(100))
+test = DMatrix(randn(100,3), randn(100))
 
-b = xgboost((X, y), 10, max_depth=10, η=0.1)
-ŷ = predict(b, X)
+b = xgboost(train, watchlist=Dict("train"=>train, "test"=>test), num_round=10, max_depth=5, η=0.1)
+ŷ = predict(b, train)
 ```
 
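The custom-objective hunk above only touches the prose, so for reference here is a minimal sketch of how the `xgboost(data, ℓ′, ℓ″; kw...)` form shown in the docstring might be used. It assumes, as the surrounding docs suggest, that `ℓ′` and `ℓ″` are the scalar first and second derivatives of the loss with respect to the prediction, applied over `(ŷ, y)` pairs:

```julia
using XGBoost

# squared-error loss, written out only so the derivatives are obvious
ℓ(ŷ, y)  = (ŷ - y)^2
ℓ′(ŷ, y) = 2(ŷ - y)   # ∂ℓ/∂ŷ
ℓ″(ŷ, y) = 2.0        # ∂²ℓ/∂ŷ²

X = randn(100, 3)
y = randn(100)

# same call as shown in the features.md hunk above
bst = xgboost((X, y), ℓ′, ℓ″, max_depth=8)
```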
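Similarly, the revised `watchlist` example in the `xgboost` docstring can be exercised end to end. The sketch below assumes only the API visible in the hunks above (`DMatrix(X, y)`, the `watchlist` keyword, and `predict` accepting a `DMatrix`):

```julia
using XGBoost

train = DMatrix(randn(100, 3), randn(100))
test  = DMatrix(randn(100, 3), randn(100))

# per the revised docstring, omitting `watchlist` would evaluate on the training data only
b = xgboost(train, watchlist=Dict("train" => train, "test" => test),
            num_round=10, max_depth=5, η=0.1)

ŷ = predict(b, test)   # predictions for the held-out DMatrix
```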