
Talking About Go: Map-Reduce

Author: XYZ
Date: 2022-01-03
Category: Default, Go

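Introduction

Map, Reduce, and Filter are three very important operations in functional programming. They make data processing convenient and flexible. Most programs spend most of their time shuffling data around, so Map/Reduce/Filter is a widely applicable technique, especially in business scenarios that involve statistics.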

Examples

Map example
The code below defines two Map functions. Each takes two parameters:

A slice of strings, []string, which is the data to process.
A function, either func(s string) string or func(s string) int.

func MapStrToStr(arr []string, fn func(s string) string) []string {
    var newArray = []string{}
    for _, it := range arr {
        newArray = append(newArray, fn(it))
    }
    return newArray
}

func MapStrToInt(arr []string, fn func(s string) int) []int {
    var newArray = []int{}
    for _, it := range arr {
        newArray = append(newArray, fn(it))
    }
    return newArray
}

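The two Map functions work the same way: each iterates over the slice passed as the first argument, calls the function passed as the second argument on every element, and collects the results into a new slice that it returns.
We can then use the two functions like this: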

var list = []string{"Hao", "Chen", "MegaEase"}

x := MapStrToStr(list, func(s string) string {
    return strings.ToUpper(s)
})
fmt.Printf("%v\n", x)
// [HAO CHEN MEGAEASE]

y := MapStrToInt(list, func(s string) int {
    return len(s)
})
fmt.Printf("%v\n", y)
// [3 4 8]

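As we can see, the function passed to MapStrToStr() converts to upper case, so the resulting slice is all upper case; the function passed to MapStrToInt() computes the length, so the resulting slice holds the length of each string.
Now let's see what the Reduce and Filter functions look like.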

Reduce example

func Reduce(arr []string, fn func(s string) int) int {
    sum := 0
    for _, it := range arr {
        sum += fn(it)
    }
    return sum
}

var list = []string{"Hao", "Chen", "MegaEase"}

x := Reduce(list, func(s string) int {
    return len(s)
})
fmt.Printf("%v\n", x)
// 15

Filter example

func Filter(arr []int, fn func(n int) bool) []int {
    var newArray = []int{}
    for _, it := range arr {
        if fn(it) {
            newArray = append(newArray, it)
        }
    }
    return newArray
}

var intset = []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
out := Filter(intset, func(n int) bool {
   return n%2 == 1
})
fmt.Printf("%v\n", out)
// [1 3 5 7 9]

out = Filter(intset, func(n int) bool {
    return n > 5
})
fmt.Printf("%v\n", out)
// [6 7 8 9 10]
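
The three operations also compose. The following is a minimal sketch, not code from the article: the names FilterStr, ReduceInt, and TotalLenOfLongNames are made up for illustration. It filters a slice of names, maps the survivors to their lengths, and reduces the lengths to a single sum.

```go
package main

import "fmt"

// FilterStr keeps the strings for which keep returns true.
// (Hypothetical helper, written in the style of the Filter above.)
func FilterStr(arr []string, keep func(s string) bool) []string {
	out := []string{}
	for _, s := range arr {
		if keep(s) {
			out = append(out, s)
		}
	}
	return out
}

// MapStrToInt converts every string to an int using fn.
func MapStrToInt(arr []string, fn func(s string) int) []int {
	out := make([]int, 0, len(arr))
	for _, s := range arr {
		out = append(out, fn(s))
	}
	return out
}

// ReduceInt folds the slice into one int, starting from init.
// (Hypothetical helper; the accumulator style is an assumption.)
func ReduceInt(arr []int, init int, fn func(acc, n int) int) int {
	acc := init
	for _, n := range arr {
		acc = fn(acc, n)
	}
	return acc
}

// TotalLenOfLongNames chains Filter, Map, and Reduce:
// keep names longer than 3 characters, take their lengths, sum them.
func TotalLenOfLongNames(names []string) int {
	long := FilterStr(names, func(s string) bool { return len(s) > 3 })
	lens := MapStrToInt(long, func(s string) int { return len(s) })
	return ReduceInt(lens, 0, func(acc, n int) int { return acc + n })
}

func main() {
	names := []string{"Hao", "Chen", "MegaEase"}
	fmt.Println(TotalLenOfLongNames(names)) // 12
}
```

Only the three small closures carry any meaning specific to the task; the loops are the same every time.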

The figure below is an analogy that vividly illustrates the business semantics of Map-Reduce, which is very useful in data processing.
[Figure: map-reduce例子.png]

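Business example
From the examples above, you may have started to see that Map/Reduce/Filter are only control logic; the real business logic is defined by the data and the function passed to them. Yes, this is the classic programming pattern of decoupling "business logic" from "control logic". Let's look at some code with real business meaning to reinforce what separating control logic from business logic means.

First, we have an employee type and some data.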

type Employee struct {
    Name     string
    Age      int
    Vacation int
    Salary   int
}

var list = []Employee{
    {"Hao", 44, 0, 8000},
    {"Bob", 34, 10, 5000},
    {"Alice", 23, 5, 9000},
    {"Jack", 26, 0, 4000},
    {"Tom", 48, 9, 7500},
    {"Marry", 29, 0, 6000},
    {"Mike", 32, 8, 4000},
}
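
As a rough sketch of how this data might be queried, the helpers below count and sum over the employee list. The names EmployeeCountIf and EmployeeSumIf are assumptions chosen for illustration. The loops are the control logic; the closures passed in carry the business rules.

```go
package main

import "fmt"

type Employee struct {
	Name     string
	Age      int
	Vacation int
	Salary   int
}

var list = []Employee{
	{"Hao", 44, 0, 8000},
	{"Bob", 34, 10, 5000},
	{"Alice", 23, 5, 9000},
	{"Jack", 26, 0, 4000},
	{"Tom", 48, 9, 7500},
	{"Marry", 29, 0, 6000},
	{"Mike", 32, 8, 4000},
}

// EmployeeCountIf counts the employees for which fn returns true.
// (Hypothetical helper: a Filter followed by a count.)
func EmployeeCountIf(list []Employee, fn func(e *Employee) bool) int {
	count := 0
	for i := range list {
		if fn(&list[i]) {
			count++
		}
	}
	return count
}

// EmployeeSumIf adds up fn(e) over all employees; fn returns 0 for
// employees that should not contribute. (Hypothetical helper: a Reduce.)
func EmployeeSumIf(list []Employee, fn func(e *Employee) int) int {
	sum := 0
	for i := range list {
		sum += fn(&list[i])
	}
	return sum
}

func main() {
	// How many employees are older than 40?
	old := EmployeeCountIf(list, func(e *Employee) bool { return e.Age > 40 })
	fmt.Println(old) // 2

	// How many employees have taken no vacation?
	noVacation := EmployeeCountIf(list, func(e *Employee) bool { return e.Vacation == 0 })
	fmt.Println(noVacation) // 3

	// Total salary of employees younger than 30.
	young := EmployeeSumIf(list, func(e *Employee) int {
		if e.Age < 30 {
			return e.Salary
		}
		return 0
	})
	fmt.Println(young) // 19000
}
```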

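Generic Map-Reduce

As we can see, the Map-Reduce functions above need a separate version for each data type, even though their code looks almost identical. This is where generic programming comes in. At the time of writing, Go does not yet support generics (note: in a mail to golang-dev on November 21, 2020, Russ Cox, the technical lead of the Go team, confirmed that Go generics (type parameters) would land in Go 1.18, scheduled for February 2022).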
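
For comparison, here is a sketch of what the three functions could look like once type parameters are available. It assumes Go 1.18 or later, which the rest of this article does not, and the signatures are one plausible choice rather than anything from a standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// Map applies fn to every element of arr and returns the results.
func Map[T, U any](arr []T, fn func(T) U) []U {
	out := make([]U, 0, len(arr))
	for _, v := range arr {
		out = append(out, fn(v))
	}
	return out
}

// Filter returns the elements of arr for which keep returns true.
func Filter[T any](arr []T, keep func(T) bool) []T {
	out := []T{}
	for _, v := range arr {
		if keep(v) {
			out = append(out, v)
		}
	}
	return out
}

// Reduce folds arr into a single value, starting from init.
func Reduce[T, A any](arr []T, init A, fn func(A, T) A) A {
	acc := init
	for _, v := range arr {
		acc = fn(acc, v)
	}
	return acc
}

func main() {
	list := []string{"Hao", "Chen", "MegaEase"}

	upper := Map(list, strings.ToUpper)
	fmt.Println(upper) // [HAO CHEN MEGAEASE]

	lens := Map(list, func(s string) int { return len(s) })
	fmt.Println(lens) // [3 4 8]

	long := Filter(lens, func(n int) bool { return n > 3 })
	fmt.Println(long) // [4 8]

	total := Reduce(lens, 0, func(acc, n int) int { return acc + n })
	fmt.Println(total) // 15
}
```

With type parameters, a mismatched function is rejected at compile time instead of at run time, which is exactly the guarantee the reflection-based versions below cannot give.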

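Simple generic Map
So for now, generics in Go can only be achieved with interface{} plus reflect. interface{} can be thought of as void* in C or Object in Java, and reflect is Go's reflection package, used to inspect types at run time.
Let's see how to write a very simple generic Map function that does no type checking at all.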

func Map(data interface{}, fn interface{}) []interface{} {
    vfn := reflect.ValueOf(fn)
    vdata := reflect.ValueOf(data)
    result := make([]interface{}, vdata.Len())

    for i := 0; i < vdata.Len(); i++ {
        result[i] = vfn.Call([]reflect.Value{vdata.Index(i)})[0].Interface()
    }
    return result
}

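In the code above:

reflect.ValueOf() obtains the value held in each interface{}: one is the data, vdata, and the other is the function, vfn.
vfn.Call() then invokes the function, with []reflect.Value{vdata.Index(i)} supplying the element as its argument.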
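
The two reflection steps can be tried in isolation. The sketch below uses a made-up helper, callAt, to show what Kind(), Len(), Index(), and Call() return for a slice and a function; it is only meant to make the Map body easier to read.

```go
package main

import (
	"fmt"
	"reflect"
)

// callAt calls fn on the i-th element of data using reflection only.
// (Hypothetical helper; it performs no type checking, like the simple Map.)
func callAt(data, fn interface{}, i int) interface{} {
	vdata := reflect.ValueOf(data)
	vfn := reflect.ValueOf(fn)

	// Index(i) yields the element as a reflect.Value.
	// Call takes its arguments as []reflect.Value and returns
	// the results as []reflect.Value as well.
	out := vfn.Call([]reflect.Value{vdata.Index(i)})

	// Interface() converts the first result back to interface{}.
	return out[0].Interface()
}

func main() {
	data := []int{1, 2, 3}
	double := func(x int) int { return x * 2 }

	fmt.Println(reflect.ValueOf(data).Kind())   // slice
	fmt.Println(reflect.ValueOf(data).Len())    // 3
	fmt.Println(reflect.ValueOf(double).Kind()) // func

	fmt.Println(callAt(data, double, 1)) // 4
}
```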

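Go's reflection syntax is a bit puzzling, but a quick look at the documentation is enough to follow it. This article is not about reflection, so please look up a tutorial for the basics.
With that, we can write the following code, where the same Map() logic works on data of different types.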

square := func(x int) int {
  return x * x
}
nums := []int{1, 2, 3, 4}

squaredArr := Map(nums, square)
fmt.Println(squaredArr)
//[1 4 9 16]

upcase := func(s string) string {
  return strings.ToUpper(s)
}
strs := []string{"Hao", "Chen", "MegaEase"}
upstrs := Map(strs, upcase)
fmt.Println(upstrs)
//[HAO CHEN MEGAEASE]

However, reflection happens at run time, so if something is wrong with the types, the error only shows up at run time. For example:

x := Map(5, 5)
fmt.Println(x)

The code above compiles without any trouble, but it fails at run time, and with a panic at that:

panic: reflect: call of reflect.Value.Len on int Value

goroutine 1 [running]:
reflect.Value.Len(0x10b5240, 0x10eeb58, 0x82, 0x10716bc)
        /usr/local/Cellar/go/1.15.3/libexec/src/reflect/value.go:1162 +0x185
main.Map(0x10b5240, 0x10eeb58, 0x10b5240, 0x10eeb60, 0x1, 0x14, 0x0)
        /Users/chenhao/.../map.go:12 +0x16b
main.main()
        /Users/chenhao/.../map.go:42 +0x465
exit status 2

Robust generic Map

So, to write a robust program on top of this "over-generic" use of interface{}, we have to do the type checking ourselves. Here is a Map with type checking:

func Transform(slice, function interface{}) interface{} {
  return transform(slice, function, false)
}

func TransformInPlace(slice, function interface{}) interface{} {
  return transform(slice, function, true)
}

func transform(slice, function interface{}, inPlace bool) interface{} {
 
  // check that slice is of kind Slice
  sliceInType := reflect.ValueOf(slice)
  if sliceInType.Kind() != reflect.Slice {
    panic("transform: not slice")
  }

  //check the function signature
  fn := reflect.ValueOf(function)
  elemType := sliceInType.Type().Elem()
  if !verifyFuncSignature(fn, elemType, nil) {
    panic("transform: function must be of type func(" + sliceInType.Type().Elem().String() + ") outputElemType")
  }

  sliceOutType := sliceInType
  if !inPlace {
    sliceOutType = reflect.MakeSlice(reflect.SliceOf(fn.Type().Out(0)), sliceInType.Len(), sliceInType.Len())
  }
  for i := 0; i < sliceInType.Len(); i++ {
    sliceOutType.Index(i).Set(fn.Call([]reflect.Value{sliceInType.Index(i)})[0])
  }
  return sliceOutType.Interface()

}

func verifyFuncSignature(fn reflect.Value, types ...reflect.Type) bool {

  // Check that it is a function
  if fn.Kind() != reflect.Func {
    return false
  }
  // NumIn() - returns a function type's input parameter count.
  // NumOut() - returns a function type's output parameter count.
  if (fn.Type().NumIn() != len(types)-1) || (fn.Type().NumOut() != 1) {
    return false
  }
  // In() - returns the type of a function type's i'th input parameter.
  for i := 0; i < len(types)-1; i++ {
    if fn.Type().In(i) != types[i] {
      return false
    }
  }
  // Out() - returns the type of a function type's i'th output parameter.
  outType := types[len(types)-1]
  if outType != nil && fn.Type().Out(0) != outType {
    return false
  }
  return true
}

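The code suddenly became much more complex, which shows that the complexity lies in handling the exceptional cases. I won't walk through all of it; there is a lot of code, but it is readable. Here are the key points:

The code does not use the name Map, because it clashes with the map data structure and keyword, so it uses Transform, a name borrowed from the C++ STL.
There are two versions: Transform() returns a brand-new slice, and TransformInPlace() does the work in place.
Kind() is used to check that the data is a Slice and that the function is a Func.
The function's parameter and return types are checked by verifyFuncSignature(), where NumIn() gives the number of input parameters and NumOut() gives the number of return values.
If a new slice is needed, reflect.MakeSlice() creates it.
With the code above in place, we can happily use it: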

It works on a slice of strings:

list := []string{"1", "2", "3", "4", "5", "6"}
result := Transform(list, func(a string) string {
    return a + a + a
})
// {"111","222","333","444","555","666"}

It works on a slice of integers:

list := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
TransformInPlace(list, func (a int) int {
  return a*3
})
//{3, 6, 9, 12, 15, 18, 21, 24, 27}

It works on structs:

var list = []Employee{
    {"Hao", 44, 0, 8000},
    {"Bob", 34, 10, 5000},
    {"Alice", 23, 5, 9000},
    {"Jack", 26, 0, 4000},
    {"Tom", 48, 9, 7500},
}

result := TransformInPlace(list, func(e Employee) Employee {
    e.Salary += 1000
    e.Age += 1
    return e
})

Robust generic Reduce
Likewise, here is the generic version of Reduce:

func Reduce(slice, pairFunc, zero interface{}) interface{} {
  sliceInType := reflect.ValueOf(slice)
  if sliceInType.Kind() != reflect.Slice {
    panic("reduce: wrong type, not slice")
  }

  len := sliceInType.Len()
  if len == 0 {
    return zero
  } else if len == 1 {
    return sliceInType.Index(0).Interface()
  }

  elemType := sliceInType.Type().Elem()
  fn := reflect.ValueOf(pairFunc)
  if !verifyFuncSignature(fn, elemType, elemType, elemType) {
    t := elemType.String()
    panic("reduce: function must be of type func(" + t + ", " + t + ") " + t)
  }

  var ins [2]reflect.Value
  ins[0] = sliceInType.Index(0)
  ins[1] = sliceInType.Index(1)
  out := fn.Call(ins[:])[0]

  for i := 2; i < len; i++ {
    ins[0] = out
    ins[1] = sliceInType.Index(i)
    out = fn.Call(ins[:])[0]
  }
  return out.Interface()
}
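
A usage sketch may help here. To stay self-contained it uses a condensed stand-in, reduceSketch (a made-up name), which folds the slice pairwise in the same way as the Reduce above but leaves out the signature check, so treat it as an illustration rather than a replacement.

```go
package main

import (
	"fmt"
	"reflect"
)

// reduceSketch folds slice pairwise with fn via reflection.
// It returns zero for an empty slice and the only element for a
// one-element slice. The signature check is omitted for brevity.
func reduceSketch(slice, fn, zero interface{}) interface{} {
	in := reflect.ValueOf(slice)
	if in.Kind() != reflect.Slice {
		panic("reduce: wrong type, not slice")
	}
	if in.Len() == 0 {
		return zero
	}
	f := reflect.ValueOf(fn)
	acc := in.Index(0)
	for i := 1; i < in.Len(); i++ {
		acc = f.Call([]reflect.Value{acc, in.Index(i)})[0]
	}
	return acc.Interface()
}

func main() {
	sum := reduceSketch([]int{1, 2, 3, 4, 5}, func(a, b int) int { return a + b }, 0)
	fmt.Println(sum) // 15

	joined := reduceSketch([]string{"a", "b", "c"}, func(a, b string) string { return a + "," + b }, "")
	fmt.Println(joined) // a,b,c

	empty := reduceSketch([]int{}, func(a, b int) int { return a + b }, 0)
	fmt.Println(empty) // 0

	// The result is an interface{}; callers need a type assertion
	// before using it as a concrete type.
	fmt.Println(sum.(int) + 1) // 16
}
```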

Robust generic Filter

Likewise, here is the generic version of Filter (again in two versions, depending on whether the work is done in place):
func Filter(slice, function interface{}) interface{} {
  result, _ := filter(slice, function, false)
  return result
}

func FilterInPlace(slicePtr, function interface{}) {
  in := reflect.ValueOf(slicePtr)
  if in.Kind() != reflect.Ptr {
    panic("FilterInPlace: wrong type, " +
      "not a pointer to slice")
  }
  _, n := filter(in.Elem().Interface(), function, true)
  in.Elem().SetLen(n)
}

var boolType = reflect.ValueOf(true).Type()

func filter(slice, function interface{}, inPlace bool) (interface{}, int) {

  sliceInType := reflect.ValueOf(slice)
  if sliceInType.Kind() != reflect.Slice {
    panic("filter: wrong type, not a slice")
  }

  fn := reflect.ValueOf(function)
  elemType := sliceInType.Type().Elem()
  if !verifyFuncSignature(fn, elemType, boolType) {
    panic("filter: function must be of type func(" + elemType.String() + ") bool")
  }

  var which []int
  for i := 0; i < sliceInType.Len(); i++ {
    if fn.Call([]reflect.Value{sliceInType.Index(i)})[0].Bool() {
      which = append(which, i)
    }
  }

  out := sliceInType

  if !inPlace {
    out = reflect.MakeSlice(sliceInType.Type(), len(which), len(which))
  }
  for i := range which {
    out.Index(i).Set(sliceInType.Index(which[i]))
  }

  return out.Interface(), len(which)
}
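
FilterInPlace takes a pointer to the slice because shortening a slice changes its length, and that change is only visible to the caller (and only permitted by reflect) when the slice value is reached through a pointer. The sketch below isolates that step with a made-up helper, truncate.

```go
package main

import (
	"fmt"
	"reflect"
)

// truncate shortens the slice that slicePtr points to, to n elements.
// (Hypothetical helper that isolates the SetLen step of FilterInPlace.)
func truncate(slicePtr interface{}, n int) {
	p := reflect.ValueOf(slicePtr)
	if p.Kind() != reflect.Ptr || p.Elem().Kind() != reflect.Slice {
		panic("truncate: wrong type, not a pointer to slice")
	}
	// p.Elem() is addressable because it was reached through a pointer,
	// so SetLen is allowed. Calling SetLen on reflect.ValueOf(slice)
	// directly would panic, since that value is not addressable.
	p.Elem().SetLen(n)
}

func main() {
	s := []int{1, 2, 3, 4, 5}
	truncate(&s, 2)
	fmt.Println(s)      // [1 2]
	fmt.Println(len(s)) // 2
}
```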

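Postscript

A few loose ends:

Doing this with reflection has one problem: performance is poor. So the code above should not be used where you need high performance. How to solve this will be discussed in the next article in this series.
The code above borrows heavily from Rob Pike's version, which is at https://github.com/robpike/filter
In fact, programmers all over the world keep asking when the Go team will support Map/Reduce in the standard library. Rob Pike's answer is roughly: is this stuff hard to write? Do you really need us to write it for you? I wrote this code years ago, but I have never once used it. I still prefer a for loop, and I think you had better use a for loop too.

Personally, I think Map/Reduce is very useful for data processing. Rob Pike probably does not write much "business logic" code, so he may not appreciate how frequently business requirements change.

Of course, whether it is good or not is for you to judge, but learning more programming patterns is always helpful.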
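
The performance point in the postscript can be checked informally. The sketch below times a plain loop against a reflection-based fold over the same data; sumLoop and sumReflect are made-up names, and wall-clock timing like this is only a rough indication, not a proper benchmark.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// sumLoop adds the numbers with an ordinary for loop.
func sumLoop(nums []int) int {
	sum := 0
	for _, n := range nums {
		sum += n
	}
	return sum
}

// sumReflect adds the numbers by calling add through reflection
// once per element, as the generic Reduce does.
func sumReflect(slice, add interface{}) int {
	in := reflect.ValueOf(slice)
	f := reflect.ValueOf(add)
	acc := reflect.ValueOf(0)
	for i := 0; i < in.Len(); i++ {
		acc = f.Call([]reflect.Value{acc, in.Index(i)})[0]
	}
	return int(acc.Int())
}

func main() {
	nums := make([]int, 1000000)
	for i := range nums {
		nums[i] = i
	}

	start := time.Now()
	a := sumLoop(nums)
	loopTime := time.Since(start)

	start = time.Now()
	b := sumReflect(nums, func(x, y int) int { return x + y })
	reflectTime := time.Since(start)

	fmt.Println("same result:", a == b)
	fmt.Println("loop:", loopTime, "reflect:", reflectTime)
}
```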

Talking About Go: Functional Options Pattern

Author: XYZ
Date: 2021-12-25
Category: Go

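Introduction

Today we explore a Go programming pattern, the functional options pattern (Functional Options). It is an application of functional programming and is currently quite popular in Go. When a business object has many attributes and most of them have no ready-made default value, its initialization functions become very long and the code is hard to maintain. Let's see how the functional options pattern solves this problem.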

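Initializing business objects

In real-world development we frequently need to initialize an object (or business entity). Take the business entity Company below, which has many fields to initialize: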

type Company struct {
    Name      string
    ID        string
    Owner     string
    Addr      string
    Industry  string
    Product   string
    License   string
    Date      string
}

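The Company entity above has quite a few fields (an arbitrary example, not necessarily realistic, just to help explain the pattern that follows):

Name and ID are required.
All remaining fields are optional and must have default values.

We then need several functions with different signatures to create a Company, as shown below: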

func NewDefaultCompany(name string, id string) (*Company, error) {
    return &Company{name, id, "none", "china", "all", "none", "none", "none"}, nil
}

func NewCompanyWithOwner(name string, id string, owner string) (*Company, error) {
    return &Company{name, id, owner, "china", "all", "none", "none", "none"}, nil
}

func NewCompanyWithIndustry(name string, id string, owner string, industry string) (*Company, error) {
    return &Company{name, id, owner, "china", industry, "none", "none", "none"}, nil
}

func NewCompanyFull(name string, id string, owner string, addr string, industry string, product string, license string, date string) (*Company, error) {
    return &Company{name, id, owner, addr, industry, product, license, date}, nil
}

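Because Go does not support function overloading, real-world development ends up with too many such signatures. Over time the maintenance cost grows: whenever you need to initialize a business object you have to look carefully to decide which function fits, or whether yet another one needs to be added.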

Clearly, this coding style is poor. The most common way to solve the problem is to extract a configuration object, for example:

type Config struct {
    Owner     string
    Addr      string
    Industry  string
    Product   string
    License   string
    Date      string
}

We move all the optional fields into one struct, so the Company object becomes:

type Company struct {
    Name      string
    ID        string
    Conf      *Config
}

The functions for creating a Company can then be reduced to a single one:

func NewCompany(name string, id string, conf *Config) (*Company, error) {
    //...
}

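This code is pretty good, and in most cases we would stop here. But a programmer with high standards will notice one flaw: Config is not required, so you have to check whether it is nil or empty (Config{}), which still leaves the code feeling not quite clean.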
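
To make that concrete, here is one way the elided body of NewCompany might handle the check. This is a sketch under assumptions: the helper defaultConfig, the required-field validation, and the choice to treat an empty Config the same as nil are all illustrative, with the default values taken from the constructors shown earlier.

```go
package main

import (
	"errors"
	"fmt"
)

type Config struct {
	Owner    string
	Addr     string
	Industry string
	Product  string
	License  string
	Date     string
}

type Company struct {
	Name string
	ID   string
	Conf *Config
}

// defaultConfig returns the defaults used by NewDefaultCompany earlier.
// (Hypothetical helper.)
func defaultConfig() *Config {
	return &Config{
		Owner:    "none",
		Addr:     "china",
		Industry: "all",
		Product:  "none",
		License:  "none",
		Date:     "none",
	}
}

// NewCompany has to decide what a missing configuration looks like:
// here both nil and the zero value Config{} fall back to the defaults.
func NewCompany(name string, id string, conf *Config) (*Company, error) {
	if name == "" || id == "" {
		return nil, errors.New("name and id are required")
	}
	if conf == nil || *conf == (Config{}) {
		conf = defaultConfig()
	}
	return &Company{Name: name, ID: id, Conf: conf}, nil
}

func main() {
	a, _ := NewCompany("ABC", "1", nil)
	b, _ := NewCompany("ABC", "1", &Config{})
	fmt.Println(a.Conf.Addr, b.Conf.Addr) // china china

	// A partially filled Config leaves the other fields empty,
	// which is one more case the caller has to think about.
	c, _ := NewCompany("ABC", "1", &Config{Owner: "li si"})
	fmt.Printf("%q %q\n", c.Conf.Owner, c.Conf.Addr) // "li si" ""
}
```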

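Builder pattern

The Builder design pattern can solve the problem that remains after introducing Config. With a Builder, the calling code looks like this (shown here in Java style):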

Company company = new Company.Builder()
  .name("ABC")
  .id("1")
  .owner("zhang san")
  .addr("china")
  .industry("all")
  .product("none")
  .license("1")
  .date("2020.2.2")
  .build();

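Following this pattern, we can rewrite the code above as follows (note: the code below does not consider error handling; for more on that, see "Talking About Go: Error Handling")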

type CompanyBuilder struct {
    Company
}
func (cb *CompanyBuilder) Create(name string, id string) *CompanyBuilder {
    cb.Company.Name = name
    cb.Company.ID = id
    return cb
}
func (cb *CompanyBuilder) WithOwner(owner string) *CompanyBuilder {
    cb.Company.Owner = owner
    return cb
}
func (cb *CompanyBuilder) WithIndustry(industry string) *CompanyBuilder {
    cb.Company.Industry = industry
    return cb
}

// Other attributes are implemented similarly
……

func (cb *CompanyBuilder) Build() Company {
    return cb.Company
}

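With that, we can initialize a Company in the chained style shown at the top. This approach is also clear: it needs no extra Config type and builds the object with chained method calls, at the cost of one additional CompanyBuilder type. That CompanyBuilder may seem redundant, since we could apparently put the builder methods directly on Company, and indeed we could. But error handling would then become awkward (Company would need an error member, which spoils the "purity" of the struct), so a separate wrapper type is the better choice.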
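
In Go, the chained call might look like the sketch below. It repeats a trimmed set of the builder methods so that it runs on its own; the wrapper function buildExample is a made-up name used only to package the call.

```go
package main

import "fmt"

type Company struct {
	Name     string
	ID       string
	Owner    string
	Addr     string
	Industry string
	Product  string
	License  string
	Date     string
}

type CompanyBuilder struct {
	Company
}

func (cb *CompanyBuilder) Create(name string, id string) *CompanyBuilder {
	cb.Company.Name = name
	cb.Company.ID = id
	return cb
}

func (cb *CompanyBuilder) WithOwner(owner string) *CompanyBuilder {
	cb.Company.Owner = owner
	return cb
}

func (cb *CompanyBuilder) WithIndustry(industry string) *CompanyBuilder {
	cb.Company.Industry = industry
	return cb
}

func (cb *CompanyBuilder) Build() Company {
	return cb.Company
}

// buildExample shows the chained style. The builder is stored in a
// variable first because the methods have pointer receivers.
// (Hypothetical wrapper.)
func buildExample() Company {
	cb := CompanyBuilder{}
	return cb.Create("ABC", "1").
		WithOwner("zhang san").
		WithIndustry("all").
		Build()
}

func main() {
	company := buildExample()
	fmt.Println(company.Name, company.ID, company.Owner, company.Industry)
	// ABC 1 zhang san all
}
```

Fields that no With method touched keep their zero value, so defaults would still have to be set somewhere, for example in Create.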

If we want to get rid of this wrapper struct, it is time for Functional Options, a functional programming technique, to take the stage.

Functional Options

First, define a function type:

type Option func(*Company)

Then, define a set of initialization functions in a functional style:

func Owner(owner string) Option {
    return func(c *Company) {
        c.Owner = owner
    }
}

func Addr(addr string) Option {
    return func(c *Company) {
        c.Addr = addr
    }
}

func Industry(industry string) Option {
    return func(c *Company) {
        c.Industry = industry
    }
}

func Product(product string) Option {
    return func(c *Company) {
        c.Product = product
    }
}

func License(license string) Option {
    return func(c *Company) {
        c.License = license
    }
}

func Date(date string) Option {
    return func(c *Company) {
        c.Date = date
    }
}

Each function above takes one argument and returns a function, and the returned function sets a field on the Company passed to it. For example, calling Owner("li si") returns the function func(c *Company) { c.Owner = "li si" }.
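
A tiny sketch makes the two steps visible: creating the option changes nothing, and only applying it to a Company sets the field. The Company type is trimmed to three fields here purely to keep the example short.

```go
package main

import "fmt"

// Company is trimmed to the fields this example needs.
type Company struct {
	Name  string
	ID    string
	Owner string
}

type Option func(*Company)

func Owner(owner string) Option {
	return func(c *Company) {
		c.Owner = owner
	}
}

func main() {
	// Step 1: build the option. This only captures "li si" in a closure.
	setOwner := Owner("li si")

	c := Company{Name: "ABC", ID: "1"}
	fmt.Printf("%q\n", c.Owner) // ""

	// Step 2: apply the option to a concrete Company.
	setOwner(&c)
	fmt.Printf("%q\n", c.Owner) // "li si"
}
```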

Now we define a NewCompany() function with a variadic parameter options that accepts any number of the functions above, and apply them to our Company object in a for loop.

func NewCompany(name string, id string, options ...Option) (*Company, error) {
    company := Company{
        Name:     name,
        ID:       id,
        Owner:    "zhang san",
        Addr:     "china",
        Industry: "all",
        Product:  "none",
        License:  "",
        Date:     "2020.2.2",
    }

    for _, option := range options {
        option(&company)
    }

    //...
    return &company, nil
}

Then, creating a Company object looks like this:

company, _ := NewCompany("ABC", "1", Owner("li si"))

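Isn't that clean and elegant? It removes the dilemma of the Config approach, where a config parameter is required but you must choose between nil and Config{} when you don't need it, and it needs no CompanyBuilder control object either. Using a functional style directly also makes the code pleasant to read.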
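
Putting the pieces together, the following self-contained sketch shows the constructor with no options, with several options, and with the same option given twice. Only three of the option functions are repeated to keep it short; the behaviour when an option is repeated (the last one wins) follows from the for loop and is not something the pattern itself prescribes.

```go
package main

import "fmt"

type Company struct {
	Name     string
	ID       string
	Owner    string
	Addr     string
	Industry string
	Product  string
	License  string
	Date     string
}

type Option func(*Company)

func Owner(owner string) Option {
	return func(c *Company) { c.Owner = owner }
}

func Addr(addr string) Option {
	return func(c *Company) { c.Addr = addr }
}

func Industry(industry string) Option {
	return func(c *Company) { c.Industry = industry }
}

func NewCompany(name string, id string, options ...Option) (*Company, error) {
	// Start from the defaults, then let each option override one field.
	company := Company{
		Name:     name,
		ID:       id,
		Owner:    "zhang san",
		Addr:     "china",
		Industry: "all",
		Product:  "none",
		License:  "",
		Date:     "2020.2.2",
	}
	for _, option := range options {
		option(&company)
	}
	return &company, nil
}

func main() {
	// No options: every optional field keeps its default.
	a, _ := NewCompany("ABC", "1")
	fmt.Println(a.Owner, a.Addr, a.Industry) // zhang san china all

	// Several options, in any order.
	b, _ := NewCompany("ABC", "1", Industry("software"), Owner("li si"))
	fmt.Println(b.Owner, b.Addr, b.Industry) // li si china software

	// Options are applied in order, so the last one wins.
	c, _ := NewCompany("ABC", "1", Addr("beijing"), Addr("shanghai"))
	fmt.Println(c.Addr) // shanghai
}
```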

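So whenever you write this kind of code, Functional Options is strongly recommended. It brings at least the following benefits:

Intuitive programming
Highly configurable
Easy to maintain and extend
Self-documenting: the initialization is decoupled by attribute rather than expressed as combinations of attributes
Easy for newcomers to pick up
Nothing confusing (nil or empty?)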